How Magic a Bullet Is Machine Learning for Credit Analysis? An Exploration with FinTech Lending Data
Advocates of FinTech lending argue that it enables lenders to predict loan outcomes more accurately by employing complex analytical tools, such as machine learning (ML) methods. The authors of this paper apply ML methods, specifically random forests and stochastic gradient boosting, to loan-level data from the largest FinTech lender to assess whether these methods produce predictions of default on future loans that are substantially more accurate than the predictions of standard regression models.
This study also examines which input variables are influential across the different models and offers an intuitive presentation of their relationship with the loan outcome according to the ML models. In addition, it investigates whether having more data (additional observations and additional input variables) helps the ML methods more than it helps the standard regression models. The paper also explores whether the use of ML methods tends to produce more accurate or more favorable ratings of borrowers who have specific combinations of observed characteristics or who are from locales with better or worse economic conditions.
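As a concrete illustration of the kind of comparison the paper runs, the sketch below fits a logistic benchmark alongside random forests and stochastic gradient boosting on synthetic loan-level data and compares their out-of-sample discrimination. It is a minimal sketch under stated assumptions, not the authors' setup: the data are simulated, the features merely stand in for inputs such as risk score, debt-to-income ratio, and recent inquiries, and the hyperparameters are illustrative.

```python
# Minimal sketch (not the paper's code): logistic benchmark vs. random forest
# vs. stochastic gradient boosting on synthetic "loan-level" data with a
# roughly 10 percent default rate.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Simulated stand-ins for loan-level inputs (risk score, debt-to-income ratio,
# number of recent inquiries, local economic conditions, and so on).
X, y = make_classification(n_samples=20_000, n_features=10, n_informative=6,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

models = {
    "logistic benchmark": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=500, random_state=0),
    # subsample < 1.0 gives the *stochastic* variant of gradient boosting
    "stochastic gradient boosting": GradientBoostingClassifier(
        subsample=0.5, random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    prob_default = model.predict_proba(X_test)[:, 1]
    print(f"{name}: out-of-sample AUC = {roc_auc_score(y_test, prob_default):.3f}")
```

On real loan data, the size and even the sign of the gaps would depend on the sample, the horizon, and the tuning; the snippet only fixes the shape of the comparison.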
Key Findings
- The machine learning (ML) methods prove superior to the benchmark logistic model more in their ability to rank-order loans by risk and thereby separate defaulted loans from the rest than in the numerical accuracy of their predicted probabilities of default (a short sketch after this list illustrates the distinction).
- The ML models improve default predictions much more for out-of-sample loans originated around the same time as the loans used to train the models than for loans originated in later months. The gap reflects data drift and model drift, that is, changes over time in the distribution of the input variables and in their relationship with the default outcome.
- At a sufficiently distant horizon, because of this data and model drift, the logistic model’s out-of-sample predictions outperform those of the ML models, indicating that caution should be exercised when applying ML methods, especially over the course of a business cycle.
- All the empirical models consistently find that ex post economic conditions faced by borrowers are among the most important variables in explaining the default outcome.
- Across the parametric and ML models, a similar handful of variables (including ex post economic indicators, risk score, number of recent inquiries, and debt-to-income ratio) are consistently found to be key determinants of default.
- More observations help all the models predict more accurately and tend to help the ML models more, but in many subsample periods, the ML methods’ performance relative to that of the logistic model peaks at roughly one thousand to a few thousand observations and then diminishes.
- Adding borrower-specific variables, some of which are only moderately informative, mildly improves the ML methods’ relative prediction performance. This suggests that unconventional data are likely to produce meaningfully more accurate credit ratings only for consumers with little or no credit history, because for these consumers such data substitute for the more informative credit variables that are missing.
- The study finds little statistically significant evidence that, compared with the logistic model, the ML methods generate more accurate predictions of default for subgroups of borrowers categorized by their risk attributes, income, or where they live.
- The authors find suggestive evidence that, compared with the standard regression model, the ML methods tend to assign slightly lower default probabilities to borrowers with better values of the typical ex ante indicators of default risk, such as a higher credit score or a lower debt-to-income ratio.
- It appears that the ML methods are unlikely to cause a notable disadvantage for borrowers with prime or near-prime credit scores.
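The first finding above separates two notions of accuracy: how well a model rank-orders loans by default risk, commonly summarized by the area under the ROC curve (AUC), and how close its predicted probabilities are to the observed outcomes, commonly summarized by the Brier score. The toy sketch below, using made-up probabilities rather than anything from the paper, shows that a model can rank loans well while still misstating the level of default risk.

```python
# Toy illustration (made-up numbers, not the paper's data): ordinal ranking
# vs. numerical accuracy of predicted default probabilities.
import numpy as np
from sklearn.metrics import brier_score_loss, roc_auc_score

y_true = np.array([0, 0, 0, 1, 0, 1, 0, 0, 1, 0])   # 1 = defaulted loan
p_hat = np.array([0.02, 0.05, 0.10, 0.60, 0.08,
                  0.55, 0.03, 0.12, 0.70, 0.04])     # predicted default probabilities

print("AUC (ordinal ranking):        ", round(roc_auc_score(y_true, p_hat), 3))
print("Brier score (probability fit):", round(brier_score_loss(y_true, p_hat), 3))

# Dividing every prediction by 10 preserves the ranking, so the AUC is
# unchanged, but the probability levels are now badly understated, so the
# Brier score deteriorates.
p_shrunk = p_hat / 10
print("AUC after shrinking:          ", round(roc_auc_score(y_true, p_shrunk), 3))
print("Brier score after shrinking:  ", round(brier_score_loss(y_true, p_shrunk), 3))
```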
Implications
Although the paper’s estimates detect only a moderate gain in the accuracy of default predictions from using ML methods, this may be because the data employed in the study are limited to a set of standard credit variables. With access to the additional credit-relevant variables available to lenders, such as the payment history on past FinTech loans or more timely loan-repayment information, this paper’s ML models might be able to produce predictions that are substantially more accurate than those of standard regression models.
Abstract
FinTech online lending to consumers has grown rapidly in the post-crisis era. As argued by its advocates, one key advantage of FinTech lending is that lenders can predict loan outcomes more accurately by employing complex analytical tools, such as machine learning (ML) methods. This study applies ML methods, in particular random forests and stochastic gradient boosting, to loan-level data from the largest FinTech lender of personal loans to assess the extent to which those methods can produce more accurate out-of-sample predictions of default on future loans than standard regression models do. To explain loan outcomes, this analysis accounts for the economic conditions faced by a borrower after origination, which are typically absent from other ML studies of default. For the given data, the ML methods indeed improve prediction accuracy, but more so over the near horizon than beyond a year. This study then shows that having more data up to, but not beyond, a certain quantity enhances the predictive accuracy of the ML methods relative to that of parametric models. The likely explanation is that there has been data or model drift over time, so that methods that fit more complex models with more data can in fact suffer greater out-of-sample misses. Prediction accuracy rises, but only marginally, with additional standard credit variables beyond the core set, suggesting that unconventional data need to be sufficiently informative as a whole to help consumers with little or no credit history. This study further explores whether the greater functional flexibility of ML methods yields unequal benefit to consumers with different attributes or who reside in locales with varying economic conditions. It finds that the ML methods produce more favorable ratings for different groups of consumers, although those already deemed less risky seem to benefit slightly more on balance.