Modeling Mortgage Credit – Issues, Challenges and Thoughts on Future Models
What we have learned
A model can hurt a business just as much as it can benefit it, if not more. Recently, many have come to understand this the hard way.
Blindly relying on a model without questioning its validity and denigrating a model when it produces unsatisfactory results are two very common reactions. Although these reactions sit at opposite extremes, they both come from the same misguided notion: that a model will, or should, always produce correct projections.
The collapse of the world financial market starting in 2007 has made virtually all financial models (many of which were working well at one point) obsolete. Naturally, this raises the question of whether these models are useful anymore. Perhaps what is more important for us is that we consider the following questions: What is the purpose of having a model? What are the issues and problems facing modelers in current practice? And, how can credit models help the financial industry once again?
To answer these questions, we have to understand what a credit model really does, what we should expect from a model and how to appropriately apply the model.
Mortgage Credit Models
Generally speaking, just like many other models, a mortgage credit model does two things:
1) Explanation of past credit performance, and
2) Prediction of future credit performance.
For business applications, prediction (2) is the ultimate purpose, while explanation (1) exists to serve prediction.
Mortgage credit performance is generally measured in these three aspects:
– Delinquency (Interruption of repayment)
– Default (Termination of a mortgage that is determined to be unrecoverable)
– Loss (Financial loss as a result of default)
While loss is the final measure of the financial consequence of a defaulted loan, delinquency and default have to be studied because of their impact on loss. Unlike mortgage prepayment, which is a single event, credit loss is the result of a process that consists of a series of events. Therefore, it takes multiple models to describe credit loss.
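To illustrate why several models are needed, here is a minimal sketch that chains the three measures into an expected loss for a single loan. The function name, the simple multiplicative decomposition, and the input numbers are all assumptions made for illustration; in practice each input would come from its own model.

```python
def expected_loss(balance, p_delinquent, p_default_given_delinquent, loss_severity):
    """Expected loss for one loan under an assumed three-stage chain.

    delinquency -> default -> loss, with each stage supplied by its own model.
    The probabilities and the severity are fractions between 0 and 1.
    """
    rates = {
        "p_delinquent": p_delinquent,
        "p_default_given_delinquent": p_default_given_delinquent,
        "loss_severity": loss_severity,
    }
    for name, value in rates.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")

    # Probability that the loan becomes delinquent AND then defaults.
    p_default = p_delinquent * p_default_given_delinquent
    # Dollar loss expected once severity is applied to the balance.
    return balance * p_default * loss_severity


# Hypothetical inputs: $200,000 balance, 10% delinquency probability,
# 30% of delinquent loans default, 40% of the balance is lost on default.
example = expected_loss(200_000, 0.10, 0.30, 0.40)
print(f"Expected loss: {example:,.0f}")
```

An error in any one of the three inputs flows straight through to the loss estimate, which is why delinquency and default cannot be skipped.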
Like many other predictive models, a mortgage credit model is driven by assumptions. In my opinion, provided the model is correctly implemented, its success is all about making reasonable assumptions. I view empirical data as the most important source for assumptions, but data alone does not drive the models.
What is OMD?
OMD stands for Observed, Missing and Disruptive. It is probably the most important concept that model developers and model users (CEO, Policy Maker, Investor) need to understand.
The results of almost all non-mechanical events, such as human behavior, natural events, social events, and market events, are driven by known factors (predictable) and unknown factors (unpredictable). Unknown factors can be random or non-random (e.g. operational).
Known factors can be easily described by data. When the factor information can be observed and collected, the data for the factor is “Observed.” Otherwise, the data is “Missing.”
For example, on a mortgage loan: the borrower’s down payment percentage, marriage stability, and lifestyle are all known factors that affect the performance of the mortgage loan. Among the three, only the borrower’s down payment percent can be observed and collected, while the other two would not be collectable and are therefore missing.
Unknown factors are the ones that we do not understand or cannot anticipate, such as the COVID-19 pandemic (random) and, consequently, the payment deferral plan by the GSE (operational). They work as “Disruptive” events in the mortgage repayment process that would alter mortgage performance. They are unpredictable. Some of those factors could be difficult to observe or are totally unobservable.
The outcome of almost any non-mechanical event is a combined result of Observed, Missing, and Disruptive, or OMD.
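The classification above can be written down as a small sketch. The `FactorType` enum and the `classify_factor` helper are hypothetical names introduced only for this illustration; the rules follow the definitions given in the text (known and collectable is Observed, known but not collectable is Missing, unknown is Disruptive).

```python
from enum import Enum


class FactorType(Enum):
    OBSERVED = "O"
    MISSING = "M"
    DISRUPTIVE = "D"


def classify_factor(known: bool, collectable: bool) -> FactorType:
    """Classify a factor using the OMD definitions from the text."""
    if not known:
        # Unknown factors cannot be anticipated, so they act as disruptions.
        return FactorType.DISRUPTIVE
    return FactorType.OBSERVED if collectable else FactorType.MISSING


# The examples used in the text.
factors = {
    "down payment percentage": classify_factor(known=True, collectable=True),
    "marriage stability": classify_factor(known=True, collectable=False),
    "lifestyle": classify_factor(known=True, collectable=False),
    "COVID-19 pandemic": classify_factor(known=False, collectable=False),
}

# Only the Observed factors can actually be fed to a model.
available_to_model = [
    name for name, kind in factors.items() if kind is FactorType.OBSERVED
]
print(available_to_model)
```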
The challenge of modeling these events is having to deal with OMD when there is only O available.
Because of the presence of D, a purely statistical model is destined to fail. Therefore, do not be too hard on yourself if your model is not working.
Why is your model not working?
“The model is not working!” – That is NORMAL, because it is what is to be expected!
If your model is working, especially working consistently for a period of time, you are LUCKY! The longer the period is, the luckier you are!
Why so? The most important reason is that you are dealing with OMD!
There are two types of events. One is an instant event, in which we can see the result almost right away, e.g. coin tossing. The other is a time event, for which the result does not come out until after a period of time, e.g. most investments. For a time event, obviously, the longer the time, the more uncertain the result, because there is more room for D (Disruptive). For an instant event, on the other hand, the impact of D is minimal to none.
For a mortgage loan, performance events are almost all time events. Take early payment in full as an example: it could happen at any time between the loan's origination and its maturity. Let’s discuss what could make a model go wrong, using OMD terminology.
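As a rough illustration of why a longer horizon leaves more room for D, the sketch below computes the chance of at least one disruptive event over a given number of months. The constant, independent monthly probability is a simplifying assumption introduced here, and the 1% figure is hypothetical; real disruptions are neither independent nor evenly spread.

```python
def prob_at_least_one_disruption(p_per_month: float, months: int) -> float:
    """Chance of one or more disruptions over the horizon.

    Assumes (for illustration only) that each month carries the same,
    independent probability of a disruptive event.
    """
    if not 0.0 <= p_per_month <= 1.0:
        raise ValueError("p_per_month must be between 0 and 1")
    if months < 0:
        raise ValueError("months must be non-negative")
    # Complement of "no disruption in any month".
    return 1.0 - (1.0 - p_per_month) ** months


# Horizons: instant event, 1 year, 5 years, a full 30-year mortgage term.
horizons = [0, 12, 60, 360]
probs = [prob_at_least_one_disruption(0.01, m) for m in horizons]
for months, prob in zip(horizons, probs):
    print(f"{months:>3} months: {prob:.1%}")
```

An instant event corresponds to a horizon of zero, where the probability is exactly zero; the figure then rises steadily with the horizon.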
1) O System – Dream System.
In an O system, the data would be complete, meaning all information related to the results is available. There is no missing data (M) and there are no unknown (D, Disruptive) factors in the process. It would be a dream system for a statistician. The typical problems in modeling an event in an O system would be over-fitting or under-fitting. With enough data and validation, an experienced modeler should be able to minimize such problems. But an O system is an ideal system, used here just for the sake of discussion.
2) OM System – Isolated System.
In this system, there is observed data and missing data. Missing data in a time event actually consists of both uncollectable and unknown data.
- Uncollectable – information exists at the time of prediction but cannot be collected. For example, job stability, marriage stability, lifestyle, etc.
- Unknown – information does not exist at the time of projection. For example, interest rate in two years, home price in three years, etc.
In this system, although data could be missing, the data function is still somewhat understood. In this sense, the OM system can be considered an isolated system, and it can therefore be described by a statistical model. However, the fact that M is always significant, regardless of how much data has already been collected, makes modeling more difficult than expected.
The biggest challenge is that the model estimate will always be biased. For a time event, O (Past and Present) is always a subset of the whole OM system (Past, Present and Future). Therefore, whatever is developed from O is always biased, no matter how complete O is. This is why it is quite typical to see a model run off the rails, even after just a short time period. It takes time and effort to test repeatedly and review carefully to identify and reduce biases. Constantly refitting the model without careful review is the least meaningful response: it only chases the noise instead of understanding the actual bias.
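One textbook mechanism behind this kind of bias is an omitted variable: a missing factor M that is correlated with an observed factor O. The simulation below is a minimal sketch under that assumption. The coefficients, the linear outcome, and the function names are all hypothetical and chosen for illustration; they are not taken from any mortgage data.

```python
import random


def ols_slope(xs, ys):
    """Slope of a one-variable least-squares fit of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def simulate_o_only_slope(n, seed=0):
    """Fit the outcome on O alone when a correlated M also drives it."""
    rng = random.Random(seed)
    true_effect_o = 1.0  # assumed true effect of the observed factor
    true_effect_m = 2.0  # assumed true effect of the missing factor
    m_on_o = 0.5         # assumed dependence of M on O
    observed, outcome = [], []
    for _ in range(n):
        o = rng.gauss(0, 1)
        m = m_on_o * o + rng.gauss(0, 1)  # M exists but is never collected
        y = true_effect_o * o + true_effect_m * m + rng.gauss(0, 1)
        observed.append(o)
        outcome.append(y)
    return ols_slope(observed, outcome)


# The true effect of O is 1.0, but the O-only fit absorbs part of M's effect.
slope_small = simulate_o_only_slope(n=2_000)
slope_large = simulate_o_only_slope(n=20_000)
print(f"O-only slope, n=2,000:  {slope_small:.2f}")
print(f"O-only slope, n=20,000: {slope_large:.2f}")
```

In this setup the O-only estimate settles near 1.0 + 2.0 × 0.5 = 2.0 rather than the true 1.0, and collecting ten times more observations of O does not move it back. More data, or more frequent refitting on the same kind of data, narrows the noise around the biased value but does not remove the bias.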
3) OMD System – System of Reality.
OMD is the real system to be dealt with. Because of D, it is no longer an isolated system, and therefore it cannot be described by a statistical model alone. D (Disruptive) can be categorized as observed or unobserved. A few examples of D are discussed below:
- Major Disruptive Event – Financial crisis, pandemic, social unrest, etc. They are generally observable, but their impact on the event to be predicted is very difficult to quantify.
- Human Intervention – For mortgages, servicing policy changes such as forbearance plans, foreclosure timelines, collection strategy, short sale strategy, etc. They can be either observed or unobserved, depending on what you can see.
- Environment Changes – Market demand changes, government credit policy changes, etc. They are observable but hard to quantify. These changes would significantly affect the barrier to refinancing, and therefore would have a big impact on prepayment and default speeds.
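To make the effect of D concrete, the sketch below calibrates a deliberately simple forecast on a pre-disruption window and then scores it before and after a level shift. The monthly default rates are hypothetical numbers invented for illustration, not real data, and the constant-mean forecast is only a stand-in for a statistical model fitted to history.

```python
def mean_abs_error(actuals, forecast):
    """Average absolute gap between actual values and a constant forecast."""
    return sum(abs(a - forecast) for a in actuals) / len(actuals)


# Hypothetical monthly default rates, in percent (illustrative only).
pre_disruption = [2.0, 2.1, 1.9, 2.0, 2.0]
post_disruption = [5.0, 5.5, 6.0]  # after an assumed disruptive event

# "Model": the historical average, calibrated on pre-disruption data only.
forecast = sum(pre_disruption) / len(pre_disruption)

error_before = mean_abs_error(pre_disruption, forecast)
error_after = mean_abs_error(post_disruption, forecast)
print(f"Forecast: {forecast:.2f}%")
print(f"Error before disruption: {error_before:.2f} points")
print(f"Error after disruption:  {error_after:.2f} points")
```

The forecast fits the history it was built on and fails afterwards. Nothing in the pre-disruption data could have signaled the shift, which is the sense in which D cannot be handled by a statistical model alone.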
If you truly understand OMD, you would agree that it is almost impossible to predict a mortgage event well on a consistent basis, even though mortgage data has been widely regarded as the richest and most standardized data in the lending industry.