In a typical linear regression model, you would regress something like yt = b1 + b2*xt + e (t is a subscript and e represents the error term). Many people are familiar with this setup. However, OLS rests on a number of assumptions that must hold for the math to work out. Because of assumptions such as constant variance and normality of the residual errors, if your dependent variable yt consists only of 0s and 1s, or is truncated in any other way, you will have misspecified your model. Given some series xt and coefficients b1 and b2, the fitted line can easily predict values of yt much larger than 1 or smaller than 0. Two questions arise: how do you correct for this, and why does it matter?
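To see the problem concretely, here is a minimal sketch (not from the original post) that fits OLS to a simulated 0/1 outcome and shows the fitted values escaping the [0, 1] interval:

```python
# Simulated illustration: OLS on a binary outcome predicts outside [0, 1].
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 200)
y = (x + rng.normal(scale=0.5, size=x.size) > 0).astype(float)  # 0/1 outcome

# OLS via the normal equations: b = (X'X)^{-1} X'y
X = np.column_stack([np.ones_like(x), x])
b = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ b

print(f"min fitted value: {y_hat.min():.3f}")  # typically below 0
print(f"max fitted value: {y_hat.max():.3f}")  # typically above 1
```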
The main method to correct for a truncated dependent variable is a probit or logit model. The concept behind probit and logit is essentially the same; they simply use different functions to make the correction. In a probit model, instead of estimating the linear equation yt = b1 + b2*xt, you estimate yt = phi(c1 + c2*xt), where phi is the cumulative normal distribution. Since the cumulative normal distribution is bounded between 0 and 1, any value of the index inside the parentheses can only produce predicted values of y between 0 and 1. The logit model is similar, but instead of weighting by the normal distribution, it uses the logistic function, a calculation based on e^x and natural logs.
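As an illustration of the two links, here is a short sketch; the coefficients c1 and c2 are made-up demonstration values, not estimates from any data:

```python
# Probit vs. logit links: both map any real-valued index into (0, 1).
import numpy as np
from scipy.stats import norm

def probit_prediction(x, c1, c2):
    """P(y = 1 | x) under a probit link: the standard normal CDF of the index."""
    return norm.cdf(c1 + c2 * x)

def logit_prediction(x, c1, c2):
    """P(y = 1 | x) under a logit link: e^z / (1 + e^z), the logistic function."""
    z = c1 + c2 * x
    return np.exp(z) / (1.0 + np.exp(z))

x = np.linspace(-10, 10, 5)
print(probit_prediction(x, 0.2, 0.8))  # always strictly between 0 and 1
print(logit_prediction(x, 0.2, 0.8))   # same bounds, slightly fatter tails
```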
And why does it matter? These models estimate the likelihood that an event will or will not take place (the 1 and the 0). The benefit of using the cumulative normal distribution is that you are actually estimating the probability that a particular event will take place given your independent variables. I can think of two uses of an econometric model: to explain the past and to forecast the future. Econometric models have their uses in the former, but many times they fail drastically in the latter. It is possible that the models are misspecified, or that fundamental relationships are not merely missing from the model but are outright immeasurable. Furthermore, the models may not be stable, which can cause forecasting errors too large to be useful. Nevertheless, I find probit models a useful alternative to typical forecasting when events can be categorized as binary.
My primary use of probit models follows in the footsteps of Wright (06-07) in examining the NBER dataset of business cycle dates. Wright uses a binary variable equal to 1 if there is a recession within the next x months and 0 otherwise. Originally in my research I tried multiple forward horizons, such as 6, 12, and 16 months ahead, but I eventually focused on the 12-month horizon as a base against which to compare alternative models. I continued to investigate additional variables and have settled on a fair group that outperforms Wright's variables while maintaining the economic significance of each. I kept the fed funds rate and the spread of long-term Treasuries over the effective fed funds rate. I added the spread of BAA corporate securities over fed funds, a factor that captures the market's tolerance for risky debt (the TED spread isn't as strong as this one), as well as a measure of the volatility of the S&P 500 (the percentage of days in the last quarter with a change greater than 1.25% in either direction). Finally, I added the most important component: the money supply measure. Following Paul Kasriel (after originally testing Mish's MPrime), I use the long-term change in the real monetary base (bank reserves plus currency, divided by CPI). Positive values of this indicator represent expansionary monetary policy, while negative values show that the Fed is tightening the money supply. As Mr. Kasriel notes, this normally happens to curb an expansion, and combined with an inverted yield curve it represents a particularly powerful indicator of a recession.
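For concreteness, here is a hedged sketch of how that variable construction might look in pandas. The column names, the input frame, and the `build_features` helper are hypothetical stand-ins, since the post shows no code; the actual data (NBER dates, rate series, S&P 500 moves) must be supplied at monthly frequency:

```python
import pandas as pd

def build_features(df: pd.DataFrame, horizon: int = 12) -> pd.DataFrame:
    """df needs columns: fed_funds, long_treasury, baa_yield,
    spx_daily_move_share, monetary_base, cpi, recession (0/1 NBER flag)."""
    out = pd.DataFrame(index=df.index)
    out["fed_funds"] = df["fed_funds"]
    # Term spread: long-term Treasuries over the effective fed funds rate
    out["term_spread"] = df["long_treasury"] - df["fed_funds"]
    # Credit-risk spread: BAA corporates over fed funds
    out["baa_spread"] = df["baa_yield"] - df["fed_funds"]
    # Volatility proxy: share of days in the last quarter with |move| > 1.25%
    out["spx_vol"] = df["spx_daily_move_share"]
    # Long-term change in the real monetary base (reserves + currency, CPI-deflated)
    real_base = df["monetary_base"] / df["cpi"]
    out["real_base_growth"] = real_base.pct_change(periods=12)
    # Target: 1 if a recession occurs within the next `horizon` months
    out["target"] = df["recession"].rolling(horizon).max().shift(-horizon)
    return out.dropna()
```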
Over the 40 years tested (I had to exclude the most recent months since no NBER dating exists yet for the current financial crisis), these variables in a probit model correctly classify 91% of months as in-recession or not (using a 50% probability threshold). As for its recent track record, the estimated probability of a recession in the next 12 months briefly went above 75% back in 2006 and then crossed above 75% again in June of 2007. In practice, I would treat a reading above 50% as a signal to definitely eliminate any leverage in U.S. equities and possibly cut position size (with an increasing trend in general as a sign to cut down); above 75% means "Sell, Mortimer!" The high values in June 2007 were driven by real monetary base readings that were historically low. In previous episodes the measure barely crossed into negative territory, but this time it has stayed essentially flat since then.
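A sketch of how that in-sample classification rate could be computed with statsmodels, assuming the hypothetical `features` frame from the sketch above (again, not the author's actual code):

```python
import statsmodels.api as sm

# Fit the probit on the constructed variables
X = sm.add_constant(features.drop(columns="target"))
model = sm.Probit(features["target"], X).fit()

prob = model.predict(X)                 # P(recession within 12 months)
classified = (prob > 0.50).astype(int)  # 50% classification threshold
hit_rate = (classified == features["target"]).mean()
print(f"Months correctly classified: {hit_rate:.1%}")
```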
I have been mulling over the implications of this model for several months. In particular, the model estimates the probability of a recession, but the estimated probabilities rise during all financial crises: times of large volatility, greater credit risk (flight to safety), or anticipated rate cuts all push the number up. I have wondered whether it would be more appropriate to test the model against episodes of financial crisis as well as recession (measurement is a problem, but at least my false positive in 1998 would go away). However, even if this financial crisis is never dated as a recession, I wouldn't say the model failed because it was predicting one. A serious financial crisis and a significant stock market correction did occur. Using this model to avoid those situations is much more important than the formal classification.