Why Didn't My Response Model Hold Up in Roll-Out?
By Perry D. Drake

The following article, written by Perry D. Drake, appeared in Inside Direct Mail,
May 2000. It provides marketers with valuable information regarding the build of stable response models.


If you recently built or had built a response model but it failed to perform in roll-out as anticipated, don't feel alone. You are one of many in the same circumstances. Don't give up. Most likely, there is an "easy fix" for what you may have done wrong.

The success or failure of a model has nothing to do with whether it is built on an internal or external file, or whether it is built to predict response or performance. It has everything to do with whether certain rules are followed. If the rules and guidelines of model building are not followed, you will fail regardless of the model type. It is that simple.

Following are six guidelines you can use to ensure stability when building any model. I call these six guidelines "Modeling with MUSCLE." Adhering to the MUSCLE modeling guidelines will guarantee that your models are robust and hold up when applied in roll-out.

This article does not delve into building strong response models but rather into building a response model that yields the gains forecasted (weak or strong). Tips on maximizing gains are a topic for a future article. It is important to realize that these are two separate issues.

Materials Treatment Consistency Between Test and Roll-out
If you drastically change your offer or creative approach between test and roll-out, the responders identified in your model will no longer hold true in roll-out. For example, a model built predicting the customer likely to respond to a "soft" risk free offer will differ from a model built for the same product without the risk free offer. A model built on a "soft" offer may identify marginally performing names that are attracted to the risk free terms.

Changes in a creative approach can also cause a subtle shift in the composition of responders. Keep in mind when building a response model, you are not only modeling responders to your product but also the product offer (price, terms, etc.).

Universe Application
When you build a response model, keep in mind that you are building it on a sample taken from a particular universe of names. Therefore, your forecast based on this response model will only be valid when applied to the same universe of customers the model was built on. For example, you cannot build a model on an entire universe of names, apply the model to only those names within that universe who are over the age of 30 and expect the same forecasted gains. The model will not hold up as originally forecasted.

To guarantee your universe definitions are consistent between test and roll-out, always check to ensure the names your programmer or list shop is pulling meet your specifications. To verify consistency between the names defined at test and at roll-out, compare the distribution of regression model scores. For example, if 10% of the test names the model was built on have a score above .2576, you should expect close to the same percent of names scoring above .2576 at roll-out. If not, you have definitional inconsistencies.
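The score-distribution check described above can be sketched in a few lines of Python. The scores and cutoff below are simulated for illustration, not taken from a real model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model scores for the test sample the model was built on
# and for the names pulled at roll-out (in practice, score both files
# with the same regression equation).
test_scores = rng.normal(loc=0.20, scale=0.05, size=10_000)
rollout_scores = rng.normal(loc=0.20, scale=0.05, size=100_000)

# Find the score cutoff above which 10% of the test names fall...
cutoff = np.percentile(test_scores, 90)

# ...and check what fraction of roll-out names exceed that same cutoff.
pct_above = (rollout_scores > cutoff).mean()

print(f"cutoff = {cutoff:.4f}, roll-out share above cutoff = {pct_above:.1%}")

# A large gap between 10% and pct_above signals a universe mismatch.
if abs(pct_above - 0.10) > 0.02:
    print("Warning: universe definitions may differ between test and roll-out.")
```

Because the two simulated files here come from the same universe, the roll-out share lands close to 10%; a real definitional inconsistency would push it well away from that figure.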

In addition, be aware that your outside list universes may change between test and roll-out without your knowledge. This can be caused by changes in the way a list owner builds their file. If the names on a list you regularly rent were obtained by the list owner in a different way this month versus last month (e.g. a new offer), you can expect the composition of these names to change, resulting in a difference in response to your promotions. Since you have no say in how list owners obtain their customers via the "offer," the best you can do is to stay informed regarding any changes in the list owner's promotional strategies. If you notice a major change in a list owner's offer, you may want to consider rebuilding a new response model or, at the very least, adjusting your forecast. This also applies to compiled lists.

Split the Sample for Validation
Before building the model, a portion of the sample should be set aside and used to validate the model. This is often called the validation, hold-out or calibration sample. A validation sample allows you to test the model for validity prior to roll-out. Remember, your sample is just that: a sample, and, as such, there will be a certain amount of error associated with the findings. The gains forecasted from the sample the model was built on will tend to over-predict. The degree to which a model will over-predict for a particular segment of names depends in part on the analyst's experience in modeling that segment and product offer. Scoring the validation sample on the final model will reveal gains more in line with what you can expect in roll-out.

In particular, you will find that the predictive power of the model will lessen slightly based on the validation sample. This is to be expected and is quite acceptable. However, if you find the predicted gains fall by more than 10% from analysis to validation for those names scoring in the top 10%, there is a strong likelihood that the model is unstable. If this is the case, you are strongly advised to re-evaluate the model for problems with multicollinearity and/or insignificant variables.
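A rough sketch of the analysis/validation split and the top-10% gains comparison, using simulated response data and an off-the-shelf logistic regression (all predictors, coefficients and response rates below are invented):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Simulated test-mailing file: two hypothetical predictors and a
# response flag that depends on them through a logistic relationship.
n = 20_000
X = rng.normal(size=(n, 2))
p = 1 / (1 + np.exp(-(-3.0 + 0.6 * X[:, 0] + 0.3 * X[:, 1])))
y = rng.binomial(1, p)

# Split the sample: build on the analysis half, hold out the rest.
X_an, X_val, y_an, y_val = train_test_split(X, y, test_size=0.5, random_state=1)

model = LogisticRegression().fit(X_an, y_an)

def top_decile_gain(X_part, y_part):
    """Response rate in the top 10% of scores divided by the overall rate."""
    scores = model.predict_proba(X_part)[:, 1]
    cutoff = np.percentile(scores, 90)
    return y_part[scores >= cutoff].mean() / y_part.mean()

gain_an = top_decile_gain(X_an, y_an)
gain_val = top_decile_gain(X_val, y_val)
print(f"analysis gain = {gain_an:.2f}x, validation gain = {gain_val:.2f}x")

# The rule of thumb above: a drop of more than 10% flags instability.
if (gain_an - gain_val) / gain_an > 0.10:
    print("Warning: model may be unstable.")
```

The validation gain will typically come in slightly below the analysis gain, which is the expected and acceptable shrinkage described above.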

The importance of a validation sample cannot be stressed enough especially when an analyst has limited experience in modeling a particular product, list or offer. If your budget does not allow for test samples large enough to be split into one for analysis and another for validation, consider boot-strapping as an alternative. But remember, if you really want to give regression modeling a fair chance, do it right - test enough names.

Correlation Assessment
The major assumption of concern when building a multiple regression model is the assumption of independence. For the responder model to be valid and stable, each predictor variable used in the model must be independent of one another. In other words, each predictor variable in the final model must not exhibit any significant correlation with each other and, therefore, must measure something unique about the customer. If some of the predictor variables in the final model are correlated with one another, the model will not work properly.

Models with such problems are said to exhibit signs of multicollinearity. In such a case, the coefficients or weights associated with the correlated variables may go in the opposite direction from what you would expect. Consider this example: You build a model to predict a customer's likelihood to pay for a product and include both "Household Income" and "Home Value" variables as positive predictors of payment. Since both variables can be considered measures of "household wealth," they are most likely correlated with one another. As such, the final model could very well exhibit signs of multicollinearity. What does this mean? It means that a negative coefficient as opposed to a positive coefficient may be associated with one of these two data elements in the final model. This would of course be incorrect since you would expect the higher a customer's income or the higher a customer's home value, the more likely they are to pay for the product ordered - implying a positive relationship, not negative.

There are many ways to resolve issues of multicollinearity. One option is to delete one of the correlated data elements from the model. Another is to combine the like data elements into a single data element.
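As a sketch of a correlation assessment, here are two common diagnostics in Python: the pairwise correlation and the variance inflation factor (VIF), a standard collinearity measure not named in this article. The "income" and "home value" figures are simulated to be strongly correlated, mimicking the example above:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical file: "household income" and "home value" are built to be
# strongly correlated (all figures invented for illustration).
n = 5_000
income = rng.normal(60_000, 15_000, size=n)
home_value = 3.0 * income + rng.normal(0, 10_000, size=n)

X = np.column_stack([income, home_value])

# Pairwise correlation: values near +/-1 flag collinear predictors.
corr = np.corrcoef(X, rowvar=False)
print("correlation(income, home value) =", round(corr[0, 1], 3))

# Variance inflation factor: regress each predictor on the others;
# VIF = 1 / (1 - R^2). A common rule of thumb flags VIF above 10.
def vif(X, j):
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])
    coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    resid = X[:, j] - A @ coef
    r2 = 1 - resid.var() / X[:, j].var()
    return 1 / (1 - r2)

for j, name in enumerate(["income", "home_value"]):
    print(f"VIF({name}) = {vif(X, j):.1f}")
```

With data this correlated, both VIFs come out well above 10, which is exactly the situation where a fitted coefficient can flip to the "wrong" sign.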

Lift and Freeze Customer Attributes at the Point-in-Time of the Promotion

When you select the test sample, make certain you also lift the characteristics of the names at the same point-in-time and create a separate file called the "frozen file." When the responses to the test come back in-house, update the "frozen file" only with this information. Do not update the characteristics. The model you build should be based on the customers' characteristics at the time you sent the promotion. Believe it or not, some direct marketers examine test results based on the customers' characteristics at the time of the analysis versus the time of the promotion. If point-in-time analysis is not used, your model will not hold up in roll-out regardless of strict adherence to the other guidelines mentioned in this article.

As an extreme example, suppose you are promoting a certain list of names on your database for a product that only has appeal to couples without children. Assume that between the time of the test and the analysis, many of the couples on this list became new parents. If you analyze the results of the test based on today's customer file, you will be misled into believing that being childless is not that important in distinguishing responders from non-responders (since some who ordered appear to have children). The problem is they did not have children at the time their purchase decision was made. Be careful!
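The freezing process can be sketched with a toy customer file (all names and values invented). The point is that responses are appended to the snapshot while the attributes themselves are never refreshed:

```python
import pandas as pd

# Hypothetical customer file as of the mail date.
customers = pd.DataFrame({
    "cust_id": [1, 2, 3],
    "has_children": [False, False, True],
})

# "Frozen file": a snapshot of attributes taken when the test is mailed.
frozen = customers.copy()

# Time passes; the live file is updated (customer 1 becomes a parent).
customers.loc[customers["cust_id"] == 1, "has_children"] = True

# When responses come back, attach ONLY the response flag to the frozen
# snapshot -- the attributes stay as they were at promotion time.
responses = pd.DataFrame({"cust_id": [1], "responded": [True]})
frozen = frozen.merge(responses, on="cust_id", how="left")
frozen["responded"] = frozen["responded"].fillna(False).astype(bool)

# Model on `frozen`, not on the refreshed `customers` file.
print(frozen)
```

In the frozen file, customer 1 still shows as childless, matching their status when the purchase decision was made, even though the live file now says otherwise.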

Examine the P-values
When building a model on a test sample, keep in mind it is a sample and, as such, will yield a certain amount of error variance associated with each variable used in the final model. The error variance associated with some of the variables in the model may be so great that its associated coefficient or weight is no different from a value of "zero." What this means is that the variable is not really adding anything to the model. P-values associated with each variable will tell you just how significant each variable is in the final model. The lower the p-value, the more significant that variable is in adding value to the overall strength of the model. A general rule of thumb is that if a variable has a p-value over 10%, you can assume the variable is not adding anything to the model and you should delete it. P-values are clearly labeled on any regression output (even in Microsoft Excel™).
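As an illustration of the p-value rule of thumb, here is a small Python sketch that fits an ordinary least squares model on simulated data, including one deliberately useless predictor, and flags coefficients with p-values over 10% (all variables invented):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Simulated data: x1 and x2 genuinely drive the outcome; "noise" is an
# invented predictor with no real relationship to it.
n = 5_000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
noise = rng.normal(size=n)
y = 0.5 * x1 + 0.2 * x2 + rng.normal(size=n)

# Ordinary least squares with an intercept.
X = np.column_stack([np.ones(n), x1, x2, noise])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Standard errors of the coefficients and two-sided t-test p-values.
dof = n - X.shape[1]
sigma2 = resid @ resid / dof
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
pvals = 2 * stats.t.sf(np.abs(beta / se), dof)

for name, p in zip(["const", "x1", "x2", "noise"], pvals):
    flag = "  <- drop (p > 0.10)" if p > 0.10 else ""
    print(f"{name}: p = {p:.4f}{flag}")
```

The genuine predictors come back with tiny p-values, while a pure-noise variable will usually land above the 10% threshold and be flagged for deletion.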

Strict adherence to the MUSCLE guidelines will guarantee your model holds up as forecasted in roll-out. Of course, this assumes occurrences outside of your control do not impact your ability to achieve the forecasted gains. What do I mean by that? Consider the following example: Suppose your final response model identified Florida residents as a prime target area and, as such, had a high positive coefficient/weight associated with this region. A devastating hurricane in Florida prior to delivery of the promotion will cause your model to partially fail in roll-out. These Florida names will no longer perform as expected. Luckily, these exceptions are few and far between.

If your model did not hold up as forecasted, odds are you did something wrong: Either the model used unstable predictors, the universe changed between test and roll-out, some of the predictor variables were correlated with one another, the characteristics upon which the model was based did not reflect the customers' status at the time of the promotion, or you lacked a validation sample upon which to check your model and develop more accurate forecasted gains. Remember, MUSCLE matters when it comes to building a response model. To give regression modeling a fair chance of working for your organization, apply the "MUSCLE Modeling" principles.
