Model drift can cost you. Just ask Zillow.
AI decision making may be the future of effective business decisions. But when an AI changes how it makes decisions without anyone knowing, it can get you into a world of trouble. That is likely what happened with Zillow Offers, where an AI algorithm overestimated the value of houses.
When Zillow used its AI to set the prices at which to buy homes, it wound up selling the same houses for less than it paid. This resulted in losses of over $500 million within a short period of time.
A notable difference between AI and conventional software (without AI) is that once an AI application goes into production, it can give unexpected results. One reason is that AI algorithms typically rely on probabilities to determine an answer; as a result, some answers will inevitably be wrong.
But that’s only part of the story. The moment an AI goes into the real world, it faces situations that differ from training. It may or may not be able to generalize to situations it has never encountered. And if the real-world environment the AI makes decisions in changes in fundamental ways, its performance decays to the point where it gives many incorrect results.
Model drift (also known as model decay) occurs when an AI produces results that “drift” from its performance during training. Model drift is not uncommon, can happen at any time, and requires monitoring from the start.
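As a minimal sketch of what detecting drift can look like in practice (not how Zillow did it), a two-sample Kolmogorov-Smirnov test can flag when a production feature's distribution has shifted away from the training distribution. The feature, sample values, and significance threshold below are illustrative assumptions.

```python
# Minimal drift check: compare a feature's training distribution to its
# production distribution with a two-sample Kolmogorov-Smirnov test.
# The feature (home price) and alpha threshold are illustrative assumptions.
import numpy as np
from scipy.stats import ks_2samp

def drifted(train_values: np.ndarray, prod_values: np.ndarray,
            alpha: float = 0.01) -> bool:
    """Return True if the two samples are unlikely to share a distribution."""
    statistic, p_value = ks_2samp(train_values, prod_values)
    return p_value < alpha

# Example: prices drawn from a shifted market register as drifted.
rng = np.random.default_rng(42)
train_prices = rng.normal(300_000, 50_000, size=5_000)  # training-era market
prod_prices = rng.normal(340_000, 60_000, size=5_000)   # hotter market
print(drifted(train_prices, prod_prices))  # True: the distribution has shifted
```

A check like this only looks at the inputs; pairing it with checks on the model's outputs and (once available) actual outcomes gives a fuller picture.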
Even if Zillow rigorously tested the AI in development, it would not have been enough. During development, the housing market was red-hot, boosted by record-low interest rates and other factors. But significant changes were on the way. When the world changes, as it did when the COVID pandemic shifted home buyer preferences, the AI may no longer be effective.
Every AI faces a wide array of real-world situations it never saw during development. Before fielding AI, there are three things to do to help ensure it will make effective decisions in those situations.
The first is to field qualify the AI by testing how well it generalizes. Field testing uses different techniques than AI model validation and is performed independently of the AI development team. Generalizability testing uses data drawn from a different distribution than the training data, as sketched below. My previous blog, “Test your AI for the real world,” goes into more detail on how to do qualification testing.
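As a rough illustration, assuming a scikit-learn-style regression model, a generalizability check compares the model's error on in-distribution holdout data against data deliberately drawn from a shifted distribution; a large gap is a warning sign. The model, synthetic data, and metric here are all assumptions for the sketch, not a prescribed method.

```python
# Illustrative generalizability check: compare error on in-distribution
# holdout data vs. deliberately shifted data. Synthetic data for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)

# Train on one "market regime".
X_train = rng.normal(0.0, 1.0, size=(2_000, 3))
y_train = X_train @ np.array([2.0, -1.0, 0.5]) + rng.normal(0, 0.1, 2_000)
model = LinearRegression().fit(X_train, y_train)

# Holdout from the same distribution vs. a shifted distribution.
X_id = rng.normal(0.0, 1.0, size=(500, 3))
X_ood = rng.normal(1.5, 2.0, size=(500, 3))   # shifted mean and variance
y_id = X_id @ np.array([2.0, -1.0, 0.5])
y_ood = X_ood @ np.array([2.2, -0.8, 0.5])    # the relationships drift too

err_id = mean_absolute_error(y_id, model.predict(X_id))
err_ood = mean_absolute_error(y_ood, model.predict(X_ood))
print(f"in-distribution error: {err_id:.3f}, shifted error: {err_ood:.3f}")
# A large gap between the two errors is a red flag for fielding the model.
```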
Second, implement independent validation processes to evaluate the correctness and quality of the AI. Independent Verification and Validation (IV&V) processes are especially important for mission- and life-critical software applications, and the same is true for AI. However, AI validation requires approaches and solutions beyond those used for traditional software. In a separate blog, “Will I benefit from an independent model assessment?”, my colleague explains why independent validation is important.
The third is to incorporate continuous monitoring. The real world changes all the time, and the AI may not stay within appropriate limits when it does. Methods, controls, and tooling that continuously assess the AI provide the oversight needed to catch these situations early; a minimal sketch follows.
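As one hedged example of what such tooling might look like, the sketch below tracks a rolling error over production predictions whose true outcomes have arrived (for example, actual sale prices) and flags when it exceeds a multiple of the error observed at deployment. The window size, threshold multiplier, and alerting hook are illustrative assumptions, not a prescribed design.

```python
# Illustrative production monitor: track rolling absolute error and flag when
# it exceeds a multiple of the baseline error measured at deployment.
# Window size, threshold, and alert hook are assumptions for this sketch.
from collections import deque

class DriftMonitor:
    def __init__(self, baseline_error: float, window: int = 500,
                 threshold: float = 1.5):
        self.baseline_error = baseline_error
        self.threshold = threshold
        self.errors = deque(maxlen=window)

    def record(self, predicted: float, actual: float) -> None:
        """Log one prediction once its true outcome becomes known."""
        self.errors.append(abs(predicted - actual))

    def check(self) -> bool:
        """Return True if the rolling error has drifted past the threshold."""
        if len(self.errors) < self.errors.maxlen:
            return False  # not enough recent outcomes yet
        rolling = sum(self.errors) / len(self.errors)
        return rolling > self.threshold * self.baseline_error

monitor = DriftMonitor(baseline_error=12_000.0)  # e.g., dollars of pricing error
# In production: call monitor.record(pred, actual) as outcomes arrive, and
# page a human or pause automated decisions when monitor.check() returns True.
```

The design choice that matters here is having a pre-agreed baseline and threshold before fielding, so that "the model has drifted" is a measurable condition rather than a judgment call made after losses appear.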
These three steps help ensure that AI performance issues are not missed. Simply throwing AI over the fence into production is not a recipe for success. Just ask Zillow.
Hi, my name is Theresa Smith. I’m a senior partner, product manager, and technology delivery lead. I have spent the last fifteen years leading product vision and initiatives using Strong Center Design. I build products that solve meaningful problems and are used around the world. Find more of my material on my personal blog, Strong Centers.