How Zillow could have avoided its $500M AI mistake

Zillow’s reliance on 30-day-old data for real-time decisions proved fatal for its home-buying arm, Zillow Offers, which shut down after a $500 million loss in 2021.

Zillow’s models did not properly predict or adjust to a shift in the real estate market; as a result, prices were dropping while Zillow was still buying. Speculation on why the models failed has included:

  • Poor training data
  • Inadequate monitoring
  • Insufficient human interventions

While all of these play a part in fielding and maintaining an AI system, the failure was likely even more fundamental:

Zillow may have used data that was more than 30 days old to make near real-time decisions.

Engineering and Data Science

At SphereOI we do engineering, data science, and analysis alongside the AI. Using data that refreshes every 30 days to make decisions today does not make sense. Zillow paid a big price, and many other AI teams make similar mistakes: they ignore sound systems and data engineering practices in favor of a shiny new algorithm, and they trust that algorithm without applying engineering scrutiny.

To get reliable and trustworthy results from an AI system, data analysis should include the distributions, frequency of update, volume of data, any loss in data transformations, and pedigree of data. Our experience working in mission and life critical systems like missile defense, terrorist watchlisting, and residential alerting systems adds another layer of maturity to the AI systems we deploy.
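This is not a prescription for Zillow's pipeline, just a minimal sketch of the kind of pre-deployment check we mean, covering a few of the basics above (distributions, update frequency, volume). The column names, file path, and thresholds are hypothetical; the core question is whether the data refreshes at least as fast as the decisions it is supposed to support.

```python
import pandas as pd

def profile_for_decision_use(df: pd.DataFrame, timestamp_col: str,
                             decision_cadence_days: float) -> dict:
    """Summarize whether a dataset can support decisions made on a given cadence."""
    ts = pd.to_datetime(df[timestamp_col]).sort_values()
    update_gaps = ts.diff().dropna().dt.days

    report = {
        "row_count": len(df),                                    # volume
        "numeric_summary": df.describe().to_dict(),              # distributions
        "median_update_gap_days": float(update_gaps.median()),   # update frequency
        "max_update_gap_days": float(update_gaps.max()),
    }
    # Core question: does the data refresh at least as fast as decisions are made?
    report["fresh_enough_for_cadence"] = (
        report["median_update_gap_days"] <= decision_cadence_days
    )
    return report

# Hypothetical usage: closed-sale records vs. daily purchase decisions.
# sales = pd.read_csv("closed_sales.csv")
# print(profile_for_decision_use(sales, "recorded_date", decision_cadence_days=1))
```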

Some initial analysis

The Zillow algorithms are proprietary, but they appear to rely on tax records, zip code, property size, property classification, and to some extent, the Multiple Listing Services (MLS) that real estate professionals use, along with many other features in the data.

Their claim that estimates fall within 3% to 4% of the sale price is somewhat overstated: the initial estimate is produced by the algorithm, and then the estimate appears to be adjusted as new information arrives, such as a listing price in MLS.

The estimate is adjusted again if there is a sale, using the new sale price. This final adjusted price, not the initial estimate you see, is what they use to measure the 3% to 4% accuracy.
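To make that concrete, here is a small hypothetical calculation (invented numbers, not Zillow data) showing how measuring error against the adjusted estimate, rather than the initial one a buyer actually saw, flatters the accuracy figure.

```python
# Hypothetical example: how the reported error depends on which estimate you measure.
initial_estimate = 330_000   # what a visitor sees before the listing
adjusted_estimate = 316_000  # estimate after the listing and sale prices flow in
sale_price = 315_000         # the ground truth, known only after closing

def pct_error(estimate: float, actual: float) -> float:
    return abs(estimate - actual) / actual * 100

print(f"Error of initial estimate:  {pct_error(initial_estimate, sale_price):.1f}%")   # ~4.8%
print(f"Error of adjusted estimate: {pct_error(adjusted_estimate, sale_price):.1f}%")  # ~0.3%
```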

The problem is that the data they use to make initial estimates, and even estimates after a house is listed for sale, cannot include the price that is offered and accepted. That information is not available to Zillow, to MLS, or even to the government tax agencies until the sale is completed at closing, often more than 30 days later.
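As a rough sketch of that latency gap, with made-up dates rather than real records: the ground truth only becomes visible well after the decision had to be made.

```python
from datetime import date

# Hypothetical timeline for a single transaction.
offer_accepted = date(2021, 9, 1)     # the price is effectively set here
closing_recorded = date(2021, 10, 8)  # the price first appears in public records
decision_date = date(2021, 9, 3)      # when a buyer using the model must commit

ground_truth_lag = (closing_recorded - offer_accepted).days
print(f"Ground-truth lag: {ground_truth_lag} days")   # 37 days in this sketch

# Any model scoring on decision_date is blind to everything that happened
# between the most recent recorded closings and today.
print("Decision made before the ground truth exists:",
      decision_date < closing_recorded)
```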

A recent example

Here is a recent example. I know this is just one sample, but it illustrates the point, and the same experience has been shared by many others.

  1. An investment property was recently listed for sale.
  2. The Zestimate was $245,000.
  3. The property listed for $241,000, looking for a quick sale.
  4. Two cash offers of $231,000 were received within two days, and one was accepted.
  5. The Zestimate was now $241,000.
  6. Another similar condo was listed the next week for $242,000 and sold immediately.
  7. We do not know what price was negotiated.
  8. About a month later the Zestimate went down to $230,000.


Engineering, not AI

If Zillow, or any investor or homeowner, had purchased this property at $241,000 (below Zillow’s market estimate), they would have lost more than $10k on this single transaction. The mismatch between data latency and market volatility may have caused Zillow to miss this shift in housing prices. This was as much a failure in engineering and data analysis as it was in AI.
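A quick back-of-the-envelope check using the numbers from the example above (transaction and closing costs excluded):

```python
# Numbers from the single example above.
zestimate_at_listing = 245_000
listing_price = 241_000
accepted_offer = 231_000

# A buyer trusting the estimate and paying the listing price would have bought
# "below market" on paper, yet well above what cash buyers actually paid.
paper_discount = zestimate_at_listing - listing_price   # $4,000 "below estimate"
realized_loss = listing_price - accepted_offer          # $10,000 versus the real market

print(f"Apparent discount vs. Zestimate: ${paper_discount:,}")
print(f"Loss vs. actual accepted offer:  ${realized_loss:,}")
# The Zestimate only caught up (dropping to ~$230,000) about a month after the market moved.
```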

Continuous Monitoring

At SphereOI we develop and deploy continuous monitoring for our AI models in the field. The goal here is to detect and alert when the results from a system are not matching real world data.

These monitoring systems can be complex in their own right. They take many forms, including the two checks sketched in code after this list:

  • Assessing the distribution of input data to see when those distributions stray significantly from the data used to train the model
  • Comparing predicted outcomes to actual outcomes (when possible) to see if the model is performing within tolerance.
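Below is a minimal, hypothetical sketch of both checks in Python, assuming numpy and scipy are available. The data, drift test choice (a two-sample Kolmogorov-Smirnov test), and tolerance are invented for illustration, not taken from Zillow's system.

```python
import numpy as np
from scipy.stats import ks_2samp

# Hypothetical data: feature values seen at training time vs. in production.
rng = np.random.default_rng(0)
train_prices = rng.normal(300_000, 50_000, size=5_000)
live_prices = rng.normal(275_000, 55_000, size=500)   # the market has shifted down

# Check 1: has the input distribution drifted away from the training data?
stat, p_value = ks_2samp(train_prices, live_prices)
print(f"KS statistic={stat:.3f}, p-value={p_value:.2e}")
# A very small p-value signals that live inputs no longer look like the
# training inputs, so the model's assumptions may no longer hold.

# Check 2: are predictions within tolerance of the outcomes we can observe?
predicted = np.array([245_000, 310_000, 260_000])   # model estimates
actual = np.array([231_000, 305_000, 240_000])      # closed sales (which arrive late)
mape = float(np.mean(np.abs(predicted - actual) / actual) * 100)
TOLERANCE_PCT = 4.0   # matching the claimed 3% to 4% accuracy
if mape > TOLERANCE_PCT:
    print(f"ALERT: mean absolute error {mape:.1f}% exceeds {TOLERANCE_PCT}% tolerance")
```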

A naive monitoring system would have missed the market shift in the Zillow case as well, because the monitor likely has no “fresher” data than the operational system. In this case, fundamental statistical analysis, data science, and systems engineering were required to avoid making inaccurate estimates and predictions.

Finally

Trustworthy systems of any type, AI systems included, require us to understand the data and domain deeply enough to ensure that the data we have can support the decisions we need.

Scott Pringle

Scott Pringle is an experienced hands-on technology executive trained in applied mathematics and software systems engineering. He is passionate about using first principles to drive innovation and accelerate time to value.