Predicting Movie Ratings: The Math That Won The Netflix Prize


The simplest model for predicting a new rating is to guess a single constant every time. Which constant? The one that minimizes the prediction error, and since the Netflix Prize scored entries by root-mean-square error (RMSE), that constant is the mean rating (roughly 3.7 stars). We have a very robust estimate of the mean, derived from 100 million examples, but the model is too simple: its error is around 1.05 stars (Netflix's own “Cinematch” system is off by about 0.95 stars; the winning system is off by about 0.85 stars).
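
To make the baseline concrete, here is a minimal sketch in Python. The (user, movie, rating) triples are toy data standing in for the real 100-million-row training set:

```python
import math

# Toy training set: (user_id, movie_id, rating) triples.
# The real Netflix data has roughly 100 million of these.
ratings = [
    (1, 10, 5.0), (1, 11, 3.0),
    (2, 10, 4.0), (2, 12, 2.0),
    (3, 11, 4.0), (3, 12, 3.0),
]

# The constant that minimizes squared error is the mean rating.
mean = sum(r for _, _, r in ratings) / len(ratings)

# RMSE of always predicting the mean.
rmse = math.sqrt(sum((r - mean) ** 2 for _, _, r in ratings) / len(ratings))
print(f"mean = {mean:.2f}, baseline RMSE = {rmse:.2f}")
```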

Let’s make the model a little fancier. We’ll add an offset for each movie and for each user, to acknowledge that "The Shawshank Redemption" is better than average and that I am more critical than average. The new model predicts the overall mean plus a movie offset plus a user offset. My predicted rating for "The Shawshank Redemption" might be 3.7 (overall mean) + 0.8 (Shawshank offset) - 0.3 (my offset) = 4.2. This model adds roughly 500,000 user parameters and 20,000 movie parameters, and it improves the error to 0.97 stars. As before, the parameters are estimated from the training data to minimize the error.
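
Here is a minimal sketch of one way to estimate those offsets, reusing the toy data from above. I'm assuming plain residual averages for clarity; serious Netflix Prize entries fit these biases with regularized least squares so that rarely-rated users and movies don't overfit, but the averaging version conveys the idea:

```python
from collections import defaultdict

# Same toy (user, movie, rating) triples as in the baseline sketch.
ratings = [
    (1, 10, 5.0), (1, 11, 3.0),
    (2, 10, 4.0), (2, 12, 2.0),
    (3, 11, 4.0), (3, 12, 3.0),
]
mean = sum(r for _, _, r in ratings) / len(ratings)

# Movie offset: average of (rating - mean) over that movie's ratings.
movie_resids = defaultdict(list)
for _, movie, r in ratings:
    movie_resids[movie].append(r - mean)
movie_offset = {m: sum(v) / len(v) for m, v in movie_resids.items()}

# User offset: average of what's left over after the movie offset.
user_resids = defaultdict(list)
for user, movie, r in ratings:
    user_resids[user].append(r - mean - movie_offset[movie])
user_offset = {u: sum(v) / len(v) for u, v in user_resids.items()}

def predict(user, movie):
    # Overall mean plus movie offset plus user offset,
    # e.g. 3.7 + 0.8 (Shawshank) - 0.3 (a critical user) = 4.2.
    return mean + movie_offset.get(movie, 0.0) + user_offset.get(user, 0.0)

print(f"predicted rating: {predict(1, 10):.2f}")
```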
