Emmy Awards 2018: can data predict Best Drama?
Data can do anything.
On July 12th, the Emmy Awards 2018 nominees for Outstanding Drama were announced, celebrating the year's most binge-worthy TV. The nominations were as varied as ever, with entries like the retro-stalgic and family-friendly Stranger Things contending with post-watershed heavyweights such as HBO's Game of Thrones.
Topical analysts that we are - and following our World Cup sticker calculations - we thought we'd have a go at predicting the winners.
Despite the variety in the Emmy Awards 2018 lineup, best drama nominees - by definition - must share common attributes: namely, quality and critical acclaim.
But how does one measure quality, precisely?
If you were asked to rank your own favourite shows this year, you might not struggle, but defining how this subjective ranking system works often proves challenging.
If your favourite drama last year featured dragons, predicting that Game of Thrones would top your list in 2018 would be fairly straightforward, right?
Not exactly.
If the only thing a show had to do to be crowned your personal #1 was to feature a dragon, you'd (probably) be in the minority.
What if two shows fit the bill?
You'd need a second variable. Perhaps you prefer shows with larger casts over more intimate dragon-centric ensembles, or maybe shows with the most profanity always earn your favour.
If more than one show still meets the criteria, or if a second judge enters the equation - as in Emmy-reality - guessing the favourite becomes increasingly complex.
Emmy Awards 2018: How to (try and) guess the winner
In attempting to predict this year's best drama winner, and to cope with the number of variables at play, we used a logistic regression model. Based on the data available, this process works by first assessing whether previous winners share similarities, and then by assessing how closely this year's nominees fit the criteria shared by past victors.
Still following?
Basically, it comes down to a question of which show looks most like a winner.
The criteria used to characterise nominees had to be applied to every entry in the dataset, and numeric descriptors were used to accomplish this. For example, entries featuring a female lead received a 1 in the Female Lead column, while those without one received a 0. We also inputted:
- Aggregated review scores from various sites
- Genre flags
- Whether nominated seasons featured the demise of a main cast member
- Their production network
- The number of nominations they had received in other categories
- Several other variables that could be applied across all entries.
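To make the idea of numeric descriptors concrete, here is a minimal sketch of how a single nominated season might be turned into a row of numbers. The field names, the list of networks and the example values are all hypothetical and chosen purely for illustration; they are not the fifty variables used in our dataset.

```python
# All names and values below are hypothetical, for illustration only.
NETWORKS = ["HBO", "AMC", "Netflix", "Hulu"]  # assumed list of categories


def encode_entry(entry):
    """Turn one nominated season (a dict) into a flat list of numbers."""
    features = [
        1.0 if entry["female_lead"] else 0.0,      # binary flag
        1.0 if entry["main_cast_death"] else 0.0,  # binary flag
        float(entry["review_score"]),              # aggregated review score
        float(entry["other_nominations"]),         # count of other nominations
    ]
    # One-hot encode the network: one 0/1 column per known network.
    features += [1.0 if entry["network"] == name else 0.0 for name in NETWORKS]
    return features


# An invented entry, not a real row from the dataset.
example = {
    "female_lead": True,
    "main_cast_death": False,
    "review_score": 91,
    "other_nominations": 12,
    "network": "Hulu",
}
row = encode_entry(example)
```

Categorical descriptors such as the production network are split into one 0/1 column per category here, so that the model never treats one network as numerically "larger" than another.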
Thirty-nine past entries, each described by a total of fifty variables, were used to create the model. A flag of whether each entry had won in its respective year was also inputted to identify these records as our target. The seven 2018 nominees were then included in the dataset, differentiated by a second flag to note that they were not to be used in building the model, only scored by it.
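For the curious, the set-up described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the tooling or settings we actually used: the three features, the toy rows, the learning rate, the number of passes and the small L2 penalty (added to keep weights stable on a tiny dataset) are all invented for the example.

```python
import math


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def propensity(x, w, b):
    """Predicted probability of winning for one feature row."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic(X, y, lr=0.1, epochs=2000, l2=0.01):
    """Fit weights by batch gradient descent on the mean log-loss.

    lr, epochs and l2 are assumed values, not tuned settings.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = propensity(xi, w, b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Invented toy rows: (features, won flag, to-be-scored flag).
# features = [female_lead, on_demand_network, scaled_review_score]
rows = [
    ([1.0, 1.0, 0.90], 1, False),
    ([0.0, 0.0, 0.60], 0, False),
    ([1.0, 0.0, 0.80], 1, False),
    ([0.0, 1.0, 0.50], 0, False),
    ([0.0, 0.0, 0.70], 0, False),
    ([1.0, 1.0, 0.85], None, True),  # current nominee: scored, not trained on
    ([0.0, 0.0, 0.55], None, True),  # current nominee: scored, not trained on
]

# Only past entries (second flag False) are used to fit the model.
X_train = [x for x, won, to_score in rows if not to_score]
y_train = [won for x, won, to_score in rows if not to_score]
w, b = fit_logistic(X_train, y_train)

# Flagged entries are scored with the fitted weights.
scores = [propensity(x, w, b) for x, won, to_score in rows if to_score]
```

The key point is the split: past entries carry a won/lost target and shape the weights, while the flagged nominees only ever pass through `propensity`.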
The model also assessed which of the variables used to describe our entries were most important, to avoid drawing false conclusions. An example of this is The Handmaid’s Tale, featuring Elisabeth Moss in the titular role, which won last year’s Outstanding Drama award. Whilst featuring the star is unlikely to damage any show’s chances of critical acclaim, the nominated series she’s not in don’t become ineligible for the award by default - which the model would hypothetically account for by establishing her impact over all entries assessed.
And the winner is…
The model outputs a propensity for each show to have won in its respective year of entry. This determines a predicted winner for each year, as shown:
Overall, the model correctly predicted 5 out of the past 6 winners – although this isn’t as sterling an endorsement of its capacity to predict this year’s winner as it might first appear.
In 2012, for example, both Breaking Bad and Game of Thrones were predicted as more likely winners than Homeland was. Yet Homeland won:
OK...
This is because Homeland has not gone on to win Outstanding Drama since 2012, whilst Breaking Bad and Game of Thrones have both won twice since 2012. As far as the model is concerned, in 2012, both Breaking Bad and Game of Thrones looked like winners, because they have both since become winners.
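The step from propensities to a predicted winner per year, and the tally of how often that pick matched the real winner, can be sketched as follows. The years, show names and propensity values are placeholders invented for the example, not the model's actual output.

```python
from collections import defaultdict

# (year, show, propensity, actually_won) - invented placeholder values.
scored = [
    (1, "Show A", 0.62, True),
    (1, "Show B", 0.30, False),
    (2, "Show C", 0.41, False),
    (2, "Show D", 0.55, True),
    (3, "Show E", 0.48, False),  # highest propensity, but did not win
    (3, "Show F", 0.35, True),
]


def predicted_winners(entries):
    """Pick the entry with the highest propensity within each year."""
    by_year = defaultdict(list)
    for year, show, p, won in entries:
        by_year[year].append((show, p, won))
    return {year: max(group, key=lambda e: e[1]) for year, group in by_year.items()}


def hit_rate(entries):
    """Return (correct picks, number of years)."""
    picks = predicted_winners(entries)
    hits = sum(1 for show, p, won in picks.values() if won)
    return hits, len(picks)


picks = predicted_winners(scored)
hits, years = hit_rate(scored)
```

Year 3 in the toy data plays the same role as 2012 above: the entry that looks most like a winner is not the one that took the award.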
In 2018, the frontrunner for Outstanding Drama according to our model is currently The Handmaid’s Tale, now in its second season, followed closely by Game of Thrones, then Stranger Things:
While The Handmaid’s Tale has only won once previously (given that it’s currently only in its second season), it’s also the only current nomination to have a previous victory under its belt, aside from Game of Thrones (as summarised above). Due to the limited size of the dataset, The Handmaid’s Tale in 2018 is deemed to be the most likely winner largely because it most closely fits the description for the show with the highest historical ratio of wins to nominations: The Handmaid’s Tale in 2017.
Similarly, Game of Thrones is the only current entrant to have won twice previously. While it has also lost three times, the model still identifies the series as possessing winning characteristics.
Despite the impact of prior wins on the propensity to win in future, several other variables have also been identified as significant in terms of what you need to take home the gold:
- Whether or not a show is produced by an on-demand service
- Reviewer reactions to a show’s first season
- Whether or not a show is in its inaugural season or has many behind it.
So where do we go from here?
Emmy Awards 2018 predictions... Take Two
Ultimately, regression models were designed to be used with thousands or even millions of records - not thirty-nine. Future iterations of the model could be improved - and better validated - if more years of data were collated. There are several reasons for this.
Firstly, less data makes for less accurate variable assessment and weighting. Take network, for example: ten of our nominated seasons were produced by HBO, of which two have won to date, and the same is true of AMC. Broadly, this would suggest that whether a show is produced by HBO or AMC makes no difference to the odds of success. However, this ratio is likely to change if we increased our available data points, meaning that the impact of the network would most likely gain more significance in predicting a winner.
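To put a rough number on how little ten seasons can tell us, here is a small sketch that computes the win rate for two wins in ten nominations, along with a 95% Wilson score interval around it. The Wilson interval is our choice for this illustration and is not part of the model itself; the second call, with ten times the data, is a purely hypothetical scenario.

```python
import math


def win_rate_interval(wins, nominations, z=1.96):
    """Observed win rate plus a Wilson score interval (z=1.96 is roughly 95%)."""
    p = wins / nominations
    denom = 1 + z * z / nominations
    centre = (p + z * z / (2 * nominations)) / denom
    half = z * math.sqrt(
        p * (1 - p) / nominations + z * z / (4 * nominations ** 2)
    ) / denom
    return p, centre - half, centre + half


# Figures from the text: ten nominated seasons, two wins (HBO and AMC alike).
rate, low, high = win_rate_interval(2, 10)

# Hypothetical: the same ratio observed over ten times as many seasons.
rate_big, low_big, high_big = win_rate_interval(20, 100)
```

With only ten seasons, the plausible range for the underlying win rate runs from roughly 6% to 51%, which is far too wide to say that two networks with identical records are truly equivalent. The same ratio over a hundred seasons would give a noticeably narrower range.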
Furthermore, despite the vast quantities of TV watching this approach would likely necessitate, including variables which only describe specific seasons of a show - not the show in general - would improve the validation process and model. If The Handmaid’s Tale in 2018 looks largely dissimilar to The Handmaid’s Tale in 2017 according to the data, e.g. due to variables relating to specific scenes in each season, 2018’s season will only be predicted as the winner if it shares similarities with other winning seasons of other shows - not just with its victorious self.
The winners of the Emmy Awards 2018 will be announced on September 17th
(We plan to build a second model before then, and to make a new prediction using our updated approach, so watch this space.)
Who do you think will take the trophy at the Emmy Awards 2018? Share your predictions below.