Thursday, July 24, 2014

Data Analysis for the GFFA

One of my favorite recently discovered blogs is FiveThirtyEight, which Nate Silver started to bring data analytics to a general audience. Today the folks over there published a post analyzing the results of a Star Wars survey. One of the great things about this site is that most of the data they report on is made available via the code-sharing site GitHub.

You can hop over to their site to read the article. It is a great write-up, but I wanted to dig a little deeper. So I pulled the data and did my own analysis, which I am posting here. I didn't have much free time, so this is just an initial study. But I think you will find these preliminary results interesting!

One aspect I was curious about is how both the Star Wars (SW) fan and the non-SW fan view the major characters in the films. This was a topic of "spirited" debate between Jason Swank and Jimmy Mac of the must-listen Star Wars podcast, Rebel Force Radio, particularly over how recognizable Boba Fett is versus Han Solo or other characters. We now have some hard data to add to the discussion!

As for the survey data itself, it comprises a sample of 1,186 participants. Of those surveyed, 79% have seen at least one Star Wars film and 47% identified themselves as fans. So this looks like a healthy sample of both fans and non-fans to study.

First, let's start with the fans. Below is my graphic showing how the fans view the characters included in the survey:



Many characters come as no surprise in terms of favorability: Han Solo tops the list, followed closely by Yoda and Obi-Wan Kenobi. Luke, Leia and Artoo make a strong showing with Threepio a step behind. Seeing Jar Jar Binks at the bottom of the rankings is also no surprise!

Vader's lower position on the list is a little surprising, but he is a villain so I'd think many would skew to a negative vote in light of that fact. This wouldn't necessarily mean they did not like the character.

What I found most interesting is Boba Fett. Not only is he towards the bottom of this list, but his favorability ratings are also spread fairly evenly across the spectrum. Our perception of him as such a huge fan favorite may be due to how vocal a subset of fandom is relative to the broader fan base.

Now let's turn to the non-Star Wars fan:




Wow! Han Solo is much lower on the favorability list, even with Harrison Ford's marquee name. In contrast, Luke, Leia and Artoo, despite not having been on screen in thirty years, are still ranked as the most favorable Star Wars characters among non-fans. Vader and Yoda are well up on the list as well. Vader's mix of favorable and unfavorable ratings is again likely driven by his status as a villain.

You may notice that the last several names have a mostly invisible segment on the right-hand side of each stacked bar. This segment represents the survey respondents who marked the character as uncertain (not known). Poor Padme, the mother of Luke Skywalker, is unknown to a large swath of non-fans. Even Lando! Oh the humanity!

...and then there is Boba Fett. You can see he is among the lesser-knowns, and those who do know him are mostly indifferent. This data is hardly a compelling case for the much-rumored Boba Fett spin-off film.
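For anyone who wants to build a similar stacked bar, each bar boils down to the share of answers in each favorability bucket for one character. Here is a minimal Python sketch of that step. The category labels and the toy answers are my own assumptions for illustration, not the survey's actual wording or data, and I am assuming blank answers should be left out of the percentages.

```python
from collections import Counter

# Hypothetical response labels; the real survey's wording may differ.
CATEGORIES = ["favorable", "neutral", "unfavorable", "unfamiliar"]


def favorability_shares(responses):
    """Return the percentage of answers that fall in each category.

    Blank answers (None) are skipped so they do not dilute the shares.
    """
    answered = [r for r in responses if r is not None]
    counts = Counter(answered)
    total = len(answered)
    if total == 0:
        # No usable answers: report zero for every category.
        return {c: 0.0 for c in CATEGORIES}
    return {c: 100.0 * counts[c] / total for c in CATEGORIES}


# Made-up toy answers for a single character, NOT the actual survey data.
toy_answers = [
    "favorable", "neutral", "unfamiliar", "unfamiliar",
    None, "unfavorable", "neutral", "neutral",
]

shares = favorability_shares(toy_answers)
for category in CATEGORIES:
    print(f"{category:12s} {shares[category]:5.1f}%")
```

Running this once per character, separately for fans and non-fans, would give the segments for each bar. The "unfamiliar" share is the piece that shows up as the nearly invisible segment on the right.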

Before I wrap up this blog post, I thought I'd show you an interesting comparison of how fans and non-fans ranked the six films.

For the fans in this dataset, the interpretation is clear and predictable. They love the original trilogy, with The Empire Strikes Back as their top favorite. The chart below shows the mean (average) ranking given to each film, where rankings run from 1 (best) to 6 (worst), so a lower mean is better. The light blue bars indicate the prequel films:




For the non-fans, the results are surprisingly different. The Phantom Menace is, on average, ranked higher than Return of the Jedi! Attack of the Clones ranks higher than A New Hope! Wow....
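If you want to try the film comparison yourself, the calculation is just an average of the 1-to-6 rankings, grouped by fan status and film. Below is a rough Python sketch of that idea. The row layout, the function name and the toy rows are assumptions I made up for illustration; they are not the layout of the published data file or the real results.

```python
from collections import defaultdict
from statistics import mean


def mean_rank_by_group(rows):
    """Average the rankings for every (group, film) pair.

    rows: iterable of (group, film, rank) tuples, where rank runs from
    1 (best) to 6 (worst). Rows with a missing rank (None) are ignored.
    """
    buckets = defaultdict(list)
    for group, film, rank in rows:
        if rank is None:
            continue
        buckets[(group, film)].append(rank)
    return {key: mean(ranks) for key, ranks in buckets.items()}


# Made-up toy rows with placeholder film names, NOT the actual survey data.
toy_rows = [
    ("fan", "Film A", 1), ("fan", "Film A", 2),
    ("fan", "Film B", 5), ("fan", "Film B", 6),
    ("non-fan", "Film A", 4), ("non-fan", "Film A", 3),
    ("non-fan", "Film B", 2), ("non-fan", "Film B", None),
]

result = mean_rank_by_group(toy_rows)

# List each group's films from best (lowest mean rank) to worst.
for (group, film), avg in sorted(result.items(), key=lambda kv: (kv[0][0], kv[1])):
    print(f"{group:8s} {film:8s} {avg:.2f}")
```

Because 1 is the best ranking, the film with the lowest mean is the favorite, which is easy to read backwards when looking at a bar chart.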