# Overanalyzing TV Shows

**EDIT 2017-08-14**: I’ve fixed all the code so it works and is up to date. Side effects include the text not necessarily matching the data. Whoopsie.

Overanalyzing tv shows has kind of become my jam. So why not totally overdo it.

Note that everything I describe in this blogpost is purely for the lulz, and I don’t pretend there’s any scientific merit to it. I just like throwing maths at data.

After I more or less succesfully plotted all the things, I wanted to go full blown statisticy on the subject. While my knowledge of statistics isn’t nearly as extensive as I’d like it to, I at least know a little about comparing groups. So, that’s pretty much the plan.

## Methodology(ish)

The basic idea is simple: I pull the episode data from trakt.tv and receive a nice `data.frame`

with episode numbers, season number, ratings (0-10) and much more. Assuming that seasons are a decent way of dividing a tv show into groups, I’ll be using them exactly as such. The variable we’re interested in comparing is the rating, provied by the trakt.tv userbase.

Using that information we can perform an analysis of variance (ANOVA), with the `aov()`

function. If it yields any signicificant results, we then perform the `TukeyHSD`

test for post-hoc analysis to find the season comparison with the significant results (i.e. lowest p-value). And yes, I know the p-value in itself is at least a little controversial, and the whole plan might sound completely rubbish to you, but please keep in mind that I’m only doing this for fun.

## Acquiring the Data

This step is pretty simple thanks to the work I did with my tRakt package which allows us to pull the data we want in a few simple steps. I’ll also start by loading some packages I intend on using throughout this blogpost. The idea is that you can aggregate all the code in this post and thereby reliably reproduce my results. Because that’s how I like to science.

```
library(tRakt) # To get the data. Get it via devtools::install_github("jemus42/tRakt")
library(dplyr) # Because convenience
library(ggplot2) # In case of plot
library(grid) # For some ggplot2 theme specs
library(scales) # Primarily for pretty_breaks() in plots
library(broom) # To pretty up output of tests & models
library(knitr) # For simple tables via knitr::kable
library(tadaatoolbox) # For prettier test output
if (is.null(getOption('trakt.client.id'))){
get_trakt_credentials(client.id = "12fc1de7671c7f2fb4a8ac08ba7c9f45b447f4d5bad5e11e3490823d629afdf2")
}
theme_set(theme_tadaa())
```

Now we’re all set. At this point, we can pull the episode data via `trakt.get_all_episodes(slug, season_nums)`

, so if we were to look at Game of Thrones (and we will), we’d call `trakt.get_all_episodes("game-of-thrones", 1:4)`

.

So, why not get started on that.

## Analyzing all the things

### Game of Thrones.

```
got <- trakt.get_all_episodes("game-of-thrones", 1:6)
# Glance at the data
got %>% select(season, episode, rating) %>% head %>% kable
```

season | episode | rating |
---|---|---|

1 | 1 | 8.08501 |

1 | 2 | 8.09403 |

1 | 3 | 8.03229 |

1 | 4 | 8.06916 |

1 | 5 | 8.09534 |

1 | 6 | 8.28193 |

Now that we have the data, we can pump it through `aov()`

and look at the `broom::tidy`

’d data.

Note that I pump the output through `knitr::kable`

, which converts the output to a markdown table suitable for imbedding in this blogpost.

`tadaa_aov(rating ~ season, data = got, print = "markdown") `

Table 1: **One-Way ANOVA**: Using Type III Sum of Squares

Term | df | SS | MS | F | p | \(\eta^2\) | Cohen’s f | Power |
---|---|---|---|---|---|---|---|---|

season | 5 | 0.68 | 0.14 | 2.32 | 0.056 | 0.18 | 0.46 | 0.75 |

Residuals | 54 | 3.18 | 0.06 | |||||

Total | 59 | 3.86 | 0.2 |

Well, that’s anticlimactic. Anyways, I assume it’s safe to say that Game of Thrones is pretty consistent with it’s ratings across all seasons. I hope it can keep this up for the following one to five seasons.

A neat way to look at the distributions across multiple groups is via density plots. If you’re unfamiliar with them, just imagine a standard histograms (values on the x, frequencies on the y axis), but with interpolation/smoothing to produce curves. I know that’s probably not the right way to think of it, but should give you an idea of what’s up.

```
ggplot(data = got, aes(x = rating, fill = season)) +
geom_density(alpha = .6) +
scale_x_continuous(breaks = scales::pretty_breaks()) +
labs(title = "Game of Thrones Episode Ratings per Season",
y = "Density", x = "Rating", fill = "Season")
```

What I like about this plot is how season 2 seems so neatly grouped around ~8.6 with a bump near the top of the scale, which I assume to be s02e09/s02e10. Especially season 4 on the other hand is pretty spread out, meaning there’s a broader range of ratings, or at least a more even spread of ratings.

### The Simpsons

```
simpsons <- trakt.get_all_episodes("the-simpsons")
tadaa_aov(rating ~ season, data = simpsons, print = "markdown")
```

Table 2: **One-Way ANOVA**: Using Type III Sum of Squares

Term | df | SS | MS | F | p | \(\eta^2\) | Cohen’s f | Power |
---|---|---|---|---|---|---|---|---|

season | 28 | 18.52 | 0.66 | 25.84 | < 0.001 | 0.54 | 1.09 | 1 |

Residuals | 610 | 15.61 | 0.03 | |||||

Total | 638 | 34.13 | 0.69 |

Oh, look, a significant value. Nice. Let’s throw `TukeyHSD`

at it to see what sticks.

```
aov(rating ~ season, data = simpsons) %>%
TukeyHSD() %>%
tidy() %>%
filter(adj.p.value < 0.05) %>%
nrow()
```

`## [1] 194`

Welp, that’s quite a bunch of significant values. Let’s look at everything with a `p < 0.00000001`

, because I tried *normal* significance levels and there were still too many hits.

```
aov(rating ~ season, data = simpsons) %>%
TukeyHSD() %>%
tidy() %>%
filter(adj.p.value < 0.00000001) %>%
tadaa_plot_tukey()
```

I think the result is quite convincing. Season ~~19~~ 28 is, statistically speaking, the worst of The Simpsons.

I don’t think a density plot would make much sense here, since there are *so many seasons*, so I’ll just give you a scatterplot.

```
ggplot(data = simpsons, aes(x = epnum, y = rating, colour = season)) +
geom_point(size = 4, colour = "black") +
geom_point(size = 3) +
scale_x_continuous(breaks = scales::pretty_breaks()) +
labs(title = "The Simpsons Episode Ratings per Season",
y = "Rating", x = "Episode (absolute)", colour = "Season") +
theme(legend.position = "top") +
theme(legend.key.size = unit(.3, "cm")) +
theme(legend.text = element_text(size = 8))
```

If you want to take a closer look, try looking it up on tRakt-shiny

### Star Trek: Enterprise

As per request, let’s look how that one’s doing.

```
ste <- trakt.get_all_episodes("star-trek-enterprise")
tadaa_aov(rating ~ season, data = ste, print = "markdown")
```

Table 2: **One-Way ANOVA**: Using Type III Sum of Squares

Term | df | SS | MS | F | p | \(\eta^2\) | Cohen’s f | Power |
---|---|---|---|---|---|---|---|---|

season | 3 | 0.53 | 0.18 | 4.74 | < 0.01 | 0.13 | 0.39 | 0.9 |

Residuals | 94 | 3.48 | 0.04 | |||||

Total | 97 | 4.01 | 0.21 |

Once again a significant result. From what I’ve been told, s04 is said to be better than the other seasons, so I guess that’s what we’ll be looking for here.

```
aov(rating ~ season, data = ste) %>%
TukeyHSD() %>%
tidy() %>%
tadaa_plot_tukey()
```

Now let’s look at the ratings in density-plot form.

```
ggplot(data = ste, aes(x = rating, fill = season)) +
geom_density(alpha = .6) +
scale_x_continuous(breaks = scales::pretty_breaks()) +
labs(title = "Star Trek: Enterprise Episode Ratings per Season",
y = "Density", x = "Rating", fill = "Season")
```

Welp, it really looks like s04 was (significantly) more well-received than the other seasons.

### Futurama

A long standing favorite of mine and also a request. I wonder if the many many cancellations had any effect on season reception.

```
futurama <- trakt.get_all_episodes("futurama")
tadaa_aov(rating ~ season, data = futurama, print = "markdown")
```

Table 3: **One-Way ANOVA**: Using Type III Sum of Squares

Term | df | SS | MS | F | p | \(\eta^2\) | Cohen’s f | Power |
---|---|---|---|---|---|---|---|---|

season | 6 | 0.27 | 0.05 | 1.09 | 0.37 | 0.05 | 0.24 | 0.44 |

Residuals | 117 | 4.85 | 0.04 | |||||

Total | 123 | 5.12 | 0.09 |

Assuming we’re working with a \(\alpha = 0.05\) (which I’ll just say we do), that’s a significant result.

```
aov(rating ~ season, data = futurama) %>%
TukeyHSD() %>%
tidy() %>%
tadaa_plot_tukey()
```

Welp, at least the first line yields a significant result with the `.05`

cutoff, so I guess you could make an argument that s05 was better than s07, but besides that it seems like Futurama was pretty consistent.

Let’s look at a plot to get a feel for that.

```
ggplot(data = futurama, aes(x = rating, fill = season)) +
geom_density(alpha = .6) +
scale_x_continuous(breaks = scales::pretty_breaks()) +
labs(title = "Futurama Episode Ratings per Season",
y = "Density", x = "Rating", fill = "Season")
```

Yip. Seems pretty consistent to me.

### Lost

Oh screw me. Well, let’s do this.

```
lost <- trakt.get_all_episodes("lost-2004")
tadaa_aov(rating ~ season, data = lost)
```

```
## term df sumsq meansq statistic p.value eta.sq.part
## 1 Residuals 114 3.26835512 0.028669782 NA NA NA
## 2 season 5 0.03805877 0.007611755 0.2654975 0.9310704 0.01151059
## 3 Total 119 3.30641390 0.036281537 NA NA NA
## cohens.f power
## 1 NA NA
## 2 0.1079103 0.1170426
## 3 NA NA
```

Huh. So I guess we’ll have to take a closer look.

```
aov(rating ~ season, data = lost) %>%
TukeyHSD() %>%
tidy() %>%
tadaa_plot_tukey()
```

I haven’t watched Lost and I don’t really care for it, so make of this what you will.

```
ggplot(data = lost, aes(x = rating, fill = season)) +
geom_density(alpha = .6) +
scale_x_continuous(breaks = scales::pretty_breaks()) +
labs(title = "Lost Episode Ratings per Season",
y = "Density", x = "Rating", fill = "Season")
```

### Daria

Everybody loves Daria, so don’t pretend you don’t know what’s up.

```
daria <- trakt.get_all_episodes("daria")
tadaa_aov(rating ~ season, data = daria, print = "markdown")
```

Table 4: **One-Way ANOVA**: Using Type III Sum of Squares

Term | df | SS | MS | F | p | \(\eta^2\) | Cohen’s f | Power |
---|---|---|---|---|---|---|---|---|

season | 4 | 0.34 | 0.09 | 0.72 | 0.58 | 0.05 | 0.22 | 0.24 |

Residuals | 60 | 7.1 | 0.12 | |||||

Total | 64 | 7.44 | 0.2 |

Oh, well.

```
aov(rating ~ season, data = daria) %>%
TukeyHSD() %>%
tidy() %>%
tadaa_plot_tukey()
```

It’s been a while since I watched Daria, but season 4 seems… oh wait, nevermind. There are several missing values in the dataset, I don’t feel comfortable making assumptions based on insufficient data. I mean, look at this plot of the rating distribution:

```
ggplot(data = daria, aes(x = rating, fill = season)) +
geom_density(alpha = .6) +
scale_x_continuous(breaks = scales::pretty_breaks()) +
labs(title = "Daria Episode Ratings per Season",
y = "Density", x = "Rating", fill = "Season")
```

And then look at the vote distribution:

```
ggplot(data = daria, aes(x = votes)) +
geom_histogram(alpha = .6, binwidth = 1) +
scale_x_continuous(breaks = scales::pretty_breaks()) +
labs(title = "Daria Episode Votes",
y = "Density", x = "Vote", fill = "Season")
```

The peak at \(x = 0\) means exactly what you think it means: A large number of episodes with no votes at all.

So, that’s a bummer.

### Person of Interest

```
poi <- trakt.get_all_episodes("person-of-interest")
tadaa_aov(rating ~ season, data = poi, print = "markdown")
```

Table 5: **One-Way ANOVA**: Using Type III Sum of Squares

Term | df | SS | MS | F | p | \(\eta^2\) | Cohen’s f | Power |
---|---|---|---|---|---|---|---|---|

season | 4 | 0.87 | 0.22 | 3.83 | < 0.01 | 0.14 | 0.4 | 0.9 |

Residuals | 98 | 5.6 | 0.06 | |||||

Total | 102 | 6.47 | 0.28 |

No post-hoc necessary. Courtesy plot:

```
ggplot(data = poi, aes(x = rating, fill = season)) +
geom_density(alpha = .6) +
scale_x_continuous(breaks = scales::pretty_breaks()) +
labs(title = "Person of Interest Episode Ratings per Season",
y = "Density", x = "Rating", fill = "Season")
```

### Castle

```
castle <- trakt.get_all_episodes("castle")
tadaa_aov(rating ~ season, data = castle, print = "markdown")
```

Table 6: **One-Way ANOVA**: Using Type III Sum of Squares

Term | df | SS | MS | F | p | \(\eta^2\) | Cohen’s f | Power |
---|---|---|---|---|---|---|---|---|

season | 7 | 7.67 | 1.1 | 28.73 | < 0.001 | 0.55 | 1.1 | 1 |

Residuals | 165 | 6.29 | 0.04 | |||||

Total | 172 | 13.97 | 1.13 |

Huh.

```
aov(rating ~ season, data = castle) %>%
TukeyHSD() %>%
tidy() %>%
tadaa_plot_tukey()
```

There seems to be quite some disturbance in the force with this one.

```
ggplot(data = castle, aes(x = rating, fill = season)) +
geom_density(alpha = .6) +
scale_x_continuous(breaks = scales::pretty_breaks()) +
labs(title = "Castle Episode Ratings per Season",
y = "Density", x = "Rating", fill = "Season")
```

### Parks & Recreation

```
prec <- trakt.get_all_episodes("parks-and-recreation")
tadaa_aov(rating ~ season, data = prec, print = "markdown")
```

Table 7: **One-Way ANOVA**: Using Type III Sum of Squares

Term | df | SS | MS | F | p | \(\eta^2\) | Cohen’s f | Power |
---|---|---|---|---|---|---|---|---|

season | 6 | 2.87 | 0.48 | 10.51 | < 0.001 | 0.35 | 0.73 | 1 |

Residuals | 118 | 5.38 | 0.05 | |||||

Total | 124 | 8.25 | 0.52 |

Again? Whow, we’re on a roll.

```
aov(rating ~ season, data = prec) %>%
TukeyHSD() %>%
tidy() %>%
tadaa_plot_tukey()
```

What. It looks like almost every season is rated significantly different than every other.

```
ggplot(data = prec, aes(x = rating, fill = season)) +
geom_density(alpha = .6) +
scale_x_continuous(breaks = scales::pretty_breaks()) +
labs(title = "Parks & Recreation Episode Ratings per Season",
y = "Density", x = "Rating", fill = "Season")
```

Oh yeah. That show is a mess, consistency-wise.

Interesting how bad s01 seems to be doing.

### Sleepy Hollow

Haven’t seen it, not sure if I will.

```
shollow <- trakt.get_all_episodes("sleepy-hollow")
tadaa_aov(rating ~ season, data = shollow)
```

```
## term df sumsq meansq statistic p.value eta.sq.part
## 1 Residuals 58 1.199994 0.02068955 NA NA NA
## 2 season 3 1.454644 0.48488120 23.43604 4.618261e-10 0.5479631
## 3 Total 61 2.654638 0.50557075 NA NA NA
## cohens.f power
## 1 NA NA
## 2 1.101004 1
## 3 NA NA
```

Well.

```
aov(rating ~ season, data = shollow) %>%
TukeyHSD() %>%
tidy() %>%
tadaa_plot_tukey()
```

Okay, with only two seasons there’s not much fun in doing an ANOVA in the first place, a simple t-test would probably have done the trick.

```
ggplot(data = shollow, aes(x = rating, fill = season)) +
geom_density(alpha = .6) +
scale_x_continuous(breaks = scales::pretty_breaks()) +
labs(title = "Sleepy Hollow Episode Ratings per Season",
y = "Density", x = "Rating", fill = "Season")
```

The point being: It doesn’t seem to get better.

### Family Guy

To end this, here comes a favorite of mine.

```
fguy <- trakt.get_all_episodes("family-guy")
tadaa_aov(rating ~ season, data = fguy, print = "markdown")
```

Table 8: **One-Way ANOVA**: Using Type III Sum of Squares

Term | df | SS | MS | F | p | \(\eta^2\) | Cohen’s f | Power |
---|---|---|---|---|---|---|---|---|

season | 15 | 6.61 | 0.44 | 21.19 | < 0.001 | 0.52 | 1.04 | 1 |

Residuals | 293 | 6.09 | 0.02 | |||||

Total | 308 | 12.71 | 0.46 |

Oh well, I’ll take it.

```
aov(rating ~ season, data = fguy) %>%
TukeyHSD() %>%
tidy() %>%
tadaa_plot_tukey()
```

Hm.

```
fguy$rating[fguy$rating == 0] <- NA
ggplot(data = fguy, aes(x = rating, fill = season)) +
geom_density(alpha = .6) +
scale_x_continuous(breaks = scales::pretty_breaks()) +
labs(title = "Family Guy Episode Ratings per Season",
y = "Density", x = "Rating", fill = "Season")
```

### Scrubs

As an afterthought, let’s look at the one that got this whole thing started.

```
scrubs <- trakt.get_all_episodes("scrubs")
tadaa_aov(rating ~ season, data = scrubs, print = "markdown")
```

Table 9: **One-Way ANOVA**: Using Type III Sum of Squares

Term | df | SS | MS | F | p | \(\eta^2\) | Cohen’s f | Power |
---|---|---|---|---|---|---|---|---|

season | 8 | 10.83 | 1.35 | 65.33 | < 0.001 | 0.75 | 1.74 | 1 |

Residuals | 172 | 3.57 | 0.02 | |||||

Total | 180 | 14.4 | 1.37 |

That’s… convincing.

```
aov(rating ~ season, data = scrubs) %>%
TukeyHSD() %>%
tidy() %>%
tadaa_plot_tukey()
```

Once again, pretty convincing. Nobody likes s09.

```
ggplot(data = scrubs, aes(x = rating, fill = season)) +
geom_density(alpha = .6) +
scale_x_continuous(breaks = scales::pretty_breaks()) +
labs(title = "Scrubs Episode Ratings per Season",
y = "Density", x = "Rating", fill = "Season")
```

…yip.

## Limitations

Probably the most important thing to note about all of this is that trakt.tv doesn’t nearly have the userbase of other, well established sites like IMDb, so, well, I guess the sample size might be too small, or maybe there are some biases or confounders at work which I can’t really adjust for.

A way around this would probably be to use the `votes`

variable in the episode dataset to normalize the ratings across a show, but I don’t really know a decent way to do that, since the per-episode ratings already represent the average of all the user votes for that episode.

The other problem is that I never once checked for normality within the ratings, which should be given if you want to perform an ANOVA. If the variable you’re looking at is not distributed normally, you should be doing a Kruskal-Wallis, but I’m a lazy man.

Well, there’s that and my whole point about how this is all fun and games, and no serious points shall be made.