With the return of the X-Files in the form of a miniseries, I was tempted to catch up on the original run of the show, since I had only seen the occasional episode in the late 90s or early 00s (my mom was a big fan). Being me, I had already looked up the X-Files episode ratings on trakt.tv to see if there was anything interesting about them, but I didn’t think there was. However, when I listened to the Incomparable talking about the show, I learned that the X-Files can apparently be divided into “myth arc” episodes and regular, more stand-alone episodes. That’s when I realized I needed to get my tv show analysis boots on and see what I could do. To my delight, I noticed that the appropriate Wikipedia article neatly marks the myth arc episodes, ready for plucking.

And then I started plucking.

So I’ve been watching Marvel’s Jessica Jones over the past couple of days, as one does, and I have opinions and stuff about it. However, since I believe that a plot is worth more than word stuff, I present to you my viewing experience in data.

## Edit: 2016-12-18 02:13:19

Please note that this analysis is out of date and the code to acquire the data no longer works, since the source website has restructured and I have not found a way to reproduce the old behavior. Also, the current analysis is located at https://worldpenis.tadaa-data.de, so please go there for up to date code and analysis. It’s prettier. And better.

If there’s one thing I just can’t resist, it’s publicly available tabular data containing adequate amounts of numeric values. Naturally, I couldn’t resist the World Penis Data I stumbled upon somewhere over at Reddit.

So, let’s suck that data out of the web and put it into our favorite data structure.

library(tRakt) # install via devtools::install_github("jemus42/tRakt") library(dplyr) library(tidyr) library(ggplot2) get_trakt_credentials(username = "Your Username") slug <- "dig" # Slug from trakt.tv show url trakt.user.ratings(type = "episodes") %>% filter(show.slug == slug) %>% arrange(season, episode) %>% select(rating, season, episode, title) %>% mutate(season = factor(season, ordered = T)) %>% rename(user.rating = rating) %>% left_join((trakt.get_all_episodes(slug) %>% select(rating, title, epnum))) %>% gather("type", value = "rating", user.rating, rating) %>% ggplot(data = ., aes(x = epnum, y = rating, colour = type)) + geom_point(size = 6, colour = "black") + geom_point(size = 5) + ylim(c(5, 10)) + scale_colour_discrete(labels = c("My Rating", "Trakt.

I don’t know if you’ve noticed, but lately I’ve done a lot of stuff with tv shows. Along the way, I noticed some trends with a few shows which seemed quite interesting to me, namely some shows were going straight down the drain, at least as far as their recent ratings are concerned. The projects I’m referring to are these two: 100 Popular Shows on trakt.tv 100 Trending Shows on trakt.

Analyzing TV shows seems to be what I do these days. So I wanted to keep my newfound calling going and sucked the data for about a thousand shows out of the trakt.tv API, which was nice enough to only fail on me, like, twice. So, after some time of intense data pulling, I found myself with the more or less complete data (show info, season info, episode data) for 988 shows (and that’s why I keep referring to 1000(ish)).

As of today, I have my first package published on CRAN. In the grand scheme of things, that’s not really a big deal, since CRAN doesn’t have any quality standards regarding the content of a package; they just verify that the package can be installed and run without breaking horribly. Still, I’m quite happy about this minor achievement. Not because I’m particularly proud of my package, but rather because I consider it a small validation of my ongoing path to becoming an R developer who doesn’t embarrass himself more than necessary.

Overanalyzing tv shows has kind of become my jam. So why not totally overdo it? Note that everything I describe in this blogpost is purely for the lulz, and I don’t pretend there’s any scientific merit to it. I just like throwing maths at data. After I more or less successfully plotted all the things, I wanted to go full-blown statisticy on the subject. While my knowledge of statistics isn’t nearly as extensive as I’d like it to be, I at least know a little about comparing groups.
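For a rough idea of what "comparing groups" can look like for episode ratings, here is a minimal sketch in base R. The ratings are simulated toy values, not real trakt.tv data, and the choice of a one-way ANOVA with Tukey's HSD (plus a rank-based alternative) is an assumption for illustration, not necessarily what the post itself ends up doing.

```r
# Toy data only: simulated ratings for three seasons, NOT real trakt.tv values.
set.seed(42)
ratings <- data.frame(
  season = factor(rep(1:3, each = 12)),
  rating = c(rnorm(12, mean = 8.0, sd = 0.4),
             rnorm(12, mean = 7.8, sd = 0.4),
             rnorm(12, mean = 7.2, sd = 0.4))
)

# One-way ANOVA: do mean ratings differ between seasons at all?
fit <- aov(rating ~ season, data = ratings)
summary(fit)

# Tukey's HSD: which pairs of seasons differ from each other?
TukeyHSD(fit)

# Rank-based alternative that does not assume normally distributed ratings
kruskal.test(rating ~ season, data = ratings)
```

With real data, `ratings` would simply be a data frame with one row per episode, a `season` factor and a numeric `rating` column.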

It’s been a while since I started working on a set of functions to pull data from trakt.tv. I documented part of the early process in an earlier blogpost, and since then I have started aggregating my work into a proper package. Since trakt launched their new APIv2, I have started to rewrite and enhance the package a little, also solidifying the whole authentication business. I have not implemented any OAuth2 methods, but since the purpose of this package is to pull a bunch of data and not to perform actions like checkins, I don’t think it’s a big deal.

Remember that last post? No? Good. Then don’t scroll down. Or do. Idunno. One thing I wanted for my more-or-less-automated TV show plots was appropriate colors to differentiate seasons. I assume that’s a problem we can all relate to. Of course, in the R and ggplot2 bubble, there’s the RColorBrewer package, which provides nice and easy color palettes of varying sizes. But that’s boring. Also, repetitive. So let’s fix that.
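To make the problem concrete: one generic way to get a distinct color per season without RColorBrewer is to space hues evenly around the HCL color wheel. The sketch below uses only base R; `season_palette()` is a hypothetical helper invented for this illustration and is not the approach the post goes on to describe.

```r
# Hypothetical helper, not from the original post: one colour per season,
# with hues spaced evenly around the HCL colour wheel.
season_palette <- function(n_seasons, chroma = 100, luminance = 65) {
  stopifnot(n_seasons >= 1)
  # Generate one hue too many and drop the last one, so the first and
  # last colours do not land on the same spot of the wheel.
  hues <- seq(15, 375, length.out = n_seasons + 1)[seq_len(n_seasons)]
  grDevices::hcl(h = hues, c = chroma, l = luminance)
}

# Ten colours for a show with ten seasons
cols <- season_palette(10)
names(cols) <- seq_along(cols)
```

A named vector like `cols` can be handed to `ggplot2::scale_colour_manual(values = cols)`, provided the season factor levels match the names.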

Stargate SG-1, while probably a mediocre show in the grand scheme of sci-fi shows, is the sci-fi show I grew up with, so I tend to enjoy rewatching parts of it occasionally. Well, at least I’ve rewatched it twice so far. The full thing. 10 seasons. Yep. Even those last two. So this time, I wanted to cherry-pick the good™ episodes, and of course efficient cherry-picking in 2014 involves R, the trakt.

The other day, L3viathan put his openpaths location data into a gist, and since I enjoy R and had recently written about that very use case, I threw a few things at it. Here’s the result. L3vipaths This uses l3vi’s location data. For science shits ‘n giggles. Importing the data in R `r load it read it rename it convert it factor it sort it attach it library(rjson) library(ggplot2) library(maps) library(ggmap)