Learning R with Simpsons

I tried to learn R through Coursera before this module, but I gave up after the second week because I found the course a bit hard. Although one of my favourite characters, Homer Simpson, would say “You tried your best and failed miserably. The lesson is, never try”, I have started learning and using R again with the Data Management and Analytics module.


I started my re-learning process with CodeSchool’s Try R online course. It was a good reminder of the different features of R, and I learnt how to create different graphs, use factors and so on during those eight chapters of R adventure.


After completing those eight chapters I was ready to get real-life data and conquer the world with my beautiful data stories. Obviously, that hasn’t happened yet! I have joined a few DBS Analytics Society meetings on Saturdays and started to analyse different data sets with R. Although I could have done most of those analyses in Excel in a short time, this time I am determined to learn R, so I am still wrestling with it.

While looking for interesting data sets to analyse, I found Reddit and Kaggle.com really useful. fivethirtyeight.com also provides a lot of data sets in its GitHub account, but its writers are very good at getting everything out of a data set, so there is not much you could add to the stories they tell.

For my first attempt at analysing data with R, I decided to go with the Simpsons data from kaggle.com, and I can easily say that reading this article by Todd Schneider motivated me too.

Although that article covers many different findings, I decided to try something different: checking how many times the Simpson family characters have been used in episode titles. Then I will try to compare how many people watched those episodes and what their IMDb ratings are.
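To make the plan a bit more concrete, here is a minimal sketch in base R of how that count and comparison might look. It is only an illustration under assumptions: the column names (`title`, `us_viewers_in_millions`, `imdb_rating`) are my guess at what the Kaggle episodes file looks like, the helper `summarise_character` is hypothetical, and the small data frame is made-up placeholder data rather than real episodes.

```r
# Toy stand-in for the episodes table. Titles and numbers are made up for
# illustration; the column names are an assumption about the Kaggle file.
episodes <- data.frame(
  title = c("Homer Example One", "Bart and Lisa Example",
            "An Episode Without Names", "Marge Example Two"),
  us_viewers_in_millions = c(10.0, 8.0, 6.0, 12.0),
  imdb_rating = c(8.0, 7.0, 6.5, 7.5),
  stringsAsFactors = FALSE
)

family <- c("Homer", "Marge", "Bart", "Lisa", "Maggie")

# Hypothetical helper: for one character, find the titles that contain the
# name as a whole word, then count them and average viewers and ratings.
summarise_character <- function(name, data) {
  hit <- grepl(paste0("\\b", name, "\\b"), data$title,
               ignore.case = TRUE, perl = TRUE)
  data.frame(
    character = name,
    episodes = sum(hit),
    mean_viewers = if (any(hit)) mean(data$us_viewers_in_millions[hit]) else NA_real_,
    mean_rating = if (any(hit)) mean(data$imdb_rating[hit]) else NA_real_,
    stringsAsFactors = FALSE
  )
}

# One row per family member, stacked into a single summary table.
title_summary <- do.call(rbind, lapply(family, summarise_character, data = episodes))
print(title_summary)
```

With the real data, the toy data frame would be replaced by something like `read.csv()` on the downloaded file, after checking that the column names match. A title that mentions two characters counts once for each of them, which seems fine for a first look.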

