While exploring different data sets on kaggle.com, I came across this data set about Video Game Sales with Ratings. Umesh from Kaggle created a great kernel that explores the data set and plots graphs such as revenue by game and revenue by seller across different regions.
The data includes sales figures for Japan, US/North America, Europe and other regions, as well as global sales. I will use the code from Kaggle to create the same graphs in ggplot and discuss the trends. Because Wii Sports was bundled with the Wii console by default, I have excluded it from the results.
My first graph shows the top 10 games by total units sold, with their sales broken down by region.
Grand Theft Auto V is the best-selling game worldwide. It holds the top spot with 56.57 million units sold, although that figure combines its sales on all consoles. The second game, Super Mario Bros., has sold 45.31 million units, a figure that also spans Nintendo consoles such as the Wii, DS and 3DS, so we can argue that it is one of the best-selling games too.
A few trends in this table look interesting. Although GTA V is the best-selling game overall, it does not rank among the top 5 sellers in Japan, where it sold 1.42 million units. Pokemon Red/Blue has sold over 10 million copies in Japan, which makes it the best performer there among the top 10. The best-selling game in North America among the top 10 is Super Mario Bros. rather than GTA V. Tetris is also very popular in North America: 73% of its copies were sold in this region.
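The ranking and the regional share described above can be sketched in base R. This is only a minimal illustration, not the Kaggle kernel's code: the numbers are made up, the titles are placeholders, and the column names (`Name`, `Platform`, `NA_Sales`, `JP_Sales`, `Global_Sales`) are assumed to match the Kaggle file.

```r
# Toy data with made-up numbers (millions of units). Column names are
# assumed to follow the Kaggle file and are not confirmed here.
sales <- data.frame(
  Name         = c("Game A", "Game A", "Game B", "Game C", "Wii Sports"),
  Platform     = c("PS3", "X360", "NES", "GB", "Wii"),
  NA_Sales     = c(7.0, 9.0, 29.0, 23.0, 41.0),
  JP_Sales     = c(1.0, 0.5, 7.0, 4.0, 4.0),
  Global_Sales = c(21.0, 16.0, 40.0, 30.0, 82.0),
  stringsAsFactors = FALSE
)

# Drop the console-bundled title, as the post does.
sales <- sales[sales$Name != "Wii Sports", ]

# Sum each game's sales across all platforms it was released on.
by_game <- aggregate(cbind(NA_Sales, JP_Sales, Global_Sales) ~ Name,
                     data = sales, FUN = sum)

# Rank by global sales and keep at most the top 10 rows.
top10 <- head(by_game[order(-by_game$Global_Sales), ], 10)

# Share of each game's global sales that came from North America, in percent.
top10$NA_Share <- round(100 * top10$NA_Sales / top10$Global_Sales, 1)

print(top10)
```

Summing per game before ranking is what makes a multi-platform title comparable with a single-platform one, and the `NA_Share` column is the same kind of percentage as the 73% quoted for Tetris.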
Read more “Top 10 Video Games in World”
I tried to learn R through Coursera before this module, but I wasn’t able to continue the course after the second week as I found it a bit hard. Although one of my favorite characters, Homer Simpson, would say “You tried your best and failed miserably. The lesson is, never try”, I have started using and learning R again with the Data Management and Analytics module.
I started my re-learning process with CodeSchool’s Try R online course. It was a good reminder of different features of R, and I learnt how to create different graphs, use factors and so on during those eight chapters of R adventure.
After completing those eight chapters I was ready to get real-life data and conquer the world with my beautiful data stories. Obviously, it hasn’t happened, yet! I have joined a few DBS Analytics Society meetings on Saturdays and started to analyse different data sets with R. Although I could have done most of those analyses in Excel in a short time, this time I am determined to learn R, so I am still wrestling with it.
While looking for interesting data sets to analyze, I found Reddit and Kaggle.com really useful. fivethirtyeight.com also publishes a lot of data sets in its GitHub account, but its writers are very good at getting everything out of a data set, so there is not much you could add to the stories they tell.
For my first attempt to analyze data with R, I decided to go with the Simpsons data from kaggle.com, and I can safely say that reading this article by Todd Schneider motivated me too.
Although that article already covers many different findings, I decided to try something different and check how many times the Simpson family characters appear in episode titles. Then I will compare how many people watched those episodes and what their IMDb ratings are.
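The title count described above could be sketched like this in base R. It is a minimal illustration, not the code from the full post: the handful of episode titles is typed in by hand rather than read from the Kaggle file, and the column name `title` is an assumption.

```r
# A few real episode titles typed in by hand; the Kaggle file would supply
# the full list. The column name `title` is an assumption.
episodes <- data.frame(
  title = c("Bart the Genius", "Homer's Odyssey", "Moaning Lisa",
            "Life on the Fast Lane", "Marge vs. the Monorail",
            "Bart vs. Thanksgiving"),
  stringsAsFactors = FALSE
)

family <- c("Homer", "Marge", "Bart", "Lisa", "Maggie")

# For each name, count the titles that contain it as a whole word.
title_counts <- sapply(family, function(name) {
  sum(grepl(paste0("\\b", name, "\\b"), episodes$title))
})

print(title_counts)
```

The word-boundary pattern keeps a name from matching inside a longer word while still catching possessives such as "Homer's". With viewer numbers and IMDb ratings in the same data frame, the logical vector returned by `grepl` could also be used to subset the episodes for each character.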
Read more “Learning R with Simpsons”
After a few weeks of the Data Management and Analytics class and of working with R, I attended the DBS Analytics Society meeting on the 22nd of October.
Thanks to Darren, we had some pastries for breakfast. Eating them while drinking a double-shot coffee woke me up on a Saturday morning.
Darren prepared four different quizzes for us. Although I could finish only two of them in two hours, it was a very helpful meeting for practicing R with different data sets.
The first quiz was about basic R commands and how to use them, and it was easier than the second quiz. I have uploaded my code, along with the questions, to my GitHub account. I made one mistake on my first attempt: the first question asked for the sum of the output, but I gave the output itself as the answer.
Read more “Secret of the Name “AshleyMadison.com””