While exploring different data sets on kaggle.com, I came across this data set about Video Game Sales with Ratings. Umesh from Kaggle created a great kernel that explores this data set and plots graphs such as revenue by game and revenue by seller in different regions.
The data includes sales figures for different regions: Japan, US/North America, Europe, Other Sales, and Global Sales. I will use the code from Kaggle to create the same graphs in ggplot and discuss the trends. As Wii Sports was bundled with the Wii console by default, I have excluded this game from the results.
My first graph shows the top 10 games by total units sold, along with their sales in the different regions.
Grand Theft Auto V is the best-selling game around the globe. It holds the top spot with 56.57 million units sold overall, although that figure adds up its sales across all the different consoles. The second game, Super Mario Bros, has sold 45.31 million units and was released on Nintendo consoles such as Wii, DS and 3DS, so we can argue that it is one of the best-selling games.
A few trends in this table look interesting. Although GTA V is the best-selling game overall, it is not among the top 5 sellers in Japan, with 1.42 million units sold there. Pokemon Red/Blue has sold over 10 million copies in Japan, making it the best performer there among the top 10. Also, the best-selling game in North America among the top 10 is Super Mario Bros rather than GTA V. Tetris is also very popular in North America, as 73% of its copies were sold in this region.
Read more “Top 10 Video Games in World”
SQL stands for Structured Query Language and is used to query and manipulate relational databases. Most Relational Database Management Systems use SQL as their standard database language. I will be using MS SQL for the examples throughout this learning process.
Dr Edgar F. Codd is known as the “Codd Father” of relational databases. He described a relational model for databases in 1970. IBM worked to develop Codd’s ideas in its System R research prototype, and SQL first appeared in 1974. In 1986, SQL was standardized by ANSI.
Capabilities of SELECT statements
SELECT statements can give us a projection, meaning we can get a subset of the columns. Secondly, you can filter the rows with SELECT, and you can also join different tables through their primary and foreign keys. This allows you to get data from different tables and show it as a single table.
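As a rough sketch of these three capabilities, here is a small Python script that uses an in-memory SQLite database as a stand-in for MS SQL. The `employees` and `departments` tables, their columns and their rows are all made up for illustration; the SELECT statements themselves are plain SQL that should look much the same in MS SQL.

```python
import sqlite3

# In-memory SQLite database (a stand-in for MS SQL, used only for illustration)
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical tables linked by a primary key / foreign key pair
cur.executescript("""
CREATE TABLE departments (
    dept_id   INTEGER PRIMARY KEY,
    dept_name TEXT
);
CREATE TABLE employees (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT,
    salary  INTEGER,
    dept_id INTEGER REFERENCES departments(dept_id)
);
INSERT INTO departments VALUES (1, 'Sales'), (2, 'IT');
INSERT INTO employees VALUES
    (1, 'Ann', 3000, 1),
    (2, 'Bob', 4500, 2),
    (3, 'Cem', 5200, 2);
""")

# Projection: keep only some of the columns
projection = cur.execute(
    "SELECT name, salary FROM employees ORDER BY emp_id"
).fetchall()

# Selection: keep only the rows that satisfy a condition
selection = cur.execute(
    "SELECT * FROM employees WHERE salary > 4000 ORDER BY emp_id"
).fetchall()

# Join: combine rows from two tables through the key columns
joined = cur.execute("""
    SELECT e.name, d.dept_name
    FROM employees AS e
    JOIN departments AS d ON e.dept_id = d.dept_id
    ORDER BY e.emp_id
""").fetchall()

print(projection)
print(selection)
print(joined)
```

Each query returns a list of tuples, one per row, so the effect of projection, selection and join can be compared side by side.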
A basic SELECT statement identifies the columns to be displayed, and you also need to add FROM to tell which tables you will get the data from. Read more “Learning SQL – Part 1”
I tried to learn R through Coursera before this module, but I wasn’t able to continue the course after the second week as I found it a bit hard. Although one of my favorite characters, Homer Simpson, would say “You tried your best and failed miserably. The lesson is, never try”, with the Data Management and Analytics module I have started using and learning R again.
I started my re-learning process with CodeSchool’s Try R online course. It was a good reminder of different features of R, and I learnt how to create different graphs, use factors and so on during those eight chapters of R adventure.
After completing those eight chapters I was ready to get real-life data and conquer the world with my beautiful data stories. Obviously, it didn’t happen, yet! I have joined a few DBS Analytics Society meetings on Saturdays and started to analyse different data sets with R. Although I could have done most of those analyses in Excel in a short time, this time I am willing to learn R, so I am still wrestling with it.
While I was looking for interesting data sets to analyze, I found Reddit and Kaggle.com really useful. fivethirtyeight.com also provides a lot of different data sets in its GitHub account, but they are very good at getting everything out of a data set, so there is not much you could add to the story they tell.
For my first attempt to analyze data with R, I decided to go with the Simpsons data from kaggle.com, and I can easily say that reading this article by Todd Schneider motivated me too.
Although that article covers many different findings, I decided to try something different and check how many times the Simpson family characters have been used in episode titles. Then I will try to compare how many people watched those episodes and what their IMDb ratings are.
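The counting step could look roughly like the sketch below, written in Python rather than R just to show the idea. It counts the episodes whose title mentions each family member, using a handful of episode titles typed in by hand rather than the Kaggle file. The function name `count_family_mentions` is made up for this example.

```python
import re
from collections import Counter

# The five Simpson family members to look for in episode titles
FAMILY = ["Homer", "Marge", "Bart", "Lisa", "Maggie"]

def count_family_mentions(titles):
    """Count, per family member, the episodes whose title contains the name."""
    counts = Counter({name: 0 for name in FAMILY})
    for title in titles:
        for name in FAMILY:
            # \b keeps whole-word matches, so "Homer's" counts as "Homer"
            if re.search(rf"\b{name}\b", title, flags=re.IGNORECASE):
                counts[name] += 1
    return counts

# A few episode titles entered by hand (not the full Kaggle data set)
sample_titles = [
    "Bart the Genius",
    "Homer's Odyssey",
    "Lisa's Substitute",
    "Homer vs. Lisa and the 8th Commandment",
    "Moaning Lisa",
]

counts = count_family_mentions(sample_titles)
for name in FAMILY:
    print(name, counts[name])
```

A title that mentions two family members is counted once for each of them, which is worth keeping in mind when comparing viewers or ratings per character later on.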
Read more “Learning R with Simpsons”