STEAM Games Analysis

Unsupervised Learning, Text Analysis, and Network Science

Authors

Tiffany Sun

John Lim

Surya Rao

Last updated

May 6, 2025

Abstract
There are tens of thousands of video games in the world. Through techniques for unsupervised learning, text analysis, and network science, we hope to uncover similarities, patterns, and connections between video games uploaded to the gaming platform Steam.

Image courtesy of steampowered.com

Introduction

Steam is a platform that hosts video games. Tens of thousands of games are currently on the platform, which all cater towards various users. Steam collects information about each video game, such as name, price, rating, and game description. This data was parsed into the Kaggle Steam Games Dataset (Roman 2022), which we used to begin our guided analysis of the many games on the high-esteemed platform. Some questions we have are:

  1. What trends can we uncover through clustering games based on their characteristics? What can these insights tell us?

  2. What are some prominent patterns within game descriptions? Are games generally positive or negative based on how they are described?

  3. How are popular games connected to each other? What characteristics do they share?

Data

We got our data from Kaggle, a website that serves as a collection of datasets for data scientists and machine learning engineers. The Steam Games Dataset (Roman 2022) utilizes Steam API and Steam Spy to scrape data from thousands of Steam games.The data came in .csv and .json files. After examination of the .csv file, the data were shifted and, unfortunately, unusable. Therefore, we further wrangled the .json file (Carpentries 2025) to obtain game data for this project.

We transposed (Wickham and Henry 2023) the data to get appropriate game data as row entries, and parsed quantitative values as numeric variables. We also removed games with incomplete or duplicate game information and titles in Korean, Chinese, and Japanese.

Analyses

With this dataset limited to games in English, we were then able to:

  1. Find similarities between different games through clustering

  2. Analyze patterns and trends in game descriptions

  3. Explore how games are connected to one another

References

Carpentries, T. (2025), “Processing JSON data (optional),” R for Social Scientists, Available at https://datacarpentry.github.io/r-socialsci/07-json.html.
Roman, M. B. (2022), “Steam games dataset,” Kaggle. https://doi.org/10.34740/KAGGLE/DS/2109585.
Wickham, H., and Henry, L. (2023), “Transpose a list,” Available at https://purrr.tidyverse.org/reference/transpose.html.