Steam Games Analysis
Unsupervised Learning, Text Analysis, and Network Science

Introduction
Steam is a platform that hosts video games. It currently hosts tens of thousands of games that cater to a wide range of users. For each game, Steam collects information such as its name, price, rating, and description. This information was compiled into the Kaggle Steam Games Dataset (Roman 2022), which we used as the starting point for our guided analysis of the many games on this highly esteemed platform. The questions we set out to answer are:
What trends can we uncover through clustering games based on their characteristics? What can these insights tell us?
What are some prominent patterns within game descriptions? Are games generally positive or negative based on how they are described?
How are popular games connected to each other? What characteristics do they share?
Data
We obtained our data from Kaggle, a website that hosts a collection of datasets for data scientists and machine learning engineers. The Steam Games Dataset (Roman 2022) uses the Steam API and Steam Spy to collect data on thousands of Steam games. The data came as .csv and .json files. On examining the .csv file, we found that its data were shifted and, unfortunately, unusable. We therefore wrangled the .json file (Carpentries 2025) to obtain the game data for this project.
We transposed the data (Wickham and Henry 2023) so that each game is a row entry, and parsed quantitative values as numeric variables. We also removed games with incomplete or duplicate information, as well as games with titles in Korean, Chinese, or Japanese.
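The cleaning steps described above can be illustrated with a short sketch. The citations suggest the original wrangling was done with R tooling, so the Python below is only a stand-in written under stated assumptions, not our actual code. It assumes the .json file maps each app ID to a record of fields. The field names (name, price, description), the Unicode ranges used to detect Korean, Chinese, and Japanese titles, and the helper names are all hypothetical.

```python
import json
import re

# Hangul, kana, and CJK ideograph ranges (an assumed, non-exhaustive set).
CJK_PATTERN = re.compile(
    "[\u1100-\u11ff\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]"
)

# Hypothetical field names; the real dataset may use different ones.
REQUIRED_FIELDS = ("name", "price", "description")


def to_number(value):
    """Return value as a float, or None if it cannot be parsed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def clean_games(raw):
    """Turn {app_id: {field: value}} into a list of cleaned row dicts."""
    rows, seen_names = [], set()
    for app_id, fields in raw.items():
        # "Transpose": each game becomes one row with its app ID attached.
        row = {"app_id": app_id, **fields}
        row["price"] = to_number(row.get("price"))
        # Drop records with a missing or empty required field.
        if any(row.get(field) in (None, "") for field in REQUIRED_FIELDS):
            continue
        name = str(row["name"]).strip()
        # Drop titles containing Korean, Chinese, or Japanese characters.
        if CJK_PATTERN.search(name):
            continue
        # Drop duplicates, compared case-insensitively by title.
        key = name.lower()
        if key in seen_names:
            continue
        seen_names.add(key)
        rows.append(row)
    return rows


# Toy input standing in for the real file contents.
sample = json.loads("""
{
  "10": {"name": "Example Quest", "price": "9.99", "description": "A short adventure."},
  "20": {"name": "Example Quest", "price": "9.99", "description": "A short adventure."},
  "30": {"name": "勇者の冒険", "price": "4.99", "description": "An RPG."},
  "40": {"name": "No Price Game", "price": "", "description": "Missing price."}
}
""")
cleaned = clean_games(sample)
```

In this toy sample only the first record survives: the second is a duplicate title, the third has a Japanese title, and the fourth has no parsable price. Matching on Unicode ranges is a simple heuristic. It catches titles that contain any such character, but it does not detect the language of titles written in Latin script.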
Analyses
With this dataset limited to games in English, we were then able to: