Data Warehousing Final Report, Historic Video Game Sales
Course: Data Warehousing (ISM 6208)
Timeline: April 2026 - April 2026
Project Type: Coursework
SQL Python Pandas
Project Description
Video Game Sales Data Warehouse is an end-to-end data warehousing project that models 44 years of global video game sales data spanning 1980 to 2024, built to answer analytical questions around genre trends, platform competition, and the relationship between critical acclaim and commercial success. The project ingests a 64,016 record dataset sourced from Kaggle and processes it through a two-schema PostgreSQL architecture consisting of a raw staging layer and a star schema data warehouse, following dimensional modeling best practices established by Kimball and Ross. The ETL pipeline is written in Python using SQLAlchemy and pandas, handling extraction from CSV, schema creation, dimension loading with static seeding and dynamic population, and fact table loading at a grain of one row per game, platform, and region. The star schema consists of seven dimension tables and one central fact table, with dim_game implementing SCD Type 2 using effective and expiry dates to track historical changes in game records, and dim_platform enriched with manufacturer, platform type, and console generation metadata derived from a manually constructed lookup table covering 81 unique platforms. Analytical queries are organized across three topics, genre trends over time, Nintendo vs Sony vs Microsoft platform wars, and critic score vs sales correlation, leveraging window functions including RANK and NTILE. Six data visualizations are generated from pgAdmin CSV exports using a dedicated Python matplotlib script, producing publication-ready charts including a diverging bar chart of genre growth across decades, grouped sales comparisons across manufacturers, NTILE quartile platform rankings, and a critic-proof games analysis highlighting high-selling titles despite poor critical reception. The project demonstrates dimensional modeling and star schema design, SCD Type 2 implementation, three-layer ETL pipeline architecture, analytical SQL with window functions, and data visualization with matplotlib and pandas. Final report is in project downloads.
Course Information
Data Warehousing
ISM 6208
This course provides a thorough foundation in designing and implementing enterprise data warehouses for business intelligence and analytics. We'll study fundamental architectures including star schemas, snowflake schemas, and dimensional modeling techniques that optimize data warehouses for analytical queries. The curriculum covers designing fact and dimension tables, managing slowly changing dimensions, and building ETL (Extract, Transform, Load) processes to integrate data from multiple source systems. We'll compare traditional data warehouse approaches with modern data lake architectures, learning the trade-offs and appropriate use cases for each. Significant emphasis is placed on data quality, governance, and metadata management, the foundational work that makes data warehouses valuable for decision-making. Topics also include OLAP cubes, data marts, and performance optimization for complex analytical workloads. The hands-on components involve actually designing warehouse schemas and implementing dimensional models, providing practical experience with the tools and techniques used in production environments. This course complements my other database and analytics coursework by focusing specifically on how to structure and manage data for business intelligence, reporting, and analytical applications at enterprise scale.
View All Courses