Apache Ignite Midterm Project Demo

Course: Big Data for Business Applications (ISM 6562)

Timeline: March 2026

Project Type: Coursework

Technologies Used:
SQL, Apache Ignite, Distributed Database

Project Description

This is a distributed Apache Ignite OLAP demo built for a Big Data course. It shows how a simulated grocery store dataset is spread across a 3-node in-memory cluster using automatic data partitioning. The data follows a star schema: a partitioned fact table holds 45,000 sales transactions covering 30 simulated days, and replicated dimension tables hold products, customers, employees, and time. The cluster runs Apache Ignite 2.16 under Docker Compose. Ignite's Rendezvous affinity function assigns the fact table's partitions across all 3 nodes, while each node keeps a full copy of the smaller dimension tables so that fact-to-dimension joins are co-located. The project demonstrates distributed SQL execution with aggregations completing in under 100 ms, fault tolerance through partition backups that let queries complete even when a node is taken offline, and the architectural distinction between the PARTITIONED and REPLICATED cache modes. Data is loaded by a Python loader that talks to Ignite's HTTP REST API, and all queries can be run from the SQLLine shell bundled with Ignite, with no additional tooling.
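The partition placement and failover behavior described above can be illustrated with a minimal rendezvous (highest-random-weight) hashing sketch. This is not Ignite's implementation: the node names, the SHA-256 based weight, and the single backup copy are assumptions made for illustration, and the partition count of 1024 assumes the cluster kept Ignite's default.

```python
import hashlib

NODES = ["node1", "node2", "node3"]  # hypothetical names for the 3 cluster nodes
PARTITIONS = 1024                    # assumed: Ignite's default partition count
BACKUPS = 1                          # assumed: one backup copy per partition


def _hash64(text: str) -> int:
    """Stable 64-bit hash (illustrative; Ignite uses its own hash function)."""
    return int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")


def partition_for(key) -> int:
    """Map a cache key (e.g. a sale id) to a partition number."""
    return _hash64(str(key)) % PARTITIONS


def owners(partition: int, nodes, backups: int = BACKUPS):
    """Rank nodes by their weight for this partition, highest first.

    The first node is the primary owner; the next `backups` nodes hold copies.
    """
    ranked = sorted(nodes, key=lambda n: _hash64(f"{partition}:{n}"), reverse=True)
    return ranked[: backups + 1]


def primary_counts(nodes):
    """Count how many partitions each node owns as primary."""
    counts = {n: 0 for n in nodes}
    for p in range(PARTITIONS):
        counts[owners(p, nodes)[0]] += 1
    return counts


if __name__ == "__main__":
    print("primaries with 3 nodes:", primary_counts(NODES))

    # Simulate taking one node offline: every partition's new primary is a
    # node that already held a copy (the old primary or its backup).
    survivors = [n for n in NODES if n != "node2"]
    promoted = sum(
        owners(p, survivors)[0] in owners(p, NODES) for p in range(PARTITIONS)
    )
    print("partitions still served from an existing copy:", promoted)
```

Because each partition ranks the nodes independently, removing a node only changes ownership for the partitions that node held, and the next node in the ranking (the backup) takes over. A REPLICATED table needs no such mapping, since every node already stores all of its rows.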

Project Resources

View Source Code on GitHub

Course Information

Big Data for Business Applications

ISM 6562

This is an intensive, hands-on course on managing and analyzing massive datasets with modern big data technologies. The course starts with PostgreSQL fundamentals and distributed SQL, then progresses to NoSQL databases such as Cassandra and MongoDB, teaching when to apply relational versus non-relational approaches. The second half covers big data processing frameworks (Hadoop, MapReduce, and Apache Spark), where we build distributed data processing pipelines with PySpark and implement machine learning at scale with SparkML. Everything is containerized with Docker, and we work extensively with Git for version control and with Linux command-line tools. Instead of traditional exams, assessment is built around two major team projects: a midterm comparing different database architectures and a final project building an end-to-end analytics pipeline. We also explore stream processing with Kafka and modern cloud data platforms such as Databricks. The course emphasizes practical implementation, weekly labs, DataCamp modules, and real-world applications. It bridges the gap between theoretical understanding and the technical skills companies need for working with big data in production environments, covering everything from CAP theorem implications to query optimization strategies.
