This project is an end-to-end data analytics case study designed to reflect a real-world business scenario. The objective is to analyse delivery performance within Olist’s marketplace and identify operational inefficiencies that impact customer satisfaction and revenue.
The project follows a structured workflow starting from raw data ingestion to business decision-making. It combines data engineering practices with analytical reasoning to deliver actionable insights.
Olist experiences inconsistent delivery performance across sellers and regions. Late deliveries may reduce customer satisfaction and impact repeat revenue, but the underlying drivers and financial exposure are unclear.
What operational factors are driving delivery delays, and how do those delays impact customer satisfaction and revenue performance?
The analysis is driven by the following hypotheses:
- H1: Delivery delays are operationally significant
- H2: Delivery delays are concentrated among a minority of sellers
- H3: Longer delivery distances increase the probability of delay
- H4: Late deliveries reduce customer satisfaction
- H5: High-delay sellers create disproportionate revenue exposure
- Primary Stakeholder: Head of Operations
- Secondary Stakeholder: Head of Customer Experience
- Evaluator: BI Manager
This project is structured into multiple phases to reflect a real-world analytics workflow:
- Define business context and hypotheses
- Build data infrastructure and validate schema
- Prepare and transform data into analytical datasets
- Perform hypothesis-driven analysis
- Design a scalable data model for reporting
- Develop dashboards for decision-making
- Deliver business recommendations
- SQL Server (T-SQL)
- Power BI
- Excel (CSV data source)
- Data Modelling (Star Schema)
- ETL / ELT using SQL
- DAX (Power BI)
- Power Query
- Git & GitHub (version control and project structuring)
This project follows a layered data architecture:
A visual architecture diagram will be updated in later stages.
The dataset follows an order-centric relational structure where the orders table acts as the central entity.
- One order can have multiple items (
order_items) - One order can have multiple payments (
order_payments) - One order can have one or more reviews (
order_reviews) - Each order is linked to a customer (
customers) - Each item is associated with a product and fulfilled by a seller
This structure requires careful handling of one-to-many relationships to avoid row duplication during analysis. Aggregation is applied before joining transactional tables to maintain a consistent grain at the order level.
This modelling approach ensures accurate aggregation and prevents revenue inflation caused by row-level duplication.
The dataset consists of multiple relational tables representing Olist’s marketplace operations, including:
- Orders
- Customers
- Sellers
- Order Items
- Payments
- Reviews
- Products
- Geolocation data
These datasets are integrated to build a unified analytical view at the order level.
This project uses the Brazilian E-Commerce Public Dataset by Olist, available on Kaggle.
- Source: Kaggle
- Author: Olist
- Link: https://www.kaggle.com/datasets/olistbr/brazilian-ecommerce
The dataset contains approximately 100,000 e-commerce orders from 2016 to 2018 across multiple marketplaces in Brazil. It provides a multi-dimensional view of each order, including customer information, seller details, payments, delivery performance, product attributes, and customer reviews.
The data has been anonymised, and sensitive business information has been removed or masked.
Note: This dataset is used strictly for educational and portfolio purposes.
- An order may contain multiple items
- Each item can be fulfilled by different sellers
- Delivery performance must be analysed at the order level
- Geolocation data enables distance-based analysis
These factors influence how the analytical model is designed.
- Average Delivery Time
- On-Time Delivery Rate
- Late Delivery %
- Delivery Delay Days
- Shipping Duration vs Estimated Duration
- Average Review Score
- Low Rating %
- Repeat Purchase Rate (to be derived)
- Customer Lifetime Value (optional)
- Revenue per Order
- Freight Cost %
- Payment Method Distribution
- Seller Performance
- Regional Performance
- Product Category Performance
Final KPI definitions will evolve as the project progresses.
This repository is organised using a branch-based structure to reflect each phase of the project:
production→ Main branch with final outputs and documentationpart-1-strategic-foundation→ Business context, problem definition, hypothesespart-2-data-infrastructure→ Data ingestion, schema validationpart-3-data-preparation→ Data transformation and feature engineeringpart-4-hypothesis-testing→ Analytical queries and validationpart-5-data-modeling→ Star schema design for Power BIpart-6-dashboard→ Power BI reports and visualisationspart-7-case-study→ Executive summary and business recommendations
Each branch contains its own README with detailed explanations, SQL scripts, and supporting documentation for that phase.
- Part I – Strategic Foundation
- Part II – Data Infrastructure Setup
- Part III – Data Preparation & Feature Engineering
- Part IV – Hypothesis Testing
- Validated delay impact across operations, sellers, and distance
- Quantified customer satisfaction impact
- Estimated revenue exposure from delivery inefficiencies
- Part V – Data Modeling for Power BI
- Build star schema for reporting
- Part VI – Dashboard Development
- Part VII – Case Study Writing
This project aims to:
- Identify operational inefficiencies in delivery performance
- Detect high-risk sellers contributing to delays
- Quantify revenue exposure due to poor delivery performance
- Provide actionable insights to improve customer satisfaction
Each branch represents a distinct phase of the project.
- Navigate to individual branches to explore SQL scripts, documentation, and outputs
- The
productionbranch will contain the final consolidated version of the project
This project is being developed iteratively to reflect real-world analytics workflows. Each phase is completed, validated, and version-controlled before moving to the next.
- Build star schema model in Power BI
- Define dimensions and relationships
- Prepare dataset for dashboarding
Abhishek Chakravarty
Data Analyst | BI Developer
I specialise in building end-to-end data solutions that translate complex datasets into clear, actionable business insights. My work focuses on identifying inefficiencies, quantifying impact, and enabling data-driven decision-making at scale.
This project demonstrates:
- Structured, hypothesis-driven analytics
- Scalable data modelling (star schema)
- End-to-end ownership (data ingestion → transformation → insights → dashboarding)
💡 Key Strengths:
- Business-first analytical thinking
- Strong SQL and data modelling expertise
- Insight generation with measurable impact
📊 Tech Stack:
SQL Server • Power BI • DAX • ETL • Data Modelling

