DataCo SMART SUPPLY CHAIN FOR BIG DATA ANALYSIS
This project analyzes a supply chain dataset and uses machine learning models to predict late deliveries. The goal is to estimate the risk of late delivery for each order from customer, product, shipping, and sales features.
Key components of the project:
- Exploratory Data Analysis (EDA): Understand the dataset, feature distributions, and relationships.
- Data Preprocessing: Handle missing values, encode categorical variables, and extract features from dates.
- Predictive Modeling: Implemented multiple classifiers including Random Forest, Logistic Regression, XGBoost, Decision Tree, and K-Nearest Neighbors to predict Late_delivery_risk.
- Evaluation: Used metrics such as accuracy, recall, F1-score, confusion matrix, and ROC-AUC for performance comparison.
- Feature Importance & Explainability: Analyzed feature contributions using Random Forest importance, LIME, SHAP, and counterfactual explanations with DiCE.
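
As a rough illustration of the preprocessing step listed above, the sketch below derives calendar features from an order timestamp and one-hot encodes a categorical column using only the Python standard library. The date format, the `order_date` and `shipping_mode` field names, and the list of shipping modes are assumptions made for this example, not confirmed details of the project's code or dataset.

```python
from datetime import datetime

# Assumed timestamp layout (month/day/year hour:minute); adjust to the real data.
DATE_FORMAT = "%m/%d/%Y %H:%M"

# Hypothetical category values, used only for illustration.
SHIPPING_MODES = ["First Class", "Same Day", "Second Class", "Standard Class"]


def order_date_features(order_date):
    """Turn an order timestamp string into numeric calendar features."""
    ordered = datetime.strptime(order_date, DATE_FORMAT)
    return {
        "order_month": ordered.month,
        "order_weekday": ordered.weekday(),  # Monday is 0, Sunday is 6
        "order_hour": ordered.hour,
    }


def one_hot(prefix, value, categories):
    """Encode one categorical value as a set of 0/1 indicator features."""
    return {f"{prefix}_{category}": int(value == category) for category in categories}


# A single made-up order row.
row = {"order_date": "1/31/2018 22:56", "shipping_mode": "Standard Class"}

features = {
    **order_date_features(row["order_date"]),
    **one_hot("shipping_mode", row["shipping_mode"], SHIPPING_MODES),
}
```

Only the order timestamp is used here. Features derived from the actual shipping date would only be known after the order has shipped, so they could leak the outcome into the model.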
This project provides a complete pipeline for risk prediction in supply chain management, helping stakeholders identify potential late deliveries and optimize operations.
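
To make the evaluation metrics more concrete, the sketch below computes a confusion matrix, accuracy, recall, and F1-score by hand for a binary target where 1 means a late delivery. The labels are made up for illustration and are not results from this project; the function names are hypothetical. Precision is computed as well because F1 is the harmonic mean of precision and recall.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count confusion-matrix cells for a binary target (1 = late delivery)."""
    pairs = Counter(zip(y_true, y_pred))
    return {
        "tp": pairs[(1, 1)],  # late orders flagged as late
        "fp": pairs[(0, 1)],  # on-time orders flagged as late
        "fn": pairs[(1, 0)],  # late orders the model missed
        "tn": pairs[(0, 0)],  # on-time orders predicted on time
    }


def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 from the confusion counts."""
    c = confusion_counts(y_true, y_pred)
    total = sum(c.values())
    predicted_late = c["tp"] + c["fp"]
    actually_late = c["tp"] + c["fn"]
    accuracy = (c["tp"] + c["tn"]) / total if total else 0.0
    precision = c["tp"] / predicted_late if predicted_late else 0.0
    recall = c["tp"] / actually_late if actually_late else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels for eight orders (not project results).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]

counts = confusion_counts(y_true, y_pred)
metrics = binary_metrics(y_true, y_pred)
```

In this setting recall is the share of truly late orders that the model flags, which is why it is worth reporting alongside accuracy when missed late deliveries are the costly error.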
