
Explainable AI Diabetes Prediction Project
Project Description
This project is an introduction to building explainable machine learning systems for healthcare, using the Pima Indians Diabetes Dataset as the foundation. You’ll start by exploring the data. From there, you’ll tackle a real-world challenge in medical datasets: class imbalance. You’ll experiment with three different strategies – random undersampling, random oversampling, and synthetic oversampling (SMOTE) – to see how balancing the data can improve the model’s ability to detect positive (diabetic) cases.
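To make the two random strategies concrete, here is a minimal, standard-library-only sketch of random undersampling and random oversampling. It is an illustration under stated assumptions, not the project's prescribed implementation: the function names, the toy rows, and the 8-to-4 class split are all invented for the example, and in a notebook you would more likely reach for a dedicated library such as imbalanced-learn.

```python
import random
from collections import Counter


def _group_by_class(rows, labels):
    """Collect rows into a dict keyed by class label."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return groups


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows so every class shrinks to the smallest class size."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = min(len(group) for group in groups.values())
    out_rows, out_labels = [], []
    for label, group in groups.items():
        out_rows.extend(rng.sample(group, target))  # sample without replacement
        out_labels.extend([label] * target)
    return out_rows, out_labels


def random_oversample(rows, labels, seed=0):
    """Randomly duplicate rows so every class grows to the largest class size."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = max(len(group) for group in groups.values())
    out_rows, out_labels = [], []
    for label, group in groups.items():
        # Keep every original row, then add random duplicates to fill the gap.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_rows.extend(group + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Toy data (hypothetical values): 8 negative rows, 4 positive rows.
rows = [[float(i), float(i % 5)] for i in range(12)]
labels = [0] * 8 + [1] * 4

under_rows, under_labels = random_undersample(rows, labels)
over_rows, over_labels = random_oversample(rows, labels)

print("original:    ", Counter(labels))
print("undersampled:", Counter(under_labels))
print("oversampled: ", Counter(over_labels))
```

Resampling is normally applied to the training split only, so that the held-out test set keeps the original class ratio and the evaluation stays honest.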
With cleaner and better-balanced data, you’ll build and compare classification models, evaluating them with metrics such as recall for the positive class, which measures the share of actual diabetic cases the model catches. The project then goes a step further into model interpretability: you’ll use LIME to explain individual predictions and SHAP to understand global feature importance, making the model’s decisions more transparent and easier to scrutinize.
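A short sketch of why recall for the positive class matters on imbalanced data: the labels and predictions below are made up for illustration, and the helper names are hypothetical. It shows a case where accuracy looks acceptable while half of the positive cases are missed.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for the chosen positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return tp, fp, fn, tn


def recall(y_true, y_pred, positive=1):
    """Share of actual positives that the model predicted as positive."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred, positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(y_true, y_pred):
    """Share of all predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical labels: 4 diabetic (1) and 6 non-diabetic (0) cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 0, 0, 0, 1]

print("accuracy:", accuracy(y_true, y_pred))        # 7 of 10 correct
print("positive-class recall:", recall(y_true, y_pred))  # 2 of 4 positives found
```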
By the end, you won’t just have a model that predicts diabetes risk – you’ll have a full, reproducible notebook that shows how to explore medical data, address imbalance, train models, and explain their predictions.
Please note: This project is intended solely for educational and analytical purposes. It does not constitute medical advice, diagnosis, or treatment, nor should it be used to guide healthcare decisions. The dataset used in this analysis is derived from a specific sample of individuals, and the findings do not generalize to all individuals or communities. Any insights or model predictions should be interpreted with this limitation in mind.
Dataset
Relevant Skills You May Apply
Python programming and machine learning knowledge
Skills You May Gain
Machine learning, class imbalance, AI explainability and interpretability
Total Time
10–15 hours
Milestones
Milestone 1: Exploratory Data Analysis
Milestone 2: Building Machine Learning Models
Milestone 3: Random Undersampling
Milestone 4: Random Oversampling
Milestone 5: Synthetic Minority Oversampling Technique (SMOTE)
Milestone 6: Explainability and Interpretability
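As a rough illustration of what Milestone 5 adds over random oversampling, the sketch below shows the core idea of SMOTE: each synthetic point is placed on the line segment between a minority-class sample and one of its nearest minority-class neighbours, rather than being an exact duplicate. This is a simplified, standard-library-only sketch; the function name, the default k, and the toy points are assumptions for the example, and library implementations handle details this version omits.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours.

    minority: list of feature vectors (lists of floats), at least two of them.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority points to `base` (Euclidean distance), excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority-class points (hypothetical values for two features).
minority_points = [[1.0, 2.0], [2.0, 3.0], [3.0, 3.5], [4.0, 5.0]]
new_points = smote_sketch(minority_points, n_new=4)

for point in new_points:
    print([round(v, 3) for v in point])
```

Because every synthetic point is a blend of two real minority samples, it always falls within the range of the existing minority data for each feature.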
Deliverables
Deliverables include a project report highlighting new skills gained and an interactive Python notebook (Jupyter/Google Colab).
