
Developing a Medical Chatbot Using RAG and LLMs
Project Description
Medical chatbots are increasingly used for quick symptom checks and basic health guidance, giving users convenient access to preliminary medical information. They are making digital healthcare more accessible.
In this project, we build a medical chatbot step by step using a Kaggle dataset of diseases and symptoms. We begin with a simple rule-based chatbot that maps symptom keywords to known conditions. Next, we build a Retrieval-Augmented Generation (RAG) pipeline that uses vector embeddings to retrieve relevant information from a knowledge base and passes it to an LLM as context. Finally, we fine-tune LLaMA-2 using PyTorch and Hugging Face Transformers. This domain-specific fine-tuning helps the model handle medical terminology and produce more accurate, better-tailored predictions.
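The rule-based stage can be illustrated with a minimal sketch in plain Python, assuming each condition is described by a short string of symptom keywords. The knowledge base below is a hypothetical placeholder (not the Kaggle dataset), and the function names `to_vector`, `cosine_similarity`, and `match_conditions` are illustrative. The user's message and each condition are converted to bag-of-words count vectors and ranked by cosine similarity.

```python
import math
from collections import Counter

# Hypothetical placeholder knowledge base: condition name -> symptom keywords.
# These entries are illustrative only and are NOT taken from the Kaggle dataset.
KNOWLEDGE_BASE = {
    "condition_a": "fever cough fatigue headache",
    "condition_b": "sneezing runny_nose itchy_eyes",
    "condition_c": "nausea vomiting abdominal_pain",
}


def to_vector(text):
    """Return bag-of-words term counts for a lowercased, whitespace-split string."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    # Counter returns 0 for missing keys, so absent terms contribute nothing.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector has no direction; treat as no similarity
    return dot / (norm_a * norm_b)


def match_conditions(user_input, top_k=2):
    """Rank knowledge-base conditions by similarity to the user's message."""
    query = to_vector(user_input)
    scores = [
        (name, cosine_similarity(query, to_vector(symptoms)))
        for name, symptoms in KNOWLEDGE_BASE.items()
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


if __name__ == "__main__":
    print(match_conditions("I have a fever and a cough"))
```

For example, `match_conditions("I have a fever and a cough")` ranks `condition_a` first with a score of 1/3, because "fever" and "cough" overlap its keywords. In the RAG stage, these bag-of-words vectors would be replaced by learned vector embeddings, and the top-ranked passages would be given to the LLM as context rather than returned directly.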
Along the way, we compare the strengths and limitations of each method and gain hands-on experience with widely used tools and techniques such as Hugging Face Transformers, LoRA, and QLoRA. Although the chatbot is a useful learning project, it is not a substitute for professional medical advice. Please consult a doctor about any health concerns.
Environment
Please implement this project on Kaggle, which provides free GPU access (up to 30 hours per week); a GPU is essential for fine-tuning large language models (LLMs). The project dataset is also hosted on Kaggle, so it can be loaded directly in the notebook. Setup instructions for creating a Kaggle account and enabling GPU support will be shared with you once you register.
Dataset
This project uses the Predict Disease Symptom dataset from Kaggle.
Relevant Skills You May Apply
Intermediate Python programming, natural language processing (NLP)
Skills You May Gain
Retrieval-Augmented Generation (RAG), Embeddings, Quantization, Fine-tuning Large Language Models (LLMs), LLM Inference
Total Time
Approximately 10–20 hours (2–4 weeks at ~5 hours/week)
Milestones
Milestone 1: Data Preprocessing
Milestone 2: Rule-Based Chatbot (Cosine Similarity)
Milestone 3: Embeddings + RAG
Milestone 4: Fine-Tuning LLMs
Deliverables
Please submit your completed Python notebook to obtain a certificate of completion.
