NSDC Data Science Project – Sentiment Analysis



Sentiment Analysis of Movie Reviews

Project Description

This project introduces students to a range of skills as they build a sentiment analysis model that labels a given review as positive, negative, or neutral. Sentiment analysis draws on both Natural Language Processing (NLP) and Machine Learning (ML): text must first be represented in a machine-understandable format before it can be classified and its sentiment extracted. The project also covers visualizations and deploying models in the real world.
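To make "machine-understandable format" concrete, here is a minimal sketch of one common representation, a bag-of-words count vector, using only the Python standard library. The function names and the two example reviews are illustrative assumptions. They are not part of the project materials or the IMDb dataset, and the project may use a different representation.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and return its word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def build_vocabulary(documents):
    """Map each distinct token to a column index, in alphabetical order."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: idx for idx, tok in enumerate(tokens)}

def bag_of_words(text, vocabulary):
    """Return a count vector for text; tokens outside the vocabulary are ignored."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = n
    return vector

# Made-up example reviews (assumption: not taken from the IMDb dataset).
reviews = ["A great, great movie", "A boring movie"]
vocab = build_vocabulary(reviews)
vectors = [bag_of_words(review, vocab) for review in reviews]
```

Each review becomes a list of numbers with one position per vocabulary word, which is the kind of input a classifier can work with. Word order is discarded in this representation, which is one reason more advanced techniques exist.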


Dataset

Internet Movie Database (IMDb) Movie Reviews


Relevant Skills You May Apply

Basic Python programming and a basic understanding of NLP


Skills You May Gain

Data Cleaning & Pre-Processing, Data Visualization, Machine Learning Models and NLP Techniques


Total Time

4-6 weeks (2-3 hours per week per person in each team)


Milestones

Milestone 1: Set up a Python Notebook, Read a Comma-Separated Values (CSV) File, Basic Data Pre-Processing and Cleaning (steps will be outlined)
Milestone 2: More Advanced Data Pre-Processing (Tokenizing, Stemming, etc.)
Milestone 3: Building the Machine Learning Classifier
Milestone 4: More Machine Learning Classifiers, Evaluation Metrics, and Visualizations
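
The four milestones can be sketched end to end in a few dozen lines. What follows is a rough illustration under stated assumptions, not the project's prescribed solution: the column names `review` and `sentiment`, the inline sample rows, the tiny stop-word list, the crude suffix-stripping `stem` function (a stand-in for a real stemmer such as Porter), and the choice of a Naive Bayes classifier are all hypothetical. The sample uses only two labels for brevity, so the neutral class is not covered here.

```python
import csv
import io
import math
import re
from collections import Counter, defaultdict

# Inline stand-in for the CSV file; column names and rows are assumptions.
SAMPLE_CSV = """review,sentiment
"A wonderful film.<br />Loved the acting!",positive
"Loved it, wonderful story",positive
"Terrible plot and boring acting",negative
"Boring, terrible film",negative
"""

STOP_WORDS = {"a", "the", "and", "it"}  # tiny illustrative list

def clean(text):
    """Milestone 1: replace HTML tags and non-letters with spaces, lowercase."""
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"[^a-z\s]", " ", text.lower())

def stem(token):
    """Milestone 2: crude suffix stripping (not a real stemming algorithm)."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Clean, tokenize on whitespace, drop stop words, then stem."""
    return [stem(t) for t in clean(text).split() if t not in STOP_WORDS]

def train(rows):
    """Milestone 3: collect the counts a multinomial Naive Bayes needs."""
    label_counts = Counter()
    token_counts = defaultdict(Counter)
    for text, label in rows:
        label_counts[label] += 1
        token_counts[label].update(preprocess(text))
    vocab = {t for counts in token_counts.values() for t in counts}
    return label_counts, token_counts, vocab

def predict(model, text):
    """Return the label with the highest log-probability (add-one smoothing)."""
    label_counts, token_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label, n in label_counts.items():
        denom = sum(token_counts[label].values()) + len(vocab)
        score = math.log(n / total)
        for tok in preprocess(text):
            if tok in vocab:  # skip tokens never seen in training
                score += math.log((token_counts[label][tok] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)

def accuracy(model, rows):
    """Milestone 4: fraction of rows whose predicted label matches the true one."""
    return sum(predict(model, text) == label for text, label in rows) / len(rows)

reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
rows = [(r["review"], r["sentiment"]) for r in reader]
model = train(rows)
prediction = predict(model, "wonderful acting, loved the story")
```

In a notebook, `io.StringIO(SAMPLE_CSV)` would be replaced by an opened file, and accuracy would be measured on reviews held out from training rather than on the training rows themselves.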


Deliverables

Deliverables include a project report highlighting new skills gained and an interactive Python notebook (Jupyter/Google Colab).