
Sentiment Analysis of Movie Reviews
Project Description
In this project, students build a sentiment analysis model that labels a given movie review as positive, negative, or neutral. Sentiment analysis draws on both Natural Language Processing (NLP) and Machine Learning (ML): the text must first be represented in a machine-readable format before it can be classified and its sentiment extracted. The project also covers visualization and deploying models in the real world.
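To make "machine-readable format" concrete, here is a minimal sketch of one common representation, a bag-of-words count vector. It is only an illustration under stated assumptions: the two reviews are made up, the helper names `build_vocabulary` and `to_count_vector` are hypothetical, and the project itself may use a different representation.

```python
from collections import Counter


def build_vocabulary(reviews):
    """Map each distinct lowercase word to a column index, in sorted order."""
    words = sorted({word for review in reviews for word in review.lower().split()})
    return {word: index for index, word in enumerate(words)}


def to_count_vector(review, vocabulary):
    """Represent one review as a list of word counts, one entry per vocabulary word."""
    counts = Counter(review.lower().split())
    # Dicts preserve insertion order, so the columns follow the sorted vocabulary.
    return [counts.get(word, 0) for word in vocabulary]


# Made-up example reviews, not taken from the IMDb dataset.
reviews = ["great movie great cast", "boring movie"]
vocabulary = build_vocabulary(reviews)
vectors = [to_count_vector(review, vocabulary) for review in reviews]

print(vocabulary)  # {'boring': 0, 'cast': 1, 'great': 2, 'movie': 3}
print(vectors)     # [[0, 1, 2, 1], [1, 0, 0, 1]]
```

Each review becomes a fixed-length list of numbers, which is the kind of input a classifier can work with.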
Dataset
Internet Movie Database (IMDb) Movie Reviews (.csv file)
Relevant Skills You May Apply
Basic Python programming and a basic understanding of NLP
Skills You May Gain
Data Cleaning & Pre-Processing, Data Visualization, Machine Learning Models and NLP Techniques
Total Time
4-6 weeks (2-3 hours per week per team member)
Milestones
Milestone 1: Set up a Python notebook, read the comma-separated values (CSV) file, and perform basic data pre-processing and cleaning (steps will be outlined)
Milestone 2: More Advanced Data Pre-Processing (Tokenizing, Stemming, etc.)
Milestone 3: Building the Machine Learning Classifier
Milestone 4: Additional Machine Learning Classifiers, Evaluation Metrics, and Visualizations
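The milestones above can be sketched end to end as follows. This is a rough illustration, not the project's prescribed method. It uses only the Python standard library so that it runs anywhere. The in-memory CSV, its column names `review` and `sentiment`, and all function names are hypothetical. The suffix-stripping stemmer is a toy stand-in for a real stemmer, and the hand-written multinomial Naive Bayes classifier is just one possible choice. The sample data has only positive and negative labels, although the project also considers neutral reviews.

```python
import csv
import io
import math
import re
from collections import Counter, defaultdict

# Hypothetical stand-in for the course CSV; the real column names may differ.
SAMPLE_CSV = """review,sentiment
"A <b>great</b> film, I loved it!",positive
"Loved the cast. Great story.",positive
"Boring plot and terrible acting.",negative
"Terrible, I hated this boring film.",negative
"""


def clean(text):
    """Milestone 1: strip HTML tags, lowercase, and keep only letters and spaces."""
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"[^a-z\s]", " ", text.lower())


def tokenize(text):
    """Milestone 2: split into tokens and apply a crude suffix-stripping stemmer."""
    tokens = clean(text).split()
    # Toy stemmer: only strips a few suffixes from tokens longer than 4 letters.
    return [re.sub(r"(ing|ed|s)$", "", t) if len(t) > 4 else t for t in tokens]


def train(rows):
    """Milestone 3: collect per-class token counts for multinomial Naive Bayes."""
    token_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, label in rows:
        class_counts[label] += 1
        token_counts[label].update(tokenize(text))
    vocabulary = {t for counts in token_counts.values() for t in counts}
    return token_counts, class_counts, vocabulary


def predict(text, model):
    """Return the label with the highest log-probability (add-one smoothing)."""
    token_counts, class_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        total_tokens = sum(token_counts[label].values())
        score = math.log(class_counts[label] / total_docs)
        for token in tokenize(text):
            if token in vocabulary:  # ignore words never seen in training
                smoothed = token_counts[label][token] + 1
                score += math.log(smoothed / (total_tokens + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
rows = [(row["review"], row["sentiment"]) for row in reader]
model = train(rows)

# Milestone 4: a simple evaluation metric (accuracy) on made-up held-out reviews.
test_rows = [
    ("I loved this great story", "positive"),
    ("What a boring, terrible plot", "negative"),
]
predictions = [predict(text, model) for text, _ in test_rows]
correct = sum(p == label for p, (_, label) in zip(predictions, test_rows))
accuracy = correct / len(test_rows)
print(predictions, accuracy)
```

With the real dataset, the in-memory string would be replaced by the provided CSV file, and accuracy would be measured on a held-out portion of the reviews rather than on two hand-written examples.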
Deliverables
Deliverables include a project report highlighting new skills gained and an interactive Python notebook (Jupyter/Google Colab).