NSDC Data Science Flashcards – Data Science Ethics Card #6 – 5 V’s of Big Data


This NSDC Data Science Flashcards series will teach you about the importance of data ethics. This installment was created by Florence Hudson and Varalika Mahajan. Recordings were done by Lauren Close, Florence Hudson, and Emily Rothenberg. You can find these videos on the NEBDHub YouTube channel.

When we consider the opportunities and challenges of leveraging data and data science, there are five basic parameters that we must keep in mind, because they affect how we work with the data. We call these the 5 Vs of Big Data. The 5 Vs are Volume, Velocity, Variety, Veracity, and Value.

First, Volume. How much data do you have? You could be dealing with megabytes, which are millions of bytes of data, or gigabytes, which are billions of bytes of data, or petabytes, which are 10 to the 15th bytes, or even brontobytes, an informal unit usually given as 10 to the 27th bytes of data.
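To make these scales concrete, here is a minimal Python sketch that converts a raw byte count into a readable size. It assumes decimal (SI) units, where each step up is a factor of 1,000; the function name `describe_volume` and the sample byte counts are illustrative only.

```python
# Decimal (SI) byte units: each step up is a factor of 1,000.
UNITS = ["bytes", "kilobytes", "megabytes", "gigabytes", "terabytes", "petabytes"]


def describe_volume(num_bytes: int) -> str:
    """Return a human-readable size using decimal (SI) units."""
    size = float(num_bytes)
    for unit in UNITS:
        # Stop once the number is below 1,000, or when we run out of units.
        if size < 1000 or unit == UNITS[-1]:
            return f"{size:.1f} {unit}"
        size /= 1000


print(describe_volume(5_000_000))      # 5.0 megabytes
print(describe_volume(2_500_000_000))  # 2.5 gigabytes
print(describe_volume(10**15))         # 1.0 petabytes
```

Some tools report sizes in binary units instead, where each step is a factor of 1,024, so the same file can show slightly different numbers depending on the convention used.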

Next, Velocity. How quickly is the data being captured and how fast is it growing?

Next, Variety. There are many types of data: structured data, such as a table or spreadsheet of sales revenue or lab results; unstructured data, such as social media posts or handwritten notes; and different modalities, such as audio, video, and image data.

Next, Veracity, which is about how much uncertainty there is in the data. Do you know if the data is reliable? Has it been changed since it was created by a trusted source? This is where big data and cybersecurity come together.
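One common way to check whether data has been changed since a trusted source created it is to compare cryptographic fingerprints. The sketch below uses SHA-256 from Python's standard `hashlib` module. The sample records and the helper names `fingerprint` and `is_unchanged` are made up for illustration, and the check assumes the original fingerprint reached you through a channel you trust.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_unchanged(data: bytes, expected_digest: str) -> bool:
    """True if the data still matches the fingerprint from the trusted source."""
    return fingerprint(data) == expected_digest


# Hypothetical data as created by the trusted source, and its fingerprint.
original = b"patient_id,result\n101,negative\n"
published_digest = fingerprint(original)

# Two copies received later: one intact, one with a single value altered.
received_ok = b"patient_id,result\n101,negative\n"
received_tampered = b"patient_id,result\n101,positive\n"

print(is_unchanged(received_ok, published_digest))        # True
print(is_unchanged(received_tampered, published_digest))  # False
```

A matching fingerprint only shows that the data was not altered after it was fingerprinted. It does not tell you whether the data was accurate to begin with, which is a separate part of veracity.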

Then, Value. What is the value of the data? Who will use it and why? What problems can it help you solve? How can you use data for good?

Please follow along with the rest of the NSDC Data Science Flashcards series to learn more about data science ethics.