NSDC Data Science Flashcards – Descriptive Statistics #5 - What is the Coefficient of Variation?

This NSDC Data Science Flashcards series will teach you about descriptive statistics. This installment of the NSDC Data Science Flashcards series was created and recorded by Emily Rothenberg. You can find these videos on the NEBDHub YouTube channel.

The Coefficient of Variation is a relative measure of variability that expresses the standard deviation as a percentage of the mean. It helps compare the relative variability of datasets with different units or scales.

The formula for the coefficient of variation is as follows:

\[CV = \frac{σ}{μ} \times 100\]

In words: CV equals sigma over mu, times 100.

Where:

– \(CV\) is the coefficient of variation.

– \(σ\) is the standard deviation.

– \(μ\) is the mean.
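As a minimal sketch, the formula can be written as a small Python function using the standard library's `statistics` module. The function name `coefficient_of_variation`, its `sample` flag, and the `scores` list are illustrative assumptions, not part of the original flashcard.

```python
import statistics


def coefficient_of_variation(values, sample=True):
    """Return the coefficient of variation of `values` as a percentage."""
    mean = statistics.mean(values)
    if mean == 0:
        # Dividing by a zero mean is undefined, so fail loudly.
        raise ValueError("CV is undefined when the mean is zero")
    # stdev divides by n - 1 (sample); pstdev divides by n (population).
    spread = statistics.stdev(values) if sample else statistics.pstdev(values)
    return spread / mean * 100


# Hypothetical data: mean is 5 and population standard deviation is 2.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(coefficient_of_variation(scores, sample=False), 2))  # 40.0
```

Passing `sample=False` matches the \(σ\) and \(μ\) form of the formula; the default uses the sample standard deviation, which is usually what you want when the data is a sample from a larger population.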

Let’s look at an example. Here, we have two datasets that record the same week of daily temperatures in two different units:

Dataset A: Daily temperatures in Celsius for a week: [15, 18, 17, 16, 19, 16, 17]

Dataset B: Daily temperatures in Fahrenheit for the same week: [59, 64.4, 62.6, 60.8, 66.2, 60.8, 62.6]

How would we solve for the coefficient of variation?

First, let’s find the means of these two datasets.

Dataset A has a mean of about 16.86, while Dataset B has a mean of about 62.34.

Next, we need to calculate the standard deviation for both datasets, if it has not been given to us. Treating each week as a sample, we use the sample standard deviation \(s\) and the sample mean \(\bar{x}\) in place of \(σ\) and \(μ\). If you’re unsure of how to calculate the standard deviation, we recommend you watch our last flashcard as a refresher.

The standard deviation for Dataset A is \(s_A ≈ 1.345\).

The standard deviation for Dataset B is \(s_B ≈ 2.421\).

Now that we have the mean and the standard deviation, we can plug those numbers into our Coefficient of Variation formula.

Here, we can see that the CV for Dataset A is about 7.98%, while the CV for Dataset B is about 3.88%.

For Dataset A, \(CV_A = (s_A / \bar{x}_A) \times 100 ≈ (1.345 / 16.86) \times 100 ≈ 7.98\%\)

For Dataset B, \(CV_B = (s_B / \bar{x}_B) \times 100 ≈ (2.421 / 62.34) \times 100 ≈ 3.88\%\)

What does this mean? Dataset A has a higher coefficient of variation (7.98%) than Dataset B (3.88%), which indicates that the values in Dataset A are more variable relative to their mean. Keep in mind that both datasets record the same temperatures. Fahrenheit = 1.8 × Celsius + 32: the factor 1.8 scales the mean and the standard deviation alike, but the added 32 raises only the mean, so the ratio shrinks.
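The worked example can be checked with a short Python sketch. It assumes the sample standard deviation (dividing by n - 1), which is what `statistics.stdev` computes; the helper name `summarize` is hypothetical.

```python
import statistics

# The two datasets from the example above.
celsius = [15, 18, 17, 16, 19, 16, 17]
fahrenheit = [59, 64.4, 62.6, 60.8, 66.2, 60.8, 62.6]


def summarize(values):
    """Return (mean, sample standard deviation, CV in percent)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    return mean, sd, sd / mean * 100


mean_a, sd_a, cv_a = summarize(celsius)
mean_b, sd_b, cv_b = summarize(fahrenheit)

print(f"Dataset A: mean={mean_a:.2f}, s={sd_a:.3f}, CV={cv_a:.2f}%")
print(f"Dataset B: mean={mean_b:.2f}, s={sd_b:.3f}, CV={cv_b:.2f}%")
```

Swapping `statistics.stdev` for `statistics.pstdev` would give the population version instead, with slightly smaller standard deviations and CVs.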

In summary, the Coefficient of Variation provides a relative measure of variability, making it easier to compare datasets with different means and units.
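One caveat worth checking in code: the CV is unchanged when every value is multiplied by a positive constant (a pure change of units), but it does change when a constant is added, as in the Celsius-to-Fahrenheit conversion. The sketch below uses the Celsius data from the example and the sample standard deviation; the helper name `cv` is an assumption for illustration.

```python
import math
import statistics


def cv(values):
    """Coefficient of variation in percent, using the sample standard deviation."""
    return statistics.stdev(values) / statistics.mean(values) * 100


celsius = [15, 18, 17, 16, 19, 16, 17]

# Pure rescaling: multiply by 1.8 with no offset.
scaled = [c * 1.8 for c in celsius]
# Full Fahrenheit conversion: rescale, then add 32.
fahrenheit = [c * 1.8 + 32 for c in celsius]

same_after_rescale = math.isclose(cv(celsius), cv(scaled), rel_tol=1e-9)
changed_after_offset = not math.isclose(cv(celsius), cv(fahrenheit), rel_tol=1e-9)

print(same_after_rescale, changed_after_offset)  # True True
```

This is why the CV is most informative for quantities with a true zero, such as lengths, weights, or prices, where changing units only multiplies the values.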

Thank you all so much for watching. Please follow along with the rest of the NSDC Data Science Flashcards series to learn more about math and probability.