Descriptive Statistics

Data is used to understand and improve nearly every facet of our lives. You can use data to make better decisions and accomplish your goals.

Data Types

In general, there are two main data types, each with two subgroups.

Quantitative Data

Numeric values that allow mathematical operations.

  1. Discrete Data: Quantitative values that are countable (Number of students in a classroom).
  2. Continuous Data: Quantitative values that can be split into smaller values (Age: Years, Months, Days).
    Continuous Data can take any numeric value including decimal values, and sometimes even negative numbers.

Categorical Data

Labels of a group or a set of items

  1. Ordinal (Ordered): Categorical Values that are ranked. EX. Rating: Poor, Good, Excellent
  2. Nominal (Non Ordered): Categorical Values that don't have a ranked order. EX. Gender: Male, Female

Identifying data types is important, as it tells us which types of analysis we can perform and which plots we can build.

Introduction To Summary Statistics

Summary statistics describe quantitative data using two kinds of measures

  1. Measure of Center: Gives us an idea about the average (typical) value.
  2. Measure of Spread: Gives us an idea about how much the values differ from one another.

When analyzing both discrete and continuous quantitative data, we generally discuss four aspects

  1. Center
  2. Spread
  3. Shape
  4. Outliers

Measure of Center

There are three widely accepted measures of center

  1. Mean
  2. Median
  3. Mode

Mean

Sum of all values divided by the count of the values.

EX. Dataset (5, 3, 8, 3, 15, 45, 9)

$$Mean = \frac{5+3+8+3+15+45+9}{7} = \frac{88}{7} \approx 12.57$$

The Mean isn't always the best measure of center. For this dataset, the Mean doesn't really sit in the middle of the data: only two values (15, 45) are above it, while the other five are below. It is also a decimal value, which is strange if we are talking about discrete data.
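The calculation above can be checked with a few lines of Python. This is only a minimal sketch: it assumes the seven-value dataset 5, 3, 8, 3, 15, 45, 9, and the variable names are illustrative.

```python
from statistics import mean

# Assumed example dataset (seven values).
data = [5, 3, 8, 3, 15, 45, 9]

# Mean = sum of all values divided by the count of the values.
manual_mean = sum(data) / len(data)

# The standard library gives the same result.
library_mean = mean(data)

# Values that lie above the Mean.
above_mean = [x for x in data if x > manual_mean]

print(round(manual_mean, 2))  # 12.57
print(above_mean)             # [15, 45]
```

Only two of the seven values are above the Mean, which is why it does not feel like the middle of this dataset.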

Median

The middle value of the dataset.

The Median divides the ordered dataset so that half of the values are larger and the other half are smaller.

$$3, 3, 5, 8, 9, 15, 45$$

$$Median = 8$$

This is a much better result than the $12.57$ reported by the Mean. Not only does $8$ sit in the middle of our dataset, but it also isn't a decimal value.
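As a quick check, the sketch below sorts the same seven values and confirms that the Median splits them evenly. It assumes the dataset shown above; the variable names are illustrative.

```python
from statistics import median

# Assumed example dataset (same seven values as in the ordered list above).
data = [5, 3, 8, 3, 15, 45, 9]

ordered = sorted(data)   # [3, 3, 5, 8, 9, 15, 45]
middle = median(data)    # 8

# Count how many values fall on each side of the Median.
smaller = [x for x in ordered if x < middle]
larger = [x for x in ordered if x > middle]

print(middle)                     # 8
print(len(smaller), len(larger))  # 3 3
```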

Mode

The most frequent (common) value in a dataset.

$$ 1, 2, 3, 3, 5, 8, 10 $$ $$Mode = 3$$

If all the values appear the same number of times, we usually say there is no Mode. However, if more than one value is tied for the highest frequency, we count all of those values as Modes.
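The two conventions above (no Mode when every value is equally frequent, several Modes when there is a tie) can be sketched in Python as follows. The function name `modes` is hypothetical, and treating a dataset with a single distinct value as having that value as its Mode is an assumption of this sketch.

```python
from collections import Counter

def modes(values):
    """Return a sorted list of Modes, or an empty list if there is no Mode."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    # Every distinct value appears equally often: by convention, no Mode.
    # (Assumption: a dataset with one distinct value keeps that value as Mode.)
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    # Otherwise, every value tied for the highest frequency is a Mode.
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([1, 2, 3, 3, 5, 8, 10]))  # [3]
print(modes([1, 2, 3]))               # []
print(modes([1, 1, 2, 2, 3]))         # [1, 2]
```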


Median for Odd and Even Datasets

As a note on calculating the Median, the actual calculation depends on whether the dataset has an odd or an even number of values.

Ex.

The first thing we should do is order the values from smallest to largest.

  • Odd Dataset $$1, 2, 3, 3, 5, 8, 10$$ $$Median = 3 $$

  • Even Dataset $$1, 2, 3, 3, 5, 8, 10, 15$$ $$Median = \frac{3+5}{2} = 4$$ Because there isn't an exact center value, we take the Mean of the two center values as our Median.
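The odd and even cases can be written as one small function. This is a minimal sketch rather than a reference implementation; the name `median_of` is hypothetical, and Python's built-in `statistics.median` follows the same rule.

```python
def median_of(values):
    """Median of a non-empty list of numbers."""
    if not values:
        raise ValueError("median_of() requires at least one value")
    # Step 1: order the values from smallest to largest.
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value is the Median.
        return ordered[mid]
    # Even count: take the Mean of the two center values.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_of([1, 2, 3, 3, 5, 8, 10]))      # 3
print(median_of([1, 2, 3, 3, 5, 8, 10, 15]))  # 4.0
```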