Descriptive Statistics
Data is used to understand and improve nearly every facet of our lives. You can use data to make better decisions and accomplish your goals.
Data Types
In general, there are two main data types, each with two subgroups.
Quantitative Data
Numeric values that allow mathematical operations.
- Discrete Data: Quantitative values that are countable (Number of students in a classroom).
- Continuous Data: Quantitative values that can be split into smaller values (Age: Years, Months, Days).
Continuous Data can take any numeric value including decimal values, and sometimes even negative numbers.
Categorical Data
Labels of a group or a set of items.
- Ordinal (Ordered): Categorical Values that are ranked. EX. Rating: Poor, Good, Excellent
- Nominal (Non Ordered): Categorical Values that don't have ranked order. EX. Gender: Male, Female
Identifying data types is important, as it tells us which types of analysis we can perform and which plots we can build.
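As a rough illustration of why the data type matters, the sketch below shows the four subgroups as plain Python values. The variable names and sample values are made up for this example. Ordinal labels can be sorted once we state their rank, while nominal labels can only be counted.

```python
from collections import Counter

# Quantitative data: arithmetic is meaningful.
students_per_class = [24, 31, 28]      # discrete (counts)
ages_in_years = [20.5, 31.25, 18.0]    # continuous (can be split further)
total_students = sum(students_per_class)

# Ordinal data: labels with an explicit ranking (assumed order below).
rating_order = ["Poor", "Good", "Excellent"]
rank = {label: position for position, label in enumerate(rating_order)}
ratings = ["Good", "Excellent", "Poor", "Good"]
sorted_ratings = sorted(ratings, key=lambda label: rank[label])

# Nominal data: no ranking, so we only count how often each label occurs.
colors = ["Red", "Blue", "Red", "Green"]
color_counts = Counter(colors)

print(total_students)
print(sorted_ratings)
print(color_counts["Red"])
```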
Introduction To Summary Statistics
Statistics describes quantitative data using two kinds of measures:
- Measure of Center: Gives us an idea of the typical (average) value.
- Measure of Spread: Gives us an idea of how much the values differ from one another.
When analyzing both discrete and continuous quantitative data, we generally discuss four aspects:
- Center
- Spread
- Shape
- Outliers
Measure of Center
There are three widely accepted measures of center:
- Mean
- Median
- Mode
Mean
Sum of all values divided by the count of the values.
EX. Dataset (5, 3, 8, 3, 15, 45, 9)
$$Mean = \frac{5+3+8+3+15+45+9}{7} = \frac{88}{7} \approx 12.57$$
The Mean isn't always the best measure of center. For this dataset, the Mean doesn't sit in the middle of the data at all: only two values (15, 45) are above it, while the other five are below. It is also a decimal value, which is awkward when we are talking about discrete data.
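The calculation can be checked with a short Python sketch. It assumes the seven-value dataset (5, 3, 8, 3, 15, 45, 9), which is the one that yields 12.57. The variable names are illustrative only.

```python
from statistics import mean

values = [5, 3, 8, 3, 15, 45, 9]

# Mean by definition: sum of the values divided by how many there are.
manual_mean = sum(values) / len(values)

# The standard library gives the same result.
library_mean = mean(values)

# Values that lie above the mean; only the two largest qualify.
above_mean = [v for v in values if v > manual_mean]

print(round(manual_mean, 2))  # 12.57
print(above_mean)             # [15, 45]
```

The single large value 45 pulls the Mean upward, which is why so few values end up above it.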
Median
The middle value of the dataset.
Median divides our dataset such that 50% of the values are larger while the remaining 50% are smaller.
$$3, 3, 5, 8, 9, 15, 45$$
$$Median = 8$$
This is a much better measure of center than the $12.57$ reported by the Mean. Not only does $8$ sit in the middle of our dataset, but it is also a whole number.
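A minimal sketch of the same comparison in Python, assuming the dataset (5, 3, 8, 3, 15, 45, 9) in unsorted form. It uses `statistics.median` from the standard library and confirms that equally many values fall on each side of the Median.

```python
from statistics import mean, median

values = [5, 3, 8, 3, 15, 45, 9]

ordered = sorted(values)          # [3, 3, 5, 8, 9, 15, 45]
median_value = median(values)     # median() sorts internally
mean_value = mean(values)

# Count how many values fall on each side of the median.
below = sum(1 for v in values if v < median_value)
above = sum(1 for v in values if v > median_value)

print(ordered)
print(median_value)               # 8
print(round(mean_value, 2))       # 12.57
print(below, above)               # 3 3
```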
Mode
The most frequent (common) value in a dataset.
$$ 1, 2, 3, 3, 5, 8, 10 $$ $$Mode = 3$$
If all the values appear the same number of times, we usually say there is no Mode. However, if more than one value is tied for the highest frequency, we count all of those values as Modes.
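The convention described here can be sketched in Python as follows. The helper `modes` is a hypothetical function written for this example, not a library call. Note that the standard library's `statistics.multimode` behaves differently in the "no Mode" case: it returns every value when all counts are equal.

```python
from collections import Counter

def modes(values):
    """Return the list of modes, or an empty list when there is no mode."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    # Several distinct values that all occur equally often: no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    # Otherwise every value tied for the highest count is a mode.
    return [value for value, c in counts.items() if c == highest]

print(modes([1, 2, 3, 3, 5, 8, 10]))  # [3]
print(modes([1, 1, 2, 2, 3]))         # [1, 2]
print(modes([4, 7, 9]))               # []
```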
Median for Odd and Even Datasets
A note on calculating the Median: the actual calculation depends on whether the dataset has an odd or an even number of values.
Ex.
The first thing we should do is order the values from smallest to largest.
Odd Dataset $$1, 2, 3, 3, 5, 8, 10$$ $$Median = 3 $$
Even Dataset $$1, 2, 3, 3, 5, 8, 10, 15$$ $$Median = \frac{3+5}{2} = 4$$ Because there isn't an exact center value, we take the Mean of the two center values as our Median.
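The odd and even cases can be written out as a small Python function. This is a sketch for illustration (`median_of` is a name chosen here); in practice `statistics.median` from the standard library applies the same rule.

```python
def median_of(values):
    """Median of a non-empty list of numbers, handling odd and even sizes."""
    if not values:
        raise ValueError("median_of() needs at least one value")
    ordered = sorted(values)      # step 1: order from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value is the median.
        return ordered[mid]
    # Even count: average the two values around the center.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_of([1, 2, 3, 3, 5, 8, 10]))      # 3
print(median_of([1, 2, 3, 3, 5, 8, 10, 15]))  # 4.0
```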