
This page houses information for the 2018 Datathon event.

Statistics can help you communicate trends in data. You might find some of the following basic statistics helpful in your analysis:

- Mean
- Mode
- Median
- Standard deviation
- p-Value

**Mean = the average of all the entries**

Knowing the average gives you a rough idea of what a typical entry, including the next one, is likely to be.

- To get an average, add all the entries together and divide by the number of entries

In this example, the average is about 258.33:

250 + 300 + 225 = 775

775 ÷ 3 ≈ 258.33 is the mean (or average)
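The same arithmetic can be checked with a few lines of Python. This is a minimal sketch that uses the three example entries above and Python's standard `statistics` module; the variable names are only for illustration.

```python
from statistics import mean

entries = [250, 300, 225]

# Add all the entries together...
total = sum(entries)
# ...then divide by the number of entries.
average = total / len(entries)

print(total)              # 775
print(round(average, 2))  # 258.33

# statistics.mean performs the same calculation.
print(round(mean(entries), 2))  # 258.33
```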

What is the mean good for?

Sometimes it’s useful to know if something is above or below average.

For example, is your grade above or below the average of the rest of the class?

**Mode = the most repeated number or category**

- To find the mode, write down all the data and count how often each value or category appears; the one that appears most often is the mode

The mode is the only measure of average (central tendency) that can be used with nominal data (i.e., categories).

For example, late-night users of the library were classified by faculty as:

- 14% science students
- 32% social science students
- 54% biological sciences students

The median and mean cannot be calculated (what would the average be, a bio-soci-sci student?).

The mode is biological sciences students, since they are the most common group.
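Finding the mode of categories can be sketched in Python as follows. The percentages come from the library example above; the short list of raw responses is hypothetical and only shows how you would count categories yourself with `collections.Counter`.

```python
from collections import Counter

# Shares of late-night library users by faculty, from the example above.
shares = {"science": 14, "social science": 32, "biological sciences": 54}

# For summarized data, the mode is the category with the largest share.
modal_faculty = max(shares, key=shares.get)
print(modal_faculty)  # biological sciences

# For raw data, count each category first. These responses are hypothetical.
responses = ["social science", "biological sciences", "science",
             "biological sciences", "biological sciences"]
counts = Counter(responses)
print(counts.most_common(1))  # [('biological sciences', 3)]
```

Note that neither step needs the categories to be numbers, which is why the mode works for nominal data.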

What is the mode good for?

The mode can help with scheduling: at what hours are most people buying the product, coming into the store, or calling for help?

The mode is most helpful when a single value is repeated much more often than others.
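As a scheduling sketch, assume you logged the hour of each help-desk call (the hours below are hypothetical). Python's `statistics.multimode` returns every value tied for most common, which also shows when no single value stands out.

```python
from statistics import multimode

# Hypothetical hours (24-hour clock) at which help-desk calls arrived.
call_hours = [9, 10, 10, 11, 13, 14, 14, 14, 14, 15, 16, 17]

busiest = multimode(call_hours)
print(busiest)  # [14]

# When several values tie, multimode returns all of them,
# a sign that the mode is less informative for that data.
tied = multimode([9, 9, 10, 10, 11])
print(tied)  # [9, 10]
```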

**Median = the middle number**

- To find the median, put all the numbers in order

If there is an odd number of results, the median is the middle number.

If there is an even number of results, the median is the mean of the two central numbers.
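The two cases above translate directly into a short function. This is a minimal sketch (the `median` function here is written for illustration) that is cross-checked against Python's built-in `statistics.median`.

```python
import statistics

def median(values):
    """Median following the steps above: sort, then take the middle value(s)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        # Odd number of results: the single middle value.
        return ordered[mid]
    # Even number of results: the mean of the two central values.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([250, 300, 225]))       # 250
print(median([250, 300, 225, 400]))  # 275.0

# Cross-check against the standard library.
assert median([250, 300, 225, 400]) == statistics.median([250, 300, 225, 400])
```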

What’s the median good for?

If you have outliers (numbers that are much higher or lower than the rest of your data), they pull the mean toward them, so the mean no longer describes the data well. The median is much less affected by outliers.

Skewed data leans more to one side than the other: it has a long tail of unusually high or unusually low values.

Some data, such as rent and income, is generally reported as a median.
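To see why income is usually reported as a median, here is a small sketch with hypothetical incomes, one of which is an outlier. It compares the mean and median from Python's `statistics` module.

```python
from statistics import mean, median

# Hypothetical annual incomes; the last one is an outlier.
incomes = [40_000, 42_000, 45_000, 47_000, 1_000_000]

print(mean(incomes))    # 234800  (pulled far upward by one value)
print(median(incomes))  # 45000   (still describes a typical income)

# Without the outlier, the mean drops back near the median.
print(mean(incomes[:-1]))  # 43500
```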

**Standard deviation = how spread out the values are**

- To find the standard deviation, see Khan Academy's step-by-step guide: https://www.khanacademy.org/math/probability/data-distributions-a1/summarizing-spread-distributions/a/calculating-standard-deviation-step-by-step

When the values in a dataset are close together the standard deviation is small.

When the values are spread apart the standard deviation is large.
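The small-versus-large contrast can be sketched in Python. The function below (written for illustration) follows the usual step-by-step recipe with the population formula, dividing by N; the two datasets are hypothetical. Python's `statistics.pstdev` uses the same formula, while `statistics.stdev` divides by N − 1, the version usually used for a sample.

```python
import math
from statistics import pstdev, stdev

def population_sd(values):
    """Standard deviation step by step, dividing by N (population formula)."""
    n = len(values)
    avg = sum(values) / n                       # 1. find the mean
    squared = [(x - avg) ** 2 for x in values]  # 2. square each value's distance from the mean
    variance = sum(squared) / n                 # 3. average the squared distances
    return math.sqrt(variance)                  # 4. take the square root

close_together = [49, 50, 51]  # hypothetical values near each other
spread_apart = [10, 50, 90]    # hypothetical values far apart

print(round(population_sd(close_together), 2))  # 0.82
print(round(population_sd(spread_apart), 2))    # 32.66

# Sample standard deviation (divides by N - 1) for comparison.
print(round(stdev(close_together), 2))  # 1.0
```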

What's the standard deviation good for?

In many datasets, values vary around the mean by chance, and such datasets often follow a normal distribution. In a normal distribution, most values cluster around the mean and become rarer the farther they are from it, so only a few values are extremely high or extremely low (a bell curve). Many natural phenomena display a normal distribution.

In a normal distribution, about 95% of values fall within two standard deviations of the mean. A value more than two standard deviations from the average lies in the tails of the bell curve, which suggests that something unusual is going on.
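A minimal sketch of the two-standard-deviation rule of thumb, using Python's `statistics` module. The readings are hypothetical, and the population standard deviation (`pstdev`) is an assumption; with `stdev` the result for this data is the same.

```python
from statistics import NormalDist, mean, pstdev

# Share of a normal distribution lying within two standard deviations of the mean.
within_two = NormalDist().cdf(2) - NormalDist().cdf(-2)
print(f"{within_two:.1%}")  # 95.4%

# Hypothetical readings; one looks out of place.
readings = [10, 11, 9, 10, 12, 10, 11, 9, 10, 30]
avg = mean(readings)
sd = pstdev(readings)

# Flag values more than two standard deviations from the average.
unusual = [x for x in readings if abs(x - avg) > 2 * sd]
print(unusual)  # [30]
```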

Source: Quora

**p-value = a measure of the evidence against the null hypothesis**

In hypothesis testing, the null hypothesis states that there is no relationship (or no effect).

A researcher calculates a p-value: the probability of observing an effect at least as large as the one in the data, assuming the null hypothesis is true.

The p-Value is a number between 0 and 1.
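Here is a minimal sketch of what that probability means in the simplest possible case, an exact one-sided binomial test. The coin-flip data is hypothetical and the function name is made up for illustration; real analyses of relationships between variables usually use other tests, but the idea of "how likely is a result at least this extreme if the null hypothesis is true" is the same.

```python
from math import comb

def binomial_p_value(successes, trials, p_null=0.5):
    """One-sided p-value: chance of at least `successes` if the null hypothesis is true."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Hypothetical: a coin lands heads 9 times in 10 flips.
# Null hypothesis: the coin is fair (probability of heads = 0.5).
p = binomial_p_value(9, 10)
print(round(p, 4))  # 0.0107
print(p <= 0.05)    # True: evidence against a fair coin
```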

A small p-value (typically ≤ 0.05) indicates strong evidence __against__ the null hypothesis, so the researcher rejects it and concludes that there is likely a relationship between the two variables. Such a result is called statistically significant.

A large p-value (typically > 0.05) indicates weak evidence against the null hypothesis, so the researcher does not reject it. This does not prove that there is no relationship; the data simply do not show one convincingly.

The cutoff for significance differs between fields of study. In some fields, it's much lower than 5% (0.05).

- p-values close to the 0.05 cutoff are marginal, so always report your p-values and let readers draw their own conclusions.

Calculating a p-value can be complex; for step-by-step instructions, see: https://www.wikihow.com/Calculate-P-Value
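One way to get a p-value for a relationship between two variables without formulas is a permutation test. This is a swapped-in technique, not necessarily the method on the linked page. The sketch below assumes small, hypothetical paired data (hours studied and quiz scores). Under the null hypothesis, y is unrelated to x, so every reordering of y is equally likely; the p-value is the share of reorderings that show a relationship at least as strong as the real data.

```python
from itertools import permutations

def permutation_p_value(x, y):
    """Exact one-sided permutation test for a positive relationship between x and y.

    For a fixed set of values, ranking reorderings by sum(x_i * y_i)
    is equivalent to ranking them by the Pearson correlation.
    """
    def statistic(ys):
        return sum(a * b for a, b in zip(x, ys))

    observed = statistic(y)
    as_extreme = 0
    total = 0
    for reordered in permutations(y):
        total += 1
        if statistic(reordered) >= observed:
            as_extreme += 1
    return as_extreme / total

# Hypothetical paired data.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [1, 3, 2, 5, 4, 6, 8, 7]

p = permutation_p_value(hours, scores)
print(p < 0.05)  # True: statistically significant at the 0.05 cutoff
```

Listing every reordering only works for very small datasets, because the number of orderings grows factorially (8 values already give 40,320). Larger datasets typically use a random sample of reorderings or a formula-based test.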

What's the p-value good for?

If you want to test someone's hypothesis, you can conduct your own tests and find out whether the evidence supports it. This is sometimes called validating a claim. It's important in science, because scientific claims need to be repeatable.