Wednesday, February 21, 2018

Probabilities and statistics

First, I would like to say that I am no mathematician and I am only somewhat familiar with probabilities and statistics. I will keep this introduction at a very basic level. There are many better texts about probabilities and statistics, and you should probably look for them if you want to go deeper.

Definitions

You can define probability as "The extent to which an event is likely to occur, measured by the ratio of the favorable cases to the whole number of cases possible". Probability can also be defined as "The likelihood of a given event's occurrence".

Statistics can be defined as "A branch of mathematics dealing with the collection, analysis, interpretation, and presentation of masses of numerical data". It can also be defined as "A fact or piece of data obtained from a study of a large quantity of numerical data".

Basic things about probabilities

  1. The probability that two events will both occur can never be greater than the probability of either event occurring on its own. For example, the likelihood of meeting a person who is a woman and likes ice cream is never greater than the likelihood of meeting a woman.
  2. If two possible events A and B are independent, then the probability that both A and B occur is equal to the product of their individual probabilities. For example, the probability of two consecutive heads in two coin flips is 0.5*0.5=0.25. When events are not independent, you need a conditional probability, that is, the probability of A given that B has occurred. For example, if B can occur only after A has happened, then the probability of A on its own generally differs from the probability of A given that B has occurred.
  3. If an event can have many different and mutually exclusive outcomes A, B and so on, then the probability that either A or B will occur is equal to the sum of the individual probabilities of A and B, and the sum of the probabilities of all the possible outcomes (A, B, and so on) is 100%. For example, you throw a six-sided die and want to know the probability of getting either a 1 or a 2. The probability of either one happening is 1/6+1/6=1/3.
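
The three rules in the list above can be checked by simply counting outcomes. The following is a minimal Python sketch, assuming a fair coin and a fair six-sided die; the helper name `prob` is made up for this illustration and is not from any of the sources.

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair coin flips: every outcome is equally likely.
flips = list(product("HT", repeat=2))

def prob(event, space):
    """Probability of an event as favorable cases / all possible cases."""
    favorable = sum(1 for outcome in space if event(outcome))
    return Fraction(favorable, len(space))

p_first_heads = prob(lambda o: o[0] == "H", flips)
p_second_heads = prob(lambda o: o[1] == "H", flips)
p_both_heads = prob(lambda o: o == ("H", "H"), flips)

# Rule 1: "A and B" is never more likely than A alone or B alone.
assert p_both_heads <= min(p_first_heads, p_second_heads)
# Rule 2: for independent events the joint probability is the product.
assert p_both_heads == p_first_heads * p_second_heads

# Rule 3: mutually exclusive outcomes add up (one throw of a six-sided die).
die = [1, 2, 3, 4, 5, 6]
p_one_or_two = prob(lambda o: o in (1, 2), die)
assert p_one_or_two == prob(lambda o: o == 1, die) + prob(lambda o: o == 2, die)
# The probabilities of all six faces together sum to 1 (100%).
total = sum(prob(lambda o, face=face: o == face, die) for face in die)

print(p_both_heads, p_one_or_two, total)  # 1/4 1/3 1
```

Using `Fraction` keeps the results exact, so 1/6+1/6 comes out as 1/3 instead of a rounded decimal.
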

These three basic laws form much of the basis of probability theory. You should also remember that you can often use the complement of an event to make calculations easier. For example, the probability of throwing a die and getting 1, 2, 3, 4, or 5 is easiest to find by calculating the probability of not getting a 6: 1-1/6=5/6. The probability of an event always depends on the number of ways it can occur. Counting those ways is easier when you understand combinations and permutations. When you hear someone say "an outcome is probable", what you really hear is that the outcome is probable under some set of hypotheses he or she has about the way the world works.

Maybe the most important way to use probabilities to your advantage is to calculate an expected payoff. You get it by:

Multiplying the probability of each possible outcome by its payoff and adding them all up

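For example, you flip a coin with a friend and win $100 if it lands tails and nothing if it lands heads. The expected payoff is 0.5*$100+0.5*$0=$50. If you would also lose $100 on heads, the expected payoff drops to 0.5*$100-0.5*$100=$0. You should always try to maximize the expected payoff in whatever you are doing.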
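
The expected payoff rule is easy to write down in code. This is only a sketch: the function name `expected_payoff` and the two bets are assumptions made for illustration. The first bet pays $100 on tails and nothing on heads, and the second one also costs $100 on heads.

```python
def expected_payoff(outcomes):
    """Sum of probability * payoff over (probability, payoff) pairs."""
    total_probability = sum(p for p, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * payoff for p, payoff in outcomes)

# Win $100 on tails, nothing happens on heads.
free_bet = expected_payoff([(0.5, 100), (0.5, 0)])
# Win $100 on tails, lose $100 on heads.
even_bet = expected_payoff([(0.5, 100), (0.5, -100)])

print(free_bet, even_bet)  # 50.0 0.0
```

Listing every outcome, including the losing ones, is the point of the exercise. Leaving out the losing side makes a bet look better than it is.
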
Statistics

We have a saying in Finland: a lie, a big lie, and statistics. When there are two opposing parties, like employee and employer organizations, they often interpret the same statistics differently. When this is the case, the truth is usually found somewhere in the middle. You should never take any interpretation of statistics at face value. Different incentives produce different interpretations. There are so many ways of misinterpreting statistics that I will not get into them now. I will keep things short.

First, you need to understand the sample space, which is the set of all possible outcomes. For example, when you throw a die once, your sample space is 1, 2, 3, 4, 5, and 6. When you work with a large data set, it helps to use a single value that describes the typical value of the whole set. This is called the central tendency. The mean, median, and mode are ways to describe it. Let's keep this simple and think about the mean only. To find the mean, you add up all the values in the data set and then divide the sum by the number of values. The sample space is an important concept in statistics. This applies especially to things that follow the normal distribution.
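
As a small illustration, Python's standard `statistics` module can calculate the mean of a data set. The ten die throws below are invented for this example, and the median and mode are printed only for comparison.

```python
import statistics

# Hypothetical data set: results of ten throws of a six-sided die.
throws = [1, 3, 3, 6, 2, 3, 5, 4, 6, 3]

mean = statistics.mean(throws)      # sum of the values / number of values
median = statistics.median(throws)  # middle value of the sorted data
mode = statistics.mode(throws)      # most frequent value

print(mean, median, mode)  # 3.6 3.0 3
```

The mean here is just 36 divided by 10, which is exactly the "add up and divide" recipe from the text.
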

Normal distributions are often used to represent random variables whose distributions are not completely known. These distributions do not tell much about individuals. They work better when the data represents bigger groups. A bell curve describes the variation in a normal distribution. Most of the observations are close to the mean. The curve slopes symmetrically downward on both sides of the mean. The number of observations diminishes quickly at first and then more slowly, until it is hard to see any change.
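
To put numbers on "most of the observations are close to the mean", here is a short sketch using `NormalDist` from Python's standard library (Python 3.8 or newer). It assumes a standard bell curve with mean 0 and standard deviation 1; the helper name `share_within` is made up for this example.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
bell = NormalDist(mu=0.0, sigma=1.0)

def share_within(k):
    """Share of observations within k standard deviations of the mean."""
    return bell.cdf(k) - bell.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```

Roughly 68% of the observations fall within one standard deviation of the mean, about 95% within two, and about 99.7% within three.
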

The larger the sample, the better it reflects the underlying population being sampled. The members of the sample should be chosen randomly. Otherwise the results are useless. A sample size of 100 in a poll or survey gives a margin of error of roughly 10%, which is too big for most purposes. A sample size of 1000 usually has a margin of error of around 3%. Often this is enough. Repeating a survey with the same sample size does not give exactly the same results. You should expect some variation in the results.
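
The margins of error above can be approximated with the usual textbook formula for a proportion. The sketch below assumes a simple random sample, a 95% confidence level (z = 1.96), and the worst case where opinion is split 50/50. The function names and the simulated poll are illustrations only, not real survey data.

```python
import math
import random

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a share p in a random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (100, 1000):
    print(f"n={n}: about {margin_of_error(n):.1%}")

# Repeating the same simulated poll gives slightly different results each time.
rng = random.Random(1)  # fixed seed so the run is repeatable

def run_poll(n, true_share=0.5):
    """Ask n random people; each answers 'yes' with probability true_share."""
    return sum(rng.random() < true_share for _ in range(n)) / n

repeated_polls = [run_poll(1000) for _ in range(5)]
print(repeated_polls)
```

The first loop gives about 9.8% for 100 respondents and about 3.1% for 1000. The repeated polls land near 50%, but not on exactly the same value every time.
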

There is a difference between statistics and probability. Statistics concerns the inference of probabilities based on observed data. Probability concerns predictions based on fixed probabilities.

Briefly about randomness

The sum or average of a large number of independent random variables is distributed approximately according to the normal distribution. This is called the central limit theorem. For example, you want to manufacture 1000 screws that weigh 10 grams each. You try to add just enough metal to leave each finished screw weighing 10 grams. Because the weight of each screw is affected by many small independent variations, the central limit theorem says that the weights should vary according to the normal distribution. If they do not, something is wrong, and somebody may be fabricating the results. There are many random processes whose results look like a bell curve, for example people's heights and how long they live. There are also processes for which the normal distribution is useless, like damages from natural disasters.
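
The central limit theorem can be seen in a small simulation. The sketch below adds up many small random effects that are not bell-shaped on their own and checks that the totals behave like a bell curve. The choice of 50 uniform effects per total is an arbitrary assumption for illustration.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

def total_of_small_effects(count=50):
    """Sum of many independent, non-normal (uniform) random effects."""
    return sum(rng.uniform(-1.0, 1.0) for _ in range(count))

totals = [total_of_small_effects() for _ in range(10_000)]
mean = statistics.fmean(totals)
spread = statistics.pstdev(totals)

# For a bell curve, roughly 68% of values fall within one standard deviation.
within_one = sum(abs(t - mean) <= spread for t in totals) / len(totals)
print(f"share within one standard deviation: {within_one:.2f}")
```

The printed share should come out close to 0.68, which is what a normal distribution predicts, even though each single effect is uniformly distributed.
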

You can be fooled by randomness. Sometimes the results of a random process look like a pattern. For example, the so-called hot hand, in which a basketball player goes on a shooting streak, is actually mostly a random pattern. This doesn't mean there is no skill involved. You just have to concentrate on the long-term statistics instead of short-term patterns. It is not easy to separate random streaks from real patterns. Among a large group of people, there are always random streaks that look like patterns. Our brains do not understand randomness well. They are better at recognizing patterns, even when there aren't any. Humans have a need to be in control of events. Random events do not satisfy this need, which creates a clash between reality and the need to feel in control.
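
A quick way to see how streaky pure chance is: flip a simulated fair coin 100 times and measure the longest run of identical results. This is only a sketch with made-up names, and it says nothing about real basketball data.

```python
import random

def longest_streak(flips):
    """Length of the longest run of identical consecutive results (non-empty input)."""
    best = current = 1
    for previous, result in zip(flips, flips[1:]):
        current = current + 1 if result == previous else 1
        best = max(best, current)
    return best

rng = random.Random(7)  # fixed seed so the run is repeatable
streaks = []
for _ in range(1000):
    flips = [rng.choice("HT") for _ in range(100)]
    streaks.append(longest_streak(flips))

average_streak = sum(streaks) / len(streaks)
print(f"average longest streak in 100 flips: {average_streak:.1f}")
```

The average longest streak typically comes out at around seven identical results in a row, with no skill or pattern behind it at all.
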

This is all for now. I will probably add some things to this text at some point. This is a big subject and not easy for me.

Sources:

How Not to Be Wrong, Jordan Ellenberg
The Drunkard's Walk, Leonard Mlodinow
A Man for All Markets, Edward O. Thorp

Have a nice end of the week!

-TT
