First, I would like to say that I am no mathematician and I am only somewhat familiar with probabilities and statistics. I will keep this introduction at a very basic level. There are many better texts about probabilities and statistics, and you should probably look to them for more depth.
Definitions
You can define probability as "The extent to which an event is likely to occur, measured by the ratio of the favorable cases to the whole number of cases possible". Probability can also be defined as "The likelihood of a given event's occurrence".
Statistics can be defined as "A branch of mathematics dealing with the collection, analysis, interpretation, and presentation of masses of numerical data". It can also be defined as "A fact or piece of data obtained from a study of a large quantity of numerical data".
Basic things about probabilities
- The probability that two events will both occur can never be greater than the probability of either event occurring on its own. For example, the likelihood of meeting a person who is a woman and likes ice cream is never greater than the likelihood of meeting a woman.
- If two possible events A and B are independent, then the probability that both A and B occur is equal to the product of their individual probabilities. For example, the probability of two consecutive heads in two coin flips is 0.5*0.5=0.25. When events are not independent, you need a conditional probability, that is, the probability of one event given that another has occurred. For example, if an event B can occur only after an event A has happened, then the probability of A on its own will generally differ from the probability of A given that B has occurred.
- If an event can have many distinct, mutually exclusive outcomes A, B and so on, then the probability that either A or B will occur is equal to the sum of the individual probabilities of A and B, and the sum of the probabilities of all the possible outcomes (A, B, and so on) is 100%. For example, you throw a die with 6 numbers and want to know the probability of getting either a 1 or a 2. The probability of either one happening is 1/6+1/6=1/3.
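The three rules above can be checked by simply counting outcomes. This is only a small sketch: the helper name `probability` is my own, and it assumes a fair coin and a fair die, so that every outcome is equally likely.

```python
from fractions import Fraction
from itertools import product


def probability(event, outcomes):
    """Share of equally likely outcomes for which event(outcome) is true."""
    favorable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favorable, len(outcomes))


# Two fair coin flips: HH, HT, TH, TT (assumed equally likely).
two_flips = list(product("HT", repeat=2))
first_heads = probability(lambda o: o[0] == "H", two_flips)
both_heads = probability(lambda o: o == ("H", "H"), two_flips)

# One fair six-sided die.
die = [1, 2, 3, 4, 5, 6]
one_or_two = probability(lambda n: n in (1, 2), die)

print(both_heads <= first_heads)  # rule 1: True, "both" is never more likely than one alone
print(both_heads)                 # rule 2: 1/4, the same as 1/2 * 1/2
print(one_or_two)                 # rule 3: 1/3, the same as 1/6 + 1/6
```

Counting like this only works when the outcomes are equally likely. For anything else you need the rules themselves.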
These three basic laws of probability form much of the basis of probability theory. You should also remember that you can often use the complement of an event to make calculations easier. For example, the probability of throwing a die and getting 1, 2, 3, 4, or 5 is easiest to find by calculating the probability of not getting a 6: 1-1/6=5/6. The probability of an event always depends on the number of ways it can occur. Counting those ways is easier when you understand combinations and permutations. When you hear someone say "an outcome is probable", what you really hear is that the outcome is probable under some set of hypotheses he or she has about the way the world works. Maybe the most important way to use probabilities to your advantage is calculating an expected payoff. You get it by:
Multiplying the probability of each possible outcome by its payoff and adding them all up
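For example, you flip a coin with a friend, and you win $100 if it lands tails and nothing if it lands heads. The expected payoff is 0.5*$100+0.5*$0=$50. If you also had to pay $100 when it lands heads, the expected payoff would be 0.5*$100-0.5*$100=$0. You should always try to maximize the expected payoff in whatever you are doing.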
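Both the complement shortcut and the expected payoff rule can be written out in a few lines. This is a minimal sketch, not a definitive recipe. The function name `expected_payoff` is my own, and the first coin flip assumes that tails pays $100 and heads pays nothing, which is the reading that gives $50.

```python
def expected_payoff(outcomes):
    """Sum of probability * payoff over (probability, payoff) pairs."""
    total_probability = sum(p for p, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must add up to 1")
    return sum(p * payoff for p, payoff in outcomes)


# Complement shortcut: P(1, 2, 3, 4 or 5) = 1 - P(6) on a fair die.
p_not_six = 1 - 1 / 6

# Coin flip where tails pays $100 and heads pays nothing (assumption).
win_only = expected_payoff([(0.5, 100), (0.5, 0)])

# Same flip, but heads costs you $100 (assumption).
even_bet = expected_payoff([(0.5, 100), (0.5, -100)])

print(round(p_not_six, 4), win_only, even_bet)  # 0.8333 50.0 0.0
```

The second bet shows why it matters to list every outcome with its payoff, including the losing ones.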
Statistics
We have a saying in Finland: A lie, a big lie, and statistics. When two parties with opposing interests, like employee and employer organizations, look at the same statistics, they often interpret them differently. When this is the case, the truth is usually found somewhere in the middle. You should never take any interpretation of statistics at face value. Different incentives give different interpretations. There are so many ways of misinterpreting statistics that I will not get into them now. I will keep things short.
First, you need to understand the sample space, which is the set of all possible outcomes. For example, when you throw a die once, your sample space is 1, 2, 3, 4, 5, and 6. When you work with a large set of values, you can help yourself by using a single value that describes the typical value of the whole set. This is called the central tendency. The mean, median, and mode are ways to describe it. Let's keep this simple and think about the mean only. To find the mean, you add up all the values in the data set and then divide the sum by the number of values you added. The sample space is an important concept in statistics. This applies especially to things that follow the normal distribution.
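Here is a small sketch of the mean using Python's standard `statistics` module, with the median and mode added only for comparison. The list of die rolls is made up for illustration.

```python
import statistics

rolls = [1, 2, 2, 3, 4, 6, 6, 6]  # made-up die rolls, not real data

# The mean by hand: add up the values and divide by how many there are.
mean_by_hand = sum(rolls) / len(rolls)

mean = statistics.mean(rolls)      # same result as the manual calculation
median = statistics.median(rolls)  # middle value of the sorted data
mode = statistics.mode(rolls)      # most common value

print(mean_by_hand, mean, median, mode)  # 3.75 3.75 3.5 6
```

The three measures do not have to agree. Here the most common roll is 6 even though the mean is well below it.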
Normal distributions are often used to represent random variables whose distributions are not completely known. These distributions do not tell much about individuals. They work better when the data represents bigger groups. A bell curve describes the variation in a normal distribution. Most of the observations are close to the mean. The curve slopes symmetrically downward on both sides of the mean. The number of observations drops quickly as you move away from the mean and then more slowly in the tails, until it is hard to see any changes.
The bigger the sample, the better it reflects the underlying population being sampled. The members of the sample should be chosen randomly. Otherwise the results are useless. A sample size of 100 in a poll or survey gives a margin of error of about 10%, which is too big for most purposes. A sample size of 1000 usually has a margin of error of around 3%. Often this is enough. Repeating a survey with the same sample size does not give exactly the same results. You should expect some variation in the results.
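The margins of error above can be roughly reproduced with the usual textbook approximation. This sketch assumes a simple random sample, a 95% confidence level (z = 1.96) and the worst case of a 50/50 split. The function names and the 40% "true support" figure are made up for illustration.

```python
import math
import random


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate 95% margin of error for a simple random sample."""
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


def run_poll(sample_size, true_support, rng):
    """Simulate one poll: each respondent says yes with probability true_support."""
    yes_answers = sum(1 for _ in range(sample_size) if rng.random() < true_support)
    return yes_answers / sample_size


print(round(margin_of_error(100), 3))   # 0.098, about 10%
print(round(margin_of_error(1000), 3))  # 0.031, about 3%

# Repeat the same poll five times: the results wobble around the true 40%.
rng = random.Random(1)
polls = [run_poll(1000, 0.40, rng) for _ in range(5)]
print(polls)
```

The five simulated polls use the same sample size and the same population, yet they give slightly different answers. That variation is what the margin of error describes.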
There
is a difference between statistics and probability. Statistics
concerns the inference of probabilities based on observed data.
Probability concerns predictions based on fixed probabilities.
Briefly about randomness
The sum or average of a large number of independent random variables should be distributed approximately according to the normal distribution. This is called the central limit theorem. For example, you want to manufacture 1000 screws that weigh 10 grams each. You try to add just enough metal that each finished screw weighs 10 grams. Because each weight is affected by many small independent variations, the central limit theorem suggests that the weights of your screws should vary according to the normal distribution. If they do not, somebody may be fabricating the results. There are many random processes in which the results look like a bell curve. For example, people's heights, how long they live, etc. There are also some processes for which the normal distribution is useless, like damages from natural disasters.
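The screw example can be simulated. This is only a sketch under made-up assumptions: each screw's weight is 10 grams plus 30 small independent errors, each uniform between -0.05 and 0.05 grams. None of these numbers come from a real factory.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable


def screw_weight(rng, parts=30):
    """Hypothetical screw: 10 g plus many small independent errors."""
    return 10.0 + sum(rng.uniform(-0.05, 0.05) for _ in range(parts))


weights = [screw_weight(rng) for _ in range(10_000)]
mean = statistics.mean(weights)
spread = statistics.stdev(weights)

# For a bell curve, roughly 68% of values fall within one standard
# deviation of the mean and roughly 95% within two.
within_one = sum(1 for w in weights if abs(w - mean) <= spread) / len(weights)
within_two = sum(1 for w in weights if abs(w - mean) <= 2 * spread) / len(weights)

print(round(mean, 3), round(within_one, 2), round(within_two, 2))
```

No single error is bell-shaped here. The bell shape appears only because many of them are added together.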
You can be fooled by randomness. Sometimes, random processes look like patterns in the data. For example, the so-called hot hand, in which a basketball player goes on a shooting streak, is actually mostly a random pattern. It doesn't mean there is no skill involved. You just have to concentrate on the long-term statistics instead of short-term patterns. It is not easy to separate random streaks from real patterns. Among a large group of people, there are always random streaks that look like patterns. Our brains do not understand randomness well. They are better at recognizing patterns, even when there aren't any. Humans have a need to be in control of events. Random events do not satisfy this need, which creates a clash between reality and the need to feel in control.
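To see how easily pure chance produces streaks, you can simulate a shooter whose every shot is an independent coin flip. This is a sketch with made-up numbers: 1000 "seasons" of 100 shots each, at a 50% hit rate. The function name `longest_streak` is my own.

```python
import random


def longest_streak(shots):
    """Length of the longest run of consecutive made shots."""
    longest = current = 0
    for made in shots:
        current = current + 1 if made else 0
        longest = max(longest, current)
    return longest


rng = random.Random(7)  # fixed seed so the run is repeatable

# Each shot is made with probability 0.5, independently of the others.
seasons = [[rng.random() < 0.5 for _ in range(100)] for _ in range(1000)]
streaks = [longest_streak(season) for season in seasons]

average_streak = sum(streaks) / len(streaks)
share_with_five_plus = sum(1 for s in streaks if s >= 5) / len(streaks)

print(round(average_streak, 1), round(share_with_five_plus, 2))
```

Most of the simulated seasons should contain a run of five or more made shots in a row, even though there is no hot hand anywhere in the code.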
Sources:
How Not to Be Wrong, Jordan Ellenberg
The Drunkard's Walk, Leonard Mlodinow
A Man for All Markets, Edward O. Thorp
Have a
nice end of the week!
-TT