# Probability For Machine Learning (Part 1)

## Random variables, distributions, and moments

### Random variables

- A *random variable* is a function that assigns a number to *events* in the *sample space*. (A better name might be “random-valued function on the sample space.”)

- We describe the *probability of an outcome* in terms of the probability of a random variable taking a given value, e.g., $\mathrm{Prob}(X = x_k) = p(x_k)$ for a discrete variable.
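
As a minimal sketch of these two points, assume the sample space is the 36 equally likely outcomes of rolling two fair dice (an illustrative choice, not taken from the text), and let the random variable be the sum of the two faces. The names `sample_space`, `X`, and `prob` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered pairs of faces from two fair dice (assumed example).
sample_space = list(product(range(1, 7), repeat=2))

def X(outcome):
    """Random variable: maps an outcome (a pair of faces) to a number."""
    return outcome[0] + outcome[1]

def prob(value):
    """Prob(X = value), obtained by counting equally likely outcomes."""
    favorable = sum(1 for w in sample_space if X(w) == value)
    return Fraction(favorable, len(sample_space))

p_seven = prob(7)
print(p_seven)  # 1/6
```

The probability is attached to the value of `X`, not to any single pair of faces: six different outcomes all map to the value 7.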

### Continuous random variables

- Consider choosing a random number between 0 and 1, where all values are equally likely.
- Since there are an (uncountably) *infinite* number of values, the probability of any given value is zero: $\mathrm{Prob}(X = x_0) = 0$.
- This does not mean the event is impossible. Points on a line have “*measure zero*.”
- Instead, ask for the probability to lie within a given range, e.g., $\mathrm{Prob}(a \leq X \leq b) = b - a$ for $0 \leq a \leq b \leq 1$.
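
The uniform example can be checked numerically. The sketch below is a rough simulation, assuming Python's standard `random.random()` as the uniform source and an arbitrary interval $[0.2, 0.5]$. Floating-point numbers are only a finite stand-in for the continuum, so the "probability of a single point" here is illustrative rather than exact.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable
n = 100_000
samples = [random.random() for _ in range(n)]  # uniform draws in [0, 1)

a, b = 0.2, 0.5  # assumed interval, chosen only for illustration

# Fraction of draws that land inside the interval: estimates b - a.
frac_in_range = sum(a <= x <= b for x in samples) / n

# Fraction of draws that hit one specific value exactly.
frac_at_point = sum(x == 0.3 for x in samples) / n

print(frac_in_range)   # close to 0.3
print(frac_at_point)
```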

### Probability distributions

- More generally, a probability distribution satisfies:
  - $p(x) \geq 0$
  - $\sum_k p(x_k) = 1$ for discrete $x_k$
  - $\int_{-\infty}^{\infty} p(x)\,dx = 1$ for continuous $x$
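
Both conditions are easy to check numerically. The sketch below assumes a made-up probability table for a loaded four-sided die and a standard Gaussian density; the helper `midpoint_integral` is a hypothetical name for a simple midpoint rule, and the range $[-10, 10]$ stands in for the infinite one.

```python
import math

# Discrete case: an assumed probability table for a loaded four-sided die.
pmf = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}
nonnegative = all(p >= 0 for p in pmf.values())
discrete_total = sum(pmf.values())

# Continuous case: the standard Gaussian density.
def gaussian_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def midpoint_integral(f, lo, hi, steps=20_000):
    """Approximate the integral of f over [lo, hi] with the midpoint rule."""
    width = (hi - lo) / steps
    return width * sum(f(lo + (i + 0.5) * width) for i in range(steps))

# The tails beyond |x| = 10 are negligible, so a finite range is enough here.
continuous_total = midpoint_integral(gaussian_pdf, -10.0, 10.0)

print(nonnegative, discrete_total, continuous_total)
```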

### Cumulative distribution function

- The *cumulative distribution function*, or CDF, gives the probability that $X$ is less than or equal to a given value:

  $$F(x) = \mathrm{Prob}(X \leq x) = \int_{-\infty}^{x} p(x')\,dx'$$

- It contains (nearly) the same information as the probability density, since the density is the derivative of the CDF $F$: $p(x) = \frac{dF(x)}{dx}$.

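- The CDF is often easier to approximate from empirical data, and it is useful since the probability of an interval is a difference of two CDF values: $\mathrm{Prob}(a < X \leq b) = F(b) - F(a)$.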
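
One common way to approximate a CDF from data is the empirical CDF: the fraction of sample points at or below $x$. The sketch below assumes a uniform sample on $[0, 1)$ as the "data" and a hypothetical helper named `ecdf`; a real data set would simply replace the list.

```python
import bisect
import random

random.seed(1)
# Assumed data: a uniform sample on [0, 1), sorted once so lookups are fast.
data = sorted(random.random() for _ in range(10_000))

def ecdf(x):
    """Empirical CDF: fraction of sample points less than or equal to x."""
    return bisect.bisect_right(data, x) / len(data)

a, b = 0.25, 0.75
prob_interval = ecdf(b) - ecdf(a)  # estimates Prob(a < X <= b)

print(ecdf(0.5), prob_interval)  # both close to 0.5 for this sample
```

No binning is needed, which is one reason the CDF is often easier to estimate than the density.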

### Change of variable

- Suppose we wish to change variables, shift the distribution, or consider functions of the random variable. If $x \rightarrow y = y(x)$, then the density $g(y)$ in terms of the new variable is given by

  $$p(x)\,dx = g(y)\,dy \quad\Longrightarrow\quad g(y) = p(x)\,\frac{dx}{dy},$$

  which preserves the normalization condition. This assumes $y$ is an increasing function of $x$. If it is not, an absolute value is needed for positivity,

  $$g(y) = \frac{p(x)}{|dy/dx|},$$

  where the right-hand side is evaluated at $x = x(y)$. Further care is needed if $y$ has critical points.
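
A worked case may help. Assume $X$ is uniform on $(0, 1]$, so $p(x) = 1$, and take $y = -\ln x$ (an illustrative choice, not from the text). This map is decreasing, so the absolute value matters: $|dy/dx| = 1/x$ and $g(y) = x = e^{-y}$. The sketch compares that prediction with a simulation.

```python
import math
import random

random.seed(2)
n = 200_000
# 1 - random.random() lies in (0, 1], which avoids taking log(0).
ys = [-math.log(1.0 - random.random()) for _ in range(n)]

def g(y):
    """Transformed density p(x) / |dy/dx| with p(x) = 1 and x = exp(-y)."""
    x = math.exp(-y)
    dy_dx = -1.0 / x  # derivative of y = -ln(x); negative because y decreases
    return 1.0 / abs(dy_dx)

lo, hi = 0.5, 1.0
empirical = sum(lo <= y <= hi for y in ys) / n
predicted = math.exp(-lo) - math.exp(-hi)  # integral of exp(-y) over [lo, hi]

print(empirical, predicted)
```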

### Expectations and moments

The probability distribution defines weighted averages over the sample space, where the weight of each event is equal to its probability. These are called *expected values*.

- For the discrete case,

  $$E[f(X)] = \sum_k f(x_k)\,p(x_k),$$

- while for the continuous case,

  $$E[f(X)] = \int_{-\infty}^{\infty} f(x)\,p(x)\,dx.$$
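
Both cases can be sketched in a few lines. The example assumes a fair six-sided die for the discrete case and a uniform density on $[0, 1]$ for the continuous one; `expectation` and `expectation_continuous` are hypothetical helper names, and the integral is approximated with a midpoint rule.

```python
from fractions import Fraction

# Discrete case (assumed example): a fair die, p(x_k) = 1/6 for each face.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def expectation(f, pmf):
    """E[f(X)] as the probability-weighted sum of f over the values of X."""
    return sum(f(x) * p for x, p in pmf.items())

print(expectation(lambda x: x, pmf))      # 7/2
print(expectation(lambda x: x * x, pmf))  # 91/6

# Continuous case: midpoint-rule approximation of the integral of f(x) p(x).
def expectation_continuous(f, pdf, lo, hi, steps=10_000):
    width = (hi - lo) / steps
    mids = (lo + (i + 0.5) * width for i in range(steps))
    return width * sum(f(x) * pdf(x) for x in mids)

# Uniform density on [0, 1]: E[X^2] should be close to 1/3.
uniform_second = expectation_continuous(lambda x: x * x, lambda x: 1.0, 0.0, 1.0)
print(uniform_second)
```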

### Mean of a distribution

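- The *mean* of the distribution is simply the expectation of the random variable itself: $\mu = E[X] = \int_{-\infty}^{\infty} x\,p(x)\,dx$, or $\mu = \sum_k x_k\,p(x_k)$ in the discrete case.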

- In the case of an infinite sample space, whether continuous or discrete, the mean is not guaranteed to exist since the integral or the sum might not converge.
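
To see how a mean can fail to exist, consider an assumed discrete distribution on $k = 1, 2, 3, \dots$ with $p(k) = 6/(\pi^2 k^2)$. The probabilities sum to one, but $k\,p(k)$ falls off only as $1/k$, so the sum defining the mean grows without bound (roughly like $\ln N$). The sketch prints a few partial sums.

```python
import math

def p(k):
    """Assumed pmf on k = 1, 2, 3, ...: p(k) = 6 / (pi^2 k^2)."""
    return 6.0 / (math.pi ** 2 * k ** 2)

def partial_mean(n_terms):
    """Partial sum of k * p(k) for k = 1 .. n_terms."""
    return sum(k * p(k) for k in range(1, n_terms + 1))

# The normalization converges quickly ...
total_probability = sum(p(k) for k in range(1, 100_001))
print(total_probability)

# ... but the partial sums for the mean keep growing as more terms are added.
for n_terms in (10, 1_000, 100_000):
    print(n_terms, partial_mean(n_terms))
```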

### Moments of a distribution

- The *moments* of a distribution are the expectations of *powers* of the random variable itself: the $n$-th moment is $E[X^n]$.

- If all the moments are known, and if they exist, they can be used to get the expectation of other functions using the *linearity* of the expectation operator. For example, if $f(x) = \sum_n a_n x^n$, then $E[f(X)] = \sum_n a_n\,E[X^n]$.
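
The linearity argument can be checked on a small case. The sketch below assumes a fair die and the polynomial $f(x) = 2 + 3x + x^2$ (both chosen only for illustration), and computes $E[f(X)]$ two ways: from the moments, and by averaging $f$ directly. The helper name `moment` is hypothetical.

```python
from fractions import Fraction

pmf = {face: Fraction(1, 6) for face in range(1, 7)}  # assumed: fair die

def moment(n, pmf):
    """n-th moment: the expectation of X raised to the power n."""
    return sum(x ** n * p for x, p in pmf.items())

coeffs = [2, 3, 1]  # f(x) = 2 + 3x + x^2, lowest power first

# Via linearity: combine the moments using the polynomial coefficients.
via_moments = sum(a * moment(n, pmf) for n, a in enumerate(coeffs))

# Directly: average f(x) over the distribution.
direct = sum((2 + 3 * x + x ** 2) * p for x, p in pmf.items())

print(via_moments, direct)  # 83/3 83/3
```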

### Variance and standard deviation

- Of particular interest is the second moment, in combination with the mean, defining the variance: $\sigma^2 = E[(X - \mu)^2] = E[X^2] - \mu^2$, where $\mu = E[X]$ is the mean.

- The standard deviation, which is the square root of the variance, has the *same units* as the random variable (e.g., rate of return, dollars, etc.).
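
As a small check, the variance can be computed both as the mean squared deviation and as the second moment minus the squared mean. The sketch again assumes a fair die as the example distribution.

```python
import math
from fractions import Fraction

pmf = {face: Fraction(1, 6) for face in range(1, 7)}  # assumed: fair die

mean = sum(x * p for x, p in pmf.items())
second_moment = sum(x ** 2 * p for x, p in pmf.items())

# Two equivalent routes to the variance.
variance_central = sum((x - mean) ** 2 * p for x, p in pmf.items())
variance_moments = second_moment - mean ** 2

# The standard deviation is in the same units as X (here: pips on the die).
std_dev = math.sqrt(variance_central)

print(variance_central, variance_moments)  # 35/12 35/12
print(std_dev)
```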

### Higher moments characterize properties of a distribution

- *Variance*: dispersion measure based on the second moment.
- *Skewness*: asymmetry parameter based on the third moment; dimensionless (a normalized cumulant).
- *Kurtosis*: measure of tail “weights” in terms of the fourth moment. The excess kurtosis (kurtosis minus 3) is zero for a Gaussian and bounded below by $-2$.
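
These shape parameters can be computed directly from central moments. The sketch below assumes the common conventions skewness $= \mu_3 / \mu_2^{3/2}$ and excess kurtosis $= \mu_4 / \mu_2^2 - 3$, where $\mu_n$ is the $n$-th central moment, and applies them to two made-up coin distributions. The helper names are hypothetical.

```python
def central_moment(n, pmf):
    """n-th central moment: expectation of (X - mean) ** n."""
    mean = sum(x * p for x, p in pmf.items())
    return sum((x - mean) ** n * p for x, p in pmf.items())

def skewness(pmf):
    """Third central moment divided by the cube of the standard deviation."""
    return central_moment(3, pmf) / central_moment(2, pmf) ** 1.5

def excess_kurtosis(pmf):
    """Fourth central moment over variance squared, minus 3 (the Gaussian value)."""
    return central_moment(4, pmf) / central_moment(2, pmf) ** 2 - 3.0

fair_coin = {0: 0.5, 1: 0.5}    # symmetric, no tails at all
biased_coin = {0: 0.9, 1: 0.1}  # rare large value, so a long right tail

print(skewness(fair_coin), excess_kurtosis(fair_coin))  # 0.0 -2.0
print(skewness(biased_coin), excess_kurtosis(biased_coin))
```

Under this convention the fair coin comes out at an excess kurtosis of $-2$, while the biased coin has positive skewness.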

### Covariance and correlation

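- For any *two* random variables $X$ and $Y$, not necessarily independent or identically distributed, their covariance is defined as

  $$\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - \mu_X \mu_Y,$$

  where $\mu_X = E[X]$ and $\mu_Y = E[Y]$ are the means.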

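- The correlation is proportional to the covariance, $\rho_{XY} = \dfrac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$, where $\sigma_X$ and $\sigma_Y$ are the standard deviations of $X$ and $Y$.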

- Dividing the covariance by the standard deviations makes the correlation a pure number, and it always lies in the range $[-1, 1]$.
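
A sample-based sketch of both quantities follows. It assumes a made-up linear model, $Y = 2X + \text{noise}$ with standard normal $X$ and noise, for which the covariance is 2 and the correlation is $2/\sqrt{5} \approx 0.894$. The helpers `mean`, `covariance`, and `correlation` are hypothetical names, and the estimates replace expectations with sample averages.

```python
import math
import random

random.seed(3)
n = 50_000
xs = [random.gauss(0.0, 1.0) for _ in range(n)]
# Assumed model: Y depends linearly on X plus independent unit-variance noise.
ys = [2.0 * x + random.gauss(0.0, 1.0) for x in xs]

def mean(values):
    return sum(values) / len(values)

def covariance(us, vs):
    """Sample average of the product of deviations from the means."""
    mu_u, mu_v = mean(us), mean(vs)
    return mean([(u - mu_u) * (v - mu_v) for u, v in zip(us, vs)])

def correlation(us, vs):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(us, vs) / math.sqrt(covariance(us, us) * covariance(vs, vs))

cov_xy = covariance(xs, ys)
rho_xy = correlation(xs, ys)
print(cov_xy, rho_xy)
```

Rescaling `ys` (say, changing its units) would change `cov_xy` but leave `rho_xy` unchanged, which is the point of the normalization.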

### Summary

- “*Random variables*” are functions that assign a *number* to events in the sample space. They can be discrete, continuous, or a mix of both.
- The *probability distribution* is *positive* and *sums to one*.
- Expected values, or expectations, are weighted averages that use the probabilities as the weights.
- The *moments* of a distribution, such as the mean, are expectations of various powers of a random variable. They are numbers, not functions.
- *Variance*, *skewness*, and *kurtosis* are simple functions of the moments that characterize the shape of the probability distribution.
- When there are multiple random variables, their *covariance* and *correlation* are also computed as expectations.

(To be Continued!)

This post is licensed under CC BY 4.0 by the author.