Continuous random variables
A continuous random variable is one that can take on (at least in principle) a continuous range of real numbers. Here are a few examples:
- Example A: Find somebody and choose \(X\) to be a very precise measure of their height.
- Example B: Randomly choose a college and choose \(X\) to be the average salary of all the professors.
- Example C: Let \(X\) be the average margin of victory in all Super Bowls with Taylor Swift in attendance as of the year 2050.
The tricky thing is figuring out how to describe the distribution of a continuously distributed random variable.
Example
Suppose we pick a number \(X\) uniformly at random from the interval \([-10,10]\). What is the probability that the number lies in the interval \([1,3]\)?
Solution: We simply divide the length of the sub-interval by the length of the larger interval to get
\[P(1<X<3) = \frac{2}{20} = 0.1.\]
Note that we’ve indicated the event using the inequality \(1<X<3\), as we will typically do.
A distribution as a limit
There should be an obvious relationship between the continuous uniform distribution and the discrete uniform distribution. The picture below illustrates this relationship and also helps us see how a continuous distribution might arise as a limit of discrete distributions.
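We can also watch this happen numerically. Here’s a minimal sketch in Python (the helper `discrete_prob` and the particular grid sizes are just for illustration): put a discrete uniform distribution on \(n\) evenly spaced points in \([-10,10]\) and compute the probability of landing in \((1,3)\), which should approach the continuous answer of \(0.1\).

```python
# Approximate the continuous uniform distribution on [-10, 10] by a
# discrete uniform distribution on n evenly spaced grid points, each
# carrying probability 1/n.
def discrete_prob(n, a=1, b=3):
    points = [-10 + 20 * k / (n - 1) for k in range(n)]
    hits = sum(1 for x in points if a < x < b)  # grid points landing in (a, b)
    return hits / n

for n in [8, 25, 123, 4567]:
    print(n, discrete_prob(n))  # approaches P(1 < X < 3) = 0.1 as n grows
```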
Probability density functions
One common way to define a continuous distribution is by integrating a non-negative function \(f\) with total integral 1. That is, \(f\) should satisfy
- \(f(x) \geq 0\) for all \(x\in\mathbb R\) and
- \(\displaystyle \int_{-\infty}^{\infty} f(x) \ dx = 1.\)
Then, if \(X\) is a random variable with distribution \(f\), we have \[P(a<X<b) = \int_a^b f(x) \ dx.\]
To generate the uniform distribution over an interval from \(x=A\) to \(x=B\), for example, we could define \[f(x) = \begin{cases} 1/(B-A) & A<x<B \\ 0 & \text{else}.\end{cases}\]
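As a quick check (a sketch using SciPy’s `quad`; SciPy is assumed to be available), we can verify that this density satisfies both conditions and reproduces the earlier computation of \(P(1<X<3)\):

```python
from scipy.integrate import quad

A, B = -10, 10

def f(x):
    # Uniform density: constant 1/(B - A) on (A, B), zero outside.
    return 1 / (B - A) if A < x < B else 0

total, _ = quad(f, -20, 20, points=[A, B])  # total mass: should be 1
prob, _ = quad(f, 1, 3)                     # P(1 < X < 3): should be 0.1
print(total, prob)
```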
Another example
As another simple example, let’s take \[f(x) = \begin{cases} \frac{3}{10}(x^2+1)(2-x) & 0<x<2 \\ 0 & \text{else}.\end{cases}\] It’s not too hard to show that \[\int_{-\infty}^{\infty} f(x) \ dx = \int_0^2 \frac{3}{10}(x^2+1)(2-x) \ dx = 1.\] If \(X\) has distribution \(f\) and we want to know, for example, \(P(\frac{1}{2}<X<1)\), we can compute directly: \[P(\frac{1}{2} < X < 1) = \int_{1/2}^1 \frac{3}{10}(x^2+1)(2-x) \ dx = \frac{187}{640} = 0.2921875.\]
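Both computations are easy to verify symbolically; here’s a sketch using SymPy (assumed to be available):

```python
from sympy import symbols, integrate, Rational

x = symbols('x')
f = Rational(3, 10) * (x**2 + 1) * (2 - x)

print(integrate(f, (x, 0, 2)))               # total mass: 1
print(integrate(f, (x, Rational(1, 2), 1)))  # P(1/2 < X < 1): 187/640
```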
The exponential distribution
Many important continuous distributions are positive over an unbounded interval. The exponential distribution, for example, has the form
\[f_{\lambda}(x) = \begin{cases} \lambda e^{-\lambda x} & x > 0 \\ 0 & \text{else}.\end{cases}\]
Note that the exponential distribution depends on a parameter, \(\lambda\). No matter what the value of \(\lambda\), though, we have
\[\int_{-\infty}^{\infty} f_{\lambda}(x) \ dx = \int_0^{\infty} \lambda e^{-\lambda x} \ dx = 1.\]
The larger \(\lambda\) is, the more concentrated the distribution is near zero.
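To see this concretely, note that \(\int_0^1 \lambda e^{-\lambda x} \ dx = 1 - e^{-\lambda}\), so \(P(0<X<1)\) is easy to tabulate for a few values of \(\lambda\) (a minimal sketch in pure Python):

```python
from math import exp

# P(0 < X < 1) = 1 - exp(-lam), since -exp(-lam * x) is an
# antiderivative of lam * exp(-lam * x).
for lam in [0.5, 1, 2, 5]:
    print(lam, 1 - exp(-lam))  # the mass near zero grows with lambda
```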
PDF vs CDF
Probability and statistics are rife with acronyms. Here are a couple of important ones:
- PDF or Probability Density Function
- CDF or Cumulative Distribution Function
You might see Density and Distribution swapped in the wild, but the essential difference between PDF and CDF is that the PDF is local, while the CDF is cumulative. Put another way, the PDF \(f\) is the function that you integrate to get the CDF \(F\):
\[F(x) = \int_{-\infty}^x f(\chi) \ d\chi.\]
The CDF of a continuous random variable is always continuous and non-decreasing, with \[\lim_{x\to -\infty} F(x) = 0 \: \text{ and } \: \lim_{x\to\infty} F(x) = 1.\]
Here’s the CDF for the exponential distribution with \(\lambda = 1\):
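In this case we can find the CDF in closed form: for \(x > 0\), \[F(x) = \int_0^x e^{-t} \ dt = 1 - e^{-x},\] while \(F(x) = 0\) for \(x \leq 0\). It’s easy to check that this function is continuous, non-decreasing, and has the correct limits at \(\pm\infty\).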
Mean and variance
We already know the mean and variance for discrete random variables:
- \(\displaystyle \mu = \sum_{i} x_i \ p(x_i)\) and
- \(\displaystyle \sigma^2 = \sum_{i} (x_i-\mu)^2 \ p(x_i)\).
There are analogous concepts for continuous random variables:
- \(\displaystyle \mu = \int_{-\infty}^{\infty} x \ f(x) \ dx\)
- \(\displaystyle \sigma^2 = \int_{-\infty}^{\infty} (x-\mu)^2 \ f(x) \ dx\).
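To illustrate, here’s a sketch (again assuming SciPy is available) that applies these formulas to the density \(f(x) = \frac{3}{10}(x^2+1)(2-x)\) from earlier:

```python
from scipy.integrate import quad

def f(x):
    # The polynomial density from the earlier example, supported on (0, 2).
    return 0.3 * (x**2 + 1) * (2 - x) if 0 < x < 2 else 0

mu, _ = quad(lambda x: x * f(x), 0, 2)             # mean: 0.88
var, _ = quad(lambda x: (x - mu)**2 * f(x), 0, 2)  # variance: 0.2656
print(mu, var)
```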
As with discrete distributions, the variance is the square of the standard deviation. Both variance and standard deviation are important:
- The standard deviation has the same units as the data
- The variance is simpler algebraically
- For example, the variance of a sum of independent random variables is the sum of their variances.
Exponential mean and variance
The mean and variance of the exponential distribution can be computed using integration by parts:
\[\mu = \int_0^{\infty} \lambda x e^{-\lambda x} \ dx = \frac{1}{\lambda}\] \[\sigma^2 = \int_0^{\infty} \lambda (x-1/\lambda)^2 e^{-\lambda x} \ dx = \frac{1}{\lambda^2}\]
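For the mean, for example, take \(u = x\) and \(dv = \lambda e^{-\lambda x} \ dx\), so that \(du = dx\) and \(v = -e^{-\lambda x}\); then \[\mu = \int_0^{\infty} \lambda x e^{-\lambda x} \ dx = \Bigl[-x e^{-\lambda x}\Bigr]_0^{\infty} + \int_0^{\infty} e^{-\lambda x} \ dx = 0 + \frac{1}{\lambda}.\] The variance computation runs along the same lines.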
As \(\lambda\) increases, the distribution becomes more concentrated near zero; thus, the mean and variance both decrease to zero.
Which integration techniques do we need to know?
- We all need to know basic integration techniques up to and including \(u\)-substitution.
- Changing bounds of integration in \(u\)-subs is of particular importance.
- We’ll need to understand improper integrals, since the most important continuous distributions are defined over unbounded intervals.
- It’s worth knowing about integration by parts but it’s unlikely that specific IBP problems would occur on a quiz or exam.
- It’s also worth knowing about numerical integration.
- Probabilities involving the normal distribution, for example, must be computed numerically, as in the sketch below.
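The standard normal density has no elementary antiderivative, so even the classic fact that a standard normal variable lands within one standard deviation of its mean about 68% of the time comes from a numerical integral (a sketch assuming SciPy is available):

```python
from math import exp, pi, sqrt
from scipy.integrate import quad

def phi(x):
    # Standard normal density; it has no elementary antiderivative,
    # so probabilities must be computed numerically.
    return exp(-x**2 / 2) / sqrt(2 * pi)

prob, _ = quad(phi, -1, 1)  # P(-1 < Z < 1)
print(prob)                 # approximately 0.6827
```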