Measure Theory – B

Abstract Integration

Definition:
If (\Omega, M, \mu) is a measure space, and s:\Omega \rightarrow [0,\infty) is a measurable simple function of the form

    \begin{align*} s &= \sum_{i = 1}^{n} \alpha_i I_{A_i} \end{align*}

where \alpha_1,...,\alpha_n are the distinct values of s and A_i = \set{x \in \Omega : s(x) = \alpha_i}, and if E \in M, we define (with the convention 0 \cdot \infty = 0)

    \begin{align*} \int_E sd\mu &= \sum_{i=1}^{n} \alpha_i \mu(A_i \cap E). \end{align*}

If f:\Omega \rightarrow [0,\infty] is measurable, and E \in M, we define

    \begin{align*} \int_E f d\mu &= \sup \int_E sd\mu \end{align*}

the supremum taken over all simple measurable functions s such that 0 \leq s \leq f.
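
As a concrete check of these definitions, take \mu to be Lebesgue measure m on \mathbb{R} (used here purely for illustration; it is not constructed in this section) and the simple function s = 2 I_{[0,1]} + 5 I_{(1,3]}. Then

    \begin{align*} \int_{\mathbb{R}} s\,dm &= 2 \cdot m([0,1]) + 5 \cdot m((1,3]) = 2 \cdot 1 + 5 \cdot 2 = 12, \end{align*}

and over E = [0,2] the same formula gives 2 \cdot m([0,1]) + 5 \cdot m((1,2]) = 7.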


Theorem: Lebesgue’s Monotone Convergence Theorem
Let \set{f_n} be a sequence of measurable functions on \Omega, and suppose that

  • 0 \leq f_1(x) \leq f_2(x) \leq ... \leq \infty for all x \in \Omega,
  • f_n(x) \rightarrow f(x) as n \rightarrow \infty, for all x \in \Omega.

Then f is measurable, and

        \begin{align*} \int_{\Omega} f_n d\mu \rightarrow \int _{\Omega} f d \mu \end{align*}
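
As a worked instance (with Lebesgue measure on (0,1], assumed only for illustration), take f(x) = x^{-1/2} and f_n = \min(f, n): the f_n are non-negative, non-decreasing in n, and converge pointwise to f. Since f_n = n exactly when x \leq 1/n^2,

    \begin{align*} \int_{(0,1]} f_n\,dm &= \int_0^{1/n^2} n\,dx + \int_{1/n^2}^{1} x^{-1/2}\,dx = \frac{1}{n} + \left( 2 - \frac{2}{n} \right) = 2 - \frac{1}{n} \longrightarrow 2 = \int_{(0,1]} f\,dm \end{align*}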


Theorem:
If f_n:\Omega \rightarrow [0,\infty] is measurable, for n = 1,2,3,..., and

\begin{align*} f(x) &= \sum_{n = 1}^{\infty}f_n(x) & (x\in \Omega) \end{align*}

then

    \begin{align*} \int_\Omega f d\mu &= \sum_{n = 1}^{\infty} \int_{\Omega} f_nd\mu \end{align*}
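
A quick worked check on [0,1] with Lebesgue measure: take f_n(x) = x^n/n! for n = 1,2,3,..., so that f(x) = e^x - 1. Both sides of the conclusion equal e - 2:

    \begin{align*} \int_0^1 (e^x - 1)\,dx &= e - 2, & \sum_{n=1}^{\infty} \int_0^1 \frac{x^n}{n!}\,dx &= \sum_{n=1}^{\infty} \frac{1}{(n+1)!} = e - 2 \end{align*}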


Theorem: Fatou’s Lemma
If f_n: \Omega \rightarrow [0,\infty] is measurable, for each positive integer n, then

    \begin{align*} \int_\Omega \left( \liminf_{n \rightarrow \infty} f_n \right)d\mu \leq \liminf_{n\rightarrow \infty}\int_{\Omega}f_nd\mu \end{align*}
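
The inequality can be strict. On \mathbb{R} with Lebesgue measure, let f_n = I_{[n, n+1]}; then f_n(x) \rightarrow 0 for every x while \int f_n dm = 1 for every n, so

    \begin{align*} \int_{\mathbb{R}} \left( \liminf_{n \rightarrow \infty} f_n \right) dm = 0 < 1 = \liminf_{n \rightarrow \infty} \int_{\mathbb{R}} f_n\,dm \end{align*}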


Definition:
We define L^1(\mu) to be the collection of all complex measurable functions f on \Omega for which

    \begin{align*} \int_{\Omega}\abs{f}d\mu < \infty \end{align*}
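
For example, on [1,\infty) with Lebesgue measure, f(x) = 1/x^2 belongs to L^1(m) while f(x) = 1/x does not:

    \begin{align*} \int_1^{\infty} \frac{dx}{x^2} &= 1 < \infty, & \int_1^{\infty} \frac{dx}{x} &= \infty \end{align*}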

Theorem: Lebesgue’s Dominated Convergence Theorem
Suppose \set{f_n} is a sequence of complex measurable functions on \Omega such that

    \begin{align*} f(x) &= \lim_{n \rightarrow \infty} f_n(x) \end{align*}

exists for every x \in \Omega. If there is a function g \in L^1(\mu) such that

    \begin{align*} \abs{f_n(x)} &\leq g(x) & (n = 1,2,3...; x \in \Omega) \end{align*}

then f \in L^1(\mu),

\[\lim_{n \rightarrow \infty} \int_\Omega \abs{f_n - f} d\mu = 0\]

and

\[\lim_{n \rightarrow \infty} \int_\Omega f_n d\mu = \int_\Omega fd\mu\]
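
A standard worked application on (0,\infty) with Lebesgue measure: since \abs{\sin u} \leq \abs{u}, the functions f_n(x) = n\sin(x/n)/(x(1+x^2)) satisfy \abs{f_n} \leq g with g(x) = 1/(1+x^2) \in L^1(m), and f_n(x) \rightarrow 1/(1+x^2) pointwise, so the theorem gives

    \begin{align*} \lim_{n \rightarrow \infty} \int_0^{\infty} \frac{n \sin(x/n)}{x(1+x^2)}\,dx &= \int_0^{\infty} \frac{dx}{1+x^2} = \frac{\pi}{2} \end{align*}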

Discrete and Continuous Random Variables

Definition: A random variable

    \begin{align*} X: \Omega \longrightarrow E \end{align*}

is a measurable function, where (\Omega, M, p) is a probability space and E is a measurable space. A random variable does not return a probability; it can be thought of as returning the result of an experiment whose outcome we do not know ahead of time.

Example: How many outcomes are there if we throw five dice?

Solution: Let E_i be the set of all possible outcomes for the ith die, so E_i = \set{1,2,3,4,5,6}. The number of outcomes of throwing five dice equals the number of ways we can choose an element from each of the E_i, so the number of outcomes is 6^5.


Definition: If X is a random variable from \Omega to \mathbb{R}, then the function defined on \mathbb{R} by F(t) = P(X\leq t) is called the \textbf{distribution function} of X. In the language of measure theory we define the function as

    \begin{align*} F(t) &= \int_{\set{X \leq t}} dp = p\left( \set{X \leq t} \right) \end{align*}

If F is a step function, constant except for jumps at the points a_i, and we denote p_i = F(a_i) - F(a_i-), then the collection of p_i's is the traditional probability mass function of a discrete random variable (see the worked example after the list below). The following properties of F hold regardless of whether X is discrete, continuous, or neither.

  • F is non-decreasing;
  • \lim_{t\rightarrow \infty} F(t) = 1;
  • \lim_{t\rightarrow -\infty} F(t) = 0;
  • F is right-continuous;
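
Worked example (referenced above): let X be the value shown by one fair die, so P(X = k) = 1/6 for k = 1,...,6. Then F is the step function

    \begin{align*} F(t) &= \begin{cases} 0 & t < 1 \\ \lfloor t \rfloor / 6 & 1 \leq t < 6 \\ 1 & t \geq 6 \end{cases} \end{align*}

which is non-decreasing, right-continuous, has the correct limits at \pm\infty, and has jumps p_i = F(i) - F(i-) = 1/6 at the points a_i = i, recovering the probability mass function.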

Useful identities involving Distribution Functions (each is applied to the fair-die example after this list):

  • If we want to calculate P(X > a) we keep in mind that P(X > a) = 1 - P(X \leq a), so

        \begin{align*} P(X > a) &= 1 - F(a) \end{align*}

  • To calculate P(a < X \leq b), b > a, we notice that \set{a < X \leq b} = \set{X \leq b} - \set{X \leq a}, since \set{X \leq a} \subseteq \set{X \leq b}, so we have

        \begin{align*} P(a < X \leq b) = P(X \leq b) - P(X \leq a) = F(b) - F(a) \end{align*}

  • If we want to calculate P(X < a), note that F is defined using \leq, so we cannot simply evaluate F at a; instead we approximate \set{X < a} from below with the sets

        \[E_n = \set{X \leq a - \frac{1}{n}}\]

    So then

\[\lim_{n \rightarrow \infty} E_n = \bigcup_{n = 1}^\infty \set{X \leq a - \frac{1}{n}} = \set{X < a}\]

    and since probability measures are continuous from below (the measure of an increasing union of sets is the limit of their measures),

        \begin{align*} \lim_{n \rightarrow \infty }P(X \leq a - \frac{1}{n}) &= P \left( \bigcup_{n = 1}^{\infty} \left\{ X \leq a - \frac{1}{n} \right\} \right) \\ &= P(X < a), \end{align*}

    and so

        \begin{align*} P(X < a) &= \lim_{n \rightarrow \infty} F \left( a - \frac{1}{n}\right) = F(a-). \end{align*}

    where F(a-) denotes the left-hand limit of F at a.

  • To calculate P(X \geq a), take complements: \set{X \geq a} = \Omega - \set{X < a}, so

        \begin{align*} P(X \geq a) &= 1 - F(a-) \end{align*}

  • To calculate P(X = a) we note that \set{X = a} = \set{X \leq a} - \set{X < a}, so

        \begin{align*} P(X = a) &= P(X \leq a) - P(X < a) \\ &= F(a) - F(a-) \end{align*}
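
Checking each identity against the fair-die F above:

    \begin{align*} P(X > 4) &= 1 - F(4) = \frac{1}{3}, & P(2 < X \leq 5) &= F(5) - F(2) = \frac{1}{2}, \\ P(X < 3) &= F(3-) = \frac{1}{3}, & P(X = 3) &= F(3) - F(3-) = \frac{1}{6}. \end{align*}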

Expected Values and Variances

Definition: The expected value of a random variable X defined on (\Omega, M, p), a probability space, is

    \begin{align*} E[X] &= \int_\Omega Xdp. \end{align*}

We also call the expected value the \textbf{mean}.
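
For the fair die, with \Omega = \set{1,...,6}, p(\set{\omega}) = 1/6, and X(\omega) = \omega, the integral reduces to a finite sum:

    \begin{align*} E[X] &= \int_\Omega X dp = \sum_{\omega = 1}^{6} \omega \cdot \frac{1}{6} = \frac{7}{2} \end{align*}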


Theorem: Let X:\Omega \rightarrow \mathbb{R} be a random variable on (\Omega, M, p) and let g:\mathbb{R}\rightarrow \mathbb{R} be a (Borel) measurable real-valued function; then g(X) is a random variable, with

    \begin{align*} E[g(X)] &= \int_{\Omega}g(X)dp \end{align*}
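
For instance, with the fair die and g(x) = x^2,

    \begin{align*} E[X^2] &= \int_{\Omega} X^2 dp = \sum_{\omega = 1}^{6} \omega^2 \cdot \frac{1}{6} = \frac{91}{6} \end{align*}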


Definition: Let X be a random variable defined on (\Omega, M, p), a probability space, the variance (V[X]) and standard deviation (\sigma_X) are defined by

    \begin{align*} V[X] &= E\left[(X - E[X])^2\right] \\ \sigma_X &= \sqrt{E\left[(X - E[X])^2 \right]} \end{align*}

respectively.


Theorem:
V[X] = E[X^2] - \left[ E[X] \right]^2
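
The shortcut follows by expanding the square and using linearity of the integral, writing \mu = E[X]:

    \begin{align*} V[X] &= E\left[ X^2 - 2\mu X + \mu^2 \right] = E[X^2] - 2\mu E[X] + \mu^2 = E[X^2] - \mu^2 \end{align*}

For the fair die this gives V[X] = 91/6 - (7/2)^2 = 35/12.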


Theorem: Let X be a random variable defined on (\Omega, M, p), a probability space with mean \mu. Then V[X] = 0 if and only if X is constant with probability 1.


Theorem: Let X be a random variable defined on (\Omega, M, p), a probability space; then for constants a,b we have

    \begin{align*} V[aX +b] &= a^2 V[X] \\ \sigma_{aX + b} &= \abs{a}\sigma_X \end{align*}
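
Both identities follow from linearity of E: since E[aX + b] = aE[X] + b,

    \begin{align*} V[aX + b] &= E\left[ \left( aX + b - (aE[X] + b) \right)^2 \right] = E\left[ a^2 (X - E[X])^2 \right] = a^2 V[X], \end{align*}

and taking square roots gives \sigma_{aX + b} = \abs{a}\sigma_X.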


Moment Generating Functions

Definition: Let X and Y be two random variables defined on (\Omega, M, p), a probability space, and let \omega \in \mathbb{R} be a given point. If for all t > 0,

    \begin{align*} P \left( \abs{Y - \omega} \leq t \right) \leq P\left( \abs{X - \omega} \leq t \right) \end{align*}

then we say that X is more concentrated about \omega than Y.


If we let \mu = E[X] and \mu_X^{(r)} = E\left[ (X - \mu)^r \right] (the rth central moment), these quantities carry important information about the random variable we are working with:

  • \mu – Expected Value
  • \mu_X^{(2)} – Variance
  • \mu_X^{(3)}/\sigma_X^3 – Skewness, a measure of the asymmetry of a random variable: if it is negative the r.v. is skewed to the left, if it is positive it is skewed to the right, and if it is zero the r.v. is symmetric (see the worked example after this list).
  • \mu_X^{(4)}/\sigma_X^4 – Kurtosis, a measure of the relative flatness of the distribution: the standard normal distribution has kurtosis 3, so if X has kurtosis greater than 3 it is more peaked than the standard normal, and if it has kurtosis less than 3 it is flatter.
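
Worked example (referenced above): the fair die is symmetric about \mu = 7/2, so every odd central moment vanishes and the skewness is 0. Its fourth central moment is \mu_X^{(4)} = 707/48, and with V[X] = 35/12 the kurtosis is

    \begin{align*} \frac{\mu_X^{(4)}}{\sigma_X^4} &= \frac{707/48}{(35/12)^2} = \frac{2121}{1225} \approx 1.73, \end{align*}

so the die distribution is flatter than the standard normal.
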
Basic moment definitions:

    E[g(X)]                        Definition
    E[X^n]                         The nth moment of X
    E\left[\abs{X}^r\right]        The rth absolute moment of X
    E\left[X - c\right]            The first moment of X about c
    E\left[(X - c)^n\right]        The nth moment of X about c
    E\left[(X - \mu)^n\right]      The nth central moment of X