Some Gamma Function Notes
The Gamma Function is a particular form of integral that is commonly seen in probability problems:
\(\Gamma(\alpha) = \int_{0}^{\infty}x^{\alpha - 1}e^{-x}dx\)
If \(\alpha\) is a positive integer, the Gamma Function reduces to a factorial:
\(\Gamma(\alpha) = (\alpha - 1)!\)
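Shifting the argument by one raises the power of \(x\) in the integrand by one, and integrating by parts gives the recurrence:
\(\Gamma(\alpha + 1) = \alpha\Gamma(\alpha) = \int_{0}^{\infty}x^{\alpha}e^{-x}dx\)
Applying the recurrence \(n\) times gives the general form:
\(\Gamma(\alpha + n) = (\alpha+n-1)(\alpha+n-2)\cdots(\alpha+1)\alpha\Gamma(\alpha)\)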
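As a quick numerical sanity check, the sketch below uses Python's standard `math` module to compare the integral definition, the factorial formula, the recurrence, and the shifted-argument product against `math.gamma`. The helpers `gamma_integral` and `gamma_shift` are hypothetical names for illustration, and truncating the integral at 60 with a plain midpoint rule is an assumption that is only accurate for moderate \(\alpha \ge 1\).

```python
import math


def gamma_integral(alpha: float, upper: float = 60.0, steps: int = 200_000) -> float:
    """Midpoint-rule approximation of the integral of x^(alpha-1) * e^(-x) over [0, upper].

    Truncating at `upper` assumes the tail is negligible (true for moderate alpha).
    """
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x ** (alpha - 1) * math.exp(-x)
    return total * h


def gamma_shift(alpha: float, n: int) -> float:
    """Gamma(alpha + n) as (alpha+n-1)(alpha+n-2)...(alpha+1) * alpha * Gamma(alpha)."""
    result = math.gamma(alpha)
    for j in range(n):
        result *= alpha + j
    return result


# Positive integers: Gamma(k) = (k - 1)!
factorial_ok = all(math.isclose(math.gamma(k), math.factorial(k - 1)) for k in range(1, 10))

alpha = 2.7  # arbitrary non-integer value
print(factorial_ok)
print(gamma_integral(alpha), math.gamma(alpha))          # integral definition vs library
print(math.gamma(alpha + 1), alpha * math.gamma(alpha))  # recurrence
print(gamma_shift(alpha, 4), math.gamma(alpha + 4))      # general shift with n = 4
```

Each printed pair should agree to many digits. For \(\alpha < 1\) the integrand blows up at zero, so a plain midpoint rule would need far more steps there.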
Joint Density for Independent Normal Variables
Let \(X_1, \dots, X_n\) be independent, normally distributed variables with mean \(\mu\) and known variance \(\sigma^2\):
\( X_1,..., X_n \overset{iid}{\sim} N(\mu, \sigma^2) \)
The joint density is:
\begin{align}
f_{\theta}(x_1,...,x_n) &= \prod_{i=1}^{n} \frac{1}{\sqrt{2 \pi \sigma^2}}e^{-\frac{1}{2\sigma^2}(x_i-\mu)^2} \\
&= (2 \pi \sigma^2)^{-\frac{n}{2}} \prod_{i=1}^{n} e^{-\frac{1}{2\sigma^2}(x_i-\mu)^2} \\
&= (2 \pi \sigma^2)^{-\frac{n}{2}} e^{-\frac{1}{2\sigma^2} \sum_{i=1}^n (x_i-\mu)^2}
\end{align}
Expanding the square in the exponent, \((x_i-\mu)^2 = x_i^2 - 2x_i\mu + \mu^2\), gives:
\begin{align}
f_{\theta}(x_1,...,x_n) &= (2 \pi \sigma^2)^{-\frac{n}{2}} e^{-\frac{1}{2\sigma^2} \sum_{i=1}^n (x_i-\mu)^2} \\
&= (2 \pi \sigma^2)^{-\frac{n}{2}} e^{-\frac{1}{2\sigma^2} \sum_{i=1}^n (x_i^2 - 2x_i\mu + \mu^2)} \\
&= (2 \pi \sigma^2)^{-\frac{n}{2}} e^{-\frac{1}{2\sigma^2} \sum_{i=1}^n x_i^2 + \frac{1}{2\sigma^2} \sum_{i=1}^n 2x_i\mu -\frac{1}{2\sigma^2} \sum_{i=1}^n \mu^2} \\
&= (2 \pi \sigma^2)^{-\frac{n}{2}} e^{-\frac{1}{2\sigma^2} \sum_{i=1}^n x_i^2 + \frac{\mu}{\sigma^2} \sum_{i=1}^n x_i -\frac{n\mu^2}{2\sigma^2}} \\
&= (2 \pi \sigma^2)^{-\frac{n}{2}} e^{-\frac{1}{2\sigma^2} \sum_{i=1}^n x_i^2} e^{\frac{\mu}{\sigma^2} \sum_{i=1}^n x_i } e^{-\frac{n\mu^2}{2\sigma^2}}
\end{align}
Now set \( T= \sum_{i=1}^n x_i \) and rearrange the factors:
\begin{align}
f_{\theta}(x_1,...,x_n) &= (2 \pi \sigma^2)^{-\frac{n}{2}} e^{-\frac{1}{2\sigma^2} \sum_{i=1}^n x_i^2} e^{\frac{\mu}{\sigma^2} \sum_{i=1}^n x_i } e^{-\frac{n\mu^2}{2\sigma^2}} \\
&= (2 \pi \sigma^2)^{-\frac{n}{2}} e^{-\frac{1}{2\sigma^2} \sum_{i=1}^n x_i^2} e^{\frac{\mu}{\sigma^2} T } e^{-\frac{n\mu^2}{2\sigma^2}} \\
&= (2 \pi \sigma^2)^{-\frac{n}{2}} e^{-\frac{n\mu^2}{2\sigma^2}} e^{\frac{\mu}{\sigma^2} T } e^{-\frac{1}{2\sigma^2} \sum_{i=1}^n x_i^2}
\end{align}
The density therefore factors into \( g_{\mu}(T) = (2 \pi \sigma^2)^{-\frac{n}{2}} e^{-\frac{n\mu^2}{2\sigma^2}} e^{\frac{\mu}{\sigma^2} T } \), which depends on the data only through \(T\), and \( h(x_1,...,x_n) = e^{-\frac{1}{2\sigma^2} \sum_{i=1}^n x_i^2} \), which does not involve \(\mu\). By the factorization theorem, \(T\) is a sufficient statistic for \( \mu \).
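A small Python check of the factorization, assuming a made-up sample and a known variance \(\sigma^2 = 1.5\) (both purely illustrative): the direct product of normal densities should equal \(g_{\mu}(T)\,h(x_1,...,x_n)\) for every \(\mu\), and two samples with the same \(T\) should have a density ratio that does not change with \(\mu\). The function names are hypothetical.

```python
import math


def joint_density_direct(xs, mu, sigma2):
    """Product of the individual N(mu, sigma2) densities."""
    return math.prod(
        math.exp(-((x - mu) ** 2) / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2) for x in xs
    )


def g(t, n, mu, sigma2):
    """Factor that depends on the data only through T = sum of the x_i."""
    return (2 * math.pi * sigma2) ** (-n / 2) * math.exp(-n * mu ** 2 / (2 * sigma2)) * math.exp(mu * t / sigma2)


def h(xs, sigma2):
    """Factor that does not involve mu."""
    return math.exp(-sum(x * x for x in xs) / (2 * sigma2))


xs = [1.2, -0.4, 2.3, 0.9, 1.6]  # made-up sample, T = 5.6
ys = [0.2, 0.6, 1.3, 1.9, 1.6]   # a different made-up sample with the same T
sigma2 = 1.5                     # assumed known variance
mus = (-1.0, 0.0, 0.8, 2.5)

for mu in mus:
    direct = joint_density_direct(xs, mu, sigma2)
    factored = g(sum(xs), len(xs), mu, sigma2) * h(xs, sigma2)
    ratio = direct / joint_density_direct(ys, mu, sigma2)
    print(f"mu={mu}: direct={direct:.6e}, factored={factored:.6e}, ratio={ratio:.6f}")
```

The constant ratio is what sufficiency means in practice: once \(T\) is known, the rest of the sample carries no further information about \(\mu\).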
Components of the Mean Square Error of an Estimator
If \(\hat\theta\) is an estimator of a parameter \(\theta\), the mean square error of the estimator is defined as:
$$
MSE[\hat\theta] = E\Big[(\hat\theta - \theta)^2\Big]
$$
The MSE splits into two components: the variance of the estimator, which measures its precision, and the bias of the estimator, which measures its accuracy:
\begin{align}
MSE[\hat\theta] &= E\Big[(\hat\theta - \theta)^2\Big] \\
&= E[(\hat\theta - \theta)(\hat\theta - \theta)] \\
&= E\Big[\hat\theta^2 - 2\hat\theta\theta + \theta^2\Big] \\
&= E\Big[\hat\theta^2\Big] - E[2\hat\theta\theta] + E\Big[\theta^2\Big]
\end{align}
Since \(\theta\) is a fixed constant rather than a random variable, \(E[2\hat\theta\theta] = 2\theta E[\hat\theta]\) and \(E[\theta^2] = \theta^2\), so:
\begin{align}
MSE[\hat\theta] &= E\Big[\hat\theta^2\Big] - E[2\hat\theta\theta] + E\Big[\theta^2\Big] \\
&= E\Big[\hat\theta^2\Big] - 2\theta E[\hat\theta] + \theta^2
\end{align}
By the definition of variance:
\begin{align}
Var[\hat\theta] &= E\Big[\hat\theta^2\Big] - (E[\hat\theta])^2 \\
E\Big[\hat\theta^2\Big] &= Var[\hat\theta] + (E[\hat\theta])^2
\end{align}
Substituting this for \(E\Big[\hat\theta^2\Big]\) in the equation for the MSE and completing the square gives:
\begin{align}
MSE[\hat\theta] &= E\Big[\hat\theta^2\Big] - 2\theta E[\hat\theta] + \theta^2 \\
&= Var[\hat\theta] + (E[\hat\theta])^2 - 2\theta E[\hat\theta] + \theta^2 \\
&= Var[\hat\theta] + (E[\hat\theta] - \theta)^2 \\
&= Var[\hat\theta] + (B[\hat\theta])^2
\end{align}
where \(B[\hat\theta] = E[\hat\theta] - \theta\) is the bias of the estimator.
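The decomposition can be checked by simulation. This standard-library Python sketch, with an arbitrarily chosen seed, sample size, and \(N(0, 4)\) population, uses the divide-by-\(n\) sample variance as \(\hat\theta\), an estimator known to be biased downward by \(\sigma^2/n\). The same identity holds exactly for the empirical mean and variance of the simulated estimates, so `mse` and `variance + bias ** 2` agree up to rounding.

```python
import random
import statistics

random.seed(1)
sigma = 2.0
true_var = sigma ** 2   # theta: the population variance being estimated
n, reps = 10, 5000      # illustrative sample size and number of replications

# theta_hat: the divide-by-n sample variance, a deliberately biased estimator of sigma^2
estimates = [
    statistics.pvariance([random.gauss(0.0, sigma) for _ in range(n)]) for _ in range(reps)
]

mse = statistics.fmean((e - true_var) ** 2 for e in estimates)  # E[(theta_hat - theta)^2]
variance = statistics.pvariance(estimates)                       # Var[theta_hat]
bias = statistics.fmean(estimates) - true_var                    # B[theta_hat]

print(f"MSE = {mse:.4f}")
print(f"Var + Bias^2 = {variance + bias ** 2:.4f}")
print(f"Bias = {bias:.4f} (theory: -sigma^2/n = {-true_var / n:.4f})")
```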
MLE for Beta(1, theta)
This shows how to find the maximum likelihood estimator for \(\theta\) for a \(Beta(1, \theta)\) random variable.
The data \(X_1, \dots, X_n\) are independent draws from:
$$
X \sim Beta(1, \theta)
$$
The likelihood function, obtained from the \(Beta(\alpha, \beta)\) density with \(\alpha = 1\) and \(\beta = \theta\), is:
\begin{align}
L(\theta | X_1, ..., X_n) &= \prod_{i = 1}^{n} \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}X_i^{\alpha - 1}(1 - X_i)^{\beta - 1} \\
&= \prod_{i = 1}^{n} \frac{\Gamma(1 + \theta)}{\Gamma(1)\Gamma(\theta)}X_i^{1 - 1}(1 - X_i)^{\theta - 1} \\
&= \prod_{i = 1}^{n} \frac{\Gamma(1 + \theta)}{\Gamma(1)\Gamma(\theta)}X_i^{0}(1 - X_i)^{\theta - 1} \\
&= \prod_{i = 1}^{n} \frac{\Gamma(1 + \theta)}{\Gamma(1)\Gamma(\theta)} (1 - X_i)^{\theta - 1}
\end{align}
Note that \(\Gamma(1) = (1 - 1)! = 0! = 1\). The recurrence \(\Gamma(1 + \theta) = \theta\Gamma(\theta)\) holds for every \(\theta > 0\), not only for integers, so \(\frac{\Gamma(1 + \theta)}{\Gamma(\theta)} = \theta\)
and therefore \( \frac{\Gamma(1 + \theta)}{\Gamma(1)\Gamma(\theta)} = \theta \):
\begin{align}
L(\theta | X_1, ..., X_n) &= \prod_{i = 1}^{n} \frac{\Gamma(1 + \theta)}{\Gamma(1)\Gamma(\theta)}(1 - X_i)^{\theta - 1} \\
&= \prod_{i = 1}^{n} \theta (1 - X_i)^{\theta - 1} \\
&= \theta^n \prod_{i = 1}^{n} (1 - X_i)^{\theta - 1} \\
\end{align}
The log-likelihood function is:
\begin{align}
l(\theta | X_1, ..., X_n) &= n\log(\theta) + (\theta - 1)\sum_{i = 1}^{n}\log(1 - X_i) \\
&= n\log(\theta) + \theta \sum_{i = 1}^{n}\log(1 - X_i) - \sum_{i = 1}^{n}\log(1 - X_i)
\end{align}
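To confirm numerically that the normalizing constant reduces to \(\theta\) even for non-integer \(\theta\), this Python sketch sums \(Beta(1, \theta)\) log-densities computed with `math.lgamma` and compares them with the simplified log-likelihood \(n\log(\theta) + (\theta - 1)\sum_{i=1}^{n}\log(1 - X_i)\). The observations are made up and the function names are hypothetical.

```python
import math


def beta_logpdf(x, a, b):
    """Log-density of Beta(a, b) using the Gamma-function normalizing constant."""
    log_const = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_const + (a - 1) * math.log(x) + (b - 1) * math.log1p(-x)


def loglik_direct(theta, xs):
    """Sum of Beta(1, theta) log-densities over the sample."""
    return sum(beta_logpdf(x, 1.0, theta) for x in xs)


def loglik_simplified(theta, xs):
    """n*log(theta) + (theta - 1) * sum(log(1 - x_i))."""
    return len(xs) * math.log(theta) + (theta - 1) * sum(math.log1p(-x) for x in xs)


xs = [0.05, 0.31, 0.12, 0.48, 0.22, 0.09]  # made-up observations in (0, 1)
thetas = (0.5, 1.0, 2.7, 6.0)              # includes non-integer values on purpose

for theta in thetas:
    print(theta, loglik_direct(theta, xs), loglik_simplified(theta, xs))
```

`math.log1p(-x)` computes \(\log(1 - x)\) and stays accurate when \(x\) is close to zero.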
Since \(\theta\) does not appear in the last term, that term vanishes when we differentiate with respect to \(\theta\), and the first partial derivative of the log-likelihood function is:
$$
\frac{\partial l}{\partial \theta} = \frac{n}{\theta} + \sum_{i = 1}^{n}\log(1 - X_i)
$$
Setting this equal to zero and solving for \(\theta\) gives the maximum likelihood estimator:
\begin{align}
0 &= \frac{n}{\theta} + \sum_{i = 1}^{n}\log(1 - X_i) \\
-\frac{n}{\theta} &= \sum_{i = 1}^{n}\log(1 - X_i) \\
-n &= \theta \sum_{i = 1}^{n}\log(1 - X_i) \\
\hat{\theta} &= \frac{-n}{\sum_{i = 1}^{n}\log(1 - X_i)}
\end{align}
Because every \(X_i\) lies in \((0, 1)\), each \(\log(1 - X_i)\) is negative, so \(\hat\theta > 0\).
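The closed-form estimator can be cross-checked against a brute-force grid search of the log-likelihood. In this Python sketch the data are simulated with `random.betavariate(1.0, 3.0)`, so the true value \(\theta = 3\), the sample size, the seed, and the grid are all illustrative assumptions.

```python
import math
import random


def theta_mle(xs):
    """Closed-form MLE for Beta(1, theta): -n / sum(log(1 - x_i))."""
    return -len(xs) / sum(math.log1p(-x) for x in xs)


def score(theta, n, s):
    """First derivative of the log-likelihood, where s = sum(log(1 - x_i))."""
    return n / theta + s


def loglik(theta, n, s):
    """Log-likelihood n*log(theta) + (theta - 1) * s."""
    return n * math.log(theta) + (theta - 1) * s


random.seed(42)
true_theta = 3.0  # illustrative value used only to simulate data
xs = [random.betavariate(1.0, true_theta) for _ in range(500)]
n, s = len(xs), sum(math.log1p(-x) for x in xs)

theta_hat = theta_mle(xs)
grid = [0.01 * k for k in range(1, 1001)]  # crude grid over (0, 10]
theta_grid = max(grid, key=lambda t: loglik(t, n, s))

print(f"closed form: {theta_hat:.4f}, score there: {score(theta_hat, n, s):.2e}")
print(f"grid search: {theta_grid:.2f}")
```

Because the log-likelihood is concave in \(\theta\), the best grid point always lands within one grid step of \(\hat\theta\).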
The second partial derivative of the log-likelihood function with respect to \(\theta\) is:
$$
\frac{\partial^{2} l}{\partial \theta^2} = -\frac{n}{\theta^2}
$$
This is negative for every \(\theta > 0\), so \(\hat\theta\) is indeed a maximum.
The Fisher Information is:
$$
I(\theta) = -E\Bigg[\frac{\partial^{2} l}{\partial \theta^2}\Bigg] = -E\Bigg[-\frac{n}{\theta^2}\Bigg] = \frac{n}{\theta^2}
$$
For large \(n\), the variance of the maximum likelihood estimator is approximately the inverse of the Fisher information:
$$
V[\hat{\theta}] \approx \frac{1}{I(\theta)} = \frac{\theta^2}{n}
$$
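The approximation \(V[\hat\theta] \approx \theta^2/n\) can be checked by repeating the estimation on many simulated samples. In this Python sketch the settings \(\theta = 2\), \(n = 100\), and 2000 replications are arbitrary choices; since \(1/I(\theta)\) is a large-sample value, small deviations from \(\theta^2/n\) are expected at finite \(n\).

```python
import math
import random
import statistics


def theta_mle(xs):
    """Closed-form MLE for Beta(1, theta): -n / sum(log(1 - x_i))."""
    return -len(xs) / sum(math.log1p(-x) for x in xs)


random.seed(7)
theta, n, reps = 2.0, 100, 2000  # illustrative settings

# Re-estimate theta on many independent simulated samples
mles = [theta_mle([random.betavariate(1.0, theta) for _ in range(n)]) for _ in range(reps)]

simulated_var = statistics.variance(mles)
fisher_var = theta ** 2 / n  # 1 / I(theta) from the derivation above

print(f"simulated Var[theta_hat] = {simulated_var:.5f}")
print(f"theta^2 / n              = {fisher_var:.5f}")
```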
Use LaTeX in Blogger
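To use LaTeX in Blogger, enable MathJax by adding this code right after the opening header tag (<head>) in the Blogger template:
<!-- cdn.mathjax.org has been retired, so MathJax 2 is loaded from cdnjs instead -->
<!-- The configuration block is read by MathJax at startup; keep it before the loader script -->
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
extensions: ["tex2jax.js","TeX/AMSmath.js","TeX/AMSsymbols.js"],
jax: ["input/TeX", "output/HTML-CSS"],
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
displayMath: [ ['$$','$$'], ["\\[","\\]"] ]
},
"HTML-CSS": { availableFonts: ["TeX"] }
});
</script>
<script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.7/MathJax.js"></script>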
To edit the template, follow these directions:
- Sign in to Blogger.
- Choose the blog to update.
- In the left menu, click Theme.
- Under “Live on Blog,” click Edit HTML.
- Make the changes you want.
- Click Save theme.