Solution for Assignment 2 #12


Open · wants to merge 3 commits into main
244 changes: 244 additions & 0 deletions Tanmay/ProbStats1.ipynb

Large diffs are not rendered by default.

170 changes: 170 additions & 0 deletions Tanmay/ProbStats2.ipynb
@@ -0,0 +1,170 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "49cee8ee",
"metadata": {},
"source": [
"#Q1\n",
"\n",
"Bayes' Formula for upadation of probability: \n",
"$$\n",
"P(H|D) = \\dfrac{P(D|H)P(H)}{P(D)}\n",
"$$\n",
"\n",
"The Prior: P(Expert) = 0.01\n",
"Likelihoods: P(3 Bullseyes in 5 throws | Expert) can be computed as $(P(Bullseye | Expert))^3$ = $(0.7)^3(0.3)^2$, since each throw is independent. If he's not an expert, it's $(0.1)^3(0.9)^2$.\n",
"\n",
"Hence, we use the Bayesian Update,\n",
"\n",
"$$\n",
"P(Expert|3\\;Bullseyes\\;in\\;5\\;throws) = \\dfrac{P(3\\;Bullseyes\\;in\\;5\\;throws|Expert)P(Expert)}{P(3\\;Bullseyes\\;in\\;5\\;throws)}\\\\\n",
"= \\dfrac{(0.7)^3\\times(0.3)^2\\times(0.01)}{(0.1)^3\\times(0.9)^2\\times0.99 + (0.7)^3\\times(0.3)^2\n",
"\\times0.01}\\\\\n",
"\\approx 0.27795\n",
"$$\n",
"\n",
"The probability goes from 1% to $\\approx$ 28% based on his performance being way better than what would be expected of an average person.\n",
"If our prior was 20% instead of 1%, our posterior would grow to 0.9050 or $\\approx$ 90.5% since the prior data informs the posterior. Our higher belief in the original hypothesis increases our probability of it being true. "
]
},
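{
"cell_type": "code",
"execution_count": null,
"id": "q1check01",
"metadata": {},
"outputs": [],
"source": [
"# Quick numerical check of the Bayesian update above -- a sketch, not part of\n",
"# the required solution. The hit probabilities 0.7 / 0.1 and the priors\n",
"# 0.01 / 0.2 come from the problem statement; the helper name is ours.\n",
"def posterior_expert(prior, p_expert=0.7, p_average=0.1, hits=3, throws=5):\n",
"    # The binomial coefficient C(5, 3) cancels in Bayes' formula, so only\n",
"    # p**hits * (1 - p)**(throws - hits) is needed for each hypothesis.\n",
"    like_expert = p_expert**hits * (1 - p_expert)**(throws - hits)\n",
"    like_average = p_average**hits * (1 - p_average)**(throws - hits)\n",
"    numerator = like_expert * prior\n",
"    return numerator / (numerator + like_average * (1 - prior))\n",
"\n",
"print(posterior_expert(0.01))  # approx 0.278\n",
"print(posterior_expert(0.20))  # approx 0.905"
]
},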
{
"cell_type": "markdown",
"id": "78e06a3d",
"metadata": {},
"source": [
"#Q2\n",
"\n",
"Our set of times: ${T_1, T_2, ..., T_n}$\n",
"\n",
"Given that $T_i > 10$ and $T \\sim Exp(\\lambda)$.\n",
"\n",
"Then we must change the stnadard exponential PDF such that $\\int_{10}^{\\infty} k \\times f_T(t) \\: \\mathrm{d}t = 1$.<br/>\n",
"This gives us $g_T(t) = \\dfrac{f_X(x)}{F(\\infty) - F(10)} = \\lambda {\\mathrm{e}}^{-\\lambda(t - 10)}\\\\$\n",
"$$\n",
"L(\\lambda) = {\\lambda}^n \\prod_{i=1}^{n} {\\mathrm{e}}^{-\\lambda(T_i - 10)}\\\\\n",
"l(\\lambda) = \\log (L(\\lambda)) = n\\log (\\lambda) - \\lambda \\sum_{i = 1}^{n} (T_i - 10)\\\\\n",
"$$\n",
"Differentiating this wrt $\\lambda$,\n",
"$$\n",
"\\dfrac{n}{\\lambda} - \\sum_{i = 1}^{n} (T_i - 10) = 0\\\\\n",
"\\hat {\\lambda} = \\dfrac{n}{\\sum_{i = 1}^{n} (T_i - 10)}\\\\\n",
"$$\n",
"\n",
"Ignoring truncation would ofcourse give an incorrect pdf, which would represent the distribution of data not as it happened. Our model would then have a region of the distribution without any observations, and this would take away from the $T_i > 10$ region, giving waiting times less than they're supposed to be.\n",
"\n",
"If the device had a little bit of uncertainty that makes it sometimes start later than 10 minutes, we could model that uncertainty to get better predictions, probably as part of a mixed distribution of the waiting time where the point of truncation is variable.\n",
"\n",
"\n",
"Now if we give $\\lambda$ a $\\gamma (a, b)$ prior, the following changes:\n",
"\n",
"$\\hat \\lambda$ is $\\lambda$ that maximises $f_{Data|\\lambda}(data|t)f_{\\lambda}(t)$, where the former is the likelihood and the latter is the prior.\n",
"\n",
"The MLE only maximises the likelihood. We can use the likelihood function from our previous calculation.\n",
"\n",
"$$\n",
"f_{Data|\\lambda}(data|t)f_{\\lambda}(t) \\propto {\\lambda}^{n + \\alpha - 1}{{\\mathrm{e}}^{({\\dfrac{-\\lambda}{\\beta}})}}{\\prod_{i=1}^{n} {\\mathrm{e}}^{-\\lambda(T_i - 10)}}\n",
"$$\n",
"Taking a logarithm and differentiating, then equating to zero gives us the value:\n",
"\n",
"$\\hat \\lambda_{MAP} = \\dfrac {n + \\alpha - 1}{\\dfrac{1}{\\beta} + \\sum_{i = 1}^{n} (T_i - 10)}$\n",
"\n",
"The difference between $\\hat \\lambda_{MAP}$ and $\\hat \\lambda_{MLE}$ is $\\propto \\alpha,\\;\\beta$.\n",
"This is higher when the historical data is *very* different from the current model.\n",
"\n",
"The prior acts as said historical data, it pulls MAP towards itself, especially if data is scarce.\n",
"We should prefer MAP over MLE when in the data collected so far is low."
]
},
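{
"cell_type": "code",
"execution_count": null,
"id": "q2check01",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: compare the truncated-exponential MLE and the Gamma-prior MAP\n",
"# estimate derived above on simulated data. The true rate 0.2, sample size 50\n",
"# and prior hyperparameters alpha = 2, beta = 5 are illustrative assumptions.\n",
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(0)\n",
"true_lam, n = 0.2, 50\n",
"# Waiting times start 10 minutes late: T_i = 10 + Exp(true_lam)\n",
"T = 10 + rng.exponential(scale=1 / true_lam, size=n)\n",
"\n",
"lam_mle = n / np.sum(T - 10)    # MLE that accounts for truncation\n",
"lam_naive = n / np.sum(T)       # MLE ignoring the offset (underestimates lambda)\n",
"alpha, beta = 2.0, 5.0          # Gamma(alpha, beta) prior, shape-scale form\n",
"lam_map = (n + alpha - 1) / (1 / beta + np.sum(T - 10))\n",
"\n",
"print(lam_mle, lam_naive, lam_map)"
]
},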
{
"cell_type": "markdown",
"id": "b0a360a0",
"metadata": {},
"source": [
"#Q3\n",
"\n",
"$$\n",
"D_{{KL}}(P \\parallel Q) = \\sum_{i=1}^k p_i \\log \\left( \\frac{p_i}{q_i} \\right)\n",
"$$\n",
"\n",
"We want to show that:\n",
"\n",
"$$\n",
"D_{{KL}}(P \\parallel Q) \\geq 0\n",
"$$\n",
"\n",
"The logarithm function is strictly concave. By Jensen's inequality:\n",
"\n",
"$$\n",
"\\sum_{i=1}^k p_i \\log \\left( \\frac{q_i}{p_i} \\right) \\leq \\log \\left( \\sum_{i=1}^k p_i \\cdot \\frac{q_i}{p_i} \\right) = \\log \\left( \\sum_{i=1}^k q_i \\right) = \\log(1) = 0\n",
"$$\n",
"\n",
"Multiplying both sides by \\(-1\\), we obtain:\n",
"\n",
"$$\n",
"\\sum_{i=1}^k p_i \\log \\left( \\frac{p_i}{q_i} \\right) \\geq 0\n",
"$$\n",
"\n",
"Thus,\n",
"\n",
"$$\n",
"D_{{KL}}(P \\parallel Q) \\geq 0\n",
"$$\n",
"\n",
"\n",
"\n",
"When is \\( D_{\\text{KL}}(P \\parallel Q) = 0 \\) ?\n",
"\n",
"This occurs if and only if:\n",
"\n",
"$$\n",
"p_i = q_i \\quad \\text{for all } i\n",
"$$\n",
"\n",
"That is, \\( P = Q \\). This follows from the strict convexity of the KL divergence.\n",
"\n",
"\n",
"\n",
"Connection to Cross-Entropy\n",
"\n",
"The cross-entropy between distributions \\( P \\) and \\( Q \\) is defined as:\n",
"\n",
"$$\n",
"H(P, Q) = - \\sum_{i=1}^k p_i \\log(q_i)\n",
"$$\n",
"\n",
"The entropy of \\( P \\) is:\n",
"\n",
"$$\n",
"H(P) = - \\sum_{i=1}^k p_i \\log(p_i)\n",
"$$\n",
"\n",
"We can express the KL divergence in terms of entropy and cross-entropy:\n",
"\n",
"$$\n",
"D_{\\text{KL}}(P \\parallel Q) = \\sum_{i=1}^k p_i \\log \\left( \\frac{p_i}{q_i} \\right) = \\sum_{i=1}^k p_i \\log(p_i) - \\sum_{i=1}^k p_i \\log(q_i)\n",
"$$\n",
"\n",
"$$\n",
"D_{\\text{KL}}(P \\parallel Q) = -H(P) + H(P, Q)\n",
"$$\n",
"\n",
"\n",
"Minimizing \\( D_{\\text{KL}}(P \\parallel Q) \\) with respect to \\( Q \\) is equivalent to minimizing the cross-entropy \\( H(P, Q) \\), up to the constant \\( H(P) \\), which depends only on \\( P \\)."
]
}
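,
{
"cell_type": "code",
"execution_count": null,
"id": "q3check01",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: numerically verify that D_KL(P || Q) >= 0 and that\n",
"# D_KL(P || Q) = H(P, Q) - H(P). The two probability vectors below are\n",
"# illustrative examples, not part of the problem statement.\n",
"import numpy as np\n",
"\n",
"p = np.array([0.5, 0.3, 0.2])\n",
"q = np.array([0.4, 0.4, 0.2])\n",
"\n",
"kl = np.sum(p * np.log(p / q))           # D_KL(P || Q)\n",
"cross_entropy = -np.sum(p * np.log(q))   # H(P, Q)\n",
"entropy = -np.sum(p * np.log(p))         # H(P)\n",
"\n",
"print(kl >= 0)                                   # True\n",
"print(np.isclose(kl, cross_entropy - entropy))   # True"
]
}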
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.13.0"
}
},
"nbformat": 4,
"nbformat_minor": 5
}