{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "11100c44",
   "metadata": {},
   "source": [
    "# Maximum Likelihood Estimation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "346eb02c",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "from scipy.stats import lognorm, pareto, expon\n",
    "import numpy as np\n",
    "from scipy.integrate import quad\n",
    "import matplotlib.pyplot as plt\n",
    "import pandas as pd\n",
    "from math import exp"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7a0da218",
   "metadata": {},
   "source": [
    "## Introduction\n",
    "\n",
    "Consider a situation where a policymaker is trying to estimate how much revenue\n",
    "a proposed wealth tax will raise.\n",
    "\n",
    "The proposed tax is\n",
    "\n",
    "$$\n",
    "h(w) = \n",
    "    \\begin{cases}\n",
    "    a w                       & \\text{if } w \\leq \\bar w  \\\\\n",
    "    a \\bar{w} + b (w-\\bar{w}) & \\text{if } w > \\bar w  \n",
    "    \\end{cases}\n",
    "$$\n",
    "\n",
    "where $ w $ is wealth."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5ecd6228",
   "metadata": {},
   "source": [
    "## \n",
    "\n",
    "For example, if $ a = 0.05 $, $ b = 0.1 $, and $ \\bar w = 2.5 $, this means\n",
    "\n",
    "- a 5% tax on wealth up to 2.5 and  \n",
    "- a 10% tax on wealth in excess of 2.5.  \n",
    "\n",
    "\n",
    "The unit is 100,000, so $ w= 2.5 $ means 250,000 dollars.\n",
    "\n",
    "Let’s go ahead and define $ h $:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a8a52c70",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "def h(w, a=0.05, b=0.1, w_bar=2.5):\n",
    "    if w <= w_bar:\n",
    "        return a * w\n",
    "    else:\n",
    "        return a * w_bar + b * (w - w_bar)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d0fd3927",
   "metadata": {},
   "source": [
    "For a population of size $ N $, where individual $ i $ has wealth $ w_i $, total revenue raised by\n",
    "the tax will be\n",
    "\n",
    "$$\n",
    "T = \\sum_{i=1}^{N} h(w_i)\n",
    "$$\n",
    "\n",
    "We wish to calculate this quantity.\n",
    "\n",
    "The problem we face is that, in most countries, wealth is not observed for all individuals.\n",
    "\n",
    "Collecting and maintaining accurate wealth data for all individuals or households in a country\n",
    "is just too hard.\n",
    "\n",
    "So let’s suppose instead that we obtain a sample $ w_1, w_2, \\cdots, w_n $ telling us the wealth of $ n $ randomly selected individuals.\n",
    "\n",
    "For our exercise we are going to use a sample of $ n = 10,000 $ observations from wealth data in the US in 2016."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fbcfe7cc",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "n = 10_000"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e0b0a69e",
   "metadata": {},
   "source": [
    "The data is derived from the\n",
    "[Survey of Consumer Finances](https://en.wikipedia.org/wiki/Survey_of_Consumer_Finances) (SCF).\n",
    "\n",
    "The following code imports this data  and reads it into an array called `sample`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1dcaea91",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "url = 'https://media.githubusercontent.com/media/QuantEcon/high_dim_data/update_scf_noweights/SCF_plus/SCF_plus_mini_no_weights.csv'\n",
    "df = pd.read_csv(url)\n",
    "df = df.dropna()\n",
    "df = df[df['year'] == 2016]\n",
    "df = df.loc[df['n_wealth'] > 1 ]   #restrcting data to net worth > 1\n",
    "rv = df['n_wealth'].sample(n=n, random_state=1234)\n",
    "rv = rv.to_numpy() / 100_000\n",
    "sample = rv"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "65155af5",
   "metadata": {},
   "source": [
    "Let’s histogram this sample."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8051d8f7",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "fig, ax = plt.subplots()\n",
    "ax.set_xlim(-1, 20)\n",
    "density, edges = np.histogram(sample, bins=5000, density=True)\n",
    "prob = density * np.diff(edges)\n",
    "plt.stairs(prob, edges, fill=True, alpha=0.8, label=r\"unit: $\\$100,000$\")\n",
    "plt.ylabel(\"prob\")\n",
    "plt.xlabel(\"net wealth\")\n",
    "plt.legend()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fb79d229",
   "metadata": {},
   "source": [
    "The histogram shows that many people have very low wealth and a few people have\n",
    "very high wealth.\n",
    "\n",
    "We will take the full population size to be"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4671de62",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "N = 100_000_000"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a1c45716",
   "metadata": {},
   "source": [
    "How can we estimate total revenue from the full population using only the sample data?\n",
    "\n",
    "Our plan is to assume that wealth of each individual is a draw from a distribution with density $ f $.\n",
    "\n",
    "If we obtain an estimate of $ f $ we can then approximate $ T $ as follows:\n",
    "\n",
    "\n",
    "<a id='equation-eq-est-rev'></a>\n",
    "$$\n",
    "T = \\sum_{i=1}^{N} h(w_i) \n",
    "      = N \\frac{1}{N} \\sum_{i=1}^{N} h(w_i) \n",
    "      \\approx N \\int_{0}^{\\infty} h(w)f(w) dw \\tag{46.1}\n",
    "$$\n",
    "\n",
    "(The sample mean should be close to the mean by the law of large numbers.)\n",
    "\n",
    "The problem now is: how do we estimate $ f $?"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3b863e22",
   "metadata": {},
   "source": [
    "## Maximum likelihood estimation\n",
    "\n",
    "[Maximum likelihood estimation](https://en.wikipedia.org/wiki/Maximum_likelihood_estimation)\n",
    "is a method of estimating an unknown distribution.\n",
    "\n",
    "Maximum likelihood estimation has two steps:\n",
    "\n",
    "1. Guess what the underlying distribution is (e.g., normal with mean $ \\mu $ and\n",
    "  standard deviation $ \\sigma $).  \n",
    "1. Estimate the parameter values (e.g., estimate $ \\mu $ and $ \\sigma $ for the\n",
    "  normal distribution)  \n",
    "\n",
    "\n",
    "One possible assumption for the wealth is that each\n",
    "$ w_i $ is [log-normally distributed](https://en.wikipedia.org/wiki/Log-normal_distribution),\n",
    "with parameters $ \\mu \\in (-\\infty,\\infty) $ and $ \\sigma \\in (0,\\infty) $.\n",
    "\n",
    "(This means that $ \\ln w_i $ is normally distributed with mean $ \\mu $ and standard deviation $ \\sigma $.)\n",
    "\n",
    "You can see that this assumption is not completely unreasonable because, if we\n",
    "histogram log wealth instead of wealth, the picture starts to look something\n",
    "like a bell-shaped curve."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2babe495",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "ln_sample = np.log(sample)\n",
    "fig, ax = plt.subplots()\n",
    "ax.hist(ln_sample, density=True, bins=200, histtype='stepfilled', alpha=0.8)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9ec51a82",
   "metadata": {},
   "source": [
    "Now our job is to obtain the maximum likelihood estimates of $ \\mu $ and $ \\sigma $, which\n",
    "we denote by $ \\hat{\\mu} $ and $ \\hat{\\sigma} $.\n",
    "\n",
    "These estimates can be found by maximizing the likelihood function given the\n",
    "data.\n",
    "\n",
    "The pdf of a lognormally distributed random variable $ X $ is given by:\n",
    "\n",
    "$$\n",
    "f(x, \\mu, \\sigma) \n",
    "    = \\frac{1}{x}\\frac{1}{\\sigma \\sqrt{2\\pi}} \n",
    "    \\exp\\left(\\frac{-1}{2}\\left(\\frac{\\ln x-\\mu}{\\sigma}\\right)\\right)^2\n",
    "$$\n",
    "\n",
    "For our sample $ w_1, w_2, \\cdots, w_n $, the [likelihood function](https://en.wikipedia.org/wiki/Likelihood_function) is given by\n",
    "\n",
    "$$\n",
    "L(\\mu, \\sigma | w_i) = \\prod_{i=1}^{n} f(w_i, \\mu, \\sigma)\n",
    "$$\n",
    "\n",
    "The likelihood function can be viewed as both\n",
    "\n",
    "- the joint distribution of the sample (which is assumed to be IID) and  \n",
    "- the “likelihood” of parameters $ (\\mu, \\sigma) $ given the data.  \n",
    "\n",
    "\n",
    "Taking logs on both sides gives us the log likelihood function, which is\n",
    "\n",
    "$$\n",
    "\\begin{aligned}\n",
    "    \\ell(\\mu, \\sigma | w_i) \n",
    "    & = \\ln \\left[ \\prod_{i=1}^{n} f(w_i, \\mu, \\sigma) \\right] \\\\\n",
    "    & = -\\sum_{i=1}^{n} \\ln w_i \n",
    "        - \\frac{n}{2} \\ln(2\\pi) - \\frac{n}{2} \\ln \\sigma^2 - \\frac{1}{2\\sigma^2}\n",
    "            \\sum_{i=1}^n (\\ln w_i - \\mu)^2\n",
    "\\end{aligned}\n",
    "$$\n",
    "\n",
    "To find where this function is maximised we find its partial derivatives wrt $ \\mu $ and $ \\sigma ^2 $ and equate them to $ 0 $.\n",
    "\n",
    "Let’s first find the maximum likelihood estimate (MLE) of $ \\mu $\n",
    "\n",
    "$$\n",
    "\\frac{\\delta \\ell}{\\delta \\mu} \n",
    "    = - \\frac{1}{2\\sigma^2} \\times 2 \\sum_{i=1}^n (\\ln w_i - \\mu) = 0 \\\\\n",
    "\\implies \\sum_{i=1}^n \\ln w_i - n \\mu = 0 \\\\\n",
    "\\implies \\hat{\\mu} = \\frac{\\sum_{i=1}^n \\ln w_i}{n}\n",
    "$$\n",
    "\n",
    "Now let’s find the MLE of $ \\sigma $\n",
    "\n",
    "$$\n",
    "\\frac{\\delta \\ell}{\\delta \\sigma^2} \n",
    "    = - \\frac{n}{2\\sigma^2} + \\frac{1}{2\\sigma^4} \n",
    "    \\sum_{i=1}^n (\\ln w_i - \\mu)^2 = 0 \\\\\n",
    "    \\implies \\frac{n}{2\\sigma^2} = \n",
    "    \\frac{1}{2\\sigma^4} \\sum_{i=1}^n (\\ln w_i - \\mu)^2 \\\\\n",
    "    \\implies \\hat{\\sigma} = \n",
    "    \\left( \\frac{\\sum_{i=1}^{n}(\\ln w_i - \\hat{\\mu})^2}{n} \\right)^{1/2}\n",
    "$$\n",
    "\n",
    "Now that we have derived the expressions for $ \\hat{\\mu} $ and $ \\hat{\\sigma} $,\n",
    "let’s compute them for our wealth sample."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fe12771e",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "μ_hat = np.mean(ln_sample)\n",
    "μ_hat"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e5ce0b58",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "num = (ln_sample - μ_hat)**2\n",
    "σ_hat = (np.mean(num))**(1/2)\n",
    "σ_hat"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "73cbffc1",
   "metadata": {},
   "source": [
    "Let’s plot the lognormal pdf using the estimated parameters against our sample data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a417f71e",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "dist_lognorm = lognorm(σ_hat, scale = exp(μ_hat))\n",
    "x = np.linspace(0,50,10000)\n",
    "\n",
    "fig, ax = plt.subplots()\n",
    "ax.set_xlim(-1,20)\n",
    "\n",
    "ax.hist(sample, density=True, bins=5_000, histtype='stepfilled', alpha=0.5)\n",
    "ax.plot(x, dist_lognorm.pdf(x), 'k-', lw=0.5, label='lognormal pdf')\n",
    "ax.legend()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d034d2ef",
   "metadata": {},
   "source": [
    "Our estimated lognormal distribution appears to be a reasonable fit for the overall data.\n",
    "\n",
    "We now use [(46.1)](#equation-eq-est-rev) to calculate total revenue.\n",
    "\n",
    "We will compute the integral using numerical integration via SciPy’s\n",
    "[quad](https://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.quad.html)\n",
    "function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "aefa773f",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "def total_revenue(dist):\n",
    "    integral, _ = quad(lambda x: h(x) * dist.pdf(x), 0, 100_000)\n",
    "    T = N * integral\n",
    "    return T"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0efe98c2",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "tr_lognorm = total_revenue(dist_lognorm)\n",
    "tr_lognorm"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4618a8be",
   "metadata": {},
   "source": [
    "(Our unit was 100,000 dollars, so this means that actual revenue is 100,000\n",
    "times as large.)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ad7112e0",
   "metadata": {},
   "source": [
    "## Pareto distribution\n",
    "\n",
    "We mentioned above that using maximum likelihood estimation requires us to make\n",
    "a prior assumption of the underlying distribution.\n",
    "\n",
    "Previously we assumed that the distribution is lognormal.\n",
    "\n",
    "Suppose instead we assume that $ w_i $ are drawn from the\n",
    "[Pareto Distribution](https://en.wikipedia.org/wiki/Pareto_distribution)\n",
    "with parameters $ b $ and $ x_m $.\n",
    "\n",
    "In this case, the maximum likelihood estimates are known to be\n",
    "\n",
    "$$\n",
    "\\hat{b} = \\frac{n}{\\sum_{i=1}^{n} \\ln (w_i/\\hat{x_m})}\n",
    "    \\quad \\text{and} \\quad\n",
    "    \\hat{x}_m = \\min_{i} w_i\n",
    "$$\n",
    "\n",
    "Let’s calculate them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cfdd7772",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "xm_hat = min(sample)\n",
    "xm_hat"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a026b166",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "den = np.log(sample/xm_hat)\n",
    "b_hat = 1/np.mean(den)\n",
    "b_hat"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7af5e0c4",
   "metadata": {},
   "source": [
    "Now let’s recompute total revenue."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4c50894a",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "dist_pareto = pareto(b = b_hat, scale = xm_hat)\n",
    "tr_pareto = total_revenue(dist_pareto) \n",
    "tr_pareto"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e9584c15",
   "metadata": {},
   "source": [
    "The number is very different!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "27efa457",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "tr_pareto / tr_lognorm"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6fe2cb45",
   "metadata": {},
   "source": [
    "We see that choosing the right distribution is extremely important.\n",
    "\n",
    "Let’s compare the fitted Pareto distribution to the histogram:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b104ff02",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "fig, ax = plt.subplots()\n",
    "ax.set_xlim(-1, 20)\n",
    "ax.set_ylim(0,1.75)\n",
    "\n",
    "ax.hist(sample, density=True, bins=5_000, histtype='stepfilled', alpha=0.5)\n",
    "ax.plot(x, dist_pareto.pdf(x), 'k-', lw=0.5, label='Pareto pdf')\n",
    "ax.legend()\n",
    "\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1726035c",
   "metadata": {},
   "source": [
    "We observe that in this case the fit for the Pareto distribution is not very\n",
    "good, so we can probably reject it."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1a13cde0",
   "metadata": {},
   "source": [
    "## What is the best distribution?\n",
    "\n",
    "There is no “best” distribution — every choice we make is an assumption.\n",
    "\n",
    "All we can do is try to pick a distribution that fits the data well.\n",
    "\n",
    "The plots above suggested that the lognormal distribution is optimal.\n",
    "\n",
    "However when we inspect the upper tail (the richest people), the Pareto distribution may be a better fit.\n",
    "\n",
    "To see this, let’s now set a minimum threshold of net worth in our dataset.\n",
    "\n",
    "We set an arbitrary threshold of \\$500,000 and read the data into `sample_tail`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dbc80e7b",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "df_tail = df.loc[df['n_wealth'] > 500_000 ]\n",
    "df_tail.head()\n",
    "rv_tail = df_tail['n_wealth'].sample(n=10_000, random_state=4321)\n",
    "rv_tail = rv_tail.to_numpy()\n",
    "sample_tail = rv_tail/500_000"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7daaa8c7",
   "metadata": {},
   "source": [
    "Let’s plot this data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "28b4e7e1",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "fig, ax = plt.subplots()\n",
    "ax.set_xlim(0,50)\n",
    "ax.hist(sample_tail, density=True, bins=500, histtype='stepfilled', alpha=0.8)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "773797ed",
   "metadata": {},
   "source": [
    "Now let’s try fitting some distributions to this data."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4709c03a",
   "metadata": {},
   "source": [
    "### Lognormal distribution for the right hand tail\n",
    "\n",
    "Let’s start with the lognormal distribution\n",
    "\n",
    "We estimate the parameters again and plot the density against our data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "40b17bf6",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "ln_sample_tail = np.log(sample_tail)\n",
    "μ_hat_tail = np.mean(ln_sample_tail)\n",
    "num_tail = (ln_sample_tail - μ_hat_tail)**2\n",
    "σ_hat_tail = (np.mean(num_tail))**(1/2)\n",
    "dist_lognorm_tail = lognorm(σ_hat_tail, scale = exp(μ_hat_tail))\n",
    "\n",
    "fig, ax = plt.subplots()\n",
    "ax.set_xlim(0,50)\n",
    "ax.hist(sample_tail, density=True, bins=500, histtype='stepfilled', alpha=0.5)\n",
    "ax.plot(x, dist_lognorm_tail.pdf(x), 'k-', lw=0.5, label='lognormal pdf')\n",
    "ax.legend()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "95dd6eaa",
   "metadata": {},
   "source": [
    "While the lognormal distribution was a good fit for the entire dataset,\n",
    "it is not a good fit for the right hand tail."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2e88cbe4",
   "metadata": {},
   "source": [
    "### Pareto distribution for the right hand tail\n",
    "\n",
    "Let’s now assume the truncated dataset has a Pareto distribution.\n",
    "\n",
    "We estimate the parameters again and plot the density against our data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7d04f5dc",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "xm_hat_tail = min(sample_tail)\n",
    "den_tail = np.log(sample_tail/xm_hat_tail)\n",
    "b_hat_tail = 1/np.mean(den_tail)\n",
    "dist_pareto_tail = pareto(b = b_hat_tail, scale = xm_hat_tail)\n",
    "\n",
    "fig, ax = plt.subplots()\n",
    "ax.set_xlim(0, 50)\n",
    "ax.set_ylim(0,0.65)\n",
    "ax.hist(sample_tail, density=True, bins= 500, histtype='stepfilled', alpha=0.5)\n",
    "ax.plot(x, dist_pareto_tail.pdf(x), 'k-', lw=0.5, label='pareto pdf')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "439aaac6",
   "metadata": {},
   "source": [
    "The Pareto distribution is a better fit for the right hand tail of our dataset."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c0d3cbd5",
   "metadata": {},
   "source": [
    "### So what is the best distribution?\n",
    "\n",
    "As we said above, there is no “best” distribution — each choice is an\n",
    "assumption.\n",
    "\n",
    "We just have to test what we think are reasonable distributions.\n",
    "\n",
    "One test is to plot the data against the fitted distribution, as we did.\n",
    "\n",
    "There are other more rigorous tests, such as the [Kolmogorov-Smirnov test](https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test).\n",
    "\n",
    "We omit such advanced topics (but encourage readers to study them once\n",
    "they have completed these lectures)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "16bf9e0c",
   "metadata": {},
   "source": [
    "## Exercises"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fadf7437",
   "metadata": {},
   "source": [
    "## Exercise 46.1\n",
    "\n",
    "Suppose we assume wealth is [exponentially](https://en.wikipedia.org/wiki/Exponential_distribution)\n",
    "distributed with parameter $ \\lambda > 0 $.\n",
    "\n",
    "The maximum likelihood estimate of $ \\lambda $ is given by\n",
    "\n",
    "$$\n",
    "\\hat{\\lambda} = \\frac{n}{\\sum_{i=1}^n w_i}\n",
    "$$\n",
    "\n",
    "1. Compute $ \\hat{\\lambda} $ for our initial sample.  \n",
    "1. Use $ \\hat{\\lambda} $ to find the total revenue  "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "46613402",
   "metadata": {},
   "source": [
    "## Solution to[ Exercise 46.1](https://intro.quantecon.org/#mle_ex1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "95126825",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "λ_hat = 1/np.mean(sample)\n",
    "λ_hat"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b2bc240b",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "dist_exp = expon(scale = 1/λ_hat)\n",
    "tr_expo = total_revenue(dist_exp) \n",
    "tr_expo"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "75504b30",
   "metadata": {},
   "source": [
    "## Exercise 46.2\n",
    "\n",
    "Plot the exponential distribution against the sample and check if it is a good fit or not."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "31c851a3",
   "metadata": {},
   "source": [
    "## Solution to[ Exercise 46.2](https://intro.quantecon.org/#mle_ex2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c2fc5a71",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "fig, ax = plt.subplots()\n",
    "ax.set_xlim(-1, 20)\n",
    "\n",
    "ax.hist(sample, density=True, bins=5000, histtype='stepfilled', alpha=0.5)\n",
    "ax.plot(x, dist_exp.pdf(x), 'k-', lw=0.5, label='exponential pdf')\n",
    "ax.legend()\n",
    "\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7a94ab4a",
   "metadata": {},
   "source": [
    "Clearly, this distribution is not a good fit for our data."
   ]
  }
 ],
 "metadata": {
  "date": 1750038405.4036932,
  "filename": "mle.md",
  "kernelspec": {
   "display_name": "Python",
   "language": "python3",
   "name": "python3"
  },
  "title": "Maximum Likelihood Estimation"
 },
 "nbformat": 4,
 "nbformat_minor": 5
}