The Poisson regression model relates the Poisson rate parameter $\lambda _i$ to a vector of regressors $x_i$. The outcome variable $y_i$ is therefore assumed to be drawn from a Poisson distribution. Specifically, the data-generating process is:
\[Prob(Y=y_i | x_i) = \dfrac{e^{-\lambda _i}\lambda _i ^{y_i}}{y_i !},\]where $y_i = 0,1,2,…$. The conventional way of modeling $\lambda _i$ is as follows:
\[\ln \lambda _i = x_i '\beta .\]Note that $x_i '$ denotes the transpose of the $k$-dimensional vector $x_i$. A well-known feature of the Poisson distribution is that:
\[E[y_i|x_i] = Var(y_i|x_i) = \lambda _i .\]Combining the last two equations, we have:
\[E[y_i|x_i] = Var(y_i|x_i) = \lambda _i = e^{x_i ' \beta }.\]We can estimate $\beta$ by maximizing the log-likelihood function: \(\begin{aligned} L(\beta ) &= \log \left [ \prod _{i=1} ^n \dfrac{e^{-\lambda _i} \lambda _i ^{y_i}}{y_i!}\right ]\\ &= - \sum _{i=1} ^n \lambda _i + \sum _{i=1} ^n y_i x_i'\beta - \sum _{i=1} ^n \ln (y_i!) \end{aligned}\)
The log-likelihood is maximized where the first-order condition holds:
\[\dfrac{\partial L(\beta )}{\partial \beta } = \sum _{i=1} ^n (y_i - \lambda _i) x_i = 0.\]The Hessian is:
\[\dfrac{\partial ^2 L(\beta )}{\partial \beta \partial \beta '} = -\sum _{i=1} ^n \lambda _i x_i x_i'.\]Since every $\lambda _i > 0$, the Hessian is negative definite, so the log-likelihood is globally concave and Newton’s method converges to the unique maximizer.
Newton’s Method
Refer to Wikipedia for the details of Newton’s method; here we use it to solve the first-order condition for $\beta$. In summary, we write an iterative algorithm that updates $\beta$ as follows:
\[\beta _{t+1} = \beta _t - \left [\dfrac{\partial ^2 L(\beta _t)}{\partial \beta \partial \beta '}\right ] ^{-1}\dfrac{\partial L(\beta _t)}{\partial \beta }\]
Estimating Poisson Regression with Newton’s Method - A Python Example
I use the following Python packages:
import pandas as pd
import numpy as np
from scipy.special import factorial
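One caveat worth noting (not in the original post): `factorial(y)` overflows for large counts, so when $y_i$ can be big it is safer to compute $\ln (y_i!)$ directly with `scipy.special.gammaln`. A minimal sketch:

```python
import numpy as np
from scipy.special import factorial, gammaln

# gammaln(y + 1) computes ln(y!) without forming the (possibly huge) factorial
y_small = np.array([2, 10])
print(np.log(factorial(y_small)))  # direct route, fine for small counts
print(gammaln(y_small + 1))        # identical values, stable for large counts too
```

For the award counts in this data set the direct route is fine, so the code below keeps `factorial`.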
I import a data set:
data = pd.read_csv('https://stats.idre.ucla.edu/stat/data/poisson_sim.csv')
data['ones'] = 1
data['prog1'] = pd.get_dummies(data.prog).astype(int).values[:,0]
data['prog2'] = pd.get_dummies(data.prog).astype(int).values[:,1]
I create a constant column called ones. prog is a categorical variable, so I create dummies and drop the last one; the two remaining dummies, prog1 and prog2, enter the regression. In addition, we have math scores, which are continuous. We will regress num_awards on ones, prog1, prog2, and math.
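To make the dummy construction concrete, here is a small illustration on a toy prog column (the values below are made up):

```python
import pandas as pd

# pd.get_dummies creates one 0/1 indicator column per category, in sorted order
prog = pd.Series([1, 2, 3, 3, 1], name='prog')
dummies = pd.get_dummies(prog).astype(int)
print(dummies)
# Keeping only the first two columns (prog==1 and prog==2) makes prog==3
# the omitted baseline category, as in the regression above.
```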
The first ten observations in the data set:
| | id | num_awards | prog | math | ones | prog1 | prog2 |
|---|---|---|---|---|---|---|---|
| 0 | 45 | 0 | 3 | 41 | 1 | 0 | 0 |
| 1 | 108 | 0 | 1 | 41 | 1 | 1 | 0 |
| 2 | 15 | 0 | 3 | 44 | 1 | 0 | 0 |
| 3 | 67 | 0 | 3 | 42 | 1 | 0 | 0 |
| 4 | 153 | 0 | 3 | 40 | 1 | 0 | 0 |
| 5 | 51 | 0 | 1 | 42 | 1 | 1 | 0 |
| 6 | 164 | 0 | 3 | 46 | 1 | 0 | 0 |
| 7 | 133 | 0 | 3 | 40 | 1 | 0 | 0 |
| 8 | 2 | 0 | 3 | 33 | 1 | 0 | 0 |
| 9 | 53 | 0 | 3 | 46 | 1 | 0 | 0 |
I define the matrix $X$ and the vector $y$ as follows:
regressor_list = ['ones','prog1','prog2','math']
X = data[regressor_list].values
y = data['num_awards'].values
I now define four functions, respectively, for $\lambda _i$, $L(\beta)$, $\dfrac{\partial L(\beta )}{\partial \beta }$, and $\dfrac{\partial ^2 L(\beta )}{\partial \beta \partial \beta '}$:
# Lambda_i = exp(x_i' beta)
def poisson_lambda(x, beta):
    return np.exp(np.dot(x, beta))
# Log-likelihood function
def LL(x, y, beta):
    return np.sum(-poisson_lambda(x, beta) + y * np.dot(x, beta) - np.log(factorial(y)))
# Gradient
def LL_D(x, y, beta):
    return np.dot((y - poisson_lambda(x, beta)), x)
# Hessian
def LL_D2(x, y, beta):
    return -np.dot(np.transpose(poisson_lambda(x, beta).reshape(-1, 1) * x), x)
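A quick way to gain confidence in these derivatives is to compare the analytic gradient against a finite-difference approximation of the log-likelihood. The sketch below is self-contained, reproduces the functions with $\lambda _i = e^{x_i '\beta}$ as defined earlier, and uses a small synthetic data set that exists purely for the check:

```python
import numpy as np
from scipy.special import factorial

def poisson_lambda(x, beta):
    return np.exp(np.dot(x, beta))

def LL(x, y, beta):
    return np.sum(-poisson_lambda(x, beta) + y * np.dot(x, beta) - np.log(factorial(y)))

def LL_D(x, y, beta):
    return np.dot(y - poisson_lambda(x, beta), x)

# Small synthetic data set, purely for the check
rng = np.random.default_rng(0)
x_chk = rng.normal(size=(20, 2))
y_chk = rng.poisson(1.0, size=20).astype(float)
b_chk = np.array([0.1, -0.2])

# Central finite differences of LL should agree with the analytic gradient
eps = 1e-6
num_grad = np.array([(LL(x_chk, y_chk, b_chk + eps * e) - LL(x_chk, y_chk, b_chk - eps * e)) / (2 * eps)
                     for e in np.eye(2)])
print(np.max(np.abs(num_grad - LL_D(x_chk, y_chk, b_chk))))  # prints a tiny finite-difference error
```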
Implementing Newton’s algorithm described above:
max_iter = 100 # Maximum number of iterations
beta = np.array([0,0,0,0]) # Initial guess for beta vector
for i in range(max_iter):
    # Newton's update rule
    beta_new = beta - np.dot(np.linalg.inv(LL_D2(X, y, beta)), LL_D(X, y, beta))
    print(beta_new)
    # Stopping criterion
    if np.sum(np.absolute(beta - beta_new)) < 1e-12:
        beta = beta_new
        break
    # Update beta
    beta = beta_new
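As a sanity check on the whole pipeline, the same Newton update recovers known coefficients from synthetic Poisson data. This sketch is self-contained; the sample size and true coefficients are made up for illustration:

```python
import numpy as np

# Synthetic design matrix: an intercept and one standard-normal regressor
rng = np.random.default_rng(42)
n = 5000
X_s = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, -0.3])
y_s = rng.poisson(np.exp(X_s @ beta_true))

# Newton's method on the Poisson log-likelihood
b = np.zeros(2)
for _ in range(100):
    lam = np.exp(X_s @ b)
    grad = (y_s - lam) @ X_s
    hess = -(lam[:, None] * X_s).T @ X_s
    b_new = b - np.linalg.solve(hess, grad)
    if np.sum(np.abs(b - b_new)) < 1e-12:
        b = b_new
        break
    b = b_new

print(b)  # should be close to beta_true = [0.5, -0.3]
```

Note the use of `np.linalg.solve` instead of explicitly inverting the Hessian; it is the numerically preferred way to apply the update.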
The result:
[-2.98299806 -0.2125061   0.26610685  0.0478888 ]
[-4.36817904 -0.30562538  0.53865713  0.06496803]
[-4.83114677 -0.35664338  0.6906736   0.06979827]
[-4.8767946  -0.36938328  0.71369546  0.07014979]
[-4.87731508 -0.36980898  0.71404984  0.0701524 ]
[-4.87731517 -0.36980923  0.71404992  0.0701524 ]
[-4.87731517 -0.36980923  0.71404992  0.0701524 ]
Note the quadratic convergence typical of Newton’s method: the number of accurate digits roughly doubles at each step. The solution is:
solution = pd.DataFrame(beta.reshape(1,-1), columns=regressor_list)
solution
| | ones | prog1 | prog2 | math |
|---|---|---|---|---|
| 0 | -4.877315 | -0.369809 | 0.714050 | 0.070152 |
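A brief interpretation note: because $E[y_i|x_i] = e^{x_i '\beta}$, each coefficient acts multiplicatively, and $e^{\beta _j}$ is the rate ratio for a one-unit increase in the corresponding regressor. For an illustrative coefficient of 0.07 on a continuous regressor like math:

```python
import numpy as np

# Hypothetical coefficient value, for illustration only
b_math = 0.07
rate_ratio = np.exp(b_math)
print(rate_ratio)  # each extra point multiplies the expected count by ~1.07
```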