The Poisson regression model estimates the Poisson rate parameter $\lambda_i$ as a function of the covariate vector $x_i$. The outcome variable $y_i$ is hence assumed to be drawn from a Poisson distribution. Specifically, the data generating process is:
$$\Pr(Y = y_i \mid x_i) = \frac{e^{-\lambda_i}\,\lambda_i^{y_i}}{y_i!}, \qquad y_i = 0, 1, 2, \ldots$$

The conventional way of modeling $\lambda_i$ is as follows:
$$\ln \lambda_i = x_i'\beta$$

Note that $x_i'$ is a $k$-dimensional row vector (the transpose of the covariate vector $x_i$) and $\beta$ is a $k$-dimensional coefficient vector. A well-known feature of the Poisson distribution is that its mean equals its variance:
$$E[y_i \mid x_i] = \operatorname{Var}(y_i \mid x_i) = \lambda_i$$

Combining the last two equations, we have:
$$E[y_i \mid x_i] = \operatorname{Var}(y_i \mid x_i) = \lambda_i = e^{x_i'\beta}$$

We can estimate $\beta$ by maximizing the log-likelihood function:

$$L(\beta) = \ln\left[\prod_{i=1}^{n} \frac{e^{-\lambda_i}\,\lambda_i^{y_i}}{y_i!}\right] = -\sum_{i=1}^{n}\lambda_i + \sum_{i=1}^{n} y_i x_i'\beta - \sum_{i=1}^{n}\ln(y_i!)$$
The log-likelihood is maximized where the first-order condition (FOC) holds:
$$\frac{\partial L(\beta)}{\partial \beta} = \sum_{i=1}^{n}(y_i - \lambda_i)\,x_i = 0$$

The Hessian is:
$$\frac{\partial^2 L(\beta)}{\partial \beta\,\partial \beta'} = -\sum_{i=1}^{n} \lambda_i x_i x_i'$$

The Hessian is negative definite whenever the design matrix has full column rank, so the log-likelihood is globally concave and we can simply use Newton's method to maximize it.
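For readers who want the intermediate step (the derivation above skips it), both expressions follow from applying the chain rule to $\lambda_i = e^{x_i'\beta}$:

$$\frac{\partial \lambda_i}{\partial \beta} = e^{x_i'\beta}\,x_i = \lambda_i x_i \quad\Longrightarrow\quad \frac{\partial L(\beta)}{\partial \beta} = \sum_{i=1}^{n}\left(-\lambda_i x_i + y_i x_i\right) = \sum_{i=1}^{n}(y_i - \lambda_i)\,x_i$$

Differentiating the gradient once more with respect to $\beta'$ yields the Hessian above, since $\partial \lambda_i / \partial \beta' = \lambda_i x_i'$.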
Newton’s Method
Refer to Wikipedia for the details of Newton's method. We will use it to solve the first-order condition for $\beta$. In summary, we will write an iterative algorithm that updates $\beta$ as follows:
$$\beta_{t+1} = \beta_t - \left[\frac{\partial^2 L(\beta_t)}{\partial \beta\,\partial \beta'}\right]^{-1}\frac{\partial L(\beta_t)}{\partial \beta}$$

Estimating Poisson Regression with Newton's Method - A Python Example
I use the following Python packages:
import pandas as pd
import numpy as np
from scipy.special import factorial
I import a data set:
data = pd.read_csv('https://stats.idre.ucla.edu/stat/data/poisson_sim.csv')
data['ones'] = 1                                        # constant term
data['prog1'] = pd.get_dummies(data.prog).values[:,0]   # indicator for prog == 1
data['prog2'] = pd.get_dummies(data.prog).values[:,1]   # indicator for prog == 2
I create a constant column called `ones`. `prog` is a categorical variable, so I create dummies and drop the last one to avoid perfect collinearity with the constant. We will use the two remaining dummies, `prog1` and `prog2`, in the regression. In addition, we have `math` scores, which are continuous. We will regress `num_awards` on `ones`, `prog1`, `prog2`, and `math`.
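As a quick illustration of the dummy coding (this inspection is my addition, not part of the original walkthrough), `prog` takes the values 1, 2, and 3, so `get_dummies` yields three indicator columns in sorted order; the code above keeps the first two:

# One indicator column per category of prog, ordered 1, 2, 3
print(pd.get_dummies(data.prog).head())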
The first ten observations in the data set:
|    | id  | num_awards | prog | math | ones | prog1 | prog2 |
|----|-----|------------|------|------|------|-------|-------|
| 0  | 45  | 0          | 3    | 41   | 1    | 0     | 0     |
| 1  | 108 | 0          | 1    | 41   | 1    | 1     | 0     |
| 2  | 15  | 0          | 3    | 44   | 1    | 0     | 0     |
| 3  | 67  | 0          | 3    | 42   | 1    | 0     | 0     |
| 4  | 153 | 0          | 3    | 40   | 1    | 0     | 0     |
| 5  | 51  | 0          | 1    | 42   | 1    | 1     | 0     |
| 6  | 164 | 0          | 3    | 46   | 1    | 0     | 0     |
| 7  | 133 | 0          | 3    | 40   | 1    | 0     | 0     |
| 8  | 2   | 0          | 3    | 33   | 1    | 0     | 0     |
| 9  | 53  | 0          | 3    | 46   | 1    | 0     | 0     |
I define the design matrix X and the outcome vector y as follows:
regressor_list = ['ones','prog1','prog2','math']
X = data[regressor_list].values    # n x 4 design matrix
y = data['num_awards'].values      # outcome: number of awards
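A quick dimension check (again my addition) confirms one row per observation and one column per regressor:

# Expect X to have shape (n, 4) and y to have shape (n,)
print(X.shape, y.shape)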
I now define four functions, respectively, for $\lambda_i$, $L(\beta)$, $\partial L(\beta)/\partial \beta$, and $\partial^2 L(\beta)/\partial \beta\,\partial \beta'$:
# For Lambda_i = exp(x_i' beta)
def poisson_lambda(x, beta): return np.exp(np.dot(x, beta))
# Log-likelihood function
def LL(x, y, beta):
    return np.sum(-poisson_lambda(x, beta) + y * np.dot(x, beta) - np.log(factorial(y)))
# Gradient: X'(y - lambda)
def LL_D(x, y, beta):
    return np.dot((y - poisson_lambda(x, beta)), x)
# Hessian: -X' diag(lambda) X
def LL_D2(x, y, beta):
    return -np.dot(np.transpose(poisson_lambda(x, beta).reshape(-1,1) * x), x)
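One caveat: `scipy.special.factorial` overflows for large counts, though that is harmless here since `num_awards` consists of small integers. A numerically safer variant of the log-likelihood, my own substitution rather than the original code, uses `scipy.special.gammaln` and the identity $\ln(y!) = \ln\Gamma(y+1)$:

from scipy.special import gammaln

# Same log-likelihood, with ln(y!) computed via the log-gamma function
def LL_stable(x, y, beta):
    return np.sum(-poisson_lambda(x, beta) + y * np.dot(x, beta) - gammaln(y + 1))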
Implementing Newton’s algorithm described above:
max_iter = 100 # Maximum number of iterations
beta = np.array([0,0,0,0]) # Initial guess for beta vector
for i in range(max_iter):
    beta_new = beta - np.dot(np.linalg.inv(LL_D2(X,y,beta)), LL_D(X,y,beta)) # Newton's update rule
    print(beta_new)
    # stopping criterion
    if np.sum(np.absolute(beta-beta_new)) < 1e-12:
        beta = beta_new
        break
    beta = beta_new
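As an aside, explicitly inverting the Hessian is unnecessary: solving the linear system for the Newton step is cheaper and numerically more stable. A sketch of the equivalent loop (a standard reformulation, not the original code):

for i in range(max_iter):
    # Solve H @ step = gradient instead of forming inv(H)
    step = np.linalg.solve(LL_D2(X,y,beta), LL_D(X,y,beta))
    beta_new = beta - step
    if np.sum(np.absolute(beta-beta_new)) < 1e-12:
        beta = beta_new
        break
    beta = beta_new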
The result:
[-2.98299806 -0.2125061   0.26610685  0.0478888 ]
[-4.36817904 -0.30562538  0.53865713  0.06496803]
[-4.83114677 -0.35664338  0.6906736   0.06979827]
[-4.8767946  -0.36938328  0.71369546  0.07014979]
[-4.87731508 -0.36980898  0.71404984  0.0701524 ]
[-4.87731517 -0.36980923  0.71404992  0.0701524 ]
[-4.87731517 -0.36980923  0.71404992  0.0701524 ]
Newton's method converges after only a few iterations, as expected given the globally concave log-likelihood. The solution is:
solution = pd.DataFrame(beta.reshape(1,-1), columns=regressor_list)
solution
|   | ones      | prog1     | prog2   | math     |
|---|-----------|-----------|---------|----------|
| 0 | -4.877315 | -0.369809 | 0.71405 | 0.070152 |
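As a sanity check, one can compare against an off-the-shelf estimator. A minimal sketch using statsmodels (assuming it is installed): a Poisson GLM maximizes the same likelihood, so its coefficients should match the Newton's method solution up to numerical precision.

import statsmodels.api as sm

# Poisson GLM on the same design matrix; params should reproduce beta
glm = sm.GLM(y.astype(float), X.astype(float), family=sm.families.Poisson())
print(glm.fit().params)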