The Poisson regression model relates the Poisson rate parameter $\lambda_i$ to a vector of regressors $x_i$. The outcome variable $y_i$ is assumed to be drawn from a Poisson distribution. Specifically, the data generating process is:

$$\text{Prob}(Y = y_i \mid x_i) = \frac{e^{-\lambda_i}\lambda_i^{y_i}}{y_i!},$$

where $y_i = 0, 1, 2, \dots$. The conventional way of modeling $\lambda_i$ is through a log link:

$$\ln \lambda_i = x_i'\beta.$$

Note that $x_i$ is a $k$-dimensional column vector and $x_i'$ its transpose, so $x_i'\beta$ is a scalar; the log link also guarantees $\lambda_i > 0$ for any $\beta$. A well-known feature of the Poisson distribution is that:

$$E[y_i \mid x_i] = \mathrm{Var}(y_i \mid x_i) = \lambda_i.$$

Combining the last two equations, we have:

$$E[y_i \mid x_i] = \mathrm{Var}(y_i \mid x_i) = \lambda_i = e^{x_i'\beta}.$$
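
As a quick illustration of this mean-variance equality, one can draw Poisson samples and compare the sample mean and variance. A minimal sketch with NumPy (the rate value 3.5 is arbitrary):

import numpy as np

rng = np.random.default_rng(0)
lam = 3.5                                # arbitrary rate parameter
draws = rng.poisson(lam, size=1_000_000)

# Both moments should be close to lam = 3.5
print(draws.mean(), draws.var())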

We can estimate $\beta$ by maximizing the log-likelihood function:

$$L(\beta) = \log\left[\prod_{i=1}^{n}\frac{e^{-\lambda_i}\lambda_i^{y_i}}{y_i!}\right] = -\sum_{i=1}^{n}\lambda_i + \sum_{i=1}^{n}y_i x_i'\beta - \sum_{i=1}^{n}\ln(y_i!),$$

where the middle term substitutes $\ln\lambda_i = x_i'\beta$.
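
The summands are just Poisson log-probabilities, so this expression can be sanity-checked against scipy.stats.poisson.logpmf on toy values (written in terms of $\ln\lambda_i$ directly; the numbers below are arbitrary):

import numpy as np
from scipy.special import factorial
from scipy.stats import poisson

y_toy = np.array([0, 1, 3, 2])            # arbitrary counts
lam_toy = np.array([0.5, 1.0, 2.5, 2.0])  # arbitrary rates

by_hand = np.sum(-lam_toy + y_toy * np.log(lam_toy) - np.log(factorial(y_toy)))
via_scipy = poisson.logpmf(y_toy, lam_toy).sum()

print(np.isclose(by_hand, via_scipy))  # should print True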

The log-likelihood is maximized where the first-order condition (FOC) holds:

$$\frac{\partial L(\beta)}{\partial \beta} = \sum_{i=1}^{n}(y_i - \lambda_i)x_i = 0.$$

Since $\lambda_i = e^{x_i'\beta}$, this condition is nonlinear in $\beta$ and has no closed-form solution, so we solve it numerically.

The Hessian is:

$$\frac{\partial^2 L(\beta)}{\partial \beta\,\partial \beta'} = -\sum_{i=1}^{n}\lambda_i x_i x_i'.$$

Because $\lambda_i > 0$, this Hessian is negative definite (provided the $x_i$ are not collinear), so the log-likelihood is globally concave and Newton's method converges to the unique maximum.

Newton’s Method

Refer to Wikipedia for background on Newton's method; we will use it to find the maximizer $\hat\beta$. In summary, we write an iterative algorithm that updates $\beta$ as follows:

$$\beta_{t+1} = \beta_t - \left[\frac{\partial^2 L(\beta)}{\partial \beta\,\partial \beta'}\right]^{-1}\frac{\partial L(\beta)}{\partial \beta},$$

where both derivatives are evaluated at $\beta_t$. Note the minus sign: because the Hessian is negative definite, this step moves uphill on $L(\beta)$.
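
To make the update rule concrete before applying it to the Poisson model, here is a minimal one-dimensional sketch. It maximizes the arbitrarily chosen concave function $f(x) = 3x - e^x$, whose maximum is at $x = \ln 3$:

import numpy as np

# Maximize f(x) = 3x - exp(x): f'(x) = 3 - exp(x), f''(x) = -exp(x)
x = 0.0  # initial guess
for _ in range(10):
    grad = 3 - np.exp(x)
    hess = -np.exp(x)
    x = x - grad / hess  # Newton's update rule

print(x, np.log(3))  # both approximately 1.0986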

Estimating Poisson Regression with Newton's Method - A Python Example

I use the following Python packages:

import pandas as pd
import numpy as np
from scipy.special import factorial

I load a data set:

data = pd.read_csv('https://stats.idre.ucla.edu/stat/data/poisson_sim.csv')
data['ones'] = 1
prog_dummies = pd.get_dummies(data.prog).astype(int)  # cast to int: recent pandas returns booleans
data['prog1'] = prog_dummies.values[:, 0]
data['prog2'] = prog_dummies.values[:, 1]

I create a constant column called ones. prog is a categorical variable taking the values 1, 2, and 3, so I create dummies and drop the last one; prog3 serves as the base category, and the regression uses the two dummies prog1 and prog2 (an equivalent name-based construction is sketched below). In addition, we have math scores, which are continuous. We will regress num_awards on ones, prog1, prog2, and math.
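
For reference, the same two dummies can be selected by name rather than by column position; a small sketch, assuming a pandas version where get_dummies accepts the prefix argument (it names the columns prog_1, prog_2, prog_3 here):

# prog takes the values 1, 2, 3; keep dummies for 1 and 2, leaving 3 as the base
dummies = pd.get_dummies(data['prog'], prefix='prog').astype(int)
data[['prog1', 'prog2']] = dummies[['prog_1', 'prog_2']].values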

The first ten observations in the data set:

    id  num_awards  prog  math  ones  prog1  prog2
0   45           0     3    41     1      0      0
1  108           0     1    41     1      1      0
2   15           0     3    44     1      0      0
3   67           0     3    42     1      0      0
4  153           0     3    40     1      0      0
5   51           0     1    42     1      1      0
6  164           0     3    46     1      0      0
7  133           0     3    40     1      0      0
8    2           0     3    33     1      0      0
9   53           0     3    46     1      0      0

I define the regressor matrix X and the outcome vector y as follows:

regressor_list = ['ones','prog1','prog2','math']

X = data[regressor_list].values.astype(float)
y = data['num_awards'].values
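
A quick shape check before estimating anything; the poisson_sim data set has 200 observations, so the expected output is noted in the comment:

print(X.shape, y.shape)  # expect (200, 4) and (200,)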

I now define four functions, respectively, for $\lambda_i$, $L(\beta)$, $\partial L(\beta)/\partial\beta$, and $\partial^2 L(\beta)/\partial\beta\,\partial\beta'$:

# For lambda_i = exp(x_i' beta)
def poisson_lambda(x, beta):
    return np.exp(np.dot(x, beta))

# Log-likelihood function
def LL(x, y, beta):
    return np.sum(-poisson_lambda(x, beta) + y * np.dot(x, beta) - np.log(factorial(y)))

# Gradient: sum_i (y_i - lambda_i) x_i
def LL_D(x, y, beta):
    return np.dot(y - poisson_lambda(x, beta), x)

# Hessian: -sum_i lambda_i x_i x_i'
def LL_D2(x, y, beta):
    return -np.dot(np.transpose(poisson_lambda(x, beta).reshape(-1, 1) * x), x)
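
Before running Newton's method, it is worth verifying the analytic gradient against a central finite-difference approximation of LL; a minimal sketch (the test point beta0 is arbitrary):

beta0 = np.zeros(X.shape[1])  # arbitrary test point
eps = 1e-6
num_grad = np.array([
    (LL(X, y, beta0 + eps * e) - LL(X, y, beta0 - eps * e)) / (2 * eps)
    for e in np.eye(X.shape[1])
])
print(np.allclose(num_grad, LL_D(X, y, beta0)))  # should print True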

Implementing Newton’s algorithm described above:

max_iter = 100               # maximum number of iterations
beta = np.zeros(X.shape[1])  # initial guess for the beta vector

for i in range(max_iter):
    # Newton's update rule: beta minus inverse-Hessian times gradient
    beta_new = beta - np.dot(np.linalg.inv(LL_D2(X, y, beta)), LL_D(X, y, beta))
    print(beta_new)

    # Stopping criterion: the iterates have (numerically) stopped moving
    if np.sum(np.absolute(beta - beta_new)) < 1e-12:
        beta = beta_new
        break
    beta = beta_new

The result:

[-2.98299806 -0.2125061   0.26610685  0.0478888 ]
[-4.36817904 -0.30562538  0.53865713  0.06496803]
[-4.83114677 -0.35664338  0.6906736   0.06979827]
[-4.8767946  -0.36938328  0.71369546  0.07014979]
[-4.87731508 -0.36980898  0.71404984  0.0701524 ]
[-4.87731517 -0.36980923  0.71404992  0.0701524 ]
[-4.87731517 -0.36980923  0.71404992  0.0701524 ]

The solution:

solution = pd.DataFrame(beta.reshape(1, -1), columns=regressor_list)
solution

       ones     prog1    prog2      math
0 -4.877315 -0.369809  0.71405  0.070152
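
As a cross-check, the same model can be fit with statsmodels (assuming the package is installed); its GLM routine with a Poisson family should reproduce these coefficients, since X already contains the constant column:

import statsmodels.api as sm

poisson_glm = sm.GLM(y, X, family=sm.families.Poisson())
print(poisson_glm.fit().params)  # should match the Newton solution above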