The Poisson regression model relates the Poisson parameter $\lambda _i$ to the covariate vector $x_i$. The outcome variable $y_i$ is then assumed to be drawn from a Poisson distribution with mean $\lambda _i$. Specifically, the data generating process is:

\[Prob(Y=y_i | x_i) = \dfrac{e^{-\lambda _i}\lambda _i ^{y_i}}{y_i !},\]

where $y_i = 0, 1, 2, \ldots$. The conventional way of modeling $\lambda _i$ is as follows:

\[\ln \lambda _i = x_i '\beta .\]

Note that $x_i '$ is the transpose of the $k$-dimensional column vector $x_i$, so $x_i '\beta$ is a scalar; the exponential link guarantees $\lambda _i > 0$ for any $\beta$. A well-known feature of the Poisson distribution is that:

\[E[y_i|x_i] = Var(y_i|x_i) = \lambda _i .\]

Combining the last two equations, we have:

\[E[y_i|x_i] = Var(y_i|x_i) = \lambda _i = e^{x_i ' \beta }.\]
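To see this property numerically, here is a minimal simulation sketch (the coefficient values and covariate vector are arbitrary choices of mine, not part of the model above): draws generated under the log-link model have matching mean and variance.

import numpy as np

rng = np.random.default_rng(0)
beta_true = np.array([0.5, 0.1])        # arbitrary illustrative coefficients
x = np.array([1.0, 3.0])                # one covariate vector, including a constant
lam = np.exp(np.dot(x, beta_true))      # lambda_i = exp(x_i' beta)

draws = rng.poisson(lam, size=100000)   # y_i | x_i ~ Poisson(lambda_i)
print(lam, draws.mean(), draws.var())   # mean and variance are both close to lambda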

We can estimate $\beta$ by maximizing the log-likelihood function:

\[\begin{aligned} L(\beta ) &= \log \left [ \prod _{i=1} ^n \dfrac{e^{-\lambda _i} \lambda _i ^{y_i}}{y_i!}\right ]\\ &= \sum _{i=1} ^n \left ( -\lambda _i + y_i \ln \lambda _i - \ln (y_i!) \right )\\ &= - \sum _{i=1} ^n \lambda _i + \sum _{i=1} ^n y_ix_i'\beta - \sum _{i=1} ^n \ln (y_i!), \end{aligned}\]

where the last line substitutes $\ln \lambda _i = x_i '\beta$.

The log-likelihood is maximized where the first-order condition (FOC) holds:

\[\dfrac{\partial L(\beta )}{\partial \beta } = \sum _{i=1} ^n (y_i - \lambda _i) x_i = 0.\]

Because $\lambda _i = e^{x_i '\beta}$, this system is nonlinear in $\beta$ and has no closed-form solution, so we will solve it iteratively. The Hessian is:

\[\dfrac{\partial ^2 L(\beta )}{\partial \beta \partial \beta '} = -\sum _{i=1} ^n \lambda _i x_i x_i'.\]
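Since every $\lambda _i > 0$, this Hessian is negative semidefinite: for any vector $a$,

\[a' \left ( -\sum _{i=1} ^n \lambda _i x_i x_i' \right ) a = -\sum _{i=1} ^n \lambda _i (x_i 'a)^2 \le 0,\]

so the log-likelihood is concave and any solution of the FOC is a global maximum.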

Given this concavity, we can simply use Newton’s method with the Hessian matrix to maximize the log-likelihood function.

Newton’s Method

Refer to Wikipedia for the details of Newton’s method; we will use it to find the solution for $\beta$. In summary, we will write an iterative algorithm that updates $\beta$ as follows:

\[\beta _{t+1} = \beta _t - \left [ \dfrac{\partial ^2 L(\beta _t)}{\partial \beta \partial \beta '} \right ] ^{-1}\dfrac{\partial L(\beta _t)}{\partial \beta }\]
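Since the Hessian is negative definite here, subtracting the inverse Hessian times the gradient moves $\beta$ uphill toward the maximum. To make the update rule concrete, here is a one-dimensional sketch of the same idea on a toy function of my own choosing, $f(x) = 2\ln x - x$, which has its maximum at $x = 2$:

# Newton's method in one dimension: x_{t+1} = x_t - f'(x_t) / f''(x_t)
def newton_1d(d1, d2, x0, tol=1e-12, max_iter=100):
    x = x0
    for _ in range(max_iter):
        step = d1(x) / d2(x)
        x -= step
        if abs(step) < tol:
            break
    return x

d1 = lambda x: 2.0 / x - 1.0       # f'(x)
d2 = lambda x: -2.0 / x ** 2       # f''(x)
print(newton_1d(d1, d2, x0=1.0))   # converges to 2.0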

Estimating Poisson Regression with Newton’s Method - A Python Example

I use the following Python packages:

import pandas as pd
import numpy as np
from scipy.special import factorial

I load the data set:

data = pd.read_csv('https://stats.idre.ucla.edu/stat/data/poisson_sim.csv')
data['ones'] = 1  # constant term
dummies = pd.get_dummies(data.prog).astype(int)  # cast to 0/1 ints (newer pandas returns booleans)
data['prog1'] = dummies.values[:, 0]
data['prog2'] = dummies.values[:, 1]

I create a constant column called ones. prog is a categorical variable, so I create dummy variables and drop the last category; the regression will use the two dummies prog1 and prog2. In addition, math scores are continuous. We will regress num_awards on ones, prog1, prog2, and math.

The first ten observations in the data set:

    id  num_awards  prog  math  ones  prog1  prog2
0   45           0     3    41     1      0      0
1  108           0     1    41     1      1      0
2   15           0     3    44     1      0      0
3   67           0     3    42     1      0      0
4  153           0     3    40     1      0      0
5   51           0     1    42     1      1      0
6  164           0     3    46     1      0      0
7  133           0     3    40     1      0      0
8    2           0     3    33     1      0      0
9   53           0     3    46     1      0      0

I define the matrix $X$ and the outcome vector $y$ as follows:

regressor_list = ['ones', 'prog1', 'prog2', 'math']

X = data[regressor_list].values.astype(float)  # design matrix
y = data['num_awards'].values                  # outcome: number of awards

I now define four functions, respectively, for $\lambda _i$, $L(\beta)$, $\dfrac{\partial L(\beta )}{\partial \beta }$, and $\dfrac{\partial ^2 L(\beta )}{\partial \beta \partial \beta '}$:

# For lambda_i = exp(x_i' beta)
def poisson_lambda(x, beta):
    return np.exp(np.dot(x, beta))

# Log-likelihood function
def LL(x, y, beta):
    return np.sum(-poisson_lambda(x, beta) + y * np.dot(x, beta) - np.log(factorial(y)))

# Gradient: sum_i (y_i - lambda_i) x_i
def LL_D(x, y, beta):
    return np.dot(y - poisson_lambda(x, beta), x)

# Hessian: -sum_i lambda_i x_i x_i'
def LL_D2(x, y, beta):
    return -np.dot(np.transpose(poisson_lambda(x, beta).reshape(-1, 1) * x), x)
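Before iterating, a quick finite-difference check (my own sanity test, not part of the original walkthrough) can confirm that LL_D really is the gradient of LL at an arbitrary test point:

# Compare the analytical gradient with a central finite-difference approximation
beta0 = np.array([-1.0, 0.1, 0.1, 0.05])  # arbitrary test point
eps = 1e-6
numeric_grad = np.array([
    (LL(X, y, beta0 + eps * e) - LL(X, y, beta0 - eps * e)) / (2 * eps)
    for e in np.eye(len(beta0))
])
print(np.allclose(numeric_grad, LL_D(X, y, beta0)))  # True if the derivation is right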

Implementing Newton’s algorithm described above:

max_iter = 100                         # maximum number of iterations
beta = np.zeros(len(regressor_list))   # initial guess for the beta vector

for i in range(max_iter):
    # Newton's update rule
    beta_new = beta - np.dot(np.linalg.inv(LL_D2(X, y, beta)), LL_D(X, y, beta))
    print(beta_new)

    # stopping criterion
    if np.sum(np.absolute(beta - beta_new)) < 1e-12:
        beta = beta_new
        break
    beta = beta_new
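
As a side note on the design: explicitly inverting the Hessian works fine at this scale, but solving the linear system directly is the numerically preferred variant (my own suggestion, equivalent up to floating-point error):

# Equivalent update that avoids forming the inverse Hessian explicitly
beta_new = beta - np.linalg.solve(LL_D2(X, y, beta), LL_D(X, y, beta))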

The result:

[-2.98299806 -0.2125061   0.26610685  0.0478888 ]
[-4.36817904 -0.30562538  0.53865713  0.06496803]
[-4.83114677 -0.35664338  0.6906736   0.06979827]
[-4.8767946  -0.36938328  0.71369546  0.07014979]
[-4.87731508 -0.36980898  0.71404984  0.0701524 ]
[-4.87731517 -0.36980923  0.71404992  0.0701524 ]
[-4.87731517 -0.36980923  0.71404992  0.0701524 ]

The solution is:

solution = pd.DataFrame(beta.reshape(1, -1), columns=regressor_list)
solution

       ones     prog1    prog2      math
0 -4.877315 -0.369809  0.71405  0.070152
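As a final sanity check (my own addition, assuming statsmodels is installed; it is not used in the walkthrough above), the same coefficients can be recovered with a standard GLM fit:

import statsmodels.api as sm

# Fit the same Poisson regression as a cross-check against the Newton estimates
result = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(result.params)  # should match the solution above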