Maximum Likelihood Estimation: Finding the Top of a Hill

Author: T.E.G.

Published: February 6, 2018

I think one of the most intuitive descriptions of maximum likelihood estimation (especially for beginners) can be found in Long and Freese (2014):

For all but the simplest models, the only way to find the maximum of the likelihood function is by numerical methods.¹ Numerical methods are the mathematical equivalent of how you would find the top of a hill if you were blindfolded and knew only the slope of the hill at the spot where you are standing and how the slope at that spot is changing (which you could figure out by poking your foot in each direction). The search begins with start values corresponding to your location as you start your climb. From the start position, the slope of the likelihood function and the rate of change in the slope determine the next guess for the parameters. The process continues to iterate until the maximum of the likelihood function is found, called convergence, and the resulting estimates are reported. (Long and Freese 2014, 84)

  ¹ For a quick explanation of the difference between analytical and numerical methods, see “What’s the difference between analytical and numerical approaches to problems?”
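
To see the climb in miniature, consider a one-parameter toy problem: estimating the heads probability of a coin that landed heads 7 times in 10 tosses. The sketch below (an illustration of the numerical idea in the quote, not from the book) uses Newton’s method, relying only on the slope of the log-likelihood and how that slope is changing at the current spot:

    # Bernoulli log-likelihood for 7 heads in 10 tosses:
    # logL(p) = 7*log(p) + 3*log(1 - p)
    slope     <- function(p)  7 / p   - 3 / (1 - p)    # first derivative
    curvature <- function(p) -7 / p^2 - 3 / (1 - p)^2  # second derivative
    
    p <- 0.5  # start value: where you begin the climb
    for (i in 1:25) {
      step <- slope(p) / curvature(p)  # next guess from slope and its change
      p <- p - step
      if (abs(step) < 1e-8) break      # convergence: the top of the hill
    }
    p  # converges to the MLE, 7/10 = 0.7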

  Example: Logistic Regression

    Data preparation

    library(tidyverse)
    library(optimx) # base R's optim() also works here:
                    # BFGS is available in both optim() and optimx()
    df <- carData::Mroz
    
    # Recode the outcome factor (lfp: labor-force participation) to numeric 0/1
    outcome <- fct_recode(df$lfp,
                   "0" = "no",
                   "1" = "yes")
    outcome <- as.numeric(as.character(outcome))
    
    predictors <- df %>% 
      select(k5, age, inc) %>%  # selected predictors
      mutate(int=rep(1, nrow(df))) %>% # column of 1s (intercept)
      select(int, everything()) %>% 
      as.matrix()
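
    A quick sanity check of the assembled design matrix:

    head(predictors, 3)  # intercept column of 1s first, then k5, age, inc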

    “The search begins with start values corresponding to your location as you start your climb.”

    # Use OLS model coefficients as starting values
    lmfit <- lm(outcome ~ predictors[,c(2:4)])
    s_val <- lmfit$coefficients

    “From the start position, the slope of the likelihood function and the rate of change in the slope determine the next guess for the parameters.”

    # Negative log-likelihood of the logit model; negated because
    # optimx() minimizes by default
    logLikelihood <- function(vBeta, mX, vY) {
      return(-sum(vY * (mX %*% vBeta - log(1 + exp(mX %*% vBeta))) +
                  (1 - vY) * (-log(1 + exp(mX %*% vBeta)))))
    }
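
    BFGS reconstructs the curvature from successive slopes, and by default
    optimx approximates those slopes by finite differences. An optional
    refinement, sketched here under that reading: supply the analytic
    gradient of the negative log-likelihood through the gr argument.

    # Gradient (negative score) of logLikelihood above: -X'(y - p)
    logLikGrad <- function(vBeta, mX, vY) {
      p <- plogis(mX %*% vBeta)        # fitted probabilities
      as.vector(-t(mX) %*% (vY - p))
    }
    # e.g.: optimx(s_val, logLikelihood, gr = logLikGrad, method = 'BFGS',
    #              mX = predictors, vY = outcome, hessian = TRUE)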

    “The process continues to iterate until the maximum of the likelihood function is found, called convergence,…”

    optimization <- optimx(s_val, logLikelihood, method = 'BFGS', 
                           mX = predictors, vY = outcome, hessian=TRUE)
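
    It is worth verifying that the climb actually reached the top rather than
    assuming it did: optimx records a convergence code for each method, with
    0 meaning success.

    # convcode is a column of the optimx result; 0 signals convergence
    if (optimization$convcode != 0) warning("BFGS did not converge")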

    “…and the resulting estimates are reported.”

    # The first ncol(predictors) columns of the optimx result data frame
    # hold the coefficient estimates
    estimation_optx <- optimization %>%
      select(1:ncol(predictors)) %>% t()
    estimation_optx
                                   BFGS
    X.Intercept.             3.39332484
    predictors...c.2.4..k5  -1.31311634
    predictors...c.2.4..age -0.05682900
    predictors...c.2.4..inc -0.01875491
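
    The curvature at the summit buys more than point estimates: the inverse
    Hessian of the negative log-likelihood approximates the covariance matrix
    of the coefficients. A sketch using the numDeriv package to evaluate the
    Hessian at the optimum (one of several ways to recover it):

    # Standard errors from the inverse Hessian at the estimates
    beta_hat <- as.numeric(estimation_optx)
    hess <- numDeriv::hessian(logLikelihood, beta_hat,
                              mX = predictors, vY = outcome)
    sqrt(diag(solve(hess)))  # should be close to glm()'s std. errors below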

    Compare these estimates with the output of the glm() function:

    summary(glm(lfp ~ k5 + age + inc, df, family = binomial))
    
    Call:
    glm(formula = lfp ~ k5 + age + inc, family = binomial, data = df)
    
    Coefficients:
                 Estimate Std. Error z value Pr(>|z|)    
    (Intercept)  3.394398   0.515576   6.584 4.59e-11 ***
    k5          -1.313316   0.187535  -7.003 2.50e-12 ***
    age         -0.056855   0.010991  -5.173 2.31e-07 ***
    inc         -0.018751   0.006889  -2.722  0.00649 ** 
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    (Dispersion parameter for binomial family taken to be 1)
    
        Null deviance: 1029.75  on 752  degrees of freedom
    Residual deviance:  956.75  on 749  degrees of freedom
    AIC: 964.75
    
    Number of Fisher Scoring iterations: 4

    Here is the Wikipedia page for BFGS (the Broyden–Fletcher–Goldfarb–Shanno algorithm), which “belongs to quasi-Newton methods, a class of hill-climbing optimization techniques….”

    References

    Long, J. Scott, and Jeremy Freese. 2014. Regression Models for Categorical Dependent Variables Using Stata. 3rd ed. College Station, TX: Stata Press.

    Citation

    BibTeX citation:
    @online{2018,
      author = {T.E.G.},
      title = {Maximum {Likelihood} {Estimation:} {Finding} the {Top} of a
        {Hill}},
      date = {2018-02-06},
      langid = {en}
    }
    
    For attribution, please cite this work as:
    T.E.G. 2018. “Maximum Likelihood Estimation: Finding the Top of a Hill.” February 6, 2018.