Maximum Likelihood Estimation: Finding the Top of a Hill

I think one of the most intuitive descriptions of maximum likelihood estimation (especially for beginners) can be found in Long and Freese (2014):

For all but the simplest models, the only way to find the maximum of the likelihood function is by numerical methods.[1] Numerical methods are the mathematical equivalent of how you would find the top of a hill if you were blindfolded and knew only the slope of the hill at the spot where you are standing and how the slope at that spot is changing, which you could figure out by poking your foot in each direction. The search begins with start values corresponding to your location as you start your climb. From the start position, the slope of the likelihood function and the rate of change in the slope determine the next guess for the parameters. The process continues to iterate until the maximum of the likelihood function is found, called convergence, and the resulting estimates are reported. (Long and Freese 2014, 84)

[1] For a quick explanation of the difference between analytical and numerical methods, see: What's the difference between analytical and numerical approaches to problems?
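To make the hill-climbing analogy concrete, here is a minimal sketch (my own illustration, not from Long and Freese) for the simplest possible case, a single Bernoulli probability. Each Newton-Raphson step uses exactly the two pieces of information in the quote: the slope of the log-likelihood and how fast that slope is changing.

# Illustrative only: the data vector y below is made up (7 heads in 10 tosses)
y <- c(1, 1, 0, 1, 0, 1, 1, 0, 1, 1)

loglik    <- function(p) sum(y * log(p) + (1 - y) * log(1 - p))  # the "hill"
slope     <- function(p) sum(y / p - (1 - y) / (1 - p))          # first derivative
curvature <- function(p) sum(-y / p^2 - (1 - y) / (1 - p)^2)     # second derivative

p <- 0.5                             # start value: where the climb begins
for (i in 1:10) {
  p <- p - slope(p) / curvature(p)   # next guess from the slope and its change
}
p                                    # 0.7 = mean(y), the analytical MLE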
Example: Logistic Regression

Data preparation

library(tidyverse)
library(optimx) # or optim, depending on the optimization method used;
                # BFGS is available in both

df <- carData::Mroz

outcome <- fct_recode(df$lfp,
                      "0" = "no",
                      "1" = "yes")
outcome <- as.numeric(as.character(outcome))

predictors <- df %>%
  select(k5, age, inc) %>%            # selected predictors
  mutate(int = rep(1, nrow(df))) %>%  # column of 1s (intercept)
  select(int, everything()) %>%
  as.matrix()
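A quick optional check (my addition, not in the original post) that the prepared objects have the expected shape:

dim(predictors)   # 753 observations by 4 columns: int, k5, age, inc
head(outcome)     # a plain numeric 0/1 vector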
“The search begins with start values corresponding to your location as you start your climb.”
# Use OLS model coefficients as starting values
lmfit <- lm(outcome ~ predictors[, c(2:4)])
s_val <- lmfit$coefficients
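The OLS fit here is a linear probability model, so its coefficients are only a rough approximation of the logistic coefficients, but a rough approximation is all start values need to be. A vector of zeros is a common alternative starting point (my aside, not from the original post):

# alternative start values: begin the climb from the origin instead
s_val_zero <- setNames(rep(0, ncol(predictors)), colnames(predictors))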
“From the start position, the slope of the likelihood function and the rate of change in the slope determine the next guess for the parameters.”
logLikelihood <- function(vBeta, mX, vY) {
  # negative log-likelihood of the logistic model, negated so it can be minimized
  return(-sum(vY * (mX %*% vBeta - log(1 + exp(mX %*% vBeta)))
              + (1 - vY) * (-log(1 + exp(mX %*% vBeta)))))
}
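Written out, this is the logistic log-likelihood, the sum over i of [y_i * x_i'beta - log(1 + exp(x_i'beta))], returned with a flipped sign so that a minimizer ends up maximizing the likelihood. As a hedged cross-check (my addition, not in the original post), the same quantity can be computed from base R's Bernoulli density:

negLogLik_check <- function(vBeta, mX, vY) {
  p <- as.numeric(plogis(mX %*% vBeta))             # P(y = 1 | x) under the model
  -sum(dbinom(vY, size = 1, prob = p, log = TRUE))  # negated Bernoulli log-likelihood
}
# negLogLik_check(s_val, predictors, outcome) should match
# logLikelihood(s_val, predictors, outcome) up to floating-point error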
“The process continues to iterate until the maximum of the likelihood function is found, called convergence,…”
optimization <- optimx(s_val, logLikelihood, method = 'BFGS',
                       mX = predictors, vY = outcome, hessian = TRUE)
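As a hedged cross-check (my addition, not in the original post), the same minimization can be run with stats::optim, which the comment in the setup code alludes to. Because logLikelihood returns the negative log-likelihood, the Hessian reported at the optimum is the observed information matrix, and the square roots of the diagonal of its inverse approximate the standard errors that glm prints further below.

fit_optim <- optim(s_val, logLikelihood, method = "BFGS",
                   mX = predictors, vY = outcome, hessian = TRUE)
fit_optim$par                          # essentially the same estimates as optimx
sqrt(diag(solve(fit_optim$hessian)))   # approximate standard errors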
“…and the resulting estimates are reported.”
estimation_optx <- optimization %>%
  select(1:ncol(predictors)) %>%
  t()
estimation_optx
                               BFGS
X.Intercept.             3.39332484
predictors...c.2.4..k5  -1.31311634
predictors...c.2.4..age -0.05682900
predictors...c.2.4..inc -0.01875491
Compare them with the result of the glm function:
summary(glm(lfp ~ k5 + age + inc, df, family = binomial))
Call:
glm(formula = lfp ~ k5 + age + inc, family = binomial, data = df)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.394398 0.515576 6.584 4.59e-11 ***
k5 -1.313316 0.187535 -7.003 2.50e-12 ***
age -0.056855 0.010991 -5.173 2.31e-07 ***
inc -0.018751 0.006889 -2.722 0.00649 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1029.75 on 752 degrees of freedom
Residual deviance: 956.75 on 749 degrees of freedom
AIC: 964.75
Number of Fisher Scoring iterations: 4
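A hedged side-by-side view (my addition, not in the original post), refitting the model with glm and binding the two sets of estimates together; they agree to several decimal places.

glm_fit <- glm(lfp ~ k5 + age + inc, df, family = binomial)
cbind(optimx = as.numeric(estimation_optx), glm = as.numeric(coef(glm_fit)))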
Here is the wiki page for the BFGS (Broyden–Fletcher–Goldfarb–Shanno algorithm) method, which “belongs to quasi-Newton methods, a class of hill-climbing optimization techniques…”: https://en.wikipedia.org/wiki/Broyden–Fletcher–Goldfarb–Shanno_algorithm
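BFGS builds up an approximation of the curvature rather than computing it exactly; for this model the exact version is classical Newton-Raphson, which for the canonical logit link coincides with the Fisher scoring that glm reports above. A minimal sketch (my addition, assuming the predictors matrix and outcome vector defined earlier):

# Illustrative Newton-Raphson for the logistic model: each step combines the
# gradient (slope) and Hessian (curvature) of the log-likelihood, exactly as
# in the hill-climbing description quoted at the top of the post.
newton_logit <- function(mX, vY, beta = rep(0, ncol(mX)), iter = 25) {
  for (i in seq_len(iter)) {
    p <- as.numeric(plogis(mX %*% beta))          # current fitted probabilities
    gradient <- t(mX) %*% (vY - p)                # slope of the log-likelihood
    hessian  <- -t(mX) %*% (mX * (p * (1 - p)))   # curvature
    beta <- beta - solve(hessian, gradient)       # Newton update
  }
  as.numeric(beta)
}
newton_logit(predictors, outcome)  # matches the optimx and glm estimates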
References

Long, J. Scott, and Jeremy Freese. 2014. Regression Models for Categorical Dependent Variables Using Stata. 3rd ed. College Station, TX: Stata Press.
Citation
@online{2018,
author = {, T.E.G.},
title = {Maximum {Likelihood} {Estimation:} {Finding} the {Top} of a
{Hill}},
date = {2018-02-06},
langid = {en}
}