Maximum Likelihood Estimation: Finding the Top of a Hill

Author: T.E.G.

Published: February 6, 2018

I think one of the most intuitive descriptions of maximum likelihood estimation (especially for beginners) can be found in Long and Freese (2014):

For all but the simplest models, the only way to find the maximum of the likelihood function is by numerical methods.¹ Numerical methods are the mathematical equivalent of how you would find the top of a hill if you were blindfolded and knew only the slope of the hill at the spot where you are standing and how the slope at that spot is changing (which you could figure out by poking your foot in each direction). The search begins with start values corresponding to your location as you start your climb. From the start position, the slope of the likelihood function and the rate of change in the slope determine the next guess for the parameters. The process continues to iterate until the maximum of the likelihood function is found, called convergence, and the resulting estimates are reported. (Long and Freese 2014, 84)

  ¹ For a quick explanation of the difference between analytical and numerical methods, see “What’s the difference between analytical and numerical approaches to problems?”
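
To see the climb in miniature, consider a one-parameter toy problem: estimating the heads probability of a coin that landed heads 7 times in 10 tosses. The sketch below (an illustration of the numerical idea in the quote, not from the book) uses Newton’s method, relying only on the slope of the log-likelihood and how that slope is changing at the current spot:

    # Bernoulli log-likelihood for 7 heads in 10 tosses:
    # logL(p) = 7*log(p) + 3*log(1 - p)
    slope     <- function(p)  7 / p   - 3 / (1 - p)    # first derivative
    curvature <- function(p) -7 / p^2 - 3 / (1 - p)^2  # second derivative
    
    p <- 0.5  # start value: where you begin the climb
    for (i in 1:25) {
      step <- slope(p) / curvature(p)  # next guess from slope and its change
      p <- p - step
      if (abs(step) < 1e-8) break      # convergence: the top of the hill
    }
    p  # converges to the MLE, 7/10 = 0.7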

  Example: Logistic Regression

    Data preparation

    library(tidyverse)
    library(optimx) # base R's optim() also works here:
                    # BFGS is available in both optim() and optimx()
    df <- carData::Mroz
    
    # Recode the outcome factor (lfp: labor-force participation) to numeric 0/1
    outcome <- fct_recode(df$lfp,
                   "0" = "no",
                   "1" = "yes")
    outcome <- as.numeric(as.character(outcome))
    
    predictors <- df %>% 
      select(k5, age, inc) %>%  # selected predictors
      mutate(int=rep(1, nrow(df))) %>% # column of 1s (intercept)
      select(int, everything()) %>% 
      as.matrix()
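
    A quick sanity check of the assembled design matrix:

    head(predictors, 3)  # intercept column of 1s first, then k5, age, inc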

    “The search begins with start values corresponding to your location as you start your climb.”

    # Use OLS model coefficients as starting values
    lmfit <- lm(outcome ~ predictors[,c(2:4)])
    s_val <- lmfit$coefficients

    “From the start position, the slope of the likelihood function and the rate of change in the slope determine the next guess for the parameters.”

    # Negative log-likelihood of the logit model; negated because
    # optimx() minimizes by default
    logLikelihood <- function(vBeta, mX, vY) {
      return(-sum(vY * (mX %*% vBeta - log(1 + exp(mX %*% vBeta))) +
                  (1 - vY) * (-log(1 + exp(mX %*% vBeta)))))
    }
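
    BFGS reconstructs the curvature from successive slopes, and by default
    optimx approximates those slopes by finite differences. An optional
    refinement, sketched here under that reading: supply the analytic
    gradient of the negative log-likelihood through the gr argument.

    # Gradient (negative score) of logLikelihood above: -X'(y - p)
    logLikGrad <- function(vBeta, mX, vY) {
      p <- plogis(mX %*% vBeta)        # fitted probabilities
      as.vector(-t(mX) %*% (vY - p))
    }
    # e.g.: optimx(s_val, logLikelihood, gr = logLikGrad, method = 'BFGS',
    #              mX = predictors, vY = outcome, hessian = TRUE)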

    “The process continues to iterate until the maximum of the likelihood function is found, called convergence,…”

    optimization <- optimx(s_val, logLikelihood, method = 'BFGS', 
                           mX = predictors, vY = outcome, hessian=TRUE)
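
    It is worth verifying that the climb actually reached the top rather than
    assuming it did: optimx records a convergence code for each method, with
    0 meaning success.

    # convcode is a column of the optimx result; 0 signals convergence
    if (optimization$convcode != 0) warning("BFGS did not converge")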

    “…and the resulting estimates are reported.”

    # The first ncol(predictors) columns of the optimx result data frame
    # hold the coefficient estimates
    estimation_optx <- optimization %>%
      select(1:ncol(predictors)) %>% t()
    estimation_optx
                                   BFGS
    X.Intercept.             3.39332484
    predictors...c.2.4..k5  -1.31311634
    predictors...c.2.4..age -0.05682900
    predictors...c.2.4..inc -0.01875491
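
    The curvature at the summit buys more than point estimates: the inverse
    Hessian of the negative log-likelihood approximates the covariance matrix
    of the coefficients. A sketch using the numDeriv package to evaluate the
    Hessian at the optimum (one of several ways to recover it):

    # Standard errors from the inverse Hessian at the estimates
    beta_hat <- as.numeric(estimation_optx)
    hess <- numDeriv::hessian(logLikelihood, beta_hat,
                              mX = predictors, vY = outcome)
    sqrt(diag(solve(hess)))  # should be close to glm()'s std. errors below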

    Compare these estimates with the output of the glm() function:

    summary(glm(lfp ~ k5 + age + inc, df, family = binomial))
    
    Call:
    glm(formula = lfp ~ k5 + age + inc, family = binomial, data = df)
    
    Coefficients:
                 Estimate Std. Error z value Pr(>|z|)    
    (Intercept)  3.394398   0.515576   6.584 4.59e-11 ***
    k5          -1.313316   0.187535  -7.003 2.50e-12 ***
    age         -0.056855   0.010991  -5.173 2.31e-07 ***
    inc         -0.018751   0.006889  -2.722  0.00649 ** 
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    (Dispersion parameter for binomial family taken to be 1)
    
        Null deviance: 1029.75  on 752  degrees of freedom
    Residual deviance:  956.75  on 749  degrees of freedom
    AIC: 964.75
    
    Number of Fisher Scoring iterations: 4

    Here is the Wikipedia page for BFGS (the Broyden–Fletcher–Goldfarb–Shanno algorithm), which “belongs to quasi-Newton methods, a class of hill-climbing optimization techniques….”

    References

    Long, J. Scott, and Jeremy Freese. 2014. Regression Models for Categorical Dependent Variables Using Stata. 3rd ed. College Station, TX: Stata Press.

    Citation

    BibTeX citation:
    @online{2018,
      author = {T.E.G.},
      title = {Maximum {Likelihood} {Estimation:} {Finding} the {Top} of a
        {Hill}},
      date = {2018-02-06},
      langid = {en}
    }
    
    For attribution, please cite this work as:
    T.E.G. 2018. “Maximum Likelihood Estimation: Finding the Top of a Hill.” February 6, 2018.