Function for Fitting Linear Mixed Models with Previous Model Estimates
postHocLM.Rd
Function for fitting a linear (mixed) model as a second-stage model where the response variable itself comes from a previous model fit and has uncertainty associated with it. The response variable is assumed to be a set of estimates from a previous model fit, where each value in the response variable has a posterior MCMC sample of estimates. This function is useful for doing "posthoc" analyses of model estimates (e.g., exploring how species traits relate to species-specific parameter estimates from a multi-species occupancy model). Such analyses are sometimes referred to as "two-stage" analyses.
Usage
postHocLM(formula, data, inits, priors, verbose = FALSE,
n.report = 100, n.samples, n.chains = 1, ...)
Arguments
- formula
a symbolic description of the model to be fit for the model using R's model syntax. Only right-hand side of formula is specified. See example below. Random intercepts are allowed using lme4 syntax (Bates et al. 2015).
- data
a list containing data necessary for model fitting. Valid tags are
y
andcovs
.y
is a matrix or data frame with first dimension equal to the number of posterior samples of each value in the response variable and the second dimension is equal to the number of values in the response variable. For example, if the response is species-specific covariate effect estimates from a multi-species occupancy model, the rows correspond to the posterior MCMC samples and the columns correspond to species.covs
is a matrix or data frame containing the independent variables used in the model. Note the number of rows ofcovs
should be equal to the number of columns iny
.- inits
a list with each tag corresponding to a parameter name. Valid tags are
beta
,tau.sq
, andsigma.sq
. The value portion of each tag is the parameter's initial value.sigma.sq
is only relevant when including random effects in the model. Seepriors
description for definition of each parameter name. Additionally, the tagfix
can be set toTRUE
to fix the starting values across all chains. Iffix
is not specified (the default), starting values are varied randomly across chains.- priors
a list with each tag corresponding to a parameter name. Valid tags are
beta.normal
,tau.sq.ig
, andsigma.sq.ig
. Regression coefficients (beta
) are assumed to follow a normal distribution. The hyperparameters of the normal distribution are passed as a list of length two with the first and second elements corresponding to the mean and variance of the normal distribution, which are each specified as vectors of length equal to the number of coefficients to be estimated or of length one if priors are the same for all coefficients. If not specified, prior means are set to 0 and prior variances set to 100.tau.sq
is the residual variance, and is assumed to follow an inverse-Gamma distribution. The hyperparameters of the inverse-Gamma distribution are passed as a vector of length two with first and second elements corresponding to the shape and scale parameters, respectively.sigma.sq
are the variances of any random intercepts included in the model, which similarly totau.sq
follow an inverse-Gamma distribution. The hyperparameters of the inverse-Gamma distribution are passed as a list of length two with first and second elements corresponding to the shape and scale parameters, respectively, which are each specified as vectors of length equal to the number of random intercepts or of length one if priors are the same for all random effect variances.- verbose
if
TRUE
, messages about data preparation, model specification, and progress of the sampler are printed to the screen. Otherwise, no messages are printed.- n.report
the interval to report MCMC progress.
- n.samples
the number of posterior samples to collect in each chain. Note that by default, the same number of MCMC samples fit in the first stage model is assumed to be fit for the second stage model. If
n.samples
is specified, it must be a multiple of the number of samples fit in the first stage, otherwise an error will be reported.- n.chains
the number of chains to run in sequence.
- ...
currently no additional arguments
References
Bates, Douglas, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01 .
Author
Jeffrey W. Doser doserjef@msu.edu,
Value
An object of class postHocLM
that is a list comprised of:
- beta.samples
a
coda
object of posterior samples for the regression coefficients.- tau.sq.samples
a
coda
object of posterior samples for the residual variances.- y.hat.samples
a
coda
object of posterior samples of fitted values.- sigma.sq.samples
a
coda
object of posterior samples for the random effect variances if any random intercepts were included in the model.- beta.star.samples
a
coda
object of posterior samples for the random effects. Only included if random intercepts are specified informula
.- rhat
a list of Gelman-Rubin diagnostic values for some of the model parameters.
- ESS
a list of effective sample sizes for some of the model parameters.
- run.time
execution time reported using
proc.time()
.- bayes.R2
a
coda
object of posterior samples of the Bayesian R-squared as a measure of model fit. Note that when random intercepts are included in the model, this is the conditional Bayesian R-squared, not the marginal Bayesian R-squared.
The return object will include additional objects used for subsequent summarization.
Examples
# Simulate Data -----------------------------------------------------------
set.seed(100)
N <- 100
beta <- c(0, 0.5, 1.2)
tau.sq <- 1
p <- length(beta)
X <- matrix(1, nrow = N, ncol = p)
if (p > 1) {
for (i in 2:p) {
X[, i] <- rnorm(N)
} # i
}
mu <- X[, 1] * beta[1] + X[, 2] * beta[2] + X[, 3] * beta[3]
y <- rnorm(N, mu, sqrt(tau.sq))
# Replicate y n.samples times and add a small amount of noise that corresponds
# to uncertainty from a first stage model.
n.samples <- 1000
y <- matrix(y, n.samples, N, byrow = TRUE)
y <- y + rnorm(length(y), 0, 0.25)
# Package data for use with postHocLM -------------------------------------
colnames(X) <- c('int', 'cov.1', 'cov.2')
data.list <- list(y = y, covs = X)
data <- data.list
inits <- list(beta = 0, tau.sq = 1)
priors <- list(beta.normal = list(mean = 0, var = 10000),
tau.sq.ig = c(0.001, 0.001))
# Run the model -----------------------------------------------------------
out <- postHocLM(formula = ~ cov.1 + cov.2,
inits = inits,
data = data.list,
priors = priors,
verbose = FALSE,
n.chains = 1)
summary(out)
#>
#> Call:
#> postHocLM(formula = ~cov.1 + cov.2, data = data.list, inits = inits,
#> priors = priors, verbose = FALSE, n.chains = 1)
#>
#> Samples per Chain: 1000
#> Number of Chains: 1
#> Total Posterior Samples: 1000
#> Run Time (min): 4e-04
#>
#> ----------------------------------------
#> Fixed Effects:
#> ----------------------------------------
#> Mean SD 2.5% 50% 97.5% Rhat ESS
#> (Intercept) 0.0142 0.1079 -0.2008 0.0150 0.2333 NA 976
#> cov.1 0.5709 0.1079 0.3610 0.5712 0.7820 NA 1000
#> cov.2 0.8703 0.1400 0.6010 0.8673 1.1417 NA 1000
#> Residual Variance 1.1126 0.1710 0.8248 1.1007 1.4673 NA 516
#>
#> ----------------------------------------
#> Bayesian R2:
#> ----------------------------------------
#> Mean SD 2.5% 50% 97.5%
#> 0.3949 0.068 0.2649 0.3986 0.5236