
Function for Fitting Linear Mixed Models with Previous Model Estimates
postHocLM.RdFunction for fitting a linear (mixed) model as a second-stage model where the response variable itself comes from a previous model fit and has uncertainty associated with it. The response variable is assumed to be a set of estimates from a previous model fit, where each value in the response variable has a posterior MCMC sample of estimates. This function is useful for doing "posthoc" analyses of model estimates (e.g., exploring how species traits relate to species-specific parameter estimates from a multi-species occupancy model). Such analyses are sometimes referred to as "two-stage" analyses.
Usage
postHocLM(formula, data, inits, priors, verbose = FALSE,
n.report = 100, n.samples, n.chains = 1, ...)Arguments
- formula
a symbolic description of the model to be fit for the model using R's model syntax. Only right-hand side of formula is specified. See example below. Random intercepts are allowed using lme4 syntax (Bates et al. 2015).
- data
a list containing data necessary for model fitting. Valid tags are
yandcovs.yis a matrix or data frame with first dimension equal to the number of posterior samples of each value in the response variable and the second dimension is equal to the number of values in the response variable. For example, if the response is species-specific covariate effect estimates from a multi-species occupancy model, the rows correspond to the posterior MCMC samples and the columns correspond to species.covsis a matrix or data frame containing the independent variables used in the model. Note the number of rows ofcovsshould be equal to the number of columns iny.- inits
a list with each tag corresponding to a parameter name. Valid tags are
beta,tau.sq, andsigma.sq. The value portion of each tag is the parameter's initial value.sigma.sqis only relevant when including random effects in the model. Seepriorsdescription for definition of each parameter name. Additionally, the tagfixcan be set toTRUEto fix the starting values across all chains. Iffixis not specified (the default), starting values are varied randomly across chains.- priors
a list with each tag corresponding to a parameter name. Valid tags are
beta.normal,tau.sq.ig, andsigma.sq.ig. Regression coefficients (beta) are assumed to follow a normal distribution. The hyperparameters of the normal distribution are passed as a list of length two with the first and second elements corresponding to the mean and variance of the normal distribution, which are each specified as vectors of length equal to the number of coefficients to be estimated or of length one if priors are the same for all coefficients. If not specified, prior means are set to 0 and prior variances set to 100.tau.sqis the residual variance, and is assumed to follow an inverse-Gamma distribution. The hyperparameters of the inverse-Gamma distribution are passed as a vector of length two with first and second elements corresponding to the shape and scale parameters, respectively.sigma.sqare the variances of any random intercepts included in the model, which similarly totau.sqfollow an inverse-Gamma distribution. The hyperparameters of the inverse-Gamma distribution are passed as a list of length two with first and second elements corresponding to the shape and scale parameters, respectively, which are each specified as vectors of length equal to the number of random intercepts or of length one if priors are the same for all random effect variances.- verbose
if
TRUE, messages about data preparation, model specification, and progress of the sampler are printed to the screen. Otherwise, no messages are printed.- n.report
the interval to report MCMC progress.
- n.samples
the number of posterior samples to collect in each chain. Note that by default, the same number of MCMC samples fit in the first stage model is assumed to be fit for the second stage model. If
n.samplesis specified, it must be a multiple of the number of samples fit in the first stage, otherwise an error will be reported.- n.chains
the number of chains to run in sequence.
- ...
currently no additional arguments
References
Bates, Douglas, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01 .
Author
Jeffrey W. Doser doserjef@msu.edu,
Value
An object of class postHocLM that is a list comprised of:
- beta.samples
a
codaobject of posterior samples for the regression coefficients.- tau.sq.samples
a
codaobject of posterior samples for the residual variances.- y.hat.samples
a
codaobject of posterior samples of fitted values.- sigma.sq.samples
a
codaobject of posterior samples for the random effect variances if any random intercepts were included in the model.- beta.star.samples
a
codaobject of posterior samples for the random effects. Only included if random intercepts are specified informula.- rhat
a list of Gelman-Rubin diagnostic values for some of the model parameters.
- ESS
a list of effective sample sizes for some of the model parameters.
- run.time
execution time reported using
proc.time().- bayes.R2
a
codaobject of posterior samples of the Bayesian R-squared as a measure of model fit. Note that when random intercepts are included in the model, this is the conditional Bayesian R-squared, not the marginal Bayesian R-squared.
The return object will include additional objects used for subsequent summarization.
Examples
# Simulate Data -----------------------------------------------------------
set.seed(100)
N <- 100
beta <- c(0, 0.5, 1.2)
tau.sq <- 1
p <- length(beta)
X <- matrix(1, nrow = N, ncol = p)
if (p > 1) {
for (i in 2:p) {
X[, i] <- rnorm(N)
} # i
}
mu <- X[, 1] * beta[1] + X[, 2] * beta[2] + X[, 3] * beta[3]
y <- rnorm(N, mu, sqrt(tau.sq))
# Replicate y n.samples times and add a small amount of noise that corresponds
# to uncertainty from a first stage model.
n.samples <- 1000
y <- matrix(y, n.samples, N, byrow = TRUE)
y <- y + rnorm(length(y), 0, 0.25)
# Package data for use with postHocLM -------------------------------------
colnames(X) <- c('int', 'cov.1', 'cov.2')
data.list <- list(y = y, covs = X)
data <- data.list
inits <- list(beta = 0, tau.sq = 1)
priors <- list(beta.normal = list(mean = 0, var = 10000),
tau.sq.ig = c(0.001, 0.001))
# Run the model -----------------------------------------------------------
out <- postHocLM(formula = ~ cov.1 + cov.2,
inits = inits,
data = data.list,
priors = priors,
verbose = FALSE,
n.chains = 1)
summary(out)
#>
#> Call:
#> postHocLM(formula = ~cov.1 + cov.2, data = data.list, inits = inits,
#> priors = priors, verbose = FALSE, n.chains = 1)
#>
#> Samples per Chain: 1000
#> Number of Chains: 1
#> Total Posterior Samples: 1000
#> Run Time (min): 3e-04
#>
#> ----------------------------------------
#> Fixed Effects:
#> ----------------------------------------
#> Mean SD 2.5% 50% 97.5% Rhat ESS
#> (Intercept) 0.0142 0.1079 -0.2008 0.0150 0.2333 NA 976
#> cov.1 0.5709 0.1079 0.3610 0.5712 0.7820 NA 1000
#> cov.2 0.8703 0.1400 0.6010 0.8673 1.1417 NA 1000
#> Residual Variance 1.1126 0.1710 0.8248 1.1007 1.4673 NA 516
#>
#> ----------------------------------------
#> Bayesian R2:
#> ----------------------------------------
#> Mean SD 2.5% 50% 97.5%
#> 0.3949 0.068 0.2649 0.3986 0.5236