Simulate Multi-Species Multi-Season Detection-Nondetection Data

The function simTMsOcc simulates multi-species multi-season detection-nondetection data for simulation studies, power assessments, or function testing. Data can optionally be simulated with a spatial Gaussian process in the occurrence portion of the model, and species correlations can optionally be included using a factor modeling approach. Non-spatial random intercepts can also be included in the detection or occurrence portions of the occupancy model.
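The basic hierarchical structure behind such data (occurrence probabilities, latent occurrence, detection probabilities, and NA-padded replicate data) can be sketched in a few lines of base R. This is a minimal, non-spatial sketch under stated assumptions. It assumes logit links for occurrence and detection, one covariate in each submodel, and no random effects, AR(1) term, spatial process, or range restriction. It is not the simTMsOcc source. Sizes and coefficient values are arbitrary, and the loop index tt is illustrative.

```r
# Minimal non-spatial sketch of a multi-species, multi-season occupancy
# data-generating process (logit links assumed; not the simTMsOcc source)
set.seed(1)
N <- 3        # number of species
J <- 10       # number of sites
n.time <- 4   # number of primary time periods
# Replicates per site and period (J x n.time), like the n.rep argument
n.rep <- matrix(sample(2:3, J * n.time, replace = TRUE), J, n.time)
K.max <- max(n.rep)

# Species-specific intercept and slope for occurrence and detection
beta  <- matrix(rnorm(N * 2), N, 2)
alpha <- matrix(rnorm(N * 2), N, 2)

# Design arrays: column 1 is the intercept, column 2 a standard normal covariate
X <- array(1, dim = c(J, n.time, 2))
X[, , 2] <- rnorm(J * n.time)
X.p <- array(1, dim = c(J, n.time, K.max, 2))
X.p[, , , 2] <- rnorm(J * n.time * K.max)

psi <- array(NA_real_, dim = c(N, J, n.time))
z   <- array(NA_integer_, dim = c(N, J, n.time))
p   <- array(NA_real_, dim = c(N, J, n.time, K.max))
y   <- array(NA_integer_, dim = c(N, J, n.time, K.max))

for (i in 1:N) {
  for (j in 1:J) {
    for (tt in 1:n.time) {
      # Occurrence: logit(psi) is linear in the occurrence covariates
      psi[i, j, tt] <- plogis(sum(X[j, tt, ] * beta[i, ]))
      z[i, j, tt] <- rbinom(1, 1, psi[i, j, tt])
      # Replicates beyond n.rep[j, tt] stay NA in p and y
      for (k in seq_len(n.rep[j, tt])) {
        p[i, j, tt, k] <- plogis(sum(X.p[j, tt, k, ] * alpha[i, ]))
        # A species can only be detected where it is present (z = 1)
        y[i, j, tt, k] <- rbinom(1, 1, z[i, j, tt] * p[i, j, tt, k])
      }
    }
  }
}
```

The resulting psi, z, p, and y arrays have the same layout as the corresponding elements listed under Value.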

Usage

simTMsOcc(J.x, J.y, n.time, n.rep, N, beta, alpha, sp.only = 0,
          trend = TRUE, psi.RE = list(), p.RE = list(),
          sp = FALSE, svc.cols = 1, cov.model,
          sigma.sq, phi, nu, ar1 = FALSE, rho, sigma.sq.t,
          factor.model = FALSE, n.factors, range.probs, grid, ...)

Arguments

J.x: a single numeric value indicating the number of sites to simulate detection-nondetection data along the horizontal axis. Total number of sites with simulated data is \(J.x \times J.y\).
J.y: a single numeric value indicating the number of sites to simulate detection-nondetection data along the vertical axis. Total number of sites with simulated data is \(J.x \times J.y\).
n.time: a single numeric value indicating the number of primary time periods (denoted T) over which sampling occurs.
n.rep: a numeric matrix indicating the number of replicates at each site during each primary time period. The matrix must have \(J = J.x \times J.y\) rows and T columns, where T is the number of primary time periods (e.g., years or seasons) over which sampling occurs.
N: a single numeric value indicating the number of species for which to simulate detection-nondetection data.
beta: a numeric matrix with \(N\) rows containing the intercept and regression coefficient parameters for the occurrence portion of the multi-species occupancy model. Each row corresponds to the regression coefficients for a given species.
alpha: a numeric matrix with \(N\) rows containing the intercept and regression coefficient parameters for the detection portion of the multi-species occupancy model. Each row corresponds to the regression coefficients for a given species.
sp.only: a numeric vector specifying which occurrence covariates should vary only over space and not over time. The numbers in the vector correspond to columns of beta (i.e., positions in each species' vector of occurrence regression coefficients). By default, all simulated occurrence covariates vary over both space and time.
trend: a logical value. If TRUE, a temporal trend is used to simulate the detection-nondetection data, and the second column of beta is assumed to be the trend parameter. If FALSE, no trend is used, and all columns of beta except the first (the intercept) correspond to covariate effects.
psi.RE: a list used to specify the non-spatial random intercepts included in the occurrence portion of the model. The list must have two tags: levels and sigma.sq.psi. levels is a vector with one element per random intercept, giving the number of levels of that intercept. sigma.sq.psi is a vector of the same length containing the variance of each random intercept. If not specified, no random effects are included in the occurrence portion of the model.
p.RE: a list used to specify the non-spatial random intercepts included in the detection portion of the model. The list must have two tags: levels and sigma.sq.p. levels is a vector with one element per random intercept, giving the number of levels of that intercept. sigma.sq.p is a vector of the same length containing the variance of each random intercept. If not specified, no random effects are included in the detection portion of the model.
sp: a logical value indicating whether to simulate a spatially-explicit occupancy model with a Gaussian process. By default set to FALSE.
svc.cols: a vector indicating the variables whose effects will be simulated as spatially-varying coefficients. svc.cols is an integer vector whose values give the positions of those covariates among the occurrence regression coefficients (i.e., the columns of beta), with 1 being the intercept.
cov.model: a quoted keyword that specifies the covariance function used to model the spatial dependence structure among the latent occurrence values. Supported covariance model keywords are: "exponential", "matern", "spherical", and "gaussian".
sigma.sq: a numeric vector of length \(N\) containing the spatial variance parameter for each species. Ignored when sp = FALSE or when factor.model = TRUE.
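phi: a numeric vector of length \(N\) containing the spatial decay parameter for each species. Ignored when sp = FALSE. If factor.model = TRUE, this should be of length n.factors; when more than one spatially-varying coefficient is simulated (see svc.cols), the example below supplies n.factors values for each coefficient.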
nu: a numeric vector of length \(N\) containing the spatial smoothness parameter for each species. Only used when sp = TRUE and cov.model = 'matern'. If factor.model = TRUE, this should be of length n.factors.
ar1: a logical value indicating whether to simulate a temporal random effect with an AR(1) process. By default, set to FALSE.
rho: a vector of N values indicating the AR(1) temporal correlation parameter for each species. Ignored when ar1 = FALSE.
sigma.sq.t: a vector of N values indicating the AR(1) temporal variance parameter for each species. Ignored when ar1 = FALSE.
factor.model: a logical value indicating whether to simulate data following a factor modeling approach that explicitly incorporates species correlations. If sp = TRUE, the latent factors are simulated from independent spatial processes. If sp = FALSE, the latent factors are simulated from standard normal distributions.
n.factors: a single numeric value specifying the number of latent factors to use to simulate the data if factor.model = TRUE.
range.probs: a numeric vector of length N with values between 0 and 1. Each value is the probability that any given one of the J simulated spatial locations falls within the simulated range of the corresponding species. If set to 1, every species can potentially be present at every location.
grid: an atomic vector used to specify the grid across which to simulate the latent spatial processes. This argument is used to simulate the underlying spatial processes at a different resolution than the coordinates (e.g., if coordinates are distributed across a grid).
...: currently no additional arguments
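The roles of phi, n.factors, rho, and sigma.sq.t can be made concrete with a small base-R sketch of the latent effects. It assumes an exponential covariance (cov.model = "exponential") with unit-variance latent factors (consistent with sigma.sq being ignored when factor.model = TRUE), a loading matrix whose upper triangle is fixed to 0 and diagonal to 1 (a common identifiability constraint, assumed here), and a stationary AR(1) covariance sigma.sq.t * rho^|t - s|. The names w.factor, Lambda, and Sigma.t are illustrative, and this is not the simTMsOcc source.

```r
# Sketch of spatial factor and AR(1) temporal effects (assumptions in comments)
set.seed(2)
J.x <- 5
J.y <- 5
J <- J.x * J.y
N <- 4          # number of species
n.factors <- 2  # number of latent spatial factors
n.time <- 6     # number of primary time periods

# Site coordinates on a regular grid over the unit square
coords <- as.matrix(expand.grid(seq(0, 1, length.out = J.x),
                                seq(0, 1, length.out = J.y)))
D <- as.matrix(dist(coords))

# Exponential correlation exp(-phi * d) falls to exp(-3), about 0.05, at
# d = 3 / phi, so 3 / phi is commonly read as the effective spatial range
phi <- runif(n.factors, 3 / 0.9, 3 / 0.3)

# Unit-variance Gaussian-process factors, one row per factor (n.factors x J)
w.factor <- matrix(NA_real_, n.factors, J)
for (q in 1:n.factors) {
  R <- exp(-phi[q] * D)
  w.factor[q, ] <- drop(t(chol(R)) %*% rnorm(J))
}

# Loadings (N x n.factors): upper triangle 0, diagonal 1 (assumed constraint)
Lambda <- matrix(rnorm(N * n.factors), N, n.factors)
Lambda[upper.tri(Lambda)] <- 0
diag(Lambda) <- 1

# Species-specific spatial effects are linear combinations of the factors
w <- Lambda %*% w.factor  # N x J

# AR(1) temporal effects: Cov(eta[t], eta[s]) = sigma.sq.t * rho^|t - s|
rho <- runif(N, 0.3, 0.9)
sigma.sq.t <- runif(N, 0.2, 1)
eta <- matrix(NA_real_, N, n.time)
for (i in 1:N) {
  Sigma.t <- sigma.sq.t[i] * rho[i]^abs(outer(1:n.time, 1:n.time, "-"))
  eta[i, ] <- drop(t(chol(Sigma.t)) %*% rnorm(n.time))
}
```

Because species share the same few factors, their spatial effects in w are correlated through Lambda, which is the sense in which a factor model induces species correlations.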

Author

Jeffrey W. Doser doserjef@msu.edu

Value

A list comprised of:

X: a \(J \times T \times p.occ\) numeric array containing the design matrix for the occurrence portion of the occupancy model.
X.p: a four-dimensional numeric array with dimensions corresponding to sites, primary time periods, repeat visits, and number of detection regression coefficients. This is the design matrix used for the detection portion of the occupancy model.
coords: a \(J \times 2\) numeric matrix of coordinates of each occupancy site. Required for spatial models.
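w: an \(N \times J\) matrix of the spatial random effects for each species. Only used to simulate data when sp = TRUE. If factor.model = TRUE, the first dimension is n.factors. When more than one spatially-varying coefficient is simulated (see svc.cols), w is a list containing one such matrix per coefficient, as in the example output.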
psi: an \(N \times J \times T\) array of the occurrence probabilities for each species at each site during each primary time period.
z: an \(N \times J \times T\) array of the latent occurrence status for each species at each site during each primary time period.
p: an \(N \times J \times T \times \max(n.rep)\) array of the detection probabilities for each species at each site, primary time period, and secondary replicate combination. Site-period combinations with fewer than max(n.rep) replicates contain NA values for the missing replicates.
y: an \(N \times J \times T \times \max(n.rep)\) array of the raw detection-nondetection data for each species at each site, primary time period, and replicate combination. Site-period combinations with fewer than max(n.rep) replicates contain NA values for the missing replicates.
X.p.re: a four-dimensional numeric array containing the levels of any detection random effect included in the model. Only relevant when detection random effects are specified in p.RE.
X.re: a numeric matrix containing the levels of any occurrence random effect included in the model. Only relevant when occurrence random effects are specified in psi.RE.
alpha.star: a numeric matrix where each row contains the simulated detection random effects for each given level of the random effects included in the detection model. Only relevant when detection random effects are included in the model.
beta.star: a numeric matrix where each row contains the simulated occurrence random effects for each given level of the random effects included in the occurrence model. Only relevant when occurrence random effects are included in the model.
eta: an \(N \times T\) numeric matrix of the AR(1) temporal random effects, with rows corresponding to species and columns corresponding to primary time periods.
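Because p and y are padded with NA wherever a site has fewer than max(n.rep) replicates, summaries of the returned arrays need to skip those cells. The sketch below computes naive (apparent) occurrence, i.e., whether a species was detected at least once at a site in a period, and compares it with the true z. The helper naive.occ is hypothetical, and the small arrays are built by hand with the same N x J x T x max(n.rep) layout as y; with simulated output one would pass dat$y and compare with dat$z instead.

```r
# Hypothetical helper: naive occurrence from an N x J x T x max(n.rep)
# detection array padded with NA
naive.occ <- function(y) {
  apply(y, c(1, 2, 3), function(v) {
    # NA when no replicate was surveyed; otherwise 1 if detected at least once
    if (all(is.na(v))) NA_integer_ else as.integer(any(v == 1, na.rm = TRUE))
  })
}

# Small illustrative arrays with the same layout as the z and y outputs
set.seed(3)
N <- 2
J <- 3
n.time <- 2
n.rep <- matrix(c(3, 2, 1, 2, 3, 0), J, n.time)  # site 3 unsurveyed in period 2
K.max <- max(n.rep)
z <- array(rbinom(N * J * n.time, 1, 0.6), dim = c(N, J, n.time))
y <- array(NA_integer_, dim = c(N, J, n.time, K.max))
for (i in 1:N) {
  for (j in 1:J) {
    for (tt in 1:n.time) {
      for (k in seq_len(n.rep[j, tt])) {
        # Detections only where the species is present
        y[i, j, tt, k] <- rbinom(1, 1, 0.5 * z[i, j, tt])
      }
    }
  }
}

obs <- naive.occ(y)
# Proportion of surveyed species-site-period cells where naive and true status agree
mean(obs == z, na.rm = TRUE)
```

Since a detection implies presence, naive occurrence can miss true presences but never records a false one, so obs never exceeds z wherever obs is defined.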

Examples

# Simulate Data -----------------------------------------------------------
set.seed(500)
J.x <- 8
J.y <- 8
J <- J.x * J.y
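# Number of years (primary time periods) sampled at each site
n.time <- sample(3:10, J, replace = TRUE)
# Alternative: sample every site in all 10 years
# n.time <- rep(10, J)
n.time.max <- max(n.time)
# Replicates per site and year (NA for years a site was not sampled)
n.rep <- matrix(NA, J, max(n.time))
for (j in 1:J) {
  n.rep[j, 1:n.time[j]] <- sample(2:4, n.time[j], replace = TRUE)
  # Alternative: 4 replicates in every sampled year
  # n.rep[j, 1:n.time[j]] <- rep(4, n.time[j])
}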
N <- 7
# Community-level covariate effects
# Occurrence
beta.mean <- c(-3, -0.2, 0.5)
trend <- FALSE
sp.only <- 0
p.occ <- length(beta.mean)
tau.sq.beta <- c(0.6, 1.5, 1.4)
# Detection
alpha.mean <- c(0, 1.2, -1.5)
tau.sq.alpha <- c(1, 0.5, 2.3)
p.det <- length(alpha.mean)
# Random effects
psi.RE <- list()
p.RE <- list()
# Draw species-level effects from community means.
beta <- matrix(NA, nrow = N, ncol = p.occ)
alpha <- matrix(NA, nrow = N, ncol = p.det)
for (i in 1:p.occ) {
  beta[, i] <- rnorm(N, beta.mean[i], sqrt(tau.sq.beta[i]))
}
for (i in 1:p.det) {
  alpha[, i] <- rnorm(N, alpha.mean[i], sqrt(tau.sq.alpha[i]))
}
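# Spatial factor model with a spatially-varying intercept and first covariate effect
sp <- TRUE
svc.cols <- c(1, 2)
p.svc <- length(svc.cols)
n.factors <- 3
# One spatial decay value per factor for each spatially-varying coefficient;
# effective ranges (3 / phi) fall between 0.3 and 0.9 coordinate units
phi <- runif(p.svc * n.factors, 3 / .9, 3 / .3)
factor.model <- TRUE
cov.model <- 'exponential'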

dat <- simTMsOcc(J.x = J.x, J.y = J.y, n.time = n.time, n.rep = n.rep, N = N,
                 beta = beta, alpha = alpha, sp.only = sp.only, trend = trend,
                 psi.RE = psi.RE, p.RE = p.RE, factor.model = factor.model,
                 svc.cols = svc.cols, n.factors = n.factors, phi = phi, sp = sp,
                 cov.model = cov.model)
str(dat)
#> List of 17
#>  $ X          : num [1:64, 1:10, 1:3] 1 1 1 1 1 1 1 1 1 1 ...
#>  $ X.p        : num [1:64, 1:10, 1:4, 1:3] 1 1 1 1 1 1 1 1 1 1 ...
#>  $ coords     : num [1:64, 1:2] 0 0.143 0.286 0.429 0.571 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : chr [1:64] "1" "2" "3" "4" ...
#>   .. ..$ : NULL
#>  $ coords.full: num [1:64, 1:2] 0 0.143 0.286 0.429 0.571 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : NULL
#>   .. ..$ : chr [1:2] "Var1" "Var2"
#>  $ w          :List of 2
#>   ..$ : num [1:3, 1:64] -0.916 1.438 -1.272 -0.895 1.235 ...
#>   ..$ : num [1:3, 1:64] -0.212 -0.463 -0.253 0.496 1.375 ...
#>  $ psi        : num [1:7, 1:64, 1:10] 0.0068 0.0979 0.6773 0.0474 0.2151 ...
#>  $ z          : int [1:7, 1:64, 1:10] 0 0 1 0 0 1 0 1 1 1 ...
#>  $ p          : num [1:7, 1:64, 1:10, 1:4] 0.864 0.431 0.68 0.745 0.784 ...
#>  $ y          : int [1:7, 1:64, 1:10, 1:4] 0 0 1 0 0 0 0 0 0 0 ...
#>  $ X.p.re     : logi NA
#>  $ X.re       : logi NA
#>  $ alpha.star : logi NA
#>  $ beta.star  : logi NA
#>  $ lambda     :List of 2
#>   ..$ : num [1:7, 1:3] 1 -0.889 -1.257 1.618 -1.195 ...
#>   ..$ : num [1:7, 1:3] 1 0.328 1.5758 0.1111 0.0119 ...
#>  $ X.w        : num [1:64, 1:10, 1:2] 1 1 1 1 1 1 1 1 1 1 ...
#>  $ range.ind  : int [1:7, 1:64] 1 1 1 1 1 1 1 1 1 1 ...
#>  $ eta        : num [1:7, 1:10] 0 0 0 0 0 0 0 0 0 0 ...