Simulate Multi-Species Detection-Nondetection Data from Multiple Data Sources

The function simIntMsOcc simulates multi-species detection-nondetection data from multiple data sources for simulation studies, power assessments, or function testing of integrated occupancy models. Data can optionally be simulated with a spatial Gaussian Process on the occurrence process.

Usage

simIntMsOcc(n.data, J.x, J.y, J.obs, n.rep, n.rep.max, N, beta, alpha, psi.RE = list(),
            p.RE = list(), sp = FALSE, svc.cols = 1, cov.model, sigma.sq, phi, nu,
            factor.model = FALSE, n.factors, range.probs, ...)

Arguments

n.data: an integer indicating the number of detection-nondetection data sources to simulate.
J.x: a single numeric value indicating the number of sites across the region of interest along the horizontal axis. Total number of sites across the simulated region of interest is \(J.x \times J.y\).
J.y: a single numeric value indicating the number of sites across the region of interest along the vertical axis. Total number of sites across the simulated region of interest is \(J.x \times J.y\).
J.obs: a numeric vector of length n.data containing the number of sites at which to simulate each data source. Data sources can be obtained at completely different sites, the same sites, or anywhere in between. The maximum number of sites at which a given data source is available is \(J = J.x \times J.y\).
n.rep: a list of length n.data. Each element is a numeric vector with length corresponding to the number of sites that given data source is observed at (in J.obs). Each vector indicates the number of repeat visits at each of the sites for a given data source.
n.rep.max: a vector of numeric values indicating the maximum number of replicate surveys for each data set. This is an optional argument, with its default value set to max(n.rep) for each data set. This can be used to generate data sets with different types of missingness, e.g., simulating data across 20 days (replicate surveys) in which each site is sampled at most ten times.
N: a numeric vector of length n.data containing the number of species each data source samples. These can be the same if all data sets sample the same species, or can be different.
beta: a numeric matrix with max(N) rows containing the intercept and regression coefficient parameters for the occurrence portion of the multi-species occupancy model. Each row corresponds to the regression coefficients for a given species.
alpha: a list of length n.data. Each element is a numeric matrix with one row per species sampled by that data source and one column per detection regression coefficient (including the intercept) for that data source.
psi.RE: a list used to specify the non-spatial random intercepts included in the occurrence portion of the model. The list must have two tags: levels and sigma.sq.psi. levels is a vector of length equal to the number of distinct random intercepts to include in the model and contains the number of levels there are in each intercept. sigma.sq.psi is a vector of length equal to the number of distinct random intercepts to include in the model and contains the variances for each random effect. If not specified, no random effects are included in the occurrence portion of the model.
p.RE: this argument is not currently supported. In a later version, this argument will allow for simulating data with detection random effects in the different data sources.
sp: a logical value indicating whether to simulate a spatially-explicit occupancy model with a Gaussian process. By default set to FALSE.
svc.cols: a vector indicating the variables whose effects will be estimated as spatially-varying coefficients. svc.cols is an integer vector with values indicating the order of covariates specified in the model formula (with 1 being the intercept if specified).
cov.model: a quoted keyword that specifies the covariance function used to model the spatial dependence structure among the latent occurrence values. Supported covariance model key words are: "exponential", "matern", "spherical", and "gaussian".
sigma.sq: a numeric vector of length max(N) containing the spatial variance parameter for each species. Ignored when sp = FALSE or when factor.model = TRUE.
phi: a numeric vector of length max(N) containing the spatial decay parameter for each species. Ignored when sp = FALSE. If factor.model = TRUE, this should be of length n.factors.
nu: a numeric vector of length max(N) containing the spatial smoothness parameter for each species. Only used when sp = TRUE and cov.model = 'matern'. If factor.model = TRUE, this should be of length n.factors.
factor.model: a logical value indicating whether to simulate data following a factor modeling approach that explicitly incorporates species correlations. If sp = TRUE, the latent factors are simulated from independent spatial processes. If sp = FALSE, the latent factors are simulated from standard normal distributions.
n.factors: a single numeric value specifying the number of latent factors to use to simulate the data if factor.model = TRUE.
range.probs: a numeric vector of length max(N) where each value should fall between 0 and 1 and indicates the probability that one of the J spatial locations simulated is within the simulated range of the given species. If set to 1, every species has the potential of being present at each location.
...: currently no additional arguments
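
The list- and vector-valued arguments above can be assembled in base R before calling simIntMsOcc. The following is a minimal sketch, not code from the package: all numeric settings (grid size, site counts, visit counts, number of levels, variance) are hypothetical values chosen for illustration, and n.rep.max is set here to mirror the documented default.

```r
# Hypothetical settings for two data sources on an 8 x 8 grid
n.data <- 2
J.x <- 8
J.y <- 8
J <- J.x * J.y
# Number of sites per data source; each value must not exceed J
J.obs <- c(30, 20)
stopifnot(length(J.obs) == n.data, all(J.obs <= J))

set.seed(1)
# n.rep: one vector per data source, one entry per site in that data source.
# Here each site receives between 1 and 4 visits (an unbalanced design).
n.rep <- lapply(J.obs, function(j) sample(1:4, j, replace = TRUE))
# n.rep.max: one value per data source (here equal to the documented default)
n.rep.max <- sapply(n.rep, max)

# psi.RE: one occurrence random intercept with 10 levels and variance 0.5,
# using the two documented tags
psi.RE <- list(levels = 10, sigma.sq.psi = 0.5)
```

Setting an entry of n.rep.max above the largest value in the matching n.rep vector corresponds to the missingness scenario described for that argument.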

Author

Jeffrey W. Doser doserjef@msu.edu

References

Doser, J. W., Leuenberger, W., Sillett, T. S., Hallworth, M. T. & Zipkin, E. F. (2022). Integrated community occupancy models: A framework to assess occurrence and biodiversity dynamics using multiple data sources. Methods in Ecology and Evolution, 00, 1-14. doi:10.1111/2041-210X.13811

Value

A list comprised of:

X.obs: a numeric design matrix for the occurrence portion of the model. This matrix contains the intercept and covariates for only the observed sites.
X.pred: a numeric design matrix for the occurrence portion of the model at sites where there are no observed data sources.
X.p: a list of design matrices for the detection portions of the integrated multi-species occupancy model. Each element in the list is a design matrix of detection covariates for each data source.
coords.obs: a numeric matrix of coordinates of each observed site. Required for spatial models.
coords.pred: a numeric matrix of coordinates of each site in the study region without any data sources. Only used for spatial models.
w.obs: a species (or factor) x site matrix of the spatial random effects for each species at the observed sites. Only used to simulate data when sp = TRUE. If factor.model = TRUE, the first dimension is n.factors.
w.pred: a matrix of the spatial random effects for each species (or factor) at locations without any observations.
psi.obs: a species x site matrix of the occurrence probabilities for each species at the observed sites. Note that values are provided for all species, even if some species are only monitored at a subset of these points.
psi.pred: a species x site matrix of the occurrence probabilities for sites without any observations.
z.obs: a species x site matrix of the latent occurrence states at each observed site. Note that values are provided for all species, even if some species are only monitored at a subset of these points.
z.pred: a species x site matrix of the latent occurrence states at each site without any observations.
p: a list of detection probability arrays for each of the n.data data sources. Each array has dimensions corresponding to species, site, and replicate, respectively.
y: a list of arrays of the raw detection-nondetection data for each site and replicate combination for each species in the data set. Each array has dimensions corresponding to species, site, and replicate, respectively.
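
Because each element of y is a species x site x replicate array, per-species summaries can be computed by collapsing the replicate dimension. The sketch below uses a small hypothetical stand-in for y rather than real simulated output, and it assumes that any unsurveyed replicates are stored as NA; it computes the naive occupancy (proportion of sites with at least one detection) for each species in each data source.

```r
set.seed(2)
# Hypothetical stand-in for the y element: two data sources, each array
# ordered as species x site x replicate
y <- list(
  array(rbinom(3 * 5 * 2, 1, 0.4), dim = c(3, 5, 2)),
  array(rbinom(2 * 4 * 3, 1, 0.4), dim = c(2, 4, 3))
)
# Collapse replicates: 1 if the species was detected on any visit to a site
detected <- lapply(y, function(a) apply(a, c(1, 2), max, na.rm = TRUE))
# Proportion of sites with at least one detection, per species
naive.occ <- lapply(detected, rowMeans)
naive.occ
```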

Examples

set.seed(91)
J.x <- 10
J.y <- 10
# Total number of sites across the study region
J.all <- J.x * J.y
# Number of data sources.
n.data <- 2
# Sites for each data source.
J.obs <- sample(ceiling(0.2 * J.all):ceiling(0.5 * J.all), n.data, replace = TRUE)
n.rep <- list()
n.rep[[1]] <- rep(3, J.obs[1])
n.rep[[2]] <- rep(4, J.obs[2])

# Number of species observed in each data source
N <- c(8, 3)

# Community-level covariate effects
# Occurrence
beta.mean <- c(0.2, 0.5)
p.occ <- length(beta.mean)
tau.sq.beta <- c(0.4, 0.3)
# Detection
# Detection covariates
alpha.mean <- list()
tau.sq.alpha <- list()
# Number of detection parameters in each data source
p.det.long <- c(4, 3)
for (i in 1:n.data) {
  alpha.mean[[i]] <- runif(p.det.long[i], -1, 1)
  tau.sq.alpha[[i]] <- runif(p.det.long[i], 0.1, 1)
}
# Random effects
psi.RE <- list()
p.RE <- list()
beta <- matrix(NA, nrow = max(N), ncol = p.occ)
for (i in 1:p.occ) {
  beta[, i] <- rnorm(max(N), beta.mean[i], sqrt(tau.sq.beta[i]))
}
alpha <- list()
for (i in 1:n.data) {
  alpha[[i]] <- matrix(NA, nrow = N[i], ncol = p.det.long[i])
  for (t in 1:p.det.long[i]) {
    alpha[[i]][, t] <- rnorm(N[i], alpha.mean[[i]][t], sqrt(tau.sq.alpha[[i]][t]))
  }
}
sp <- FALSE
factor.model <- FALSE
# Simulate occupancy data
# n.factors is omitted because it is only used when factor.model = TRUE
dat <- simIntMsOcc(n.data = n.data, J.x = J.x, J.y = J.y,
                   J.obs = J.obs, n.rep = n.rep, N = N, beta = beta,
                   alpha = alpha, psi.RE = psi.RE, p.RE = p.RE, sp = sp,
                   factor.model = factor.model)
str(dat)
#> List of 21
#>  $ X.obs      : num [1:57, 1:2] 1 1 1 1 1 1 1 1 1 1 ...
#>  $ X.pred     : num [1:43, 1:2] 1 1 1 1 1 1 1 1 1 1 ...
#>  $ X.p        :List of 2
#>   ..$ : num [1:32, 1:3, 1:4] 1 1 1 1 1 1 1 1 1 1 ...
#>   ..$ : num [1:35, 1:4, 1:3] 1 1 1 1 1 1 1 1 1 1 ...
#>  $ coords.obs : num [1:57, 1:2] 0 0.222 0.333 0.444 0.667 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : NULL
#>   .. ..$ : chr [1:2] "Var1" "Var2"
#>  $ coords.pred: num [1:43, 1:2] 0.111 0.556 0.778 1 0.222 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : NULL
#>   .. ..$ : chr [1:2] "Var1" "Var2"
#>  $ w.obs      : logi NA
#>  $ w.pred     : logi NA
#>  $ psi.obs    : num [1:8, 1:57] 0.44 0.534 0.403 0.405 0.529 ...
#>  $ psi.pred   : num [1:8, 1:43] 0.432 0.52 0.386 0.39 0.526 ...
#>  $ z.obs      : num [1:8, 1:57] 1 1 1 0 1 1 0 0 0 1 ...
#>  $ z.pred     : num [1:8, 1:43] 1 1 1 1 1 1 0 0 1 1 ...
#>  $ p          :List of 2
#>   ..$ : num [1:8, 1:32, 1:3] 0.0497 0.7712 0.3459 0.0243 0.4086 ...
#>   ..$ : num [1:3, 1:35, 1:4] 0.484 0.813 0.694 0.423 0.704 ...
#>  $ y          :List of 2
#>   ..$ : int [1:8, 1:32, 1:3] 0 1 1 0 0 0 0 0 0 1 ...
#>   ..$ : int [1:3, 1:35, 1:4] 0 0 0 1 0 0 0 0 0 1 ...
#>  $ sites      :List of 2
#>   ..$ : num [1:32] 1 2 3 4 5 6 9 10 12 13 ...
#>   ..$ : num [1:35] 1 3 7 8 10 11 14 15 17 19 ...
#>  $ X.p.re     : logi NA
#>  $ X.re.obs   : logi NA
#>  $ X.re.pred  : logi NA
#>  $ alpha.star : logi NA
#>  $ beta.star  : logi NA
#>  $ lambda     : logi NA
#>  $ species    :List of 2
#>   ..$ : int [1:8] 1 2 3 4 5 6 7 8
#>   ..$ : int [1:3] 3 4 7
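
The sites and species elements in the output above suggest how each data source lines up with the community-level matrices. The sketch below is an illustration only: it builds a hypothetical miniature list (dat.toy) with the same layout and assumes that sites indexes the columns and species indexes the rows of z.obs, which is consistent with the printed structure but is not stated in this page.

```r
set.seed(3)
# Hypothetical miniature of the simulated list: 4 species, 6 observed sites
dat.toy <- list(
  z.obs = matrix(rbinom(4 * 6, 1, 0.5), nrow = 4, ncol = 6),
  sites = list(c(1, 2, 4, 5), c(2, 3, 6)),
  species = list(1:4, c(2, 4))
)
# Latent states matching data source 2: its species (rows) at its sites (columns)
z.2 <- dat.toy$z.obs[dat.toy$species[[2]], dat.toy$sites[[2]], drop = FALSE]
dim(z.2)
```

Under that assumption, the same indexing pattern applied to psi.obs would give the true occurrence probabilities for the species and sites covered by a given data source.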