Simulate Multi-Species Detection-Nondetection Data from Multiple Data Sources
The function simIntMsOcc simulates multi-species detection-nondetection data from multiple data sources for simulation studies, power assessments, or function testing of integrated occupancy models. Data can optionally be simulated with a spatial Gaussian process on the occurrence process.
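As a rough illustration of the hierarchical structure being simulated, the base R sketch below generates data for a single data source. It computes species-specific occurrence probabilities from a logit-linear model, draws latent occurrence states z, and draws detections that can only occur where z = 1. It is a minimal sketch under stated assumptions (one occurrence covariate, one detection covariate, a constant number of replicates, and no spatial or random effects), not the internal code of simIntMsOcc. All object names are illustrative.

```r
# Minimal sketch of a multi-species occupancy data-generating process.
# Assumptions: one occurrence covariate, one detection covariate, logit
# links, and a fixed number of replicates per site. NOT simIntMsOcc() itself.
set.seed(1)
n.sp <- 4   # number of species (hypothetical)
J <- 20     # number of sites
K <- 3      # replicate surveys per site
# Occurrence design matrix: intercept + one standard normal covariate
X <- cbind(1, rnorm(J))
beta <- matrix(rnorm(n.sp * 2, 0, 0.5), nrow = n.sp)  # species x coefficient
# Occurrence probabilities and latent occurrence states (species x site)
psi <- plogis(beta %*% t(X))
z <- matrix(rbinom(n.sp * J, 1, psi), nrow = n.sp)
# Detection design array (site x replicate x covariate): intercept + covariate
X.p <- array(1, dim = c(J, K, 2))
X.p[, , 2] <- rnorm(J * K)
alpha <- matrix(rnorm(n.sp * 2, 0, 0.5), nrow = n.sp)  # species x coefficient
p <- array(NA, dim = c(n.sp, J, K))  # species x site x replicate
y <- array(NA, dim = c(n.sp, J, K))
for (i in 1:n.sp) {
  for (k in 1:K) {
    p[i, , k] <- plogis(X.p[, k, ] %*% alpha[i, ])
    # A species can only be detected at sites where it occurs (z = 1)
    y[i, , k] <- rbinom(J, 1, p[i, , k] * z[i, ])
  }
}
```

The species x site x replicate layout of p and y mirrors the layout described for the returned p and y elements.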
Arguments
- n.data
an integer indicating the number of detection-nondetection data sources to simulate.
- J.x
a single numeric value indicating the number of sites across the region of interest along the horizontal axis. Total number of sites across the simulated region of interest is \(J.x \times J.y\).
- J.y
a single numeric value indicating the number of sites across the region of interest along the vertical axis. Total number of sites across the simulated region of interest is \(J.x \times J.y\).
- J.obs
a numeric vector of length n.data containing the number of sites at which each data source is simulated. Data sources can be obtained at completely different sites, the same sites, or anywhere in between. The maximum number of sites at which a given data source can be available is \(J = J.x \times J.y\).
- n.rep
a list of length n.data. Each element is a numeric vector whose length equals the number of sites at which the given data source is observed (in J.obs). Each vector gives the number of repeat visits at each of those sites for that data source.
- n.rep.max
a numeric vector indicating the maximum number of replicate surveys for each data source. This argument is optional; by default it is set to max(n.rep) for each data source. It can be used to generate data sets with different types of missingness (e.g., simulating data across 20 days of replicate surveys while each site is sampled at most ten times).
- N
a numeric vector of length n.data containing the number of species each data source samples. Values can be identical if all data sources sample the same species, or they can differ.
- beta
a numeric matrix with max(N) rows containing the intercept and regression coefficient parameters for the occurrence portion of the multi-species occupancy model. Each row contains the regression coefficients for a given species.
- alpha
a list of length n.data. Each element is a numeric matrix with rows corresponding to the species sampled by that data source and columns corresponding to the detection regression coefficients for that data source.
- psi.RE
a list used to specify the non-spatial random intercepts included in the occurrence portion of the model. The list must have two tags: levels and sigma.sq.psi. levels is a vector of length equal to the number of distinct random intercepts to include in the model and contains the number of levels in each intercept. sigma.sq.psi is a vector of length equal to the number of distinct random intercepts to include in the model and contains the variance of each random effect. If not specified, no random effects are included in the occurrence portion of the model.
- p.RE
this argument is not currently supported. In a later version, this argument will allow for simulating data with detection random effects in the different data sources.
- sp
a logical value indicating whether to simulate a spatially explicit occupancy model with a Gaussian process. Defaults to FALSE.
- svc.cols
a vector indicating the variables whose effects will be estimated as spatially varying coefficients. svc.cols is an integer vector with values indicating the order of covariates specified in the model formula (with 1 being the intercept if specified).
- cov.model
a quoted keyword that specifies the covariance function used to model the spatial dependence structure among the latent occurrence values. Supported covariance model keywords are: "exponential", "matern", "spherical", and "gaussian".
- sigma.sq
a numeric vector of length max(N) containing the spatial variance parameter for each species. Ignored when sp = FALSE or when factor.model = TRUE.
- phi
a numeric vector of length max(N) containing the spatial decay parameter for each species. Ignored when sp = FALSE. If factor.model = TRUE, this should be of length n.factors.
- nu
a numeric vector of length max(N) containing the spatial smoothness parameter for each species. Only used when sp = TRUE and cov.model = 'matern'. If factor.model = TRUE, this should be of length n.factors.
- factor.model
a logical value indicating whether to simulate data following a factor modeling approach that explicitly incorporates species correlations. If sp = TRUE, the latent factors are simulated from independent spatial processes. If sp = FALSE, the latent factors are simulated from standard normal distributions.
- n.factors
a single numeric value specifying the number of latent factors used to simulate the data if factor.model = TRUE.
- range.probs
a numeric vector of length max(N) where each value falls between 0 and 1 and indicates the probability that any one of the J spatial locations simulated is within the simulated range of the given species. If set to 1, every species has the potential to be present at each location.
- ...
currently no additional arguments
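When sp = TRUE, each species' spatial random effect follows a Gaussian process whose covariance is governed by cov.model, sigma.sq, and phi. The base R sketch below draws such effects for the "exponential" covariance function, \(C(d) = \sigma^2 \exp(-\phi d)\), on a regular grid over the unit square, a layout consistent with the coords.obs values and Var1/Var2 column names in the example output. The grid size and parameter values are hypothetical, and the code illustrates the technique rather than reproducing the package's implementation.

```r
# Sketch: species-specific spatial random effects drawn from independent
# Gaussian processes with an exponential covariance function.
# Assumptions: regular grid on the unit square; C(d) = sigma.sq * exp(-phi * d).
set.seed(2)
J.x <- 8
J.y <- 8
coords <- as.matrix(expand.grid(seq(0, 1, length.out = J.x),
                                seq(0, 1, length.out = J.y)))
J <- nrow(coords)              # J = J.x * J.y sites
D <- as.matrix(dist(coords))   # J x J Euclidean distance matrix
n.sp <- 3
sigma.sq <- c(0.5, 1, 2)       # spatial variance per species (hypothetical)
phi <- c(3, 5, 10)             # spatial decay per species (hypothetical)
w <- matrix(NA, nrow = n.sp, ncol = J)  # species x site
for (i in 1:n.sp) {
  Sigma <- sigma.sq[i] * exp(-phi[i] * D)
  R <- chol(Sigma)             # upper triangular, Sigma = t(R) %*% R
  w[i, ] <- t(R) %*% rnorm(J)  # w_i ~ MVN(0, Sigma)
}
```

Larger phi values make correlation decay faster with distance: under the exponential function, correlation falls to about 0.05 at a distance of 3/phi.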
Author
Jeffrey W. Doser doserjef@msu.edu
References
Doser, J. W., Leuenberger, W., Sillett, T. S., Hallworth, M. T. & Zipkin, E. F. (2022). Integrated community occupancy models: A framework to assess occurrence and biodiversity dynamics using multiple data sources. Methods in Ecology and Evolution, 00, 1-14. doi:10.1111/2041-210X.13811
Value
A list comprising:
- X.obs
a numeric design matrix for the occurrence portion of the model. This matrix contains the intercept and covariate values for the observed sites only.
- X.pred
a numeric design matrix for the occurrence portion of the model at sites where there are no observed data sources.
- X.p
a list of detection design arrays for the integrated multi-species occupancy model. Each element is an array of detection covariates for one data source, with dimensions corresponding to site, replicate, and covariate, respectively.
- coords.obs
a numeric matrix of coordinates of each observed site. Required for spatial models.
- coords.pred
a numeric matrix of coordinates of each site in the study region without any data sources. Only used for spatial models.
- w.obs
a species (or factor) x site matrix of the spatial random effects for each species at the observed sites. Only used to simulate data when sp = TRUE. If factor.model = TRUE, the first dimension is n.factors.
- w.pred
a matrix of the spatial random effects for each species (or factor) at locations without any observations.
- psi.obs
a species x site matrix of the occurrence probabilities for each species at the observed sites. Note that values are provided for all species, even if some species are only monitored at a subset of these points.
- psi.pred
a species x site matrix of the occurrence probabilities for sites without any observations.
- z.obs
a species x site matrix of the latent occurrence states at each observed site. Note that values are provided for all species, even if some species are only monitored at a subset of these points.
- z.pred
a species x site matrix of the latent occurrence states at each site without any observations.
- p
a list of detection probability arrays, one for each of the n.data data sources. Each array has dimensions corresponding to species, site, and replicate, respectively.
- y
a list of arrays of the raw detection-nondetection data for each site and replicate combination for each species in the data set. Each array has dimensions corresponding to species, site, and replicate, respectively.
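Because each data source covers its own subset of the J.x x J.y grid, the observed sites are those sampled by at least one source, and the remaining sites make up X.pred, psi.pred, z.pred, and coords.pred; in the example output, 57 observed and 43 prediction sites together form the 100-site grid. The base R sketch below shows one way to derive that split and to map each source's sites onto columns of a species x site matrix such as z.obs. The sampling scheme and the object names (sites.all, sites.in.obs) are hypothetical and are not taken from the package.

```r
# Sketch: splitting a J-site grid into observed and prediction sites when
# each data source is sampled at its own subset of sites.
# Assumption: a site is "observed" if at least one data source samples it.
set.seed(3)
J <- 100
J.obs <- c(32, 35)
# Sites (indices in 1:J) sampled by each data source (hypothetical scheme)
sites.all <- lapply(J.obs, function(n) sort(sample(1:J, n)))
obs.sites <- sort(unique(unlist(sites.all)))
pred.sites <- setdiff(1:J, obs.sites)
# Position of each data source's sites within the observed set, i.e. the
# columns of a species x site matrix such as psi.obs or z.obs
sites.in.obs <- lapply(sites.all, function(s) match(s, obs.sites))
```

With these indices, something like z.obs[, sites.in.obs[[1]]] would select the latent states at the sites sampled by the first data source.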
Examples
set.seed(91)
J.x <- 10
J.y <- 10
# Total number of sites across the study region
J.all <- J.x * J.y
# Number of data sources.
n.data <- 2
# Sites for each data source.
J.obs <- sample(ceiling(0.2 * J.all):ceiling(0.5 * J.all), n.data, replace = TRUE)
n.rep <- list()
n.rep[[1]] <- rep(3, J.obs[1])
n.rep[[2]] <- rep(4, J.obs[2])
# Number of species observed in each data source
N <- c(8, 3)
# Community-level covariate effects
# Occurrence
beta.mean <- c(0.2, 0.5)
p.occ <- length(beta.mean)
tau.sq.beta <- c(0.4, 0.3)
# Detection
# Detection covariates
alpha.mean <- list()
tau.sq.alpha <- list()
# Number of detection parameters in each data source
p.det.long <- c(4, 3)
for (i in 1:n.data) {
alpha.mean[[i]] <- runif(p.det.long[i], -1, 1)
tau.sq.alpha[[i]] <- runif(p.det.long[i], 0.1, 1)
}
# Random effects
psi.RE <- list()
p.RE <- list()
beta <- matrix(NA, nrow = max(N), ncol = p.occ)
for (i in 1:p.occ) {
beta[, i] <- rnorm(max(N), beta.mean[i], sqrt(tau.sq.beta[i]))
}
alpha <- list()
for (i in 1:n.data) {
alpha[[i]] <- matrix(NA, nrow = N[i], ncol = p.det.long[i])
for (t in 1:p.det.long[i]) {
alpha[[i]][, t] <- rnorm(N[i], alpha.mean[[i]][t], sqrt(tau.sq.alpha[[i]])[t])
}
}
sp <- FALSE
factor.model <- FALSE
# Simulate occupancy data
dat <- simIntMsOcc(n.data = n.data, J.x = J.x, J.y = J.y,
J.obs = J.obs, n.rep = n.rep, N = N, beta = beta, alpha = alpha,
psi.RE = psi.RE, p.RE = p.RE, sp = sp, factor.model = factor.model)
str(dat)
#> List of 21
#> $ X.obs : num [1:57, 1:2] 1 1 1 1 1 1 1 1 1 1 ...
#> $ X.pred : num [1:43, 1:2] 1 1 1 1 1 1 1 1 1 1 ...
#> $ X.p :List of 2
#> ..$ : num [1:32, 1:3, 1:4] 1 1 1 1 1 1 1 1 1 1 ...
#> ..$ : num [1:35, 1:4, 1:3] 1 1 1 1 1 1 1 1 1 1 ...
#> $ coords.obs : num [1:57, 1:2] 0 0.222 0.333 0.444 0.667 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : NULL
#> .. ..$ : chr [1:2] "Var1" "Var2"
#> $ coords.pred: num [1:43, 1:2] 0.111 0.556 0.778 1 0.222 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : NULL
#> .. ..$ : chr [1:2] "Var1" "Var2"
#> $ w.obs : logi NA
#> $ w.pred : logi NA
#> $ psi.obs : num [1:8, 1:57] 0.44 0.534 0.403 0.405 0.529 ...
#> $ psi.pred : num [1:8, 1:43] 0.432 0.52 0.386 0.39 0.526 ...
#> $ z.obs : num [1:8, 1:57] 1 1 1 0 1 1 0 0 0 1 ...
#> $ z.pred : num [1:8, 1:43] 1 1 1 1 1 1 0 0 1 1 ...
#> $ p :List of 2
#> ..$ : num [1:8, 1:32, 1:3] 0.0497 0.7712 0.3459 0.0243 0.4086 ...
#> ..$ : num [1:3, 1:35, 1:4] 0.484 0.813 0.694 0.423 0.704 ...
#> $ y :List of 2
#> ..$ : int [1:8, 1:32, 1:3] 0 1 1 0 0 0 0 0 0 1 ...
#> ..$ : int [1:3, 1:35, 1:4] 0 0 0 1 0 0 0 0 0 1 ...
#> $ sites :List of 2
#> ..$ : num [1:32] 1 2 3 4 5 6 9 10 12 13 ...
#> ..$ : num [1:35] 1 3 7 8 10 11 14 15 17 19 ...
#> $ X.p.re : logi NA
#> $ X.re.obs : logi NA
#> $ X.re.pred : logi NA
#> $ alpha.star : logi NA
#> $ beta.star : logi NA
#> $ lambda : logi NA
#> $ species :List of 2
#> ..$ : int [1:8] 1 2 3 4 5 6 7 8
#> ..$ : int [1:3] 3 4 7
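The example above uses a constant number of visits within each data source (e.g., n.rep[[1]] <- rep(3, J.obs[1])). When visits vary across sites, or when n.rep.max exceeds the largest n.rep, the species x site x replicate arrays still need n.rep.max slots in their third dimension. The sketch below shows one way to represent unsurveyed replicates as NA. It assumes that the first n.rep[j] replicates at site j are the ones surveyed and uses a placeholder detection probability; simIntMsOcc may place missing replicates differently.

```r
# Sketch: padding detection-nondetection data with NA when sites receive
# different numbers of replicate surveys (up to n.rep.max).
# Assumptions: the first n.rep[j] replicates at site j are surveyed, and a
# constant placeholder detection probability of 0.4 is used.
set.seed(4)
n.sp <- 3
J <- 6
n.rep <- c(2, 3, 1, 3, 2, 3)   # visits per site (hypothetical)
n.rep.max <- 4                 # e.g., a survey window of four occasions
y <- array(NA_integer_, dim = c(n.sp, J, n.rep.max))  # species x site x rep
for (j in 1:J) {
  for (k in seq_len(n.rep[j])) {
    y[, j, k] <- rbinom(n.sp, 1, 0.4)
  }
}
# Counting non-missing replicates per site recovers n.rep
visits <- rowSums(!is.na(y[1, , ]))
```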