Integrated multi-species occupancy models in spOccupancy
Jeffrey W. Doser
February 20, 2023
Source:vignettes/integratedMultispecies.Rmd
integratedMultispecies.Rmd
Introduction
This vignette presents new functionality in v0.6.0 to fit an
integrated multi-species occupancy model (i.e., a multi-species
occupancy model with multiple data sources) with the
intMsPGOcc()
function. This is a single-season version of
the “integrated community occupancy model” developed by Doser et al. (2022). Often, multiple
detection-nondetection data sources are available to study the
occurrence and distribution of multiple species of interest. Such data
can arise from citizen science checklists, point count surveys, camera
traps, and autonomous acoustic recording units. A model that combines
multiple multi-species data sources together in a single modeling
framework has the potential to provide improved ecological inference as
a result of increased sample sizes, increased spatial extent, and/or
increased ability to account for sampling biases in individual data
sources (Miller et al. 2019). Integrated
multi-species occupancy models seek to combine multiple sources of
multi-species detection-nondetection data in a single hierarchical
modeling framework to provide improved inference on multiple species
occurrence patterns. The data sources may more not be replicated,
allowing for use of so-called “single-visit” data within the integrated
modeilng approach. Further, data sources can be used in the modeling
framework even if they do not all sample the entire species community.
For a more thorough description of integrated multi-species occupancy
models, see Doser et al. (2022).
Functionality for integrated multi-species occupancy models in
spOccupancy
is under active development, and the full suite
of spOccupancy
tools (e.g., cross-validation, posterior
predictive checks) are not currently available for the
intMsPGOcc()
function, and currently only a single-season,
non-spatial model is available. In this vignette, we will present a
brief overview of how to fit integrated multi-species occupancy models
using the current functionality in spOccupancy
. We will
update this vignette as we add in new functionality to fit spatial
models, as well as perform cross-validation and do posterior predictive
checks.
First we load spOccupancy
.
Example data sets: Foliage-gleaning birds at Hubbard Brook and Bartlett Forests
As an exmaple data set, we will use data from point count surveys on
twelve foliage-gleaning bird species in the Hubbard Brook Experimental
Forest (HBEF) and the National Ecological Observatory Network (NEON) at
Bartlett Forest. These are two forest patches in the White Mountain
National Forest in New Hampshire, USA (see Figure 1 in Doser et al. (2022)]. Both data sources are
provided as part of the spOccupancy
package in the objects
hbef2015
and neon2015
.
In the Hubbard Brook data set, observers performed standard point count surveys at 373 sites over three replicates, each of 10 minutes in length and with a detection radius of 100m. In the data set provided here, we converted these data to detection-nondetection data. Some sites were not visited for all three replicates. The NEON data are collected at 80 point count sites in Bartlett Forest using a removal protocol with three time periods, resulting in replicated detection-nondetection data that can be used in an occupancy modeling framework (after reducing counts to binary detection indicators). In this case, the two data sources do not overlap spatially, but rather provide information on two areas within a larger forested region of interest (i.e., White Mountain National Forest).
List of 4
$ y : num [1:12, 1:373, 1:3] 0 0 0 0 0 1 0 0 0 0 ...
..- attr(*, "dimnames")=List of 3
.. ..$ : chr [1:12] "AMRE" "BAWW" "BHVI" "BLBW" ...
.. ..$ : chr [1:373] "1" "2" "3" "4" ...
.. ..$ : chr [1:3] "1" "2" "3"
$ occ.covs: num [1:373, 1] 475 494 546 587 588 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr "Elevation"
$ det.covs:List of 2
..$ day: num [1:373, 1:3] 156 156 156 156 156 156 156 156 156 156 ...
..$ tod: num [1:373, 1:3] 330 346 369 386 409 425 447 463 482 499 ...
$ coords : num [1:373, 1:2] 280000 280000 280000 280001 280000 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:373] "1" "2" "3" "4" ...
.. ..$ : chr [1:2] "X" "Y"
str(neon2015)
List of 4
$ y : num [1:12, 1:80, 1:3] 0 0 0 0 0 0 1 0 0 0 ...
$ occ.covs: num [1:80, 1] 390 425 443 382 441 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr "Elevation"
$ det.covs:List of 2
..$ day: num [1:80] 169 169 169 169 169 169 169 169 169 170 ...
..$ tod: int [1:80] 8 8 6 7 8 6 7 7 7 8 ...
$ coords : num [1:80, 1:2] 318472 318722 318972 318472 318722 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:80] "1" "2" "3" "4" ...
.. ..$ : chr [1:2] "X" "Y"
The objects hbef2015
and neon2015
contain
data formatted for fitting a multi-species occupancy model in
spOccupancy
. Briefly, the object y
is a
three-dimensional array of the detection-nondetection data, with
dimensions corresponding to species, site, and replicate.
occ.covs
is a matrix or data frame with rows corresponding
to sites and columns corresponding to the possible covariates for
modeling occupancy probability. det.covs
is a list where
each element of the list consists of the possible covariates for
modeling detection probability, where site-level covariates are
specified as a one-dimensional vector while observation-level covariates
are specified as a two-dimensional matrix with rows corresponding to
sites and columns corresponding to replicates. Lastly, the
coords
object consists of the spatial coordinates for each
site. These are only necessary for spatially-explicit models in
spOccupancy
(note there is no current implementation of
spatially-explicit integrated multi-species occupancy models).
Next, let’s take a look at the total observations for each species in each data source.
# Species four-letter codes
sp.names <- dimnames(hbef2015$y)[[1]]
(sp.sums.hbef <- apply(hbef2015$y, 1, sum, na.rm = TRUE))
AMRE BAWW BHVI BLBW BLPW BTBW BTNW CAWA MAWA NAWA OVEN REVI
4 44 66 420 66 619 615 27 94 26 588 613
# Assign species names to the NEON data.
dimnames(neon2015$y)[[1]] <- sp.names
(sp.sums.neon <- apply(neon2015$y, 1, sum, na.rm = TRUE))
AMRE BAWW BHVI BLBW BLPW BTBW BTNW CAWA MAWA NAWA OVEN REVI
8 15 12 23 0 22 45 2 3 0 62 60
We see that BLPW and NAWA do not have any observations in the NEON data. Additionally, AMRE is only detected a total of 12 times between the two data sources. Here we will remove BLPW and NAWA from the NEON data set and will remove AMRE from both data sources, and thus will model a total of 11 species across the two data sources.
neon.obs.sp.indx <- which(sp.sums.neon > 0 & sp.names != 'AMRE')
hbef.obs.sp.indx <- which(sp.sums.hbef > 0 & sp.names != 'AMRE')
hbef2015$y <- hbef2015$y[hbef.obs.sp.indx, , ]
neon2015$y <- neon2015$y[neon.obs.sp.indx, , ]
str(hbef2015)
List of 4
$ y : num [1:11, 1:373, 1:3] 0 0 0 0 1 0 0 0 0 1 ...
..- attr(*, "dimnames")=List of 3
.. ..$ : chr [1:11] "BAWW" "BHVI" "BLBW" "BLPW" ...
.. ..$ : chr [1:373] "1" "2" "3" "4" ...
.. ..$ : chr [1:3] "1" "2" "3"
$ occ.covs: num [1:373, 1] 475 494 546 587 588 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr "Elevation"
$ det.covs:List of 2
..$ day: num [1:373, 1:3] 156 156 156 156 156 156 156 156 156 156 ...
..$ tod: num [1:373, 1:3] 330 346 369 386 409 425 447 463 482 499 ...
$ coords : num [1:373, 1:2] 280000 280000 280000 280001 280000 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:373] "1" "2" "3" "4" ...
.. ..$ : chr [1:2] "X" "Y"
str(neon2015)
List of 4
$ y : num [1:9, 1:80, 1:3] 0 0 0 0 1 0 0 1 1 0 ...
..- attr(*, "dimnames")=List of 3
.. ..$ : chr [1:9] "BAWW" "BHVI" "BLBW" "BTBW" ...
.. ..$ : NULL
.. ..$ : NULL
$ occ.covs: num [1:80, 1] 390 425 443 382 441 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr "Elevation"
$ det.covs:List of 2
..$ day: num [1:80] 169 169 169 169 169 169 169 169 169 170 ...
..$ tod: int [1:80] 8 8 6 7 8 6 7 7 7 8 ...
$ coords : num [1:80, 1:2] 318472 318722 318972 318472 318722 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:80] "1" "2" "3" "4" ...
.. ..$ : chr [1:2] "X" "Y"
Notice that now the two data sources have a different number of species.
Basic model description
An integrated multi-species occupancy model takes the same form as a multi-species occupancy model, with separate sub-models to describe the ecological process (occupancy) from the observation process. The only difference is that now there are multiple observation sub-models for each data source included in the model. The ecological sub-model of an integrated multi-species occupancy model is identical to that of a regular multi-species occupancy model. Let \(z_{i, j}\) be the true presence (1) or absence (0) of a species \(i\) at site \(j\), with \(j = 1, \dots, J\) and \(i = 1, \dots, N\). We model the latent occurrence of each species according to
\[\begin{equation} \begin{split} &z_{i, j} \sim \text{Bernoulli}(\psi_{i, j}), \\ &\text{logit}(\psi_{i, j}) = \boldsymbol{x}^{\top}_{j}\boldsymbol{\beta}_i, \end{split} \end{equation}\]
where \(\psi_{i, j}\) is the probability of occurrence of species \(i\) at site \(j\), which is a function of site-specific covariates \(\boldsymbol{X}\) and a vector of species-specific regression coefficients (\(\boldsymbol{\beta}_i\)). The regression coefficients in multi-species occupancy models are envisioned as random effects arising from a common community level distribution. Specifically, we have,
\[\begin{equation} \boldsymbol{\beta}_i \sim \text{Normal}(\boldsymbol{\mu}_{\beta}, \boldsymbol{T}_{\beta}), \end{equation}\]
where \(\boldsymbol{\mu}_{\beta}\) is a vector of community level mean effects for each occurrence covariate effect (including the intercept) and \(\boldsymbol{T}_{\beta}\) is a diagonal matrix with diagonal elements \(\boldsymbol{\tau}^2_{\beta}\) that represent the variability of each occurrence covariate effect among species in the community.
We do not directly observe \(z_{i, j}\), but rather we observe an imperfect representation of the latent occurrence process. In integrated models, we have \(r = 1, \dots, R\) distinct sources of data that are all imperfect representations of a single, shared occurrence process. Let \(y_{r, i, a, k}\) be the observed detection (1) or nondetection (0) of species \(i\) in data set \(r\) at site \(a\) during replicate \(k\). Note that a data source may have data for all \(i = 1, \dots, N\) species in the community, or only a subset of the species (of course, each species modeled must have data from at least one data source). Because different data sources have different variables influencing the observation process, we envision a separate detection model for each data source that is defined conditional on a single, shared ecological process described above. More specifically, for data source \(r\) we have
\[\begin{equation} \begin{split} &y_{r, i, a, k} \sim \text{Bernoulli}(p_{r, i, a, k}z_{i, j[a]}), \\ &\text{logit}(p_{r, i, a, k}) = \textbf{v}^{\top}_{r, i, a, k}\boldsymbol{\alpha}_{r, i}, \end{split} \end{equation}\]
where \(p_{r,i, a, k}\) is the probability of detecting species \(i\) at site \(a\) during replicate \(k\) (given it is present at site \(a\)) for data source \(r\), which is a function of site, replicate, and data source specific covariates \(\textbf{V}_r\), and a vector of regression coefficients specific to each data source and species (\(\boldsymbol{\alpha}_{r, i}\)). \(z_{i, j[a]}\) is the true latent occurrence status of species \(i\) at point count location \(j\) that corresponds to the \(a\)th data set location in data set \(r\). Note that species-specific coefficients are modeled as random effects arising from a common distribution with community-level mean and variance parameters, as we saw previously with the occurrence coefficients.
We assign normal priors for the occurrence and data-set specific
detection regression coefficients, and inverse-Gamma priors for the
community-level variance parameters to complete the Bayesian
specification of the model. Pólya-Gamma data augmentation is implemented
in an analogous manner to that of other spOccupancy
functions to yield an efficient implementation of multi-species
integrated occupancy models.
Fitting integrated multi-species occupancy models with
intMsPGOcc()
The function intMsPGOcc()
fits multi-species integrated
occupancy models in spOccupancy
. The function follows the
standard spOccupancy
syntax, and more details on this
syntax are available in the introductory
vignette.
intMsPGOcc(occ.formula, det.formula, data, inits, priors, n.samples,
n.omp.threads = 1, verbose = TRUE, n.report = 100,
n.burn = round(.10 * n.samples), n.thin = 1, n.chains = 1,
k.fold, k.fold.threads = 1, k.fold.seed, k.fold.only = FALSE, ...)
The data
argument contains the list of data elements
necessary for fitting an integrated multi-species occupancy model.
data
should be a list comprised of the following objects:
y
(list of detection-nondetection data matrices for each
data source), occ.covs
(data frame or matrix of covariates
for occurrence model), det.covs
(a list of lists where each
element of the list corresponds to the detection-nondetection data for
the given data source), sites
(a list where each element
consists of the site indices for the given data source), and
species
(a list where each element consists of the species
indices for the given data source).
The hbef2015
and neon2015
data sources are
currently formatted for use in standard multi-species occupancy models.
Perhaps the trickiest part of data integration is ensuring each location
in each data source lines up with the correct geographical location
where you want to determine the true presence/absence of each species of
interest. In spOccupancy
, most of this bookkeeping is done
under the hood, but we will need to combine the two data sources
together into a single list in which we are consistent about how the
data sources are sorted. To accomplish this, we recommend first creating
the occurrence covariates matrix for all data sources. Because our two
data sources do not overlap spatially, this is relatively simple and
here we can just use rbind()
.
num [1:453, 1] 475 494 546 587 588 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr "Elevation"
Notice the order in which we placed these covariates: all covariate
values for HBEF come first, followed by all covariates for NEON. We need
to ensure that we use this identical ordering for all objects in the
data
list. Next, we create the site indices stored in
sites
. sites
should be a list with two
elements (one for each data source), where each element consists of a
vector that indicates the rows in occ.covs
that correspond
with the specific row of the detection-nondetection data for that data
source. When the data sources stem from distinct points (like in our
current case), this is again relatively straightforward as the indices
simply correspond to how we ordered the points in
occ.covs
.
sites.indx <- list(hbef = 1:nrow(hbef2015$occ.covs),
neon = 1:nrow(neon2015$occ.covs) + nrow(hbef2015$occ.covs))
str(sites.indx)
List of 2
$ hbef: int [1:373] 1 2 3 4 5 6 7 8 9 10 ...
$ neon: int [1:80] 374 375 376 377 378 379 380 381 382 383 ...
Next we create the combined detection-nondetection data
y
. For integrated multi-species models in
spOccupancy
, y
is a list of three-dimensional
arrays, with each array having dimensions corresponding to species,
site, and replicate. Again, we must ensure that we place the data
sources in the correct order.
List of 2
$ hbef: num [1:11, 1:373, 1:3] 0 0 0 0 1 0 0 0 0 1 ...
..- attr(*, "dimnames")=List of 3
.. ..$ : chr [1:11] "BAWW" "BHVI" "BLBW" "BLPW" ...
.. ..$ : chr [1:373] "1" "2" "3" "4" ...
.. ..$ : chr [1:3] "1" "2" "3"
$ neon: num [1:9, 1:80, 1:3] 0 0 0 0 1 0 0 1 1 0 ...
..- attr(*, "dimnames")=List of 3
.. ..$ : chr [1:9] "BAWW" "BHVI" "BLBW" "BTBW" ...
.. ..$ : NULL
.. ..$ : NULL
The last thing we need to specify in data
for an
integrated multi-species occupancy model is the species
list, which is a list of vectors where each vector is a set of codes
that indicates the specific species in the corresponding data source. In
other words, each element of the species
list should
consist of some code to indicate the species that each row of the
corresponding array in the y
data list corresponds to. This
is necessary to link data sources that may not sample exactly the same
species, as is our case here where the NEON data only sample 9 of the 11
species we are modeling.
sp.indx <- list(hbef = sp.names[hbef.obs.sp.indx],
neon = sp.names[neon.obs.sp.indx])
sp.indx
$hbef
[1] "BAWW" "BHVI" "BLBW" "BLPW" "BTBW" "BTNW" "CAWA" "MAWA" "NAWA" "OVEN"
[11] "REVI"
$neon
[1] "BAWW" "BHVI" "BLBW" "BTBW" "BTNW" "CAWA" "MAWA" "OVEN" "REVI"
Lastly, we put the detection covariates for each data set in a single
list, and then combine all the elements together in one data list that
we will eventually supply to intMsPGOcc()
.
det.covs <- list(hbef = hbef2015$det.covs,
neon = neon2015$det.covs)
data.list <- list(y = y.full,
occ.covs = occ.covs,
det.covs = det.covs,
sites = sites.indx,
species = sp.indx)
str(data.list)
List of 5
$ y :List of 2
..$ hbef: num [1:11, 1:373, 1:3] 0 0 0 0 1 0 0 0 0 1 ...
.. ..- attr(*, "dimnames")=List of 3
.. .. ..$ : chr [1:11] "BAWW" "BHVI" "BLBW" "BLPW" ...
.. .. ..$ : chr [1:373] "1" "2" "3" "4" ...
.. .. ..$ : chr [1:3] "1" "2" "3"
..$ neon: num [1:9, 1:80, 1:3] 0 0 0 0 1 0 0 1 1 0 ...
.. ..- attr(*, "dimnames")=List of 3
.. .. ..$ : chr [1:9] "BAWW" "BHVI" "BLBW" "BTBW" ...
.. .. ..$ : NULL
.. .. ..$ : NULL
$ occ.covs: num [1:453, 1] 475 494 546 587 588 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr "Elevation"
$ det.covs:List of 2
..$ hbef:List of 2
.. ..$ day: num [1:373, 1:3] 156 156 156 156 156 156 156 156 156 156 ...
.. ..$ tod: num [1:373, 1:3] 330 346 369 386 409 425 447 463 482 499 ...
..$ neon:List of 2
.. ..$ day: num [1:80] 169 169 169 169 169 169 169 169 169 170 ...
.. ..$ tod: int [1:80] 8 8 6 7 8 6 7 7 7 8 ...
$ sites :List of 2
..$ hbef: int [1:373] 1 2 3 4 5 6 7 8 9 10 ...
..$ neon: int [1:80] 374 375 376 377 378 379 380 381 382 383 ...
$ species :List of 2
..$ hbef: chr [1:11] "BAWW" "BHVI" "BLBW" "BLPW" ...
..$ neon: chr [1:9] "BAWW" "BHVI" "BLBW" "BTBW" ...
We will now go ahead and run the integrated multi-species occupancy
model. We will first fit a model with a linear and quadratic effect of
elevation on occurrence, and effects of day (linear and quadratic) of
the year and time of day (linear) on the detection process for each data
source. Note that there is no requirement to have the same covariates on
the detection models for each individual data source. The function
intMsPGOcc()
also allows the user to specify the priors and
initial values, which follow the same format as other
spOccupancy
functions (see the introductory
vignette for details), but here we will just use the default values,
which we see below are printed to the screen. The last thing we need to
do is specify the formulas for occurrence and detection (note we need a
separate formula for each data source for the detection models in
det.formula
, which are supplied as a list). We also specify
all the criteria of the MCMC sampler, where here we run three chains of
the model for 20,000 iterations with a burn-in period of 12,000 and a
thinning rate of 8, resulting in a total of 3000 posterior samples.
# Approx. run time: 3 minutes
out.1 <- intMsPGOcc(occ.formula = ~ scale(Elevation) + I(scale(Elevation)^2),
det.formula = list(hbef = ~ scale(day) + I(scale(day)^2) + scale(tod),
neon = ~ scale(day) + I(scale(day)^2) + scale(tod)),
data = data.list,
n.samples = 20000,
n.omp.threads = 1,
verbose = TRUE,
n.report = 4000,
n.burn = 12000,
n.thin = 8,
n.chains = 3)
----------------------------------------
Preparing to run the model
----------------------------------------
No prior specified for beta.comm.normal.
Setting prior mean to 0 and prior variance to 2.72
No prior specified for alpha.comm.normal.
Setting prior mean to 0 and prior variance to 2.72
No prior specified for tau.sq.beta.ig.
Setting prior shape to 0.1 and prior scale to 0.1
No prior specified for tau.sq.alpha.ig.
Setting prior shape and scale to 0.1
z is not specified in initial values.
Setting initial values based on observed data
beta.comm is not specified in initial values.
Setting initial values to random values from the prior distribution
alpha.comm is not specified in initial values.
Setting initial values to random values from the prior distribution
tau.sq.beta is not specified in initial values.
Setting initial values to random values between 0.5 and 10
tau.sq.beta is not specified in initial values.
Setting initial values to random values between 0.5 and 10
beta is not specified in initial values.
Setting initial values to random values from the community-level normal distribution
alpha is not specified in initial values.
Setting initial values to random values from the community-level normal distribution
----------------------------------------
Model description
----------------------------------------
Integrated Multispecies Occupancy Model with Polya-Gamma latent
variable fit with 453 sites and 11 species.
Integrating 2 occupancy data sets.
Samples per Chain: 20000
Burn-in: 12000
Thinning Rate: 8
Number of Chains: 3
Total Posterior Samples: 3000
Source compiled with OpenMP support and model fit using 1 thread(s).
----------------------------------------
Chain 1
----------------------------------------
Sampling ...
Sampled: 4000 of 20000, 20.00%
-------------------------------------------------
Sampled: 8000 of 20000, 40.00%
-------------------------------------------------
Sampled: 12000 of 20000, 60.00%
-------------------------------------------------
Sampled: 16000 of 20000, 80.00%
-------------------------------------------------
Sampled: 20000 of 20000, 100.00%
----------------------------------------
Chain 2
----------------------------------------
Sampling ...
Sampled: 4000 of 20000, 20.00%
-------------------------------------------------
Sampled: 8000 of 20000, 40.00%
-------------------------------------------------
Sampled: 12000 of 20000, 60.00%
-------------------------------------------------
Sampled: 16000 of 20000, 80.00%
-------------------------------------------------
Sampled: 20000 of 20000, 100.00%
----------------------------------------
Chain 3
----------------------------------------
Sampling ...
Sampled: 4000 of 20000, 20.00%
-------------------------------------------------
Sampled: 8000 of 20000, 40.00%
-------------------------------------------------
Sampled: 12000 of 20000, 60.00%
-------------------------------------------------
Sampled: 16000 of 20000, 80.00%
-------------------------------------------------
Sampled: 20000 of 20000, 100.00%
summary(out.1)
Call:
intMsPGOcc(occ.formula = ~scale(Elevation) + I(scale(Elevation)^2),
det.formula = list(hbef = ~scale(day) + I(scale(day)^2) +
scale(tod), neon = ~scale(day) + I(scale(day)^2) + scale(tod)),
data = data.list, n.samples = 20000, n.omp.threads = 1, verbose = TRUE,
n.report = 4000, n.burn = 12000, n.thin = 8, n.chains = 3)
Samples per Chain: 20000
Burn-in: 12000
Thinning Rate: 8
Number of Chains: 3
Total Posterior Samples: 3000
Run Time (min): 2.9239
----------------------------------------
Community Level
----------------------------------------
Occurrence Means (logit scale):
Mean SD 2.5% 50% 97.5% Rhat ESS
(Intercept) 0.4250 0.8894 -1.2971 0.4199 2.2252 1.0112 1700
scale(Elevation) 0.1581 0.4490 -0.7297 0.1567 1.0722 1.0046 1789
I(scale(Elevation)^2) -0.0189 0.2786 -0.5499 -0.0248 0.5614 1.0010 1494
Occurrence Variances (logit scale):
Mean SD 2.5% 50% 97.5% Rhat ESS
(Intercept) 12.1092 7.8331 4.4598 10.1770 31.8805 1.0311 367
scale(Elevation) 2.3228 1.5024 0.7037 1.9661 5.9842 1.0085 1156
I(scale(Elevation)^2) 0.7154 0.5116 0.2127 0.5783 2.0209 1.0120 847
-----------------------------
Data source 1
-----------------------------
Detection Means (logit scale):
Mean SD 2.5% 50% 97.5% Rhat ESS
(Intercept) -0.6019 0.4281 -1.4704 -0.5945 0.2470 1.0010 3000
scale(day) 0.0610 0.0900 -0.1176 0.0600 0.2396 1.0052 3000
I(scale(day)^2) -0.0161 0.0877 -0.1836 -0.0154 0.1571 1.0005 3000
scale(tod) -0.0415 0.0781 -0.1975 -0.0386 0.1076 1.0010 3000
Detection Variances (logit scale):
Mean SD 2.5% 50% 97.5% Rhat ESS
(Intercept) 2.1203 1.2034 0.7641 1.8087 5.2895 1.0036 2360
scale(day) 0.0685 0.0413 0.0240 0.0579 0.1761 1.0115 3000
I(scale(day)^2) 0.0575 0.0406 0.0181 0.0474 0.1563 1.0053 3000
scale(tod) 0.0505 0.0343 0.0163 0.0413 0.1356 1.0220 2857
-----------------------------
Data source 2
-----------------------------
Detection Means (logit scale):
Mean SD 2.5% 50% 97.5% Rhat ESS
(Intercept) -1.3792 0.3031 -1.9730 -1.3771 -0.7432 1.0017 1242
scale(day) 0.0485 0.2035 -0.3936 0.0517 0.4541 1.0016 2675
I(scale(day)^2) -0.0596 0.1614 -0.3797 -0.0610 0.2617 1.0041 2679
scale(tod) 0.0165 0.1391 -0.2593 0.0185 0.2945 1.0048 3000
Detection Variances (logit scale):
Mean SD 2.5% 50% 97.5% Rhat ESS
(Intercept) 0.6024 0.5084 0.1282 0.4562 1.8945 1.0030 2438
scale(day) 0.2440 0.2500 0.0395 0.1675 0.8841 1.0044 2551
I(scale(day)^2) 0.1459 0.1525 0.0287 0.1030 0.5139 1.0037 2600
scale(tod) 0.0995 0.0885 0.0238 0.0742 0.3388 1.0164 3016
----------------------------------------
Species Level
----------------------------------------
Occurrence (logit scale):
Mean SD 2.5% 50% 97.5% Rhat ESS
(Intercept)-BAWW 1.1988 1.5309 -0.6088 0.7192 5.1996 1.1247 67
(Intercept)-BHVI 2.9251 2.4270 -0.1303 2.3344 8.9583 1.7024 35
(Intercept)-BLBW 2.5732 0.6180 1.6854 2.4656 4.1286 1.0166 530
(Intercept)-BLPW -5.0370 0.6278 -6.3493 -5.0015 -3.8924 1.0055 930
(Intercept)-BTBW 3.6727 0.4822 2.8651 3.6235 4.7321 1.0162 1240
(Intercept)-BTNW 2.1894 0.3059 1.6735 2.1605 2.8602 1.0020 2040
(Intercept)-CAWA -2.3058 0.4757 -3.1522 -2.3350 -1.2657 1.0280 496
(Intercept)-MAWA -1.9971 0.2633 -2.5237 -1.9895 -1.4882 1.0009 2716
(Intercept)-NAWA -2.7928 0.5234 -3.8109 -2.8120 -1.7556 1.0019 705
(Intercept)-OVEN 2.5768 0.3073 2.0316 2.5540 3.2370 0.9996 2205
(Intercept)-REVI 3.5437 0.5116 2.6758 3.4933 4.6646 1.0019 472
scale(Elevation)-BAWW -0.1730 0.5909 -1.5648 -0.0972 0.7321 1.1514 141
scale(Elevation)-BHVI 0.2507 0.9579 -1.6886 0.2489 2.3567 1.0221 206
scale(Elevation)-BLBW -0.2465 0.2451 -0.7377 -0.2175 0.1189 1.0236 536
scale(Elevation)-BLPW 2.2139 0.6078 1.2709 2.1377 3.6573 0.9996 582
scale(Elevation)-BTBW -0.1442 0.1588 -0.4794 -0.1409 0.1539 1.0020 2191
scale(Elevation)-BTNW 0.6496 0.2674 0.2117 0.6285 1.2447 1.0009 1641
scale(Elevation)-CAWA 1.0965 0.4752 0.4084 1.0164 2.2491 1.0217 236
scale(Elevation)-MAWA 1.1845 0.2360 0.7745 1.1693 1.6919 1.0026 2013
scale(Elevation)-NAWA 1.0202 0.3965 0.3465 0.9897 1.8688 1.0011 1380
scale(Elevation)-OVEN -1.7620 0.3434 -2.5742 -1.7114 -1.2170 1.0049 942
scale(Elevation)-REVI -2.1784 0.7255 -3.8453 -2.0875 -1.0567 1.0126 292
I(scale(Elevation)^2)-BAWW -0.7774 0.4385 -1.8010 -0.7213 -0.0871 1.0590 200
I(scale(Elevation)^2)-BHVI 0.4661 0.7257 -0.8097 0.3970 2.0993 1.0424 289
I(scale(Elevation)^2)-BLBW -0.6044 0.2064 -1.0525 -0.5915 -0.2246 1.0076 1275
I(scale(Elevation)^2)-BLPW 1.0183 0.3870 0.1976 1.0216 1.7291 1.0037 964
I(scale(Elevation)^2)-BTBW -1.1203 0.1827 -1.4973 -1.1081 -0.7877 1.0131 1778
I(scale(Elevation)^2)-BTNW 0.0104 0.1998 -0.3471 -0.0050 0.4334 1.0017 2192
I(scale(Elevation)^2)-CAWA 0.2715 0.3718 -0.3044 0.2234 1.1928 1.0286 287
I(scale(Elevation)^2)-MAWA 0.6836 0.2094 0.2905 0.6745 1.1157 1.0031 1811
I(scale(Elevation)^2)-NAWA 0.3856 0.2791 -0.1649 0.3874 0.9449 1.0010 1572
I(scale(Elevation)^2)-OVEN -0.4161 0.2191 -0.8168 -0.4282 0.0523 1.0137 1609
I(scale(Elevation)^2)-REVI -0.0625 0.3544 -0.6845 -0.0797 0.6820 1.0138 401
-----------------------------
Data source 1
-----------------------------
Detection (logit scale):
Mean SD 2.5% 50% 97.5% Rhat ESS
(Intercept)-BAWW -2.5022 0.4386 -3.2806 -2.5218 -1.6380 1.0398 110
(Intercept)-BHVI -2.6860 0.2529 -3.1442 -2.7007 -2.1490 1.1518 182
(Intercept)-BLBW -0.0399 0.1244 -0.2854 -0.0421 0.2096 1.0006 1223
(Intercept)-BLPW -0.4520 0.2618 -0.9694 -0.4458 0.0385 1.0008 3000
(Intercept)-BTBW 0.6306 0.1074 0.4204 0.6292 0.8420 1.0103 3000
(Intercept)-BTNW 0.5787 0.1118 0.3615 0.5771 0.8050 1.0048 2622
(Intercept)-CAWA -1.7370 0.5638 -2.8646 -1.7198 -0.7355 1.0224 292
(Intercept)-MAWA -0.8431 0.2388 -1.3081 -0.8467 -0.3704 1.0029 2064
(Intercept)-NAWA -1.4722 0.4742 -2.4247 -1.4711 -0.5801 1.0005 687
(Intercept)-OVEN 0.7896 0.1205 0.5574 0.7882 1.0196 1.0017 3000
(Intercept)-REVI 0.5471 0.1073 0.3407 0.5455 0.7578 1.0007 3000
scale(day)-BAWW 0.2138 0.1398 -0.0567 0.2105 0.4997 0.9997 2652
scale(day)-BHVI 0.2011 0.1181 -0.0276 0.1985 0.4355 1.0011 2771
scale(day)-BLBW -0.2232 0.0678 -0.3554 -0.2240 -0.0897 1.0015 3185
scale(day)-BLPW 0.0660 0.1605 -0.2472 0.0672 0.3704 1.0035 3000
scale(day)-BTBW 0.2690 0.0672 0.1359 0.2707 0.4007 1.0013 3000
scale(day)-BTNW 0.1511 0.0658 0.0230 0.1514 0.2785 1.0025 3000
scale(day)-CAWA -0.0081 0.1763 -0.3502 -0.0061 0.3353 1.0055 3000
scale(day)-MAWA 0.1129 0.1234 -0.1291 0.1126 0.3607 0.9998 3000
scale(day)-NAWA 0.0216 0.1768 -0.3301 0.0231 0.3548 1.0004 3000
scale(day)-OVEN -0.0756 0.0735 -0.2211 -0.0775 0.0710 1.0020 3000
scale(day)-REVI -0.0688 0.0673 -0.2035 -0.0681 0.0585 1.0000 3018
I(scale(day)^2)-BAWW -0.0318 0.1498 -0.3303 -0.0294 0.2555 1.0029 2834
I(scale(day)^2)-BHVI 0.0589 0.1266 -0.1929 0.0591 0.3088 1.0031 3000
I(scale(day)^2)-BLBW -0.1703 0.0785 -0.3306 -0.1701 -0.0159 1.0006 3000
I(scale(day)^2)-BLPW 0.2031 0.1851 -0.1340 0.1932 0.5969 1.0016 3000
I(scale(day)^2)-BTBW -0.0602 0.0790 -0.2161 -0.0604 0.0939 1.0008 3000
I(scale(day)^2)-BTNW -0.0607 0.0782 -0.2152 -0.0589 0.0898 1.0005 3000
I(scale(day)^2)-CAWA -0.0370 0.1805 -0.3859 -0.0345 0.3167 1.0017 3000
I(scale(day)^2)-MAWA 0.0117 0.1364 -0.2529 0.0106 0.2781 1.0032 3000
I(scale(day)^2)-NAWA -0.1329 0.1872 -0.5185 -0.1258 0.2175 1.0003 2725
I(scale(day)^2)-OVEN 0.0142 0.0876 -0.1595 0.0149 0.1816 1.0001 3000
I(scale(day)^2)-REVI 0.0367 0.0773 -0.1121 0.0356 0.1863 1.0015 3000
scale(tod)-BAWW -0.1786 0.1369 -0.4578 -0.1749 0.0761 1.0024 3000
scale(tod)-BHVI -0.0512 0.1137 -0.2722 -0.0510 0.1713 1.0015 3000
scale(tod)-BLBW 0.0580 0.0656 -0.0663 0.0585 0.1843 1.0032 3000
scale(tod)-BLPW 0.1070 0.1481 -0.1849 0.1033 0.4074 0.9999 3000
scale(tod)-BTBW -0.0348 0.0665 -0.1651 -0.0337 0.0987 0.9999 3000
scale(tod)-BTNW 0.0355 0.0631 -0.0900 0.0377 0.1557 1.0032 3000
scale(tod)-CAWA -0.2139 0.1627 -0.5494 -0.2067 0.0930 1.0030 3000
scale(tod)-MAWA 0.0203 0.1125 -0.2088 0.0206 0.2316 1.0006 3000
scale(tod)-NAWA -0.0694 0.1590 -0.3840 -0.0686 0.2340 1.0007 3000
scale(tod)-OVEN -0.0460 0.0735 -0.1900 -0.0453 0.0932 1.0025 3000
scale(tod)-REVI -0.0786 0.0650 -0.2082 -0.0781 0.0501 1.0044 2784
-----------------------------
Data source 2
-----------------------------
Detection (logit scale):
Mean SD 2.5% 50% 97.5% Rhat ESS
(Intercept)-BAWW -1.3422 0.7565 -2.6152 -1.4268 0.3398 1.0405 108
(Intercept)-BHVI -2.1563 0.3995 -2.9256 -2.1575 -1.3695 1.0824 388
(Intercept)-BLBW -2.0784 0.3248 -2.7448 -2.0587 -1.4949 1.0030 2416
(Intercept)-BTBW -1.7732 0.2859 -2.3380 -1.7654 -1.2359 1.0000 2839
(Intercept)-BTNW -1.0066 0.2474 -1.4946 -1.0097 -0.5154 1.0012 3445
(Intercept)-CAWA -1.3018 0.6855 -2.6172 -1.3062 0.0931 1.0064 1425
(Intercept)-MAWA -1.3798 0.5662 -2.4973 -1.3848 -0.2319 1.0014 3000
(Intercept)-OVEN -0.7468 0.2026 -1.1468 -0.7449 -0.3612 1.0007 2834
(Intercept)-REVI -0.8568 0.2032 -1.2533 -0.8549 -0.4725 0.9997 3000
scale(day)-BAWW -0.2881 0.3448 -1.0203 -0.2640 0.3384 1.0066 2676
scale(day)-BHVI 0.1894 0.3037 -0.3647 0.1714 0.8397 1.0020 2511
scale(day)-BLBW 0.5813 0.2838 0.0889 0.5571 1.1955 1.0019 2711
scale(day)-BTBW -0.1391 0.2434 -0.6376 -0.1369 0.3291 1.0009 2533
scale(day)-BTNW 0.0763 0.1744 -0.2687 0.0763 0.4203 1.0016 3000
scale(day)-CAWA 0.1999 0.4387 -0.6459 0.1899 1.1210 1.0020 2847
scale(day)-MAWA -0.3311 0.4560 -1.3440 -0.2963 0.4561 1.0010 2568
scale(day)-OVEN 0.1403 0.1747 -0.1946 0.1356 0.4859 1.0008 3000
scale(day)-REVI -0.0334 0.1595 -0.3506 -0.0337 0.2810 1.0026 3000
I(scale(day)^2)-BAWW -0.0217 0.2623 -0.5043 -0.0354 0.5468 1.0122 1241
I(scale(day)^2)-BHVI -0.4430 0.3248 -1.1467 -0.4100 0.0884 1.0081 1961
I(scale(day)^2)-BLBW 0.0178 0.2208 -0.4245 0.0233 0.4400 1.0030 3000
I(scale(day)^2)-BTBW -0.2233 0.2098 -0.6723 -0.2056 0.1621 1.0009 3000
I(scale(day)^2)-BTNW 0.1826 0.1648 -0.1305 0.1785 0.5172 1.0036 3553
I(scale(day)^2)-CAWA 0.0126 0.3392 -0.6201 -0.0021 0.7203 1.0047 2777
I(scale(day)^2)-MAWA 0.0129 0.2751 -0.5043 0.0050 0.5714 1.0034 3000
I(scale(day)^2)-OVEN -0.1028 0.1593 -0.4218 -0.0975 0.2051 1.0022 3000
I(scale(day)^2)-REVI 0.0075 0.1478 -0.2905 0.0082 0.2944 1.0005 3422
scale(tod)-BAWW -0.1095 0.2622 -0.6350 -0.1060 0.3984 1.0022 2714
scale(tod)-BHVI -0.1328 0.2293 -0.6057 -0.1283 0.2954 1.0012 2809
scale(tod)-BLBW -0.0017 0.1906 -0.3788 -0.0004 0.3605 1.0033 3203
scale(tod)-BTBW 0.1257 0.1991 -0.2570 0.1265 0.5349 1.0018 3000
scale(tod)-BTNW 0.0632 0.1666 -0.2647 0.0645 0.3971 1.0010 3000
scale(tod)-CAWA -0.0107 0.2896 -0.5933 -0.0025 0.5442 1.0040 3000
scale(tod)-MAWA 0.0706 0.3102 -0.5367 0.0662 0.7071 1.0098 3000
scale(tod)-OVEN 0.0840 0.1502 -0.2052 0.0834 0.3792 1.0014 3044
scale(tod)-REVI 0.0678 0.1452 -0.2253 0.0668 0.3459 0.9999 2404
From the summary output, we can see that most of the model parameters
have converged. We can compare this model to a simpler model that
assumes only a linear effect of elevation, and then compare the two
models using WAIC with the waicOcc()
function.
# Approx. run time: 3 minutes
out.2 <- intMsPGOcc(occ.formula = ~ scale(Elevation),
det.formula = list(hbef = ~ scale(day) + I(scale(day)^2) + scale(tod),
neon = ~ scale(day) + I(scale(day)^2) + scale(tod)),
data = data.list,
n.samples = 20000,
n.omp.threads = 1,
verbose = TRUE,
n.report = 4000,
n.burn = 12000,
n.thin = 8,
n.chains = 3)
----------------------------------------
Preparing to run the model
----------------------------------------
No prior specified for beta.comm.normal.
Setting prior mean to 0 and prior variance to 2.72
No prior specified for alpha.comm.normal.
Setting prior mean to 0 and prior variance to 2.72
No prior specified for tau.sq.beta.ig.
Setting prior shape to 0.1 and prior scale to 0.1
No prior specified for tau.sq.alpha.ig.
Setting prior shape and scale to 0.1
z is not specified in initial values.
Setting initial values based on observed data
beta.comm is not specified in initial values.
Setting initial values to random values from the prior distribution
alpha.comm is not specified in initial values.
Setting initial values to random values from the prior distribution
tau.sq.beta is not specified in initial values.
Setting initial values to random values between 0.5 and 10
tau.sq.beta is not specified in initial values.
Setting initial values to random values between 0.5 and 10
beta is not specified in initial values.
Setting initial values to random values from the community-level normal distribution
alpha is not specified in initial values.
Setting initial values to random values from the community-level normal distribution
----------------------------------------
Model description
----------------------------------------
Integrated Multispecies Occupancy Model with Polya-Gamma latent
variable fit with 453 sites and 11 species.
Integrating 2 occupancy data sets.
Samples per Chain: 20000
Burn-in: 12000
Thinning Rate: 8
Number of Chains: 3
Total Posterior Samples: 3000
Source compiled with OpenMP support and model fit using 1 thread(s).
----------------------------------------
Chain 1
----------------------------------------
Sampling ...
Sampled: 4000 of 20000, 20.00%
-------------------------------------------------
Sampled: 8000 of 20000, 40.00%
-------------------------------------------------
Sampled: 12000 of 20000, 60.00%
-------------------------------------------------
Sampled: 16000 of 20000, 80.00%
-------------------------------------------------
Sampled: 20000 of 20000, 100.00%
----------------------------------------
Chain 2
----------------------------------------
Sampling ...
Sampled: 4000 of 20000, 20.00%
-------------------------------------------------
Sampled: 8000 of 20000, 40.00%
-------------------------------------------------
Sampled: 12000 of 20000, 60.00%
-------------------------------------------------
Sampled: 16000 of 20000, 80.00%
-------------------------------------------------
Sampled: 20000 of 20000, 100.00%
----------------------------------------
Chain 3
----------------------------------------
Sampling ...
Sampled: 4000 of 20000, 20.00%
-------------------------------------------------
Sampled: 8000 of 20000, 40.00%
-------------------------------------------------
Sampled: 12000 of 20000, 60.00%
-------------------------------------------------
Sampled: 16000 of 20000, 80.00%
-------------------------------------------------
Sampled: 20000 of 20000, 100.00%
# Model selection using WAIC
waicOcc(out.1)
elpd pD WAIC
-5121.01590 84.07401 10410.17981
waicOcc(out.2)
elpd pD WAIC
-5154.02429 78.28919 10464.62696
We see the model with both a linear and quadratic effect of elevation
is preferred. We can also do prediction using the predict()
function to generate species distribution maps across a region of
interest (or generate estimates of community-level metrics), which
follows exactly the format used for single data-source multi-species
occupancy models.
As of v0.7.0
, we can also calculate WAIC individually
for each species using the by.sp
argument
waicOcc(out.1, by.sp = TRUE)
elpd pD WAIC
1 -228.4167 8.998136 474.8296
2 -290.3455 6.682839 594.0567
3 -767.1750 9.762715 1553.8755
4 -139.4260 4.896930 288.6459
5 -764.0191 9.254624 1546.5475
6 -842.9051 9.198185 1704.2065
7 -125.6505 7.631983 266.5650
8 -289.2931 6.661293 591.9088
9 -106.6762 5.372210 224.0968
10 -751.3157 7.632899 1517.8972
11 -815.5886 8.661218 1648.4997
waicOcc(out.2, by.sp = TRUE)
elpd pD WAIC
1 -230.7406 8.463095 478.4074
2 -290.6858 5.910009 593.1917
3 -771.3186 11.275470 1565.1882
4 -141.0589 5.132568 292.3829
5 -778.8126 8.088996 1573.8033
6 -842.8739 8.141197 1702.0302
7 -126.4658 5.650348 264.2323
8 -296.3188 6.015551 604.6688
9 -107.5462 4.680241 224.4530
10 -752.5325 7.548766 1520.1626
11 -815.5069 8.188468 1647.3908
Active development
The functionality for integrated multi-species occupancy models in
spOccupancy
is in active development, and currently some
functionality that is common to all other spOccupancy
model
fitting functions is not available for intMsPGOcc()
(i.e.,
posterior predictive checks, cross-validation, random effects on
detection probability). I am actively working on incorporating these
extensions into the model. Additionally, there will of course be
spatially-explicit versions of these models available at some point in
the future as well, with options to accommodate species correlations
using the factor modeling approach detailed in Doser, Finley, and Banerjee (2023).