Integrated multi-species occupancy models in spOccupancy

Introduction

This vignette presents new functionality in v0.6.0 to fit an integrated multi-species occupancy model (i.e., a multi-species occupancy model with multiple data sources) with the intMsPGOcc() function. This is a single-season version of the “integrated community occupancy model” developed by Doser et al. (2022). Often, multiple detection-nondetection data sources are available to study the occurrence and distribution of multiple species of interest. Such data can arise from citizen science checklists, point count surveys, camera traps, and autonomous acoustic recording units. A model that combines multiple multi-species data sources together in a single modeling framework has the potential to provide improved ecological inference as a result of increased sample sizes, increased spatial extent, and/or increased ability to account for sampling biases in individual data sources (Miller et al. 2019). Integrated multi-species occupancy models seek to combine multiple sources of multi-species detection-nondetection data in a single hierarchical modeling framework to provide improved inference on multiple species occurrence patterns. The data sources may more not be replicated, allowing for use of so-called “single-visit” data within the integrated modeilng approach. Further, data sources can be used in the modeling framework even if they do not all sample the entire species community. For a more thorough description of integrated multi-species occupancy models, see Doser et al. (2022).

Functionality for integrated multi-species occupancy models in spOccupancy is under active development, and the full suite of spOccupancy tools (e.g., cross-validation, posterior predictive checks) are not currently available for the intMsPGOcc() function, and currently only a single-season, non-spatial model is available. In this vignette, we will present a brief overview of how to fit integrated multi-species occupancy models using the current functionality in spOccupancy. We will update this vignette as we add in new functionality to fit spatial models, as well as perform cross-validation and do posterior predictive checks.

First we load spOccupancy.

library(spOccupancy)

Example data sets: Foliage-gleaning birds at Hubbard Brook and Bartlett Forests

As an exmaple data set, we will use data from point count surveys on twelve foliage-gleaning bird species in the Hubbard Brook Experimental Forest (HBEF) and the National Ecological Observatory Network (NEON) at Bartlett Forest. These are two forest patches in the White Mountain National Forest in New Hampshire, USA (see Figure 1 in Doser et al. (2022)]. Both data sources are provided as part of the spOccupancy package in the objects hbef2015 and neon2015.

In the Hubbard Brook data set, observers performed standard point count surveys at 373 sites over three replicates, each of 10 minutes in length and with a detection radius of 100m. In the data set provided here, we converted these data to detection-nondetection data. Some sites were not visited for all three replicates. The NEON data are collected at 80 point count sites in Bartlett Forest using a removal protocol with three time periods, resulting in replicated detection-nondetection data that can be used in an occupancy modeling framework (after reducing counts to binary detection indicators). In this case, the two data sources do not overlap spatially, but rather provide information on two areas within a larger forested region of interest (i.e., White Mountain National Forest).

data(hbef2015)
data(neon2015)
# Take a look at the two data sources
str(hbef2015)

List of 4
 $ y       : num [1:12, 1:373, 1:3] 0 0 0 0 0 1 0 0 0 0 ...
  ..- attr(*, "dimnames")=List of 3
  .. ..$ : chr [1:12] "AMRE" "BAWW" "BHVI" "BLBW" ...
  .. ..$ : chr [1:373] "1" "2" "3" "4" ...
  .. ..$ : chr [1:3] "1" "2" "3"
 $ occ.covs: num [1:373, 1] 475 494 546 587 588 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr "Elevation"
 $ det.covs:List of 2
  ..$ day: num [1:373, 1:3] 156 156 156 156 156 156 156 156 156 156 ...
  ..$ tod: num [1:373, 1:3] 330 346 369 386 409 425 447 463 482 499 ...
 $ coords  : num [1:373, 1:2] 280000 280000 280000 280001 280000 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:373] "1" "2" "3" "4" ...
  .. ..$ : chr [1:2] "X" "Y"

str(neon2015)

List of 4
 $ y       : num [1:12, 1:80, 1:3] 0 0 0 0 0 0 1 0 0 0 ...
 $ occ.covs: num [1:80, 1] 390 425 443 382 441 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr "Elevation"
 $ det.covs:List of 2
  ..$ day: num [1:80] 169 169 169 169 169 169 169 169 169 170 ...
  ..$ tod: int [1:80] 8 8 6 7 8 6 7 7 7 8 ...
 $ coords  : num [1:80, 1:2] 318472 318722 318972 318472 318722 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:80] "1" "2" "3" "4" ...
  .. ..$ : chr [1:2] "X" "Y"

The objects hbef2015 and neon2015 contain data formatted for fitting a multi-species occupancy model in spOccupancy. Briefly, the object y is a three-dimensional array of the detection-nondetection data, with dimensions corresponding to species, site, and replicate. occ.covs is a matrix or data frame with rows corresponding to sites and columns corresponding to the possible covariates for modeling occupancy probability. det.covs is a list where each element of the list consists of the possible covariates for modeling detection probability, where site-level covariates are specified as a one-dimensional vector while observation-level covariates are specified as a two-dimensional matrix with rows corresponding to sites and columns corresponding to replicates. Lastly, the coords object consists of the spatial coordinates for each site. These are only necessary for spatially-explicit models in spOccupancy (note there is no current implementation of spatially-explicit integrated multi-species occupancy models).

Next, let’s take a look at the total observations for each species in each data source.

# Species four-letter codes
sp.names <- dimnames(hbef2015$y)[[1]]
(sp.sums.hbef <- apply(hbef2015$y, 1, sum, na.rm = TRUE))

AMRE BAWW BHVI BLBW BLPW BTBW BTNW CAWA MAWA NAWA OVEN REVI 
   4   44   66  420   66  619  615   27   94   26  588  613

# Assign species names to the NEON data.
dimnames(neon2015$y)[[1]] <- sp.names
(sp.sums.neon <- apply(neon2015$y, 1, sum, na.rm = TRUE))

AMRE BAWW BHVI BLBW BLPW BTBW BTNW CAWA MAWA NAWA OVEN REVI 
   8   15   12   23    0   22   45    2    3    0   62   60

We see that BLPW and NAWA do not have any observations in the NEON data. Additionally, AMRE is only detected a total of 12 times between the two data sources. Here we will remove BLPW and NAWA from the NEON data set and will remove AMRE from both data sources, and thus will model a total of 11 species across the two data sources.

neon.obs.sp.indx <- which(sp.sums.neon > 0 & sp.names != 'AMRE')
hbef.obs.sp.indx <- which(sp.sums.hbef > 0 & sp.names != 'AMRE')
hbef2015$y <- hbef2015$y[hbef.obs.sp.indx, , ]
neon2015$y <- neon2015$y[neon.obs.sp.indx, , ]
str(hbef2015)

List of 4
 $ y       : num [1:11, 1:373, 1:3] 0 0 0 0 1 0 0 0 0 1 ...
  ..- attr(*, "dimnames")=List of 3
  .. ..$ : chr [1:11] "BAWW" "BHVI" "BLBW" "BLPW" ...
  .. ..$ : chr [1:373] "1" "2" "3" "4" ...
  .. ..$ : chr [1:3] "1" "2" "3"
 $ occ.covs: num [1:373, 1] 475 494 546 587 588 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr "Elevation"
 $ det.covs:List of 2
  ..$ day: num [1:373, 1:3] 156 156 156 156 156 156 156 156 156 156 ...
  ..$ tod: num [1:373, 1:3] 330 346 369 386 409 425 447 463 482 499 ...
 $ coords  : num [1:373, 1:2] 280000 280000 280000 280001 280000 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:373] "1" "2" "3" "4" ...
  .. ..$ : chr [1:2] "X" "Y"

str(neon2015)

List of 4
 $ y       : num [1:9, 1:80, 1:3] 0 0 0 0 1 0 0 1 1 0 ...
  ..- attr(*, "dimnames")=List of 3
  .. ..$ : chr [1:9] "BAWW" "BHVI" "BLBW" "BTBW" ...
  .. ..$ : NULL
  .. ..$ : NULL
 $ occ.covs: num [1:80, 1] 390 425 443 382 441 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr "Elevation"
 $ det.covs:List of 2
  ..$ day: num [1:80] 169 169 169 169 169 169 169 169 169 170 ...
  ..$ tod: int [1:80] 8 8 6 7 8 6 7 7 7 8 ...
 $ coords  : num [1:80, 1:2] 318472 318722 318972 318472 318722 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:80] "1" "2" "3" "4" ...
  .. ..$ : chr [1:2] "X" "Y"

Notice that now the two data sources have a different number of species.

Basic model description

An integrated multi-species occupancy model takes the same form as a multi-species occupancy model, with separate sub-models to describe the ecological process (occupancy) from the observation process. The only difference is that now there are multiple observation sub-models for each data source included in the model. The ecological sub-model of an integrated multi-species occupancy model is identical to that of a regular multi-species occupancy model. Let \(z_{i, j}\) be the true presence (1) or absence (0) of a species \(i\) at site \(j\), with \(j = 1, \dots, J\) and \(i = 1, \dots, N\). We model the latent occurrence of each species according to

\[\begin{equation} \begin{split} &z_{i, j} \sim \text{Bernoulli}(\psi_{i, j}), \\ &\text{logit}(\psi_{i, j}) = \boldsymbol{x}^{\top}_{j}\boldsymbol{\beta}_i, \end{split} \end{equation}\]

where \(\psi_{i, j}\) is the probability of occurrence of species \(i\) at site \(j\), which is a function of site-specific covariates \(\boldsymbol{X}\) and a vector of species-specific regression coefficients (\(\boldsymbol{\beta}_i\)). The regression coefficients in multi-species occupancy models are envisioned as random effects arising from a common community level distribution. Specifically, we have,

\[\begin{equation} \boldsymbol{\beta}_i \sim \text{Normal}(\boldsymbol{\mu}_{\beta}, \boldsymbol{T}_{\beta}), \end{equation}\]

where \(\boldsymbol{\mu}_{\beta}\) is a vector of community level mean effects for each occurrence covariate effect (including the intercept) and \(\boldsymbol{T}_{\beta}\) is a diagonal matrix with diagonal elements \(\boldsymbol{\tau}^2_{\beta}\) that represent the variability of each occurrence covariate effect among species in the community.

We do not directly observe \(z_{i, j}\), but rather we observe an imperfect representation of the latent occurrence process. In integrated models, we have \(r = 1, \dots, R\) distinct sources of data that are all imperfect representations of a single, shared occurrence process. Let \(y_{r, i, a, k}\) be the observed detection (1) or nondetection (0) of species \(i\) in data set \(r\) at site \(a\) during replicate \(k\). Note that a data source may have data for all \(i = 1, \dots, N\) species in the community, or only a subset of the species (of course, each species modeled must have data from at least one data source). Because different data sources have different variables influencing the observation process, we envision a separate detection model for each data source that is defined conditional on a single, shared ecological process described above. More specifically, for data source \(r\) we have

\[\begin{equation} \begin{split} &y_{r, i, a, k} \sim \text{Bernoulli}(p_{r, i, a, k}z_{i, j[a]}), \\ &\text{logit}(p_{r, i, a, k}) = \textbf{v}^{\top}_{r, i, a, k}\boldsymbol{\alpha}_{r, i}, \end{split} \end{equation}\]

where \(p_{r,i, a, k}\) is the probability of detecting species \(i\) at site \(a\) during replicate \(k\) (given it is present at site \(a\)) for data source \(r\), which is a function of site, replicate, and data source specific covariates \(\textbf{V}_r\), and a vector of regression coefficients specific to each data source and species (\(\boldsymbol{\alpha}_{r, i}\)). \(z_{i, j[a]}\) is the true latent occurrence status of species \(i\) at point count location \(j\) that corresponds to the \(a\)th data set location in data set \(r\). Note that species-specific coefficients are modeled as random effects arising from a common distribution with community-level mean and variance parameters, as we saw previously with the occurrence coefficients.

We assign normal priors for the occurrence and data-set specific detection regression coefficients, and inverse-Gamma priors for the community-level variance parameters to complete the Bayesian specification of the model. Pólya-Gamma data augmentation is implemented in an analogous manner to that of other spOccupancy functions to yield an efficient implementation of multi-species integrated occupancy models.

Fitting integrated multi-species occupancy models with `intMsPGOcc()`

The function intMsPGOcc() fits multi-species integrated occupancy models in spOccupancy. The function follows the standard spOccupancy syntax, and more details on this syntax are available in the introductory vignette.

intMsPGOcc(occ.formula, det.formula, data, inits, priors, n.samples,
           n.omp.threads = 1, verbose = TRUE, n.report = 100, 
           n.burn = round(.10 * n.samples), n.thin = 1, n.chains = 1,
           k.fold, k.fold.threads = 1, k.fold.seed, k.fold.only = FALSE, ...)

The data argument contains the list of data elements necessary for fitting an integrated multi-species occupancy model. data should be a list comprised of the following objects: y (list of detection-nondetection data matrices for each data source), occ.covs (data frame or matrix of covariates for occurrence model), det.covs (a list of lists where each element of the list corresponds to the detection-nondetection data for the given data source), sites (a list where each element consists of the site indices for the given data source), and species (a list where each element consists of the species indices for the given data source).

The hbef2015 and neon2015 data sources are currently formatted for use in standard multi-species occupancy models. Perhaps the trickiest part of data integration is ensuring each location in each data source lines up with the correct geographical location where you want to determine the true presence/absence of each species of interest. In spOccupancy, most of this bookkeeping is done under the hood, but we will need to combine the two data sources together into a single list in which we are consistent about how the data sources are sorted. To accomplish this, we recommend first creating the occurrence covariates matrix for all data sources. Because our two data sources do not overlap spatially, this is relatively simple and here we can just use rbind().

occ.covs <- rbind(hbef2015$occ.covs, neon2015$occ.covs)
str(occ.covs)

 num [1:453, 1] 475 494 546 587 588 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr "Elevation"

Notice the order in which we placed these covariates: all covariate values for HBEF come first, followed by all covariates for NEON. We need to ensure that we use this identical ordering for all objects in the data list. Next, we create the site indices stored in sites. sites should be a list with two elements (one for each data source), where each element consists of a vector that indicates the rows in occ.covs that correspond with the specific row of the detection-nondetection data for that data source. When the data sources stem from distinct points (like in our current case), this is again relatively straightforward as the indices simply correspond to how we ordered the points in occ.covs.

sites.indx <- list(hbef = 1:nrow(hbef2015$occ.covs),
                  neon = 1:nrow(neon2015$occ.covs) + nrow(hbef2015$occ.covs))
str(sites.indx)

List of 2
 $ hbef: int [1:373] 1 2 3 4 5 6 7 8 9 10 ...
 $ neon: int [1:80] 374 375 376 377 378 379 380 381 382 383 ...

Next we create the combined detection-nondetection data y. For integrated multi-species models in spOccupancy, y is a list of three-dimensional arrays, with each array having dimensions corresponding to species, site, and replicate. Again, we must ensure that we place the data sources in the correct order.

y.full <- list(hbef = hbef2015$y,
               neon = neon2015$y)
str(y.full)

List of 2
 $ hbef: num [1:11, 1:373, 1:3] 0 0 0 0 1 0 0 0 0 1 ...
  ..- attr(*, "dimnames")=List of 3
  .. ..$ : chr [1:11] "BAWW" "BHVI" "BLBW" "BLPW" ...
  .. ..$ : chr [1:373] "1" "2" "3" "4" ...
  .. ..$ : chr [1:3] "1" "2" "3"
 $ neon: num [1:9, 1:80, 1:3] 0 0 0 0 1 0 0 1 1 0 ...
  ..- attr(*, "dimnames")=List of 3
  .. ..$ : chr [1:9] "BAWW" "BHVI" "BLBW" "BTBW" ...
  .. ..$ : NULL
  .. ..$ : NULL

The last thing we need to specify in data for an integrated multi-species occupancy model is the species list, which is a list of vectors where each vector is a set of codes that indicates the specific species in the corresponding data source. In other words, each element of the species list should consist of some code to indicate the species that each row of the corresponding array in the y data list corresponds to. This is necessary to link data sources that may not sample exactly the same species, as is our case here where the NEON data only sample 9 of the 11 species we are modeling.

sp.indx <- list(hbef = sp.names[hbef.obs.sp.indx], 
                neon = sp.names[neon.obs.sp.indx])
sp.indx

$hbef
 [1] "BAWW" "BHVI" "BLBW" "BLPW" "BTBW" "BTNW" "CAWA" "MAWA" "NAWA" "OVEN"
[11] "REVI"

$neon
[1] "BAWW" "BHVI" "BLBW" "BTBW" "BTNW" "CAWA" "MAWA" "OVEN" "REVI"

Lastly, we put the detection covariates for each data set in a single list, and then combine all the elements together in one data list that we will eventually supply to intMsPGOcc().

det.covs <- list(hbef = hbef2015$det.covs,
                 neon = neon2015$det.covs)
data.list <- list(y = y.full,
                 occ.covs = occ.covs,
                 det.covs = det.covs,
                 sites = sites.indx, 
                 species = sp.indx)
str(data.list)

List of 5
 $ y       :List of 2
  ..$ hbef: num [1:11, 1:373, 1:3] 0 0 0 0 1 0 0 0 0 1 ...
  .. ..- attr(*, "dimnames")=List of 3
  .. .. ..$ : chr [1:11] "BAWW" "BHVI" "BLBW" "BLPW" ...
  .. .. ..$ : chr [1:373] "1" "2" "3" "4" ...
  .. .. ..$ : chr [1:3] "1" "2" "3"
  ..$ neon: num [1:9, 1:80, 1:3] 0 0 0 0 1 0 0 1 1 0 ...
  .. ..- attr(*, "dimnames")=List of 3
  .. .. ..$ : chr [1:9] "BAWW" "BHVI" "BLBW" "BTBW" ...
  .. .. ..$ : NULL
  .. .. ..$ : NULL
 $ occ.covs: num [1:453, 1] 475 494 546 587 588 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr "Elevation"
 $ det.covs:List of 2
  ..$ hbef:List of 2
  .. ..$ day: num [1:373, 1:3] 156 156 156 156 156 156 156 156 156 156 ...
  .. ..$ tod: num [1:373, 1:3] 330 346 369 386 409 425 447 463 482 499 ...
  ..$ neon:List of 2
  .. ..$ day: num [1:80] 169 169 169 169 169 169 169 169 169 170 ...
  .. ..$ tod: int [1:80] 8 8 6 7 8 6 7 7 7 8 ...
 $ sites   :List of 2
  ..$ hbef: int [1:373] 1 2 3 4 5 6 7 8 9 10 ...
  ..$ neon: int [1:80] 374 375 376 377 378 379 380 381 382 383 ...
 $ species :List of 2
  ..$ hbef: chr [1:11] "BAWW" "BHVI" "BLBW" "BLPW" ...
  ..$ neon: chr [1:9] "BAWW" "BHVI" "BLBW" "BTBW" ...

We will now go ahead and run the integrated multi-species occupancy model. We will first fit a model with a linear and quadratic effect of elevation on occurrence, and effects of day (linear and quadratic) of the year and time of day (linear) on the detection process for each data source. Note that there is no requirement to have the same covariates on the detection models for each individual data source. The function intMsPGOcc() also allows the user to specify the priors and initial values, which follow the same format as other spOccupancy functions (see the introductory vignette for details), but here we will just use the default values, which we see below are printed to the screen. The last thing we need to do is specify the formulas for occurrence and detection (note we need a separate formula for each data source for the detection models in det.formula, which are supplied as a list). We also specify all the criteria of the MCMC sampler, where here we run three chains of the model for 20,000 iterations with a burn-in period of 12,000 and a thinning rate of 8, resulting in a total of 3000 posterior samples.

# Approx. run time: 3 minutes
out.1 <- intMsPGOcc(occ.formula = ~ scale(Elevation) + I(scale(Elevation)^2),
                    det.formula = list(hbef = ~ scale(day) + I(scale(day)^2) + scale(tod),
                                       neon = ~ scale(day) + I(scale(day)^2) + scale(tod)),
                    data = data.list,
                    n.samples = 20000,
                    n.omp.threads = 1,
                    verbose = TRUE,
                    n.report = 4000,
                    n.burn = 12000,
                    n.thin = 8,
                    n.chains = 3)

----------------------------------------
    Preparing to run the model
----------------------------------------

No prior specified for beta.comm.normal.
Setting prior mean to 0 and prior variance to 2.72

No prior specified for alpha.comm.normal.
Setting prior mean to 0 and prior variance to 2.72

No prior specified for tau.sq.beta.ig.
Setting prior shape to 0.1 and prior scale to 0.1

No prior specified for tau.sq.alpha.ig.
Setting prior shape and scale to 0.1

z is not specified in initial values.
Setting initial values based on observed data

beta.comm is not specified in initial values.
Setting initial values to random values from the prior distribution

alpha.comm is not specified in initial values.
Setting initial values to random values from the prior distribution

tau.sq.beta is not specified in initial values.
Setting initial values to random values between 0.5 and 10

tau.sq.beta is not specified in initial values.
Setting initial values to random values between 0.5 and 10

beta is not specified in initial values.
Setting initial values to random values from the community-level normal distribution

alpha is not specified in initial values.
Setting initial values to random values from the community-level normal distribution

----------------------------------------
    Model description
----------------------------------------
Integrated Multispecies Occupancy Model with Polya-Gamma latent
variable fit with 453 sites and 11 species.

Integrating 2 occupancy data sets.

Samples per Chain: 20000 
Burn-in: 12000 
Thinning Rate: 8 
Number of Chains: 3 
Total Posterior Samples: 3000 

Source compiled with OpenMP support and model fit using 1 thread(s).

----------------------------------------
    Chain 1
----------------------------------------
Sampling ... 
Sampled: 4000 of 20000, 20.00%
-------------------------------------------------
Sampled: 8000 of 20000, 40.00%
-------------------------------------------------
Sampled: 12000 of 20000, 60.00%
-------------------------------------------------
Sampled: 16000 of 20000, 80.00%
-------------------------------------------------
Sampled: 20000 of 20000, 100.00%
----------------------------------------
    Chain 2
----------------------------------------
Sampling ... 
Sampled: 4000 of 20000, 20.00%
-------------------------------------------------
Sampled: 8000 of 20000, 40.00%
-------------------------------------------------
Sampled: 12000 of 20000, 60.00%
-------------------------------------------------
Sampled: 16000 of 20000, 80.00%
-------------------------------------------------
Sampled: 20000 of 20000, 100.00%
----------------------------------------
    Chain 3
----------------------------------------
Sampling ... 
Sampled: 4000 of 20000, 20.00%
-------------------------------------------------
Sampled: 8000 of 20000, 40.00%
-------------------------------------------------
Sampled: 12000 of 20000, 60.00%
-------------------------------------------------
Sampled: 16000 of 20000, 80.00%
-------------------------------------------------
Sampled: 20000 of 20000, 100.00%

summary(out.1)


Call:
intMsPGOcc(occ.formula = ~scale(Elevation) + I(scale(Elevation)^2), 
    det.formula = list(hbef = ~scale(day) + I(scale(day)^2) + 
        scale(tod), neon = ~scale(day) + I(scale(day)^2) + scale(tod)), 
    data = data.list, n.samples = 20000, n.omp.threads = 1, verbose = TRUE, 
    n.report = 4000, n.burn = 12000, n.thin = 8, n.chains = 3)

Samples per Chain: 20000
Burn-in: 12000
Thinning Rate: 8
Number of Chains: 3
Total Posterior Samples: 3000
Run Time (min): 2.9239

----------------------------------------
    Community Level
----------------------------------------
Occurrence Means (logit scale): 
                         Mean     SD    2.5%     50%  97.5%   Rhat  ESS
(Intercept)            0.4250 0.8894 -1.2971  0.4199 2.2252 1.0112 1700
scale(Elevation)       0.1581 0.4490 -0.7297  0.1567 1.0722 1.0046 1789
I(scale(Elevation)^2) -0.0189 0.2786 -0.5499 -0.0248 0.5614 1.0010 1494

Occurrence Variances (logit scale): 
                         Mean     SD   2.5%     50%   97.5%   Rhat  ESS
(Intercept)           12.1092 7.8331 4.4598 10.1770 31.8805 1.0311  367
scale(Elevation)       2.3228 1.5024 0.7037  1.9661  5.9842 1.0085 1156
I(scale(Elevation)^2)  0.7154 0.5116 0.2127  0.5783  2.0209 1.0120  847

-----------------------------
    Data source 1
-----------------------------
Detection Means (logit scale): 
                   Mean     SD    2.5%     50%  97.5%   Rhat  ESS
(Intercept)     -0.6019 0.4281 -1.4704 -0.5945 0.2470 1.0010 3000
scale(day)       0.0610 0.0900 -0.1176  0.0600 0.2396 1.0052 3000
I(scale(day)^2) -0.0161 0.0877 -0.1836 -0.0154 0.1571 1.0005 3000
scale(tod)      -0.0415 0.0781 -0.1975 -0.0386 0.1076 1.0010 3000

Detection Variances (logit scale): 
                  Mean     SD   2.5%    50%  97.5%   Rhat  ESS
(Intercept)     2.1203 1.2034 0.7641 1.8087 5.2895 1.0036 2360
scale(day)      0.0685 0.0413 0.0240 0.0579 0.1761 1.0115 3000
I(scale(day)^2) 0.0575 0.0406 0.0181 0.0474 0.1563 1.0053 3000
scale(tod)      0.0505 0.0343 0.0163 0.0413 0.1356 1.0220 2857

-----------------------------
    Data source 2
-----------------------------
Detection Means (logit scale): 
                   Mean     SD    2.5%     50%   97.5%   Rhat  ESS
(Intercept)     -1.3792 0.3031 -1.9730 -1.3771 -0.7432 1.0017 1242
scale(day)       0.0485 0.2035 -0.3936  0.0517  0.4541 1.0016 2675
I(scale(day)^2) -0.0596 0.1614 -0.3797 -0.0610  0.2617 1.0041 2679
scale(tod)       0.0165 0.1391 -0.2593  0.0185  0.2945 1.0048 3000

Detection Variances (logit scale): 
                  Mean     SD   2.5%    50%  97.5%   Rhat  ESS
(Intercept)     0.6024 0.5084 0.1282 0.4562 1.8945 1.0030 2438
scale(day)      0.2440 0.2500 0.0395 0.1675 0.8841 1.0044 2551
I(scale(day)^2) 0.1459 0.1525 0.0287 0.1030 0.5139 1.0037 2600
scale(tod)      0.0995 0.0885 0.0238 0.0742 0.3388 1.0164 3016


----------------------------------------
    Species Level
----------------------------------------
Occurrence (logit scale): 
                              Mean     SD    2.5%     50%   97.5%   Rhat  ESS
(Intercept)-BAWW            1.1988 1.5309 -0.6088  0.7192  5.1996 1.1247   67
(Intercept)-BHVI            2.9251 2.4270 -0.1303  2.3344  8.9583 1.7024   35
(Intercept)-BLBW            2.5732 0.6180  1.6854  2.4656  4.1286 1.0166  530
(Intercept)-BLPW           -5.0370 0.6278 -6.3493 -5.0015 -3.8924 1.0055  930
(Intercept)-BTBW            3.6727 0.4822  2.8651  3.6235  4.7321 1.0162 1240
(Intercept)-BTNW            2.1894 0.3059  1.6735  2.1605  2.8602 1.0020 2040
(Intercept)-CAWA           -2.3058 0.4757 -3.1522 -2.3350 -1.2657 1.0280  496
(Intercept)-MAWA           -1.9971 0.2633 -2.5237 -1.9895 -1.4882 1.0009 2716
(Intercept)-NAWA           -2.7928 0.5234 -3.8109 -2.8120 -1.7556 1.0019  705
(Intercept)-OVEN            2.5768 0.3073  2.0316  2.5540  3.2370 0.9996 2205
(Intercept)-REVI            3.5437 0.5116  2.6758  3.4933  4.6646 1.0019  472
scale(Elevation)-BAWW      -0.1730 0.5909 -1.5648 -0.0972  0.7321 1.1514  141
scale(Elevation)-BHVI       0.2507 0.9579 -1.6886  0.2489  2.3567 1.0221  206
scale(Elevation)-BLBW      -0.2465 0.2451 -0.7377 -0.2175  0.1189 1.0236  536
scale(Elevation)-BLPW       2.2139 0.6078  1.2709  2.1377  3.6573 0.9996  582
scale(Elevation)-BTBW      -0.1442 0.1588 -0.4794 -0.1409  0.1539 1.0020 2191
scale(Elevation)-BTNW       0.6496 0.2674  0.2117  0.6285  1.2447 1.0009 1641
scale(Elevation)-CAWA       1.0965 0.4752  0.4084  1.0164  2.2491 1.0217  236
scale(Elevation)-MAWA       1.1845 0.2360  0.7745  1.1693  1.6919 1.0026 2013
scale(Elevation)-NAWA       1.0202 0.3965  0.3465  0.9897  1.8688 1.0011 1380
scale(Elevation)-OVEN      -1.7620 0.3434 -2.5742 -1.7114 -1.2170 1.0049  942
scale(Elevation)-REVI      -2.1784 0.7255 -3.8453 -2.0875 -1.0567 1.0126  292
I(scale(Elevation)^2)-BAWW -0.7774 0.4385 -1.8010 -0.7213 -0.0871 1.0590  200
I(scale(Elevation)^2)-BHVI  0.4661 0.7257 -0.8097  0.3970  2.0993 1.0424  289
I(scale(Elevation)^2)-BLBW -0.6044 0.2064 -1.0525 -0.5915 -0.2246 1.0076 1275
I(scale(Elevation)^2)-BLPW  1.0183 0.3870  0.1976  1.0216  1.7291 1.0037  964
I(scale(Elevation)^2)-BTBW -1.1203 0.1827 -1.4973 -1.1081 -0.7877 1.0131 1778
I(scale(Elevation)^2)-BTNW  0.0104 0.1998 -0.3471 -0.0050  0.4334 1.0017 2192
I(scale(Elevation)^2)-CAWA  0.2715 0.3718 -0.3044  0.2234  1.1928 1.0286  287
I(scale(Elevation)^2)-MAWA  0.6836 0.2094  0.2905  0.6745  1.1157 1.0031 1811
I(scale(Elevation)^2)-NAWA  0.3856 0.2791 -0.1649  0.3874  0.9449 1.0010 1572
I(scale(Elevation)^2)-OVEN -0.4161 0.2191 -0.8168 -0.4282  0.0523 1.0137 1609
I(scale(Elevation)^2)-REVI -0.0625 0.3544 -0.6845 -0.0797  0.6820 1.0138  401

-----------------------------
    Data source 1
-----------------------------
Detection (logit scale): 
                        Mean     SD    2.5%     50%   97.5%   Rhat  ESS
(Intercept)-BAWW     -2.5022 0.4386 -3.2806 -2.5218 -1.6380 1.0398  110
(Intercept)-BHVI     -2.6860 0.2529 -3.1442 -2.7007 -2.1490 1.1518  182
(Intercept)-BLBW     -0.0399 0.1244 -0.2854 -0.0421  0.2096 1.0006 1223
(Intercept)-BLPW     -0.4520 0.2618 -0.9694 -0.4458  0.0385 1.0008 3000
(Intercept)-BTBW      0.6306 0.1074  0.4204  0.6292  0.8420 1.0103 3000
(Intercept)-BTNW      0.5787 0.1118  0.3615  0.5771  0.8050 1.0048 2622
(Intercept)-CAWA     -1.7370 0.5638 -2.8646 -1.7198 -0.7355 1.0224  292
(Intercept)-MAWA     -0.8431 0.2388 -1.3081 -0.8467 -0.3704 1.0029 2064
(Intercept)-NAWA     -1.4722 0.4742 -2.4247 -1.4711 -0.5801 1.0005  687
(Intercept)-OVEN      0.7896 0.1205  0.5574  0.7882  1.0196 1.0017 3000
(Intercept)-REVI      0.5471 0.1073  0.3407  0.5455  0.7578 1.0007 3000
scale(day)-BAWW       0.2138 0.1398 -0.0567  0.2105  0.4997 0.9997 2652
scale(day)-BHVI       0.2011 0.1181 -0.0276  0.1985  0.4355 1.0011 2771
scale(day)-BLBW      -0.2232 0.0678 -0.3554 -0.2240 -0.0897 1.0015 3185
scale(day)-BLPW       0.0660 0.1605 -0.2472  0.0672  0.3704 1.0035 3000
scale(day)-BTBW       0.2690 0.0672  0.1359  0.2707  0.4007 1.0013 3000
scale(day)-BTNW       0.1511 0.0658  0.0230  0.1514  0.2785 1.0025 3000
scale(day)-CAWA      -0.0081 0.1763 -0.3502 -0.0061  0.3353 1.0055 3000
scale(day)-MAWA       0.1129 0.1234 -0.1291  0.1126  0.3607 0.9998 3000
scale(day)-NAWA       0.0216 0.1768 -0.3301  0.0231  0.3548 1.0004 3000
scale(day)-OVEN      -0.0756 0.0735 -0.2211 -0.0775  0.0710 1.0020 3000
scale(day)-REVI      -0.0688 0.0673 -0.2035 -0.0681  0.0585 1.0000 3018
I(scale(day)^2)-BAWW -0.0318 0.1498 -0.3303 -0.0294  0.2555 1.0029 2834
I(scale(day)^2)-BHVI  0.0589 0.1266 -0.1929  0.0591  0.3088 1.0031 3000
I(scale(day)^2)-BLBW -0.1703 0.0785 -0.3306 -0.1701 -0.0159 1.0006 3000
I(scale(day)^2)-BLPW  0.2031 0.1851 -0.1340  0.1932  0.5969 1.0016 3000
I(scale(day)^2)-BTBW -0.0602 0.0790 -0.2161 -0.0604  0.0939 1.0008 3000
I(scale(day)^2)-BTNW -0.0607 0.0782 -0.2152 -0.0589  0.0898 1.0005 3000
I(scale(day)^2)-CAWA -0.0370 0.1805 -0.3859 -0.0345  0.3167 1.0017 3000
I(scale(day)^2)-MAWA  0.0117 0.1364 -0.2529  0.0106  0.2781 1.0032 3000
I(scale(day)^2)-NAWA -0.1329 0.1872 -0.5185 -0.1258  0.2175 1.0003 2725
I(scale(day)^2)-OVEN  0.0142 0.0876 -0.1595  0.0149  0.1816 1.0001 3000
I(scale(day)^2)-REVI  0.0367 0.0773 -0.1121  0.0356  0.1863 1.0015 3000
scale(tod)-BAWW      -0.1786 0.1369 -0.4578 -0.1749  0.0761 1.0024 3000
scale(tod)-BHVI      -0.0512 0.1137 -0.2722 -0.0510  0.1713 1.0015 3000
scale(tod)-BLBW       0.0580 0.0656 -0.0663  0.0585  0.1843 1.0032 3000
scale(tod)-BLPW       0.1070 0.1481 -0.1849  0.1033  0.4074 0.9999 3000
scale(tod)-BTBW      -0.0348 0.0665 -0.1651 -0.0337  0.0987 0.9999 3000
scale(tod)-BTNW       0.0355 0.0631 -0.0900  0.0377  0.1557 1.0032 3000
scale(tod)-CAWA      -0.2139 0.1627 -0.5494 -0.2067  0.0930 1.0030 3000
scale(tod)-MAWA       0.0203 0.1125 -0.2088  0.0206  0.2316 1.0006 3000
scale(tod)-NAWA      -0.0694 0.1590 -0.3840 -0.0686  0.2340 1.0007 3000
scale(tod)-OVEN      -0.0460 0.0735 -0.1900 -0.0453  0.0932 1.0025 3000
scale(tod)-REVI      -0.0786 0.0650 -0.2082 -0.0781  0.0501 1.0044 2784

-----------------------------
    Data source 2
-----------------------------
Detection (logit scale): 
                        Mean     SD    2.5%     50%   97.5%   Rhat  ESS
(Intercept)-BAWW     -1.3422 0.7565 -2.6152 -1.4268  0.3398 1.0405  108
(Intercept)-BHVI     -2.1563 0.3995 -2.9256 -2.1575 -1.3695 1.0824  388
(Intercept)-BLBW     -2.0784 0.3248 -2.7448 -2.0587 -1.4949 1.0030 2416
(Intercept)-BTBW     -1.7732 0.2859 -2.3380 -1.7654 -1.2359 1.0000 2839
(Intercept)-BTNW     -1.0066 0.2474 -1.4946 -1.0097 -0.5154 1.0012 3445
(Intercept)-CAWA     -1.3018 0.6855 -2.6172 -1.3062  0.0931 1.0064 1425
(Intercept)-MAWA     -1.3798 0.5662 -2.4973 -1.3848 -0.2319 1.0014 3000
(Intercept)-OVEN     -0.7468 0.2026 -1.1468 -0.7449 -0.3612 1.0007 2834
(Intercept)-REVI     -0.8568 0.2032 -1.2533 -0.8549 -0.4725 0.9997 3000
scale(day)-BAWW      -0.2881 0.3448 -1.0203 -0.2640  0.3384 1.0066 2676
scale(day)-BHVI       0.1894 0.3037 -0.3647  0.1714  0.8397 1.0020 2511
scale(day)-BLBW       0.5813 0.2838  0.0889  0.5571  1.1955 1.0019 2711
scale(day)-BTBW      -0.1391 0.2434 -0.6376 -0.1369  0.3291 1.0009 2533
scale(day)-BTNW       0.0763 0.1744 -0.2687  0.0763  0.4203 1.0016 3000
scale(day)-CAWA       0.1999 0.4387 -0.6459  0.1899  1.1210 1.0020 2847
scale(day)-MAWA      -0.3311 0.4560 -1.3440 -0.2963  0.4561 1.0010 2568
scale(day)-OVEN       0.1403 0.1747 -0.1946  0.1356  0.4859 1.0008 3000
scale(day)-REVI      -0.0334 0.1595 -0.3506 -0.0337  0.2810 1.0026 3000
I(scale(day)^2)-BAWW -0.0217 0.2623 -0.5043 -0.0354  0.5468 1.0122 1241
I(scale(day)^2)-BHVI -0.4430 0.3248 -1.1467 -0.4100  0.0884 1.0081 1961
I(scale(day)^2)-BLBW  0.0178 0.2208 -0.4245  0.0233  0.4400 1.0030 3000
I(scale(day)^2)-BTBW -0.2233 0.2098 -0.6723 -0.2056  0.1621 1.0009 3000
I(scale(day)^2)-BTNW  0.1826 0.1648 -0.1305  0.1785  0.5172 1.0036 3553
I(scale(day)^2)-CAWA  0.0126 0.3392 -0.6201 -0.0021  0.7203 1.0047 2777
I(scale(day)^2)-MAWA  0.0129 0.2751 -0.5043  0.0050  0.5714 1.0034 3000
I(scale(day)^2)-OVEN -0.1028 0.1593 -0.4218 -0.0975  0.2051 1.0022 3000
I(scale(day)^2)-REVI  0.0075 0.1478 -0.2905  0.0082  0.2944 1.0005 3422
scale(tod)-BAWW      -0.1095 0.2622 -0.6350 -0.1060  0.3984 1.0022 2714
scale(tod)-BHVI      -0.1328 0.2293 -0.6057 -0.1283  0.2954 1.0012 2809
scale(tod)-BLBW      -0.0017 0.1906 -0.3788 -0.0004  0.3605 1.0033 3203
scale(tod)-BTBW       0.1257 0.1991 -0.2570  0.1265  0.5349 1.0018 3000
scale(tod)-BTNW       0.0632 0.1666 -0.2647  0.0645  0.3971 1.0010 3000
scale(tod)-CAWA      -0.0107 0.2896 -0.5933 -0.0025  0.5442 1.0040 3000
scale(tod)-MAWA       0.0706 0.3102 -0.5367  0.0662  0.7071 1.0098 3000
scale(tod)-OVEN       0.0840 0.1502 -0.2052  0.0834  0.3792 1.0014 3044
scale(tod)-REVI       0.0678 0.1452 -0.2253  0.0668  0.3459 0.9999 2404

From the summary output, we can see that most of the model parameters have converged. We can compare this model to a simpler model that assumes only a linear effect of elevation, and then compare the two models using WAIC with the waicOcc() function.

# Approx. run time: 3 minutes
out.2 <- intMsPGOcc(occ.formula = ~ scale(Elevation),
                    det.formula = list(hbef = ~ scale(day) + I(scale(day)^2) + scale(tod),
                       neon = ~ scale(day) + I(scale(day)^2) + scale(tod)),
                    data = data.list,
                    n.samples = 20000,
                    n.omp.threads = 1,
                    verbose = TRUE,
                    n.report = 4000,
                    n.burn = 12000,
                    n.thin = 8,
                    n.chains = 3)

----------------------------------------
    Preparing to run the model
----------------------------------------

No prior specified for beta.comm.normal.
Setting prior mean to 0 and prior variance to 2.72

No prior specified for alpha.comm.normal.
Setting prior mean to 0 and prior variance to 2.72

No prior specified for tau.sq.beta.ig.
Setting prior shape to 0.1 and prior scale to 0.1

No prior specified for tau.sq.alpha.ig.
Setting prior shape and scale to 0.1

z is not specified in initial values.
Setting initial values based on observed data

beta.comm is not specified in initial values.
Setting initial values to random values from the prior distribution

alpha.comm is not specified in initial values.
Setting initial values to random values from the prior distribution

tau.sq.beta is not specified in initial values.
Setting initial values to random values between 0.5 and 10

tau.sq.beta is not specified in initial values.
Setting initial values to random values between 0.5 and 10

beta is not specified in initial values.
Setting initial values to random values from the community-level normal distribution

alpha is not specified in initial values.
Setting initial values to random values from the community-level normal distribution

----------------------------------------
    Model description
----------------------------------------
Integrated Multispecies Occupancy Model with Polya-Gamma latent
variable fit with 453 sites and 11 species.

Integrating 2 occupancy data sets.

Samples per Chain: 20000 
Burn-in: 12000 
Thinning Rate: 8 
Number of Chains: 3 
Total Posterior Samples: 3000 

Source compiled with OpenMP support and model fit using 1 thread(s).

----------------------------------------
    Chain 1
----------------------------------------
Sampling ... 
Sampled: 4000 of 20000, 20.00%
-------------------------------------------------
Sampled: 8000 of 20000, 40.00%
-------------------------------------------------
Sampled: 12000 of 20000, 60.00%
-------------------------------------------------
Sampled: 16000 of 20000, 80.00%
-------------------------------------------------
Sampled: 20000 of 20000, 100.00%
----------------------------------------
    Chain 2
----------------------------------------
Sampling ... 
Sampled: 4000 of 20000, 20.00%
-------------------------------------------------
Sampled: 8000 of 20000, 40.00%
-------------------------------------------------
Sampled: 12000 of 20000, 60.00%
-------------------------------------------------
Sampled: 16000 of 20000, 80.00%
-------------------------------------------------
Sampled: 20000 of 20000, 100.00%
----------------------------------------
    Chain 3
----------------------------------------
Sampling ... 
Sampled: 4000 of 20000, 20.00%
-------------------------------------------------
Sampled: 8000 of 20000, 40.00%
-------------------------------------------------
Sampled: 12000 of 20000, 60.00%
-------------------------------------------------
Sampled: 16000 of 20000, 80.00%
-------------------------------------------------
Sampled: 20000 of 20000, 100.00%

# Model selection using WAIC
waicOcc(out.1)

       elpd          pD        WAIC 
-5121.01590    84.07401 10410.17981

waicOcc(out.2)

       elpd          pD        WAIC 
-5154.02429    78.28919 10464.62696

We see the model with both a linear and quadratic effect of elevation is preferred. We can also do prediction using the predict() function to generate species distribution maps across a region of interest (or generate estimates of community-level metrics), which follows exactly the format used for single data-source multi-species occupancy models.

As of v0.7.0, we can also calculate WAIC individually for each species using the by.sp argument

waicOcc(out.1, by.sp = TRUE)

        elpd       pD      WAIC
1  -228.4167 8.998136  474.8296
2  -290.3455 6.682839  594.0567
3  -767.1750 9.762715 1553.8755
4  -139.4260 4.896930  288.6459
5  -764.0191 9.254624 1546.5475
6  -842.9051 9.198185 1704.2065
7  -125.6505 7.631983  266.5650
8  -289.2931 6.661293  591.9088
9  -106.6762 5.372210  224.0968
10 -751.3157 7.632899 1517.8972
11 -815.5886 8.661218 1648.4997

waicOcc(out.2, by.sp = TRUE)

        elpd        pD      WAIC
1  -230.7406  8.463095  478.4074
2  -290.6858  5.910009  593.1917
3  -771.3186 11.275470 1565.1882
4  -141.0589  5.132568  292.3829
5  -778.8126  8.088996 1573.8033
6  -842.8739  8.141197 1702.0302
7  -126.4658  5.650348  264.2323
8  -296.3188  6.015551  604.6688
9  -107.5462  4.680241  224.4530
10 -752.5325  7.548766 1520.1626
11 -815.5069  8.188468 1647.3908

Active development

The functionality for integrated multi-species occupancy models in spOccupancy is in active development, and currently some functionality that is common to all other spOccupancy model fitting functions is not available for intMsPGOcc() (i.e., posterior predictive checks, cross-validation, random effects on detection probability). I am actively working on incorporating these extensions into the model. Additionally, there will of course be spatially-explicit versions of these models available at some point in the future as well, with options to accommodate species correlations using the factor modeling approach detailed in Doser, Finley, and Banerjee (2023).

References

Doser, Jeffrey W, Andrew O Finley, and Sudipto Banerjee. 2023. “Joint Species Distribution Models with Imperfect Detection for High-Dimensional Spatial Data.” Ecology 104 (9): e4137.

Doser, Jeffrey W, Wendy Leuenberger, T Scott Sillett, Michael T Hallworth, and Elise F Zipkin. 2022. “Integrated Community Occupancy Models: A Framework to Assess Occurrence and Biodiversity Dynamics Using Multiple Data Sources.” Methods in Ecology and Evolution.

Miller, David AW, Krishna Pacifici, Jamie S Sanderlin, and Brian J Reich. 2019. “The Recent Past and Promising Future for Data Integration Methods to Estimate Species’ Distributions.” Methods in Ecology and Evolution 10 (1): 22–37.

Jeffrey W. Doser

February 20, 2023

Introduction

Example data sets: Foliage-gleaning birds at Hubbard Brook and Bartlett Forests

Basic model description

Fitting integrated multi-species occupancy models with `intMsPGOcc()`

Active development

References

Integrated multi-species occupancy models in spOccupancy

Jeffrey W. Doser

February 20, 2023

Introduction

Example data sets: Foliage-gleaning birds at Hubbard Brook and Bartlett Forests

Basic model description

Fitting integrated multi-species occupancy models with intMsPGOcc()

Active development

References

Fitting integrated multi-species occupancy models with `intMsPGOcc()`