Function for prediction at new locations for multi-season single-species spatially-varying coefficient occupancy models

The function predict collects posterior predictive samples for a set of new locations given an object of class `svcTPGOcc`. Prediction is possible for both the latent occupancy state as well as detection. Predictions are currently only possible for sampled primary time periods.

Usage

# S3 method for svcTPGOcc
predict(object, X.0, coords.0, t.cols, weights.0, n.omp.threads = 1, 
        verbose = TRUE, n.report = 100, 
        ignore.RE = FALSE, type = 'occupancy', forecast = FALSE, 
        grid.index.0, ...)

Arguments

object: an object of class svcTPGOcc
X.0: the design matrix of covariates at the prediction locations. This should be a three-dimensional array, with dimensions corresponding to site, primary time period, and covariate, respectively. Note that the first covariate should consist of all 1s for the intercept if an intercept is included in the model. If random effects are included in the occupancy (or detection if type = 'detection') portion of the model, the levels of the random effects at the new locations/time periods should be included as an element of the three-dimensional array. The ordering of the levels should match the ordering used to fit the data in svcTPGOcc. The covariates should be organized in the same order as they were specified in the corresponding formula argument of svcTPGOcc. Names of the third dimension (covariates) of any random effects in X.0 must match the name of the random effects used to fit the model, if specified in the corresponding formula argument of svcTPGOcc. See example below.
coords.0: the spatial coordinates corresponding to X.0. Note that spOccupancy assumes coordinates are specified in a projected coordinate system.
t.cols: an indexing vector denoting which primary time periods are contained in the design matrix of covariates at the prediction locations (X.0). The values should indicate the columns in data$y used to fit the model for which prediction is desired. See example below. Not required when forecast = TRUE.
weights.0: not used for objects of class svcTPGOcc. Used when calling other functions.
n.omp.threads: a positive integer indicating the number of threads to use for SMP parallel processing. The package must be compiled for OpenMP support. For most Intel-based machines, we recommend setting n.omp.threads up to the number of hyperthreaded cores. Note, n.omp.threads > 1 might not work on some systems.
verbose: if TRUE, model specification and progress of the sampler are printed to the screen. Otherwise, nothing is printed to the screen.
ignore.RE: logical value that specifies whether or not to remove unstructured random occurrence (or detection if type = 'detection') effects from the subsequent predictions. If TRUE, unstructured random effects will be set to 0 and predictions will only be generated from the fixed effects, the spatial random effects, and AR(1) random effects if the model was fit with ar1 = TRUE. If FALSE, unstructured random effects will be included in the predictions.
n.report: the interval to report sampling progress.
type: a quoted keyword indicating what type of prediction to produce. Valid keywords are 'occupancy' to predict latent occupancy probability and latent occupancy values (this is the default), or 'detection' to predict detection probability given new values of detection covariates.
grid.index.0: an indexing vector used to specify how each row in X.0 corresponds to the coordinates specified in coords.0. Only relevant if the spatial random effect was estimated at a coarser spatial resolution (e.g., grid cells) than the point locations.
forecast: a logical value indicating whether prediction is occurring at non-sampled primary time periods (e.g., forecasting).
...: currently no additional arguments
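
The layout expected for X.0 and t.cols can be sketched with base R alone. The sizes, covariate values, and object names below (J.0, X.full, X.pred) are illustrative assumptions, not part of spOccupancy. The sketch only shows how a site x primary time period x covariate array might be assembled and then subset to the time periods listed in t.cols.

```r
# Illustrative sizes (assumptions): 4 prediction sites and 10 primary time
# periods in the data used to fit the model.
set.seed(1)
J.0 <- 4
n.time.max <- 10

# One site x time matrix per covariate, in the same order as the formula
# (intercept first, then trend, then occ.cov.1).
int <- matrix(1, J.0, n.time.max)
trend <- matrix(rep(1:n.time.max, each = J.0), J.0, n.time.max)
occ.cov.1 <- matrix(rnorm(J.0 * n.time.max), J.0, n.time.max)

# Stack into a three-dimensional array: site, primary time period, covariate.
X.full <- array(c(int, trend, occ.cov.1),
                dim = c(J.0, n.time.max, 3),
                dimnames = list(NULL, NULL, c("int", "trend", "occ.cov.1")))

# Keep only the primary time periods for which prediction is desired.
t.cols <- c(1, 2, 5)
X.pred <- X.full[, t.cols, , drop = FALSE]
dim(X.pred)

# With a fitted model `out` and matching coordinates `coords.0`, the call
# would then take the form (not run here):
# out.pred <- predict(out, X.pred, coords.0, t.cols = t.cols, type = 'occupancy')
```

Using drop = FALSE keeps the array three-dimensional even when a single time period or a single site is selected.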

Note

When ignore.RE = FALSE, both sampled levels and non-sampled levels of unstructured random effects are supported for prediction. For sampled levels, the posterior distribution for the random intercept corresponding to that level of the random effect will be used in the prediction. For non-sampled levels, random values are drawn from a normal distribution using the posterior samples of the random effect variance, which results in fully propagated uncertainty in predictions with models that incorporate random effects.
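
The treatment of non-sampled levels described above can be illustrated with a minimal base R sketch. It is not the package's internal code: the posterior samples below are simulated stand-ins, and the names sigma.sq.psi.samples and fixed.linpred.samples are assumptions chosen for illustration. The idea is that each posterior sample of the random effect variance yields one draw of the intercept for the new level, so the uncertainty in the variance carries through to the predicted probabilities.

```r
set.seed(42)
n.post <- 200

# Stand-in posterior samples (assumptions, not output from a fitted model):
# the random effect variance and the fixed-effect part of the linear predictor
# for one site and one primary time period.
sigma.sq.psi.samples <- 1 / rgamma(n.post, shape = 3, rate = 2)
fixed.linpred.samples <- rnorm(n.post, mean = 0.4, sd = 0.2)

# One random intercept per posterior sample for a level that was not sampled,
# drawn from a normal distribution with that sample's variance.
re.new <- rnorm(n.post, mean = 0, sd = sqrt(sigma.sq.psi.samples))

# Occupancy probability with and without the unstructured random effect.
psi.with.re <- plogis(fixed.linpred.samples + re.new)
psi.without.re <- plogis(fixed.linpred.samples)

# The predictions that include the new-level draws are more dispersed.
c(with.re = sd(psi.with.re), without.re = sd(psi.without.re))
```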

Occurrence predictions at sites that are only sampled for a subset of the total number of primary time periods are obtained directly when fitting the model. See the psi.samples and z.samples portions of the output list from the model object of class svcTPGOcc.

Author

Jeffrey W. Doser doserjef@msu.edu,
Andrew O. Finley finleya@msu.edu

Value

A list object of class predict.svcTPGOcc. When type = 'occupancy', the list consists of:

psi.0.samples: a three-dimensional object of posterior predictive samples for the latent occupancy probability values with dimensions corresponding to posterior predictive sample, site, and primary time period.
z.0.samples: a three-dimensional object of posterior predictive samples for the latent occupancy values with dimensions corresponding to posterior predictive sample, site, and primary time period.
w.0.samples: a three-dimensional array of posterior predictive samples for the spatial random effects, with dimensions corresponding to MCMC iteration, coefficient, and site.

When type = 'detection', the list consists of:

p.0.samples: a three-dimensional object of posterior predictive samples for the detection probability values with dimensions corresponding to posterior predictive sample, site, and primary time period.

The return object will include additional objects used for standard extractor functions.
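
The arrays in the returned list are usually summarized across the first dimension (posterior predictive sample). The following base R sketch assumes an array with the documented layout for psi.0.samples. The array is filled with simulated values here rather than output from predict, and the summary object names (psi.0.mean, psi.0.ci) are illustrative.

```r
set.seed(7)
n.post <- 200   # posterior predictive samples
J.0 <- 25       # prediction sites
n.t <- 3        # primary time periods predicted

# Simulated stand-in for out.pred$psi.0.samples (sample x site x time period).
psi.0.samples <- array(rbeta(n.post * J.0 * n.t, 2, 2),
                       dim = c(n.post, J.0, n.t))

# Posterior mean for each site and primary time period.
psi.0.mean <- apply(psi.0.samples, c(2, 3), mean)

# 95% credible interval bounds; the first dimension holds the two quantiles.
psi.0.ci <- apply(psi.0.samples, c(2, 3), quantile, probs = c(0.025, 0.975))

dim(psi.0.mean)
dim(psi.0.ci)
```

The same pattern applies to z.0.samples and p.0.samples, since they share the sample, site, and primary time period ordering.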

Examples

set.seed(500)
# Sites
J.x <- 10
J.y <- 10
J <- J.x * J.y
# Primary time periods
n.time <- sample(10, J, replace = TRUE)
n.time.max <- max(n.time)
# Replicates
n.rep <- matrix(NA, J, max(n.time))
for (j in 1:J) {
  n.rep[j, 1:n.time[j]] <- sample(1:4, n.time[j], replace = TRUE)
}
# Occurrence --------------------------
beta <- c(0.4, 0.5, -0.9)
trend <- TRUE 
sp.only <- 0
psi.RE <- list()
# Detection ---------------------------
alpha <- c(-1, 0.7, -0.5)
p.RE <- list()
# Spatial -----------------------------
svc.cols <- c(1, 2)
p.svc <- length(svc.cols)
sp <- TRUE
cov.model <- "exponential"
sigma.sq <- runif(p.svc, 0.1, 1)
phi <- runif(p.svc, 3 / .9, 3 / .1)

# Get all the data
dat <- simTOcc(J.x = J.x, J.y = J.y, n.time = n.time, n.rep = n.rep, 
               beta = beta, alpha = alpha, sp.only = sp.only, trend = trend, 
               psi.RE = psi.RE, p.RE = p.RE, sp = TRUE, sigma.sq = sigma.sq, 
               phi = phi, cov.model = cov.model, ar1 = FALSE, svc.cols = svc.cols)

# Subset data for prediction
pred.indx <- sample(1:J, round(J * .25), replace = FALSE)
y <- dat$y[-pred.indx, , , drop = FALSE]
# Occupancy covariates
X <- dat$X[-pred.indx, , , drop = FALSE]
# Prediction covariates
X.0 <- dat$X[pred.indx, , , drop = FALSE]
# Detection covariates
X.p <- dat$X.p[-pred.indx, , , , drop = FALSE]
psi.0 <- dat$psi[pred.indx, ]
# Coordinates
coords <- dat$coords[-pred.indx, ]
coords.0 <- dat$coords[pred.indx, ]

# Package all data into a list
# Occurrence
occ.covs <- list(int = X[, , 1], 
                 trend = X[, , 2], 
                 occ.cov.1 = X[, , 3]) 
# Detection
det.covs <- list(det.cov.1 = X.p[, , , 2], 
                 det.cov.2 = X.p[, , , 3]) 
# Data list bundle
data.list <- list(y = y, 
                  occ.covs = occ.covs,
                  det.covs = det.covs, 
                  coords = coords) 
# Priors
prior.list <- list(beta.normal = list(mean = 0, var = 2.72), 
                   alpha.normal = list(mean = 0, var = 2.72), 
                   sigma.sq.ig = list(a = 2, b = 0.5), 
                   phi.unif = list(a = 3 / 1, b = 3 / 0.1))

# Initial values
z.init <- apply(y, c(1, 2), function(a) as.numeric(sum(a, na.rm = TRUE) > 0))
inits.list <- list(beta = 0, alpha = 0, z = z.init, phi = 3 / .5, sigma.sq = 2, 
                   w = rep(0, J))
# Tuning
tuning.list <- list(phi = 1)
# Number of batches
n.batch <- 10
# Batch length
batch.length <- 25
n.iter <- n.batch * batch.length

# Run the model
# Note that this is just a test case and more iterations/chains may need to 
# be run to ensure convergence.
out <- svcTPGOcc(occ.formula = ~ trend + occ.cov.1, 
               det.formula = ~ det.cov.1 + det.cov.2, 
               data = data.list, 
               inits = inits.list, 
               n.batch = n.batch, 
               batch.length = batch.length, 
               priors = prior.list,
               cov.model = "exponential", 
               svc.cols = svc.cols, 
               tuning = tuning.list, 
               NNGP = TRUE, 
               ar1 = FALSE,
               n.neighbors = 5, 
               search.type = 'cb', 
               n.report = 10, 
               n.burn = 50, 
               n.chains = 1)
#> ----------------------------------------
#> 	Preparing the data
#> ----------------------------------------
#> ----------------------------------------
#> 	Building the neighbor list
#> ----------------------------------------
#> ----------------------------------------
#> Building the neighbors of neighbors list
#> ----------------------------------------
#> ----------------------------------------
#> 	Model description
#> ----------------------------------------
#> Spatial NNGP Multi-season Occupancy Model with Polya-Gamma latent
#> variable fit with 75 sites and 10 years.
#> 
#> Samples per chain: 250 (10 batches of length 25)
#> Burn-in: 50 
#> Thinning Rate: 1 
#> Number of Chains: 1 
#> Total Posterior Samples: 200 
#> 
#> Number of spatially-varying coefficients: 2 
#> Using the exponential spatial correlation model.
#> 
#> Using 5 nearest neighbors.
#> 
#> Source compiled with OpenMP support and model fit using 1 thread(s).
#> 
#> Adaptive Metropolis with target acceptance rate: 43.0
#> ----------------------------------------
#> 	Chain 1
#> ----------------------------------------
#> Sampling ... 
#> Batch: 10 of 10, 100.00%

summary(out)
#> 
#> Call:
#> svcTPGOcc(occ.formula = ~trend + occ.cov.1, det.formula = ~det.cov.1 + 
#>     det.cov.2, data = data.list, inits = inits.list, priors = prior.list, 
#>     tuning = tuning.list, svc.cols = svc.cols, cov.model = "exponential", 
#>     NNGP = TRUE, n.neighbors = 5, search.type = "cb", n.batch = n.batch, 
#>     batch.length = batch.length, ar1 = FALSE, n.report = 10, 
#>     n.burn = 50, n.chains = 1)
#> 
#> Samples per Chain: 250
#> Burn-in: 50
#> Thinning Rate: 1
#> Number of Chains: 1
#> Total Posterior Samples: 200
#> Run Time (min): 0.0037
#> 
#> Occurrence (logit scale): 
#>                Mean     SD    2.5%     50%   97.5% Rhat ESS
#> (Intercept)  0.7991 0.3761  0.0888  0.8177  1.3866   NA  10
#> trend        0.6101 0.3827 -0.2235  0.6204  1.2300   NA   9
#> occ.cov.1   -0.5080 0.1980 -0.8394 -0.5308 -0.0771   NA  25
#> 
#> Detection (logit scale): 
#>                Mean     SD    2.5%     50%   97.5% Rhat ESS
#> (Intercept) -1.1338 0.1524 -1.4425 -1.1332 -0.8509   NA  22
#> det.cov.1    0.8275 0.1214  0.5782  0.8295  1.0807   NA  47
#> det.cov.2   -0.4623 0.1003 -0.6428 -0.4566 -0.2569   NA 128
#> 
#> Spatial Covariance: 
#>                         Mean     SD   2.5%     50%   97.5% Rhat ESS
#> sigma.sq-(Intercept)  0.5876 0.3084 0.1528  0.5616  1.2891   NA   7
#> sigma.sq-trend        0.5665 0.5132 0.1591  0.3271  1.9043   NA   4
#> phi-(Intercept)      15.3733 7.9892 4.3202 14.6523 28.9483   NA   9
#> phi-trend            15.9472 8.8039 3.1561 18.7010 27.4173   NA   6

# Predict at new sites across all n.max.years
# Take a look at array of covariates for prediction
str(X.0)
#>  num [1:25, 1:10, 1:3] 1 1 1 1 1 1 1 1 1 1 ...
# Subset to only grab time periods 1, 2, and 5
t.cols <- c(1, 2, 5)
X.pred <- X.0[, t.cols, ]
out.pred <- predict(out, X.0, coords.0, t.cols = t.cols, type = 'occupancy')
#> ----------------------------------------
#> 	Prediction description
#> ----------------------------------------
#> Spatial NNGP Multi-season Occupancy model with Polya-Gamma latent
#> variable fit with 75 observations and 10 years.
#> 
#> Number of fixed covariates 3 (including intercept if specified).
#> 
#> Using the exponential spatial correlation model.
#> 
#> Using 5 nearest neighbors.
#> 
#> Number of MCMC samples 200.
#> 
#> Predicting at 25 non-sampled locations.
#> 
#> 
#> Source compiled with OpenMP support and model fit using 1 threads.
#> -------------------------------------------------
#> 		Predicting
#> -------------------------------------------------
#> Location: 25 of 25, 100.00%
#> Generating latent occupancy state
str(out.pred)
#> List of 6
#>  $ z.0.samples  : num [1:200, 1:25, 1:10] 1 1 0 0 1 1 1 1 1 1 ...
#>  $ w.0.samples  : num [1:200, 1:2, 1:25] 0.0274 -0.2407 0.0254 -0.3488 -0.5979 ...
#>  $ psi.0.samples: num [1:200, 1:25, 1:10] 0.734 0.816 0.624 0.478 0.504 ...
#>  $ run.time     : 'proc_time' Named num [1:5] 0.037 0.091 0.022 0 0
#>   ..- attr(*, "names")= chr [1:5] "user.self" "sys.self" "elapsed" "user.child" ...
#>  $ call         : language predict.svcTPGOcc(object = out, X.0 = X.0, coords.0 = coords.0, t.cols = t.cols,      type = "occupancy")
#>  $ object.class : chr "svcTPGOcc"
#>  - attr(*, "class")= chr "predict.svcTPGOcc"