Combination of Buffer and pixel sizes, plus fake points

Index Moon and Sleeping sites RSFs

library(tidyverse)
library(terra)
library(suncalc)
library(purrr)
library(hexbin)
library(viridis)
library(geosphere)
library(scales)
library(spdep)

moon <- readRDS("moon_base.rds")

0.1 Environmental Covariate Data Preparation

We prepared a set of high-resolution spatial datasets to characterize environmental conditions across all roost sites and their surroundings for subsequent analyses. Canopy height and its associated uncertainty were extracted from the 2020 ETH Global Canopy Height product at 10-m resolution (ETH Global Canopy Height 2020, S27W057; see Lang et al., 2022). These raster layers provide both mean canopy height and the associated standard deviation (canopy structural variability) for each grid cell.

Vegetation productivity was quantified via the Normalized Difference Vegetation Index (NDVI), derived from Sentinel-2 Level-2A imagery for representative dates of each season (winter, autumn, and summer). For each season, we used the red (B04) and near-infrared (B08) bands at 10-m spatial resolution, computing NDVI as (B08 - B04) / (B08 + B04). This resulted in a seasonal NDVI layer for each period, ensuring environmental covariates matched the phenological state of the forest at the time of field observations.

Weather data were obtained from local meteorological records (“clima.csv”), including daily total precipitation (pp), maximum temperature (t_max), and minimum temperature (t_min), and were later merged with the main dataset by date.

These data sources were loaded into R as raster or tabular objects and provided the basis for extracting, by spatial overlay or table join, the corresponding environmental values at each roost or pseudo-absence location for every night included in the analysis.

Canopy Height and Error

canopy <- rast("ETH_GlobalCanopyHeight_10m_2020_S27W057_Map.tif")
canopy_sd <- rast("ETH_GlobalCanopyHeight_10m_2020_S27W057_Map_SD.tif")

NDVI, Sentinel T21JYM_20220709T133839

b04_winter <- rast("T21JYM_20220709T133839_B04_10m.jp2")
b08_winter <- rast("T21JYM_20220709T133839_B08_10m.jp2")
ndvi_winter <- (b08_winter - b04_winter) / (b08_winter + b04_winter)

NDVI, Sentinel 20160317T134032

b04_aut <- rast("T21JYM_20160317T134032_B04_10m.jp2")
b08_aut <- rast("T21JYM_20160317T134032_B08_10m.jp2")
ndvi_aut <- (b08_aut - b04_aut) / (b08_aut + b04_aut)

NDVI , Sentinel 20221201T133841

b04_sum <- rast("T21JYM_20221201T133841_B04_10m.jp2")
b08_sum <- rast("T21JYM_20221201T133841_B08_10m.jp2")
ndvi_sum <- (b08_sum - b04_sum) / (b08_sum + b04_sum)

Weather information

weather <- read.csv("clima.csv")

1 Systematic Generation of Random Points Combinations for Sleeping site Analysis

This script implements a systematic approach to generate all combinations of buffer size and number of random (pseudo-absence) points around each roost (“used”) site, with the goal of optimizing the design of resource selection analyses. For each possible combination of number of random points (n_fake) and buffer radius, the following steps are executed:

Random point generation: For each observed roost site and night, n_fake random points are generated at random bearings (0–360°) and random distances (uniformly distributed between a minimum and maximum buffer distance) from the observed roost location. A minimum distance (min_dist) is enforced between all random points generated for the same night to reduce spatial clustering.
Data merging: All “used” (real) points and the newly generated “available” (random) points are combined into a single dataset, with all points from the same group and night assigned a unique event identifier.
Parameter tracking: Each point is annotated with the buffer size and number of random points used for its generation, enabling downstream comparisons across design choices.
List storage: The resulting dataset for each combination is stored as a separate element in a list for easy iteration and analysis.

This approach is fully automated, enabling rapid testing of all pairwise combinations of six candidate numbers of random points and five buffer sizes (30 total combinations).

1.1 Rationale

We systematically explored all parameter combinations in order to balance two methodological goals:
(i) minimizing spatial autocorrelation between “used” and “available” points (which can arise if random points are placed too close together or too close to the used location), and
(ii) maximizing stability and robustness of estimated coefficients in RSF (Resource Selection Function) models, given the inherent stochasticity in random point generation.

By storing the full set of possible design permutations, we were able to quantify how variation in random point number and buffer size affects both spatial structure in the data and model outcomes, and to select optimal parameter values for final inference. This design ensures transparent, reproducible, and robust ecological modeling.

n_fakes <- c(1, 2, 5, 10, 15, 20)
buffers <- c(100, 200, 300, 400, 500)
min_dist <- 30

combos <- expand.grid(n_fake = n_fakes, buffer = buffers)

set.seed(42)

all_dfs <- list()

for (i in 1:nrow(combos)) {
  n_fake <- combos$n_fake[i]
  buffer <- combos$buffer[i]
  message("Combinando: ", n_fake, " puntos, buffer: ", buffer)
  
  # 1. Puntos FALSOS
  pseudo_data <- moon %>%
    rowwise() %>%
    do({
      res <- generate_pseudo_points(
        lat = .$lat, lon = .$lon, 
        n_fake = n_fake, buffer = buffer, min_dist = min_dist,
        fecha = .$fecha,
        group = .$group,
        moon_fraction = .$moon_fraction,
        moon_phase_angle = .$moon_phase_angle,
        moon_phase_name = .$moon_phase_name,
        moon_visible = .$moon_visible,
        used = FALSE
      )
      if (is.null(res)) tibble(
        lon = numeric(0), lat = numeric(0), 
        fecha = character(0), group = character(0),
        moon_fraction = numeric(0), moon_phase_angle = numeric(0),
        moon_phase_name = character(0), moon_visible = logical(0),
        used = logical(0)
      )
      else res
    }) %>%
    ungroup() %>%
    mutate(n_fake = n_fake, buffer = buffer)
  
  moon_used <- moon %>%
    mutate(used = TRUE,
           n_fake = n_fake,
           buffer = buffer) %>%
    select(lon, lat, fecha, group, moon_fraction, moon_phase_angle,
           moon_phase_name, moon_visible, used, n_fake, buffer)
  
  moon_all <- bind_rows(moon_used, pseudo_data)
  
  moon_all <- moon_all %>%
    mutate(event_key = paste(group, fecha, sep = "_")) %>%
    group_by(event_key) %>%
    mutate(id = cur_group_id()) %>%
    ungroup() %>%
    select(-event_key)
  
  # 5. Guardar en lista
  all_dfs[[paste0("n", n_fake, "_b", buffer)]] <- moon_all
}

2 Lunar Light and Canopy Covariate Extraction Functions

We defined dedicated R functions to systematically annotate each point in our datasets with key environmental covariates required for our analyses:

2.1 Function: `agregar_moon_luz()`

This function calculates temporally- and spatially-explicit lunar illumination indices for each point. For every location, it:

Computes three timestamps for the same night: the previous evening (20:00), local midnight (00:00), and early morning (04:00), relative to the recorded date.
Uses these timestamps and the point’s coordinates to determine the altitude of the moon above the horizon at each time, via the getMoonPosition() function.
For each time, calculates the scaled altitude as the sine of the moon’s altitude (with values below zero set to zero, reflecting the moon being below the horizon).
Computes an average scaled moon altitude for the night, and multiplies this by the moon’s illuminated fraction to yield an “effective moonlight” index for each site-night.
Returns the original data frame with two additional columns: average moon altitude and effective moonlight index.

These steps ensure that the model incorporates realistic, time-specific measures of nocturnal lunar illumination at each site, which is ecologically relevant for species with night-time activity.

2.2 Function: `agregar_canopy()`

This function extracts canopy structure variables for each point, by:

Converting each point’s coordinates into a spatial vector with the appropriate coordinate reference system (CRS) for raster extraction.
Extracting the mean canopy height and the associated standard deviation (as a measure of canopy structural variability) from high-resolution raster layers at each point’s location.
Appending these values as new columns to the input data frame.

By using these functions in sequence, we consistently enrich each observation with the relevant spatially- and temporally-matched environmental predictors needed for robust resource selection modeling.

agregar_moon_luz <- function(df) {
  df <- df %>%
    mutate(
      fecha_a = as.POSIXct(paste(as.Date(fecha) - 1, "20:00:00"), tz = "UTC"),
      fecha_b = as.POSIXct(paste(as.Date(fecha), "00:00:00"), tz = "UTC"),
      fecha_c = as.POSIXct(paste(as.Date(fecha), "04:00:00"), tz = "UTC")
    )

  pos_a <- getMoonPosition(data = df %>% rename(date = fecha_a) %>% select(date, lat, lon))
  pos_b <- getMoonPosition(data = df %>% rename(date = fecha_b) %>% select(date, lat, lon))
  pos_c <- getMoonPosition(data = df %>% rename(date = fecha_c) %>% select(date, lat, lon))

  alt_scaled_a <- pmax(0, sin(pos_a$altitude))
  alt_scaled_b <- pmax(0, sin(pos_b$altitude))
  alt_scaled_c <- pmax(0, sin(pos_c$altitude))

  df <- df %>%
    mutate(
      moon_altitude_avg = (alt_scaled_a + alt_scaled_b + alt_scaled_c) / 3,
      moon_light_effective_avg = moon_fraction * moon_altitude_avg
    )
  return(df)
}

agregar_canopy <- function(df, canopy, canopy_sd) {
  moon_vect <- vect(df, geom = c("lon", "lat"), crs = crs(canopy))
  df$canopy_height <- extract(canopy, moon_vect)[,2]
  df$canopy_height_sd <- extract(canopy_sd, moon_vect)[,2]
  return(df)
}

all_dfs_proc <- map(
  all_dfs,
  ~ .x %>%
    agregar_moon_luz() %>%
    agregar_canopy(canopy, canopy_sd))

Checking everything is fine

head(all_dfs_proc[["n5_b200"]])

Lets see this visually

comb_names <- c(
  "n1_b100",   
  "n2_b200",
  "n5_b300",
  "n10_b400",
  "n15_b500",
  "n20_b500")

plot_df <- purrr::map_dfr(
  comb_names,
  ~ dplyr::mutate(all_dfs_proc[[.x]], combination = .x)
)
plot_df$used <- as.factor(plot_df$used)

ggplot(plot_df, aes(x = moon_fraction, y = canopy_height)) +
  geom_jitter(
    aes(fill = used),
    shape = 21, size = 3, stroke = 0.3, 
    alpha = 0.05, color = "black",
    width = 0.03,
    show.legend = TRUE
  ) +
  geom_smooth(
    aes(color = used),
    method = "lm",
    se = TRUE,
    size = 1.9
  ) +
  scale_fill_manual(
    values = c("TRUE" = "#E1B000", "FALSE" = "#1F3552"),
    labels = c("Available", "Used"),
    name = "Points"
  ) +
  scale_color_manual(
    values = c("TRUE" = "#E1B000", "FALSE" = "#1F3552"),
    labels = c("Available", "Used"),
    name = "Trend"
  ) +
  guides(
    fill = guide_legend(override.aes = list(alpha = 1)),
    color = guide_legend(override.aes = list(size = 1.5))
  ) +
  labs(
    x = "Moon Fraction",
    y = "Canopy Height (m)"
  ) +
  facet_wrap(~ combination, nrow = 2) +
  theme_classic(base_size = 14) +
  theme(
    plot.title = element_text(size = 15, face = "bold"),
    axis.title = element_text(),
    legend.title = element_text(face = "bold"),
    legend.position = "right"
  )

### Function: agregar_ndvi()

This function assigns seasonally-matched vegetation productivity values to each point based on NDVI (Normalized Difference Vegetation Index) calculated from Sentinel-2 imagery. For each observation, the function:

Determines the corresponding meteorological season (summer, autumn, winter, or spring) for the date of the observation.
For each season, selects the appropriate NDVI raster layer (derived from the red and near-infrared bands of Sentinel-2 at 10 m resolution).
Converts the point coordinates to the required coordinate reference system and projects them if necessary to match the NDVI raster.
Extracts the NDVI value from the raster at each point’s location, and assigns it to a new column (ndvi_sentinel) in the data frame, ensuring that each observation receives the NDVI value that reflects the correct phenological context.
Returns the updated data frame with both the assigned season and its matching NDVI value.

By doing so, this function ensures that each “used” or “available” location is accurately characterized in terms of contemporaneous vegetation productivity, a key predictor in ecological and resource selection analyses.

agregar_ndvi <- function(df, ndvi_winter, ndvi_aut, ndvi_sum) {
  df <- df %>%
    mutate(season = case_when(
      month(fecha) %in% c(12, 1, 2)  ~ "summer",
      month(fecha) %in% c(3, 4, 5)   ~ "autumn",
      month(fecha) %in% c(6, 7, 8)   ~ "winter",
      month(fecha) %in% c(9, 10, 11) ~ "spring"
    ))
  
  names(ndvi_winter) <- "ndvi"
  names(ndvi_aut)    <- "ndvi"
  names(ndvi_sum)    <- "ndvi"
  
  df$ndvi_sentinel <- NA_real_
  
  for (season_name in c("winter", "autumn", "spring", "summer")) {
    idx <- which(df$season == season_name)
    if(length(idx) == 0) next
    
    ndvi_raster <- switch(
      season_name,
      "winter" = ndvi_winter,
      "autumn" = ndvi_aut,
      "spring" = ndvi_sum,
      "summer" = ndvi_sum
    )
    moon_sub <- df[idx, ]
    
    vect_sub <- vect(moon_sub, geom = c("lon", "lat"), crs = "EPSG:4326") %>%
      project(crs(ndvi_raster))
    
    ndvi_vals <- extract(ndvi_raster, vect_sub)[, 2]
    df$ndvi_sentinel[idx] <- ndvi_vals
  }
  return(df)
}

all_dfs_proc <- map(
  all_dfs_proc,
  ~ agregar_ndvi(.x, ndvi_winter, ndvi_aut, ndvi_sum)
)

2.3 Function: `add_weather_vars()`

This function appends daily weather variables—precipitation (pp), maximum temperature (t_max), and minimum temperature (t_min), to each point by joining on the observation date.
It ensures that each “used” and “available” point is annotated with the relevant weather conditions for its respective night.

add_weather_vars <- function(df, weather) {
  df <- df %>%
    left_join(weather %>% select(fecha, pp),    by = "fecha") %>%
    left_join(weather %>% select(fecha, t_max), by = "fecha") %>%
    left_join(weather %>% select(fecha, t_min), by = "fecha")
  return(df)
}

all_dfs_proc <- map(
  all_dfs_proc,
  ~ add_weather_vars(.x, weather)
)

comb_names <- c("n1_b100", "n2_b200", "n5_b300", "n10_b400", "n15_b500", "n20_b500")

plot_df <- map_dfr(
  comb_names,
  ~ all_dfs_proc[[.x]] %>%
      mutate(ndvi_cat = ntile(ndvi_sentinel, 3),
             combination = .x)
)
plot_df$used <- as.factor(plot_df$used)

ggplot(plot_df, aes(x = moon_light_effective_avg, y = canopy_height)) +
  geom_jitter(
    aes(fill = used),
    shape = 21, size = 3, stroke = 0.3,
    alpha = 0.05, color = "black",
    width = 0.03,
    show.legend = TRUE
  ) +
  geom_smooth(
    aes(color = used),
    method = "lm",
    se = TRUE,
    size = 1.9
  ) +
  facet_grid(combination ~ ndvi_cat, labeller = label_both) +
  scale_fill_manual(
    values = c("TRUE" = "#E1B000", "FALSE" = "#1F3552"),
    labels = c("Available", "Used"),
    name = "Points"
  ) +
  scale_color_manual(
    values = c("TRUE" = "#E1B000", "FALSE" = "#1F3552"),
    labels = c("Available", "Used"),
    name = "Trend"
  ) +
  labs(
    title = "Moonlight vs. Canopy Height across NDVI Tiles\nby Combo (n_fake/buffer)",
    x = "Effective Moonlight Index",
    y = "Canopy Height (m)"
  ) +
  theme_classic(base_size = 14) +
  theme(
    plot.title = element_text(size = 15, face = "bold"),
    axis.title = element_text(),
    legend.title = element_text(face = "bold"),
    legend.position = "right"
  )

3 Spatial Autocorrelation

3.1 Function: `calc_moran()` and Moran’s I Analysis Pipeline

The function calc_moran() quantifies spatial autocorrelation in canopy height across all locations within a given dataset using Moran’s I statistic. Specifically:

It constructs a spatial neighborhood structure for all points based on geographic coordinates (longitude and latitude), using either a distance-based approach (all points within 200 meters) or, if necessary, a k-nearest neighbors fallback.
It builds spatial weights and applies Moran’s I test (moran.test) to assess the degree of spatial clustering in canopy height values.
The function returns a data frame containing the estimated Moran’s I value and its associated p-value.

This function is applied iteratively across all combinations of randomization scenarios (all_dfs_proc), automatically extracting the number of random points (n_fake) and buffer size from each dataset’s identifier. The results are consolidated in a single summary table (moran_table), allowing comparison of spatial autocorrelation across all parameter combinations.

3.2 Spatial Autocorrelation Analysis Using Moran’s I

For each combination of pseudo-absence point number and buffer size, we quantified the spatial autocorrelation of canopy height values using Moran’s I statistic. The analysis proceeds as follows:

For each dataset (corresponding to a unique combination of random point number and buffer size), all point coordinates (longitude and latitude) and their associated canopy height values are extracted.
A spatial neighborhood structure is constructed for all points: for each point, neighbors are defined as all other points within a fixed radius (here, 200 meters), with a k-nearest neighbor fallback for isolated points.
A spatial weights matrix is created, and Moran’s I is calculated using these weights, quantifying the degree to which canopy height values are spatially clustered (autocorrelated) across all points in the dataset.
The results, including Moran’s I and its associated p-value, are tabulated alongside the specific combination of random point number and buffer size.

Interpretation:
- A high, positive Moran’s I value indicates that points with similar canopy height tend to be spatially clustered (high autocorrelation). - A value near zero suggests little to no spatial structure (randomness). - A negative value implies spatial dispersion (points with dissimilar values are close together).

Purpose in the study:
This analysis allows us to systematically assess how different randomization designs (i.e., number of pseudo-absence points and buffer size) affect the spatial autocorrelation present in our environmental covariate data. By minimizing spatial autocorrelation in the generated points, we reduce the risk of statistical biases and improve the robustness of subsequent resource selection analyses (RSF). The Moran’s I table provides an empirical basis for selecting optimal parameter values for generating pseudo-absence points.

df <- all_dfs_proc[[1]]
sum(is.na(df$lon) | is.na(df$lat))
sum(is.na(df$canopy_height))

coords <- as.matrix(df[, c("lon", "lat")])
dlist <- dnearneigh(coords, 0, 0.01, longlat = TRUE)
table(card(dlist))

Lets make 20x20 pixels for height

canopy20 <- aggregate(canopy, fact = 2, fun = mean, na.rm = TRUE)
canopy20_sd <- aggregate(canopy_sd, fact = 2, fun = mean, na.rm = TRUE)

agregar_canopy20 <- function(df, canopy20, canopy20_sd) {
  require(terra)
  moon_vect <- vect(df, geom = c("lon", "lat"), crs = crs(canopy20))
  df$canopy_height20   <- extract(canopy20, moon_vect)[,2]
  df$canopy_height_sd20 <- extract(canopy20_sd, moon_vect)[,2]
  return(df)
}
all_dfs_proc <- map(
  all_dfs_proc,
  ~ agregar_canopy20(.x, canopy20, canopy20_sd)
)

calc_moran <- function(df) {
  df <- df[!is.na(df$canopy_height) & !is.na(df$lon) & !is.na(df$lat), ]
  if(nrow(df) < 10) return(data.frame(moran_i=NA, pval=NA))
  if(nrow(df) > 500) df <- df[sample(1:nrow(df), 500), ]
  coords <- as.matrix(df[, c("lon", "lat")])
  
  # Primer intento: 0.002 grados (~222 m)
  dlist <- dnearneigh(coords, 0, 0.002, longlat = TRUE)
  if(any(card(dlist)==0)) {
    # Segundo intento: 0.005 grados (~555 m)
    dlist <- dnearneigh(coords, 0, 0.005, longlat = TRUE)
  }
  if(any(card(dlist)==0)) {
    # Tercer intento: 0.01 grados (~1.1 km)
    dlist <- dnearneigh(coords, 0, 0.01, longlat = TRUE)
  }
  if(any(card(dlist)==0)) {
    dlist <- knn2nb(knearneigh(coords, k=8))
  }
  if(any(card(dlist)==0)) return(data.frame(moran_i=NA, pval=NA)) # si todavía hay puntos sin vecinos
  
  lw <- nb2listw(dlist, style = "W", zero.policy = TRUE)
  res <- moran.test(df$canopy_height, lw, zero.policy = TRUE)
  moran_i <- res$estimate[1]
  pval <- res$p.value
  data.frame(moran_i = as.numeric(moran_i), pval = as.numeric(pval))
}

library(purrr)

moran_table <- imap_dfr(
  all_dfs_proc,
  function(df, nm) {
    n_fake <- as.numeric(sub("n([0-9]+)_b[0-9]+", "\\1", nm))
    buffer <- as.numeric(sub("n[0-9]+_b([0-9]+)", "\\1", nm))
    moran_res <- tryCatch(calc_moran(df), error=function(e) data.frame(moran_i=NA, pval=NA))
    data.frame(
      combo = nm,
      n_fake = n_fake,
      buffer = buffer,
      moran_i = moran_res$moran_i,
      pval = moran_res$pval
    )
  }
)

ggplot() +
  geom_line(
    data = simu_table,
    aes(x = factor(n_fake), y = moran_i_simu, group = interaction(buffer, rep), color = factor(buffer)),
    size = 0.4, alpha = 0.11
  ) +
  geom_line(
    data = media_table,
    aes(x = factor(n_fake), y = moran_i, group = factor(buffer), color = factor(buffer)),
    size = 1.6, alpha = 1
  ) +
  scale_color_viridis_d(name = "Buffer (m)", option = "plasma", end = 0.85) +
  scale_y_continuous(limits = c(-1, 1), breaks = seq(-1, 1, 0.2)) +
  scale_x_discrete(
    limits = as.character(c(1, 2, 5, 10, 15, 20)),
    breaks = as.character(c(1, 2, 5, 10, 15, 20)),
    expand = c(0,0)) +
  labs(
    x = "Number of pseudo-absence points per sleeping sites",
    y = "Moran's I (Canopy Height)"
  ) +
  theme_classic(base_size = 16) +
  theme(
    plot.title = element_text(size = 15, hjust = 0),
    axis.title = element_text(),
    axis.text = element_text(size = 13),
    legend.position = "top",
    legend.title = element_text(),
    legend.key.height = unit(0.7, "lines"),
    legend.key.width  = unit(2.5, "lines"))

n_fake <- c(1, 2, 5, 10, 15, 20)
buffers <- c(100, 200, 300, 400, 500)
n_rep <- 100   # número de réplicas

combos <- expand.grid(n_fake = n_fake, buffer = buffers)

sens_results <- vector("list", nrow(combos))

for(i in 1:nrow(combos)) {
  n_pts <- combos$n_fake[i]
  buffer <- combos$buffer[i]
  message("Simulating for n_fake = ", n_pts, ", buffer = ", buffer)
  
  sens_results[[i]] <- map(1:n_rep, function(r) {
    df_sim <- genera_puntos_falsos(moon, n_fake = n_pts, buffer = buffer, min_dist = 30)

    moon_used <- moon %>% mutate(used = TRUE)
    moon_all <- bind_rows(moon_used, df_sim)
    
    n_used <- sum(moon_all$used == TRUE)
    n_avail <- sum(moon_all$used == FALSE)
    moon_all <- moon_all %>%
      mutate(w = ifelse(used, 1, n_used / n_avail))
    
    fit <- glm(used ~ moon_light_effective_avg * canopy_height,
               data = moon_all,
               weights = w,
               family = binomial(link = "logit"))
    
    coefs <- tidy(fit) %>%
      filter(term %in% c("moon_light_effective_avg", "canopy_height", 
                         "moon_light_effective_avg:canopy_height"))
    coefs$n_fake <- n_pts
    coefs$buffer <- buffer
    coefs$rep <- r
    coefs
  }) %>% bind_rows()
}

all_sens <- bind_rows(sens_results)

summ_sens <- all_sens %>%
  group_by(n_fake, buffer, term) %>%
  summarise(
    mean_est = mean(estimate, na.rm=TRUE),
    sd_est = sd(estimate, na.rm=TRUE),
    range_est = max(estimate, na.rm=TRUE) - min(estimate, na.rm=TRUE),
    mean_abs_diff = mean(abs(diff(sort(estimate))), na.rm=TRUE),
    .groups = "drop"
  )

4 Sensitivity Analysis of RSF Coefficients

To properly account for the inherent randomness involved in generating available (pseudo-absence) points for our RSF models, we performed a sensitivity analysis inspired by the approach of Fieberg et al. (2021).

For each combination of buffer size and number of pseudo-absence points per roost, we repeated the random generation process multiple times (e.g., 100 replicates per scenario). For every replicate, we fitted a resource selection function (RSF) model that included the interaction between moonlight availability and canopy height.

We then extracted the coefficients of interest from each model fit and summarized their variability across replicates. This procedure allows us to directly quantify the uncertainty and stability of our RSF estimates, ensuring that our main conclusions are not driven by a particular random realization of available points.

This sensitivity analysis is critical because the spatial configuration of available points can influence both the magnitude and the significance of RSF coefficients. By systematically evaluating the spread and mean differences of coefficients across all randomizations, we can identify which parameter settings (buffer size and number of pseudo-absence points) yield the most robust and reliable inference.

Finally, the results of this sensitivity analysis will be integrated with our assessment of spatial autocorrelation (e.g., Moran’s I) to select the optimal pseudo-absence design for robust and unbiased ecological inference.

5 Optimal buffer and pseudo-absences

Balancing Buffer Size, Number of Pseudo-Absences, Sensitivity, and Spatial Autocorrelation

Let \(b\) denote the buffer size (in meters) and \(n\) the number of pseudo-absence points generated around each used point.

5.1 1. Sensitivity of RSF Estimators

For each \((b, n)\) combination, we repeat the random sampling process \(R\) times and fit a resource selection function (RSF) in each iteration, yielding \(R\) estimated coefficients \(\hat{\beta}_{b,n}^{(r)}\) for a parameter of interest (e.g., interaction between moonlight and canopy height):

\[ \hat{\beta}_{b,n}^{(r)}, \quad r = 1, 2, \ldots, R \]

We then quantify the sensitivity (variability) of the estimator as:

Sample variance: \[ \mathrm{Var}_{b,n} = \frac{1}{R-1} \sum_{r=1}^R \left(\hat{\beta}_{b,n}^{(r)} - \overline{\beta}_{b,n}\right)^2 \]
Mean absolute difference (MAD): \[ \mathrm{MAD}_{b,n} = \frac{1}{R(R-1)} \sum_{i \neq j} |\hat{\beta}_{b,n}^{(i)} - \hat{\beta}_{b,n}^{(j)}| \]

Where \(\overline{\beta}_{b,n}\) is the mean coefficient estimate for that combination.

5.2 2. Spatial Autocorrelation (Moran’s I)

For each combination, we calculate the spatial autocorrelation of a habitat covariate (e.g., canopy height) using Moran’s I statistic:

\[ I_{b,n} = \frac{N}{W} \cdot \frac{ \sum_{i=1}^{N} \sum_{j=1}^{N} w_{ij} (x_i - \overline{x})(x_j - \overline{x}) } { \sum_{i=1}^{N} (x_i - \overline{x})^2 } \]

where \(x_i\) is the covariate value at location \(i\), \(\overline{x}\) is the mean value, \(w_{ij}\) is the spatial weight between \(i\) and \(j\), \(N\) is the total number of points, and \(W = \sum_{i,j} w_{ij}\).

5.3 3. Objective: Joint Optimization

Our goal is to select \((b^*, n^*)\) such that:

The sensitivity of the estimators is minimized (i.e., low \(\mathrm{Var}_{b,n}\) or \(\mathrm{MAD}_{b,n}\))
The spatial autocorrelation is minimized (i.e., \(I_{b,n} \approx 0\) and not significant)

Formally,

\[ (b^*, n^*) = \arg\min_{b, n} \left\{ f\left( \mathrm{Var}_{b,n}, |I_{b,n}| \right) \right\} \]

where \(f(\cdot)\) is a function that balances estimator variability and spatial autocorrelation.

This joint optimization ensures that our RSF inferences are both stable to the random generation of available points and free from spatial artifacts.

ggplot(sim_df, aes(x = factor(n_fake), y = estimate)) +
  geom_boxplot(
    aes(fill = mad_gray, group = n_fake),
    color = "black", outlier.shape = NA, width = 0.75, size = 0.5
  ) +
  geom_jitter(
    width = 0.13, size = 3, alpha = 0.14, color = "black"
  ) +
  facet_grid(. ~ buffer, labeller = label_both) +
  scale_fill_gradient(
    low = "white", high = "black", name = "Mean Diff."
  ) +
  labs(
    x = "Pseudo-absence Points per Location",
    y = "Interaction Estimate (Moon x Canopy Height)"
  ) +
  theme_classic(base_size = 18) +
  theme(
    strip.background = element_blank(),
    strip.text.x = element_text(size = 19),
    axis.title = element_text(),
    legend.position = "bottom",
    legend.key.height = unit(1.3, "lines"),   
    legend.key.width  = unit(3.5, "lines"),  
    axis.text.x = element_text(hjust = 1))

Distribution of RSF interaction estimator values (moonlight × canopy height) for different numbers of pseudo-absence points and buffer sizes. Each boxplot represents the distribution of estimator values across simulations. The fill gradient indicates the mean absolute difference (mean diff.) in estimator values, with darker colors corresponding to lower variability (greater estimator stability).

sim_summary <- mad_table %>%   
  rename(mean_diff = mean_diff) %>%
  mutate(n_fake = as.numeric(as.character(n_fake)))

moran_summary <- media_table %>%   
  select(buffer, n_fake, moran_i) %>%
  mutate(n_fake = as.numeric(as.character(n_fake)))

balance_df <- left_join(sim_summary, moran_summary, by = c("buffer", "n_fake")) %>%
  group_by(buffer) %>%
  mutate(mean_diff_rescaled = scales::rescale(mean_diff, to = range(moran_i, na.rm = TRUE))) %>%
  ungroup()

balance_long <- balance_df %>%
  pivot_longer(
    cols = c(mean_diff_rescaled, moran_i),
    names_to = "metric",
    values_to = "value"
  ) %>%
  mutate(metric = recode(metric,
                         mean_diff_rescaled = "Estimator Variability",
                         moran_i = "Moran's I"))

highlight_300 <- balance_df %>%
  filter(buffer == 300, n_fake %in% c(10, 15)) %>%
  pivot_longer(
    cols = c(mean_diff_rescaled, moran_i),
    names_to = "metric",
    values_to = "value"
  ) %>%
  mutate(metric = recode(metric,
                         mean_diff_rescaled = "Estimator Variability",
                         moran_i = "Moran's I"))

ggplot(balance_long, aes(x = n_fake, y = value, group = metric, color = metric)) +
  geom_line(size = 1.4) +
  geom_point(size = 2) +
  facet_wrap(~ buffer, labeller = label_both) +
  scale_color_manual(
    name = "",
    values = c("Estimator Variability" = "black", "Moran's I" = viridis(2, end = 0.85)[2])
  ) +
  geom_point(
    data = highlight_300,
    aes(x = n_fake, y = value),
    color = "red", size = 7, shape = 3, inherit.aes = FALSE
  ) +
  labs(
    x = "Pseudo-absence Points per Location",
    y = "Value (rescaled)"
  ) +
  theme_classic(base_size = 16) +
  theme(
    legend.position = "top",
    legend.title = element_text(face = "bold"),
    axis.title = element_text(face = "bold"),
    axis.text.x = element_text(angle = 45, hjust = 1),
    strip.text = element_text(size = 14, face = "bold")
  )

Balance between estimator variability and spatial autocorrelation (Moran’s I) for different numbers of pseudo-absence points (n_fake) and buffer sizes. The black line shows the variability of the RSF interaction estimator (moonlight × canopy height) across simulations, while the colored line shows the Moran’s I for canopy height. Red stars highlight the two best-performing combinations (buffer = 300 m and n_fake = 10 and 15), which represent optimal trade-offs between estimator stability and low spatial autocorrelation.