library(tidyverse)
library(terra)
library(suncalc)
library(purrr)
library(hexbin)
library(viridis)
library(geosphere)
library(scales)
library(spdep)
We prepared a set of high-resolution spatial datasets to characterize environmental conditions across all roost sites and their surroundings for subsequent analyses. Canopy height and its associated uncertainty were extracted from the 2020 ETH Global Canopy Height product at 10-m resolution (ETH Global Canopy Height 2020, S27W057; see Lang et al., 2022). These raster layers provide both mean canopy height and the associated standard deviation (canopy structural variability) for each grid cell.
Vegetation productivity was quantified via the Normalized Difference Vegetation Index (NDVI), derived from Sentinel-2 Level-2A imagery for representative dates of each season (winter, autumn, and summer). For each season, we used the red (B04) and near-infrared (B08) bands at 10-m spatial resolution, computing NDVI as (B08 - B04) / (B08 + B04). This resulted in a seasonal NDVI layer for each period, ensuring environmental covariates matched the phenological state of the forest at the time of field observations.
Weather data were obtained from local meteorological records (“clima.csv”), including daily total precipitation (pp), maximum temperature (t_max), and minimum temperature (t_min), and were later merged with the main dataset by date.
These data sources were loaded into R as raster or tabular objects and provided the basis for extracting, by spatial overlay or table join, the corresponding environmental values at each roost or pseudo-absence location for every night included in the analysis.
Canopy Height and Error
canopy <- rast("ETH_GlobalCanopyHeight_10m_2020_S27W057_Map.tif")
canopy_sd <- rast("ETH_GlobalCanopyHeight_10m_2020_S27W057_Map_SD.tif")
NDVI, Sentinel T21JYM_20220709T133839
b04_winter <- rast("T21JYM_20220709T133839_B04_10m.jp2")
b08_winter <- rast("T21JYM_20220709T133839_B08_10m.jp2")
ndvi_winter <- (b08_winter - b04_winter) / (b08_winter + b04_winter)
NDVI, Sentinel 20160317T134032
b04_aut <- rast("T21JYM_20160317T134032_B04_10m.jp2")
b08_aut <- rast("T21JYM_20160317T134032_B08_10m.jp2")
ndvi_aut <- (b08_aut - b04_aut) / (b08_aut + b04_aut)
NDVI , Sentinel 20221201T133841
b04_sum <- rast("T21JYM_20221201T133841_B04_10m.jp2")
b08_sum <- rast("T21JYM_20221201T133841_B08_10m.jp2")
ndvi_sum <- (b08_sum - b04_sum) / (b08_sum + b04_sum)
Weather information
This script implements a systematic approach to generate all
combinations of buffer size and number of random (pseudo-absence) points
around each roost (“used”) site, with the goal of optimizing the design
of resource selection analyses. For each possible combination of number
of random points (n_fake
) and buffer radius, the following
steps are executed:
n_fake
random points are generated at
random bearings (0–360°) and random distances (uniformly distributed
between a minimum and maximum buffer distance) from the observed roost
location. A minimum distance (min_dist
) is enforced between
all random points generated for the same night to reduce spatial
clustering.This approach is fully automated, enabling rapid testing of all pairwise combinations of six candidate numbers of random points and five buffer sizes (30 total combinations).
We systematically explored all parameter combinations in order to
balance two methodological goals:
(i) minimizing spatial autocorrelation between “used” and “available”
points (which can arise if random points are placed too close together
or too close to the used location), and
(ii) maximizing stability and robustness of estimated coefficients in
RSF (Resource Selection Function) models, given the inherent
stochasticity in random point generation.
By storing the full set of possible design permutations, we were able to quantify how variation in random point number and buffer size affects both spatial structure in the data and model outcomes, and to select optimal parameter values for final inference. This design ensures transparent, reproducible, and robust ecological modeling.
n_fakes <- c(1, 2, 5, 10, 15, 20)
buffers <- c(100, 200, 300, 400, 500)
min_dist <- 30
combos <- expand.grid(n_fake = n_fakes, buffer = buffers)
set.seed(42)
all_dfs <- list()
for (i in 1:nrow(combos)) {
n_fake <- combos$n_fake[i]
buffer <- combos$buffer[i]
message("Combinando: ", n_fake, " puntos, buffer: ", buffer)
# 1. Puntos FALSOS
pseudo_data <- moon %>%
rowwise() %>%
do({
res <- generate_pseudo_points(
lat = .$lat, lon = .$lon,
n_fake = n_fake, buffer = buffer, min_dist = min_dist,
fecha = .$fecha,
group = .$group,
moon_fraction = .$moon_fraction,
moon_phase_angle = .$moon_phase_angle,
moon_phase_name = .$moon_phase_name,
moon_visible = .$moon_visible,
used = FALSE
)
if (is.null(res)) tibble(
lon = numeric(0), lat = numeric(0),
fecha = character(0), group = character(0),
moon_fraction = numeric(0), moon_phase_angle = numeric(0),
moon_phase_name = character(0), moon_visible = logical(0),
used = logical(0)
)
else res
}) %>%
ungroup() %>%
mutate(n_fake = n_fake, buffer = buffer)
moon_used <- moon %>%
mutate(used = TRUE,
n_fake = n_fake,
buffer = buffer) %>%
select(lon, lat, fecha, group, moon_fraction, moon_phase_angle,
moon_phase_name, moon_visible, used, n_fake, buffer)
moon_all <- bind_rows(moon_used, pseudo_data)
moon_all <- moon_all %>%
mutate(event_key = paste(group, fecha, sep = "_")) %>%
group_by(event_key) %>%
mutate(id = cur_group_id()) %>%
ungroup() %>%
select(-event_key)
# 5. Guardar en lista
all_dfs[[paste0("n", n_fake, "_b", buffer)]] <- moon_all
}
We defined dedicated R functions to systematically annotate each point in our datasets with key environmental covariates required for our analyses:
agregar_moon_luz()
This function calculates temporally- and spatially-explicit lunar illumination indices for each point. For every location, it:
getMoonPosition()
function.These steps ensure that the model incorporates realistic, time-specific measures of nocturnal lunar illumination at each site, which is ecologically relevant for species with night-time activity.
agregar_canopy()
This function extracts canopy structure variables for each point, by:
By using these functions in sequence, we consistently enrich each observation with the relevant spatially- and temporally-matched environmental predictors needed for robust resource selection modeling.
agregar_moon_luz <- function(df) {
df <- df %>%
mutate(
fecha_a = as.POSIXct(paste(as.Date(fecha) - 1, "20:00:00"), tz = "UTC"),
fecha_b = as.POSIXct(paste(as.Date(fecha), "00:00:00"), tz = "UTC"),
fecha_c = as.POSIXct(paste(as.Date(fecha), "04:00:00"), tz = "UTC")
)
pos_a <- getMoonPosition(data = df %>% rename(date = fecha_a) %>% select(date, lat, lon))
pos_b <- getMoonPosition(data = df %>% rename(date = fecha_b) %>% select(date, lat, lon))
pos_c <- getMoonPosition(data = df %>% rename(date = fecha_c) %>% select(date, lat, lon))
alt_scaled_a <- pmax(0, sin(pos_a$altitude))
alt_scaled_b <- pmax(0, sin(pos_b$altitude))
alt_scaled_c <- pmax(0, sin(pos_c$altitude))
df <- df %>%
mutate(
moon_altitude_avg = (alt_scaled_a + alt_scaled_b + alt_scaled_c) / 3,
moon_light_effective_avg = moon_fraction * moon_altitude_avg
)
return(df)
}
agregar_canopy <- function(df, canopy, canopy_sd) {
moon_vect <- vect(df, geom = c("lon", "lat"), crs = crs(canopy))
df$canopy_height <- extract(canopy, moon_vect)[,2]
df$canopy_height_sd <- extract(canopy_sd, moon_vect)[,2]
return(df)
}
Checking everything is fine
Lets see this visually
plot_df <- purrr::map_dfr(
comb_names,
~ dplyr::mutate(all_dfs_proc[[.x]], combination = .x)
)
plot_df$used <- as.factor(plot_df$used)
ggplot(plot_df, aes(x = moon_fraction, y = canopy_height)) +
geom_jitter(
aes(fill = used),
shape = 21, size = 3, stroke = 0.3,
alpha = 0.05, color = "black",
width = 0.03,
show.legend = TRUE
) +
geom_smooth(
aes(color = used),
method = "lm",
se = TRUE,
size = 1.9
) +
scale_fill_manual(
values = c("TRUE" = "#E1B000", "FALSE" = "#1F3552"),
labels = c("Available", "Used"),
name = "Points"
) +
scale_color_manual(
values = c("TRUE" = "#E1B000", "FALSE" = "#1F3552"),
labels = c("Available", "Used"),
name = "Trend"
) +
guides(
fill = guide_legend(override.aes = list(alpha = 1)),
color = guide_legend(override.aes = list(size = 1.5))
) +
labs(
x = "Moon Fraction",
y = "Canopy Height (m)"
) +
facet_wrap(~ combination, nrow = 2) +
theme_classic(base_size = 14) +
theme(
plot.title = element_text(size = 15, face = "bold"),
axis.title = element_text(),
legend.title = element_text(face = "bold"),
legend.position = "right"
)
### Function:
agregar_ndvi()
This function assigns seasonally-matched vegetation productivity values to each point based on NDVI (Normalized Difference Vegetation Index) calculated from Sentinel-2 imagery. For each observation, the function:
ndvi_sentinel
) in the data
frame, ensuring that each observation receives the NDVI value that
reflects the correct phenological context.By doing so, this function ensures that each “used” or “available” location is accurately characterized in terms of contemporaneous vegetation productivity, a key predictor in ecological and resource selection analyses.
agregar_ndvi <- function(df, ndvi_winter, ndvi_aut, ndvi_sum) {
df <- df %>%
mutate(season = case_when(
month(fecha) %in% c(12, 1, 2) ~ "summer",
month(fecha) %in% c(3, 4, 5) ~ "autumn",
month(fecha) %in% c(6, 7, 8) ~ "winter",
month(fecha) %in% c(9, 10, 11) ~ "spring"
))
names(ndvi_winter) <- "ndvi"
names(ndvi_aut) <- "ndvi"
names(ndvi_sum) <- "ndvi"
df$ndvi_sentinel <- NA_real_
for (season_name in c("winter", "autumn", "spring", "summer")) {
idx <- which(df$season == season_name)
if(length(idx) == 0) next
ndvi_raster <- switch(
season_name,
"winter" = ndvi_winter,
"autumn" = ndvi_aut,
"spring" = ndvi_sum,
"summer" = ndvi_sum
)
moon_sub <- df[idx, ]
vect_sub <- vect(moon_sub, geom = c("lon", "lat"), crs = "EPSG:4326") %>%
project(crs(ndvi_raster))
ndvi_vals <- extract(ndvi_raster, vect_sub)[, 2]
df$ndvi_sentinel[idx] <- ndvi_vals
}
return(df)
}
add_weather_vars()
This function appends daily weather variables—precipitation (pp),
maximum temperature (t_max), and minimum temperature (t_min), to each
point by joining on the observation date.
It ensures that each “used” and “available” point is annotated with the
relevant weather conditions for its respective night.
add_weather_vars <- function(df, weather) {
df <- df %>%
left_join(weather %>% select(fecha, pp), by = "fecha") %>%
left_join(weather %>% select(fecha, t_max), by = "fecha") %>%
left_join(weather %>% select(fecha, t_min), by = "fecha")
return(df)
}
plot_df <- map_dfr(
comb_names,
~ all_dfs_proc[[.x]] %>%
mutate(ndvi_cat = ntile(ndvi_sentinel, 3),
combination = .x)
)
plot_df$used <- as.factor(plot_df$used)
ggplot(plot_df, aes(x = moon_light_effective_avg, y = canopy_height)) +
geom_jitter(
aes(fill = used),
shape = 21, size = 3, stroke = 0.3,
alpha = 0.05, color = "black",
width = 0.03,
show.legend = TRUE
) +
geom_smooth(
aes(color = used),
method = "lm",
se = TRUE,
size = 1.9
) +
facet_grid(combination ~ ndvi_cat, labeller = label_both) +
scale_fill_manual(
values = c("TRUE" = "#E1B000", "FALSE" = "#1F3552"),
labels = c("Available", "Used"),
name = "Points"
) +
scale_color_manual(
values = c("TRUE" = "#E1B000", "FALSE" = "#1F3552"),
labels = c("Available", "Used"),
name = "Trend"
) +
labs(
title = "Moonlight vs. Canopy Height across NDVI Tiles\nby Combo (n_fake/buffer)",
x = "Effective Moonlight Index",
y = "Canopy Height (m)"
) +
theme_classic(base_size = 14) +
theme(
plot.title = element_text(size = 15, face = "bold"),
axis.title = element_text(),
legend.title = element_text(face = "bold"),
legend.position = "right"
)
calc_moran()
and Moran’s I Analysis PipelineThe function calc_moran()
quantifies spatial
autocorrelation in canopy height across all locations within a given
dataset using Moran’s I statistic. Specifically:
moran.test
) to assess the degree of spatial clustering in
canopy height values.This function is applied iteratively across all combinations of
randomization scenarios (all_dfs_proc
), automatically
extracting the number of random points (n_fake
) and buffer
size from each dataset’s identifier. The results are consolidated in a
single summary table (moran_table
), allowing comparison of
spatial autocorrelation across all parameter combinations.
For each combination of pseudo-absence point number and buffer size, we quantified the spatial autocorrelation of canopy height values using Moran’s I statistic. The analysis proceeds as follows:
Interpretation:
- A high, positive Moran’s I value indicates that points with similar
canopy height tend to be spatially clustered (high autocorrelation). - A
value near zero suggests little to no spatial structure (randomness). -
A negative value implies spatial dispersion (points with dissimilar
values are close together).
Purpose in the study:
This analysis allows us to systematically assess how different
randomization designs (i.e., number of pseudo-absence points and buffer
size) affect the spatial autocorrelation present in our environmental
covariate data. By minimizing spatial autocorrelation in the generated
points, we reduce the risk of statistical biases and improve the
robustness of subsequent resource selection analyses (RSF). The Moran’s
I table provides an empirical basis for selecting optimal parameter
values for generating pseudo-absence points.
df <- all_dfs_proc[[1]]
sum(is.na(df$lon) | is.na(df$lat))
sum(is.na(df$canopy_height))
coords <- as.matrix(df[, c("lon", "lat")])
dlist <- dnearneigh(coords, 0, 0.01, longlat = TRUE)
table(card(dlist))
Lets make 20x20 pixels for height
canopy20 <- aggregate(canopy, fact = 2, fun = mean, na.rm = TRUE)
canopy20_sd <- aggregate(canopy_sd, fact = 2, fun = mean, na.rm = TRUE)
agregar_canopy20 <- function(df, canopy20, canopy20_sd) {
require(terra)
moon_vect <- vect(df, geom = c("lon", "lat"), crs = crs(canopy20))
df$canopy_height20 <- extract(canopy20, moon_vect)[,2]
df$canopy_height_sd20 <- extract(canopy20_sd, moon_vect)[,2]
return(df)
}
all_dfs_proc <- map(
all_dfs_proc,
~ agregar_canopy20(.x, canopy20, canopy20_sd)
)
calc_moran <- function(df) {
df <- df[!is.na(df$canopy_height) & !is.na(df$lon) & !is.na(df$lat), ]
if(nrow(df) < 10) return(data.frame(moran_i=NA, pval=NA))
if(nrow(df) > 500) df <- df[sample(1:nrow(df), 500), ]
coords <- as.matrix(df[, c("lon", "lat")])
# Primer intento: 0.002 grados (~222 m)
dlist <- dnearneigh(coords, 0, 0.002, longlat = TRUE)
if(any(card(dlist)==0)) {
# Segundo intento: 0.005 grados (~555 m)
dlist <- dnearneigh(coords, 0, 0.005, longlat = TRUE)
}
if(any(card(dlist)==0)) {
# Tercer intento: 0.01 grados (~1.1 km)
dlist <- dnearneigh(coords, 0, 0.01, longlat = TRUE)
}
if(any(card(dlist)==0)) {
dlist <- knn2nb(knearneigh(coords, k=8))
}
if(any(card(dlist)==0)) return(data.frame(moran_i=NA, pval=NA)) # si todavía hay puntos sin vecinos
lw <- nb2listw(dlist, style = "W", zero.policy = TRUE)
res <- moran.test(df$canopy_height, lw, zero.policy = TRUE)
moran_i <- res$estimate[1]
pval <- res$p.value
data.frame(moran_i = as.numeric(moran_i), pval = as.numeric(pval))
}
library(purrr)
moran_table <- imap_dfr(
all_dfs_proc,
function(df, nm) {
n_fake <- as.numeric(sub("n([0-9]+)_b[0-9]+", "\\1", nm))
buffer <- as.numeric(sub("n[0-9]+_b([0-9]+)", "\\1", nm))
moran_res <- tryCatch(calc_moran(df), error=function(e) data.frame(moran_i=NA, pval=NA))
data.frame(
combo = nm,
n_fake = n_fake,
buffer = buffer,
moran_i = moran_res$moran_i,
pval = moran_res$pval
)
}
)
ggplot() +
geom_line(
data = simu_table,
aes(x = factor(n_fake), y = moran_i_simu, group = interaction(buffer, rep), color = factor(buffer)),
size = 0.4, alpha = 0.11
) +
geom_line(
data = media_table,
aes(x = factor(n_fake), y = moran_i, group = factor(buffer), color = factor(buffer)),
size = 1.6, alpha = 1
) +
scale_color_viridis_d(name = "Buffer (m)", option = "plasma", end = 0.85) +
scale_y_continuous(limits = c(-1, 1), breaks = seq(-1, 1, 0.2)) +
scale_x_discrete(
limits = as.character(c(1, 2, 5, 10, 15, 20)),
breaks = as.character(c(1, 2, 5, 10, 15, 20)),
expand = c(0,0)) +
labs(
x = "Number of pseudo-absence points per sleeping sites",
y = "Moran's I (Canopy Height)"
) +
theme_classic(base_size = 16) +
theme(
plot.title = element_text(size = 15, hjust = 0),
axis.title = element_text(),
axis.text = element_text(size = 13),
legend.position = "top",
legend.title = element_text(),
legend.key.height = unit(0.7, "lines"),
legend.key.width = unit(2.5, "lines"))
n_fake <- c(1, 2, 5, 10, 15, 20)
buffers <- c(100, 200, 300, 400, 500)
n_rep <- 100 # número de réplicas
combos <- expand.grid(n_fake = n_fake, buffer = buffers)
sens_results <- vector("list", nrow(combos))
for(i in 1:nrow(combos)) {
n_pts <- combos$n_fake[i]
buffer <- combos$buffer[i]
message("Simulating for n_fake = ", n_pts, ", buffer = ", buffer)
sens_results[[i]] <- map(1:n_rep, function(r) {
df_sim <- genera_puntos_falsos(moon, n_fake = n_pts, buffer = buffer, min_dist = 30)
moon_used <- moon %>% mutate(used = TRUE)
moon_all <- bind_rows(moon_used, df_sim)
n_used <- sum(moon_all$used == TRUE)
n_avail <- sum(moon_all$used == FALSE)
moon_all <- moon_all %>%
mutate(w = ifelse(used, 1, n_used / n_avail))
fit <- glm(used ~ moon_light_effective_avg * canopy_height,
data = moon_all,
weights = w,
family = binomial(link = "logit"))
coefs <- tidy(fit) %>%
filter(term %in% c("moon_light_effective_avg", "canopy_height",
"moon_light_effective_avg:canopy_height"))
coefs$n_fake <- n_pts
coefs$buffer <- buffer
coefs$rep <- r
coefs
}) %>% bind_rows()
}
all_sens <- bind_rows(sens_results)
summ_sens <- all_sens %>%
group_by(n_fake, buffer, term) %>%
summarise(
mean_est = mean(estimate, na.rm=TRUE),
sd_est = sd(estimate, na.rm=TRUE),
range_est = max(estimate, na.rm=TRUE) - min(estimate, na.rm=TRUE),
mean_abs_diff = mean(abs(diff(sort(estimate))), na.rm=TRUE),
.groups = "drop"
)
To properly account for the inherent randomness involved in generating available (pseudo-absence) points for our RSF models, we performed a sensitivity analysis inspired by the approach of Fieberg et al. (2021).
For each combination of buffer size and number of pseudo-absence points per roost, we repeated the random generation process multiple times (e.g., 100 replicates per scenario). For every replicate, we fitted a resource selection function (RSF) model that included the interaction between moonlight availability and canopy height.
We then extracted the coefficients of interest from each model fit and summarized their variability across replicates. This procedure allows us to directly quantify the uncertainty and stability of our RSF estimates, ensuring that our main conclusions are not driven by a particular random realization of available points.
This sensitivity analysis is critical because the spatial configuration of available points can influence both the magnitude and the significance of RSF coefficients. By systematically evaluating the spread and mean differences of coefficients across all randomizations, we can identify which parameter settings (buffer size and number of pseudo-absence points) yield the most robust and reliable inference.
Finally, the results of this sensitivity analysis will be integrated with our assessment of spatial autocorrelation (e.g., Moran’s I) to select the optimal pseudo-absence design for robust and unbiased ecological inference.
Balancing Buffer Size, Number of Pseudo-Absences, Sensitivity, and Spatial Autocorrelation
Let \(b\) denote the buffer size (in meters) and \(n\) the number of pseudo-absence points generated around each used point.
For each \((b, n)\) combination, we repeat the random sampling process \(R\) times and fit a resource selection function (RSF) in each iteration, yielding \(R\) estimated coefficients \(\hat{\beta}_{b,n}^{(r)}\) for a parameter of interest (e.g., interaction between moonlight and canopy height):
\[ \hat{\beta}_{b,n}^{(r)}, \quad r = 1, 2, \ldots, R \]
We then quantify the sensitivity (variability) of the estimator as:
Sample variance: \[ \mathrm{Var}_{b,n} = \frac{1}{R-1} \sum_{r=1}^R \left(\hat{\beta}_{b,n}^{(r)} - \overline{\beta}_{b,n}\right)^2 \]
Mean absolute difference (MAD): \[ \mathrm{MAD}_{b,n} = \frac{1}{R(R-1)} \sum_{i \neq j} |\hat{\beta}_{b,n}^{(i)} - \hat{\beta}_{b,n}^{(j)}| \]
Where \(\overline{\beta}_{b,n}\) is the mean coefficient estimate for that combination.
For each combination, we calculate the spatial autocorrelation of a habitat covariate (e.g., canopy height) using Moran’s I statistic:
\[ I_{b,n} = \frac{N}{W} \cdot \frac{ \sum_{i=1}^{N} \sum_{j=1}^{N} w_{ij} (x_i - \overline{x})(x_j - \overline{x}) } { \sum_{i=1}^{N} (x_i - \overline{x})^2 } \]
where \(x_i\) is the covariate value at location \(i\), \(\overline{x}\) is the mean value, \(w_{ij}\) is the spatial weight between \(i\) and \(j\), \(N\) is the total number of points, and \(W = \sum_{i,j} w_{ij}\).
Our goal is to select \((b^*, n^*)\) such that:
Formally,
\[ (b^*, n^*) = \arg\min_{b, n} \left\{ f\left( \mathrm{Var}_{b,n}, |I_{b,n}| \right) \right\} \]
where \(f(\cdot)\) is a function that balances estimator variability and spatial autocorrelation.
This joint optimization ensures that our RSF inferences are both stable to the random generation of available points and free from spatial artifacts.
ggplot(sim_df, aes(x = factor(n_fake), y = estimate)) +
geom_boxplot(
aes(fill = mad_gray, group = n_fake),
color = "black", outlier.shape = NA, width = 0.75, size = 0.5
) +
geom_jitter(
width = 0.13, size = 3, alpha = 0.14, color = "black"
) +
facet_grid(. ~ buffer, labeller = label_both) +
scale_fill_gradient(
low = "white", high = "black", name = "Mean Diff."
) +
labs(
x = "Pseudo-absence Points per Location",
y = "Interaction Estimate (Moon x Canopy Height)"
) +
theme_classic(base_size = 18) +
theme(
strip.background = element_blank(),
strip.text.x = element_text(size = 19),
axis.title = element_text(),
legend.position = "bottom",
legend.key.height = unit(1.3, "lines"),
legend.key.width = unit(3.5, "lines"),
axis.text.x = element_text(hjust = 1))
Distribution of RSF interaction estimator values (moonlight × canopy height) for different numbers of pseudo-absence points and buffer sizes. Each boxplot represents the distribution of estimator values across simulations. The fill gradient indicates the mean absolute difference (mean diff.) in estimator values, with darker colors corresponding to lower variability (greater estimator stability).
sim_summary <- mad_table %>%
rename(mean_diff = mean_diff) %>%
mutate(n_fake = as.numeric(as.character(n_fake)))
moran_summary <- media_table %>%
select(buffer, n_fake, moran_i) %>%
mutate(n_fake = as.numeric(as.character(n_fake)))
balance_df <- left_join(sim_summary, moran_summary, by = c("buffer", "n_fake")) %>%
group_by(buffer) %>%
mutate(mean_diff_rescaled = scales::rescale(mean_diff, to = range(moran_i, na.rm = TRUE))) %>%
ungroup()
balance_long <- balance_df %>%
pivot_longer(
cols = c(mean_diff_rescaled, moran_i),
names_to = "metric",
values_to = "value"
) %>%
mutate(metric = recode(metric,
mean_diff_rescaled = "Estimator Variability",
moran_i = "Moran's I"))
highlight_300 <- balance_df %>%
filter(buffer == 300, n_fake %in% c(10, 15)) %>%
pivot_longer(
cols = c(mean_diff_rescaled, moran_i),
names_to = "metric",
values_to = "value"
) %>%
mutate(metric = recode(metric,
mean_diff_rescaled = "Estimator Variability",
moran_i = "Moran's I"))
ggplot(balance_long, aes(x = n_fake, y = value, group = metric, color = metric)) +
geom_line(size = 1.4) +
geom_point(size = 2) +
facet_wrap(~ buffer, labeller = label_both) +
scale_color_manual(
name = "",
values = c("Estimator Variability" = "black", "Moran's I" = viridis(2, end = 0.85)[2])
) +
geom_point(
data = highlight_300,
aes(x = n_fake, y = value),
color = "red", size = 7, shape = 3, inherit.aes = FALSE
) +
labs(
x = "Pseudo-absence Points per Location",
y = "Value (rescaled)"
) +
theme_classic(base_size = 16) +
theme(
legend.position = "top",
legend.title = element_text(face = "bold"),
axis.title = element_text(face = "bold"),
axis.text.x = element_text(angle = 45, hjust = 1),
strip.text = element_text(size = 14, face = "bold")
)
Balance between estimator variability and spatial autocorrelation (Moran’s I) for different numbers of pseudo-absence points (n_fake) and buffer sizes. The black line shows the variability of the RSF interaction estimator (moonlight × canopy height) across simulations, while the colored line shows the Moran’s I for canopy height. Red stars highlight the two best-performing combinations (buffer = 300 m and n_fake = 10 and 15), which represent optimal trade-offs between estimator stability and low spatial autocorrelation.