Moon Cycle in relation to Sleeping site Selection

1 Libraries

We relied on a combination of spatial, statistical, and visualization packages in R to conduct this analysis. Each library played a specific role:

tidyverse: A suite of packages for data manipulation and visualization. Core functions from dplyr, ggplot2, tibble, and readr were used to wrangle and plot data in a consistent and readable syntax.
terra: Used for handling spatial raster data. We imported and processed the global canopy height raster (Lang et al. 2022), extracted values, and performed cropping and zooming for localized analysis.
suncalc: Provided astronomical calculations, including moon illumination fraction, moonrise and moonset times, and phase information. This allowed us to incorporate lunar variables into our ecological questions.
purrr: Part of the tidyverse ecosystem, purrr was essential for iterating over rows and applying functions (e.g., moon calculations) across date and location combinations in a clean and vectorized way.
hexbin: Enabled the creation of hexagonal binned plots to explore the density distribution of sleeping sites in relation to canopy height and moonlight, which helps overcome overplotting in dense datasets.
viridis: Provided perceptually uniform and colorblind-friendly color scales for raster and density plots, improving visual interpretation of ecological gradients.
geosphere: Used for accurate geographic calculations such as generating random points within a radius from a known location (destPoint()), taking Earth’s curvature into account, crucial for ecological realism in spatial simulation.

library(tidyverse)
library(terra)
library(suncalc)
library(purrr)
library(hexbin)
library(viridis)
library(geosphere)
library(scales)
library(spdep)
library(ggridges)
library(patchwork)
library(flextable)

1.1 Description of the moon.rds dataset

The moon.rds file contains georeferenced data on sleeping sites of four groups of black capuchin monkeys (Sapajus nigritus) in **Iguazú National Park, Argentina, collected between 2011 and 2023.

Only night-time sleeping locations were considered. Specifically, each record corresponds to the last GPS point of the day (and always after sunset), collected after the monkeys had settled for the night, and does not include morning sleeping positions to avoid duplicates.

Each observation includes:

GPS locations (longitude and latitude in WGS84), obtained either via:
GPS collars, only including high-precision data based on HDOP and satellite number
Direct field observations collected with handheld GPS devices
The Date of the sleeping location record, which allows the estimation of lunar illumination (moon phases and altitude) for each night; detailed below.

This dataset is designed to support analyses exploring the relationship between lunar cycles and illumination and sleeping site selection, such as sleeping height or location in relation to moonlight.

moon <- readRDS("moon2.rds")

1.2 Calculating Lunar Illumination

To evaluate how lunar conditions might influence the sleeping site selection of black capuchin monkeys (Sapajus nigritus), we calculated key lunar variables using the suncalc package.

In the code below, for each observation date in the moon dataset, we extracted:

moon_fraction: the proportion of the moon’s disc illuminated on that night (ranging from 0 for New Moon to 1 for Full Moon).
moon_phase_angle: a normalized value (0 to 1) indicating the moon’s position within its cycle (e.g., 0 = New Moon, 0.5 = Full Moon).
moon_altitude: which was setted for three specific hours during the night and then averaged, procedure is detailed below.

These lunar metrics might be ecologically relevant because moonlight intensity can influence nocturnal visibility and perceived predation risk. In primates, including capuchins, we predict that individuals may select higher sleeping sites during dark nights, which might be linked to a reduction in predation risk perception.

By incorporating lunar illumination data, we can test whether capuchin monkeys adjust their sleeping site height or location in response to varying moonlight conditions across nights and seasons.

moon <- moon %>%
  mutate(
    moon_fraction = sapply(fecha, function(f) getMoonIllumination(as.POSIXct(f))$fraction),
    moon_phase_angle = sapply(fecha, function(f) getMoonIllumination(as.POSIXct(f))$phase))

moon <- moon %>%
  mutate(season = case_when(
    month(fecha) %in% c(12, 1, 2) ~ "Summer",
    month(fecha) %in% c(3, 4, 5) ~ "Autumn",
    month(fecha) %in% c(6, 7, 8) ~ "Winter",
    month(fecha) %in% c(9, 10, 11) ~ "Spring"
  ))
estaciones <- moon %>%
  count(season, name = "n_dormideros")

estaciones %>%
  summarize(
    mean = mean(n_dormideros),
    sd = sd(n_dormideros))

1.3 Classifying Moon Phase Categories

To further interpret lunar illumination patterns, we classified each observation into a moon phase category based on the moon_phase_angle value. The moon’s cycle is continuous, but for ecological analyses, it is often useful to group nights into broader phase categories that may reflect different levels of moonlight or perceived predation risk.

We defined the categories as follows:

New Moon: moon_phase_angle < 0.1 or > 0.9
(Very low illumination; darkest nights)
Waxing: between 0.1 and 0.4
(Increasing illumination; transition toward full moon)
Full Moon: between 0.4 and 0.6
(Peak illumination)
Waning: between 0.6 and 0.9
(Decreasing illumination after full moon)

Ecologically, these categories can help determine whether capuchinn monkeys adjust their sleeping behavior, such as height in the canopy or site exposure, in response to the amount of ambient nocturnal light.

moon <- moon %>%
  mutate(
    moon_phase_name = case_when(
      moon_phase_angle < 0.1 | moon_phase_angle > 0.9 ~ "New Moon",
      moon_phase_angle >= 0.1 & moon_phase_angle < 0.4 ~ "Waxing",
      moon_phase_angle >= 0.4 & moon_phase_angle <= 0.6 ~ "Full Moon",
      moon_phase_angle > 0.6 & moon_phase_angle <= 0.9 ~ "Waning" ))

1.4 Moonrise and Moonset Times

To better understand the temporal availability of moonlight, we calculated the moonrise and moonset times for each observation using the getMoonTimes() function from the suncalc package. This helps determine whether the moon was actually present in the sky during the night, key complement to the phase and illumination fraction.

We specified the location (latitude and longitude) of each observation, along with the observation date and the appropriate time zone for Iguazú National Park (America/Argentina/Cordoba)

These variables allow us to identify whether the moon was above the horizon during the active nighttime period. This is ecologically meaningful because a night with high lunar illumination on paper (e.g., a full moon) might still be functionally dark if the moon had already set by the time the monkeys were selecting or using their sleeping site.

then we can calculate:

Whether the moon was visible during key night hours (e.g., 19:00–05:00),

The duration the moon was above the horizon,

And how these metrics relate to sleeping site selection and height.

moon <- moon %>%
  mutate(
    moon_times = pmap(
      list(fecha, lat, lon),
      function(f, la, lo) {
        getMoonTimes(
          date = f,
          lat = la,
          lon = lo,
          keep = c("rise", "set"),
          tz = "America/Argentina/Cordoba")})) %>%
  mutate(
    moon_rise = map(moon_times, "rise") %>% flatten_dbl(),
    moon_set  = map(moon_times, "set") %>% flatten_dbl()) %>%
  select(-moon_times)

moon <- moon %>%
  mutate(
    moon_rise = as.POSIXct(moon_rise, origin = "1970-01-01", tz = "America/Argentina/Cordoba"),
    moon_set  = as.POSIXct(moon_set,  origin = "1970-01-01", tz = "America/Argentina/Cordoba"))

2 Moon Visibility and Duration

We also calculated whether the moon was actually visible in the sky during each night, and for how long, based on the previously extracted moonrise and moonset times.

moon_duration_hr: the number of hours the moon was above the horizon that night.

moon_visible: a logical indicator (TRUE or FALSE) denoting whether the moon rose and set that night (i.e., was visible at some point).

These variables are ecologically important because they account for actual moonlight exposure at ground level. For example, even on a night with a full moon, if the moon had already set before the monkeys began selecting sleeping sites, its potential influence on behavior would be minimal. Conversely, on darker nights (low lunar fraction), a visible moon even for a few hours could still provide light that may influence perceived predation risk, vigilance, or site choice.

By integrating both moon phase and sky visibility, we can better capture the real nocturnal light environment experienced by the monkeys.

moon <- moon %>%
  mutate(
    moon_duration_hr = as.numeric(difftime(moon_set, moon_rise, units = "hours")),
    moon_visible = !is.na(moon_rise) & !is.na(moon_set) & moon_duration_hr > 0)

ggplot(moon, aes(x = group, y = moon_fraction, color = group)) +
  geom_jitter(width = 0.2, height = 0, alpha = 0.85, size = 2.8) +

  annotate("text", x = 4.5, y = 0, label = "🌑", size = 6) +
  annotate("text", x = 4.5, y = 0.5, label = "🌓", size = 6) +
  annotate("text", x = 4.5, y = 1, label = "🌕", size = 6) +

  scale_color_manual(values = c(
    "gun" = "darkorange3",
    "gue" = "midnightblue",
    "mac" = "goldenrod2",
    "spo" = "darkseagreen4"  #  "slategray4" could be a niceee option
  )) +

  labs(
    title = "Observed Moon Illumination per Group",
    x = "Monkey Group",
    y = "Moon Illumination Fraction"
  ) +
  coord_cartesian(ylim = c(-0.05, 1.05)) +
  theme_bw(base_size = 14) +
  theme(
    legend.position = "none")

$Distribution of moon illumination fractions across monkey groups. Each point represents a recorded sleeping site, colored by group. Lunar phase icons on the right (🌑 new moon, 🌓 half moon, 🌕 full moon) provide a visual scale reference.$

Distribution of moon illumination fractions across monkey groups. Each point represents a recorded sleeping site, colored by group. Lunar phase icons on the right (🌑 new moon, 🌓 half moon, 🌕 full moon) provide a visual scale reference.

This plot shows the distribution of observed lunar illumination fractions (moon_fraction) across all recorded sleeping site events, separated by monkey group.

Each point represents a single observation, reflecting the proportion of the moon’s surface illuminated that night (0 = 🌑 New Moon, 1 = 🌕 Full Moon). The data appears to be well-distributed across the lunar cycle, with no strong evidence of sampling bias across groups — all groups show a broad range of moonlight conditions.

This preliminary exploration supports the idea that the dataset is suitable for investigating whether capuchin monkeys adjust their sleeping site choices (e.g., height or exposure) in response to nocturnal moonlight levels.

To explore the seasonality in sleeping site selection, two visualizations were created using the moon illumination fraction as the key variable. The first plot (circular) positions each observation according to the day of the year (angle) and moon illumination (radius), with a continuous color scale indicating the brightness of the moon. This layout helps reveal potential seasonal concentrations and lunar phase patterns in spatial behavior. The second plot, shown in a cartesian format, displays the same information across calendar months, allowing for clearer month-to-month comparisons. In both cases, the use of color to represent moonlight intensity provides an intuitive visual cue for interpreting temporal patterns in sleeping site usage.

moon <- moon %>%
  mutate(
    yday = yday(fecha),
    angle = 2 * pi * yday / 365 )

ggplot(moon, aes(x = angle, y = moon_fraction, fill = moon_fraction)) +
  geom_tile(width = 0.75, height = 0.19) +  

  scale_fill_viridis_c(
    option = "B",
    name = "Moon Illum.",
    guide = guide_colorbar(barheight = 8)) +

  coord_polar(start = -pi/2) +

  scale_x_continuous(
    breaks = seq(0, 2*pi, length.out = 13)[-13],
    labels = month.abb
  ) +
  scale_y_continuous(limits = c(0, 1), expand = c(0, 0)) +

  labs(
    title = "Seasonality of Sleeping Sites by Moon Phase",
    x = NULL, y = NULL) +

  theme_minimal(base_size = 15) +
  theme(
    axis.text.x = element_text(face = "bold"),
    axis.text.y = element_blank(),
    panel.grid = element_blank(),
    axis.ticks = element_blank(),
    legend.title = element_text(),
    plot.title = element_text(face = "bold"))

Circular visualization of sleeping site observations throughout the year. Each tile corresponds to a record, positioned by day of year (angle) and moon illumination (radius and color). This layout highlights seasonal patterns under varying moonlight conditions.

moon <- moon %>%
  mutate(month = month(fecha, label = TRUE, abbr = TRUE))

ggplot(moon, aes(x = month, y = moon_fraction, fill = moon_fraction)) +
  geom_violin(scale = "width", fill = "gray90", color = NA, alpha = 0.5) +
  geom_jitter(shape = 21, color = "black", size = 2.5, stroke = 0.3, width = 0.2, alpha = 0.8) +
  scale_fill_viridis_c(
  option = "B",
  name = "Moon Phase",
  limits = c(0, 1),  
  breaks = c(0, 0.25, 0.5, 0.75, 1),
 labels = c("0.00", "0.25", "0.50", "0.75", "1.00"),
  guide = guide_colorbar(barheight = 12)) +
  coord_cartesian(ylim = c(0, 1)) +
  labs(
    title = "Seasonality of Sleeping Sites and Moon Illumination",
    x = "Month",
    y = "Moon Illumination Fraction") +
  theme_minimal(base_size = 14) +
  theme(
    axis.text.x = element_text(face = "bold"),
    legend.title = element_text(face = "bold"),
    legend.text = element_text(size = 13),
    legend.position = "right" )

Distribution of moonlight conditions during sleeping site observations throughout the year. Each point corresponds to an observation, jittered for visibility, and colored by lunar illumination. Violin plots represent the monthly density of observations. Emojis in the legend indicate approximate moon phases (new to full).

3 Canopy Height

We used the high-resolution global canopy height dataset produced by Lang et al. (2022), derived from GEDI LiDAR observations and machine learning models, to extract vegetation structure information across the Iguazú National Park. The original raster (ETH_GlobalCanopyHeight_10m_2020_S27W057) provides canopy height estimates at a 10-meter spatial resolution, allowing for detailed landscape-scale ecological analysis.

In this section, we cropped the raster to focus on the Iguazú National Park area (between -55 and -54 longitude, and -26.5 to -25.3 latitude), and applied a vegetation-like green color gradient to better visualize canopy height variations. This visualization enables us to interpret habitat vertical structure and its potential influence on monkey sleeping site selection.

canopy <- rast("ETH_GlobalCanopyHeight_10m_2020_S27W057_Map.tif")

deep_zoom_ext <- ext(-54.5, -54.4, -25.75, -25.65)

greens <- colorRampPalette(c("#edf8e9", "#a1d99b", "#31a354", "#001709"))

plot(canopy, ext = deep_zoom_ext, col = greens(100),
     main = "Canopy Height, Iguazu National Park",
     axes = TRUE)

To explore the vertical structure of habitats associated with sleeping sites, we cropped the canopy height raster to a buffered extent that covers all recorded locations (±0.02° around minimum and maximum coordinates). This focused raster allows for better spatial correspondence between sleeping sites and vegetation structure.

We then overlaid all sleeping site locations (n = {nrow(moon)}), color-coded by group identity, on the canopy height map. The green gradient reflects vegetation height, with darker shades representing taller canopy. Each monkey group was assigned a distinctive color (e.g., gun in orange, gue in blue), allowing clear visual differentiation of their spatial patterns.

This visualization helps assess whether different groups use sleeping sites in areas of varying canopy height and can inform hypotheses regarding group-specific strategies for predator avoidance, thermoregulation, or social cohesion.

ext_moon <- ext(min(moon$lon)-0.02, max(moon$lon)+0.02,
                min(moon$lat)-0.02, max(moon$lat)+0.02)

canopy_zoomed <- crop(canopy, ext_moon)


group_cols <- c(
    "gun" = "darkorange3",
    "gue" = "midnightblue",
    "mac" = "goldenrod2",
    "spo" = "darkseagreen4")


moon$group <- as.character(moon$group)


plot(canopy_zoomed, col = greens(100),
     main = "Canopy Height Around Sleeping Sites",
     axes = TRUE)


with(moon, points(lon, lat,
                  pch = 21,
                  bg = group_cols[group], 
                  col = "black",          
                  cex = 1.2))

To quantify the vegetation structure at each sleeping site, we extracted canopy height values directly from the Lang et al. (2022) raster using the GPS coordinates of the sites. The sleeping site data were converted into spatial vectors (SpatVector) and projected to match the coordinate reference system (CRS) of the canopy raster.

This allowed us to assign a canopy height value to every sleeping location based on its exact position. The resulting distribution shows that most sites are associated with mid- to high-canopy areas, suggesting a potential preference for structurally complex or elevated sleeping locations.

The density plot below summarizes the canopy height values at all recorded sites. This forms the basis for comparing real sleeping site locations with randomly generated control points in later analyses.

moon_vect <- vect(moon, geom = c("lon", "lat"), crs = crs(canopy))
moon$canopy_height <- extract(canopy, moon_vect)[,2]  

ggplot(moon, aes(x = canopy_height)) +
  geom_density(fill = "darkseagreen", alpha = 0.6) +
  labs(
    title = "Density of Canopy Height at Sleeping Sites",
    x = "Canopy Height (meters)",
    y = "Density"
  ) +
  theme_minimal(base_size = 14)

Distribution of canopy height across observed sleeping sites. The density plot shows the overall tendency of monkeys to select sleeping locations at specific vertical strata within the forest structure.

Let’s check the layer error before moving forward

canopy_sd <- rast("ETH_GlobalCanopyHeight_10m_2020_S27W057_Map_SD.tif")

deep_zoom_ext <- ext(-54.5, -54.4, -25.75, -25.65)
canopy_sd_iguazu <- crop(canopy_sd, deep_zoom_ext)
greens <- colorRampPalette(c("#f7fcf5", "#a1d99b", "#298022", "#0f3a01"))

reds <- colorRampPalette(c("#edf8e9", "#e4aa82", "#ca6042", "#81240a"))

plot(canopy_sd, ext = deep_zoom_ext, col = reds(100),
     main = "Canopy Height Error, Iguazu National Park",
     axes = TRUE)

moon_vect <- vect(moon, geom = c("lon", "lat"), crs = crs(canopy_sd))
moon$canopy_height_sd <- extract(canopy_sd, moon_vect)[,2]

Removing observations with high SD on canopy height estimations

moon <- moon[moon$canopy_height_sd <= 6, ]

3.1 Canopy Height vs. Moonlight: Density Visualization

moon_vect <- vect(moon, geom = c("lon", "lat"), crs = crs(canopy))
moon$canopy_height <- extract(canopy, moon_vect)[,2]  

moon <- moon %>%
  mutate(moon_bin = cut(
    moon_fraction,
    breaks = c(0, 0.2, 0.4, 0.6, 0.8, 1),
    labels = c("0–0.2", "0.2–0.4", "0.4–0.6", "0.6–0.8", "0.8–1"),
    include.lowest = TRUE
  ))


moon_palette <- c(
  "0–0.2" = "midnightblue",  
  "0.2–0.4" = "#2e5984", 
  "0.4–0.6" = "#7192be", 
  "0.6–0.8" = "#f2c849", 
  "0.8–1" = "#f29e02")

ggplot(moon, aes(x = canopy_height, fill = moon_bin, color = moon_bin)) +
  geom_density(alpha = 0.5, linewidth = 1.1) +
  labs(
    title = "Canopy Height at Sleeping Sites\nby Moonlight Fraction (5 bins)",
    x = "Canopy Height (meters)",
    y = "Density",
    fill = "Moon Fraction",
    color = "Moon Fraction") + 
  coord_cartesian(ylim = c(0, 0.6)) +
  
  scale_fill_manual(values = moon_palette) +
  scale_color_manual(values = moon_palette) +
  theme_minimal(base_size = 14)

$Distribution of canopy height across observed sleeping sites and across different moon fractions. The density plot shows the overall tendency of monkeys to select sleeping locations at specific vertical strata within the forest structure.$

Distribution of canopy height across observed sleeping sites and across different moon fractions. The density plot shows the overall tendency of monkeys to select sleeping locations at specific vertical strata within the forest structure.

This figure explores the potential relationship between lunar illumination and the vertical position of monkey sleeping sites. Each point represents a site, with jitter added to improve visibility, and background shading representing a 2D kernel density estimate.

The x-axis shows the fraction of the moon illuminated (from new moon at 0.0 to full moon at 1.0), while the y-axis indicates the canopy height (in meters) at the site. The color gradient (darker = higher density) reveals the most frequent combinations of moonlight and canopy structure.

This visualization helps assess whether sleeping sites under darker nights (low moon fraction) tend to occur in taller vegetation, which could suggest behavioral adaptation for predator avoidance or improved concealment during high-risk conditions.

ggplot(moon, aes(x = moon_fraction, y = canopy_height)) +
  geom_point(
    aes(fill = moon_fraction),
    shape = 21,
    color = "black",
    size = 3,
    stroke = 0.3,
    alpha = 0.85) +
  scale_fill_viridis_c(
    option = "B",
    name = "Moon Illum.",
    guide = guide_colorbar(barheight = 8),
    limits = c(0, 1),
    breaks = c(0, 0.25, 0.5, 0.75, 1),
    labels = c("Dark (0.00)", "Low", "Half (0.50)", "High", "Full (1.00)")
  ) +
  coord_cartesian(ylim = c(0, 40)) +
  labs(
    title = "Canopy Height vs. Lunar Illumination",
    x = "Moon Illumination Fraction",
    y = "Canopy Height (m)"
  ) +
  theme_minimal(base_size = 15) +
  theme(
    plot.title = element_text(face = "bold", size = 16),
    axis.title = element_text(),
    legend.title = element_text(face = "bold"),
    legend.text = element_text(size = 11),
    legend.position = "right"
  )

4 Random Points Around Sleeping Sites

To assess whether monkeys are actively selecting taller canopy areas for sleeping under varying moonlight conditions, we generated random control points around each observed sleeping location. For every real site, we simulated 5 random points positioned within a radius of 400 meters, ensuring they remained within the monkeys’ likely perceptual or movement range for choosing a sleeping location. In the link Buffers and Absences we then simulated different combinations of false points per sleeping site and buffers size, trying to find the best option that balanced spatial autocorrelation and parameter stability in RSF betas.

We used the {geosphere} package to calculate these new coordinates based on random bearings (0–360°) and distances. The destPoint() function allows us to project a new coordinate a given distance and direction from a known point, taking into account the Earth’s curvature, a key advantage over simple Euclidean methods, especially in geospatial analysis.

These pseudo-absence points share the same date and lunar attributes as the original site, allowing for controlled comparisons while isolating the role of vegetation structure in sleeping site selection.

n_fake <- 5
min_dist <- 30  

set.seed(42)

pseudo_data <- moon %>%
  rowwise() %>%
  do({
    lat <- .$lat
    lon <- .$lon
    base_point <- c(lon, lat)

    fake_points <- list()
    attempts <- 0
    max_attempts <- 1000

    while (length(fake_points) < n_fake && attempts < max_attempts) {
      attempts <- attempts + 1
      
      d <- runif(1, 100, 400)
      b <- runif(1, 0, 360)
      new_point <- destPoint(p = base_point, b = b, d = d)  # lon, lat

      if (length(fake_points) == 0) {
        fake_points[[1]] <- new_point
      } else {
        existing <- do.call(rbind, fake_points)
        dists <- distGeo(new_point, existing)  

        if (all(dists > min_dist)) {
          fake_points[[length(fake_points) + 1]] <- new_point
        }
      }
    }

    coords <- do.call(rbind, fake_points)

    tibble(
      lon = coords[,1],
      lat = coords[,2],
      fecha = .$fecha,
      group = .$group,
      moon_fraction = .$moon_fraction,
      moon_phase_angle = .$moon_phase_angle,
      moon_phase_name = .$moon_phase_name,
      moon_visible = .$moon_visible,
      used = FALSE
    )
  }) %>%
  ungroup()

After generating pseudo-absence (control) points around each real sleeping site, we merged both datasets into a single unified table. The original sleeping sites were labeled with used = TRUE, while the randomly generated points were marked as used = FALSE.

Each entry, whether real or simulated, includes: - Spatial coordinates (lon, lat) - Group identity - Date of observation - Moon illumination metrics (moon_fraction, moon_phase_angle, etc.) - Whether the moon was visible that night - A usage indicator (used)

This combined dataset (moon_all) allows for controlled comparisons between actual monkey sleeping locations and randomly available but unused alternatives. These comparisons are central to testing whether canopy structure or lunar conditions significantly influence sleeping site choice.

moon_used <- moon %>%
  mutate(used = TRUE) %>%
  select(lon, lat, fecha, group, moon_fraction, moon_phase_angle,
         moon_phase_name, moon_visible, used)


moon_all <- bind_rows(moon_used, pseudo_data)

To enable grouped comparisons between used and available points for each sleeping event, we created a unique identifier (id) by combining monkey group and date (group_fecha). This identifier represents a single nightly sleeping site selection event for a given group.

Using dplyr::cur_group_id(), we assigned a numeric ID to each group-date combination, which is shared across the real sleeping site and its 10 associated random control points. This structure is essential for conditional logistic regression or matched comparisons, ensuring that each set of observations is analyzed within its ecological context.

What this script does This routine builds a combined dataset moon_all by appending, for each real location in moon, a set of spatially “fake” points that:
- are generated at a random bearing and distance from the real point,
- respect a minimum separation among themselves,
- are excluded if they fall inside a mask shapefile (rivers / buildings without trees), and
- inherit the original row’s contextual fields (date, group, moon variables), while being flagged as used = FALSE.

Why this is necessary? The fake points act as spatial controls for downstream analyses (e.g., comparing canopy height or NDVI around true versus pseudo locations) while avoiding non-vegetated or otherwise invalid areas defined by your mask.

Key steps 1. Load mask (ESRI Shapefile) with sf, ensure valid geometries, union, and transform to WGS84 (EPSG:4326).
2. Per-row generator: for each real point, repeatedly sample a random distance d (100–400 m) and bearing b, compute a candidate coordinate with geosphere::destPoint, and:
- discard if it intersects the mask (st_intersects),
- discard if it violates the minimum inter-fake distance (distGeo),
- optionally discard if too close to the base point (min_d_base).
3. Retry logic: keep sampling until reaching n_fake accepted points or max_attempts.
4. Bind outputs:
- moon_used = original data with used = TRUE (only common columns kept),
- pseudo_data = accepted fake points with used = FALSE,
- moon_all = bind_rows(moon_used, pseudo_data).

Assumptions & details - CRS: All geodesic computations (destPoint, distGeo) assume WGS84. The mask is transformed to EPSG:4326.
- Reproducibility: Set with set.seed(42).
- Performance: Mask is unioned once to a single geometry for fast intersection checks.
- Graceful failure: Some rows may produce fewer than n_fake points if constraints are tight.

n_fake      <- 5       
min_dist    <- 30      
min_d_base  <- 0       
max_attempts <- 1000   
set.seed(42)

mask_path <- "mask.shp"

mask_raw <- st_read(mask_path, quiet = TRUE)

mask_union <- mask_raw |>
  st_make_valid() |>
  st_union() |>
  st_transform(4326)  # WGS84 lon/lat


gen_fake_for_row <- function(lon, lat, fecha, group, moon_fraction, moon_phase_angle,
                             moon_phase_name, moon_visible) {
  if (is.na(lon) || is.na(lat)) return(tibble())

  base_point <- c(lon, lat)

  fake_points <- list()
  attempts <- 0

  while (length(fake_points) < n_fake && attempts < max_attempts) {
    attempts <- attempts + 1

    d <- runif(1, 100, 400)     
    b <- runif(1, 0, 360)       
    new_point <- destPoint(p = base_point, b = b, d = d) 

  
    if (min_d_base > 0) {
      if (distGeo(new_point, base_point) <= min_d_base) next
    }

    pt_sf <- st_sfc(st_point(new_point), crs = 4326)
    in_mask <- as.logical(st_intersects(pt_sf, mask_union, sparse = FALSE))
    if (in_mask) next

    if (length(fake_points) == 0) {
      fake_points[[1]] <- new_point
    } else {
      existing <- do.call(rbind, fake_points)  
      dists <- distGeo(new_point, existing)   
      if (all(dists > min_dist)) {
        fake_points[[length(fake_points) + 1]] <- new_point
      } else {
        next
      }
    }
  }

  if (length(fake_points) == 0) {
    return(tibble())
  }

  coords <- do.call(rbind, fake_points)
  tibble(
    lon = coords[, 1],
    lat = coords[, 2],
    fecha = fecha,
    group = group,
    moon_fraction = moon_fraction,
    moon_phase_angle = moon_phase_angle,
    moon_phase_name = moon_phase_name,
    moon_visible = moon_visible,
    used = FALSE
  )
}


pseudo_data <- moon %>%
  mutate(row_id__ = row_number()) %>%
  pmap_dfr(~ gen_fake_for_row(
    lon = ..1, lat = ..2, fecha = ..3, group = ..4,
    moon_fraction = ..5, moon_phase_angle = ..6,
    moon_phase_name = ..7, moon_visible = ..8
  ))


gen_count <- pseudo_data %>%
  count(fecha, group, name = "n_fake_generated")

if (any(gen_count$n_fake_generated < n_fake)) {
  message("Atención: algunas filas no alcanzaron a generar los ", n_fake,
          " puntos falsos dentro de ", max_attempts, " intentos (según restricciones).")
}


moon_used <- moon %>%
  mutate(used = TRUE) %>%
  select(lon, lat, fecha, group, moon_fraction, moon_phase_angle,
         moon_phase_name, moon_visible, used)

moon_all <- bind_rows(moon_used, pseudo_data)

moon_all <- moon_all %>%
  mutate(event_key = paste(group, fecha, sep = "_")) %>%
  group_by(event_key) %>%
  mutate(id = cur_group_id()) %>%
  ungroup() %>%
  select(-event_key)

4.1 Example of a Sleeping Site Selection Event

To illustrate how random control points are spatially distributed around an actual sleeping site, we visualized a single selection event (one group on one night). The focal site (orange) and its random alternatives (gray) are plotted on top of the canopy height raster, cropped to a ~260 m × 260 m window centered on the true location.

A dashed red circle indicates the 400-meter radius within which the control points were generated, while a horizontal arrow provides a visual scale reference (“400 m”). A north indicator is also included for orientation.

This visualization helps communicate the spatial logic behind the pseudo-absence design: real choices are compared to nearby alternatives that share the same lunar and temporal context, isolating the potential effect of vegetation structure.

target_id <- moon_all %>% filter(used) %>% slice(1) %>% pull(id)
subset_points <- moon_all %>% filter(id == target_id)

pt_real <- subset_points %>% filter(used) %>%
  select(lon, lat) %>% unlist()

zoom_ext <- ext(pt_real["lon"] - 0.0063, pt_real["lon"] + 0.0063,
                pt_real["lat"] - 0.0063, pt_real["lat"] + 0.0063)

canopy_focus <- crop(canopy, zoom_ext)

plot(canopy_focus, col = greens(100),
     main = paste("Zoomed Canopy Height - ID", target_id),
     axes = TRUE)

with(subset_points %>% filter(!used),
     points(lon, lat, pch = 21, bg = "gray80", col = "black", cex = 1.4))

with(subset_points %>% filter(used),
     points(lon, lat, pch = 21, bg = "darkorange", col = "black", cex = 1.8))

circle_100 <- destPoint(pt_real, b = seq(0, 360, length.out = 100), d = 400)
lines(circle_100, col = "darkred", lty = 3, lwd = 2)

scale_line <- destPoint(pt_real, b = 90, d = 400) 


x0 <- pt_real["lon"] 
y0 <- pt_real["lat"] + 0.0011 
x1 <- scale_line[1]
y1 <- y0

segments(x0, y0, x1, y1, col = "black", lwd = 2)

arrows(x0, y0, x1, y1, length = 0.05, angle = 90, code = 3, col = "black")

text(x = mean(c(x0, x1)), y = y0 - 0.001, labels = "400 m", cex = 0.9)

text(x = zoom_ext[1] + 0.0002, y = zoom_ext[4] - 0.0002, labels = "↑ N", cex = 1.1)

Figure: Example of a sleeping site selection event. The real sleeping site (orange) is surrounded by 10 randomly generated control points (gray), constrained within a 400-meter radius (red dashed circle). Background raster shows canopy height. A horizontal arrow denotes a 400 m scale bar, and a north indicator provides spatial orientation.

4.2 Contrasting Sleeping Site Events: Dark vs. Bright Moonlight

To visually compare sleeping site selection patterns under contrasting lunar conditions, we randomly selected 8 sleeping events: 4 occurred under low moon illumination (< 0.2 moon fraction) and 4 under high moon illumination (> 0.8). For each event, the real sleeping site (orange) and its associated pseudo-absence points (gray) are plotted within a 400-meter radius, overlaid on high-resolution canopy height data.

This side-by-side layout enables visual inspection of whether monkeys tend to select taller canopy during darker nights, possibly for predator avoidance or concealment. The consistent structure across panels, including a 400 m scale bar and north orientation, facilitates direct ecological comparison between conditions.

These visualizations set the stage for formal statistical testing of habitat selection as a function of moonlight and vegetation structure.

set.seed(135)

ids_dark <- moon_all %>%
  filter(used, moon_fraction < 0.2) %>%
  distinct(id) %>%
  slice_sample(n = 4) %>%
  pull(id)

ids_bright <- moon_all %>%
  filter(used, moon_fraction > 0.8) %>%
  distinct(id) %>%
  slice_sample(n = 4) %>%
  pull(id)

ids_selected <- c(ids_dark, ids_bright)
breaks_fixed <- seq(10, 26, by = 2)
cols_fixed <- greens(length(breaks_fixed) - 1)
par(mfrow = c(2, 4), mar = c(3, 3, 2, 1), cex = 1.2)

for (target_id in ids_selected) {
  
  subset_points <- moon_all %>% filter(id == target_id)
  pt_real <- subset_points %>% filter(used) %>%
    select(lon, lat) %>% unlist()
  
  zoom_ext <- ext(pt_real["lon"] - 0.0053, pt_real["lon"] + 0.0053,
                  pt_real["lat"] - 0.0053, pt_real["lat"] + 0.0053)
  
  canopy_focus <- crop(canopy, zoom_ext)
  
  plot(canopy_focus,
     col = cols_fixed,
     breaks = breaks_fixed,
     main = paste0("ID ", target_id,
                   "\nMoon: ", round(unique(subset_points$moon_fraction), 2)),
     axes = FALSE,
     legend = TRUE)
  
  with(subset_points %>% filter(!used),
       points(lon, lat, pch = 21, bg = "gray80", col = "black", cex = 1.6))
  
  with(subset_points %>% filter(used),
       points(lon, lat, pch = 21, bg = "darkorange", col = "black", cex = 1.6))
  
  circle_100 <- destPoint(pt_real, b = seq(0, 360, length.out = 100), d = 400)
  lines(circle_100, col = "darkred", lty = 3, lwd = 4)
  
  x0 <- pt_real["lon"] + 0.0053
  y0 <- pt_real["lat"] + 0.0062
  scale_line <- destPoint(c(x0, y0), b = 90, d = 400)
  x1 <- scale_line[1]
  y1 <- y0
  
  segments(x0, y0, x1, y1, col = "black", lwd = 2)
  arrows(x0, y0, x1, y1, length = 0.05, angle = 90, code = 3, col = "black")
  text(mean(c(x0, x1)), y0 - 0.0005, "400 m")
  
  text(x = zoom_ext[1] + 0.0002, y = zoom_ext[4] - 0.0002, labels = "↑ N")
}

Examples of sleeping site selection events under contrasting moonlight conditions. Each panel shows a real sleeping site (orange) and 10 random control points (gray) within a 100 m radius (red dashed circle), overlaid on local canopy height. Four events occurred during low moon illumination (< 0.2), and four during high illumination (> 0.8). A 100-meter scale bar and north indicator are included for spatial reference.

5 Effective Moonlight Calculation

To better capture the ecological relevance of nocturnal lunar illumination at sleeping sites, we computed an index of effective moonlight, combining two key astronomical components:

Moon illumination fraction: the proportion of the moon’s disc that is lit.
Moon altitude: its elevation above the horizon, measured at a standardized hour… in this case we used three moments during the night.

We used the suncalc package to calculate moon altitude for each observation, and then estimated effective moonlight as:

\[ \text{moon light effective} = \text{moon_fraction} \times max(0, \sin(\text{moon_altitude})) \]

This formulation ensures that only moons above the horizon contribute to illumination (values below 0 are truncated), and that altitude amplifies the effect of the moon’s light, since a higher moon faces less atmospheric scattering and attenuation.

From an ecological perspective, this metric serves as a biologically meaningful proxy for perceived lunar brightness, which may influence behavioral decisions like sleeping site selection, foraging activity, or predator avoidance among nocturnal or crepuscular species.

To maintain consistency across observations and ensure comparability, we calculate the effective moonlight at three fixed time 20:00 PM (of the previous day) UTC, 00:00 AM and 04:00 AM each night. This standardized time represents a midpoint when nocturnal primates are typically active and when the moon, if present, is often near its zenith. Using a single reference hour simplifies comparisons by avoiding variation due to changing night lengths, moon rise/set times, or unequal sampling times across dates.

Although this approach does not capture the complete nightly dynamics of lunar brightness, it effectively reflects relative nighttime light conditions across the lunar cycle and seasons. The moon’s altitude—and thus light intensity—varies seasonally (with higher maximum elevations in winter) and cyclically (across lunar phases), so comparing a consistent hour still captures biologically relevant variation at international and annual scales.

Several ecological studies have employed similar approximations. For example, research on black swifts and small songbirds found that both the illuminated fraction and altitude of the moon are strong predictors of nocturnal behavior (e.g., Prinz et al. 2025. In predator–prey systems, models using moon phase and elevation have illuminated habitat selection shifts in felids like ocelots and bobcats (Sergeyev et al. 2023).

We calculated moon altitude at a fixed time (~ 02:00 AM UTC) to standardize our measure of nocturnal lunar illumination. This approach helps us compare different nights and sites under consistent conditions—critical when moonrise, moonset, and night length vary seasonally and by phase.

While this does not capture the full nightly variation in moonlight, it effectively reflects relative illumination across the lunar cycle and seasonal changes in the moon’s elevation. For instance, Śmielak (2023) demonstrated that models combining moon phase and altitude explain > 92% of the variability in actual ground-level moonlight, outperforming phase alone (∼ 60%)

That said, some studies use moon altitude and phase dynamically throughout the night. To better approximate overall lunar illumination, one could sample multiple time points (e.g. 20:00, 00:00, and 04:00 local time) or compute a nightly average of moon_altitude and illumination.

This alternative method—sampling at multiple night-time intervals, would provide a more complete picture of illumination dynamics, especially in studies interested in behavior spanning the entire night. However, it also increases computational complexity and may require integrating over longer periods.

moon_all <- moon_all %>%
  mutate(
    fecha_a = as.POSIXct(paste(fecha - 1, "20:00:00"), tz = "UTC"),
    fecha_b = as.POSIXct(paste(fecha, "00:00:00"), tz = "UTC"),
    fecha_c = as.POSIXct(paste(fecha, "04:00:00"), tz = "UTC"))

pos_a <- getMoonPosition(data = moon_all %>% rename(date = fecha_a) %>% select(date, lat, lon))
pos_b <- getMoonPosition(data = moon_all %>% rename(date = fecha_b) %>% select(date, lat, lon))
pos_c <- getMoonPosition(data = moon_all %>% rename(date = fecha_c) %>% select(date, lat, lon))

alt_scaled_a <- pmax(0, sin(pos_a$altitude))
alt_scaled_b <- pmax(0, sin(pos_b$altitude))
alt_scaled_c <- pmax(0, sin(pos_c$altitude))

moon_all <- moon_all %>%
  mutate(
    moon_altitude_avg = (alt_scaled_a + alt_scaled_b + alt_scaled_c) / 3,
    moon_light_effective_avg = moon_fraction * moon_altitude_avg)

moon_all <- moon_all %>%
  filter(used ==TRUE, moon_light_effective_avg <= 0.6) %>% 
  mutate(moon_bin = cut(
    moon_light_effective_avg,
    breaks = c(0, 0.02, 0.05, 0.1, 0.2, 0.4, 0.6),
    labels = c("0–0.02", "0.02–0.05", "0.05–0.1", "0.1–0.2", "0.2–0.4", "0.4–0.6"),
    include.lowest = TRUE
  ))

moon_palette <- c(
  "0–0.02"    = "#000000",
  "0.02–0.05" = "#303030",
  "0.05–0.1"  = "#5c5c5c",
  "0.1–0.2"   = "#a0522d",
  "0.2–0.4"   = "#d2691e",
  "0.4–0.6"   = "#ffae00"
)

p_main <- ggplot(moon_all, aes(x = canopy_height, y = moon_bin, fill = moon_bin)) +
  geom_density_ridges(scale = 3, alpha = 0.9, color = "white", size = 0.3) +
  scale_fill_manual(values = moon_palette, drop = FALSE) +
  labs(
    x = "Canopy Height (m)",
    y = " "
  ) +
  theme_minimal(base_size = 14) +
  theme(
    legend.position = "none",
    axis.text.y = element_blank(),
    plot.margin = margin(8, 60, 8, 0))

grad_df <- data.frame(
  y = seq(0.05, 0.7, length.out = 400),
  x = 1
)

grad_plot <- ggplot(grad_df, aes(x = x, y = y, fill = y)) +
  geom_tile(width = 0.04) +  # más delgado
  scale_fill_gradientn(
    colours = c("#000000", "#303030", "#5c5c5c", "#a0522d", "#ffae00", "#ffae00"),
    limits = c(0, 0.8),
    guide = "none"
  ) +
  scale_y_continuous(
    breaks = seq(0.1, 0.7, by = 0.2),
    labels = c("0.1", "0.3", "0.5", "0.7"),
    limits = c(0.05, 0.75),
    expand = c(0, 0)
  ) +
  theme_void() +
  annotate("text", x = 0.96, y = 0.4, label = "Moonlight", angle = 90, size = 5.2,
           hjust = 0.5) +
  
  annotate("text", x = 1.05, y = seq(0.1, 0.7, by = 0.1),
           label = c("0", "0.1", "0.2", "0.3", "0.4", "0.5", "0.6"), 
           hjust = 1, size = 3.9) +
  theme(
    plot.margin = margin(10, 10, 10, 15))

final_plot <- grad_plot + p_main + plot_layout(widths = c(1.3, 6.2))

final_plot

$Distribution of canopy heights selected for sleeping across a gradient of moonlight intensity. Each horizontal density ridge represents the distribution of canopy heights used as roosts at different moonlight intervals (fraction of moonlight: 0 to 0.6). Color shading reflects increasing moonlight intensity, from darkest (black, left) to brightest (gold, right). A vertical color bar summarizes the moonlight gradient used for binning.$

Distribution of canopy heights selected for sleeping across a gradient of moonlight intensity. Each horizontal density ridge represents the distribution of canopy heights used as roosts at different moonlight intervals (fraction of moonlight: 0 to 0.6). Color shading reflects increasing moonlight intensity, from darkest (black, left) to brightest (gold, right). A vertical color bar summarizes the moonlight gradient used for binning.

moon_summary <- moon_all %>%
  filter(used == TRUE, moon_light_effective_avg <= 0.6) %>% 
  group_by(moon_phase_name) %>%
  summarise(
    Mean_Height = mean(canopy_height, na.rm = TRUE),
    SD_Height   = sd(canopy_height, na.rm = TRUE),
    IQR_Height  = IQR(canopy_height, na.rm = TRUE),
    Min_Height  = min(canopy_height, na.rm = TRUE),
    Max_Height  = max(canopy_height, na.rm = TRUE),
    Range_Height = Max_Height - Min_Height,
    
    Mean_NDVI = mean(ndvi_sentinel, na.rm = TRUE),
    SD_NDVI   = sd(ndvi_sentinel, na.rm = TRUE),
    IQR_NDVI  = IQR(ndvi_sentinel, na.rm = TRUE),
    Min_NDVI  = min(ndvi_sentinel, na.rm = TRUE),
    Max_NDVI  = max(ndvi_sentinel, na.rm = TRUE),
    Range_NDVI = Max_NDVI - Min_NDVI,
    
    N = n(),
    .groups = "drop"
  ) %>%
  mutate(across(where(is.numeric), ~round(., 2)))

moon_ft <- flextable(moon_summary) %>%
  set_header_labels(
    moon_phase_name = "Moon Phase",
    Mean_Height = "Mean Height (m)",
    SD_Height = "SD Height",
    IQR_Height = "IQR Height",
    Min_Height = "Min Height (m)",
    Max_Height = "Max Height (m)",
    Range_Height = "Range Height (m)",
    
    Mean_NDVI = "Mean NDVI",
    SD_NDVI = "SD NDVI",
    IQR_NDVI = "IQR NDVI",
    Min_NDVI = "Min NDVI",
    Max_NDVI = "Max NDVI",
    Range_NDVI = "Range NDVI",
    
    N = "n"
  ) %>%
  autofit() %>%
  theme_booktabs() %>%
  align(align = "center", part = "all")

moon_ft

Moon Phase	Mean Height (m)	SD Height	IQR Height	Min Height (m)	Max Height (m)	Range Height (m)	Mean NDVI	SD NDVI	IQR NDVI	Min NDVI	Max NDVI	Range NDVI	n
Full Moon	17.27	3.14	4	7	23	16	0.54	0.08	0.06	0.12	0.65	0.53	67
New Moon	24.22	1.54	2	21	28	7	0.56	0.07	0.05	0.18	0.67	0.48	87
Waning	21.25	3.71	6	12	29	17	0.56	0.04	0.05	0.42	0.63	0.22	114
Waxing	21.05	4.11	6	5	28	23	0.54	0.12	0.08	-0.14	0.66	0.80	113

moon_summary <- moon_all %>%
  filter(used == TRUE, moon_light_effective_avg <= 0.6) %>%
  group_by(moon_bin) %>%
  summarise(
    # Altura
    Mean_Height  = mean(canopy_height, na.rm = TRUE),
    SD_Height    = sd(canopy_height, na.rm = TRUE),
    IQR_Height   = IQR(canopy_height, na.rm = TRUE),
    Min_Height   = min(canopy_height, na.rm = TRUE),
    Max_Height   = max(canopy_height, na.rm = TRUE),
    Range_Height = max(canopy_height, na.rm = TRUE) - min(canopy_height, na.rm = TRUE),

    # NDVI
    Mean_NDVI  = mean(ndvi_sentinel, na.rm = TRUE),
    SD_NDVI    = sd(ndvi_sentinel, na.rm = TRUE),
    IQR_NDVI   = IQR(ndvi_sentinel, na.rm = TRUE),
    Min_NDVI   = min(ndvi_sentinel, na.rm = TRUE),
    Max_NDVI   = max(ndvi_sentinel, na.rm = TRUE),
    Range_NDVI = max(ndvi_sentinel, na.rm = TRUE) - min(ndvi_sentinel, na.rm = TRUE),

    N = n(),
    .groups = "drop"
  ) %>%
  mutate(across(where(is.numeric), ~ round(., 2)))

moon_ft <- flextable(moon_summary) %>%
  set_header_labels(
    moon_bin = "Moonlight bin",
    Mean_Height  = "Mean Height (m)",
    SD_Height    = "SD Height",
    IQR_Height   = "IQR Height",
    Min_Height   = "Min Height (m)",
    Max_Height   = "Max Height (m)",
    Range_Height = "Range Height (m)",
    Mean_NDVI  = "Mean NDVI",
    SD_NDVI    = "SD NDVI",
    IQR_NDVI   = "IQR NDVI",
    Min_NDVI   = "Min NDVI",
    Max_NDVI   = "Max NDVI",
    Range_NDVI = "Range NDVI",
    N = "n"
  ) %>%
  autofit() %>%
  theme_booktabs() %>%
  align(align = "center", part = "all")

moon_ft

Moonlight bin	Mean Height (m)	SD Height	IQR Height	Min Height (m)	Max Height (m)	Range Height (m)	Mean NDVI	SD NDVI	IQR NDVI	Min NDVI	Max NDVI	Range NDVI	n
0–0.02	23.86	2.08	2.00	15	29	14	0.57	0.06	0.05	0.18	0.67	0.48	145
0.02–0.05	22.64	2.97	2.00	15	26	11	0.57	0.04	0.05	0.47	0.62	0.14	25
0.05–0.1	20.59	3.45	6.00	13	26	13	0.56	0.05	0.07	0.45	0.65	0.21	29
0.1–0.2	21.08	3.90	6.25	12	27	15	0.57	0.05	0.05	0.43	0.65	0.22	52
0.2–0.4	18.33	4.19	5.00	5	28	23	0.52	0.12	0.07	-0.14	0.66	0.80	84
0.4–0.6	17.54	2.75	4.00	13	23	10	0.53	0.10	0.09	0.25	0.65	0.41	46

First lets ckeck for a simpler moonlight metric: moon fraction, then the effective moonlight

ggplot(moon_all, aes(x = moon_fraction, y = canopy_height)) +
  geom_jitter(
    aes(fill = used),
    shape = 21, size = 3, stroke = 0.3, 
    alpha = 0.05, color = "black",
    width = 0.03,
    show.legend = TRUE) +
  geom_smooth(
    aes(color = used),
    method = "lm",
    se = TRUE,
    size = 1.9
  ) +
  scale_fill_manual(
    values = c("TRUE" = "#E1B000", "FALSE" = "#1F3552"),
    labels = c( "Available", "Used"),
    name = "Points"
  ) +
  scale_color_manual(
    values = c("TRUE" = "#E1B000", "FALSE" = "#1F3552"),
    labels = c( "Available", "Used"),
    name = "Trend"
  ) +
  guides(
    fill = guide_legend(override.aes = list(alpha = 1)),
    color = guide_legend(override.aes = list(size = 1.5))
  ) +
  labs(
    x = "Moon Fraction",
    y = "Canopy Height (m)"
  ) +
  theme_classic(base_size = 14) +
  theme(
    plot.title = element_text(size = 15, face = "bold"),
    axis.title = element_text(),
    legend.title = element_text(face = "bold"),
    legend.position = "right"
  )

ggplot(moon_all, aes(x = moon_light_effective_avg, y = canopy_height)) +
  geom_jitter(
    aes(fill = used),
    shape = 21, size = 3, stroke = 0.3, 
    alpha = 0.05, color = "black",
    width = 0.03,
    show.legend = TRUE) +
  geom_smooth(
    aes(color = used),
    method = "lm",
    se = TRUE,
    size = 1.9
  ) +
  scale_fill_manual(
    values = c("TRUE" = "#E1B000", "FALSE" = "#1F3552"),
    labels = c( "Available", "Used"),
    name = "Points"
  ) +
  scale_color_manual(
    values = c("TRUE" = "#E1B000", "FALSE" = "#1F3552"),
    labels = c( "Available", "Used"),
    name = "Trend"
  ) +
  guides(
    fill = guide_legend(override.aes = list(alpha = 1)),
    color = guide_legend(override.aes = list(size = 1.5))
  ) +
  labs(
    title = "Effective Moonlight vs. Canopy Height",
    x = "Effective Moonlight Index",
    y = "Canopy Height (m)"
  ) +
  theme_classic(base_size = 14) +
  theme(
    plot.title = element_text(size = 15, face = "bold"),
    axis.title = element_text(),
    legend.title = element_text(face = "bold"),
    legend.position = "right"
  )

$Relationship between effective moonlight and canopy height across sleeping site observations. Points represent both used (orange) and available (blue-gray) locations, while lines show linear trends per group. The effective moonlight index combines lunar illumination fraction and altitude to approximate actual night-time light availability.$

Relationship between effective moonlight and canopy height across sleeping site observations. Points represent both used (orange) and available (blue-gray) locations, while lines show linear trends per group. The effective moonlight index combines lunar illumination fraction and altitude to approximate actual night-time light availability.

ggplot(moon_all %>% filter(used == TRUE), aes(x = moon_light_effective_avg, y = canopy_height)) +
  geom_point(
    aes(fill = used),
    shape = 21, size = 5, stroke = 0.3,
    alpha = 0.1, color = "goldenrod4",
    show.legend = TRUE
  ) +
  geom_smooth(
    aes(color = used),
    method = "lm",
    se = TRUE,
    size = 2.9
  ) +
  scale_fill_manual(
    values = c("TRUE" = "#E1B000"),
    labels = c("Used"),
    name = "Points"
  ) +
  scale_color_manual(
    values = c("TRUE" = "goldenrod4"),
    labels = c("Used"),
    name = "Trend"
  ) +
  guides(
    fill = guide_legend(override.aes = list(alpha = 1)),
    color = guide_legend(override.aes = list(size = 1.5))
  ) +
  coord_cartesian(ylim = c(0, 30)) +
  labs(
    title = "Effective Moonlight vs. Canopy Height",
    x = "Effective Moonlight Index",
    y = "Canopy Height (m)"
  ) +
  theme_classic(base_size = 14) +
  theme(
    plot.title = element_text(size = 15, face = "bold"),
    axis.title = element_text(),
    legend.title = element_text(face = "bold"),
    legend.position = "right"
  )

6 Weather

To incorporate environmental context into our analysis of sleeping site selection, we obtained daily weather data from a meteorological station operated by the Argentine National Meteorological Service (SMN). This station is located approximately 7 km from the study area and provides reliable long-term climatic records.

The dataset includes maximum, minimum, and mean daily temperatures, as well as daily precipitation (mm). We parsed the date field using the lubridate::dmy() function and converted all numeric fields to proper numeric types for further analysis. These data allow us to explore whether rainfall patterns influence the characteristics of sleeping site selection, particularly with regard to vegetation structure and canopy cover.

As precipitation may affect microhabitat conditions, such as humidity, substrate moisture, and thermal regulation, it is plausible that primates modulate their choice of sleeping sites in response to recent rainfall events.

weather <- read.csv("clima.csv")

weather <- weather %>%
  mutate(
    fecha = dmy(fecha),  # formato "d/m/Y"
    t_max = as.numeric(t_max),
    t_min = as.numeric(t_min),
    t_mean = as.numeric(t_mean),
    pp = as.numeric(pp)
  )

6.1 Weather on Sleeping Site

To investigate how recent rainfall may influence sleeping site structure, we joined the weather dataset to our spatial records using the fecha (date) field as a key. Specifically, we added the daily precipitation amount (pp, in mm) to the moon_all dataset:

moon_all <- moon_all %>%
  left_join(weather %>% select(fecha, pp), by = "fecha")

We then categorized daily rainfall into three meaningful bins: no rain (0 mm), light rain (1–5 mm), and heavy rain (>5 mm). This allowed us to visually explore whether rainfall levels interact with canopy height and effective moonlight in driving sleeping site selection.

The resulting plot combines:

Jittered points, filled by rainfall category, to represent variation in conditions across nights
Separate linear trends for used and available sites, helping assess how rainfall may modulate site choice in different light environments

Together, this visualization provides insight into whether monkeys prefer different canopy structures under varying light and moisture conditions, potentially reflecting thermoregulatory or safety tradeoffs.

moon_all <- moon_all %>%
  left_join(weather %>% select(fecha, pp), by = "fecha")

moon_all <- moon_all %>%
  mutate(pp_cat = cut(
    pp,
    breaks = c(-0.01, 0, 5, Inf),
    labels = c("0 mm", "1–5 mm", ">5 mm")
  ))

ggplot(moon_all, aes(x = moon_light_effective_avg, y = canopy_height)) +
  geom_jitter(
    aes(fill = pp_cat),
    shape = 21, size = 3, stroke = 0.3,
    alpha = 0.5, color = "black", width = 0.03
  ) +
  geom_smooth(
    aes(color = used),
    method = "lm", se = TRUE, size = 1.6
  ) +
  scale_fill_manual(
    values = c("0 mm" = "gray80", "1–5 mm" = "#6baed6", ">5 mm" = "steelblue4"),
    name = "Precipitation"
  ) +
  scale_color_manual(
    values = c("TRUE" = "#E1B000", "FALSE" = "#1F3552"),
    labels = c("Used", "Available"),
    name = "Trend"
  ) +
  labs(
    title = "Canopy Height vs. Effective Moonlight and Precipitation",
    x = "Effective Moonlight Index",
    y = "Canopy Height (m)"
  ) +
  theme_classic(base_size = 14) +
  theme(
    plot.title = element_text(face = "bold", size = 15),
    legend.title = element_text(face = "bold")
  )

Figure: Relationship between canopy height and effective moonlight under varying precipitation levels. Points are colored by precipitation categories recorded on the same date: no rain (gray), light rain (blue), and moderate to heavy rain (dark blue). Trend lines show linear patterns for used (yellow) and available (blue) locations. While moonlight may influence site selection, canopy height patterns do not appear strongly driven by precipitation.

ggplot(moon_all, aes(x = pp, y = canopy_height, color = used)) +
  geom_jitter(
    shape = 21, fill = "#6baed6", color = "black",
    alpha = 0.1, width = 0.3, size = 2.6
  ) +
  geom_smooth(method = "lm", se = TRUE, size = 2.5) +
  scale_color_manual(
    values = c("TRUE" = "#E1B000", "FALSE" = "#1F3552"),
    labels = c("Used", "Available"),
    name = "Trend"
  ) +
  labs(
    title = "Canopy Height vs. Precipitation",
    x = "Precipitation (mm)",
    y = "Canopy Height (m)"
  ) +
  theme_classic(base_size = 14) +
  theme(
    plot.title = element_text(face = "bold"),
    legend.title = element_text(face = "bold")
  )

Figure: Relationship between daily precipitation and canopy height at sleeping sites. Each point represents a used (yellow) or available (blue) location on a given date, with precipitation recorded at a nearby weather station. Smoother lines represent linear trends for each category. Although precipitation varies substantially, no strong association with canopy height is evident across categories.

7 NDVI

Breif short summary to understand why is importnate to include NDVI as covariate. In this analysis, we aim to understand the sleeping site selection of capuchin monkeys in relation to moon illumination. Previous analyses used a canopy height map (Lang et al., 2022) as a static layer representing forest structure. However, to better understand how light penetrates the forest canopy—especially under varying lunar conditions it is essential to incorporate a dynamic vegetation indicator: the Normalized Difference Vegetation Index (NDVI).

NDVI provides a proxy for vegetation density and health, which directly affects canopy cover and thus the amount of moonlight reaching the forest floor. Because NDVI varies across seasons and years (e.g., due to leaf phenology, drought, or other disturbances), using a time-matched NDVI raster for each sleeping location allows us to better model the light environment experienced by the monkeys on each night.

To accomplish this, we are:

Downloading multiple Sentinel-2 L2A images (surface reflectance, 10 m resolution) from the Copernicus Data Space Environment, covering our full study area.

Selecting images with low cloud cover (≤10%), spanning across different seasons and years that match the range of our sampling dates.

Using the terra package in R to calculate NDVI from the red (B4) and near-infrared (B8) bands for each image.

Assigning each monkey sleeping site (with known date and location) to the closest NDVI raster in time, so that the vegetation index reflects the actual conditions on the night of observation.

This will allow us to quantify how vegetation density (NDVI) relates to moonlight penetration, providing a more complete picture of the ecological and structural factors that influence sleeping site selection in capuchin monkeys.

Using Sentinel-2 Imagery to Estimate NDVI in Winter

To estimate vegetation density during the winter season, we use Sentinel-2 Level-2A imagery from the Copernicus Data Space. These images provide surface reflectance values corrected for atmospheric effects and are organized in a .SAFE folder structure that contains all the spectral bands at different spatial resolutions.

For NDVI (Normalized Difference Vegetation Index) calculations, two key bands are required:

B04 – Red band (665 nm), 10-meter resolution
B08 – Near-infrared (NIR) band (842 nm), 10-meter resolution

Inside each .SAFE product, these bands are located under the following path:

GRANULE/L2A_Txxxxxxx/IMG_DATA/R10m/

In this analysis, we use the image:

T21JYM_20220709T133839_B04_10m.jp2
T21JYM_20220709T133839_B08_10m.jp2

This image corresponds to July 9, 2022, and is used as a reference for winter NDVI conditions in our study area.

The NDVI is calculated as:

NDVI = (B08 - B04) / (B08 + B04)

Higher NDVI values typically indicate more photosynthetically active vegetation, while lower values may correspond to sparse vegetation, bare ground, or shadows. This seasonal NDVI will later be matched with monkey sleeping locations to assess the potential effect of vegetation density on moonlight exposure.

Note: this is an example, In the link Buffers and Absences I added the full script looping on all rasters to match the right one with the closer date.

b04_winter <- rast("T21JYM_20220709T133839_B04_10m.jp2")
b08_winter <- rast("T21JYM_20220709T133839_B08_10m.jp2")
ndvi_winter <- (b08_winter - b04_winter) / (b08_winter + b04_winter)

## |---------|---------|---------|---------|=========================================                                          |---------|---------|---------|---------|=========================================                                          |---------|---------|---------|---------|=========================================

Ok, autumn

b04_aut <- rast("T21JYM_20160317T134032_B04_10m.jp2")
b08_aut <- rast("T21JYM_20160317T134032_B08_10m.jp2")
ndvi_aut <- (b08_aut - b04_aut) / (b08_aut + b04_aut)

## |---------|---------|---------|---------|=========================================                                          |---------|---------|---------|---------|=========================================                                          |---------|---------|---------|---------|=========================================

summer

b04_sum <- rast("T21JYM_20221201T133841_B04_10m.jp2")
b08_sum <- rast("T21JYM_20221201T133841_B08_10m.jp2")
ndvi_sum <- (b08_sum - b04_sum) / (b08_sum + b04_sum)

## |---------|---------|---------|---------|=========================================                                          |---------|---------|---------|---------|=========================================                                          |---------|---------|---------|---------|=========================================

7.1 Extracting Sentinel-2 NDVI

To evaluate how vegetation density (as measured by NDVI) may influence moonlight penetration at monkey sleeping sites, we extracted NDVI values from seasonal Sentinel-2 images for each observation point.

7.2 Step 1: Calculating NDVI from Sentinel-2 Bands

We used Level-2A Sentinel-2 imagery downloaded from the Copernicus Data Space, which provides surface reflectance values at 10-meter resolution. For each season, we selected a representative image with low cloud cover and calculated NDVI using the formula:

\[ NDVI = \frac{B08 - B04}{B08 + B04} \]

B04 (Red band): 665 nm
B08 (Near-infrared band): 842 nm

This computation results in a continuous raster layer of NDVI values, ranging from -1 to 1, where higher values indicate denser and more photosynthetically active vegetation.

7.3 Step 2: Assigning a Season to Each Observation

We classified each observation (sleeping site) into one of four seasons based on the date:

Summer: December to February
Autumn: March to May
Winter: June to August
Spring: September to November

This allowed us to match each observation to the appropriate seasonal NDVI raster.

7.4 Step 3: Extracting NDVI

For each group of observations (by season), we:

Selected the appropriate NDVI raster (e.g., ndvi_winter).
Reprojected the observation coordinates to match the raster’s coordinate reference system (CRS), ensuring spatial alignment.
Used terra::extract() to obtain the NDVI value at each point location.

The final result is a new column in the moon_all dataset called ndvi_sentinel, containing the NDVI value from the seasonal image most relevant to each sleeping site.

This approach ensures that each data point is associated with realistic, time-matched vegetation structure, which may modulate light conditions under moonlit nights—an important factor in understanding sleeping site selection by capuchin monkeys.

Here we will change the data in order to align them with the correct Sentinel image, approximation by season, but could be by month or week (Sentinel take the image every 5 days).

moon_all <- moon_all %>%
  mutate(season = case_when(
    month(fecha) %in% c(12, 1, 2)  ~ "summer",
    month(fecha) %in% c(3, 4, 5)   ~ "autumn",
    month(fecha) %in% c(6, 7, 8)   ~ "winter",
    month(fecha) %in% c(9, 10, 11) ~ "spring"
  ))

names(ndvi_winter) <- "ndvi"
names(ndvi_aut)    <- "ndvi"
names(ndvi_sum)    <- "ndvi"

moon_all$ndvi_sentinel <- NA_real_

for (season_name in c("winter", "autumn", "spring", "summer")) {
  
  idx <- which(moon_all$season == season_name)
  moon_sub <- moon_all[idx, ]

  ndvi_raster <- switch(season_name,
                        "winter" = ndvi_winter,
                        "autumn" = ndvi_aut,
                        "spring" = ndvi_sum,
                        "summer" = ndvi_sum)
  
  vect_sub <- vect(moon_sub, geom = c("lon", "lat"), crs = "EPSG:4326") %>%
    project(crs(ndvi_raster))
  
  ndvi_vals <- extract(ndvi_raster, vect_sub)[, 2]
  
  moon_all$ndvi_sentinel[idx] <- ndvi_vals
}

7.5 Reprojecting NDVI Raster

To extract NDVI values only within the region where capuchin monkey sleeping sites were observed, we needed to align spatial coordinates and limit the raster to a meaningful extent. This involves a few key steps:

1. Convert point data into a spatial object

The moon dataset contains geographic coordinates (longitude/latitude).
We convert it into a SpatVector using EPSG:4326 (WGS84).
Then, we reproject these points to match the coordinate reference system of the Sentinel-2 NDVI raster (ndvi_winter), which is typically in UTM.
We calculate the bounding box of the projected sleeping site points.
A buffer of 1000 meters is added to ensure the cropped raster covers slightly beyond the observation points.

bbox_vect <- vect(moon, geom = c("lon", "lat"), crs = "EPSG:4326") %>%
  project(crs(ndvi_winter))

ext_ndvi <- ext(bbox_vect) + 1000  

ndvi_zoomed <- crop(ndvi_winter, ext_ndvi)
ndvi_wgs84 <- project(ndvi_winter, "EPSG:4326")
ndvi_zoomed <- crop(ndvi_wgs84, ext_moon)

png("canopy_ndvi_panel.png", width = 1800, height = 900, res = 200)

canopy_colors <- terrain.colors(100)  

par(mfrow = c(1, 2), mar = c(4, 4, 3, 2))

plot(canopy_zoomed, col = canopy_colors,
     main = "Canopy Height",
     axes = TRUE)

with(moon, points(lon, lat,
                  pch = 21,
                  bg = group_cols[group],
                  col = "black",
                  cex = 1.2))

ndvi_colormap <- rev(viridis(100))  

plot(ndvi_zoomed, col = ndvi_colormap,
     main = "NDVI (Winter Sentinel-2)",
     axes = TRUE)

with(moon, points(lon, lat,
                  pch = 21,
                  bg = group_cols[group],
                  col = "black",
                  cex = 1.2))

par(mfrow = c(1, 1)) 
dev.off()

Similar plot to the one in precipitations, now we are categorizing by NDVI index.

moon_all <- moon_all %>%
  mutate(ndvi_cat = ntile(ndvi_sentinel, 3))  # terciles

ggplot(moon_all, aes(x = moon_light_effective_avg, y = canopy_height)) +
  geom_jitter(
    aes(fill = used),
    shape = 21, size = 3, stroke = 0.3,
    alpha = 0.05, color = "black",
    width = 0.03,
    show.legend = TRUE
  ) +
  geom_smooth(
    aes(color = used),
    method = "lm",
    se = TRUE,
    size = 1.9
  ) +
  facet_wrap(~ ndvi_cat, labeller = label_both) +
  scale_fill_manual(
    values = c("TRUE" = "#E1B000", "FALSE" = "#1F3552"),
    labels = c("Available", "Used"),
    name = "Points"
  ) +
  scale_color_manual(
    values = c("TRUE" = "#E1B000", "FALSE" = "#1F3552"),
    labels = c("Available", "Used"),
    name = "Trend"
  ) +
  labs(
    title = "Moonlight vs. Canopy Height across NDVI Levels",
    x = "Effective Moonlight Index",
    y = "Canopy Height (m)"
  ) +
  theme_classic(base_size = 14) +
  theme(
    plot.title = element_text(size = 15, face = "bold"),
    axis.title = element_text(),
    legend.title = element_text(face = "bold"),
    legend.position = "right"
  )

8 Sleeping site alternation

Sleeping-site alternation refers to the tendency of primate groups to change their sleeping location from one night to the next. This behaviour has been proposed as an adaptive strategy to reduce two major risks: predation and parasite accumulation. By frequently moving between sleeping sites, primates may avoid leaving predictable patterns that could be exploited by predators, and may also reduce exposure to faecal contamination, ectoparasites, or pathogen build-up in a single location.

Testing how sleeping-site alternation varies with ecological factors, such as moonlight or lunar phase, can reveal whether visibility at night influences the trade-off between site fidelity and movement. For example, brighter nights may improve predator detection, potentially allowing more frequent reuse of sleeping sites, whereas darker nights may encourage alternation to maintain unpredictability and hygiene.

alternation <- moon_all %>% filter (used == TRUE)
alt2 <- alternation %>% 
  arrange(group, fecha) %>% 
  group_by(group) %>% 
  mutate(lon_next   = lead(lon),
         lat_next   = lead(lat),
         fecha_next = lead(fecha),
         days_between = as.integer(fecha_next - fecha),
         dist_m = ifelse(!is.na(lon_next),
                         distHaversine(cbind(lon, lat), cbind(lon_next, lat_next)),
                         NA_real_)) %>% 
  ungroup()
alt_daily <- alt2 %>% filter(days_between == 1)
reuse_threshold <- 10

phase_levels <- c("New Moon","Waxing","Full Moon","Waning",
                  "Waxing Crescent","First Quarter","Waxing Gibbous",
                  "Waning Gibbous","Last Quarter","Waning Crescent")


alt_daily <- alt_daily %>%
  mutate(
    group = dplyr::recode(group,
                   "gue" = "Guenon",
                   "gun" = "Gundolf",
                   "mac" = "Macuco",
                   "spo" = "Spot"),
    reuse  = dist_m <= reuse_threshold,
    alternation = !reuse,
    phase = fct_relevel(factor(moon_phase_name), phase_levels)
  )

summary_tbl <- alt_daily %>%
  group_by(group, phase) %>%
  summarise(
    n_nights     = n(),
    alt_rate     = mean(alternation, na.rm = TRUE),
    moonlight_avg= mean(moon_light_effective_avg, na.rm = TRUE),
    moonlight_sd = sd(moon_light_effective_avg, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(group, phase) %>%
  mutate(
    alt_rate     = percent(alt_rate, accuracy = 0.1),
    moonlight_avg= round(moonlight_avg, 3),
    moonlight_sd = round(moonlight_sd, 3)
  )

ft <- flextable(summary_tbl) %>%
  set_header_labels(
    group          = "Group",
    phase          = "Lunar phase",
    n_nights       = "Consecutive nights",
    alt_rate       = "P(alternation)",
    moonlight_avg  = "Moonlight avg",
    moonlight_sd   = "Moonlight SD"
  ) %>%
  autofit() %>%
  theme_booktabs() %>%
  align(align = "center", part = "all")

ft

Group	Lunar phase	Consecutive nights	P(alternation)	Moonlight avg	Moonlight SD
Guenon	New Moon	17	94.1%	0.003	0.005
Guenon	Waxing	7	100.0%	0.247	0.080
Guenon	Full Moon	7	85.7%	0.384	0.085
Guenon	Waning	17	100.0%	0.079	0.077
Gundolf	New Moon	16	81.2%	0.004	0.007
Gundolf	Waxing	21	100.0%	0.246	0.143
Gundolf	Full Moon	15	100.0%	0.378	0.120
Gundolf	Waning	18	88.9%	0.043	0.069
Macuco	New Moon	18	77.8%	0.004	0.008
Macuco	Waxing	23	91.3%	0.231	0.160
Macuco	Full Moon	10	70.0%	0.352	0.081
Macuco	Waning	20	95.0%	0.051	0.062
Spot	New Moon	23	100.0%	0.006	0.009
Spot	Waxing	22	95.5%	0.231	0.157
Spot	Full Moon	6	100.0%	0.318	0.134
Spot	Waning	23	100.0%	0.024	0.048

summary_tbl_num <- summary_tbl %>%
  mutate(
    alt_rate_num = as.numeric(sub("%", "", alt_rate)) / 100)

phase_colors <- c(
  "New Moon"  = "#3a3a3a", 
  "Waxing"    = "#8B4513", 
  "Full Moon" = "gold2", 
  "Waning"    = "#E6550D")

ggplot(summary_tbl_num, aes(x = phase, y = alt_rate_num, fill = phase)) +
  geom_boxplot(width = 0.6, outlier.alpha = 0.2, alpha = 0.9, size = 1.05) +
  scale_fill_manual(values = phase_colors, guide = "none") +
  scale_y_continuous(labels = percent_format(accuracy = 1),
                     limits = c(0.55, 1.00),
                     expand = expansion(mult = c(0, 0.05))) +
  labs(
    x = "Lunar phase",
    y = "P(alternation)") +
  theme_classic(base_size = 12) +
  theme(axis.text.x = element_text(angle = 25, hjust = 1))

Figure X. Probability of sleeping-site alternation across lunar phases. Boxplots show the distribution of alternation probabilities (P(alternation)) across all groups for each lunar phase. Colors indicate phase type: dark gray = New Moon, brown = Waxing, yellow = Full Moon, and orange = Waning. Horizontal lines indicate medians, boxes the interquartile range, and whiskers 1.5× the IQR.

ggplot(summary_tbl_num, aes(x = moonlight_avg, y = alt_rate_num)) +
  geom_point(aes(size = moonlight_sd, fill = phase), 
             shape = 21, color = "black", alpha = 0.8) +
  geom_smooth(method = "lm", se = FALSE, color = "black", linewidth = 0.8) +
  scale_fill_viridis_d(option = "inferno", end = 0.95, name = "Lunar phase") +
  scale_size_continuous(name = "Moonlight SD") +
  scale_y_continuous(labels = percent_format(accuracy = 1),
                     limits = c(0.55, 1.00),
                     expand = expansion(mult = c(0, 0.05))) +
  labs(
    x = "Mean lunar effective light",
    y = "P(alternation)") +
  theme_classic(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", hjust = 0),
    axis.title.x = element_text(margin = margin(t = 8)),
    axis.title.y = element_text(margin = margin(r = 8)))

Figure. Relationship between mean lunar effective light and the probability of sleeping-site alternation. Each point represents a group–phase combination, with fill color indicating lunar phase and point size proportional to the standard deviation of moonlight. The solid line shows the fitted linear relationship (ordinary least squares).

Across phases and groups, we found no clear relationship between mean lunar effective light and the probability of sleeping-site alternation. The fitted linear trend was weak, with points for different lunar phases spanning a wide range of alternation values. This suggests that moonlight alone is not a strong predictor of alternation behaviour in our dataset.

summary_tbl_lm <- alt_daily %>%
  mutate(moon_bin = cut(
    moon_light_effective_avg,
    breaks = c(0, 0.01, 0.02, 0.03, 0.04, 0.05, 0.075, 0.1, 0.15, 0.2, 0.3, 0.4, 0.5, 0.6),
    labels = c("0–0.01", "0.01–0.02", "0.02–0.03", "0.03–0.04", "0.04–0.05", 
               "0.05–0.075", "0.075–0.1", "0.1–0.15", "0.15–0.2", 
               "0.2–0.3", "0.3–0.4", "0.4–0.5", "0.5–0.6"),
    include.lowest = TRUE
  )) %>% 
  group_by(group, moon_bin) %>%
  summarise(
    n_nights     = n(),
    alt_rate     = mean(alternation, na.rm = TRUE),
    moonlight_avg= mean(moon_light_effective_avg, na.rm = TRUE),
    moonlight_sd = sd(moon_light_effective_avg, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(group, moon_bin) %>%
  mutate(
    alt_rate     = percent(alt_rate, accuracy = 0.1),
    moonlight_avg= round(moonlight_avg, 3),
    moonlight_sd = round(moonlight_sd, 3)
  )

summary_tbl_lm_num <- alt_daily %>%
  mutate(moon_bin = cut(
    moon_light_effective_avg,
    breaks = c(0, 0.01, 0.02, 0.03, 0.04, 0.05, 0.075, 0.1, 0.15, 0.2, 0.3, 0.4, 0.5, 0.6),
    labels = c("0–0.01", "0.01–0.02", "0.02–0.03", "0.03–0.04", "0.04–0.05", 
               "0.05–0.075", "0.075–0.1", "0.1–0.15", "0.15–0.2", 
               "0.2–0.3", "0.3–0.4", "0.4–0.5", "0.5–0.6"),
    include.lowest = TRUE
  )) %>% 
  group_by(group, moon_bin) %>%
  summarise(
    n_nights      = n(),
    alt_rate_num  = mean(alternation, na.rm = TRUE),
    moonlight_avg = mean(moon_light_effective_avg, na.rm = TRUE),
    moonlight_sd  = sd(moon_light_effective_avg, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(group, moon_bin)

ggplot(summary_tbl_lm_num, aes(x = moonlight_avg, y = alt_rate_num, color = group)) +
  geom_point(size = 3) +
  geom_smooth(method = "lm", se = TRUE, alpha = 0.2, size = 2) +  # alpha más bajo para más transparencia
  scale_color_viridis_d(option = "magma") +
  scale_y_continuous(labels = percent_format(accuracy = 1)) +
  labs(
    x = "Moonlight promedio (efectiva)",
    y = "Alternation rate",
    color = "Grupo") +
  theme_minimal()

cor_test_result <- cor.test(
  summary_tbl_lm_num$alt_rate_num,
  summary_tbl_lm_num$moonlight_avg,
  method = "pearson")

cor_test_result

## 
##  Pearson's product-moment correlation
## 
## data:  summary_tbl_lm_num$alt_rate_num and summary_tbl_lm_num$moonlight_avg
## t = -0.01927, df = 46, p-value = 0.9847
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.2867444  0.2815208
## sample estimates:
##          cor 
## -0.002841155

Suummaries

# ---- Paquetes ----
library(dplyr)
library(purrr)
library(broom)
library(flextable)
library(glue)
library(tidyr)

# ---- Datos base (ajusta el filtro si querés) ----
df <- moon_all %>%
  filter(used == TRUE, moon_light_effective_avg <= 0.6) %>%
  select(moon_light_effective_avg, canopy_height, ndvi_sentinel)

pairs <- list(
  c("moon_light_effective_avg", "canopy_height"),
  c("moon_light_effective_avg", "ndvi_sentinel"),
  c("canopy_height", "ndvi_sentinel")
)

corr_tbl <- map_dfr(pairs, ~{
  x <- .x[1]; y <- .x[2]
  z <- stats::cor.test(df[[x]], df[[y]], use = "complete.obs", method = "pearson")
  tibble(
    var1 = x,
    var2 = y,
    r = unname(z$estimate),
    p_value = z$p.value,
    n_complete = sum(complete.cases(df[, c(x, y)]))
  )
}) %>%
  mutate(
    r = round(r, 3),
    p_value = signif(p_value, 3)
  ) %>%
  mutate(
    var1 = recode(var1,
                  moon_light_effective_avg = "Moonlight effectiveness",
                  canopy_height           = "Canopy height (m)",
                  ndvi_sentinel           = "NDVI"),
    var2 = recode(var2,
                  moon_light_effective_avg = "Moonlight effectiveness",
                  canopy_height           = "Canopy height (m)",
                  ndvi_sentinel           = "NDVI")
  )

corr_tbl <- corr_tbl %>%
  mutate(
    p_value = round(p_value, 4)
  )

ft_corr <- flextable(corr_tbl) %>%
  set_header_labels(
    var1 = "Variable 1",
    var2 = "Variable 2",
    r = "Pearson r",
    p_value = "p-value",
    n_complete = "n"
  ) %>%
  autofit() %>%
  theme_booktabs() %>%
  align(align = "center", part = "all")

format_range <- function(x) {
  rng <- range(x, na.rm = TRUE)
  paste0(round(rng[1], 2), "–", round(rng[2], 2))
}
mean_sd <- function(x) {
  paste0(round(mean(x, na.rm = TRUE), 2), " ± ", round(sd(x, na.rm = TRUE), 2))
}

quart_tbl <- df %>%
  mutate(moon_eff_q = ntile(moon_light_effective_avg, 4)) %>%
  group_by(moon_eff_q) %>%
  summarise(
    N = n(),
    `Height (Mean ± SD)` = mean_sd(canopy_height),
    `Height range (m)`   = format_range(canopy_height),
    `NDVI (Mean ± SD)`   = mean_sd(ndvi_sentinel),
    `NDVI range`         = format_range(ndvi_sentinel),
    .groups = "drop"
  ) %>%
  mutate(moon_eff_q = paste0("Q", moon_eff_q))

ft_quart <- flextable(quart_tbl) %>%
  set_header_labels(
    moon_eff_q = "Moonlight effectiveness (quartiles)",
    N = "n",
    `Height (Mean ± SD)` = "Canopy height (Mean ± SD)",
    `Height range (m)`   = "Canopy height range (m)",
    `NDVI (Mean ± SD)`   = "NDVI (Mean ± SD)",
    `NDVI range`         = "NDVI range"
  ) %>%
  autofit() %>%
  theme_booktabs() %>%
  align(align = "center", part = "all")

# ---- 3) Modelos lineales ----
mods <- list(
  list(resp = "canopy_height", lab = "Canopy height (m)"),
  list(resp = "ndvi_sentinel", lab = "NDVI")
)

model_tbl <- map_dfr(mods, ~{
  f <- as.formula(paste0(.x$resp, " ~ moon_light_effective_avg"))
  m <- lm(f, data = df)
  tid <- tidy(m, conf.int = TRUE)
  gl  <- glance(m)
  slope <- tid %>% filter(term == "moon_light_effective_avg")
  tibble(
    Response = .x$lab,
    Slope = slope$estimate,
    `CI lower` = slope$conf.low,
    `CI upper` = slope$conf.high,
    `p-value` = slope$p.value,
    `Adj. R²` = gl$adj.r.squared,
    n = gl$df.residual + gl$df.null - gl$df.residual + 1 # o simplemente sum(complete.cases(model.frame(m)))
  )
}) %>%
  mutate(
    Slope = round(Slope, 3),
    `CI lower` = round(`CI lower`, 3),
    `CI upper` = round(`CI upper`, 3),
    `p-value` = signif(`p-value`, 3),
    `Adj. R²` = round(`Adj. R²`, 3)
  )


ft_corr

Variable 1	Variable 2	Pearson r	n
Moonlight effectiveness	Canopy height (m)	-0.614	381
Moonlight effectiveness	NDVI	-0.221	381
Canopy height (m)	NDVI	0.379	381

ft_quart

Moonlight effectiveness (quartiles)	n	Canopy height (Mean ± SD)	Canopy height range (m)	NDVI (Mean ± SD)	NDVI range
Q1	96	24.02 ± 2.03	15–29	0.57 ± 0.07	0.18–0.64
Q2	95	22.58 ± 2.99	13–28	0.57 ± 0.04	0.45–0.67
Q3	95	20.65 ± 4.09	5–28	0.55 ± 0.1	-0.14–0.65
Q4	95	17.39 ± 3.21	7–25	0.52 ± 0.1	0.12–0.66