Working with environmental data

For this module, you’ll need the following packages:

  • sf
  • terra
  • geodata
  • rnaturalearth
  • rnaturalearthdata

In Module 5 we asked whether the patterns in our howler monkey data reflect real ecology or sampling effort. To start answering that — and to eventually build a species distribution model — we need environmental data describing what conditions actually exist across the landscape.

This module covers downloading, inspecting, and preparing the environmental layers we’ll use throughout the rest of the course. By the end, you’ll have a multi-layer raster stack covering Costa Rica that’s ready for analysis.

Why environmental data?

The dominant theory in species distribution modelling is Hutchinson’s n-dimensional hypervolume — the idea that a species can persist anywhere that its full set of environmental requirements are met (Hutchinson 1957). In practice this is simplified considerably, but the core logic holds: species are found where their environment suits them. To predict where a species might occur, we need to know both where it has been recorded and what the environment looks like everywhere else.

The two most commonly used environmental data sources in SDM work are elevation and climate — particularly the WorldClim bioclimatic variables, which we’ll use here.

Loading packages and defining the study region

library(sf)
library(terra)
library(geodata)
library(rnaturalearth)
library(rnaturalearthdata)
world       <- ne_countries(scale = "medium", returnclass = "sf")
region      <- world[world$name == "Costa Rica", ]
region_vect <- vect(region)       # terra needs SpatVector, not sf
region_bbox <- st_bbox(region)    # bounding box for cropping

Elevation

We already met elevation in Module 4. Let’s download it properly and prepare it for use alongside climate data.

geodata_path(".")

elev    <- elevation_global(res = 10)
elev_cr <- crop(elev, region_vect)
elev_cr <- mask(elev_cr, region_vect)

plot(elev_cr, main = "Elevation — Costa Rica (m)")

# Quick summary of elevation values
summary(values(elev_cr), na.rm = TRUE)
##  wc2.1_10m_elev  
##  Min.   :  11.0  
##  1st Qu.:  75.5  
##  Median : 176.0  
##  Mean   : 485.1  
##  3rd Qu.: 699.0  
##  Max.   :2555.0  
##  NA's   :185

Costa Rica ranges from sea level to over 3,800 m at Cerro Chirripó. That’s a remarkable range for a country roughly the size of West Virginia, and it’s a large part of why Costa Rica is so biodiverse — it compresses multiple elevational zones and climate regimes into a small area (Janzen 1983).

WorldClim bioclimatic variables

WorldClim provides 19 bioclimatic variables derived from monthly temperature and precipitation data, interpolated from global weather station networks (Fick & Hijmans 2017). They were designed specifically for species distribution modelling and are now one of the most widely used environmental datasets in ecology. The 19 variables cover things like:

  • bio01: Mean annual temperature
  • bio04: Temperature seasonality (standard deviation × 100)
  • bio12: Annual precipitation
  • bio15: Precipitation seasonality (coefficient of variation)
  • bio17: Precipitation of the driest quarter

A full description of all 19 variables is at worldclim.org/data/bioclim.html.

Note that temperature variables are stored in °C × 10 — so a value of 245 means 24.5°C. This is a legacy convention from when integer storage saved meaningful disk space. Divide by 10 to get degrees Celsius.

# Set download = TRUE the first time, then FALSE to use cached files
clim <- worldclim_global(var = 'bio', res = 10, download = FALSE, path = 'data')

# Rename to cleaner labels
names(clim) <- c(paste0('bio0', 1:9), paste0('bio', 10:19))

# Crop and mask to Costa Rica
clim_cr <- crop(clim, region_vect)
clim_cr <- mask(clim_cr, region_vect)

clim_cr
## class       : SpatRaster 
## size        : 19, 20, 19  (nrow, ncol, nlyr)
## resolution  : 0.1666667, 0.1666667  (x, y)
## extent      : -85.83333, -82.5, 8, 11.16667  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326) 
## source(s)   : memory
## names       :    bio01,     bio02,    bio03,    bio04,    bio05,    bio06, ... 
## min values  : 11.86418,  6.975149, 71.24990, 52.38264, 17.60925,  5.98050, ... 
## max values  : 27.48187, 11.785084, 82.39379, 94.06358, 35.58375, 21.73693, ...

Let’s plot a few variables to get a feel for the data.

# Mean annual temperature (divide by 10 for °C)
plot(clim_cr[['bio01']] / 10,
     main = "Mean annual temperature (°C)")

# Annual precipitation (mm)
plot(clim_cr[['bio12']],
     main = "Annual precipitation (mm)")

# Precipitation seasonality
plot(clim_cr[['bio15']],
     main = "Precipitation seasonality")

Notice how different variables tell different stories about the landscape. Temperature broadly tracks elevation — cooler highlands, warmer coasts. Precipitation is patchier, reflecting the complex rain shadow effects of the central mountains, which create a drier Pacific slope and a wetter Caribbean side.

Checking that layers align

Before stacking and combining layers, we need to confirm that they share the same resolution, extent, and CRS. Layers that don’t align can’t be combined without error.

# Resolution
res(elev_cr)
## [1] 0.1666667 0.1666667
res(clim_cr)
## [1] 0.1666667 0.1666667
# CRS
crs(elev_cr, describe = TRUE)
##     name authority code  area             extent
## 1 WGS 84      EPSG 4326 World -180, 180, -90, 90
crs(clim_cr, describe = TRUE)
##     name authority code  area             extent
## 1 WGS 84      EPSG 4326 World -180, 180, -90, 90

The resolutions should both be approximately 0.167° (10 arc-minutes). If they differ, we need to resample. Here we resample elevation to exactly match the climate grid, which ensures perfect alignment.

elev_resamp <- terra::resample(elev_cr, clim_cr, method = 'bilinear')

# Confirm match
res(elev_resamp)
## [1] 0.1666667 0.1666667
res(clim_cr)
## [1] 0.1666667 0.1666667

Now we can stack them.

env_stack <- c(clim_cr, elev_resamp)
names(env_stack)[20] <- 'elev'

nlyr(env_stack)   # should be 20
## [1] 20
names(env_stack)
##  [1] "bio01" "bio02" "bio03" "bio04" "bio05" "bio06" "bio07" "bio08" "bio09"
## [10] "bio10" "bio11" "bio12" "bio13" "bio14" "bio15" "bio16" "bio17" "bio18"
## [19] "bio19" "elev"

Extracting values at points

To check everything’s working, let’s extract environmental values at a few howler monkey locations.

library(rgbif)
## Warning: package 'rgbif' was built under R version 4.5.3
species <- occ_search(
  scientificName = "Alouatta palliata",
  country = "CR",
  limit = 2000,
  hasCoordinate = TRUE
)

howler_sf <- st_as_sf(
  species$data,
  coords = c("decimalLongitude", "decimalLatitude"),
  crs = 4326
)
##   ID    bio01     bio02    bio03    bio04    bio05    bio06    bio07    bio08
## 1  1 27.36959 10.549562 75.11258 81.58629 35.13100 21.08600 14.04500 26.83204
## 2  2 27.23806 10.420375 75.84937 76.99428 34.80850 21.07025 13.73825 26.75783
## 3  3 26.26099  7.571301 75.19080 79.95874 31.53285 21.46340 10.06945 26.43492
## 4  4 25.65477  8.225204 78.03919 73.83220 31.10515 20.56531 10.53984 24.88519
## 5  5 25.44864  8.554354 76.85506 77.53883 31.35675 20.22625 11.13050 25.61754
##      bio09    bio10    bio11 bio12 bio13 bio14    bio15 bio16 bio17 bio18 bio19
## 1 27.65667 28.55733 26.52408  1879   366     4 87.82879   944    18   292   487
## 2 27.52625 28.33908 26.41275  1871   356     6 85.86667   922    25   298   486
## 3 26.94654 27.37181 25.36921  3768   510   123 37.64306  1247   479   582  1175
## 4 26.21247 26.62674 24.76807  2977   316   144 23.76402   896   505   603   819
## 5 26.12367 26.52971 24.59117  3621   457   115 37.58236  1227   414   566   991
##   elev
## 1   25
## 2   40
## 3   19
## 4  174
## 5  176

Each row is an occurrence point; each column is an environmental variable at that location. This is the foundation of species distribution modelling — the numerical pairing of “where was the species?” with “what was the environment there?”

Save your work

We’ll use these objects in every subsequent module.

terra::writeRaster(env_stack, 'data/env_stack.tif', overwrite = TRUE)
terra::writeRaster(clim_cr,   'data/clim_cr.tif',   overwrite = TRUE)
# Note: region_vect is a terra SpatVector and cannot be saved with save() —
# terra's external pointer objects become invalid when serialised.
# We save region (an sf object) and recreate region_vect with vect() as needed.
save(region, region_bbox, file = 'data/region.RData')

Exercise

  1. Plot bio04 (temperature seasonality). Where in Costa Rica is temperature most seasonal? Does this make geographical sense given what you know about the country’s climate?
  2. Use terra::extract() to get the mean annual temperature (bio01 / 10) at all howler locations. What is the range of temperatures where howlers have been recorded? Does this align with what you’d expect for a tropical forest species?
  3. Use terra::global(clim_cr[['bio12']], fun = 'mean', na.rm = TRUE) to calculate mean annual precipitation across Costa Rica. Then do the same for bio15 (precipitation seasonality). Which part of the country do you think is the most challenging environment for a forest-dependent species?

References

Fick, S.E. & Hijmans, R.J. (2017). WorldClim 2: new 1-km spatial resolution climate surfaces for global land areas. International Journal of Climatology, 37, 4302–4315.

Hutchinson, G.E. (1957). Concluding remarks. Cold Spring Harbor Symposia on Quantitative Biology, 22, 415–427.

Janzen, D.H. (1983). No park is an island: increase in interference from outside as park size decreases. Oikos, 41, 402–410.

← Previous Next →