Skip to contents

The World Checklist of Vascular Plants (WCVP) provides distribution data for the > 340,000 vascular plant species known to science. We can use this data to map various metrics of vascular plant biodiversity.

As well as rWCVP, we’ll use the tidyverse packages for data manipulation, the sf package for handling spatial data, and the gt package for formatting tables.

In this example we use the pipe operator (%>%),dplyr and ggplot - if these are unfamiliar we’d suggest checking out https://tidyverse.org/ and some of the help pages therein, or this code might be difficult to interpret.

Now, let’s get started!

Species richness

The obvious place to start is global species richness. We can use wcvp_summary to condense the global occurrence data for all species into raw counts per WGSRPD Level 3 Area.

This gives us a list, with information in the first five slots and the data in the slot named $Summary. It’s this $Summary data frame that we’ll be working with.

global_summary <- wcvp_summary()
#> i No area specified. Generating global summary.
#> i No taxon specified. Generating summary of all species.
glimpse(global_summary)
#> List of 6
#>  $ Taxon                               : NULL
#>  $ Area                                : chr "the world"
#>  $ Grouping_variable                   : chr "area_code_l3"
#>  $ Total_number_of_species             : int 347525
#>  $ Number_of_regionally_endemic_species: int 347527
#>  $ Summary                             : tibble [368 x 6] (S3: tbl_df/tbl/data.frame)
#>   ..$ area_code_l3: chr [1:368] "ABT" "AFG" "AGE" "AGS" ...
#>   ..$ Native      : int [1:368] 1608 4510 5331 2082 5357 2990 3382 198 3506 2578 ...
#>   ..$ Endemic     : int [1:368] 0 862 241 252 852 30 60 19 178 75 ...
#>   ..$ Introduced  : int [1:368] 300 101 711 412 404 928 182 48 345 101 ...
#>   ..$ Extinct     : int [1:368] 0 1 1 2 7 2 2 0 30 0 ...
#>   ..$ Total       : int [1:368] 1908 4618 6045 2497 5769 3921 3602 246 3934 2680 ...

We could display this using wcvp_summary_gt but that’s going to be a big table, so let’s skip straight to plotting.

rWCVPdata includes the area polygons for the WGSRPD Level 3 Areas - this is going to be the base that we add things to. Let’s add the global_summary to the spatial data, and plot a map where species richness defines colour.

#load the spatial data
area_polygons <- rWCVPdata::wgsrpd3 %>% 
  #add the summary data, allowing for the different column names
  left_join(global_summary$Summary, by=c("LEVEL3_COD"="area_code_l3"))

ggplot(area_polygons)+
  geom_sf(aes(fill=Native))

Hmm, it’s not as pretty as it could be and we’re losing all the islands - let’s make a few tweaks.

#wrapping n brackets so it assigns as well as prints
(p_native_richness <- ggplot(area_polygons) +
  #remove borders
  geom_sf(aes(fill=Native), col="transparent") +
  # add points for islands
  stat_sf_coordinates(aes(col=Native))+
  #remove borders
  theme_void() +
  #use a better colour palette
  scale_fill_viridis_c(name = "Native\nspecies")+ 
  scale_colour_viridis_c(name = "Native\nspecies")+ 
  #remove extra whitespace
  coord_sf(expand=FALSE)
)

Much better!

Endemic species richness

We can also incorporate other metrics - let’s look at the proportion of species in each Level 3 Area that are endemic to that Area. We’ll use a slightly different colour palette too, just for fun.

area_polygons <- area_polygons %>% 
  mutate(percent_endemic = Endemic/Native)

(p_prop_endemic <- ggplot(area_polygons) +
  #remove borders
  geom_sf(aes(fill=percent_endemic), col="transparent") +
  # add points for islands
  stat_sf_coordinates(aes(col=percent_endemic))+
  #remove borders
  theme_void()+
  #use a better colour palette
  scale_fill_distiller(palette="Reds", direction=1, name="% of species\nthat are\nendemic")+ 
  scale_colour_distiller(palette="Reds", direction=1, 
                         name="% of species\nthat are\nendemic")+ 
  #remove extra whitespace
  coord_sf(expand=FALSE)
)

Endemism hotspots really jump out! We can use this information to produce a table of the top ten.

area_polygons %>% 
  #get the region and continent names
  left_join(wgsrpd_mapping) %>% 
  #get the top 10
  slice_max(percent_endemic,n=10) %>% 
  #drop the spatial data
  st_drop_geometry() %>% 
  select(LEVEL3_NAM, percent_endemic, LEVEL1_NAM) %>% 
  group_by(LEVEL1_NAM) %>% 
  #format as a table
  gt() %>% 
  cols_label(
    percent_endemic = "% Endemic",
    LEVEL3_NAM = "WGSRPD Level 3 Area"
  ) %>% 
  tab_options(
    # some nice formatting
        column_labels.font.weight = "bold",
        row_group.font.weight = "bold",
        row_group.as_column = TRUE,
        data_row.padding = px(1),
        table.font.size = 12,
        table_body.hlines.color = "transparent",
        ) %>%
  #format the number as %
  fmt_percent(
    columns = percent_endemic,
    decimals = 1
  )
#> Joining, by = c("LEVEL3_NAM", "LEVEL3_COD", "LEVEL2_COD", "LEVEL1_COD")
WGSRPD Level 3 Area % Endemic
PACIFIC Hawaii 91.7%
New Caledonia 76.8%
AFRICA Madagascar 82.1%
Cape Provinces 67.7%
St.Helena 61.7%
ASIA-TROPICAL New Guinea 67.8%
Borneo 52.8%
Philippines 52.1%
AUSTRALASIA Western Australia 67.2%
SOUTHERN AMERICA Juan Fernández Is. 51.0%

We can attach any variable to area_polygons - it doesn’t have to be from the World Checklist!

Limiting maps to an area of interest

One more example, showing how we can limit our maps to a specific area of interest. Let’s look at orchids in Africa.

orchid_africa_summary <- wcvp_summary("Orchidaceae", "family", get_wgsrpd3_codes("Africa"))
#> i Matches to input geography found at Continent (Level 1)

Remember that the dataset is saved in the $Summary slot (trying to join orchid_africa_summary will cause an error).

#reset the spatial data by loading it in afresh
area_polygons <- rWCVPdata::wgsrpd3 %>% 
  #add the summary data, allowing for the different column names
  left_join(orchid_africa_summary$Summary, by=c("LEVEL3_COD"="area_code_l3")) %>% 
  #add continent and region names for easy filtering
  left_join(wgsrpd_mapping) 
#> Joining, by = c("LEVEL3_NAM", "LEVEL3_COD", "LEVEL2_COD", "LEVEL1_COD")

We could filter our polygons to only those in Africa, but from a mapping perspective it makes more sense to leave them in (though greyed out). So, to constrain our map, we need to set up custom limits.

#get a sensible bounding box for the polygons in Africa
bounding_box <- st_bbox(area_polygons %>% filter(LEVEL1_NAM =="AFRICA"))
#add a buffer as we set limits
xmin <- bounding_box["xmin"] - 2
xmax <- bounding_box["xmax"] + 2
ymin <- bounding_box["ymin"] - 2
ymax <- bounding_box["ymax"] + 2

(p_orchids <- ggplot(area_polygons) +
  #remove borders
  geom_sf(aes(fill=Native), col="transparent") +
  # add points for islands
  stat_sf_coordinates(aes(col=Native), size=4)+
  #bounding box we set up above
  coord_sf(xlim=c(xmin, xmax), ylim=c(ymin, ymax))+
  #remove borders
  theme_void()+
  #set colour palette (fill for polygons, colour for points)
  scale_fill_viridis_c(name = "Native\nspecies")+ 
  scale_colour_viridis_c(name = "Native\nspecies")+
    ggtitle("Orchidaceae species richness")
)