The World Checklist of Vascular Plants (WCVP) provides distribution data for the more than 340,000 vascular plant species known to science. These data can be used to build occurrence matrices for checklists of plant species, a task that rWCVP can help with.

As well as rWCVP, we’ll use the tidyverse packages for data manipulation and plotting and the gt package for formatting tables.

In this example I use the pipe operator (%>%) and dplyr syntax - if these are unfamiliar I suggest checking out https://dplyr.tidyverse.org/ and some of the help pages therein.

Now, let’s get started!

Finding an example group

For this example, we don’t have a particular area or group of plants that we want to examine, but this gives us a chance to showcase one of the other functions in rWCVP!

We want a group of species that is a) not too large and b) distributed across a few WGSRPD Level 3 Areas. Brazil has good potential because it has five Level 3 Areas (a good number for this purpose because the table will fit on a portrait-oriented page). Let’s see if there are some nice-sized example genera, using the wcvp_summary function:

wcvp_summary(taxon="Myrtaceae", taxon_rank="family", area=get_wgsrpd3_codes("Brazil"), 
              grouping_var = "genus") %>% 
  wcvp_summary_gt()

Myrtaceae of Brazil
Total number of species: 1127
Number of regionally endemic species: 878
Genus Native Endemic Introduced Extinct Total
Accara 1 1 1
Algrizea 2 2 2
Blepharocalyx 3 1 3
Calycolpus 8 4 8
Calycorectes 9 9 9
Campomanesia 41 31 1 42
Curitiba 1 1 1
Eugenia 430 338 431
Feijoa 1 1
Myrceugenia 34 30 34
Myrcia 440 351 1 441
Myrcianthes 7 3 7
Myrciaria 23 15 23
Myrrhinium 1 1
Neomitranthes 13 13 13
Pimenta 1 1
Plinia 41 35 41
Psidium 57 36 57
Siphoneugena 10 8 10
Syzygium 1 1

Calycolpus looks nice and tidy - let’s see how the 8 species are distributed across the 5 areas. We can use the same function, but limit our taxon and change our grouping variable to area.

wcvp_summary(taxon="Calycolpus", taxon_rank="genus", area=get_wgsrpd3_codes("Brazil"),
              grouping_var = "area_code_l3") %>% 
  wcvp_summary_gt()

Calycolpus of Brazil
Total number of species: 8
Number of regionally endemic species: 4
Native Endemic Introduced Extinct Total
BZE 3 2 3
BZL 1 1 1
BZN 5 1 5

Hmm, maybe a bit too small - it only occurs in 3 of the 5 regions. What about Myrciaria?

wcvp_summary(taxon="Myrciaria", taxon_rank="genus", area=get_wgsrpd3_codes("Brazil"), 
              grouping_var="area_code_l3") %>% 
  wcvp_summary_gt()
Myrciaria of Brazil
Total number of species: 23
Number of regionally endemic species: 15
Native Endemic Introduced Extinct Total
BZC 6 6
BZE 13 2 13
BZL 16 4 16
BZN 6 6
BZS 5 1 1 6

Perfect! 23 species (rows) won’t take up too much space, and there are enough occurrences to make it interesting.

Generating and formatting the occurrence matrix

Generating an occurrence matrix for this genus is as simple as using the wcvp_occ_mat function.

m <- wcvp_occ_mat(taxon="Myrciaria", taxon_rank="genus", 
                  area=get_wgsrpd3_codes("Brazil"))
m
#> # A tibble: 23 x 7
#>    plant_name_id taxon_name             BZC   BZE   BZL   BZN   BZS
#>            <dbl> <chr>                <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1        473796 Myrciaria alagoana       0     1     0     0     0
#>  2        534878 Myrciaria alta           0     0     1     0     0
#>  3        534776 Myrciaria cambuca        0     1     1     0     0
#>  4        131799 Myrciaria cordata        0     0     0     1     0
#>  5        131802 Myrciaria cuspidata      1     1     1     0     1
#>  6        131803 Myrciaria delicatula     1     0     1     0     1
#>  7        131806 Myrciaria disticha       0     1     1     0     0
#>  8        131810 Myrciaria dubia          1     0     0     1     0
#>  9        491614 Myrciaria evanida        0     0     1     0     0
#> 10        131814 Myrciaria ferruginea     0     1     1     0     0
#> # ... with 13 more rows
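
For intuition about what this matrix encodes, here is a minimal sketch of how a presence/absence table of the same shape could be built from long-format occurrence records using tidyr. This is not how wcvp_occ_mat is implemented; the `occurrences` data, the species names and the `present` helper column are all invented for illustration.

```r
library(dplyr)
library(tidyr)

# Toy long-format records: one row per species-area occurrence.
# All values are invented for illustration, not taken from the WCVP.
occurrences <- tibble::tribble(
  ~taxon_name, ~area_code_l3,
  "Species a", "BZC",
  "Species a", "BZE",
  "Species b", "BZE",
  "Species c", "BZN",
  "Species c", "BZC"
)

occ_matrix <- occurrences %>%
  distinct(taxon_name, area_code_l3) %>%  # guard against duplicate records
  mutate(present = 1) %>%                 # mark each recorded occurrence
  pivot_wider(
    names_from = area_code_l3,            # one column per area
    values_from = present,
    values_fill = 0,                      # absent where no record exists
    names_sort = TRUE
  ) %>%
  arrange(taxon_name)

occ_matrix
```

Each species becomes one row and each area one column, with 1 marking presence and 0 absence, which is the same layout as the tibble printed above.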

The default tibble printout is OK, but we can make it much prettier using the gt package. Let’s do the following:

  • remove the WCVP ID column
  • change taxon_name to ‘Species’
  • make species names italic
  • bold the column titles
  • reduce the space around the text and make font size 12
  • remove the internal borders
  • change the 1s and 0s into X and blank

m_gt <- m %>% 
  select(-plant_name_id) %>% # remove ID col
  gt() %>% 
  cols_label(taxon_name = "Species") %>% 
  # make species names italic
  tab_style(
    style = cell_text(style = "italic"),
    locations = cells_body(columns = taxon_name)
  ) %>% 
  # bold column labels, tighter rows, font size 12, no internal borders
  tab_options(
    column_labels.font.weight = "bold",
    data_row.padding = px(1),
    table.font.size = 12,
    table_body.hlines.color = "transparent"
  ) %>%
  # change the zeroes into blanks
  text_transform(
    locations = cells_body(),
    fn = function(x) {
      ifelse(x == 0, "", x)
    }
  ) %>% 
  # change the 1s into X
  text_transform(
    locations = cells_body(),
    fn = function(x) {
      ifelse(x == 1, "X", x)
    }
  )
m_gt
Species BZC BZE BZL BZN BZS
Myrciaria alagoana X
Myrciaria alta X
Myrciaria cambuca X X
Myrciaria cordata X
Myrciaria cuspidata X X X X
Myrciaria delicatula X X X
Myrciaria disticha X X
Myrciaria dubia X X
Myrciaria evanida X
Myrciaria ferruginea X X
Myrciaria floribunda X X X X X
Myrciaria glanduliflora X
Myrciaria glazioviana X X X
Myrciaria glomerata X X X
Myrciaria guaquiea X X
Myrciaria pallida X
Myrciaria pilosa X X
Myrciaria plinioides X
Myrciaria rojasii X
Myrciaria strigipes X X
Myrciaria tenella X X X X X
Myrciaria una X
Myrciaria vismeifolia X

Much nicer! We can save this gt table as an HTML table or as a picture. If we plan on making a few more tables, we can save space by saving our table style as a theme (see https://themockup.blog/posts/2020-09-26-functions-and-themes-for-gt-tables/ for more details on this).

occ_mat_theme <- function(x){
  x %>% 
    cols_label(taxon_name = "Species") %>% 
    # make species names italic
    tab_style(
      style = cell_text(style = "italic"),
      locations = cells_body(columns = taxon_name)
    ) %>% 
    # bold column labels, tighter rows, font size 12, no internal borders
    tab_options(
      column_labels.font.weight = "bold",
      data_row.padding = px(1),
      table.font.size = 12,
      table_body.hlines.color = "transparent"
    ) %>%
    # change the zeroes into blanks
    text_transform(
      locations = cells_body(),
      fn = function(x) {
        ifelse(x == 0, "", x)
      }
    ) %>% 
    # change the 1s into X
    text_transform(
      locations = cells_body(),
      fn = function(x) {
        ifelse(x == 1, "X", x)
      }
    )
}
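
Saving a table to HTML, as mentioned earlier, can be sketched with gt's gtsave function. The `toy` data frame, its values and the file name below are invented for illustration; in practice you would pass your own formatted table (for example m_gt) instead.

```r
library(gt)

# Toy stand-in for an occurrence matrix (values invented for illustration)
toy <- data.frame(
  taxon_name = c("Species a", "Species b"),
  BZC = c(1, 0),
  BZE = c(1, 1)
)

toy_gt <- cols_label(gt(toy), taxon_name = "Species")

# Write the table to an HTML file in a temporary directory;
# the file extension determines the output format.
gtsave(toy_gt, filename = "occurrence_matrix.html", path = tempdir())

# Saving as a picture (e.g. a .png file name) works the same way,
# but relies on an additional package for taking the screenshot.
```

Swapping `tempdir()` for a project folder keeps the file alongside your other outputs.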

The biggest issue with gt has been Word support: older versions cannot export to docx at all. For exporting directly to a docx file, check out flextable (https://ardata-fr.github.io/flextable-book/).

Including or excluding occurrence types

What if we only want to know about native or introduced species? The wcvp_occ_mat function has options to filter for one or the other. Brazilian Myrciaria doesn’t look very interesting on that front (we can see from the summary table that only one species is introduced), so let’s look at a more invasive group - Poa in Northern Europe (Level 2 Region).

wcvp_summary(taxon="Poa", taxon_rank="genus", area=get_wgsrpd3_codes("Northern Europe"), 
              grouping_var = "area_code_l3") %>% 
  wcvp_summary_gt()

Poa of Northern Europe
Total number of species: 23
Number of regionally endemic species: 0
Native Endemic Introduced Extinct Total
DEN 11 1 12
FIN 16 1 17
FOR 8 8
GRB 12 3 15
ICE 9 1 11
IRE 6 2 8
NOR 17 1 18
SVA 7 7
SWE 17 1 18

A few more there to work with. First, let’s look at the native species only:

m <- wcvp_occ_mat(taxon="Poa", taxon_rank="genus",
                  area=get_wgsrpd3_codes("Northern Europe"), 
                  introduced=FALSE, extinct=FALSE, 
                  location_doubtful=FALSE)
m
#> # A tibble: 20 x 11
#>    plant_name_id taxon_name         DEN   FIN   FOR   GRB   ICE   IRE   NOR   SVA   SWE
#>            <dbl> <chr>            <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1        435004 Poa abbreviata       0     0     0     0     0     0     0     1     0
#>  2        435078 Poa alpigena         0     1     1     0     1     0     1     1     1
#>  3        435085 Poa alpina           0     1     1     1     1     1     1     1     1
#>  4        435167 Poa angustifolia     1     1     0     1     0     0     1     0     1
#>  5        435194 Poa annua            1     1     1     1     1     1     1     0     1
#>  6        435235 Poa arctica          0     1     0     0     0     0     1     1     1
#>  7        435458 Poa bulbosa          1     1     0     1     0     0     1     0     1
#>  8        435622 Poa compressa        1     1     0     0     0     0     1     0     1
#>  9        435932 Poa flexuosa         0     0     0     1     1     0     1     0     1
#> 10        435996 Poa glauca           0     1     1     1     1     0     1     1     1
#> 11        436089 Poa hartzii          0     0     0     0     0     0     0     1     0
#> 12        436146 Poa humilis          1     1     1     1     1     1     1     0     1
#> 13        436189 Poa infirma          0     0     0     1     0     0     0     0     0
#> 14        436383 Poa lindebergii      0     1     0     0     0     0     1     0     1
#> 15        436600 Poa nemoralis        1     1     1     1     1     1     1     0     1
#> 16        436739 Poa palustris        1     1     0     1     0     0     1     0     1
#> 17        436906 Poa pratensis        1     1     1     1     1     1     1     1     1
#> 18        437092 Poa remota           1     1     0     0     0     0     1     0     1
#> 19        437424 Poa supina           1     1     0     0     0     0     1     0     1
#> 20        437547 Poa trivialis        1     1     1     1     1     1     1     0     1

We can format this matrix just like we did above, but let’s skip that and go straight to introduced species only. We’re doing all the same formatting as before, but also adding a heading - the html function makes it possible to italicise our genus name and everything!

m <- wcvp_occ_mat(taxon="Poa", taxon_rank="genus",
                  area=get_wgsrpd3_codes("Northern Europe"), 
                  native=FALSE,
                  introduced=TRUE, extinct=FALSE, 
                  location_doubtful=FALSE)
m %>% 
  select(-plant_name_id) %>% #remove ID col
  gt() %>% 
  occ_mat_theme() %>%  #the theme we defined above
  #add a header
  tab_header(title=html("Introduced <em>Poa</em> species in Northern Europe")) 
Introduced Poa species in Northern Europe
Species DEN FIN FOR GRB ICE IRE NOR SVA SWE
Poa angustifolia X
Poa chaixii X X X X X
Poa compressa X
Poa flabellata X
Poa palustris X
Poa persica X
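
To make the effect of the native and introduced switches concrete, here is a conceptual sketch using plain dplyr filters on toy records. It assumes distribution records carry 0/1 flag columns named `introduced`, `extinct` and `location_doubtful`; the data, the species names and the `to_matrix` helper are hypothetical, and this is not rWCVP's actual implementation.

```r
library(dplyr)
library(tidyr)

# Toy distribution records with 0/1 status flags (all values invented)
dist_toy <- tibble::tribble(
  ~taxon_name, ~area_code_l3, ~introduced, ~extinct, ~location_doubtful,
  "Species a", "DEN",         0,           0,        0,
  "Species a", "SWE",         1,           0,        0,
  "Species b", "DEN",         1,           0,        0,
  "Species b", "NOR",         0,           1,        0,
  "Species c", "SWE",         0,           0,        1
)

# Hypothetical helper: turn filtered records into a presence/absence matrix
to_matrix <- function(records) {
  records %>%
    distinct(taxon_name, area_code_l3) %>%
    mutate(present = 1) %>%
    pivot_wider(names_from = area_code_l3, values_from = present,
                values_fill = 0, names_sort = TRUE) %>%
    arrange(taxon_name)
}

# Native only: drop introduced, extinct and doubtful records
native_only <- dist_toy %>%
  filter(introduced == 0, extinct == 0, location_doubtful == 0) %>%
  to_matrix()

# Introduced only: keep introduced records that are extant and not doubtful
introduced_only <- dist_toy %>%
  filter(introduced == 1, extinct == 0, location_doubtful == 0) %>%
  to_matrix()
```

Note that a species can appear in both matrices, as Species a does here, because status is recorded per area rather than per species.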

Bonus: adding a country spanner

Tables created with gt are extremely flexible - let’s say we want to look at occurrences across the US-Canadian border:

m <- wcvp_occ_mat("Fritillaria", "genus", 
                  area=c("WAS", "ORE", "IDA","MNT", "ABT", "BRC"))

m_gt <- m %>% 
  select(-plant_name_id) %>% #remove ID col
  gt() %>% 
  occ_mat_theme() %>%  #the theme we defined above
  #add a header
  tab_header(title=html("<em>Fritillaria</em> species in Northwest USA and Southwest Canada")) 
m_gt
Fritillaria species in Northwest USA and Southwest Canada
Species ABT BRC IDA MNT ORE WAS
Fritillaria affinis X X X X X
Fritillaria atropurpurea X X X
Fritillaria camschatcensis X X X
Fritillaria eastwoodiae X
Fritillaria gentneri X
Fritillaria glauca X
Fritillaria pudica X X X X X X
Fritillaria purdyi X
Fritillaria recurva X

It would be really useful to know which of those codes are in the US and which are in Canada. We could use the data included in rWCVP to create a key.

wgsrpd_mapping %>% 
  filter(LEVEL3_COD %in% c("WAS", "ORE", "IDA","MNT", "ABT", "BRC")) %>% 
  select(LEVEL3_NAM, LEVEL3_COD, COUNTRY) %>% 
  gt() %>% 
  #some formatting
  tab_options(
        column_labels.font.weight = "bold",
        data_row.padding = px(1),
        table.font.size = 12,
        table_body.hlines.color = "transparent",
        )

LEVEL3_NAM LEVEL3_COD COUNTRY
Alberta ABT Canada
British Columbia BRC Canada
Idaho IDA United States
Montana MNT United States
Oregon ORE United States
Washington WAS United States

It would really be nicer to have it on the occurrence matrix though. Enter tab_spanner():

m_gt %>% 
  tab_spanner(label="United States",
              columns = c(IDA, MNT, ORE, WAS)) %>% 
  tab_spanner(label="Canada",
              columns=c(ABT, BRC)) 
Fritillaria species in Northwest USA and Southwest Canada
Species Canada United States
ABT BRC IDA MNT ORE WAS
Fritillaria affinis X X X X X
Fritillaria atropurpurea X X X
Fritillaria camschatcensis X X X
Fritillaria eastwoodiae X
Fritillaria gentneri X
Fritillaria glauca X
Fritillaria pudica X X X X X X
Fritillaria purdyi X
Fritillaria recurva X
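
With more than a couple of countries, typing each spanner by hand gets tedious. One possible approach, sketched below, is to derive the column groups from a key table and add the spanners in a loop. The `area_key` tibble copies the key shown earlier, while `toy_matrix` and its values are invented for illustration.

```r
library(dplyr)
library(gt)

# Area-to-country key, mirroring the key table shown earlier
area_key <- tibble::tribble(
  ~LEVEL3_COD, ~COUNTRY,
  "ABT", "Canada",
  "BRC", "Canada",
  "IDA", "United States",
  "MNT", "United States",
  "ORE", "United States",
  "WAS", "United States"
)

# Toy occurrence matrix (values invented for illustration)
toy_matrix <- tibble(
  taxon_name = c("Species a", "Species b"),
  ABT = c(1, 0), BRC = c(1, 1), IDA = c(0, 1),
  MNT = c(0, 1), ORE = c(1, 1), WAS = c(1, 0)
)

# Named list: one character vector of area codes per country
spanner_groups <- split(area_key$LEVEL3_COD, area_key$COUNTRY)

# Add one spanner per country over its area columns
tbl <- gt(toy_matrix)
for (country in names(spanner_groups)) {
  tbl <- tab_spanner(
    tbl,
    label = country,
    columns = all_of(spanner_groups[[country]])
  )
}
```

Printing `tbl` renders the table with one spanner per country, so the grouping follows the key rather than hard-coded column lists.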

There is a lot more that can be done with gt - see https://gt.rstudio.com/ for help, examples and documentation.