Publication-ready occurrence matrices with rWCVP
Matilda Brown
24/05/2022
Source:vignettes/articles/occurrence-matrices.Rmd
The World Checklist of Vascular Plants (WCVP) provides distribution data for the more than 340,000 vascular plant species known to science. These distribution data can be used to build occurrence matrices for checklists of plant species, a task that rWCVP can help with.
As well as rWCVP, we’ll use the tidyverse packages for data manipulation and plotting and the gt package for formatting tables.
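The code in this article assumes the three packages named above are attached. A minimal setup sketch follows; it assumes the packages are already installed and that rWCVP has access to the WCVP data itself (for example through the companion rWCVPdata package).

```r
# Packages used throughout this article (assumed to be installed already).
library(rWCVP)      # WCVP summaries and occurrence matrices
library(tidyverse)  # dplyr verbs and the %>% pipe
library(gt)         # table formatting
```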
In this example I use the pipe operator (%>%) and dplyr syntax. If these are unfamiliar, I suggest checking out https://dplyr.tidyverse.org/ and some of the help pages therein.
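For readers new to the pipe, here is a small sketch of what it does. The data frame is made up for illustration (its column names are assumptions, not WCVP data); the point is only that the piped version and the nested version are the same computation.

```r
library(dplyr)

# Toy data: hypothetical species records, not taken from the WCVP.
toy <- tibble(
  taxon_name = c("Species a", "Species b", "Species c"),
  area_code_l3 = c("BZN", "BZS", "BZN"),
  introduced = c(0, 1, 0)
)

# Without the pipe: calls are nested and read from the inside out.
nested <- select(filter(toy, introduced == 0), taxon_name, area_code_l3)

# With the pipe: each step hands its result to the next, reading top to bottom.
piped <- toy %>%
  filter(introduced == 0) %>%
  select(taxon_name, area_code_l3)

# Both approaches give the same table.
identical(nested, piped)
```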
Now, let’s get started!
Finding an example group
For this example, we don’t have a particular area or group of plants that we want to examine, but this gives us a chance to showcase one of the other functions in rWCVP!
We want a group of species that is a) not too large and b) distributed across a few WGSRPD Level 3 Areas. Brazil has good potential because it has five Level 3 Areas (a good number for this purpose because the table will fit on a portrait-oriented page). Let’s see if there are some nice-sized example genera, using the wcvp_summary function:
wcvp_summary(taxon="Myrtaceae", taxon_rank="family", area=get_wgsrpd3_codes("Brazil"),
grouping_var = "genus") %>%
wcvp_summary_gt()
Myrtaceae of Brazil | |||||
---|---|---|---|---|---|
Total number of species: 1127 Number of regionally endemic species: 878 |
|||||
Genus | Native | Endemic | Introduced | Extinct | Total |
Accara | 1 | 1 | 1 | ||
Algrizea | 2 | 2 | 2 | ||
Blepharocalyx | 3 | 1 | 3 | ||
Calycolpus | 8 | 4 | 8 | ||
Calycorectes | 9 | 9 | 9 | ||
Campomanesia | 41 | 31 | 1 | 42 | |
Curitiba | 1 | 1 | 1 | ||
Eugenia | 430 | 338 | 431 | ||
Feijoa | 1 | 1 | |||
Myrceugenia | 34 | 30 | 34 | ||
Myrcia | 440 | 351 | 1 | 441 | |
Myrcianthes | 7 | 3 | 7 | ||
Myrciaria | 23 | 15 | 23 | ||
Myrrhinium | 1 | 1 | |||
Neomitranthes | 13 | 13 | 13 | ||
Pimenta | 1 | 1 | |||
Plinia | 41 | 35 | 41 | ||
Psidium | 57 | 36 | 57 | ||
Siphoneugena | 10 | 8 | 10 | ||
Syzygium | 1 | 1 |
wcvp_summary(taxon="Calycolpus", taxon_rank="genus", area=get_wgsrpd3_codes("Brazil"),
grouping_var = "area_code_l3") %>%
wcvp_summary_gt()
Calycolpus of Brazil | |||||
---|---|---|---|---|---|
Total number of species: 8 Number of regionally endemic species: 4 |
|||||
Native | Endemic | Introduced | Extinct | Total | |
BZE | 3 | 2 | 3 | ||
BZL | 1 | 1 | 1 | ||
BZN | 5 | 1 | 5 |
wcvp_summary(taxon="Myrciaria", taxon_rank="genus", area=get_wgsrpd3_codes("Brazil"),
grouping_var="area_code_l3") %>%
wcvp_summary_gt()
Myrciaria of Brazil | |||||
---|---|---|---|---|---|
Total number of species: 23 Number of regionally endemic species: 15 |
|||||
Native | Endemic | Introduced | Extinct | Total | |
BZC | 6 | 6 | |||
BZE | 13 | 2 | 13 | ||
BZL | 16 | 4 | 16 | ||
BZN | 6 | 6 | |||
BZS | 5 | 1 | 1 | 6 |
Perfect! 23 species (rows) won’t take up too much space, and there are enough occurrences to make it interesting.
Generating and formatting the occurrence matrix
Generating an occurrence matrix for this genus is as simple as using the wcvp_occ_mat function.
m <- wcvp_occ_mat(taxon="Myrciaria", taxon_rank="genus",
area=get_wgsrpd3_codes("Brazil"))
m
#> # A tibble: 23 x 7
#> plant_name_id taxon_name BZC BZE BZL BZN BZS
#> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 473796 Myrciaria alagoana 0 1 0 0 0
#> 2 534878 Myrciaria alta 0 0 1 0 0
#> 3 534776 Myrciaria cambuca 0 1 1 0 0
#> 4 131799 Myrciaria cordata 0 0 0 1 0
#> 5 131802 Myrciaria cuspidata 1 1 1 0 1
#> 6 131803 Myrciaria delicatula 1 0 1 0 1
#> 7 131806 Myrciaria disticha 0 1 1 0 0
#> 8 131810 Myrciaria dubia 1 0 0 1 0
#> 9 491614 Myrciaria evanida 0 0 1 0 0
#> 10 131814 Myrciaria ferruginea 0 1 1 0 0
#> # ... with 13 more rows
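For intuition about the shape of this output, the sketch below builds a presence/absence matrix from long-format records with tidyr. It uses made-up records and is not necessarily how rWCVP builds its matrix internally; the column names simply mirror those in the output above.

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format records: one row per species-area pair.
long <- tibble(
  taxon_name = c("Species a", "Species a", "Species b", "Species c"),
  area_code_l3 = c("BZE", "BZL", "BZN", "BZE")
)

occ <- long %>%
  mutate(present = 1) %>%
  pivot_wider(
    names_from = area_code_l3,  # one column per area code
    values_from = present,
    values_fill = 0,            # species-area pairs with no record become 0
    names_sort = TRUE           # area columns in alphabetical order
  )

occ
```

Each species gets one row and each area one column, which is the layout that the formatting steps in this article work from.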
The default printout is OK, but we can make it much prettier using the gt package. Let’s do the following:
- remove the WCVP ID column (plant_name_id)
- change the taxon_name column label to ‘Species’
- make species names italic
- bold the column titles
- reduce the space around the text and make font size 12
- remove the internal borders
- change the 1s and 0s into X and blank
m_gt <- m %>%
  select(-plant_name_id) %>% # remove ID col
  gt() %>%
  cols_label(
    taxon_name = "Species"
  ) %>%
  # make species names italic
  tab_style(
    style = cell_text(style = "italic"),
    locations = cells_body(
      columns = taxon_name
    )
  ) %>%
  tab_options(
    # some nice formatting
    column_labels.font.weight = "bold",
    data_row.padding = px(1),
    table.font.size = 12,
    table_body.hlines.color = "transparent"
  ) %>%
  # change the zeroes into blanks
  text_transform(
    locations = cells_body(),
    fn = function(x) {
      ifelse(x == 0, "", x)
    }
  ) %>%
  # change the 1s into X
  text_transform(
    locations = cells_body(),
    fn = function(x) {
      ifelse(x == 1, "X", x)
    }
  )
m_gt
Species | BZC | BZE | BZL | BZN | BZS |
---|---|---|---|---|---|
Myrciaria alagoana | X | ||||
Myrciaria alta | X | ||||
Myrciaria cambuca | X | X | |||
Myrciaria cordata | X | ||||
Myrciaria cuspidata | X | X | X | X | |
Myrciaria delicatula | X | X | X | ||
Myrciaria disticha | X | X | |||
Myrciaria dubia | X | X | |||
Myrciaria evanida | X | ||||
Myrciaria ferruginea | X | X | |||
Myrciaria floribunda | X | X | X | X | X |
Myrciaria glanduliflora | X | ||||
Myrciaria glazioviana | X | X | X | ||
Myrciaria glomerata | X | X | X | ||
Myrciaria guaquiea | X | X | |||
Myrciaria pallida | X | ||||
Myrciaria pilosa | X | X | |||
Myrciaria plinioides | X | ||||
Myrciaria rojasii | X | ||||
Myrciaria strigipes | X | X | |||
Myrciaria tenella | X | X | X | X | X |
Myrciaria una | X | ||||
Myrciaria vismeifolia | X |
Much nicer! We can save this gt table as an HTML table or as a picture. If we plan on making a few more tables, we can save space by saving our table style as a theme (see https://themockup.blog/posts/2020-09-26-functions-and-themes-for-gt-tables/ for more details on this):
occ_mat_theme <- function(x) {
  x %>%
    cols_label(
      taxon_name = "Species"
    ) %>%
    # make species names italic
    tab_style(
      style = cell_text(style = "italic"),
      locations = cells_body(
        columns = taxon_name
      )
    ) %>%
    tab_options(
      # some nice formatting
      column_labels.font.weight = "bold",
      data_row.padding = px(1),
      table.font.size = 12,
      table_body.hlines.color = "transparent"
    ) %>%
    # change the zeroes into blanks
    text_transform(
      locations = cells_body(),
      fn = function(x) {
        ifelse(x == 0, "", x)
      }
    ) %>%
    # change the 1s into X
    text_transform(
      locations = cells_body(),
      fn = function(x) {
        ifelse(x == 1, "X", x)
      }
    )
}
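As noted above, a gt table can be saved as HTML or as a picture. A minimal sketch using gtsave() from gt follows; the small table is made up and stands in for m_gt, and the file name and temporary directory are arbitrary choices for illustration.

```r
library(gt)

# Toy table standing in for m_gt; the values are made up.
toy_gt <- gt(data.frame(
  Species = c("Species a", "Species b"),
  BZC = c("X", ""),
  BZE = c("", "X")
))

# The file extension decides the output format; here we write HTML
# into a temporary directory.
gtsave(toy_gt, filename = "occurrence_matrix.html", path = tempdir())
out_file <- file.path(tempdir(), "occurrence_matrix.html")

# Saving a picture works the same way with a .png file name, but relies on
# an extra screenshot package (webshot2 in recent gt versions), so it is
# left commented out here.
# gtsave(toy_gt, filename = "occurrence_matrix.png", path = tempdir())
```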
At the time of writing, the biggest limitation of gt is that it doesn’t support Word output. For exporting directly to a docx file, check out flextable (https://ardata-fr.github.io/flextable-book/).
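As a rough sketch of the flextable route, the example below applies similar styling (italic species names, bold headers, font size 12) to a made-up occurrence matrix and writes it to a docx file. The data and the output path are illustrative assumptions; the flextable book linked above covers the styling options in full.

```r
library(flextable)

# Toy occurrence matrix with presences already recoded to "X"; values are made up.
toy <- data.frame(
  Species = c("Species a", "Species b"),
  BZC = c("X", ""),
  BZE = c("", "X"),
  stringsAsFactors = FALSE
)

ft <- flextable(toy)
ft <- italic(ft, j = "Species", part = "body")  # italic species names
ft <- bold(ft, part = "header")                 # bold column titles
ft <- fontsize(ft, size = 12, part = "all")     # font size 12 throughout

# Write the table to a Word document in a temporary directory.
out_docx <- file.path(tempdir(), "occurrence_matrix.docx")
save_as_docx(ft, path = out_docx)
```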
Including or excluding occurrence types
What if we only want to know about native or introduced species? The wcvp_occ_mat function has options to filter for one or the other. Brazilian Myrciaria doesn’t look very interesting on that front (we can see from the summary table that only one species is introduced), so let’s look at a more invasive group: Poa in Northern Europe (a WGSRPD Level 2 Region).
wcvp_summary(taxon="Poa", taxon_rank="genus", area=get_wgsrpd3_codes("Northern Europe"),
grouping_var = "area_code_l3") %>%
wcvp_summary_gt()
Poa of Northern Europe | |||||
---|---|---|---|---|---|
Total number of species: 23 Number of regionally endemic species: 0 |
|||||
Native | Endemic | Introduced | Extinct | Total | |
DEN | 11 | 1 | 12 | ||
FIN | 16 | 1 | 17 | ||
FOR | 8 | 8 | |||
GRB | 12 | 3 | 15 | ||
ICE | 9 | 1 | 11 | ||
IRE | 6 | 2 | 8 | ||
NOR | 17 | 1 | 18 | ||
SVA | 7 | 7 | |||
SWE | 17 | 1 | 18 |
m <- wcvp_occ_mat(taxon="Poa", taxon_rank="genus",
area=get_wgsrpd3_codes("Northern Europe"),
introduced=FALSE, extinct=FALSE,
location_doubtful=FALSE)
m
#> # A tibble: 20 x 11
#> plant_name_id taxon_name DEN FIN FOR GRB ICE IRE NOR SVA SWE
#> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 435004 Poa abbreviata 0 0 0 0 0 0 0 1 0
#> 2 435078 Poa alpigena 0 1 1 0 1 0 1 1 1
#> 3 435085 Poa alpina 0 1 1 1 1 1 1 1 1
#> 4 435167 Poa angustifolia 1 1 0 1 0 0 1 0 1
#> 5 435194 Poa annua 1 1 1 1 1 1 1 0 1
#> 6 435235 Poa arctica 0 1 0 0 0 0 1 1 1
#> 7 435458 Poa bulbosa 1 1 0 1 0 0 1 0 1
#> 8 435622 Poa compressa 1 1 0 0 0 0 1 0 1
#> 9 435932 Poa flexuosa 0 0 0 1 1 0 1 0 1
#> 10 435996 Poa glauca 0 1 1 1 1 0 1 1 1
#> 11 436089 Poa hartzii 0 0 0 0 0 0 0 1 0
#> 12 436146 Poa humilis 1 1 1 1 1 1 1 0 1
#> 13 436189 Poa infirma 0 0 0 1 0 0 0 0 0
#> 14 436383 Poa lindebergii 0 1 0 0 0 0 1 0 1
#> 15 436600 Poa nemoralis 1 1 1 1 1 1 1 0 1
#> 16 436739 Poa palustris 1 1 0 1 0 0 1 0 1
#> 17 436906 Poa pratensis 1 1 1 1 1 1 1 1 1
#> 18 437092 Poa remota 1 1 0 0 0 0 1 0 1
#> 19 437424 Poa supina 1 1 0 0 0 0 1 0 1
#> 20 437547 Poa trivialis 1 1 1 1 1 1 1 0 1
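One way to think about the native, introduced, extinct and location_doubtful arguments is as filters on 0/1 flags attached to each distribution record. The sketch below applies that idea to made-up records with dplyr; the flag columns are an assumption about the data layout, and this is an illustration of the logic rather than rWCVP’s actual implementation.

```r
library(dplyr)

# Hypothetical distribution records with 0/1 occurrence-type flags.
dist <- tibble(
  taxon_name = c("Species a", "Species b", "Species c", "Species d"),
  area_code_l3 = c("DEN", "DEN", "FIN", "FIN"),
  introduced = c(0, 1, 0, 0),
  extinct = c(0, 0, 1, 0),
  location_doubtful = c(0, 0, 0, 1)
)

# Roughly what introduced = FALSE, extinct = FALSE, location_doubtful = FALSE
# asks for: native, extant records with a confident location.
native_only <- dist %>%
  filter(introduced == 0, extinct == 0, location_doubtful == 0)

# Roughly what native = FALSE, introduced = TRUE, extinct = FALSE,
# location_doubtful = FALSE asks for: introduced records only.
introduced_only <- dist %>%
  filter(introduced == 1, extinct == 0, location_doubtful == 0)
```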
We can format the native-only matrix just like we did above, but let’s skip that and go straight to introduced species only. We’re doing all the same formatting as before, but also adding a heading - the html function makes it possible to italicise our genus name and everything!
m <- wcvp_occ_mat(taxon="Poa", taxon_rank="genus",
area=get_wgsrpd3_codes("Northern Europe"),
native=FALSE,
introduced=TRUE, extinct=FALSE,
location_doubtful = FALSE)
m %>%
select(-plant_name_id) %>% #remove ID col
gt() %>%
occ_mat_theme() %>% #the theme we defined above
#add a header
tab_header(title=html("Introduced <em>Poa</em> species in Northern Europe"))
Introduced Poa species in Northern Europe | |||||||||
---|---|---|---|---|---|---|---|---|---|
Species | DEN | FIN | FOR | GRB | ICE | IRE | NOR | SVA | SWE |
Poa angustifolia | X | ||||||||
Poa chaixii | X | X | X | X | X | ||||
Poa compressa | X | ||||||||
Poa flabellata | X | ||||||||
Poa palustris | X | ||||||||
Poa persica | X |
Bonus: adding a country spanner
Tables created with gt are extremely flexible - let’s say we want to look at occurrences across the US-Canadian border:
m <- wcvp_occ_mat("Fritillaria", "genus",
area=c("WAS", "ORE", "IDA","MNT", "ABT", "BRC"))
m_gt <- m %>%
select(-plant_name_id) %>% #remove ID col
gt() %>%
occ_mat_theme() %>% #the theme we defined above
#add a header
tab_header(title=html("<em>Fritillaria</em> species in Northwest USA and Southwest Canada"))
m_gt
Fritillaria species in Northwest USA and Southwest Canada | ||||||
---|---|---|---|---|---|---|
Species | ABT | BRC | IDA | MNT | ORE | WAS |
Fritillaria affinis | X | X | X | X | X | |
Fritillaria atropurpurea | X | X | X | |||
Fritillaria camschatcensis | X | X | X | |||
Fritillaria eastwoodiae | X | |||||
Fritillaria gentneri | X | |||||
Fritillaria glauca | X | |||||
Fritillaria pudica | X | X | X | X | X | X |
Fritillaria purdyi | X | |||||
Fritillaria recurva | X |
It would be really useful to know which of those codes are in the US and which are in Canada. We could use the data included in rWCVP to create a key.
wgsrpd_mapping %>%
  filter(LEVEL3_COD %in% c("WAS", "ORE", "IDA", "MNT", "ABT", "BRC")) %>%
  select(LEVEL3_NAM, LEVEL3_COD, COUNTRY) %>%
  gt() %>%
  # some formatting
  tab_options(
    column_labels.font.weight = "bold",
    data_row.padding = px(1),
    table.font.size = 12,
    table_body.hlines.color = "transparent"
  )
LEVEL3_NAM | LEVEL3_COD | COUNTRY |
---|---|---|
Alberta | ABT | Canada |
British Columbia | BRC | Canada |
Idaho | IDA | United States |
Montana | MNT | United States |
Oregon | ORE | United States |
Washington | WAS | United States |
Using this key, we can group the area columns by country with tab_spanner():
m_gt %>%
tab_spanner(label="United States",
columns = c(IDA, MNT, ORE, WAS)) %>%
tab_spanner(label="Canada",
columns=c(ABT, BRC))
Fritillaria species in Northwest USA and Southwest Canada | ||||||
---|---|---|---|---|---|---|
Species | Canada | United States | ||||
ABT | BRC | IDA | MNT | ORE | WAS | |
Fritillaria affinis | X | X | X | X | X | |
Fritillaria atropurpurea | X | X | X | |||
Fritillaria camschatcensis | X | X | X | |||
Fritillaria eastwoodiae | X | |||||
Fritillaria gentneri | X | |||||
Fritillaria glauca | X | |||||
Fritillaria pudica | X | X | X | X | X | X |
Fritillaria purdyi | X | |||||
Fritillaria recurva | X |
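With more than a couple of countries, typing each spanner by hand gets tedious. The sketch below adds one spanner per country by looping over a key; both the key and the occurrence values are made up for illustration, and the loop is just one possible approach.

```r
library(gt)

# Hypothetical key mapping area codes to countries (mirrors the key shown above).
key <- data.frame(
  code = c("ABT", "BRC", "IDA", "MNT", "ORE", "WAS"),
  country = c("Canada", "Canada", rep("United States", 4)),
  stringsAsFactors = FALSE
)

# Toy occurrence matrix; the X marks are made up.
toy <- data.frame(
  Species = c("Species a", "Species b"),
  ABT = c("X", ""), BRC = c("X", "X"), IDA = c("", "X"),
  MNT = c("", ""), ORE = c("X", ""), WAS = c("X", "X"),
  stringsAsFactors = FALSE
)

tbl <- gt(toy)

# Add one spanner per country, covering that country's area-code columns.
for (ctry in unique(key$country)) {
  codes <- key$code[key$country == ctry]
  tbl <- tab_spanner(tbl, label = ctry, columns = tidyselect::all_of(codes))
}

tbl
```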
There is a lot more that can be done with gt - see https://gt.rstudio.com/ for help, examples and documentation.