
Match names to WCVP, first using exact matching and then using fuzzy matching on any remaining unmatched names.

Usage

wcvp_match_names(
  names_df,
  wcvp_names = NULL,
  name_col = NULL,
  id_col = NULL,
  author_col = NULL,
  join_cols = NULL,
  fuzzy = TRUE,
  progress_bar = TRUE
)

Arguments

names_df

Data frame of names for matching.

wcvp_names

Data frame of taxonomic names from WCVP version 7 or later. If NULL (the default), names will be loaded from rWCVPdata::wcvp_names.

name_col

Character. The column in names_df that has the taxon name for matching.

id_col

Character. A column in names_df with a unique ID for each name. Will be created from the row number if not provided.

author_col

Character. The column in names_df that has the name authority, to aid matching. Set to NULL (the default) to match without an author string.

join_cols

Character. A vector of the columns in names_df that hold the name parts, joined in the order given to make the taxon name. Used if name_col is not provided.

fuzzy

Logical; whether or not fuzzy matching should be used for names that could not be matched exactly.

progress_bar

Logical. Show progress bar when matching? Defaults to TRUE; should be changed to FALSE if used in a markdown report.

Value

Match results from WCVP bound to the original data from names_df.

Details

By default, exact matching uses only the taxon name (supplied by name_col) unless a column specifying the author string is provided (as author_col).
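
The difference between matching on name alone and matching on name plus author can be sketched with base R `merge()`. This is only an illustration under stated assumptions: the data frames and their values are hypothetical, and the package's own matching logic may differ (the example output further down suggests that names which fail on name plus author are retried on name alone).

```r
# Hypothetical inputs for illustration only; these are not real WCVP records.
names_df <- data.frame(
  id = 1:3,
  scientificName = c("Aus bus", "Aus cus", "Aus dus"),
  authority = c("L.", "Sm.", "Mill."),
  stringsAsFactors = FALSE
)
checklist <- data.frame(
  wcvp_id = c(101, 102, 103),
  wcvp_name = c("Aus bus", "Aus bus", "Aus cus"),
  wcvp_authors = c("L.", "Hook.", "Sm."),
  stringsAsFactors = FALSE
)

# Name only: every checklist row with the same name is a match,
# so homonyms give more than one row per input name.
by_name <- merge(names_df, checklist,
  by.x = "scientificName", by.y = "wcvp_name"
)

# Name plus author: the author string separates the homonyms.
by_name_author <- merge(names_df, checklist,
  by.x = c("scientificName", "authority"),
  by.y = c("wcvp_name", "wcvp_authors")
)

nrow(by_name)        # 3: "Aus bus" matches two checklist rows
nrow(by_name_author) # 2: one row each for "Aus bus" and "Aus cus"
```

In this toy case the author string removes the ambiguous second hit for "Aus bus", which is the reason to supply author_col when the input data has it.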

Columns setting out name parts can be supplied as join_cols in place of a taxon name, but must be supplied in the order you want them joined (e.g. c("genus", "species", "infra_rank", "infra")).
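
Why the order of join_cols matters can be seen from a small base R sketch. It assumes the parts are pasted with single spaces and that missing parts are skipped; both are assumptions made for illustration, not a description of the package internals, and the data are hypothetical.

```r
# Hypothetical name parts; the column names follow the example in the text.
parts <- data.frame(
  genus = c("Aus", "Aus"),
  species = c("bus", "cus"),
  infra_rank = c(NA, "subsp."),
  infra = c(NA, "dus"),
  stringsAsFactors = FALSE
)

join_cols <- c("genus", "species", "infra_rank", "infra")

# Paste the parts row by row, in the order given, skipping missing values.
taxon_name <- apply(parts[join_cols], 1, function(x) {
  paste(x[!is.na(x)], collapse = " ")
})

taxon_name
```

Supplying the same columns in a different order would produce a different string (for example, the rank after the infraspecific epithet), which would then fail to match.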

Fuzzy matching uses a combination of phonetic and edit distance matching, and can optionally be turned off using fuzzy=FALSE.
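
As a rough illustration of the edit distance half of fuzzy matching, the sketch below ranks candidate spellings with base R `utils::adist()`. The candidate list is hypothetical (it does not come from a WCVP lookup), and the phonetic half is not shown because base R has no phonetic encoder. The package may use different functions and thresholds.

```r
query <- "Cynanchum freemani"

# Hypothetical candidate spellings, not the result of a WCVP lookup.
candidates <- c("Cynanchum freemanii", "Cynanchum formosanum", "Cynanchum")

# Levenshtein distance from the query to each candidate.
d <- drop(utils::adist(query, candidates))
names(d) <- candidates

# The closest candidate is the one with the smallest distance.
best <- candidates[which.min(d)]
best
```

Here the closest candidate differs from the query by a single inserted letter, so its distance is 1.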

The WCVP can be loaded for matching from rWCVPdata::wcvp_names.

See here for an example workflow.

See also

Other name matching functions: wcvp_match_exact(), wcvp_match_fuzzy()

Examples

# these examples require 'rWCVPdata'
if (requireNamespace("rWCVPdata")) {
  wcvp_names <- rWCVPdata::wcvp_names

  # without author
  wcvp_match_names(redlist_example, wcvp_names,
    name_col = "scientificName",
    id_col = "assessmentId"
  )

  # with author
  wcvp_match_names(redlist_example, wcvp_names,
    name_col = "scientificName",
    id_col = "assessmentId",
    author_col = "authority"
  )
}
#> 
#> ── Matching names to WCVP ──────────────────────────────────────────────────────
#>  Using the `scientificName` column
#> ! No author information supplied - matching on taxon name only
#> 
#> ── Exact matching 20 names ──
#> 
#> ── Fuzzy matching 8 names ──
#> 
#> Matching ■■■■■■■                           20% ETA 19s
#> Matching ■■■■■■■■■■■■■                     40% ETA 10s
#> Matching ■■■■■■■■■■■■■■■■■■■               60% ETA  8s
#> Matching ■■■■■■■■■■■■■■■■■■■■■■■■■         80% ETA  4s
#> ── Matching complete! ──
#> 
#>  Matched 19 of 20 names
#>  Exact (without author): 12
#>  Fuzzy (edit distance): 4
#>  Fuzzy (phonetic): 3
#> ! Names with multiple matches: 3
#> 
#> ── Matching names to WCVP ──────────────────────────────────────────────────────
#>  Using the `scientificName` column
#>  Also using the `authority` column
#> 
#> ── Exact matching 20 names ──
#> 
#> ── Fuzzy matching 8 names ──
#> 
#> Matching ■■■■■■■                           20% ETA 10s
#> Matching ■■■■■■■■■■■■■                     40% ETA  9s
#> Matching ■■■■■■■■■■■■■■■■■■■               60% ETA  8s
#> Matching ■■■■■■■■■■■■■■■■■■■■■■■■■         80% ETA  3s
#> ── Matching complete! ──
#> 
#>  Matched 19 of 20 names
#>  Exact (with author): 6
#>  Exact (without author): 6
#>  Fuzzy (edit distance): 4
#>  Fuzzy (phonetic): 3
#> ! Names with multiple matches: 2
#> # A tibble: 22 × 18
#>    assessmentId scientificName           redlistCategory authority    match_type
#>           <dbl> <chr>                    <chr>           <chr>        <chr>     
#>  1     11081542 Antimima quartzitica     Least Concern   (Dinter) H.… Fuzzy (ed…
#>  2     19395021 Avena hybrida            Data Deficient  Peterm.      Exact (wi…
#>  3     64135503 Citrus garrawayi         Least Concern   F.M.Bailey   Fuzzy (ph…
#>  4    189601563 Croton campanulatus      Endangered      Caruzo & Co… Exact (wi…
#>  5    115968141 Cynanchum freemani       Endangered      (N.E.Br.) W… Fuzzy (ph…
#>  6     11047751 Echinacanthus longipes   Vulnerable      H.S.Lo & D.… Exact (wi…
#>  7     11001316 Geissanthus pinchinchana Endangered      (Lundell) P… Fuzzy (ed…
#>  8    126598076 Juglans pyriformis       Endangered      Liebm.       Exact (wi…
#>  9    198678856 Leichhardtia variifolia  Vulnerable      (Guillaumin… Exact (wi…
#> 10    135836392 Mouriri myrtilloides     Least Concern   (Sw.) Poir.  Exact (wi…
#> # ℹ 12 more rows
#> # ℹ 13 more variables: multiple_matches <lgl>, match_similarity <dbl>,
#> #   match_edit_distance <dbl>, wcvp_id <dbl>, wcvp_name <chr>,
#> #   wcvp_authors <chr>, wcvp_rank <chr>, wcvp_status <chr>,
#> #   wcvp_homotypic <lgl>, wcvp_ipni_id <chr>, wcvp_accepted_id <dbl>,
#> #   wcvp_author_edit_distance <dbl>, wcvp_author_lcs <int>
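
The printed result has more rows than input names because a name with several matches gets one row per match. A minimal sketch of how such a result might be triaged is given below. The data frame is a hypothetical stand-in: its column names are taken from the printed output above, its values are made up, and representing unmatched names with an NA match_type is an assumption.

```r
# Hypothetical stand-in for the returned data frame; values are made up.
matches <- data.frame(
  assessmentId = c(1, 2, 2, 3),
  match_type = c(
    "Exact (with author)", "Fuzzy (phonetic)",
    "Fuzzy (phonetic)", NA
  ),
  multiple_matches = c(FALSE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Rows where one input name hit several records and needs resolving.
ambiguous <- matches[matches$multiple_matches, ]

# Fuzzy matches, which are worth checking by eye before accepting.
fuzzy <- matches[grepl("^Fuzzy", matches$match_type), ]

# Names with no match (assumed here to carry an NA match_type).
unmatched <- matches[is.na(matches$match_type), ]

# Number of result rows per input name.
table(matches$assessmentId)
```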