Match names to WCVP, first using exact matching and then using fuzzy matching on any remaining unmatched names.
Usage
wcvp_match_names(
  names_df,
  wcvp_names = NULL,
  name_col = NULL,
  id_col = NULL,
  author_col = NULL,
  join_cols = NULL,
  fuzzy = TRUE,
  progress_bar = TRUE
)
Arguments
- names_df
Data frame of names for matching.
- wcvp_names
Data frame of taxonomic names from WCVP version 7 or later. If NULL (the default), names will be loaded from rWCVPdata::wcvp_names.
- name_col
Character. The column in names_df that has the taxon name for matching.
- id_col
Character. A column in names_df with a unique ID for each name. Will be created from the row number if not provided.
- author_col
Character. The column in names_df that has the name authority, to aid matching. Set to NULL to match with no author string.
- join_cols
Character. A vector of columns in names_df holding the name parts, which are joined to make the taxon name if name_col is not provided.
- fuzzy
Logical. Whether fuzzy matching should be used for names that could not be matched exactly.
- progress_bar
Logical. Show progress bar when matching? Defaults to TRUE; should be changed to FALSE if used in a markdown report.
Details
By default, exact matching uses only the taxon name (supplied by name_col) unless a column specifying the author string is provided (as author_col).
Columns holding the name parts can be supplied as join_cols in place of a taxon name, but they must be listed in the order you want them joined (e.g. c("genus", "species", "infra_rank", "infra")).
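The ordering requirement for join_cols can be illustrated with a small base R sketch. It is not the package's internal code: join_name_parts() is a hypothetical helper, the data frame is made up for illustration, and dropping missing parts before joining is an assumption about sensible behaviour rather than something the documentation states.

```r
# Hypothetical input; column names follow the example given in the text.
names_df <- data.frame(
  genus      = c("Poa", "Myrcia"),
  species    = c("annua", "almasensis"),
  infra_rank = c("var.", NA),
  infra      = c("supina", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: paste the name parts of each row in the order given,
# skipping parts that are missing or empty (an assumption, see lead-in).
join_name_parts <- function(df, cols) {
  vapply(seq_len(nrow(df)), function(i) {
    parts <- unlist(df[i, cols], use.names = FALSE)
    paste(parts[!is.na(parts) & nzchar(parts)], collapse = " ")
  }, character(1))
}

cols <- c("genus", "species", "infra_rank", "infra")
joined <- join_name_parts(names_df, cols)
print(joined)

# The equivalent call to the documented function would pass the same vector:
# wcvp_match_names(names_df, join_cols = cols)
```

Listing the columns in a different order would produce a different string, which is why the order of join_cols matters.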
Fuzzy matching uses a combination of phonetic and edit distance matching, and can optionally be turned off using fuzzy = FALSE.
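To give a feel for the edit distance half of fuzzy matching, here is a minimal sketch using base R's utils::adist(). It is an illustration only and makes no claim about how the package computes distances; the misspelled query and the short candidate list are made up, and the phonetic half is not shown.

```r
# Hypothetical candidate names and a hypothetical misspelled query.
candidates <- c("Juglans pyriformis", "Citrus garrawayi", "Avena hybrida")
query <- "Juglans piriformis"

# adist() returns a matrix of generalized Levenshtein distances
# (one row per query, one column per candidate).
distances <- utils::adist(query, candidates)

# Pick the candidate with the smallest edit distance.
best_match <- candidates[which.min(distances)]
best_distance <- min(distances)

print(best_match)     # "Juglans pyriformis"
print(best_distance)  # 1 (a single substitution, i -> y)
```

A real matcher would also need a cut-off for how large a distance is still acceptable, and a rule for ties; both are left out of this sketch.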
The WCVP can be loaded for matching from rWCVPdata::wcvp_names.
See here for an example workflow.
See also
Other name matching functions: wcvp_match_exact(), wcvp_match_fuzzy()
Examples
# these examples require 'rWCVPdata'
if (requireNamespace("rWCVPdata")) {
  wcvp_names <- rWCVPdata::wcvp_names

  # without author
  wcvp_match_names(redlist_example, wcvp_names,
    name_col = "scientificName",
    id_col = "assessmentId"
  )

  # with author
  wcvp_match_names(redlist_example, wcvp_names,
    name_col = "scientificName",
    id_col = "assessmentId", author_col = "authority"
  )
}
#>
#> ── Matching names to WCVP ──────────────────────────────────────────────────────
#> ℹ Using the `scientificName` column
#> ! No author information supplied - matching on taxon name only
#>
#> ── Exact matching names ──
#>
#> ── Fuzzy matching 8 names ──
#>
#> Matching ■■■■■■■ 20% ETA 19s
#> Matching ■■■■■■■■■■■■■ 40% ETA 10s
#> Matching ■■■■■■■■■■■■■■■■■■■ 60% ETA 8s
#> Matching ■■■■■■■■■■■■■■■■■■■■■■■■■ 80% ETA 4s
#> ── Matching complete! ──
#>
#> ✔ Matched 19 of names
#> ℹ Exact (without author): 12
#> ℹ Fuzzy (edit distance): 4
#> ℹ Fuzzy (phonetic): 3
#> ! Names with multiple matches: 3
#>
#> ── Matching names to WCVP ──────────────────────────────────────────────────────
#> ℹ Using the `scientificName` column
#> ℹ Also using the `authority` column
#>
#> ── Exact matching 20 names ──
#>
#> ── Fuzzy matching 8 names ──
#>
#> Matching ■■■■■■■ 20% ETA 10s
#> Matching ■■■■■■■■■■■■■ 40% ETA 9s
#> Matching ■■■■■■■■■■■■■■■■■■■ 60% ETA 8s
#> Matching ■■■■■■■■■■■■■■■■■■■■■■■■■ 80% ETA 3s
#> ── Matching complete! ──
#>
#> ✔ Matched 19 of 20 names
#> ℹ Exact (with author): 6
#> ℹ Exact (without author): 6
#> ℹ Fuzzy (edit distance): 4
#> ℹ Fuzzy (phonetic): 3
#> ! Names with multiple matches: 2
#> # A tibble: 22 × 18
#>    assessmentId scientificName           redlistCategory authority    match_type
#>           <dbl> <chr>                    <chr>           <chr>        <chr>
#>  1     11081542 Antimima quartzitica     Least Concern   (Dinter) H.… Fuzzy (ed…
#>  2     19395021 Avena hybrida            Data Deficient  Peterm.      Exact (wi…
#>  3     64135503 Citrus garrawayi         Least Concern   F.M.Bailey   Fuzzy (ph…
#>  4    189601563 Croton campanulatus      Endangered      Caruzo & Co… Exact (wi…
#>  5    115968141 Cynanchum freemani       Endangered      (N.E.Br.) W… Fuzzy (ph…
#>  6     11047751 Echinacanthus longipes   Vulnerable      H.S.Lo & D.… Exact (wi…
#>  7     11001316 Geissanthus pinchinchana Endangered      (Lundell) P… Fuzzy (ed…
#>  8    126598076 Juglans pyriformis       Endangered      Liebm.       Exact (wi…
#>  9    198678856 Leichhardtia variifolia  Vulnerable      (Guillaumin… Exact (wi…
#> 10    135836392 Mouriri myrtilloides     Least Concern   (Sw.) Poir.  Exact (wi…
#> # ℹ 12 more rows
#> # ℹ 13 more variables: multiple_matches <lgl>, match_similarity <dbl>,
#> # match_edit_distance <dbl>, wcvp_id <dbl>, wcvp_name <chr>,
#> # wcvp_authors <chr>, wcvp_rank <chr>, wcvp_status <chr>,
#> # wcvp_homotypic <lgl>, wcvp_ipni_id <chr>, wcvp_accepted_id <dbl>,
#> # wcvp_author_edit_distance <dbl>, wcvp_author_lcs <int>
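The result above has 22 rows for 20 input names, presumably because a name with multiple matches gets one row per candidate. A common next step is to pull out the rows that need manual review. The sketch below uses a small made-up data frame with the match_type and multiple_matches columns shown in the output; representing an unmatched name as NA in match_type is an assumption for illustration, not something the output above confirms.

```r
# Mock result with the same column names as the output above (values made up).
matches <- data.frame(
  assessmentId     = c(1, 2, 2, 3),
  match_type       = c("Exact (without author)", "Fuzzy (phonetic)",
                       "Fuzzy (phonetic)", NA),
  multiple_matches = c(FALSE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Count rows per match type, keeping unmatched (NA) rows visible.
type_counts <- table(matches$match_type, useNA = "ifany")
print(type_counts)

# Rows worth checking by hand: any fuzzy match, or any name with
# more than one candidate. grepl() returns FALSE for NA, so unmatched
# rows are not pulled in here.
needs_review <- matches[
  matches$multiple_matches | grepl("^Fuzzy", matches$match_type),
]

# Rows with no match at all (assumed to have NA in match_type).
unmatched <- matches[is.na(matches$match_type), ]

print(needs_review)
print(unmatched)
```

The same filters should carry over to the tibble returned by wcvp_match_names(), provided the column names are as printed above.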