Dealing with country names
is_country() checks whether a string is a country name. The argument
fuzzy_match can be set to TRUE to increase tolerance and allow for
small typos in the names.
    is_country(c("United States","Unated States","dot","DNK",123), fuzzy_match = FALSE) # FALSE is the default and will run faster
    #> [1] TRUE FALSE FALSE TRUE FALSE
    is_country(c("United States","Unated States","dot","DNK",123), fuzzy_match = TRUE)
    #> [1] TRUE TRUE FALSE TRUE FALSE
is_country() can also be used to check for
a specific subset of countries. In the following example, the function
is used to test whether the string relates to India or Sri Lanka, while
allowing for different naming conventions and languages.
    is_country(x=c("Ceylon","LKA","Indonesia","Inde"), check_for=c("India","Sri Lanka"))
    #> [1] TRUE TRUE FALSE TRUE
Finally, the package also provides the function
find_countrycol(), which can be used to find which columns
in a data frame contain country names.
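As a minimal sketch of how find_countrycol() might be called, the snippet below builds a small illustrative data frame and asks which of its columns hold country names. It assumes the functions described here come from the countries package (the package name is not stated in this section) and that the package is installed. The data frame and its column names (nation, value) are made up for illustration, and only the function's default arguments are used.

```r
library(countries)  # assumption: the package providing find_countrycol()

# Hypothetical data: one column of country names, one numeric column
df <- data.frame(
  nation = c("France", "Italy", "Japan", "Brazil"),
  value  = c(1.5, 2.0, 3.2, 4.8)
)

# Ask which columns of df contain country names
country_cols <- find_countrycol(df)
print(country_cols)
```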
country_name() can be used to convert
country names to different naming conventions or to translate them to
different languages.
    example <- c("United States","DR Congo", "Morocco")

    # Getting 3-letter ISO codes
    country_name(x= example, to="ISO3")
    #> [1] "USA" "COD" "MAR"

    # Translating to Spanish
    country_name(x= example, to="name_es")
    #> [1] "Estados Unidos" "República Democrática del Congo"
    #> [3] "Marruecos"
If multiple values are passed to the argument to, the
function will output a data.frame object, with one column
corresponding to every naming convention.
    # Requesting 2-letter ISO codes and translation to Spanish and French
    country_name(x= example, to=c("ISO2","name_es","name_fr"))
    #>   ISO2                         name_es                          name_fr
    #> 1   US                  Estados Unidos                       États-Unis
    #> 2   CD República Democrática del Congo République démocratique du Congo
    #> 3   MA                       Marruecos                            Maroc
The to argument supports all the following naming conventions:

| Value of to | Description |
|---|---|
| simple | A simple English version of the name containing only ASCII characters. This nomenclature is available for all countries. |
| ISO3 | 3-letter country codes as defined in ISO standard 3166-1 alpha-3. |
| ISO2 | 2-letter country codes as defined in ISO standard 3166-1 alpha-2. |
| ISO_code | Numeric country codes as defined in ISO standard 3166-1 numeric. |
| UN_xx | Official UN name in the 6 official UN languages (Arabic, Chinese, English, French, Russian and Spanish), with xx replaced by the language code, e.g. UN_en. |
| WTO_xx | Official WTO name in the 3 official WTO languages (English, French and Spanish). |
| name_xx | Translation of ISO country names in 28 different languages, including Arabic, with xx replaced by the language code, e.g. name_es. |
| GTAP | GTAP country and region codes. |
| all | Converts to all the nomenclatures and languages in this table. |
country_name() can identify countries even when they are
provided in mixed formats or in different languages. It is robust to
small misspellings and recognises many alternative and old country names.
    fuzzy_example <- c("US","C@ète d^Ivoire","Zaire","FYROM","Estados Unidos","ITA")
    country_name(x= fuzzy_example, to=c("UN_en"))
    #> Multiple country IDs have been matched to the same country name
    #> 
    #> Set - verbose - to TRUE for more details
    #> [1] "United States of America" "Côte d’Ivoire"
    #> [3] "Democratic Republic of the Congo" "North Macedonia"
    #> [5] "United States of America" "Italy"
More information on the country matching process can be obtained by
setting verbose=TRUE. The function will then print information on:
- The number of unique values provided by the user. In the example below, 6 distinct strings have been provided.
- How many country names correspond exactly to the ones in the function’s reference list and how many have been recognised with fuzzy matching. In the example below, "C@ète d^Ivoire" is the only name recognised with fuzzy matching. The function’s reference table can also be accessed directly.
- Summary statistics on fuzzy matching. The DISTANCE metric is the Jaro-Winkler distance between the provided string ("C@ète d^Ivoire") and the closest reference ("Côte d'Ivoire"). Lower DISTANCE statistics indicate more reliable fuzzy matching.
    country_name(x= fuzzy_example, to=c("UN_en"), verbose=TRUE)
    #> 
    #> In total 6 unique country names were provided
    #> 5/6 have been matched with EXACT matching
    #> 1/6 have been matched with FUZZY matching
    #> 
    #> 
    #> Multiple arguments have been matched to the same country name:
    #>  - Estados Unidos : United States of America
    #>  - US : United States of America
    #> [1] "United States of America" "Côte d’Ivoire"
    #> [3] "Democratic Republic of the Congo" "North Macedonia"
    #> [5] "United States of America" "Italy"
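To make the DISTANCE statistic more concrete, here is a from-scratch base R sketch of a Jaro-Winkler distance. It is not the package's own code, and the package may use different settings: the prefix scaling factor p = 0.1 and the 4-character prefix cap are the conventional Winkler values and are assumptions here. The helper name jaro_winkler_distance() is hypothetical.

```r
# Jaro-Winkler distance between two strings: 0 = identical, 1 = nothing in common.
# Illustrative implementation; p and the prefix cap of 4 are assumed defaults.
jaro_winkler_distance <- function(a, b, p = 0.1) {
  s1 <- strsplit(a, "")[[1]]
  s2 <- strsplit(b, "")[[1]]
  n1 <- length(s1)
  n2 <- length(s2)
  if (n1 == 0 && n2 == 0) return(0)
  if (n1 == 0 || n2 == 0) return(1)

  # Characters match if equal and no further apart than this window
  window <- max(0, floor(max(n1, n2) / 2) - 1)
  matched1 <- logical(n1)
  matched2 <- logical(n2)
  for (i in seq_len(n1)) {
    lo <- max(1, i - window)
    hi <- min(n2, i + window)
    if (lo > hi) next
    for (j in lo:hi) {
      if (!matched2[j] && s1[i] == s2[j]) {
        matched1[i] <- TRUE
        matched2[j] <- TRUE
        break
      }
    }
  }
  m <- sum(matched1)
  if (m == 0) return(1)

  # Transpositions: matched characters that appear in a different order
  t <- sum(s1[matched1] != s2[matched2]) / 2
  jaro <- (m / n1 + m / n2 + (m - t) / m) / 3

  # Winkler adjustment: reward a shared prefix of up to 4 characters
  prefix <- 0
  for (i in seq_len(min(4, n1, n2))) {
    if (s1[i] == s2[i]) prefix <- prefix + 1 else break
  }
  1 - (jaro + prefix * p * (1 - jaro))
}

# Distance between the misspelled string and the reference name
d <- jaro_winkler_distance("C@ète d^Ivoire", "Côte d'Ivoire")
print(d)
```

Identical strings give a distance of 0 and strings with no matching characters give 1, so smaller values correspond to closer, more trustworthy matches.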
In addition, setting verbose=TRUE will also print
additional information relating to specific warnings that are normally
given by the function:
- Multiple country IDs have been matched to the same country name: This warning is issued if multiple strings have been matched to the same country. In verbose mode, the strings and corresponding countries will be listed. In the example above, both "US" and "Estados Unidos" are matched to the same country. If the vector of country names is meant to be a unique identifier, this could indicate that some country name was not recognised correctly. The user might consider using custom tables (refer to the next section).
- Some country IDs have no match in one or more country naming conventions: This warning indicates that it is impossible to find an exact match for one or more country names with fuzzy_match=FALSE. The user might consider using fuzzy_match=TRUE or custom tables (refer to the next section).
- There is low confidence on the matching of some country names: This warning indicates that some strings have been matched poorly, so the country might have been misidentified. In verbose mode, the function will provide a list of problematic strings (see the example below). If poor_matches is set to FALSE (the default), the function will return NA for these uncertain strings. On the other hand, if poor_matches=TRUE, the function will always return the closest match, even if poor. The user might consider using custom tables to solve issues with misidentification of country names (refer to the next section).
- Some country IDs have no match in one or more country naming conventions: This warning is issued when conversion is requested to a nomenclature for which there is no information on the country. For instance, in the example below, “Taiwan” has no correspondence in the UN M49 standard. In verbose mode, the function will print all the country names affected by this problem. The user might consider using custom tables to solve this type of issue (refer to the next section).
    country_name(x= c("Taiwan","lsajdèd"), to=c("UN_en"), verbose=FALSE)
    #> Some country IDs have no match in one or more country naming conventions
    #> There is low confidence on the matching of some country names
    #> 
    #> Set - verbose - to TRUE for more details
    #> [1] NA NA
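The effect of poor_matches described above can be seen by running the same conversion twice and comparing the results side by side. This is a minimal sketch that assumes the countries package is installed. It converts to name_en rather than UN_en so that a missing UN name does not mask the comparison. The variable names strict and lenient are illustrative.

```r
library(countries)  # assumption: the package providing country_name()

x <- c("Taiwan", "lsajdèd")

# Default behaviour: uncertain matches are returned as NA
strict <- country_name(x = x, to = "name_en", poor_matches = FALSE)

# Always return the closest match, even when confidence is low
lenient <- country_name(x = x, to = "name_en", poor_matches = TRUE)

# Compare the two results for each input string
print(data.frame(input = x, strict = strict, lenient = lenient))
```

Closest matches returned with poor_matches=TRUE should be checked by hand, since a string such as "lsajdèd" does not correspond to any real country.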
All the information from verbose mode can be accessed by setting simplify=FALSE. This will return a list object containing:
- converted_data: the normal output of the function
- match_table: the conversion table with information on the closest match for each country name and distance metrics
- summary: summary values for the distance metrics
- warning: logical value indicating whether a warning is issued by the function
- call: the arguments passed by the user
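The list returned with simplify=FALSE can be inspected element by element. The sketch below assumes the countries package is installed and uses only the element names listed above. The exact structure of each element is not documented in this section, so the code simply prints them.

```r
library(countries)  # assumption: the package providing country_name()

fuzzy_example <- c("US", "C@ète d^Ivoire", "Zaire", "FYROM", "Estados Unidos", "ITA")

# Request the full list output instead of the simplified vector
res <- country_name(x = fuzzy_example, to = "UN_en", simplify = FALSE)

print(names(res))          # names of the list elements
print(res$converted_data)  # the normal output of the function
print(res$match_table)     # closest match and distance metrics per name
print(res$warning)         # whether a warning was issued
```

Keeping the full list is useful in scripts, where the warning element can be checked programmatically instead of reading printed messages.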
In some cases, the user might be unhappy with the naming conversion
or no valid conversion might exist for the provided territory. In these
cases, it might be useful to tweak the conversion table. The package
contains a utility function called
match_table(), which can
be used to generate conversion tables for small adjustments.
    example_custom <- c("Siam","Burma","H#@°)Koe2")

    # suppose we are unhappy with how "H#@°)Koe2" is interpreted by the function
    country_name(x = example_custom, to = "name_en")
    #> There is low confidence on the matching of some country names
    #> 
    #> Set - verbose - to TRUE for more details
    #> [1] "Thailand" "Myanmar" NA

    # match_table can be used to generate a table for small adjustments
    tab <- match_table(x = example_custom, to = "name_en")
    #> There is low confidence on the matching of some country names
    tab$name_en <- "Hong Kong"

    # which can then be used for conversion
    country_name(x = example_custom, to = "name_en", custom_table = tab)
    #> [1] "Thailand" "Myanmar" "Hong Kong"