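taxize to get taxonomic information
In this example, I use the taxize package to look up the family for each genus in a dataset and to get IUCN conservation statuses for a list of species.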
This is a very simple (but very useful) functionality of taxize: you can look up the family name for a given genus. For our study, we wanted to calculate the probability of mislabeling by family instead of by genus.
## X Genus Prob
## 1 1 Acanthocybium 1.0000000
## 2 2 Acipenser 0.8235294
## 3 13 Arapaima 0.0000000
## 4 14 Argyrops 0.0000000
## 5 15 Argyrosomus 0.7411765
## 6 16 Argyrozona 0.1666667
This is what the data look like: a list of probabilities of mislabeling, by genus. Use taxize to get the family name for each of these genera:
library(dplyr)
library(taxize)
# Subsample the first 10 rows because looking up the whole dataset is time-consuming
mislabel.by.fam <- mislabel.by.genus[1:10, ] %>%
  mutate(fam = tax_name(query = as.character(Genus),
                        get = "family",
                        db = "itis")$family)
head(mislabel.by.fam)
## X Genus Prob fam
## 1 1 Acanthocybium 1.0000000 Scombridae
## 2 2 Acipenser 0.8235294 Acipenseridae
## 3 13 Arapaima 0.0000000 Osteoglossidae
## 4 14 Argyrops 0.0000000 Sparidae
## 5 15 Argyrosomus 0.7411765 Sciaenidae
## 6 16 Argyrozona 0.1666667 Sparidae
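Each call to tax_name() is a web request, so when a genus appears on many rows it can help to query each unique genus once and map the result back onto the full table. The sketch below shows that pattern in base R. `lookup_family()` is a hypothetical stand-in that returns a few hard-coded families copied from the output above so the example runs offline; the toy `dat` table, its `Prob` values, and the genus "Unknownus" are made up for illustration. In practice the body of `lookup_family()` would be replaced by the tax_name() call.

```r
# Hypothetical stand-in for tax_name(); families copied from the output above
lookup_family <- function(genus) {
  known <- c(Acanthocybium = "Scombridae",
             Acipenser     = "Acipenseridae",
             Argyrops      = "Sparidae")
  unname(known[genus])  # NA for genera that are not in the table
}

# Toy data for illustration only ("Unknownus" is not a real genus)
dat <- data.frame(Genus = c("Argyrops", "Acipenser", "Argyrops", "Unknownus"),
                  Prob  = c(0, 0.82, 0.5, 0.1),
                  stringsAsFactors = FALSE)

# Query each genus once, then map the results back onto every row
genera   <- unique(dat$Genus)
families <- lookup_family(genera)
dat$fam  <- families[match(dat$Genus, genera)]
dat
```

Genera that cannot be resolved end up as NA in `fam`, which makes them easy to find and check by hand.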
Now we can plot the median probability of mislabeling by family:
library(ggplot2)
mislabel.by.fam %>%
  group_by(fam) %>%
  summarize(mProb = median(Prob)) %>%
  # Order families by median probability so the plot is sorted
  mutate(orderedfam = factor(fam,
                             levels = fam[order(mProb)])) %>%
  ggplot(aes(x = orderedfam, y = mProb)) +
  geom_point(size = 4, colour = 'grey') +
  theme_classic(base_size = 14) +
  labs(x = "Family", y = "Median P(mislabeled)") +
  coord_flip()
taxize can also look up IUCN conservation statuses with iucn_summary(). We used this to figure out whether, on average, the species written on the label (or menu) of seafood products had a higher conservation status (i.e., was “more sustainable”) or a lower (more endangered) conservation status than the species actually sold.
####Getting a token for the IUCN API
In order to use iucn_summary() you need an API token. You can get one here. I put mine in a separate file and just use source() to read it in.
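An alternative to a sourced file is to keep the token in an environment variable (for example in `~/.Renviron`) and read it with Sys.getenv(). The helper below is a minimal sketch: `get_iucn_key()` is a hypothetical name, and the variable name `IUCN_REDLIST_KEY` is an assumption (I believe it is the name the rredlist package looks for, but check the package documentation).

```r
# Hypothetical helper: read the IUCN API token from an environment variable.
# The default variable name is an assumption; change it to match your setup.
get_iucn_key <- function(var = "IUCN_REDLIST_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("No API token found in environment variable ", var)
  }
  key
}

# Usage, once the variable is set:
# IUCN_key <- get_iucn_key()
```

Either approach keeps the token out of the analysis script, which matters if the script is shared or put under version control.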
####Look up an IUCN status
Once you have an API token, getting IUCN statuses is a cinch! It can take a while:
source("IUCN_API.R") # file with API token
st <- Sys.time()     # start a timer
iucn_summary('Epinephelus diacanthus', key = IUCN_key)
Sys.time() - st      # elapsed time for the lookup
## $`Epinephelus diacanthus`
## $`Epinephelus diacanthus`$status
## [1] "LC"
##
## $`Epinephelus diacanthus`$history
## year code category
## 1 2018 LC Least Concern
## 2 2008 NT Near Threatened
##
## $`Epinephelus diacanthus`$distr
## [1] NA
##
## $`Epinephelus diacanthus`$trend
## [1] NA
##
##
## attr(,"class")
## [1] "iucn_summary"
## Time difference of 1.354387 secs
####Look up several IUCN statuses
## Sci.labels Sci.actuals Mislabeled
## 1 Arapaima gigas Arapaima gigas 0
## 2 Brachyplatystoma rousseauxii Brachyplatystoma rousseauxii 0
## 3 Argyrops spinifer Argyrops sp 0
## 4 Cheimerius nufar Cheimerius nufar 0
## 5 Anabas testudineus Anabas testudineus 0
## 6 Clarias fuscus Clarias sp 0
## Generally.labeled Country.of.sample Loc Study Prop N
## 1 NA Brazil Manaus and Novo Airao Study A NA 5
## 2 NA Brazil Manaus and Novo Airao Study A NA 5
## 3 NA Italy <NA> Study B 1 2
## 4 NA Italy <NA> Study B 1 4
## 5 NA Italy Pisa or Prato Study B 1 1
## 6 NA Italy Pisa or Prato Study B 1 1
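# Unique species names that appear as either the labeled or the actual species
spps <- data.frame(species = unique(c(as.character(rawdata$Sci.labels),
                                      as.character(rawdata$Sci.actuals))),
                   IUCNstatus = NA,
                   stringsAsFactors = FALSE)
for (i in seq_len(nrow(spps))) {
  s <- try(iucn_summary(spps$species[i], key = IUCN_key))
  # If the lookup failed, leave the status as NA instead of stopping the loop
  if (!inherits(s, "try-error")) {
    spps[i, 'IUCNstatus'] <- iucn_status(s)
  }
}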
head(spps)
## species IUCNstatus
## 1 Arapaima gigas DD
## 2 Brachyplatystoma rousseauxii LC
## 3 Argyrops spinifer LC
## 4 Cheimerius nufar DD
## 5 Anabas testudineus DD
## 6 Clarias fuscus LC
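With a status for every species, the labeled and actual species can be compared on an ordinal scale. The following is a minimal sketch of one way to do that, not necessarily the method used in our study: it ranks the IUCN categories from least to most threatened, treats DD (Data Deficient) as unranked, and assumes the ranks can be subtracted as if they were evenly spaced. `status_rank()` and the toy `pairs` table are hypothetical and for illustration only.

```r
# IUCN categories from least to most threatened; DD (Data Deficient) is left out
iucn_levels <- c("LC", "NT", "VU", "EN", "CR", "EW", "EX")

# Hypothetical helper: convert status codes to integer ranks
status_rank <- function(code) {
  # match() returns NA for codes that are not ranked, such as "DD"
  match(code, iucn_levels)
}

# Toy (label status, actual status) pairs, for illustration only
pairs <- data.frame(label  = c("LC", "NT", "DD", "VU"),
                    actual = c("VU", "NT", "LC", "LC"),
                    stringsAsFactors = FALSE)

# Positive = the actual species is more threatened than the label suggests
pairs$diff <- status_rank(pairs$actual) - status_rank(pairs$label)
pairs
```

Pairs that involve a DD species get an NA difference, so they drop out of any mean or median computed with `na.rm = TRUE`.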