Interested in our work?

Thanks for your interest in the Leaf Lab. I am looking to recruit strong Master’s and Ph.D. candidates who are enthusiastic about using quantitative methods. Current students in the lab are engaged in a range of fisheries and ecological research. I generally accept students only when I have funding available and advertise those positions on the AFS job opportunities website. The Division of Coastal Sciences does offer funded graduate fellowships for exceptional students.

If you are interested in learning more about the work done in my lab, your first step is to spend some time on this website and on the websites of the Division of Coastal Sciences and the USM Graduate School to learn more about graduate study at USM.

If you are interested in applying for a position in the lab, contact me by email (robert.leaf@usm.edu) and send:

  1. Cover letter (one page that details relevant and specific information about your experiences and abilities beyond those itemized in your CV).
  2. CV, including your GRE scores (unofficial scores are fine, but percentile scores must be included for each) and your contact information.
  3. Legible copies of transcripts (unofficial or official).

Include these in a single PDF file labeled with your last and first name; example: “Leaf_Robert_application_packet.PDF”. Sending a file titled “CV.doc” or “research statement.PDF” is a guarantee that it will be lost or overwritten.

The application process:

  1. Your first step is to decide whether the program is right for you by looking at this website and the website of the Division of Coastal Sciences at USM. You may also want to contact current graduate students to discuss their experience in the laboratory (highly recommended).
  2. If you are interested, please submit the requested single PDF file containing your cover letter, CV, and transcripts, and give me an opportunity to look over your packet.
  3. I will contact you by email and we can arrange a visit to USM.
  4. We will work together to determine whether the program is right for you and whether it is appropriate for you to apply to the Graduate School at USM.
  5. Faculty in USM’s Division of Coastal Sciences generally do not admit students to the program without arranging sponsorship by an advisor. Applying to the Division of Coastal Sciences without first talking with an advisor and understanding the availability of funding is a waste of your valuable time and energy.

Dependency Installation in R

Re: Problem using Levene’s Test in the “car” package

Nour Salam, MSc, Graduate research assistant, Division of Coastal Sciences

I had the package ‘car’ installed, but every time I tried running the Levene test I got an error message saying that one of the dependencies of ‘car’ was not installed. That dependency is called ‘Rcpp’. When I tried installing ‘Rcpp’ using install.packages("Rcpp"), the installation would run, but ‘Rcpp’ did not appear to be installed in the same library directory as the other packages.

The magic code that solved this:

install.packages("Rcpp", repos = c("http://rstudio.org/_packages", "http://cran.rstudio.com"))

The Levene test worked after that.
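If the dependency still fails to load after installation, it can also help to check where R is installing packages versus where it is loading them from. A minimal sketch of that check (the data frame and grouping variable below are made-up examples, not part of the original question):

# List the library directories R searches; the dependency must end up in one of these
.libPaths()

# Install the dependency explicitly into the first library on the search path
install.packages("Rcpp", lib = .libPaths()[1])

# Confirm that 'car' loads and that Levene's test runs on a made-up example
library(car)
example.df <- data.frame(y = rnorm(60), g = factor(rep(c("A", "B", "C"), each = 20)))
leveneTest(y ~ g, data = example.df)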

FishBase “Web Scraping”

Meg Oshima, Grant Adams, Robert Leaf
July 21, 2016

Many websites hold databases that are not available in an easily downloadable form (e.g., .txt or .csv files). In our work we were interested in obtaining data that were nested within several layers of web pages, which required multiple steps of web scraping.

FishBase is a global online database of species data for finfish. From FishBase we collected growth parameters (L-infinity, k, and t0) for hundreds of species. After selecting a species on FishBase, the first page provides a table of parameter values from every available study.

[Figure: growth tables]

From there, clicking on the parameter value k leads to a page with more detail about that particular study, including the data reference ID number that links to the page with the full citation for the data.

[Figure: detail table]
[Figure: citation table]
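As a rough sketch of how one of these tables could be read directly with the ‘rvest’ package (the URL below is a placeholder and the assumption that the growth table is the first HTML table on the page is ours; the workflow described next instead starts from a prepared list of URLs):

library(rvest)
library(dplyr)

# Placeholder URL for a species' population growth summary page (an assumption,
# not one of the addresses used in the workflow below)
growth.url <- "http://www.fishbase.us/PopDyn/PopGrowthList.php?ID=1854"

growth.html <- read_html(growth.url)

# Assume the growth parameters (Linf, K, t0) sit in the first table on the page
growth.tab <- growth.html %>% html_nodes("table") %>% .[[1]] %>% html_table(fill = TRUE)
head(growth.tab)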

The full citation was “scraped” from fishbase.org using the ‘rvest’ R package. We started with a .csv file containing a vector of URL addresses (an example spreadsheet is attached).

library(rvest)
library(dplyr)
library(scales)
library(stringr)
fish.df <- read.csv(file = "r.bloggers.fb.example.csv", header = T, sep = ",")
head(fish.df)
##                    CommonName      Family     Genus   Species Linf..cm.
## 1 Emperor, Pacific yellowtail Lethrinidae Lethrinus atkinsoni      41.3
## 2 Emperor, Pacific yellowtail Lethrinidae Lethrinus atkinsoni      41.4
## 3 Emperor, Pacific yellowtail Lethrinidae Lethrinus atkinsoni      42.8
## 4           Emperor, spangled Lethrinidae Lethrinus nebulosus      50.2
## 5           Emperor, spangled Lethrinidae Lethrinus nebulosus      61.4
##   Length.Type    K t0..years.
## 1          SL 0.31         NA
## 2          SL 0.31         NA
## 3          SL 0.29         NA
## 4          SL 0.21         NA
## 5          FL 0.34         NA
##                                                                                                                                                                                                                                                         URL
## 1 http://www.fishbase.us/PopDyn/FishPopGrowthSummary.php?ID=2050&pref=2291&sex=unsexed&loo=41.30000&k=0.31000&id2=1854&genusname=Lethrinus&speciesname=atkinsoni&fc=328&gm_loo=39.331044377911&gm_lm=1&gm_m=1&gm_k=0.2998332870113&vautoctr=3307&gm_lm_rl=1
## 2 http://www.fishbase.us/PopDyn/FishPopGrowthSummary.php?ID=2050&pref=2291&sex=unsexed&loo=41.40000&k=0.31000&id2=1854&genusname=Lethrinus&speciesname=atkinsoni&fc=328&gm_loo=39.331044377911&gm_lm=1&gm_m=1&gm_k=0.2998332870113&vautoctr=3308&gm_lm_rl=1
## 3 http://www.fishbase.us/PopDyn/FishPopGrowthSummary.php?ID=2050&pref=2291&sex=unsexed&loo=42.80000&k=0.29000&id2=1854&genusname=Lethrinus&speciesname=atkinsoni&fc=328&gm_loo=39.331044377911&gm_lm=1&gm_m=1&gm_k=0.2998332870113&vautoctr=3309&gm_lm_rl=1
## 4 http://www.fishbase.us/PopDyn/FishPopGrowthSummary.php?ID=2042&pref=2291&sex=unsexed&loo=50.20000&k=0.21000&id2=1854&genusname=Lethrinus&speciesname=nebulosus&fc=328&gm_loo=39.331044377911&gm_lm=1&gm_m=1&gm_k=0.2998332870113&vautoctr=3278&gm_lm_rl=1
## 5 http://www.fishbase.us/PopDyn/FishPopGrowthSummary.php?ID=2042&pref=3679&sex=unsexed&loo=61.40000&k=0.34000&id2=1854&genusname=Lethrinus&speciesname=nebulosus&fc=328&gm_loo=39.331044377911&gm_lm=1&gm_m=1&gm_k=0.2998332870113&vautoctr=3289&gm_lm_rl=1

Then, for each entry, an object called “web.id” holding the URL was created and the HTML for that page was read. The data reference ID was extracted from the page and placed into a new column of the data frame.

fish.df$Data_ref <- NA
for (i in 1:nrow(fish.df)){
  #created a web ID with the URL from fish.df$URL
  web.id <- as.character(fish.df[i,"URL"]) 
  #opened the html code for the webpage 
  web.html <- read_html(web.id) 
  #pulled information from the table on the webpage
  species.id <- as.character(html_attr(html_nodes(web.html, "a"), "href"))
  #pulled out line with data reference number
  species.id <- species.id[grep("References", species.id)[1]] 
  #split after ID= and called the number (2nd object)
  species.id <- strsplit(species.id, "ID=")[[1]][2]
  #inserted data reference number into the new column in fish.df
  fish.df$Data_ref[i] <- species.id 
  print(i)
}

Next, a for() loop was used to create a unique web address for each entry based on the data reference number. The information stored in the table on that page was retrieved, and only the citation was isolated and added to a new column of the data frame. Some of the pages resolved under the “.us” address and others under the “.org” address, so two loops were used: if a page could not be retrieved with the “.us” address, it was retried with the “.org” address.

fish.df$Citation<-NA

for (i in 1:nrow(fish.df)){
  # http://www.fishbase.org/References/FBRefSummary.php?ID=105366 
  #created the URL that leads to the page with the citation 
  #(general address + data_ref number)
  citation.web.id <- paste("http://www.fishbase.us/references/FBRefSummary.php?ID=",fish.df$Data_ref[i], sep ="") 
  #read information from the table on the webpage
  web.html <- read_html(citation.web.id) 
  # tryCatch() lets the loop skip errors; get text from the table, including the full citation
  citation.table <- tryCatch(web.html %>% html_nodes("table") %>% .[[1]] %>% html_table(fill=T), error = function(e)"Redo_with_.org_fishbase_site") 
  #isolates the citation
  citation <- tryCatch(citation.table[1,2], error = function(e)"Redo_with_.org_fishbase_site")
  #adds into the new column in fish.df
  fish.df$Citation[i]<-citation 
  print(i)
}
for (i in which( fish.df$Citation %in% "Redo_with_.org_fishbase_site")){
  citation.web.id <- paste("http://www.fishbase.org/references/FBRefSummary.php?ID=",fish.df$Data_ref[i], sep ="")
  web.html <- read_html(citation.web.id)
  citation.table <- web.html %>% html_nodes("table") %>% .[[1]] %>% html_table(fill=T)
  citation <- citation.table[1,2]
  fish.df$Citation[i]<-citation
}

Lastly, to isolate the year in which each study was published, the citation string was split on runs of non-digit characters with strsplit(), and the second element of the result (the year) was taken. That value was added to a new column in the data frame.

fish.df$Year <- NA

for (i in 1:nrow(fish.df)){
  # split the citation on runs of non-digit characters and take the
  # 2nd element of the result, which is the year of publication
  fish.df$Year[i] <- strsplit(unlist(fish.df$Citation[i]), "[^0-9]+")[[1]][2]
}
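
A short usage note: once the loops finish, the enriched data frame can be checked and saved in the usual way (the columns are those created above; the output file name is only an example):

# Inspect the columns added by the scraping loops
head(fish.df[, c("Genus", "Species", "K", "Data_ref", "Year")])

# Save the enriched table for later use
write.csv(fish.df, file = "fishbase_growth_citations.csv", row.names = FALSE)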

With the rvest package and this code, we quickly retrieved text that normally takes several steps to access and added it to our data frame.

Author contact information: megumi.oshima@usm.edu, robert.leaf@usm.edu, grant.adams@eagles.usm.edu