Data Crawler
Crawl data from sites in Go
To follow along, you will need:
- Go
- An IDE (GoLand) or any code editor such as Visual Studio
- goquery (library)
The example site we will work with is https://www.ncbi.nlm.nih.gov, the website of the National Center for Biotechnology Information (NCBI), a platform for bioinformatics. We will crawl some gene data from it.
Because the data is served as static HTML, we don't need to drive a browser to execute JavaScript. We only need to send an HTTP request to the endpoint using Go's native net/http package.
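// client is assumed to be an *http.Client, for example http.DefaultClient.
resp, err := client.Get("https://www.ncbi.nlm.nih.gov/gene/" + geneId)
if err != nil {
	log.Fatal(err)
}
defer resp.Body.Close()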
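For a fuller picture of the request step, here is a minimal, self-contained sketch. It assumes a 30-second timeout and uses 7157 as an example gene ID; both are illustrative choices, and geneURL and fetchGenePage are hypothetical helper names introduced here. It checks the request error and the HTTP status, and closes the response body when done.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"net/url"
	"time"
)

// baseURL is the gene page prefix used in this article.
const baseURL = "https://www.ncbi.nlm.nih.gov/gene/"

// geneURL builds the page URL for a gene ID (hypothetical helper).
// The ID is path-escaped so unexpected characters cannot alter the URL.
func geneURL(geneID string) string {
	return baseURL + url.PathEscape(geneID)
}

// fetchGenePage downloads the raw HTML of one gene page (hypothetical helper).
func fetchGenePage(client *http.Client, geneID string) ([]byte, error) {
	resp, err := client.Get(geneURL(geneID))
	if err != nil {
		return nil, fmt.Errorf("request failed: %w", err)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

func main() {
	// The timeout value is an assumption; tune it for your network.
	client := &http.Client{Timeout: 30 * time.Second}

	body, err := fetchGenePage(client, "7157") // example ID, chosen for illustration
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("downloaded %d bytes\n", len(body))
}
```

Keeping the URL construction in its own function makes it easy to check without touching the network.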
The goquery library supports finding elements by CSS selector. We parse the response body into a document and then query it.
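// Parse the HTML response into a queryable document.
doc, err := goquery.NewDocumentFromReader(resp.Body)
if err != nil {
	log.Fatal(err)
}
// Read the text of the <dd class="noline"> that is a direct child of #summaryDl.
geneOfficialSymbol := doc.Find("#summaryDl > dd.noline").Contents().Text()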