
Data Crawler

Crawl data from sites in Go for your specific needs.

Donald Le


Photo by Robert Anasch on Unsplash

To follow along, you will need:

  • Go
  • An IDE such as GoLand, or any code editor such as Visual Studio Code
  • The goquery library (github.com/PuerkitoBio/goquery)

The example site we will work with is https://www.ncbi.nlm.nih.gov, a platform for bioinformatics. We will crawl some gene data from it.

Because the data is served as static HTML, we don't need to launch a browser to execute JavaScript. We only need to send an HTTP request to the endpoint using Go's standard net/http package.

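resp, err := client.Get("https://www.ncbi.nlm.nih.gov/gene/" + geneId) // client is an *http.Client
if err != nil { log.Fatal(err) } // a nil resp would panic when reading resp.Body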

The goquery library lets us find elements using CSS selectors, in a jQuery-like style.

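defer resp.Body.Close() // release the connection once parsing is done

// Parse the HTML body into a document that can be queried with CSS selectors
doc, err := goquery.NewDocumentFromReader(resp.Body)
if err != nil {
	log.Fatal(err)
}
// Read the official gene symbol from the summary list on the page
geneOfficialSymbol := doc.Find("#summaryDl > dd.noline").Contents().Text()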
