SERP

2 hours trial then $5.55/month - No credit card required now

yuriy_chistyakov/serp

Images are treated as a valuable field of a Google Search Engine Results Page (SERP) result and are always added to the Dataset when present.

How it works

This code is a JavaScript script that uses Cheerio to scrape data from Google Search Engine Results Pages.

  • The crawler starts at startUrl https://www.google.com/search?q=${q}&start=0&${num}&lr=lang_en&hl=en, where the "q" and "num" parameters come from the q and num input fields defined by the input schema. Spaces in the q field are automatically encoded as plus signs (+).
  • For each URL, the crawler's requestHandler extracts data from the page with the Cheerio library. While the results array is not empty, it saves the url, pageTitle and results of each page to the Dataset; once a page returns no results, the Actor exits. Every result in the array consists of site, link, title, description and image. After saving the data, the crawler enqueues a request for the next page with the "start" parameter increased by the value of "num".
  • The number of scraped pages is limited by the Max Requests per Crawl field of the input schema. The crawler also logs the URL of each page visited.
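
The steps above can be sketched in plain JavaScript. This is only an illustration, not the Actor's actual source: the function names buildSearchUrl and crawl are hypothetical, the page size is assumed to be sent as num=<value>, and getPage is a stand-in for the Cheerio-based requestHandler.

```javascript
// Sketch only: names here are hypothetical and not taken from the Actor's code.

// Build the search URL for a given offset.
// Assumption: the page size is passed to Google as `num=<value>`.
function buildSearchUrl(q, num, start = 0) {
  // Spaces in the query are encoded as plus signs.
  const query = encodeURIComponent(q).replace(/%20/g, '+');
  return `https://www.google.com/search?q=${query}&start=${start}&num=${num}&lr=lang_en&hl=en`;
}

// Walk the result pages until one is empty or the request limit is reached.
// `getPage(url)` stands in for the Cheerio extraction step and must return
// an object of the form { pageTitle, results }.
function crawl({ q, num, maxRequestsPerCrawl }, getPage) {
  const dataset = [];
  let start = 0;
  for (let requests = 0; requests < maxRequestsPerCrawl; requests++) {
    const url = buildSearchUrl(q, num, start);
    console.log(`Visiting ${url}`);
    const { pageTitle, results } = getPage(url);
    if (results.length === 0) break; // empty page: stop crawling
    dataset.push({ url, pageTitle, results });
    start += num; // the next page offset grows by `num`
  }
  return dataset;
}
```

Passing the extraction step in as a callback keeps the pagination logic separate from the HTML parsing, so the loop can be tried out with a fake getPage before pointing it at real pages.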

Included features

  • Serpifier - a free tool I created for visualizing datasets produced by SERP

Let's take a closer look at how it works using Serpifier.

  • By entering a search query and leaving the num and maxRequestsPerCrawl parameters unchanged, we get the following graph:

num: 100, maxRequestsPerCrawl: 100

  • Now let's limit the number of results by setting num to 10 and maxRequestsPerCrawl to 10. That is, 10 pages with a maximum of 10 results per page:

num: 10, maxRequestsPerCrawl: 10

  • And here is the result with maxRequestsPerCrawl set to 20. The graph will look something like this:

num: 10, maxRequestsPerCrawl: 20

In fact, we don’t know the number of search results in advance, so we’re just playing with the parameters.

  • Curious how the graph looks with num: 2 and maxRequestsPerCrawl: 100?

num: 2, maxRequestsPerCrawl: 100

  • By setting the sitesearch input parameter to "apollographql.com", we get a graph similar to the one shown below:

sitesearch: "apollographql.com"
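
Since the number of search results is not known in advance, it can help to work out the ceiling each combination of parameters allows. The sketch below assumes that every request fetches exactly one page and that each page holds at most num results; maxResults is a hypothetical helper, not part of the Actor. Real runs will usually return fewer results, because the crawl stops at the first empty page.

```javascript
// Upper bound on results for a run: at most `num` results per page,
// and at most `maxRequestsPerCrawl` pages. Hypothetical helper for illustration.
function maxResults(num, maxRequestsPerCrawl) {
  return num * maxRequestsPerCrawl;
}

// The parameter combinations used in the walkthrough above.
const runs = [
  { num: 100, maxRequestsPerCrawl: 100 },
  { num: 10, maxRequestsPerCrawl: 10 },
  { num: 10, maxRequestsPerCrawl: 20 },
  { num: 2, maxRequestsPerCrawl: 100 },
];

for (const { num, maxRequestsPerCrawl } of runs) {
  const limit = maxResults(num, maxRequestsPerCrawl);
  console.log(`num: ${num}, maxRequestsPerCrawl: ${maxRequestsPerCrawl} -> at most ${limit} results`);
}
```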

Developer
Maintained by Community

Actor Metrics

  • 1 monthly user

  • No stars yet

  • >99% runs succeeded

  • Created in Jul 2024

  • Modified 5 months ago