SERP
Try for free
2 hours trial then $5.55/month - No credit card required now
yuriy_chistyakov/serp
Images are considered a valuable field of a Google Search Engine Results Page (SERP) and are always added to the Dataset when present.
How it works
This code is a JavaScript script that uses Cheerio to scrape data from Google Search Engine Results Pages.
- The crawler starts with the startUrl https://www.google.com/search?q=${q}&start=0&${num}&lr=lang_en&hl=en, where the "q" and "num" parameters come from the q and num fields defined by the input schema. Any spaces in the q field are automatically encoded as plus signs (+).
- The crawler calls requestHandler for each URL to extract the data from the page with the Cheerio library. While the results array is not empty, it saves the url, pageTitle and results of each page to the Dataset; otherwise the Actor exits. Every result in the array consists of site, link, title, description and image. After the data has been saved, the crawler enqueues a request for the next page with the "start" parameter increased by the value of "num".
- The number of scraped pages is limited by the Max Requests per Crawl field from the input schema. The crawler also logs the url of each page visited.
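The URL handling described above can be sketched roughly as follows. This is a minimal illustration, not the Actor's actual source: the helper names buildStartUrl and nextPageUrl are hypothetical, and the sketch assumes that the ${num} placeholder in the template expands to num=<value>.

```javascript
// Hypothetical helpers for illustration; the names and the `num=` expansion
// of the ${num} placeholder are assumptions, not taken from the Actor's code.

// Build the first search URL from the q and num input fields.
function buildStartUrl({ q, num }) {
  // Percent-encode the query, then turn encoded spaces into plus signs.
  const query = encodeURIComponent(q).replace(/%20/g, '+');
  return `https://www.google.com/search?q=${query}&start=0&num=${num}&lr=lang_en&hl=en`;
}

// Derive the next page URL by increasing "start" by the value of "num".
function nextPageUrl(currentUrl) {
  const url = new URL(currentUrl);
  const start = Number(url.searchParams.get('start') ?? 0);
  const num = Number(url.searchParams.get('num') ?? 10); // fallback of 10 is an assumption
  url.searchParams.set('start', String(start + num));
  return url.toString();
}

const first = buildStartUrl({ q: 'apollo graphql', num: 10 });
console.log(first);
console.log(nextPageUrl(first));
```

Keeping the offset inside the URL means each request carries everything needed to compute its successor, so no page counter has to be stored separately.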
Included features
Let's take a closer look at how it works using serpifier
- By entering a search query and leaving the num and maxRequestsPerCrawl parameters unchanged, we get the following graph:
- Now let's limit the number of results by setting num to 10 and maxRequestsPerCrawl to 10. That is, 10 pages with a maximum of 10 results per page:
- And here is the result with maxRequestsPerCrawl set to 20. The graph will look something like this:
In fact, we don’t know the number of search results in advance, so we’re just playing with the parameters.
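Because the total number of results is unknown up front, a crawl ends on whichever comes first: an empty results page or the Max Requests per Crawl limit. The sketch below simulates only that stopping rule, without any network access. The function simulateCrawl and its totalAvailable argument (a stand-in for how many results Google would actually return) are hypothetical and used purely for illustration.

```javascript
// Simulates the stopping rule only; no requests are made.
// `totalAvailable` is a hypothetical stand-in for the unknown number of results.
function simulateCrawl({ num, maxRequestsPerCrawl, totalAvailable }) {
  const pages = [];
  let start = 0;
  for (let request = 0; request < maxRequestsPerCrawl; request++) {
    // Number of results a page at this offset would contain.
    const count = Math.max(0, Math.min(num, totalAvailable - start));
    if (count === 0) break; // empty results array: the crawl ends
    pages.push({ start, count });
    start += num; // offset of the next page
  }
  return pages;
}

// 37 available results with num = 10: pages at offsets 0, 10, 20 and 30.
const pages = simulateCrawl({ num: 10, maxRequestsPerCrawl: 10, totalAvailable: 37 });
console.log(pages.length); // 4
```

In this model the number of saved results can never exceed num multiplied by maxRequestsPerCrawl, which is why raising either parameter only helps while Google still has results left to return.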
- Curious how the graph looks with num: 2 and maxRequestsPerCrawl: 100?
- By setting the sitesearch input parameter to "apollographql.com", we get a graph similar to the one shown below:
Developer
Maintained by Community
Actor Metrics
1 monthly user
>99% runs succeeded
Created in Jul 2024
Modified 5 months ago