Serp CWD
Google SERP Company Discovery
This Actor finds likely official company websites from Google search results.
It is built for company-website discovery workflows like:
- searching Google for "Company Name" + town
- parsing the top organic results
- rejecting directories, social pages, job boards, and obvious junk
- returning either:
  - an accepted website
  - a review candidate
  - or a rejected/no-site result
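The rejection step in the workflow above can be pictured as a domain blocklist check. This is a minimal sketch only: the Actor's real filter rules are not documented here, and BLOCKED_DOMAINS, is_blocked, and filter_candidates are hypothetical names with illustrative entries.

```python
from urllib.parse import urlparse

# Hypothetical blocklist for illustration; the Actor's real list is not documented.
BLOCKED_DOMAINS = {"linkedin.com", "facebook.com", "indeed.com", "yell.com"}


def normalized_host(url: str) -> str:
    """Return the lowercase hostname without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def is_blocked(url: str) -> bool:
    """True if the host is a blocked domain or a subdomain of one."""
    host = normalized_host(url)
    return any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS)


def filter_candidates(urls):
    """Keep only URLs whose host is not on the blocklist, preserving order."""
    return [u for u in urls if not is_blocked(u)]
```

Matching on the host suffix rather than a substring avoids rejecting an unrelated site whose name merely contains a blocked word.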
What It Does
For each input company row, the Actor:
- Builds a Google query from companyName and town
- Fetches raw HTML through either:
  - Apify GOOGLE_SERP proxy by default
  - or your own proxy URLs if you provide them
- Parses the organic SERP results
- Scores candidate domains with either:
  - strict
  - loose
  - raw
- Pushes normalized rows into the default dataset
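The first step above, building the query, might look like the following sketch. It assumes the query is the quoted company name followed by the town, as the workflow description suggests. The function names, the URL layout, and the mapping of googleDomain, language, and page number onto the q, hl, and start parameters are assumptions, not the Actor's confirmed behaviour. A given proxy may also require a different URL scheme.

```python
from urllib.parse import urlencode


def build_query(company_name: str, town: str = "") -> str:
    """Quote the company name and append the town if one is given."""
    name = " ".join(company_name.split())  # collapse repeated whitespace
    query = f'"{name}"'
    town = (town or "").strip()
    if town:
        query += f" {town}"
    return query


def build_search_url(query: str, google_domain: str = "google.com",
                     language: str = "en", page: int = 0) -> str:
    """Assemble a search URL; 'start' advances by 10 results per page."""
    params = {"q": query, "hl": language, "start": page * 10}
    return f"https://www.{google_domain}/search?{urlencode(params)}"


if __name__ == "__main__":
    q = build_query("Example Engineering Ltd", "Leeds")
    print(q)
    print(build_search_url(q, google_domain="google.co.uk"))
```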
Why This Actor Exists
The goal is to keep the discovery engine portable and publishable:
- portable because the core matching logic is not tied to a specific SERP SaaS
- publishable because Apify handles proxying, hosting, scaling, scheduling, and monetization
Input
You can provide company rows in either of these ways:
- searches
  - inline array of { companyNumber, companyName, town }
- sourceDatasetId
  - a dataset where items expose:
    - companyName or company_name
    - optional companyNumber or company_number
    - optional town
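Because dataset items may use either camelCase or snake_case field names, a loader has to normalize them into one shape. The sketch below shows one way to do that. The function name normalize_row and the decision to drop rows without a company name are assumptions made for illustration.

```python
from typing import Optional


def normalize_row(item: dict) -> Optional[dict]:
    """Map a dataset item onto the { companyNumber, companyName, town } shape.

    Returns None when no usable company name is present.
    """
    name = item.get("companyName") or item.get("company_name")
    if not name or not str(name).strip():
        return None
    number = item.get("companyNumber") or item.get("company_number")
    return {
        "companyNumber": str(number).strip() if number is not None else None,
        "companyName": str(name).strip(),
        "town": str(item.get("town") or "").strip(),
    }


if __name__ == "__main__":
    print(normalize_row({"company_name": " Acme Ltd ", "company_number": "0123"}))
```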
Useful input fields:
- limit
- maxConcurrency
- googleDomain
- language
- pagesPerQuery
- matchMode
- proxySettings
- customProxyUrls
- proxyProviderLabel
- resumeFromCheckpoint
Output
Each dataset row includes:
- company_number
- company_name
- town
- query
- classification
- selected_url
- selected_domain
- selected_title
- selected_position
- selected_score
- review_candidate_url
- review_candidate_domain
- organic_result_count
- raw_organic_results
- http_status
- elapsed_seconds
- response_bytes
- proxy_provider
- error
Classification meanings
- accepted: strong heuristic match to the company’s own site
- review: plausible candidate, but not strong enough to auto-accept
- rejected: no credible official website found
- raw: SERP parsed only, no selection applied
- error: request or parse failure
Match Modes
strict
Production-oriented heuristic.
Accepts only strong brand/domain matches. Borderline results are marked review.
loose
Fast benchmark mode.
Uses more permissive token-to-domain matching, which is useful for quick smoke tests but accepts weaker matches than strict.
raw
Returns parsed SERP results without choosing a winner.
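To make the difference between the three modes concrete, here is a toy classifier. It is not the Actor's scoring logic, which is not documented here. The stopword list, the use of only the first host label, and the accept/review thresholds are all assumptions chosen to illustrate the contrast between strict and loose.

```python
import re
from urllib.parse import urlparse

# Hypothetical legal-form and filler words that carry no brand signal.
STOPWORDS = {"ltd", "limited", "plc", "llp", "inc", "llc", "the", "and", "co"}


def brand_tokens(company_name: str) -> list:
    """Lowercase alphanumeric tokens, minus stopwords and single characters."""
    tokens = re.findall(r"[a-z0-9]+", company_name.lower())
    return [t for t in tokens if t not in STOPWORDS and len(t) > 1]


def domain_label(url: str) -> str:
    """First host label without 'www.' and without punctuation (simplistic)."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return re.sub(r"[^a-z0-9]", "", host.split(".")[0])


def classify(company_name: str, url: str, mode: str = "strict") -> str:
    if mode == "raw":
        return "raw"  # no selection applied
    tokens, label = brand_tokens(company_name), domain_label(url)
    if not tokens or not label:
        return "rejected"
    hits = [t for t in tokens if t in label]
    if mode == "loose":
        return "accepted" if hits else "rejected"  # any token match wins
    # strict: every brand token must appear in the domain label
    if len(hits) == len(tokens):
        return "accepted"
    return "review" if hits else "rejected"
```

In this sketch a partial token match is accepted under loose but only flagged for review under strict, which mirrors the described behaviour of borderline results.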
Checkpointing
The Actor writes resumable state to the default key-value store using checkpointKey.
If you rerun with:
- the same input order
- the same checkpointKey
- resumeFromCheckpoint = true
the Actor skips rows already completed in the earlier run.
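The resume behaviour can be sketched as follows, with a plain dict standing in for the key-value store. The stored format (a JSON list of completed row keys) and the helper names are assumptions. The key includes the row index, which illustrates why the input order has to match between runs.

```python
import json


def row_key(index: int, row: dict) -> str:
    """Identify a row by position and content, so reordered input is not skipped."""
    return f"{index}:{row.get('companyName', '')}|{row.get('town', '')}"


def run_with_checkpoint(rows, store, checkpoint_key, process, resume=True):
    """Process rows, skipping any recorded as done under checkpoint_key.

    'store' is any dict-like object; here it stands in for the key-value store.
    """
    raw = store.get(checkpoint_key) if resume else None
    done = set(json.loads(raw)) if raw else set()
    results = []
    for index, row in enumerate(rows):
        key = row_key(index, row)
        if key in done:
            continue  # completed in an earlier run
        results.append(process(row))
        done.add(key)
        # Persist after every row so an interrupted run loses little work.
        store[checkpoint_key] = json.dumps(sorted(done))
    return results
```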
Proxy options
By default, the Actor uses Apify GOOGLE_SERP.
If you want to test another provider such as DataImpulse, pass one or more full proxy URLs in customProxyUrls. Example:

    {
      "searches": [
        { "companyName": "Example Engineering Ltd", "town": "Leeds" }
      ],
      "customProxyUrls": ["http://LOGIN:PASSWORD@gw.dataimpulse.com:823"],
      "proxyProviderLabel": "dataimpulse_residential"
    }
If customProxyUrls is present, it overrides Apify proxy usage. The Apify SDK rotates the provided URLs round-robin. If you provide only one rotating gateway URL, the provider's own rotation still happens server-side.
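The override and round-robin behaviour described above can be illustrated without the SDK. This sketch uses itertools.cycle and a placeholder label for the default proxy. It is not the Apify SDK's implementation, and APIFY_DEFAULT and proxy_sequence are hypothetical names.

```python
from itertools import cycle
from typing import Iterator, Optional, Sequence

# Placeholder label for the default route; not a real proxy URL.
APIFY_DEFAULT = "apify:GOOGLE_SERP"


def proxy_sequence(custom_proxy_urls: Optional[Sequence[str]]) -> Iterator[str]:
    """Yield proxies forever: custom URLs round-robin if given, else the default."""
    urls = [u.strip() for u in (custom_proxy_urls or []) if u and u.strip()]
    return cycle(urls) if urls else cycle([APIFY_DEFAULT])


if __name__ == "__main__":
    seq = proxy_sequence(["http://proxy-a:8000", "http://proxy-b:8000"])
    for _ in range(4):
        print(next(seq))
```

With a single gateway URL the sequence repeats that one entry, so any IP rotation comes from the provider's side.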
Benchmark script
Use scripts/benchmark_proxy_providers.py to run the same fixed sample through:
- Apify GOOGLE_SERP
- DataImpulse datacenter
- DataImpulse residential
- DataImpulse mobile
- DataImpulse premium residential
Expected environment variables:
- APIFY_TOKEN
- DATAIMPULSE_DATACENTER_PROXY_URL
- DATAIMPULSE_RESIDENTIAL_PROXY_URL
- DATAIMPULSE_MOBILE_PROXY_URL
- DATAIMPULSE_PREMIUM_RESIDENTIAL_PROXY_URL
Example:
$ python scripts/benchmark_proxy_providers.py --input-json sample_searches.json --max-concurrency 25
Notes
- This Actor uses raw HTTP requests, not browser automation.
- pagesPerQuery > 1 increases proxy spend because each page counts separately.
- Google HTML changes over time, so parsing logic should be revalidated periodically.
Suggested internal benchmark
Compare this Actor against your current SERP providers on the same fixed 100-company sample and track:
- HTTP success rate
- accepted count
- review count
- obvious false positives
- average response_bytes
- estimated proxy cost per 1k searches
- cost per accepted website
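Most of the metrics above can be computed directly from the output dataset rows. The sketch below assumes that an http_status of 200 counts as success and that the total proxy cost for the run is measured separately and passed in. The function name summarize is hypothetical. Obvious false positives still need a manual review of the accepted rows.

```python
def summarize(rows, total_cost):
    """Aggregate benchmark metrics from output rows for one provider run.

    total_cost is the externally measured proxy spend for these rows.
    """
    total = len(rows)
    ok = sum(1 for r in rows if r.get("http_status") == 200)
    accepted = sum(1 for r in rows if r.get("classification") == "accepted")
    review = sum(1 for r in rows if r.get("classification") == "review")
    total_bytes = sum(r.get("response_bytes") or 0 for r in rows)
    return {
        "http_success_rate": ok / total if total else 0.0,
        "accepted_count": accepted,
        "review_count": review,
        "avg_response_bytes": total_bytes / total if total else 0.0,
        "cost_per_1k_searches": total_cost / total * 1000 if total else 0.0,
        # None when nothing was accepted, to avoid dividing by zero.
        "cost_per_accepted": total_cost / accepted if accepted else None,
    }
```

Running this once per provider on the same fixed sample gives directly comparable figures.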