Yahoo Scraper
Pricing
from $3.99 / 1,000 results
Scrape Yahoo Search results in bulk: titles, snippets, links, favicons, sub-links and a clean Markdown excerpt for every page. Smart proxy auto-escalation (direct → datacenter → residential) keeps you unblocked.
Developer
ScrapeFlow
Maintained by Community
Yahoo Search Scraper
Scrape Yahoo Search results at scale: titles, URLs, snippets, favicons, in-article sub-links, and a clean Markdown excerpt for every result. Bulk queries, time-window filtering and smart proxy auto-escalation (direct → datacenter → residential) keep your runs fast and unblocked.
Why Choose Us?
- Bulk-first: paste dozens of queries (or full Yahoo URLs) and walk every result page until the cap.
- Smart proxy ladder: starts direct and escalates only if Yahoo blocks. You don't pay for residential traffic you didn't need.
- Rich back-fill: when Yahoo's snippet is thin, the actor visits the result page and harvests in-article sub-links and a Markdown summary.
- Live results: rows stream to the dataset as they're scraped, so a mid-run interruption never loses your data.
- Production-grade error handling: 3-tier proxy retries, graceful pay-per-event (PPE) limit handling, exponential cool-downs.
Key Features
- Bulk queries: plain keywords or full Yahoo SERP URLs, mixed freely.
- Time-window filter: Anytime / Past day / Past week / Past month.
- Auto-escalating proxy: direct → Apify Datacenter → Apify Residential (3 retries), then sticky.
- Optional second-pass back-fill of sub-links and Markdown excerpts.
- Per-section dataset views: Overview, Snippet, Sub-links.
- Custom proxy URLs supported: they go first, then the smart ladder.
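Mixing keywords and full SERP URLs could be handled along these lines. This is a minimal sketch, not the actor's own code: the `to_serp_url` helper is hypothetical, and the base URL and `p` parameter are taken from the sample input on this page.

```python
from urllib.parse import urlencode, urlsplit

# Base URL and the "p" query parameter come from the sample input URL;
# everything else here is an illustrative assumption.
YAHOO_SEARCH = "https://search.yahoo.com/search"


def to_serp_url(entry: str) -> str:
    """Return a Yahoo SERP URL for a keyword, or the entry itself if it is already a URL."""
    entry = entry.strip()
    parts = urlsplit(entry)
    if parts.scheme in ("http", "https") and parts.netloc:
        return entry  # full URL: pass through untouched
    # Plain keywords: percent-encode as UTF-8, spaces become "+"
    return f"{YAHOO_SEARCH}?{urlencode({'p': entry})}"


urls = [to_serp_url(q) for q in ["java developer", "https://search.yahoo.com/search?p=python+jobs"]]
```

Because `urlencode` percent-encodes UTF-8, non-Latin keywords go through the same path without special handling.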
Input
{
  "queries": [
    "java developer",
    "https://search.yahoo.com/search?p=python+jobs"
  ],
  "maxItems": 10,
  "timePeriod": "Anytime",
  "backfillEmptyResults": true,
  "backfillConcurrency": 8,
  "backfillMaxLinks": 10,
  "proxyConfiguration": { "useApifyProxy": false }
}
| Field | Type | Description |
|---|---|---|
| queries | string[] | One or more search terms or Yahoo SERP URLs. |
| maxItems | integer | Hard cap on unique results per query (1–500). |
| timePeriod | string | Anytime / Past day / Past week / Past month. |
| backfillEmptyResults | boolean | Visit each result page to harvest sub-links + Markdown excerpt. |
| backfillConcurrency | integer | Parallelism for back-fill (1–32). |
| backfillMaxLinks | integer | Max in-article sub-links per result page (1–50). |
| proxyConfiguration | object | Apify proxy config. Defaults to direct (no proxy). |
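If you build the input programmatically, a small client-side check against the documented ranges can catch mistakes before a run starts. The sketch below is an assumption-laden helper, not part of the actor: the `ActorInput` class is hypothetical, and its default values are copied from the sample input above rather than from the actor's real defaults.

```python
from dataclasses import asdict, dataclass, field

TIME_PERIODS = ("Anytime", "Past day", "Past week", "Past month")
# (field name, minimum, maximum) as listed in the input table
RANGES = (("maxItems", 1, 500), ("backfillConcurrency", 1, 32), ("backfillMaxLinks", 1, 50))


@dataclass
class ActorInput:
    """Hypothetical mirror of the input schema; defaults copied from the sample input."""
    queries: list[str]
    maxItems: int = 10
    timePeriod: str = "Anytime"
    backfillEmptyResults: bool = True
    backfillConcurrency: int = 8
    backfillMaxLinks: int = 10
    proxyConfiguration: dict = field(default_factory=lambda: {"useApifyProxy": False})

    def validate(self) -> None:
        if not self.queries or not all(q.strip() for q in self.queries):
            raise ValueError("queries must contain at least one non-empty string")
        if self.timePeriod not in TIME_PERIODS:
            raise ValueError(f"timePeriod must be one of {TIME_PERIODS}")
        for name, low, high in RANGES:
            value = getattr(self, name)
            if not low <= value <= high:
                raise ValueError(f"{name} must be between {low} and {high}, got {value}")


actor_input = ActorInput(queries=["java developer"], maxItems=10)
actor_input.validate()
payload = asdict(actor_input)  # plain dict, ready to serialise as the JSON input
```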
Output
Each row matches the per-section views in the dataset.
{
  "query": "java developer",
  "title": "How to become a Java Developer? - GeeksforGeeks",
  "url": "https://www.geeksforgeeks.org/gfg-academy/how-to-become-a-java-developer/",
  "description": "A Java developer is a software engineer who builds...",
  "text": " * Core Java\n\nCore Fundamentals: Learn concepts and practice DSA...\n",
  "logo_url": "https://s.yimg.com/pv/.../32x32_7eae5aac8b7f7402.png",
  "links": [
    "https://www.geeksforgeeks.org/java/java",
    "https://www.geeksforgeeks.org/advance-java/spring"
  ],
  "domain": "www.geeksforgeeks.org"
}
| Field | Description |
|---|---|
| query | The query (or URL) the row was scraped under. |
| title | The result's headline. |
| url | The clean target URL (Yahoo's tracker is stripped). |
| description | Yahoo's SERP snippet, rendered as Markdown. |
| text | Markdown excerpt: either Yahoo's list block or, after back-fill, an in-article summary. |
| logo_url | The result's favicon. |
| links | Up to backfillMaxLinks harvested in-article sub-links (after back-fill). |
| domain | The host portion of url. |
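For downstream processing, the `domain` field can be reproduced from `url` with the standard library, which is handy when you want to group exported rows by site. This is a minimal sketch under the assumption that `domain` is simply the URL's hostname; the helper names are illustrative.

```python
from collections import Counter
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Host portion of a URL, lower-cased; empty string if the URL has none."""
    return urlsplit(url).hostname or ""


def domain_counts(rows: list[dict]) -> Counter:
    """Count rows per domain, falling back to the URL host when `domain` is missing."""
    return Counter(row.get("domain") or host_of(row["url"]) for row in rows)


rows = [
    {
        "query": "java developer",
        "url": "https://www.geeksforgeeks.org/gfg-academy/how-to-become-a-java-developer/",
        "domain": "www.geeksforgeeks.org",
    },
    {"query": "java developer", "url": "https://www.geeksforgeeks.org/java/java"},
]
counts = domain_counts(rows)
```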
How to Use (Apify Console)
- Open Apify Console → Actors.
- Find this actor and open it.
- Paste your queries (one per line) into Search Queries / URLs.
- Pick a Maximum results cap and a Time window.
- (Optional) Leave proxy on direct; the actor will auto-escalate only when needed.
- Click Start.
- Watch live logs; rows appear in the Output tab as they're scraped.
- Export results as JSON / CSV / XLSX.
Use via API
curl -X POST "https://api.apify.com/v2/acts/<ACTOR_ID>/run-sync-get-dataset-items?token=$APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"queries": ["java developer"], "maxItems": 10, "timePeriod": "Anytime"}'
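The same call can be made from Python with only the standard library. The sketch below mirrors the curl request; the function names and the 300-second timeout are assumptions for illustration, and the actor ID stays a placeholder you must supply.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode

API_BASE = "https://api.apify.com/v2/acts"


def build_run_request(actor_id: str, token: str, actor_input: dict) -> urllib.request.Request:
    """Build the POST request for the run-sync-get-dataset-items endpoint (nothing is sent yet)."""
    url = f"{API_BASE}/{quote(actor_id, safe='~')}/run-sync-get-dataset-items?{urlencode({'token': token})}"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_actor(actor_id: str, token: str, actor_input: dict, timeout: float = 300.0) -> list:
    """Run the actor synchronously and return the dataset rows as a list of dicts."""
    request = build_run_request(actor_id, token, actor_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Usage (needs a real actor ID and token, so it is left as a comment):
# rows = run_actor("<ACTOR_ID>", os.environ["APIFY_TOKEN"],
#                  {"queries": ["java developer"], "maxItems": 10, "timePeriod": "Anytime"})
```

Splitting request construction from sending keeps the URL and body easy to inspect or log before any platform usage is incurred.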
Best Use Cases
- SEO & SERP monitoring on Yahoo.
- Competitive intelligence: track who appears for a query over time.
- Lead generation: feed result URLs into your own enrichment pipeline.
- Content discovery: harvest in-article sub-links for further crawling.
Pricing
This actor uses Apify's Pay-per-event model. The primary event is result-item: one charge per result row pushed to the dataset. You pay only for the rows you actually receive; back-fill, retries and failed attempts are not billed.
You also pay the underlying Apify platform usage (compute units, proxy traffic when used). Direct (no-proxy) requests cost no proxy traffic at all, which is why the actor stays on direct until Yahoo forces it to escalate.
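A rough budget for the result-item charges can be worked out from the listed starting price. The sketch assumes the "from $3.99 / 1,000 results" rate applies to every row, which may not match your plan, and it deliberately excludes platform usage (compute units, proxy traffic). The function names are illustrative.

```python
from decimal import Decimal

# Starting price from the listing; your actual rate may differ.
PRICE_PER_1000 = Decimal("3.99")


def max_rows(query_count: int, max_items: int) -> int:
    """Upper bound on billed rows: maxItems is a per-query cap."""
    return query_count * max_items


def estimate_result_charges(rows: int, price_per_1000: Decimal = PRICE_PER_1000) -> Decimal:
    """Estimated result-item charges in USD, rounded to cents (platform usage excluded)."""
    if rows < 0:
        raise ValueError("rows must be non-negative")
    return (Decimal(rows) * price_per_1000 / Decimal(1000)).quantize(Decimal("0.01"))


# 25 queries capped at 10 results each: at most 250 billed rows
worst_case = estimate_result_charges(max_rows(25, 10))
```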
Frequently Asked Questions
Does it work when Yahoo blocks me?
Yes. The default no-proxy run is the fastest, but the moment Yahoo returns a block (HTTP 429/503 or a captcha page), the actor auto-escalates to the Apify Datacenter pool, then to Residential with up to 3 retries. Once a tier works, it's locked in for the rest of the run.
Can I bring my own proxies?
Yes. Paste them into the proxy field's Custom proxy URLs. Your URLs are tried first (3 retries), then the datacenter → residential fallback ladder kicks in.
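The escalation behaviour described in the two answers above can be pictured as a small state machine. This is an illustrative sketch, not the actor's implementation: the `ProxyLadder` class and the injected `fetch` callable are hypothetical, three attempts per tier is an assumption, and captcha detection and the exponential cool-downs are left out.

```python
from typing import Callable

BLOCK_STATUSES = {429, 503}  # block signals named in the FAQ; captcha pages are not modelled


class ProxyLadder:
    """Try tiers in order and stay on the first one that is not blocked."""

    def __init__(self, fetch: Callable[[str, str], int], custom: bool = False, retries: int = 3):
        # Custom proxy URLs replace the direct tier as the first rung (assumption).
        self.tiers = (["custom"] if custom else ["direct"]) + ["datacenter", "residential"]
        self.fetch = fetch      # hypothetical: fetch(url, tier) -> HTTP status code
        self.retries = retries  # attempts per tier (assumption)
        self.index = 0          # current tier; only ever moves up, which makes it sticky

    def get(self, url: str) -> str:
        """Return the tier that served `url`; raise if every tier is blocked."""
        while self.index < len(self.tiers):
            tier = self.tiers[self.index]
            for _ in range(self.retries):
                if self.fetch(url, tier) not in BLOCK_STATUSES:
                    return tier
            self.index += 1  # tier exhausted: escalate and never step back down
        raise RuntimeError("blocked on every proxy tier")
```

Keeping the tier index on the instance is what makes the choice sticky: later requests start at the tier that last worked instead of probing the cheaper ones again.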
Does it follow pagination?
Yes. Yahoo returns ~7 results per page; the actor walks pages until your maxItems cap is hit or 3 consecutive pages return nothing.
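The stopping rule can be sketched as a simple loop. This is an illustration under stated assumptions rather than the actor's code: `fetch_page` is a hypothetical callable that returns the result URLs for a page index, and a page counts as empty when it contributes no new unique URL.

```python
from typing import Callable


def walk_pages(fetch_page: Callable[[int], list[str]], max_items: int, max_empty: int = 3) -> list[str]:
    """Collect unique result URLs until max_items is reached or max_empty pages in a row add nothing."""
    seen: set[str] = set()
    results: list[str] = []
    empty_streak = 0
    page = 0
    while len(results) < max_items and empty_streak < max_empty:
        added = 0
        for url in fetch_page(page):
            if url in seen:
                continue  # the cap applies to unique results only
            seen.add(url)
            results.append(url)
            added += 1
            if len(results) == max_items:
                break
        # Reset the streak on any new result, otherwise count one more empty page
        empty_streak = 0 if added else empty_streak + 1
        page += 1
    return results
```

With roughly 7 results per page, a `maxItems` of 10 would need two page fetches, and a query with only 21 results in total would stop after three further empty pages.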
What about non-Latin queries?
Yahoo handles UTF-8 queries natively, so paste them as-is.
Why is my back-filled text empty for some rows?
Some sites block all bots (or render with JS only). In that case the actor falls back to a minimal block built from Yahoo's own title + description so the field is never blank.
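The fallback amounts to a single conditional. The sketch below only illustrates the idea: the `excerpt_or_fallback` helper is hypothetical, and the exact Markdown layout the actor produces for the minimal block is not documented here, so the bold-title format is an assumption.

```python
from typing import Optional


def excerpt_or_fallback(row: dict, backfilled: Optional[str]) -> str:
    """Use the back-filled excerpt when there is one, else build a block from title + description."""
    if backfilled and backfilled.strip():
        return backfilled
    title = (row.get("title") or "").strip()
    description = (row.get("description") or "").strip()
    # Assumed layout: bold title, blank line, then Yahoo's snippet
    parts = [f"**{title}**" if title else "", description]
    return "\n\n".join(part for part in parts if part)


text = excerpt_or_fallback(
    {"title": "How to become a Java Developer?", "description": "A Java developer is..."},
    backfilled=None,
)
```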
Support & Feedback
- Issues / feature requests: please open a thread on the actor's detail page.
- Custom solutions: dev.scraperengine@gmail.com.