🔎 Yahoo Scraper

Pricing

from $3.99 / 1,000 results

Scrape Yahoo Search results in bulk: titles, snippets, links, favicons, sub-links and a clean Markdown excerpt for every page. Smart proxy auto-escalation (direct → datacenter → residential) keeps you unblocked.

Rating

0.0 (0)

Developer

Scraply (maintained by Community)

Actor stats

Bookmarked: 0
Total users: 2
Monthly active users: 1
Last modified: 4 days ago

🔎 Yahoo Search Scraper

Scrape Yahoo Search results at scale: titles, URLs, snippets, favicons, in-article sub-links, and a clean Markdown excerpt for every result. Bulk queries, time-window filtering and smart proxy auto-escalation (direct → datacenter → residential) keep your runs fast and unblocked.


⭐ Why Choose Us?

  • Bulk-first: paste dozens of queries (or full Yahoo URLs) and the actor walks every result page until the cap is reached.
  • Smart proxy ladder: starts direct and only escalates if Yahoo blocks. You don't pay for residential traffic you didn't need.
  • Rich back-fill: when Yahoo's snippet is thin, the actor visits the result page and harvests in-article sub-links plus a Markdown summary.
  • Live results: rows stream to the dataset as they're scraped, so a mid-run interruption never loses your data.
  • Production-grade error handling: 3-tier proxy retries, graceful handling of pay-per-event (PPE) limits, exponential cool-downs.

🔑 Key Features

  • 🌐 Bulk queries: plain keywords or full Yahoo SERP URLs, mixed freely.
  • 📅 Time-window filter: Anytime / Past day / Past week / Past month.
  • 🛡️ Auto-escalating proxy: direct → Apify Datacenter → Apify Residential (3 retries), then stays on the tier that worked.
  • 🧩 Optional second-pass back-fill of sub-links + Markdown excerpts.
  • 📋 Per-section dataset views: Overview, Snippet, Sub-links.
  • 🔄 Custom proxy URLs supported: they go first, then the smart ladder.

🧾 Input

{
  "queries": [
    "java developer",
    "https://search.yahoo.com/search?p=python+jobs"
  ],
  "maxItems": 10,
  "timePeriod": "Anytime",
  "backfillEmptyResults": true,
  "backfillConcurrency": 8,
  "backfillMaxLinks": 10,
  "proxyConfiguration": { "useApifyProxy": false }
}

| Field | Type | Description |
| --- | --- | --- |
| queries | string[] | One or more search terms or Yahoo SERP URLs. |
| maxItems | integer | Hard cap on unique results per query (1–500). |
| timePeriod | string | Anytime / Past day / Past week / Past month. |
| backfillEmptyResults | boolean | Visit each result page to harvest sub-links + Markdown excerpt. |
| backfillConcurrency | integer | Parallelism for back-fill (1–32). |
| backfillMaxLinks | integer | Max in-article sub-links per result page (1–50). |
| proxyConfiguration | object | Apify proxy config. Defaults to direct (no proxy). |
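
The ranges in the table can be checked before a run is started. The following is a minimal sketch, not part of the actor: `build_input` is a hypothetical helper, and the allowed values are copied from the table above.

```python
from typing import Any

# Allowed values and ranges copied from the field table above.
TIME_PERIODS = {"Anytime", "Past day", "Past week", "Past month"}
INT_RANGES = {
    "maxItems": (1, 500),
    "backfillConcurrency": (1, 32),
    "backfillMaxLinks": (1, 50),
}


def build_input(queries: list[str], **options: Any) -> dict[str, Any]:
    """Return an input payload, raising ValueError on out-of-range values."""
    # Drop blank entries, e.g. empty lines left over from a pasted list.
    cleaned = [q.strip() for q in queries if q.strip()]
    if not cleaned:
        raise ValueError("queries must contain at least one non-empty entry")

    payload: dict[str, Any] = {"queries": cleaned, **options}

    period = payload.get("timePeriod", "Anytime")
    if period not in TIME_PERIODS:
        raise ValueError(f"timePeriod must be one of {sorted(TIME_PERIODS)}")

    for field, (low, high) in INT_RANGES.items():
        if field in payload and not low <= payload[field] <= high:
            raise ValueError(f"{field} must be between {low} and {high}")

    return payload


print(build_input(["java developer"], maxItems=10, timePeriod="Past week"))
```

Validating locally is only a convenience; the actor's own input schema remains the authority on what it accepts.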

📤 Output

Each row matches the per-section views in the dataset.

{
  "query": "java developer",
  "title": "How to become a Java Developer? - GeeksforGeeks",
  "url": "https://www.geeksforgeeks.org/gfg-academy/how-to-become-a-java-developer/",
  "description": "A Java developer is a software engineer who builds...",
  "text": " * Core Java\n\nCore Fundamentals: Learn concepts and practice DSA...\n",
  "logo_url": "https://s.yimg.com/pv/.../32x32_7eae5aac8b7f7402.png",
  "links": [
    "https://www.geeksforgeeks.org/java/java",
    "https://www.geeksforgeeks.org/advance-java/spring"
  ],
  "domain": "www.geeksforgeeks.org"
}

| Field | Description |
| --- | --- |
| query | The query (or URL) the row was scraped under. |
| title | The result's headline. |
| url | The clean target URL (Yahoo's tracker is stripped). |
| description | Yahoo's SERP snippet, rendered as Markdown. |
| text | Markdown excerpt: either Yahoo's list block or, after back-fill, an in-article summary. |
| logo_url | The result's favicon. |
| links | Up to N harvested in-article sub-links (after back-fill). |
| domain | The host portion of url. |
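
As an illustration of working with these rows downstream, the sketch below groups result URLs by their `domain` field. `group_by_domain` is a hypothetical helper, and the sample rows are trimmed to the two fields it reads.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_domain(rows: list[dict]) -> dict[str, list[str]]:
    """Map each domain to the result URLs found under it."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for row in rows:
        # Prefer the row's own "domain" field; derive it from "url" if absent.
        domain = row.get("domain") or urlsplit(row["url"]).netloc
        grouped[domain].append(row["url"])
    return dict(grouped)


rows = [
    {
        "url": "https://www.geeksforgeeks.org/gfg-academy/how-to-become-a-java-developer/",
        "domain": "www.geeksforgeeks.org",
    },
    {"url": "https://www.geeksforgeeks.org/java/java"},
]
print(group_by_domain(rows))
```

Deriving the host with `urlsplit` matches the table's description of `domain` as the host portion of `url`, so the fallback should agree with the actor's value for ordinary URLs.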

🚀 How to Use (Apify Console)

  1. Open Apify Console → Actors.
  2. Find this actor and open it.
  3. Paste your queries (one per line) into 🌐 Search Queries / URLs.
  4. Pick a 🎁 Maximum results cap and a 📅 Time window.
  5. (Optional) Leave proxy on direct; the actor will auto-escalate only when needed.
  6. Click Start.
  7. Watch live logs; rows appear in the Output tab as they're scraped.
  8. Export results as JSON / CSV / XLSX.

🤖 Use via API

curl -X POST "https://api.apify.com/v2/acts/<ACTOR_ID>/run-sync-get-dataset-items?token=$APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "queries": ["java developer"],
    "maxItems": 10,
    "timePeriod": "Anytime"
  }'
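
The same call can be made from Python with only the standard library. This is a sketch under a few assumptions: `build_request` and `run_actor` are hypothetical helper names, the 300-second timeout is arbitrary, and the response is assumed to be a JSON array of rows shaped like the Output section describes.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2/acts"


def build_request(actor_id: str, token: str, payload: dict) -> urllib.request.Request:
    """Build the same POST request that the curl command sends."""
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/{actor_id}/run-sync-get-dataset-items?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_actor(actor_id: str, token: str, payload: dict, timeout: float = 300.0) -> list[dict]:
    """Send the request and return the dataset items (assumed to be a JSON array)."""
    request = build_request(actor_id, token, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example call (needs a real actor ID and token, so it is left commented out):
# rows = run_actor("<ACTOR_ID>", "<APIFY_TOKEN>", {"queries": ["java developer"], "maxItems": 10})
```

Splitting request construction from sending keeps the URL and body easy to inspect without touching the network.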

💼 Best Use Cases

  • SEO & SERP monitoring on Yahoo.
  • Competitive intelligence: track who appears for a query over time.
  • Lead generation: feed result URLs into your own enrichment pipeline.
  • Content discovery: harvest in-article sub-links for further crawling.

💳 Pricing

This actor uses Apify's pay-per-event model. The primary event is result-item: one charge per result row pushed to the dataset. You pay only for the rows you actually receive; back-fill, retries and failed attempts are not billed.

You also pay for the underlying Apify platform usage (compute units, plus proxy traffic when used). Direct (no-proxy) requests incur no proxy traffic at all, which is why the actor stays on direct until Yahoo forces it to escalate.
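
For a rough budget, the result-item charge scales linearly with the number of rows. The sketch below assumes the listed starting price of $3.99 per 1,000 results applies to your plan; `estimate_result_charge` is a hypothetical helper, and platform usage (compute units, proxy traffic) is not included.

```python
from decimal import Decimal

# Listed starting price: $3.99 per 1,000 result rows (your rate may differ).
PRICE_PER_1000_RESULTS = Decimal("3.99")


def estimate_result_charge(
    rows: int, price_per_1000: Decimal = PRICE_PER_1000_RESULTS
) -> Decimal:
    """Estimate the result-item charge in USD for a number of dataset rows."""
    if rows < 0:
        raise ValueError("rows must be non-negative")
    return (Decimal(rows) * price_per_1000 / Decimal(1000)).quantize(Decimal("0.01"))


# 20 queries capped at 50 results each produce at most 1,000 billed rows.
print(estimate_result_charge(20 * 50))  # 3.99
```

Because only delivered rows are billed, `maxItems` multiplied by the number of queries gives an upper bound on this part of the cost.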


❓ Frequently Asked Questions

Does it work when Yahoo blocks me? Yes. The default no-proxy run is the fastest, but the moment Yahoo returns a block (HTTP 429/503 or a captcha page), the actor auto-escalates to the Apify Datacenter pool, then to Residential with up to 3 retries. Once a tier works, it's locked in for the rest of the run.

Can I bring my own proxies? Yes. Paste them into the proxy field's Custom proxy URLs. Your URLs are tried first (3 retries), then the datacenter → residential fallback ladder kicks in.
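
The escalation behaviour described in the two answers above can be pictured as a ladder that only ever moves up. The sketch below is an illustration, not the actor's source: `ProxyLadder`, the tier labels and the `send` callable are all hypothetical, captcha detection is left out, and giving every tier the same retry count is a simplification.

```python
from typing import Callable, Optional

# Tier names are illustrative labels, not identifiers used by the actor.
DEFAULT_LADDER = ["direct", "datacenter", "residential"]
BLOCK_STATUSES = {429, 503}


class ProxyLadder:
    """Try tiers in order, move up when blocked, stay on the first tier that works."""

    def __init__(self, tiers: list[str], retries_per_tier: int = 3) -> None:
        self.tiers = tiers
        self.retries_per_tier = retries_per_tier
        self.index = 0  # current tier; never moves back down

    def fetch(self, url: str, send: Callable[[str, str], int]) -> Optional[str]:
        """Return the tier that served `url`, or None if every tier was blocked.

        `send(url, tier)` is a hypothetical transport returning an HTTP status.
        """
        while self.index < len(self.tiers):
            tier = self.tiers[self.index]
            for _ in range(self.retries_per_tier):
                if send(url, tier) not in BLOCK_STATUSES:
                    return tier  # index is unchanged, so later calls start here
            self.index += 1  # tier exhausted: escalate
        return None


def fake_send(url: str, tier: str) -> int:
    """Stand-in transport: pretend Yahoo blocks direct traffic only."""
    return 429 if tier == "direct" else 200


ladder = ProxyLadder(DEFAULT_LADDER)
print(ladder.fetch("https://search.yahoo.com/search?p=python+jobs", fake_send))  # datacenter
print(ladder.fetch("https://search.yahoo.com/search?p=java+developer", fake_send))  # datacenter
```

In this model, custom proxy URLs would simply be an extra tier placed at the front of the list, for example `ProxyLadder(["custom"] + DEFAULT_LADDER)`.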

Does it follow pagination? Yes. Yahoo returns ~7 results per page; the actor walks pages until your maxItems cap is hit or 3 consecutive pages return nothing.
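
The stopping rule can be sketched as a small generator. This is an assumption-laden illustration rather than the actor's code: `walk_pages` and `fetch_page` are hypothetical names, results are de-duplicated by URL, and a page that yields no new unique rows is counted as empty.

```python
from typing import Callable, Iterator


def walk_pages(
    fetch_page: Callable[[int], list[dict]],
    max_items: int,
    max_empty_pages: int = 3,
) -> Iterator[dict]:
    """Yield unique results page by page until the cap or too many empty pages.

    `fetch_page(page_number)` is a hypothetical callable returning that page's rows.
    """
    seen: set[str] = set()
    empty_streak = 0
    page = 0
    while len(seen) < max_items and empty_streak < max_empty_pages:
        added = 0
        for row in fetch_page(page):
            if row["url"] in seen:
                continue  # skip duplicates so the cap counts unique results
            seen.add(row["url"])
            added += 1
            yield row
            if len(seen) >= max_items:
                return  # cap reached mid-page
        empty_streak = 0 if added else empty_streak + 1
        page += 1


def fake_page(page: int) -> list[dict]:
    """Stand-in for a SERP: three pages of 7 results, then nothing."""
    if page >= 3:
        return []
    return [{"url": f"https://example.com/{page}-{i}"} for i in range(7)]


print(len(list(walk_pages(fake_page, max_items=10))))  # 10
print(len(list(walk_pages(fake_page, max_items=500))))  # 21
```

With roughly 7 results per page, a cap of 10 needs two page fetches, while an unreachable cap ends after three consecutive empty pages.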

What about non-Latin queries? Yahoo handles UTF-8 queries natively, so paste them as-is.

Why is my back-filled text empty for some rows? Some sites block all bots (or render with JS only). In that case the actor falls back to a minimal block built from Yahoo's own title + description so the field is never blank.
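
A fallback of this kind might look like the sketch below, which can also be applied to rows collected with back-fill disabled. The exact layout the actor produces is not documented here, so `fallback_text` and its title-then-description layout are assumptions.

```python
def fallback_text(row: dict) -> str:
    """Return the row's text, or a minimal block built from title and description."""
    text = (row.get("text") or "").strip()
    if text:
        return text
    # Assumed layout: title and description separated by a blank line.
    parts = [(row.get("title") or "").strip(), (row.get("description") or "").strip()]
    return "\n\n".join(part for part in parts if part)


row = {
    "title": "How to become a Java Developer? - GeeksforGeeks",
    "description": "A Java developer is a software engineer who builds...",
    "text": "",
}
print(fallback_text(row))
```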


📨 Support & Feedback

  • Issues / feature requests → please open a thread on the actor's detail page.
  • Custom solutions → dev.scraperengine@gmail.com.