🔍 Baidu Search Scraper

Pricing

from $5.99 / 1,000 results

Scrape Baidu search results at scale. Extract organic listings, answer boxes, related videos, related searches, and top searches. Supports bulk queries, proxy fallback, date filters, and device/language options for SEO and market research.

Developer

Scrapier

Maintained by Community

The 🔍 Baidu Search Scraper is a fast, scalable Baidu SERP scraper that extracts organic listings, answer boxes, related videos, "people also search for" terms, related searches, and top searches from public Baidu results pages. It collects clean, structured SERP data at scale using a proxy fallback strategy and bulk query support. Built for marketers, developers, data analysts, and researchers, it supports SEO analysis, market research, and competitor tracking with consistent, repeatable output that is ready for automation.

What data / output can you get?

Below are the structured fields pushed to the Apify dataset in real time for each SERP element:

Data field | Description | Example value
query | The search query associated with the result row | "python tutorial"
resultType | Result category: organic, answer_box, related_video, people_also_search_for, related_search, top_search | "organic"
title | Title of the organic result or video | "Python Tutorial - W3Schools"
link | Canonical URL to the result (decoded from Baidu redirect when applicable) | "https://www.w3schools.com/python/"
snippet | Text snippet/description for organic results | "Learn Python with examples, exercises, and projects..."
displayedLink | Displayed domain or path extracted from the link | "www.w3schools.com"
thumbnail | Image URL (if present for organic or video blocks) | "https://img.example.com/thumb.jpg"
position | Calculated position among organic results across pages | 1
content | Answer/content text for answer boxes | "Python is a high-level programming language..."
source | Source attribution for answer boxes | "Baidu Baike"
searchTerm | Related search term (for people_also_search_for, related_search, top_search) | "learn python online"
richSnippet | Additional rich text extracted from organic blocks (if present) | "Beginner-friendly · Free certificate"

Notes:

  • Results are stored as individual rows for real-time visibility during the run.
  • You can export data to JSON or CSV from the dataset.
  • Optionally, if you set outputFile, a summary JSON with totals and grouped results is saved to the key-value store.
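
Because each result type fills in only some of the fields, a flat export needs a fixed column set. The sketch below shows one way to serialize downloaded rows to CSV yourself. It assumes the rows are already available as a list of dicts; `rows_to_csv` is a hypothetical helper, not part of the actor.

```python
import csv
import io

# Column order taken from the field table above.
FIELDS = [
    "query", "resultType", "title", "link", "snippet", "displayedLink",
    "thumbnail", "position", "content", "source", "searchTerm", "richSnippet",
]


def rows_to_csv(rows):
    """Serialize dataset rows to CSV text, leaving absent fields blank."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=FIELDS, restval="", extrasaction="ignore"
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()


# Two rows of different types share one header.
sample = [
    {"query": "python tutorial", "resultType": "organic",
     "title": "Python Tutorial - W3Schools", "position": 1},
    {"query": "python tutorial", "resultType": "related_search",
     "searchTerm": "learn python online"},
]
csv_text = rows_to_csv(sample)
```

Fields a row does not carry (for example `searchTerm` on an organic row) come out as empty cells, so every line has the same number of columns.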

Key features

  • 🧠 Intelligent proxy fallback
    Starts with no proxy to save cost. If Baidu blocks, automatically falls back to datacenter and then residential proxies (3 retries). Once residential works, it sticks with it for the remaining requests.

  • 📦 Bulk keyword scraping
    Supply multiple Baidu URLs or plain search terms and process them in a single run for high-throughput workflows.

  • 📱 Device & language targeting
    Control deviceType (desktop, mobile, tablet) and languageLocalization (All, Simplified Chinese, Traditional Chinese) to compare different SERP layouts and regions.

  • 📅 Time period filtering
    Use timePeriod to filter by startDate/endDate or daysAgo and narrow results to recent content.

  • 📊 Structured SERP coverage
    Extracts organic results, answer boxes, related videos, people also search for, related searches, and top searches with clean fields.

  • ⚡ Real-time dataset output
    Pushes each result row to the Apify dataset during the run, so you can monitor progress live and export JSON/CSV at completion.

  • 💾 Optional summary export
    Set outputFile to also save a consolidated JSON (with totals and results_by_query) to the key-value store for easy retrieval.

  • 🛡️ Production-ready robustness
    Retries, fallbacks, and clear logging help keep your runs successful and predictable, even for large batches.
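
The proxy fallback described in the first feature can be pictured with a small simulation. This is only an illustrative sketch, not the actor's actual code: the `FallbackFetcher` class, the `fetch(url, tier)` callback, and the reading that the 3 retries apply to the residential tier are all assumptions.

```python
from typing import Callable, Optional

# Tiers in escalation order, as described in the feature list.
TIERS = ["none", "datacenter", "residential"]
RESIDENTIAL_RETRIES = 3  # assumption: the "3 retries" apply to this tier


class FallbackFetcher:
    """Hypothetical helper that escalates proxy tiers when a request is blocked."""

    def __init__(self, fetch: Callable[[str, str], Optional[str]]):
        # fetch(url, tier) returns page HTML, or None when the request is blocked.
        self.fetch = fetch
        self.sticky_tier: Optional[str] = None

    def get(self, url: str) -> Optional[str]:
        # Once residential has worked, skip the cheaper tiers entirely.
        tiers = [self.sticky_tier] if self.sticky_tier else TIERS
        for tier in tiers:
            attempts = RESIDENTIAL_RETRIES if tier == "residential" else 1
            for _ in range(attempts):
                html = self.fetch(url, tier)
                if html is not None:
                    if tier == "residential":
                        self.sticky_tier = tier
                    return html
        return None


# Simulated target that only answers through the residential tier.
calls = []


def fake_fetch(url, tier):
    calls.append(tier)
    return "<html>ok</html>" if tier == "residential" else None


fetcher = FallbackFetcher(fake_fetch)
first = fetcher.get("https://www.baidu.com/s?wd=python")
second = fetcher.get("https://www.baidu.com/s?wd=java")
```

In this simulation the first request walks through all three tiers, while the second goes straight to residential because the tier has become sticky.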

How to use Baidu Search Scraper - step by step

  1. Create or log in to your Apify account at https://console.apify.com.
  2. Go to Actors and open “baidu-search-scraper”.
  3. Add your input:
    • Paste Baidu search URLs or plain terms into urls (one per line).
    • Choose deviceType (desktop/mobile/tablet) and languageLocalization.
    • Set numResults and maxPagination to control depth.
    • Optionally configure timePeriod and proxyConfiguration.
  4. Click Start to run the actor.
  5. Watch progress in real time: rows appear in the dataset as they’re extracted.
  6. Open the OUTPUT tab to view the dataset and export to JSON or CSV.
  7. (Optional) Set outputFile to save a consolidated summary JSON to the key-value store.

Pro Tip: To compare mobile vs. desktop rankings programmatically, run two jobs with different deviceType values and diff results by position.
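
A minimal sketch of that comparison, assuming both runs used the same single query and that their rows have been downloaded as lists of dicts. The helper names and the example.com links are illustrative only.

```python
def organic_positions(rows):
    """Map each organic link to its position for one run."""
    return {
        row["link"]: row["position"]
        for row in rows
        if row.get("resultType") == "organic"
        and "link" in row
        and "position" in row
    }


def diff_positions(desktop_rows, mobile_rows):
    """Return {link: (desktop_position, mobile_position)} where rankings differ.

    None marks a link that is absent from one of the two runs.
    """
    desktop = organic_positions(desktop_rows)
    mobile = organic_positions(mobile_rows)
    changes = {}
    for link in desktop.keys() | mobile.keys():
        pair = (desktop.get(link), mobile.get(link))
        if pair[0] != pair[1]:
            changes[link] = pair
    return changes


desktop_rows = [
    {"resultType": "organic", "link": "https://a.example.com/", "position": 1},
    {"resultType": "organic", "link": "https://b.example.com/", "position": 2},
]
mobile_rows = [
    {"resultType": "organic", "link": "https://a.example.com/", "position": 1},
    {"resultType": "organic", "link": "https://c.example.com/", "position": 2},
    {"resultType": "organic", "link": "https://b.example.com/", "position": 3},
]
changes = diff_positions(desktop_rows, mobile_rows)
```

Links with identical positions in both runs are left out, so the result lists only movers and links that appear on one device type.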

Use cases

Use case | Description
SEO research & competitor analysis | Track competitor rankings and SERP features using a reliable Baidu ranking scraper with device and language targeting.
Market research & trend monitoring | Monitor “top searches” and “people also search for” to identify rising topics and audience interests.
Content discovery & topic planning | Gather related searches to inform content briefs, clusters, and internal linking strategies.
Academic/behavioral research | Analyze SERP structures and related queries for research into search behavior in Chinese markets.
Bulk keyword auditing | Run large keyword sets in one batch to audit performance and identify low-competition opportunities.
SERP feature mapping | Capture answer boxes and related videos to understand how Baidu SERP features influence visibility.

Why choose Baidu Search Scraper?

Built for precision and scale, this Baidu search engine scraper delivers structured SERP data with smart proxy management and clean output.

  • 🎯 Accurate, structured output with clearly defined fields per result type
  • 🌐 Language and device controls for regional and layout comparisons
  • 📈 Scales to large keyword lists with consistent performance
  • 👨‍💻 Developer-friendly JSON/CSV exports via the Apify dataset
  • 🛡️ Safe and ethical: collects only publicly available data
  • 💸 Cost-aware: no proxy by default, with automatic fallback only when needed
  • 🧱 More reliable than browser extensions or ad-hoc tools, with robust retries and logging

Bottom line: a dependable Baidu SERP data extractor that’s production-ready for recurring workflows.

Is it legal to scrape Baidu search results?

Yes, when used responsibly. This actor collects data from publicly available Baidu search results and does not access private or password-protected content. As with any web data collection:

  • Only scrape publicly available information.
  • Ensure compliance with applicable regulations (e.g., GDPR, CCPA).
  • Review Baidu’s terms and your organization’s policies.
  • Do not use the tool for spam or misuse of data.

Users are responsible for ensuring legal compliance for their specific use case.

Input parameters & output format

Example JSON input

{
  "urls": [
    "python tutorial",
    "machine learning"
  ],
  "deviceType": "desktop",
  "languageLocalization": 1,
  "startPage": 1,
  "numResults": 10,
  "timePeriod": {
    "daysAgo": 7
  },
  "maxPagination": 3,
  "outputFile": "baidu_serp_summary",
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}

Input parameters

  • urls (array, required)
    Description: Baidu search URLs (e.g., https://www.baidu.com/s?wd=python) or plain search terms. Add one per line for bulk scraping. Default: none
  • deviceType (string)
    Description: desktop uses www.baidu.com; mobile and tablet use m.baidu.com. Use it to compare mobile and desktop SERPs. Default: "desktop"
  • languageLocalization (integer)
    Description: 1 = All languages. 2 = Simplified Chinese (简体中文). 3 = Traditional Chinese (繁體中文). Default: 1
  • startPage (integer)
    Description: Page number to start scraping from. 1 = first page. Default: 1
  • numResults (integer)
    Description: Number of results per page (1–50). Baidu typically shows 10. Default: 10
  • timePeriod (object)
    Description: Optional date range filter. Use startDate + endDate (YYYY-MM-DD) for a custom range, or daysAgo for “last N days”. Default: an object with the sub-field defaults below
    • startDate (string) – From date (YYYY-MM-DD). Default: ""
    • endDate (string) – To date (YYYY-MM-DD). Default: ""
    • daysAgo (integer) – Alternative: filter to the last N days. Set 0 to disable. Default: 0
  • maxPagination (integer)
    Description: Maximum pages to scrape per query. 0 removes the explicit limit, but runs are still capped at 10 pages. Default: 3
  • outputFile (string)
    Description: Optional custom key for the key-value store. Results are always saved to the Apify dataset; if set, a consolidated JSON is also saved to the key-value store under this name. Default: ""
  • proxyConfiguration (object)
    Description: No proxy by default. If Baidu blocks requests, the actor falls back to datacenter and then residential proxies (3 retries). Enable Apify Proxy here to start with a proxy; the fallback still applies. Default: { "useApifyProxy": false }
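
Checking an input object against these documented ranges before starting a run can catch typos early. The following is a hedged sketch of such a pre-flight check; `validate_input` is a hypothetical helper, and the actor may enforce its own, possibly different, validation.

```python
ALLOWED_DEVICES = {"desktop", "mobile", "tablet"}


def _int_in_range(value, low, high):
    """True for a real integer (not a bool) within [low, high]."""
    return isinstance(value, int) and not isinstance(value, bool) and low <= value <= high


def validate_input(actor_input):
    """Return a list of problems; an empty list means the input looks valid."""
    errors = []
    urls = actor_input.get("urls")
    if not isinstance(urls, list) or not urls:
        errors.append("urls must be a non-empty array")
    if actor_input.get("deviceType", "desktop") not in ALLOWED_DEVICES:
        errors.append("deviceType must be desktop, mobile, or tablet")
    if not _int_in_range(actor_input.get("languageLocalization", 1), 1, 3):
        errors.append("languageLocalization must be 1, 2, or 3")
    if not _int_in_range(actor_input.get("numResults", 10), 1, 50):
        errors.append("numResults must be between 1 and 50")
    if not _int_in_range(actor_input.get("maxPagination", 3), 0, 10):
        errors.append("maxPagination must be between 0 and 10")
    start_page = actor_input.get("startPage", 1)
    if not isinstance(start_page, int) or isinstance(start_page, bool) or start_page < 1:
        errors.append("startPage must be 1 or greater")
    return errors


example_input = {
    "urls": ["python tutorial", "machine learning"],
    "deviceType": "desktop",
    "languageLocalization": 1,
    "startPage": 1,
    "numResults": 10,
    "timePeriod": {"daysAgo": 7},
    "maxPagination": 3,
}
problems = validate_input(example_input)
```

Missing optional keys fall back to the documented defaults, so only `urls` has to be present.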

Example dataset row output (pushed during the run)

{
  "query": "python tutorial",
  "resultType": "organic",
  "title": "Python Tutorial - W3Schools",
  "link": "https://www.w3schools.com/python/",
  "snippet": "Learn Python with examples, exercises, and projects...",
  "displayedLink": "www.w3schools.com",
  "thumbnail": "https://img.example.com/thumb.jpg",
  "position": 1,
  "richSnippet": "Beginner-friendly · Free certificate"
}

Other result types use the same row structure with type-specific fields:

  • answer_box rows include: title, content, source.
  • related_video rows include: title, link, thumbnail.
  • people_also_search_for, related_search, top_search rows include: searchTerm and (when available) link.
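
Since all result types arrive in one flat dataset, downstream code usually starts by splitting rows on resultType. A minimal sketch, assuming rows are plain dicts; both helper names are hypothetical.

```python
from collections import defaultdict

# Result types that carry a searchTerm instead of a title/snippet.
SEARCH_TERM_TYPES = {"people_also_search_for", "related_search", "top_search"}


def group_by_type(rows):
    """Bucket dataset rows by their resultType value."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row.get("resultType", "unknown")].append(row)
    return dict(grouped)


def search_terms(rows):
    """Collect searchTerm values from the three term-based result types."""
    return [
        row["searchTerm"]
        for row in rows
        if row.get("resultType") in SEARCH_TERM_TYPES and row.get("searchTerm")
    ]


rows = [
    {"query": "python tutorial", "resultType": "organic",
     "title": "Python Tutorial - W3Schools", "position": 1},
    {"query": "python tutorial", "resultType": "answer_box",
     "title": "Python", "content": "Python is a high-level programming language...",
     "source": "Baidu Baike"},
    {"query": "python tutorial", "resultType": "related_search",
     "searchTerm": "learn python online"},
]
grouped = group_by_type(rows)
terms = search_terms(rows)
```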

Optional summary JSON (saved to key-value store when outputFile is set)

{
  "summary": {
    "total_queries": 2,
    "queries": ["python tutorial", "machine learning"],
    "total_organic_results": 20,
    "total_answer_boxes": 2,
    "total_related_videos": 1,
    "total_people_also_search_for": 8,
    "total_related_searches": 10,
    "total_top_searches": 6
  },
  "results_by_query": {
    "python tutorial": {
      "query": "python tutorial",
      "organic_results": [],
      "answer_box": [],
      "related_videos": [],
      "people_also_search_for": [],
      "related_searches": [],
      "top_searches": []
    }
  }
}
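
If you did not set outputFile, a similar summary can be rebuilt from the dataset rows. The sketch below mirrors the key names in the example above; the mapping from resultType to those keys is inferred from the example, and `build_summary` is a hypothetical helper rather than the actor's own code.

```python
# resultType -> (counter key in "summary", list key in "results_by_query")
KEYS = {
    "organic": ("total_organic_results", "organic_results"),
    "answer_box": ("total_answer_boxes", "answer_box"),
    "related_video": ("total_related_videos", "related_videos"),
    "people_also_search_for": ("total_people_also_search_for", "people_also_search_for"),
    "related_search": ("total_related_searches", "related_searches"),
    "top_search": ("total_top_searches", "top_searches"),
}


def build_summary(rows):
    """Aggregate flat dataset rows into the summary shape shown above."""
    summary = {"total_queries": 0, "queries": []}
    for total_key, _ in KEYS.values():
        summary[total_key] = 0
    results_by_query = {}
    for row in rows:
        keys = KEYS.get(row.get("resultType"))
        if keys is None:
            continue  # skip rows with an unrecognized type
        total_key, group_key = keys
        query = row.get("query", "")
        if query not in results_by_query:
            entry = {"query": query}
            entry.update({group: [] for _, group in KEYS.values()})
            results_by_query[query] = entry
            summary["queries"].append(query)
        results_by_query[query][group_key].append(row)
        summary[total_key] += 1
    summary["total_queries"] = len(summary["queries"])
    return {"summary": summary, "results_by_query": results_by_query}


report = build_summary([
    {"query": "python tutorial", "resultType": "organic", "position": 1},
    {"query": "python tutorial", "resultType": "organic", "position": 2},
    {"query": "python tutorial", "resultType": "answer_box", "source": "Baidu Baike"},
    {"query": "machine learning", "resultType": "related_search", "searchTerm": "ml basics"},
])
```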

FAQ

Does it work without a proxy?

Yes. By default it uses no proxy. If Baidu blocks requests, it automatically falls back to Apify datacenter proxy and then residential proxy with up to 3 retries.

Can I use my own proxy or start with a proxy?

Yes. Configure proxyConfiguration in the input to enable Apify Proxy from the start. The automatic fallback still applies if a block occurs.

Can I target mobile vs. desktop SERPs?

Yes. Set deviceType to desktop, mobile, or tablet. Mobile/Tablet uses m.baidu.com, which can produce different SERP layouts and results.

How do I filter results by date?

Use timePeriod. Provide startDate and endDate for a custom range, or set daysAgo (e.g., 7 for “last week”). Leave it empty to disable filtering.
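
For reproducible runs it can help to pin a relative window to explicit dates. The sketch below converts a day count into a startDate/endDate pair in the documented YYYY-MM-DD format. It assumes the window ends today and includes the end date; how the actor itself interprets daysAgo is not documented here, and `days_ago_to_range` is a hypothetical helper.

```python
from datetime import date, timedelta


def days_ago_to_range(days_ago, today=None):
    """Build a timePeriod object with explicit dates covering the last N days."""
    today = today or date.today()
    start = today - timedelta(days=days_ago)
    return {
        "startDate": start.isoformat(),  # YYYY-MM-DD
        "endDate": today.isoformat(),
        "daysAgo": 0,  # 0 disables the relative filter
    }


# Fixed reference date so the example is deterministic.
time_period = days_ago_to_range(7, today=date(2024, 3, 10))
```

Passing the resulting object as timePeriod keeps the same date window even if the run is repeated on a later day.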

How many results can I extract per query?

Control depth with numResults (1–50 per page) and maxPagination (0–10 pages; 0 means no explicit limit, still capped at 10). The actor aggregates organic positions across pages.
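
As a rough budgeting aid, the upper bound on organic rows per query follows from those two settings. This is a simple estimate under the stated limits, not a guarantee: Baidu typically shows 10 results per page, so real counts can be lower. The function name is illustrative.

```python
PAGE_CAP = 10  # maxPagination = 0 is still capped at 10 pages


def max_organic_results(num_results, max_pagination):
    """Upper bound on organic rows for one query."""
    pages = PAGE_CAP if max_pagination == 0 else min(max_pagination, PAGE_CAP)
    return num_results * pages


default_bound = max_organic_results(10, 3)   # defaults: 10 per page, 3 pages
largest_bound = max_organic_results(50, 0)   # maximum documented settings
```

With the defaults the bound is 30 organic rows per query; with the maximum settings it is 500.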

What data types are included beyond organic results?

In addition to organic results, the scraper extracts answer boxes, related videos, people also search for, related searches, and top searches when present.

Where do results go and how can I export them?

Rows are pushed to the Apify dataset during the run. You can view them in the OUTPUT tab and export to JSON or CSV. If you set outputFile, a consolidated summary JSON is also saved to the key-value store.

Is this a Baidu SERP API I can use with Python?

You can run the actor on Apify and access results programmatically via the dataset (download JSON/CSV) to integrate with Python or other workflows, effectively using it as a Baidu search results API for your pipelines.
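
One way to pull rows into Python after a run is the Apify API's dataset items endpoint. The sketch below uses only the standard library; the dataset ID and token are placeholders you must supply, and the helper names are illustrative. Only the URL builder runs on its own, since `fetch_rows` needs network access and valid credentials.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"


def dataset_items_url(dataset_id, fmt="json"):
    """Build the items URL for a dataset (fmt can be 'json' or 'csv')."""
    query = urllib.parse.urlencode({"format": fmt})
    safe_id = urllib.parse.quote(dataset_id, safe="")
    return f"{API_BASE}/datasets/{safe_id}/items?{query}"


def fetch_rows(dataset_id, token):
    """Download all rows of a dataset as a list of dicts."""
    request = urllib.request.Request(
        dataset_items_url(dataset_id),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example (placeholders, not real values):
# rows = fetch_rows("<DATASET_ID>", "<APIFY_TOKEN>")
url = dataset_items_url("abc123")
```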

Final thoughts

The 🔍 Baidu Search Scraper is built for structured, scalable Baidu SERP data extraction. With intelligent proxy fallback, bulk query support, and precise output fields, it’s ideal for marketers, developers, analysts, and researchers. Export clean JSON/CSV from the dataset or save a consolidated summary to the key-value store for downstream automation. Start extracting smarter Baidu SEO insights and build repeatable workflows for analysis, enrichment, and reporting.