Baidu Search Scraper
Pricing
from $1.80 / 1,000 results
Developer
Lofomachines
Maintained by Community
Baidu Search Scraper | Extract Baidu Search Results
Baidu Search Scraper is a powerful web scraping tool designed to extract search engine results pages (SERPs) from Baidu (China's leading search engine). This scraper is built to reliably bypass captchas, anti-scraping systems, and bot detection mechanisms.
Whether you need to collect search results for SEO monitoring, brand protection, sentiment analysis, academic research, or market intelligence, this Baidu Scraper handles the complexities of pagination.
Key Features
- Sequential Multi-Query Search: Input multiple search terms (one per line) and process them in a single run. The scraper reuses the browser container across queries for faster execution.
- Real Destination URL Resolution: Baidu search results use encrypted redirect links (`baidu.com/link?url=...`). This scraper automatically follows redirects to extract and output the real destination URLs.
- Advanced Baidu Search Operators: Fully supports filters such as:
  - Domain filtering (`site:domain.com`); supports multiple domains combined with OR.
  - Recency/Time range (last 24 hours, week, month, year).
  - File type filtering (PDF, Word, Excel, PPT, RTF).
  - Language Script Selection (Simplified or Traditional Chinese).
  - Search in page titles only (`intitle:`) and exact phrase matching (`"phrase"`).
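To illustrate how these operators might combine into one query string, here is a minimal Python sketch. The function `build_baidu_query` and its parameters are hypothetical and only mirror the operators listed above; how the scraper actually assembles queries internally is not documented here, and applying `intitle:` to the whole keyword string is an assumption.

```python
def build_baidu_query(keywords, sites=(), exact_phrase="", title_only=False):
    """Compose a search string from the operators listed above (illustrative only)."""
    parts = []
    # Assumption: intitle: is simply prefixed to the keyword string.
    parts.append(f"intitle:{keywords}" if title_only else keywords)
    if exact_phrase:
        # Exact phrase matching wraps the phrase in double quotes.
        parts.append(f'"{exact_phrase}"')
    if sites:
        # Multiple domains are combined with OR, as described in the feature list.
        parts.append(" OR ".join(f"site:{s}" for s in sites))
    return " ".join(parts)


print(build_baidu_query("claude anthropic", sites=["wikipedia.org", "example.com"]))
```

The sketch keeps each operator as a separate, optional piece, so leaving a parameter at its default omits that operator.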
Use Cases
- Chinese Market SEO Monitoring: Track organic rankings, indexation status, and SERP visibility for keywords on Baidu.
- Brand Protection & Infringement Tracking: Search for unauthorized sellers, trademark violations, or fake brand representations on Chinese web properties.
- Competitor Intelligence: Analyze which competitor landing pages, display domains, and search snippets rank for specific terms.
- Academic & Sentiment Analysis Research: Extract historical data, news snippets, and online discussions relevant to Chinese culture, business, or politics.
How to Use
- Configure Queries: Enter one or more keywords/queries in the Queries / Search Terms input box (one per line).
- Define Max Results: Set the maximum number of results you want to retrieve per query (e.g., `100`).
- Apply Filters: (Optional) Restrict results by site/domain, publication date (time range), language script, or file type.
- Enable URL Resolution: Keep Resolve real URLs checked to follow redirects and get the actual target URLs instead of raw Baidu redirect links.
- Configure Proxy: For heavy usage, enable the Apify Proxy (using residential proxies is recommended to avoid IP bans).
- Run the Actor: Click the Run button. The scraper will collect the data and store it in your default dataset.
Input Configuration
Here is a list of the available input parameters:
| Field Name | Type | Description | Default |
|---|---|---|---|
| `queries` | array | List of search queries to run sequentially (one per line). | `["claude anthropic"]` |
| `maxResults` | integer | Max results to collect for each query. | `100` |
| `timeRange` | string | Filter results by date: `any`, `day`, `week`, `month`, or `year`. | `"any"` |
| `sites` | array | Limit search to specific domains (e.g. `wikipedia.org`). | `[]` |
| `filetype` | string | Limit results to specific file types: `pdf`, `doc`, `xls`, `ppt`, `rtf`. | `"any"` |
| `language` | string | Chinese script: `any`, `simplified`, or `traditional`. | `"any"` |
| `exactPhrase` | string | Require results to contain this exact phrase. | `""` |
| `excludeWords` | array | Exclude results containing these words. | `[]` |
| `titleOnly` | boolean | Restrict search matches to page titles only. | `false` |
| `resolveRealUrls` | boolean | Follow Baidu redirect links to get the real target URL. | `true` |
| `proxyConfiguration` | object | Proxy settings (apify proxy, custom proxies). | None |
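As a rough illustration of these parameters, the following Python sketch assembles a run input from the documented defaults and rejects values outside the documented options. The helper `make_run_input` is hypothetical; only the field names, defaults, and allowed values are taken from the table, and the validation rules are an assumption about sensible input rather than the Actor's own schema.

```python
import copy
import json

# Defaults copied from the input table.
DEFAULTS = {
    "queries": ["claude anthropic"],
    "maxResults": 100,
    "timeRange": "any",
    "sites": [],
    "filetype": "any",
    "language": "any",
    "exactPhrase": "",
    "excludeWords": [],
    "titleOnly": False,
    "resolveRealUrls": True,
    "proxyConfiguration": None,
}

# Allowed values for the enumerated string fields, as listed in the table.
ALLOWED = {
    "timeRange": {"any", "day", "week", "month", "year"},
    "filetype": {"any", "pdf", "doc", "xls", "ppt", "rtf"},
    "language": {"any", "simplified", "traditional"},
}


def make_run_input(**overrides):
    """Merge overrides into the defaults and run basic sanity checks."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    run_input = copy.deepcopy(DEFAULTS)
    run_input.update(overrides)
    for field, allowed in ALLOWED.items():
        if run_input[field] not in allowed:
            raise ValueError(f"{field} must be one of {sorted(allowed)}")
    if not run_input["queries"]:
        raise ValueError("queries must contain at least one search term")
    if run_input["maxResults"] < 1:
        raise ValueError("maxResults must be a positive integer")
    # Leave out fields that have no value (proxyConfiguration defaults to None).
    return {key: value for key, value in run_input.items() if value is not None}


print(json.dumps(make_run_input(queries=["百度"], timeRange="week"),
                 ensure_ascii=False, indent=2))
```

The printed JSON can be pasted into the Actor's JSON input editor or passed as the run input from an API client.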
Output Format
Each scraped search result item is stored as an object in the Apify dataset. The scraper outputs the following fields:
| Field | Type | Description |
|---|---|---|
| `query` | string | The search query term. |
| `position` | integer | 1-based ranking position of the result for this query. |
| `page` | integer | The page number on Baidu where the result was found. |
| `title` | string | The title of the search result page. |
| `url` | string | The resolved, final destination URL (e.g., `https://example.com/page`). |
| `baiduUrl` | string | The original Baidu redirect URL. |
| `displayUrl` | string | The display domain name shown on Baidu. |
| `snippet` | string | Description snippet text matching your search terms. |
| `date` | string | Publication date of the page (if shown on Baidu). |
| `siteName` | string | Displayed name of the website (if shown on Baidu). |
Output JSON Example
{"query": "apple","position": 1,"page": 1,"title": "Apple (ไธญๅฝๅคง้) - ๅฎๆน็ฝ็ซ","url": "https://www.apple.com.cn/","baiduUrl": "http://www.baidu.com/link?url=6lHipUPotM6NN3efDPvd4gZk1ZSQhtVwsIBdG3DGtmFUBe5LzfEdru89qaxDmtNy","displayUrl": "www.apple.com.cn/","snippet": "ๆข็ดขApple ็ๅๆฐไธ็,้่ดญๅๅผ iPhoneใiPadใApple Watch ๅ Mac,ๆต่งๅ็ฑป้ ไปถใๅจฑไนไบงๅ,ๅนถ่ทๅพ็ธๅ ณไบงๅ็ไธๅฎถๆๅกๆฏๆใ","date": null,"siteName": null}
Troubleshooting & Performance Tips
- Speeding up Runs: Setting `resolveRealUrls` to `false` makes the scraper significantly faster because it doesn't need to make HEAD/GET HTTP requests to every target website. If you only need domain names or the raw Baidu redirect links, turn this off.
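When URL resolution is turned off and only domain names are needed, one option is to derive them from `displayUrl`. This is a minimal sketch, assuming `displayUrl` values look like the `www.apple.com.cn/` entry in the output example; the helper name `domain_from_display_url` is hypothetical.

```python
from urllib.parse import urlsplit


def domain_from_display_url(display_url):
    """Return the lowercase host of a displayUrl, or None if there is none."""
    text = display_url.strip()
    if not text:
        return None
    # urlsplit only recognizes the host when a "//" authority marker is present,
    # so add one for scheme-less values such as "www.apple.com.cn/".
    if "://" not in text:
        text = "//" + text
    return urlsplit(text).hostname


print(domain_from_display_url("www.apple.com.cn/"))
```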
FAQ
Q: Can I scrape thousands of keywords?
A: Yes. You can enter a large list of keywords in the `queries` field; they are processed sequentially in a single run.
Q: Why are some destination URLs identical to the Baidu redirect URLs?
A: If the target website is offline, slow to respond, or blocks redirect resolution requests, the scraper falls back to the original Baidu redirect link to ensure you do not lose data.
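One way to spot results where this fallback happened is to check whether `url` is still a Baidu redirect link. The Python sketch below assumes fallback items carry the same `baidu.com/link?url=...` pattern in `url` as in `baiduUrl`; the helper name `is_unresolved` is hypothetical.

```python
from urllib.parse import urlsplit


def is_unresolved(item):
    """Return True if the item's url still points at a Baidu redirect link."""
    url = item.get("url") or ""
    if url and url == item.get("baiduUrl"):
        return True
    parts = urlsplit(url)
    host = parts.hostname or ""
    on_baidu = host == "baidu.com" or host.endswith(".baidu.com")
    # Redirect links have the form baidu.com/link?url=...
    return on_baidu and parts.path == "/link"


items = [
    {"url": "https://www.apple.com.cn/",
     "baiduUrl": "http://www.baidu.com/link?url=6lHipUPotM6NN3efDPvd4gZk1ZSQhtVwsIBdG3DGtmFUBe5LzfEdru89qaxDmtNy"},
    {"url": "http://www.baidu.com/link?url=6lHipUPotM6NN3efDPvd4gZk1ZSQhtVwsIBdG3DGtmFUBe5LzfEdru89qaxDmtNy",
     "baiduUrl": "http://www.baidu.com/link?url=6lHipUPotM6NN3efDPvd4gZk1ZSQhtVwsIBdG3DGtmFUBe5LzfEdru89qaxDmtNy"},
]
unresolved = [item for item in items if is_unresolved(item)]
print(len(unresolved))
```

Items flagged this way could be retried later or resolved separately, since the original redirect link is preserved.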