Biketo China Cycling News & Product Scraper
Pricing: Pay per event
Scrapes Biketo (美骑网) — China's largest cycling portal — for news, product reviews, and race coverage since 2008. Enumerates articles by sequential ID across three channels. Returns title, author, publish date, channel, body text, lead image, and engagement metrics.
Developer: BowTiedRaccoon
Maintained by Community
Scrapes Biketo (美骑网) — China's largest and longest-running cycling portal — for news articles, product reviews, and race coverage. The site has published continuously since 2008, accumulating over 56,000 articles across three channels. This actor enumerates the complete back-catalog using Biketo's sequential article ID scheme, making it ideal for building a Mandarin cycling corpus for LLM fine-tuning, market research, or trend detection.
What you get
Each scraped record contains:
| Field | Description |
|---|---|
| articleId | Numeric article ID (e.g. 56323) |
| articleUrl | Full canonical URL |
| channel | Biketo's channel label in Chinese (e.g. 美骑快讯 "Biketo news flash", 产品快讯 "product news flash", 赛事新闻 "race news") |
| title | Article headline in Chinese |
| tags | Comma-separated category tags from the article header |
| author | Author or source attribution |
| publishDate | Publish date-time (YYYY-MM-DD HH:MM:SS) |
| leadImage | URL of the first image in the article body |
| bodyText | Full article body text, whitespace-collapsed |
| viewCount | Page view count (integer) |
| commentCount | Comment count (integer) |
| scrapedAt | ISO-8601 scrape timestamp |
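A minimal sketch of post-processing one output record in Python, assuming records are loaded as plain dicts (for example from a JSON export of the dataset). The helper `normalize_record` and the sample values are hypothetical; the code also assumes `tags` may use either an ASCII or a full-width comma as separator, which the field description does not specify.

```python
import re
from datetime import datetime
from typing import Any


def normalize_record(record: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a scraped record with typed fields (hypothetical helper)."""
    out = dict(record)
    # Parse the documented YYYY-MM-DD HH:MM:SS format; no timezone is stated.
    out["publishDate"] = datetime.strptime(record["publishDate"], "%Y-%m-%d %H:%M:%S")
    # Assumption: tags are separated by "," or the full-width "，".
    raw_tags = record.get("tags") or ""
    out["tags"] = [t.strip() for t in re.split(r"[,，]", raw_tags) if t.strip()]
    # viewCount / commentCount are documented as integers; default to 0 if absent.
    for key in ("viewCount", "commentCount"):
        out[key] = int(record.get(key) or 0)
    return out


# Hypothetical record, for illustration only.
sample = {
    "articleId": 56323,
    "channel": "美骑快讯",
    "tags": "公路车,新品",
    "publishDate": "2024-05-01 09:30:00",
    "viewCount": 1200,
    "commentCount": 3,
}
print(normalize_record(sample)["tags"])
```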
Input parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| startId | integer | 1 | Article ID to start enumeration from |
| endId | integer | 56500 | Article ID to stop at (inclusive) |
| channels | array | ["news","product","racing"] | Content channels to include |
| maxItems | integer | none | Maximum number of articles to return |
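The parameters above can be assembled into an input object programmatically. This is a sketch only: `build_input` is a hypothetical helper, and its validation rules (1 <= startId <= endId, channels limited to the three documented names, positive maxItems) are assumptions rather than the actor's own input schema.

```python
import json
from typing import Optional

VALID_CHANNELS = ("news", "product", "racing")


def build_input(
    start_id: int = 1,
    end_id: int = 56500,
    channels: Optional[list[str]] = None,
    max_items: Optional[int] = None,
) -> dict:
    """Build an actor input dict mirroring the documented defaults (hypothetical)."""
    channels = list(VALID_CHANNELS) if channels is None else channels
    if start_id < 1 or end_id < start_id:
        raise ValueError("expected 1 <= startId <= endId")
    unknown = set(channels) - set(VALID_CHANNELS)
    if unknown:
        raise ValueError(f"unknown channels: {sorted(unknown)}")
    payload = {"startId": start_id, "endId": end_id, "channels": channels}
    # maxItems has no default, so it is only included when a cap is requested.
    if max_items is not None:
        if max_items < 1:
            raise ValueError("maxItems must be positive")
        payload["maxItems"] = max_items
    return payload


# Reproduces the "recent articles only" usage example.
print(json.dumps(build_input(56200, 56500, max_items=100)))
```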
Content channels
- news: Cycling news, industry coverage, product announcements (/news/<id>.html)
- product: Gear reviews and product features (/product/<id>.html)
- racing: Race coverage and results (/racing/<id>.html)
All three channels share the same sequential ID space. IDs are enumerated in parallel across selected channels; invalid IDs for a given channel are silently skipped.
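The enumerate-and-skip behaviour described above can be sketched in Python. This is one plausible reading, not the actor's source: it assumes each ID is tried under every selected channel's path, that the site root is `https://www.biketo.com`, and that the "article heading" checked for is an `<h1>` element. `candidate_urls` and `is_article_page` are hypothetical helpers.

```python
from html.parser import HTMLParser
from typing import Iterator

# Assumption: site root; the README only documents the relative paths.
BASE_URL = "https://www.biketo.com"


def candidate_urls(start_id: int, end_id: int, channels: list[str]) -> Iterator[tuple[int, str, str]]:
    """Yield (articleId, channel, url) for every ID in [start_id, end_id] under each channel."""
    for article_id in range(start_id, end_id + 1):  # endId is inclusive
        for channel in channels:
            yield article_id, channel, f"{BASE_URL}/{channel}/{article_id}.html"


class _HeadingFinder(HTMLParser):
    """Sets .found when the page contains an <h1> with non-blank text."""

    def __init__(self) -> None:
        super().__init__()
        self._in_h1 = False
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1 and data.strip():
            self.found = True


def is_article_page(status: int, html: str) -> bool:
    """Return False for 404s and for pages without an (assumed <h1>) article heading."""
    if status == 404:
        return False
    finder = _HeadingFinder()
    finder.feed(html)
    finder.close()
    return finder.found
```

Under this reading a run issues up to (endId - startId + 1) × len(channels) requests, e.g. 56,500 × 3 = 169,500 for the full back-catalog with all channels, and every URL that fails `is_article_page` is dropped without logging.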
Usage examples
Full back-catalog (all channels, ~56k articles):
{"startId": 1, "endId": 56500, "channels": ["news", "product", "racing"]}
Recent articles only (incremental update):
{"startId": 56200, "endId": 56500, "channels": ["news", "product", "racing"], "maxItems": 100}
Product reviews only:
{"startId": 1, "endId": 56500, "channels": ["product"]}
Notes
- Charset: Biketo serves pages in GB2312. The actor transparently decodes to UTF-8 via Crawlee's built-in charset handling — all output fields are clean UTF-8 Chinese text.
- Rate limiting: The actor runs at moderate concurrency (5–15 concurrent requests) to keep load on the site low. No proxy is required; the site is accessible from datacenter IPs.
- Invalid IDs: Not every ID exists in every channel. The actor skips URLs that return 404 or lack an article heading — no error is logged for these, keeping run logs clean.
- Resumability: For large runs, split the work by setting startId and endId to narrow ranges. Re-run with an updated startId for incremental updates.
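The Charset note above refers to Crawlee's built-in handling; the decoding step itself can be illustrated with Python's standard codecs. This is not Crawlee's implementation: `decode_page` is a hypothetical helper that decodes with the declared charset and, on failure, falls back to GB18030, which is a superset of GBK and therefore of GB2312 (useful if a page contains characters outside GB2312).

```python
def decode_page(raw: bytes, declared: str = "gb2312") -> str:
    """Decode raw page bytes into a Unicode string (hypothetical helper).

    Tries the declared charset first, then GB18030, a superset of GBK/GB2312.
    """
    try:
        return raw.decode(declared)
    except UnicodeDecodeError:
        # Fallback for characters GB2312 does not cover; undecodable bytes become U+FFFD.
        return raw.decode("gb18030", errors="replace")


html_bytes = "<h1>美骑网</h1>".encode("gb2312")  # simulate a GB2312-served page
text = decode_page(html_bytes)
utf8_bytes = text.encode("utf-8")  # re-encode for UTF-8 output
```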
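The concurrency cap in the Rate limiting note can be illustrated with an `asyncio` semaphore. Crawlee itself autoscales between its `minConcurrency` and `maxConcurrency` settings, which the 5–15 range presumably maps to; this sketch swaps in a simpler fixed cap. `crawl_bounded` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for real HTTP requests.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, TypeVar

T = TypeVar("T")


async def crawl_bounded(
    ids: Iterable[int],
    fetch: Callable[[int], Awaitable[T]],
    max_concurrency: int = 15,
) -> list[T]:
    """Run fetch(id) for every id, keeping at most max_concurrency calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(article_id: int) -> T:
        async with semaphore:  # waits while max_concurrency workers are active
            return await fetch(article_id)

    # gather returns results in the same order as the input ids.
    return await asyncio.gather(*(worker(i) for i in ids))


async def fake_fetch(article_id: int) -> int:
    # Stand-in for an HTTP request; a real crawler would download the page here.
    await asyncio.sleep(0.01)
    return article_id


results = asyncio.run(crawl_bounded(range(1, 31), fake_fetch, max_concurrency=5))
```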
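For the Resumability note, a small helper can split a large ID span into fixed-size inclusive chunks (one run per chunk) and derive the next startId from a previous run's output. Both functions are hypothetical and only compute inputs; they do not call the actor.

```python
from typing import Iterable, Iterator


def id_ranges(start_id: int, end_id: int, chunk_size: int) -> Iterator[tuple[int, int]]:
    """Split [start_id, end_id] (inclusive) into consecutive inclusive sub-ranges."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    lo = start_id
    while lo <= end_id:
        hi = min(lo + chunk_size - 1, end_id)
        yield lo, hi
        lo = hi + 1


def next_start_id(records: Iterable[dict], fallback: int = 1) -> int:
    """Return the ID after the highest articleId in a previous run's output."""
    ids = [r["articleId"] for r in records if "articleId" in r]
    return max(ids) + 1 if ids else fallback


# One actor input per chunk for a full back-catalog crawl.
inputs = [
    {"startId": lo, "endId": hi, "channels": ["news", "product", "racing"]}
    for lo, hi in id_ranges(1, 56500, 10000)
]
```

Restarting from the highest scraped articleId + 1 means that IDs beyond it which did not exist yet at the previous run are checked again, which is the behaviour an incremental update needs.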