Biketo China Cycling News & Product Scraper

Pricing

Pay per event

Scrapes Biketo (美骑网) — China's largest cycling portal — for news, product reviews, and race coverage since 2008. Enumerates articles by sequential ID across three channels. Returns title, author, publish date, channel, body text, lead image, and engagement metrics.

Developer

BowTiedRaccoon

Maintained by Community

Scrapes Biketo (美骑网), China's largest and longest-running cycling portal, for news articles, product reviews, and race coverage. The site has published continuously since 2008 and has accumulated over 56,000 articles across three channels. This actor enumerates the complete back-catalog using Biketo's sequential article ID scheme, which suits building a Mandarin cycling corpus for LLM fine-tuning, market research, or trend detection.

What you get

Each scraped record contains:

| Field | Description |
| --- | --- |
| articleId | Numeric article ID (e.g. 56323) |
| articleUrl | Full canonical URL |
| channel | Biketo's channel label in Chinese (e.g. 美骑快讯, 产品快讯, 赛事新闻) |
| title | Article headline in Chinese |
| tags | Comma-separated category tags from the article header |
| author | Author or source attribution |
| publishDate | Publish date-time (YYYY-MM-DD HH:MM:SS) |
| leadImage | URL of the first image in the article body |
| bodyText | Full article body text, whitespace-collapsed |
| viewCount | Page view count (integer) |
| commentCount | Comment count (integer) |
| scrapedAt | ISO-8601 scrape timestamp |
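
Two of the fields above are strings that most consumers will want to parse: tags (comma-separated) and publishDate (YYYY-MM-DD HH:MM:SS). The following is a minimal post-processing sketch, not part of the actor. The helper name normalize_record, the added keys tagList and publishedAt, and the sample values are all hypothetical. Tolerating a full-width comma in tags is an assumption, and because no time zone is stated for publishDate the result is kept as a naive datetime.

```python
from datetime import datetime
from typing import Any


def normalize_record(record: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of one scraped record with two typed convenience fields."""
    out = dict(record)

    # tags arrive as a single comma-separated string; split and drop empty pieces.
    # Assumption: a full-width comma may also appear, so it is normalized first.
    raw_tags = (record.get("tags") or "").replace("\uff0c", ",")
    out["tagList"] = [t.strip() for t in raw_tags.split(",") if t.strip()]

    # publishDate follows the documented YYYY-MM-DD HH:MM:SS layout, no time zone.
    raw_date = record.get("publishDate")
    out["publishedAt"] = (
        datetime.strptime(raw_date, "%Y-%m-%d %H:%M:%S") if raw_date else None
    )
    return out


# Illustrative record: values are made up, only the field names come from the table.
sample = {
    "articleId": 56323,
    "tags": "road, carbon",
    "publishDate": "2024-05-01 09:30:00",
}
normalized = normalize_record(sample)
print(normalized["tagList"], normalized["publishedAt"])
```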

Input parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| startId | integer | 1 | Article ID to start enumeration from |
| endId | integer | 56500 | Article ID to stop at (inclusive) |
| channels | array | ["news","product","racing"] | Content channels to include |
| maxItems | integer | | Cap on total articles to return |
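
When inputs are generated by a script, it can help to assemble and sanity-check them before starting a run. The sketch below is one way to do that with the documented parameter names and defaults. The function build_run_input and its checks are hypothetical client-side helpers, not the actor's own validation, and the rule that startId must be at least 1 is an assumption based on the default.

```python
from typing import Any, Optional

# Channel names as listed in the parameter table.
VALID_CHANNELS = ("news", "product", "racing")


def build_run_input(
    start_id: int = 1,
    end_id: int = 56500,
    channels: Optional[list[str]] = None,
    max_items: Optional[int] = None,
) -> dict[str, Any]:
    """Assemble an input object using the documented names and defaults."""
    chosen = list(VALID_CHANNELS) if channels is None else list(channels)
    unknown = [c for c in chosen if c not in VALID_CHANNELS]
    if unknown:
        raise ValueError(f"unknown channels: {unknown}")
    # Assumption: IDs start at 1 and endId is inclusive, so it must not precede startId.
    if start_id < 1 or end_id < start_id:
        raise ValueError("expected 1 <= startId <= endId")

    run_input: dict[str, Any] = {
        "startId": start_id,
        "endId": end_id,
        "channels": chosen,
    }
    # maxItems has no documented default, so it is only included when set.
    if max_items is not None:
        run_input["maxItems"] = max_items
    return run_input


print(build_run_input(start_id=56200, max_items=100))
```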

Content channels

  • news — Cycling news, industry coverage, product announcements (/news/<id>.html)
  • product — Gear reviews and product features (/product/<id>.html)
  • racing — Race coverage and results (/racing/<id>.html)

All three channels share the same sequential ID space. IDs are enumerated in parallel across selected channels; invalid IDs for a given channel are silently skipped.
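
To make the enumeration concrete, the sketch below generates the candidate URLs implied by the path patterns listed above. It is an illustration of the scheme, not the actor's source. The host in BASE_URL is an assumption (the text gives only paths), and candidate_urls is a hypothetical name. If every ID is tried against every selected channel, the default range yields 56,500 × 3 = 169,500 candidate URLs, most of which are expected to be skipped as invalid.

```python
from collections.abc import Iterator

# Assumption: the site host is not stated in the text above.
BASE_URL = "https://www.biketo.com"

# Path patterns as given in the channel list.
CHANNEL_PATHS = {
    "news": "/news/{id}.html",
    "product": "/product/{id}.html",
    "racing": "/racing/{id}.html",
}


def candidate_urls(
    start_id: int, end_id: int, channels: list[str]
) -> Iterator[tuple[int, str, str]]:
    """Yield (article_id, channel, url) for each ID in each selected channel."""
    for article_id in range(start_id, end_id + 1):  # endId is inclusive
        for channel in channels:
            path = CHANNEL_PATHS[channel].format(id=article_id)
            yield article_id, channel, BASE_URL + path


for item in candidate_urls(56323, 56323, ["news", "product", "racing"]):
    print(item)
```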

Usage examples

Full back-catalog (all channels, ~56k articles):

{
  "startId": 1,
  "endId": 56500,
  "channels": ["news", "product", "racing"]
}

Recent articles only (incremental update):

{
  "startId": 56200,
  "endId": 56500,
  "channels": ["news", "product", "racing"],
  "maxItems": 100
}

Product reviews only:

{
  "startId": 1,
  "endId": 56500,
  "channels": ["product"]
}

Notes

  • Charset: Biketo serves pages in GB2312. The actor transparently decodes to UTF-8 via Crawlee's built-in charset handling — all output fields are clean UTF-8 Chinese text.
  • Rate limiting: The actor uses moderate concurrency (5–15) with polite crawling. No proxy is required; the site is fully accessible to datacenter IPs.
  • Invalid IDs: Not every ID exists in every channel. The actor skips URLs that return 404 or lack an article heading — no error is logged for these, keeping run logs clean.
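  • Resumability: For large runs, split the catalog into narrower startId and endId ranges and run them one at a time, so an interrupted run only needs its own range repeated. For incremental updates, re-run with startId set just past the highest article ID already scraped.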
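
The resumability note can be sketched as two small helpers: one that splits an ID range into consecutive inclusive windows, and one that derives the next startId from the article IDs already collected. Both names (id_windows, next_start_id) and the window size of 10,000 are hypothetical choices for illustration, not settings of the actor.

```python
from collections.abc import Iterable, Iterator


def id_windows(start_id: int, end_id: int, size: int) -> Iterator[tuple[int, int]]:
    """Split [start_id, end_id] into consecutive inclusive (startId, endId) windows."""
    if size < 1:
        raise ValueError("size must be positive")
    low = start_id
    while low <= end_id:
        high = min(low + size - 1, end_id)
        yield low, high
        low = high + 1


def next_start_id(scraped_ids: Iterable[int], fallback: int = 1) -> int:
    """Return the ID just past the highest one seen, or fallback if none were seen."""
    return max(scraped_ids, default=fallback - 1) + 1


# Each window can be passed as startId/endId to a separate run.
for low, high in id_windows(1, 56500, 10000):
    print({"startId": low, "endId": high})

# For an incremental update, continue after the highest ID already stored.
print(next_start_id([56320, 56323, 56321]))
```

Smaller windows make a failed run cheaper to repeat at the cost of more runs to manage; the right size depends on how long a single run is allowed to take.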