Toutiao Scraper 📰

Pricing: $30.00/month + usage

Easily scrape articles and videos from Toutiao (头条), a popular Chinese news and information content platform.

Developer: Nataniel Santos (Maintained by Community)

🔍 What does Toutiao Scraper do?

Toutiao Scraper extracts data from Toutiao, a popular Chinese news and information platform. It collects articles and videos together with their metadata, such as author information and engagement statistics.


✨ Key Features

  • 📰 Scrape using article URLs, video URLs or profile URLs
  • 📊 Extract detailed information about articles and videos, including title, cover image, abstract, publish time, author details, and statistics like play count and comment count.
  • ⚡ Fast and reliable scraping
  • 🔒 No need for authentication
  • 🔄 Automatically handles retries for failed requests
  • 📦 Outputs data in JSON format for easy integration with other tools

🎯 Who Is This For?

  • 🕵️‍♂️ Researchers and analysts interested in Chinese media trends
  • 📈 Marketers looking to analyze content performance on Toutiao
  • 🧑‍💻 Developers needing to integrate Toutiao data into their applications
  • 📰 Journalists and content creators seeking insights into popular topics and authors on Toutiao

⬇️ What You Need to Provide

  • 🔗 Start URLs (startUrls): one or more Toutiao URLs of the following types:
    • Article URLs (e.g., https://www.toutiao.com/article/7450114952884503059)
    • Video URLs (e.g., https://www.toutiao.com/video/7450114952884503059)
    • Profile URLs (e.g., https://www.toutiao.com/c/user/token/MS4wLjABAAAA3KU9ifwN12G41E6PJRpZgIfzVrXWQ4WPF66ns54c0uw/)
  • 🗂️ Profile Tab (profileTab, optional): when you provide a profile URL, the tab to scrape (e.g., all, article, video, or short_video).
  • 🔍 Profile Search Query (profileSearchQuery, optional): when you provide a profile URL, a search query that filters the profile's content.
  • 📊 Max Items (maxItems, optional): the maximum number of items to scrape per profile, to limit the amount of data collected. Set to 0 for no limit.

💡 Example Inputs

  • Scraping an Article
{
  "startUrls": [
    "https://www.toutiao.com/article/7450114952884503059"
  ]
}
  • Scraping a Video
{
  "startUrls": [
    "https://www.toutiao.com/video/7450114952884503059"
  ]
}
  • Scraping 100 articles from a Profile
{
  "startUrls": [
    "https://www.toutiao.com/c/user/token/MS4wLjABAAAA3KU9ifwN12G41E6PJRpZgIfzVrXWQ4WPF66ns54c0uw/"
  ],
  "profileTab": "article",
  "maxItems": 100
}
  • Scraping 50 items from a Profile with a search query
{
  "startUrls": [
    "https://www.toutiao.com/c/user/token/MS4wLjABAAAA3KU9ifwN12G41E6PJRpZgIfzVrXWQ4WPF66ns54c0uw/"
  ],
  "profileSearchQuery": "dance",
  "maxItems": 50
}
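The input fields map directly onto a run-input object, so it can help to assemble and sanity-check it in code before starting a run. Below is a minimal Python sketch. It relies only on the field names used in the examples above (startUrls, profileTab, profileSearchQuery, maxItems). The helper names url_kind and build_run_input, the URL path patterns, and the validation rules are illustrative assumptions, not the actor's own checks:

```python
from urllib.parse import urlparse

# Tab values listed in this README; the actor may accept others.
PROFILE_TABS = {"all", "article", "video", "short_video"}


def url_kind(url):
    """Classify a Toutiao URL by path (patterns assumed from the examples above)."""
    path = urlparse(url).path
    if path.startswith("/article/"):
        return "article"
    if path.startswith("/video/"):
        return "video"
    if path.startswith("/c/user/token/"):
        return "profile"
    return None


def build_run_input(start_urls, profile_tab=None, search_query=None, max_items=None):
    """Assemble a run-input dict using the field names from the examples above."""
    kinds = [url_kind(u) for u in start_urls]
    if not start_urls or None in kinds:
        raise ValueError("expected one or more article, video, or profile URLs")
    run_input = {"startUrls": list(start_urls)}
    if profile_tab is not None:
        if profile_tab not in PROFILE_TABS:
            raise ValueError(f"unknown profile tab: {profile_tab!r}")
        run_input["profileTab"] = profile_tab
    if search_query:
        run_input["profileSearchQuery"] = search_query
    if max_items is not None:
        if max_items < 0:
            raise ValueError("maxItems must be >= 0 (0 means no limit)")
        run_input["maxItems"] = max_items
    # The README describes these options only for profile URLs, so this sketch
    # rejects them otherwise (an assumption, not documented actor behavior).
    if ("profileTab" in run_input or "profileSearchQuery" in run_input) and "profile" not in kinds:
        raise ValueError("profileTab/profileSearchQuery need at least one profile URL")
    return run_input
```

For example, build_run_input([profile_url], profile_tab="article", max_items=100) produces the same object as the third example. You can serialize it with json.dumps(..., ensure_ascii=False, indent=2).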

📦 What You’ll Get (Output)

Each result item includes:

  • 📰 Article Data - URL, item ID, title, abstract, full content (HTML), publish time, type
  • 📸 Media Data - cover image, image list, video play URL, video first frame, video metadata (duration, resolution, size, bitrate, codec)
  • 👤 Author Data - name, user ID, sec user ID, verification status, avatar URL, description, authentication type and info
  • 📊 Statistics - comment count, digg (like) count, share count, read count, play count (videos), etc.
  • 📍 Local City Info - code and name of the city associated with the content
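The items are nested (authorData, statistics, localCityInfo), so they usually need flattening before they go into a spreadsheet or CSV file. Below is a minimal sketch using only the standard library. It assumes the field names shown in this README's output examples. Because the two examples carry different statistics (playCount appears only for the video), missing fields are left empty. The column choice and the helper names are hypothetical:

```python
import csv
import io

# Column order is an arbitrary choice for this sketch.
COLUMNS = ["type", "itemId", "url", "title", "publishTime", "author",
           "playCount", "readCount", "diggCount", "commentCount", "shareCount", "city"]


def flatten_item(item):
    """Turn one nested output item into a flat dict; absent fields become None."""
    stats = item.get("statistics") or {}
    author = item.get("authorData") or {}
    city = item.get("localCityInfo") or {}
    return {
        "type": item.get("type"),
        "itemId": item.get("itemId"),
        "url": item.get("url"),
        "title": item.get("title"),
        "publishTime": item.get("publishTime"),
        "author": author.get("name"),
        # playCount appears only in the video example, so it may be missing.
        "playCount": stats.get("playCount"),
        "readCount": stats.get("readCount"),
        "diggCount": stats.get("diggCount"),
        "commentCount": stats.get("commentCount"),
        "shareCount": stats.get("shareCount"),
        "city": city.get("name"),
    }


def items_to_csv(items):
    """Write flattened items to a CSV string (None is written as an empty cell)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    for item in items:
        writer.writerow(flatten_item(item))
    return buf.getvalue()
```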

📥 Output Example

Article Output

{
"url": "https://www.toutiao.com/article/7450114952884503059/",
"title": "“海阳船”再发“海阳箭”——谷神星一号海遥四运载火箭发射成功",
"abstract": "齐鲁网·闪电新闻12月19日讯 12月19日下午,万众瞩目之下,由海阳总装出厂的谷神星一号海遥四运载火箭搭乘“东方航天港”号发射船,喷射烈焰,拔地而起,将天启星座04组卫星(33星至36星)共4颗卫星搭载送入高度850km、倾角45°的近地轨道,发射任务取得圆满成功。",
"itemId": "7450114952884503059",
"publishTime": "2024-12-19 21:31",
"content": "<img src=\"https://p3-sign.toutiaoimg.com/tos-cn-i-axegupay5k/3f3872e7c89e4f2baccea3632c7077bf~tplv-tt-origin-web:gif.jpeg?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1771785537&x-signature=9GJlXcze83Z4U6upcZ%2BxTfOOsrQ%3D\" img_width=\"6000\" img_height=\"4000\" image_type=\"1\" mime_type=\"image/jpeg\" web_uri=\"tos-cn-i-tjoges91tu/0db73b409bfcbc388e7995c7460eff7a\"><p>齐鲁网·闪电新闻12月19日讯 12月19日下午,万众瞩目之下,由海阳总装出厂的谷神星一号海遥四运载火箭搭乘“东方航天港”号发射船,喷射烈焰,拔地而起,将天启星座04组卫星(33星至36星)共4颗卫星搭载送入高度850km、倾角45°的近地轨道,发射任务取得圆满成功。</p><p>本次发射的天启星座04组卫星由北京国电高科科技有限公司抓总、山东华宇航天空间技术有限公司(航科513所)承担研制,用户为北京国电高科科技有限公司。4颗卫星入轨后将利用自身的推进系统完成四星在轨布相,使相邻两星的相位间隔90°,并将轨道高度抬升至900km,通过搭载DCS载荷,可提供全球范围内的数据采集服务。设计由38颗卫星组网的天启星座,建成后将有效解决地面网络覆盖不到的海上、空中及偏远区域物联网络接入问题。</p><p>“谷神星一号海射型运载火箭为北京星河动力装备科技有限公司基于谷神星一号运载火箭为满足海上发射适应性改进的产品。”星河动力相关负责人介绍,此次飞行试验为火箭方案验证性试验任务,用于进一步验证谷神星一号海射型运载火箭设计方案的正确性,火箭系统发射场工作流程的合理性,火箭系统、卫星系统、发射与保障系统、测控系统、发射船系统等各大系统的协调性,同时获取飞行试验数据,进一步验证火箭性能,积累飞行试验可靠性数据。</p><p>谷神星一号海遥四运载火箭的成功发射也意味着海阳成功保障了14次海上发射任务,累计将79颗卫星送入太空,海上发射流程逐步标准化,彰显了海阳常态化海上发射能力加速形成,海上发射地位全面确立。</p><p>东方航天港海上发射列入中国航天白皮书,海阳航空航天产业集群被确定为省级战略性新兴产业集群……2019年以来,抢抓我国第一次海上发射机遇,海阳深耕产业链条,聚力精准招商,不断释放产业倍增效应。先后引入中国长征火箭、星河动力、九天行歌等标杆企业,招引集聚航空航天产业项目23个,总投资约273亿元,构建起集海上发射、星箭产研、卫星应用、配套集成、航天文旅为一体的全产业链生态圈。</p><p>如今在海阳,围绕火箭发射及配套服务等展开的航天产业相关项目正不断传来好消息——今年上半年,东方空间、星河动力、长征火箭二期、卫星数据产业园、智算中心一期、来福士北区扩建等13个重点项目相继建成投产,山东陆海装备海阳基地、箭元科技、中核科技馆等重点项目陆续开工建设。一个个项目相连,串起一条动能澎湃的产业链,海阳“向天图强”前景光明。</p><p>星辰大海步履不息。未来,海阳将围绕聚焦聚力发射能力提升、火箭产业链条完善、卫星产业培育、航天文旅发展、低空经济领域开拓和园区配套提升等六大板块,加快打造“链条完备、配套齐全、共建共享、融合耦合”的商业航天海上发射“策源地”。争取2030年前实现每年百次发射、量产百发火箭、天上百颗卫星、营收超过百亿,将海阳市商业航天产业打造成山东省新质生产力新标杆。</p><p>闪电新闻记者 马帅 通讯员 姜祥 徐盛世 报道</p>",
"cover": "https://p3-sign.toutiaoimg.com/tos-cn-i-tjoges91tu/0db73b409bfcbc388e7995c7460eff7a~tplv-tt-shrink-asy2-web:640:0:5aS05p2hQOmXqueUteaWsOmXuw==:gif.jpeg?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1771785537&x-signature=PBt%2FjTLll7FqfI556mMNEHRY1w8%3D",
"imageList": [
"https://p3-sign.toutiaoimg.com/tos-cn-i-axegupay5k/3f3872e7c89e4f2baccea3632c7077bf~tplv-tt-origin-web:gif.jpeg?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1771785537&x-signature=9GJlXcze83Z4U6upcZ%2BxTfOOsrQ%3D"
],
"source": "闪电新闻",
"authorData": {
"name": "闪电新闻",
"userId": "51050126444",
"secUserId": "MS4wLjABAAAA3KU9ifwN12G41E6PJRpZgIfzVrXWQ4WPF66ns54c0uw",
"userVerified": true,
"avatarUrl": "https://sf3-cdn-tos.toutiaostatic.com/img/user-avatar/14194628f142c924cb14b638f6b04d91~300x300.image",
"description": "闪电新闻,先声夺人。山东广播电视台倾力打造的融媒体客户端。",
"userAuthType": "5",
"userAuthInfo": "闪电新闻官方账号"
},
"statistics": {
"commentCount": 1,
"buryCount": 0,
"diggCount": 2,
"shareCount": 9,
"readCount": 95,
"showCount": 3432,
"repinCount": 1,
"forwardCount": 0,
"danmuCount": 0
},
"localCityInfo": {
"code": "110000",
"name": "北京"
},
"type": "article"
}
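The content field is HTML: an img tag followed by p paragraphs. When you only need the article text, Python's standard html.parser module handles this structure. Below is a minimal sketch. It assumes content looks like the example above, with body text inside p elements. The class and function names are hypothetical:

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text of each <p> element and the src of each <img> element."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so entities are decoded
        self.paragraphs = []
        self.images = []
        self._buf = None  # text pieces of the <p> currently open, or None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._buf = []
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append(src)

    def handle_endtag(self, tag):
        if tag == "p" and self._buf is not None:
            text = "".join(self._buf).strip()
            if text:  # skip empty paragraphs
                self.paragraphs.append(text)
            self._buf = None

    def handle_data(self, data):
        # Text inside nested tags (e.g. <strong>) still lands in the open paragraph.
        if self._buf is not None:
            self._buf.append(data)


def content_to_text(html):
    """Return (plain text with blank lines between paragraphs, list of image URLs)."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.paragraphs), parser.images
```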

Video Output

{
"url": "https://www.toutiao.com/video/7464576063700468260/",
"itemId": "7464576063700468260",
"type": "video",
"title": "谁懂@张艺兴 《开天》Dance Break的震撼~#山东春晚",
"cover": "https://p3-sign.toutiaoimg.com/tos-cn-p-0004/b21bf563d6894858a92773f249f6dad3_1737982050~tplv-pk90l89vgd-crop-center-v4:864:486.jpeg?_iz=31127&bid=16&from=ttvideo.&gid=7464576063700468260&lk3s=06827d14&x-expires=1771786871&x-signature=BNGUEbZ%2F%2BWVDJB4PNAVOkP7XYY4%3D",
"abstract": "",
"publishTime": "2025-01-27 20:47",
"firstFrame": "https://p3-sign.toutiaoimg.com/tos-cn-p-0004/d341e5d215e34dbe94744b714e7271a4_1737982051~tplv-pk90l89vgd-crop-center-v4:864:486.jpeg?_iz=31127&bid=16&from=ttvideo.&gid=7464576063700468260&lk3s=06827d14&x-expires=1771786871&x-signature=IAuQrVP8gPAecnqfBzbPyuVsLrQ%3D",
"statistics": {
"playCount": 1077,
"commentCount": 2,
"buryCount": 0,
"diggCount": 29,
"shareCount": 0,
"readCount": 1077,
"showCount": 22814,
"repinCount": 0,
"forwardCount": 1,
"danmuCount": 0
},
"authorData": {
"name": "闪电新闻",
"userId": "51050126444",
"secUserId": "MS4wLjABAAAA3KU9ifwN12G41E6PJRpZgIfzVrXWQ4WPF66ns54c0uw",
"avatarUrl": "https://p3-sign.toutiaoimg.com/user-avatar/14194628f142c924cb14b638f6b04d91~300x300.image?_iz=112761&from=tt_user.video&lk3s=06827d14&x-expires=1771804800&x-signature=9VGBHCFsmJQBudILBsb7lVxcLl0%3D",
"userVerified": false
},
"videoData": {
"duration": 18.552,
"height": 720,
"width": 1280,
"size": 3357052,
"bitrate": 1447629,
"codecType": "h264",
"playUrl": "https://v3-web.toutiaovod.com/71c575df277d279914919aa4dd73e270/69922619/video/tos/cn/tos-cn-v-0004c800/oQ3ATlg0iF6gABDZvnqt9egG7EQGfAODIvAOiB/?a=24&ch=0&cr=0&dr=0&er=0&lr=unwatermarked&net=5&cd=0%7C0%7C0%7C0&br=1413&bt=1413&cs=0&ds=3&ft=IAEvTW79QPRU0~fuu.6.AYWSY7~q7JDtGRN72WAq8_45a&mime_type=video_mp4&qs=13&rc=M3Zvdng5cnJyeDMzNDczM0BpM3Zvdng5cnJyeDMzNDczM0BjaW1oMmRjLV9gLS1kLTBzYSNjaW1oMmRjLV9gLS1kLTBzcw%3D%3D&btag=c0000e00010000&dy_q=1771182071&feature_id=aa7df520beeae8e397df15f38df0454c&l=202602160301115F6682FD548F27397D2D"
},
"localCityInfo": {
"name": "北京",
"code": "110000"
}
}
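Several media links in these examples (cover, firstFrame, imageList, avatarUrl) carry an x-expires query parameter, and its value looks like a Unix timestamp in seconds. This README does not confirm that reading. If it holds, you may want to check a link before downloading the media it points to. Below is a sketch under that assumption. The function names are hypothetical. URLs without the parameter, such as the playUrl above, return None, which means "unknown", not "never expires":

```python
import time
from urllib.parse import parse_qs, urlparse


def link_expiry(url):
    """Return the x-expires query value as an int, or None if the URL has none."""
    values = parse_qs(urlparse(url).query).get("x-expires")
    return int(values[0]) if values else None


def seconds_left(url, now=None):
    """Seconds until x-expires (negative once passed), or None if there is no x-expires.

    Assumes x-expires is a Unix timestamp in seconds.
    """
    expiry = link_expiry(url)
    if expiry is None:
        return None
    now = time.time() if now is None else now
    return expiry - now
```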

🧩 Integrations and Toutiao Scraper

This scraper can be connected with almost any cloud service or web app thanks to integrations on the Apify platform. You can integrate with Make, Zapier, Slack, Airbyte, GitHub, Google Sheets, Google Drive, and more. Or you can use webhooks to carry out an action whenever an event occurs, e.g. get a notification whenever Toutiao Scraper successfully finishes a run.
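With the webhook route, the receiving service usually only needs the finished run's dataset in order to fetch the results. Below is a sketch that assumes Apify's default webhook payload template, which includes eventType and a resource object describing the run, including defaultDatasetId. Check the payload your webhook actually sends. The function name is hypothetical:

```python
import json

API_BASE = "https://api.apify.com/v2"


def dataset_url_from_webhook(body):
    """Return the dataset-items URL for a succeeded run, or None for any other event."""
    payload = json.loads(body)  # accepts str or bytes
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    dataset_id = (payload.get("resource") or {}).get("defaultDatasetId")
    if not dataset_id:
        return None
    return f"{API_BASE}/datasets/{dataset_id}/items?format=json"
```

If the dataset is not public, add your API token when you request the returned URL.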


🔌 Using Toutiao Scraper with the Apify API

The Apify API gives you programmatic access to the Apify platform. It is organized around RESTful HTTP endpoints that let you manage, schedule, and run Apify Actors. You can also use it to access datasets, monitor Actor performance, fetch results, create and update versions, and more.

To access the API using Node.js, use the apify-client NPM package. To access the API using Python, use the apify-client PyPI package.

Check out the Apify API reference docs for full details.
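The apify-client packages are the usual way to call the API. To show what such a call looks like at the HTTP level, here is a dependency-free sketch. It uses Python's standard urllib to call the run-sync-get-dataset-items endpoint, which starts a run and returns its dataset items in one request. Apify accepts actor IDs in the form username~actor-name. Any ID you pass in, such as "YOUR_USERNAME~toutiao-scraper", is a placeholder and not this actor's confirmed ID. The helper names are hypothetical:

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id, token, run_input):
    """Prepare (but do not send) a POST to run-sync-get-dataset-items."""
    path = urllib.parse.quote(actor_id, safe="~")  # e.g. "username~actor-name"
    return urllib.request.Request(
        f"{API_BASE}/acts/{path}/run-sync-get-dataset-items",
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )


def run_and_fetch(actor_id, token, run_input, timeout=300):
    """Send the request and return the run's dataset items as a list of dicts."""
    req = build_run_request(actor_id, token, run_input)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Synchronous endpoints wait only a limited time for the run to finish. For large profile scrapes, starting the run asynchronously and fetching the dataset afterwards is the safer design. See the API reference for the limits.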


💬 Giving feedback

If you have any feature requests or bug reports, please create an issue on the Issues page or contact me directly via email.

If you need a custom version of this Actor, email me at nathan.santos159@hotmail.com.