Pricing

Pay per event

BAAI / Zhiyuan AI Research Papers Scraper

Scrapes curated AI research papers from BAAI (Beijing Academy of Artificial Intelligence, hub.baai.ac.cn). Extracts paper titles, authors, abstracts, arXiv IDs, venues, Chinese-language curator notes, and links.

Developer

BowTiedRaccoon

Last modified: 2 months ago

What You Get

Each record includes:

Field	Description
`paper_title_en`	Paper title in English
`arxiv_id`	arXiv paper ID (e.g. `2606.06624`)
`authors`	List of author names
`publication_date`	Release date (ISO 8601)
`abstract_zh`	Full Chinese-language abstract
`keywords_zh`	Chinese subject tags (e.g. 机器学习, 生成模型)
`keywords_en`	arXiv category codes (e.g. cs.LG, cs.RL)
`pdf_url`	Direct PDF download link (BAAI-hosted mirror)
`baai_curator_note`	Structured editorial notes: [简介] abstract, [问题] problem addressed, [思路] key approach, [亮点] highlights, [相关] related work
`baai_url`	Canonical BAAI paper page URL
`cited_by_count`	BAAI hotness score (despite the field name, this is not a citation count)
`source`	Always `hub.baai.ac.cn`

Why BAAI?

BAAI (智源研究院) is a leading government-backed AI research institute in China, behind the WuDao foundation model series, the BGE embedding family, and the Aquila LLM. Its curated daily paper feed covers ~10–30 papers per day with Chinese-language editorial summaries that are not available on arXiv. Those summaries are the main thing this actor adds over a plain arXiv scraper.

Use cases:

Track Chinese AI research output for competitive intelligence
Join the output with an arXiv scraper's dataset on the shared `arxiv_id` key
Monitor BAAI's curated AI research highlights in Chinese for sino-watchers
Feed into downstream LLM pipelines with Chinese-language summaries
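
The join on `arxiv_id` mentioned above could look roughly like the sketch below. It is only an illustration: the function names, the `arxiv_` prefix for joined fields, and the assumption that the other scraper may append a version suffix such as `v2` to its IDs are all hypothetical, not part of this actor.

```python
import re


def normalize_arxiv_id(value):
    """Strip whitespace and a trailing version suffix such as 'v2' (assumed format)."""
    return re.sub(r"v\d+$", "", (value or "").strip())


def join_on_arxiv_id(baai_records, arxiv_records):
    """Inner-join two lists of dicts on arxiv_id.

    Fields from the arXiv side are prefixed with 'arxiv_' so they cannot
    overwrite BAAI fields that happen to share a name.
    """
    arxiv_by_id = {}
    for rec in arxiv_records:
        key = normalize_arxiv_id(rec.get("arxiv_id"))
        if key:
            arxiv_by_id[key] = rec

    joined = []
    for rec in baai_records:
        match = arxiv_by_id.get(normalize_arxiv_id(rec.get("arxiv_id")))
        if match is None:
            continue  # no counterpart on the arXiv side
        merged = dict(rec)
        for key, value in match.items():
            if key != "arxiv_id":
                merged[f"arxiv_{key}"] = value
        joined.append(merged)
    return joined
```

An inner join drops BAAI records with no arXiv counterpart; switch the `continue` to an append of `dict(rec)` if a left join suits your pipeline better.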

Input

Parameter	Required	Default	Description
`maxItems`	Yes	5	Maximum number of papers to return. The current feed exposes ~9 papers per run, so larger values do not return more records.
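
As a small illustration of the input above, the helper below builds the run input as a plain dictionary. The helper name and the validation rule (a positive integer) are assumptions made for this sketch; the actor's own input schema is not shown on this page.

```python
import json


def build_run_input(max_items: int = 5) -> dict:
    """Return the actor input as a dict; 5 mirrors the documented default."""
    # Assumed rule: maxItems must be a positive integer (bool is rejected explicitly
    # because it is a subclass of int in Python).
    if isinstance(max_items, bool) or not isinstance(max_items, int) or max_items < 1:
        raise ValueError("maxItems must be a positive integer")
    return {"maxItems": max_items}


if __name__ == "__main__":
    print(json.dumps(build_run_input(9)))
```

The resulting dictionary can be serialized to JSON and supplied as the run input in whichever way you start the actor.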

How It Works

Fetches hub.baai.ac.cn/papers — a Nuxt SSR page that embeds the current hotness feed in window.__NUXT__ state (no JavaScript execution required)
Extracts up to 9 paper UUIDs from the SSR data
Fetches each paper's detail page (hub.baai.ac.cn/paper/<uuid>) — also fully SSR-rendered
Merges listing data (basic fields) with detail data (curator notes, extended keywords)
Emits one record per paper
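
The steps above can be sketched as follows. This is not the actor's actual code: it assumes paper UUIDs appear verbatim (lowercase) in the server-rendered HTML and simply scans for them with a regular expression, which may also pick up UUIDs that do not belong to papers. All function names are hypothetical.

```python
import re

# Lowercase UUID pattern, matching the form seen in the sample baai_url.
UUID_RE = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")


def extract_paper_uuids(html: str, limit: int = 9) -> list:
    """Return up to `limit` distinct UUIDs in order of first appearance."""
    seen = {}  # dict preserves insertion order, so it doubles as an ordered set
    for match in UUID_RE.finditer(html):
        seen.setdefault(match.group(0), None)
        if len(seen) >= limit:
            break
    return list(seen)


def detail_url(uuid: str) -> str:
    """Build the detail page URL for one paper."""
    return f"https://hub.baai.ac.cn/paper/{uuid}"


def merge_listing_and_detail(listing: dict, detail: dict) -> dict:
    """Overlay non-empty detail fields on top of the listing fields."""
    merged = dict(listing)
    for key, value in detail.items():
        if value not in (None, "", []):
            merged[key] = value
    return merged
```

Skipping empty detail values means a detail page that fails to provide a field does not erase what the listing already supplied.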

Note on scope: The BAAI listing page renders the current editorial feed (~9 papers) via server-side rendering. Further pagination is client-side only (infinite scroll). Each run captures the current curated snapshot — run daily to build a historical archive.
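
One way to turn daily snapshots into a historical archive is to key records by `baai_url` and only add papers not seen before. The sketch below assumes an in-memory dictionary as the archive; the `first_seen` field and the function name are inventions of this example, not fields the actor emits.

```python
def merge_snapshot(archive: dict, snapshot: list, run_date: str) -> int:
    """Fold one run's records into the archive; return how many were new.

    archive  -- dict mapping baai_url to the stored record (mutated in place)
    snapshot -- list of records from a single run
    run_date -- ISO date string of the run, stored as 'first_seen' for new papers
    """
    added = 0
    for record in snapshot:
        key = record["baai_url"]
        if key not in archive:
            archive[key] = {**record, "first_seen": run_date}
            added += 1
        else:
            # Refresh fields that may change between runs (for example the
            # hotness score) while keeping the original first_seen date.
            archive[key].update(record)
    return added
```

Persisting the dictionary between runs (for example as a JSON file or a key-value store entry) is left to the surrounding pipeline.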

Sample Output

{
  "paper_title_en": "Rethinking the Trust Region in LLM Reinforcement Learning",
  "arxiv_id": "2602.04879",
  "authors": ["Penghui Qi", "Xiangxin Zhou", "Zichen Liu"],
  "publication_date": "2026-02-04",
  "abstract_zh": "强化学习（RL）已成为大语言模型（LLM）微调的基石...",
  "keywords_zh": ["机器学习", "强化学习", "大语言模型"],
  "keywords_en": ["cs.LG", "cs.CL", "cs.AI"],
  "pdf_url": "https://simg.baai.ac.cn/paperfile/572bbeac-4516-4c34-8bc2-15ee9ef5bbb7.pdf",
  "baai_curator_note": "[简介] 强化学习（RL）已成为大语言模型...\n\n[问题] 如何设计更合理的信任域约束...\n\n[思路] 提出散度近端策略优化（DPPO）...",
  "baai_url": "https://hub.baai.ac.cn/paper/572bbeac-4516-4c34-8bc2-15ee9ef5bbb7",
  "cited_by_count": 120,
  "source": "hub.baai.ac.cn"
}
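
Because `baai_curator_note` packs several labelled sections into one string, downstream code may want to split it. The sketch below assumes the five bracketed labels listed in the field table and the layout shown in the sample; the English key names are chosen for this example only.

```python
import re

# Section labels from the field table, mapped to hypothetical English keys.
LABELS = {
    "简介": "abstract",
    "问题": "problem",
    "思路": "approach",
    "亮点": "highlights",
    "相关": "related_work",
}

# Matches only the known labels, so other bracketed text such as "[1]" is left alone.
SECTION_RE = re.compile(r"\[(" + "|".join(map(re.escape, LABELS)) + r")\]\s*")


def parse_curator_note(note: str) -> dict:
    """Split a curator note into {english_key: section_text}."""
    # re.split with one capture group yields [preamble, label, body, label, body, ...]
    parts = SECTION_RE.split(note)
    sections = {}
    for label, body in zip(parts[1::2], parts[2::2]):
        sections[LABELS[label]] = body.strip()
    return sections
```

Sections missing from a note are simply absent from the result, so callers should use `.get()` rather than indexing.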

Notes

China-hosted: The site is hosted in China. Cross-border latency is factored into timeouts (45 seconds per request). Runs from US/EU Apify datacenters may experience occasional delays.
No authentication required: The papers feed is publicly accessible without login.
Daily curation: BAAI curates ~10–30 papers per day. Running this actor daily gives you a rolling archive of their editorial picks.
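
If you fetch BAAI pages yourself, the 45-second timeout noted above can be paired with a simple retry loop. The sketch below uses only the Python standard library; the retry count and exponential backoff are assumptions of this example (this page documents the timeout, not a retry policy), and the function names are hypothetical.

```python
import time
import urllib.request

REQUEST_TIMEOUT_S = 45  # per-request timeout stated in the notes above


def http_get(url: str, timeout: float = REQUEST_TIMEOUT_S) -> str:
    """Fetch a URL and return the body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch_with_retry(url, fetch=http_get, retries=3, backoff_s=2.0, sleep=time.sleep):
    """Call fetch(url), retrying on network errors with exponential backoff.

    `fetch` and `sleep` are injectable so the loop can be exercised without
    network access or real delays.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(retries):
        try:
            return fetch(url)
        except OSError as exc:  # URLError and timeouts are OSError subclasses
            last_error = exc
            if attempt < retries - 1:
                sleep(backoff_s * 2 ** attempt)
    raise last_error
```

With the defaults, a request that keeps timing out is attempted three times with waits of 2 and 4 seconds in between before the last error is raised.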
