AI Research Radar — compliant feed of new AI papers and news

Pricing

from $0.50 / 1,000 results

AI research feed of new ML papers and AI news from HuggingFace, Anthropic, Google, The Decoder — structured JSON, robots-compliant.

Developer

Connor Teskey

Maintained by Community

Actor stats

Bookmarked: 0
Total users: 2
Monthly active users: 1
Last modified: 20 days ago

AI Research Radar

New AI papers, lab announcements, and AI news from five permitted sources, delivered as one structured, schedule-ready feed.

Built for AI newsletter writers, research agents, and trend dashboards. Instead of hand-maintaining a scraper per site, you run one actor and get the latest items from HuggingFace papers and blog, the Anthropic and Google AI newsrooms, and The Decoder as uniform JSON records — ready to rank, summarize, alert on, or pipe into a RAG index.

What you get

Field        Meaning
title        Paper, post, or article headline
url          Canonical link on the source site
category     papers, blog, labs, or news (set per source)
source       Source domain, e.g. huggingface.co
fetched_at   UTC timestamp of the run (ISO 8601)
extraction   Extractor version tag (selector_free_v1)

Quick start

{
  "sources": [
    { "url": "https://huggingface.co/papers", "category": "papers" },
    { "url": "https://huggingface.co/blog", "category": "blog" },
    { "url": "https://www.anthropic.com/news", "category": "labs" }
  ],
  "maxItemsPerSource": 25
}

This returns up to 75 fresh items (25 per source), typically in under a minute. Omit sources entirely to use the full five-source default set, which adds the Google AI blog and The Decoder.

Output example

{
  "category": "papers",
  "title": "Role-Agent: Bootstrapping LLM Agents via Dual-Role Evolution",
  "url": "https://huggingface.co/papers/2606.10917",
  "source": "huggingface.co",
  "fetched_at": "2026-06-10T14:12:08.421337+00:00",
  "extraction": "selector_free_v1"
}
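If you consume records in Python, a small typed wrapper can make downstream code less error-prone. The following is a minimal sketch, assuming every record carries exactly the six documented fields; the names RadarItem and parse_item are hypothetical and not part of the actor.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class RadarItem:
    """One dataset record, mirroring the six documented fields (hypothetical wrapper)."""
    title: str
    url: str
    category: str
    source: str
    fetched_at: datetime
    extraction: str


def parse_item(raw: dict) -> RadarItem:
    """Convert one raw JSON record into a typed RadarItem."""
    return RadarItem(
        title=raw["title"],
        url=raw["url"],
        category=raw["category"],
        source=raw["source"],
        # fetched_at is the run timestamp (ISO 8601 with offset), not the publish date.
        fetched_at=datetime.fromisoformat(raw["fetched_at"]),
        extraction=raw["extraction"],
    )


# The record from the output example above.
sample = json.loads("""
{
  "category": "papers",
  "title": "Role-Agent: Bootstrapping LLM Agents via Dual-Role Evolution",
  "url": "https://huggingface.co/papers/2606.10917",
  "source": "huggingface.co",
  "fetched_at": "2026-06-10T14:12:08.421337+00:00",
  "extraction": "selector_free_v1"
}
""")
item = parse_item(sample)
```

Parsing fetched_at into a timezone-aware datetime keeps later comparisons (for example, "items from the last 24 hours of runs") free of string handling.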

Why this one

  • Selector-free extraction. Titles are pulled by link-text shape and URL structure rather than page-specific CSS selectors, so site redesigns that break conventional scrapers usually do not break this one.
  • Layout drift is flagged, never hidden. A source that suddenly yields zero items is marked zero_yield_check_layout in the HEALTH report instead of quietly shrinking your feed.
  • Papers, labs, and press in one schema. The five default sources cover research papers, official lab announcements, and AI journalism, each record tagged with its category.
  • Bring your own sources. Pass any list of {url, category} pages; the same robots check, retry logic, and extraction apply to every source you add.
  • Fresh by design. Each run is a live snapshot of the source pages — schedule it hourly or daily and the radar stays current.
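To make the selector-free idea concrete, here is a minimal sketch of a link filter in that spirit. It is not the actor's actual code: the function names are hypothetical, the 4-word and 24-character thresholds are taken from the limits stated in this README, and exact-host matching is an assumed reading of "same-domain".

```python
from urllib.parse import urljoin, urlparse

# Thresholds taken from the "Honest limits" section of this README.
MIN_WORDS = 4
MIN_CHARS = 24


def is_headline_shaped(text: str) -> bool:
    """Return True if link text looks like a headline rather than navigation."""
    cleaned = " ".join(text.split())  # collapse runs of whitespace
    return len(cleaned) >= MIN_CHARS and len(cleaned.split()) >= MIN_WORDS


def pick_candidates(page_url: str, links: list) -> list:
    """Keep same-host links with headline-shaped text.

    `links` is a list of (href, link_text) pairs already pulled from the page;
    how they are pulled is outside this sketch.
    """
    page_host = urlparse(page_url).netloc
    seen = set()
    picked = []
    for href, text in links:
        absolute = urljoin(page_url, href)  # resolve relative hrefs
        if urlparse(absolute).netloc != page_host:
            continue  # assumption: "same-domain" means identical host
        if not is_headline_shaped(text):
            continue
        if absolute in seen:
            continue  # drop repeated links within one page
        seen.add(absolute)
        picked.append({"title": " ".join(text.split()), "url": absolute})
    return picked


links = [
    ("/papers/2606.10917", "Role-Agent: Bootstrapping LLM Agents via Dual-Role Evolution"),
    ("https://example.com/story", "An external article with a long enough headline"),
    ("/login", "Sign in"),
]
candidates = pick_candidates("https://huggingface.co/papers", links)
```

Because the filter never names a CSS class or element, a layout change only matters if it alters the link text or URL structure itself.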

Compliance and reliability

Topsail actors are built compliance-first and ship with self-healing plumbing:

  • robots.txt is always respected — fail-closed. If a robots check cannot complete, the source is skipped, never scraped. There is no input to turn this off.
  • Sources are public listing and newsroom pages — HuggingFace papers and blog, Anthropic news, the Google AI blog, and The Decoder — pages these publishers serve openly to every visitor, with no account, paywall, or personal data involved.
  • Transient failures retry once with backoff; persistent failures are reported, not hidden.
  • Every run writes a per-source HEALTH report to the key-value store, so you can see exactly which sources delivered and which were blocked, empty, or erroring.
  • No PII, no paywalled or login-gated content, no circumvention.
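The fail-closed rule above can be sketched with Python's standard urllib.robotparser. This is an illustration under stated assumptions, not the actor's implementation: the function name is hypothetical, and a robots.txt body of None stands in for "the robots check could not complete".

```python
from typing import Optional
from urllib.robotparser import RobotFileParser


def allowed_fail_closed(robots_txt: Optional[str], url: str, user_agent: str = "*") -> bool:
    """Decide whether `url` may be fetched, skipping the source on any doubt.

    `robots_txt` is the robots.txt body, or None when it could not be
    retrieved (assumption: the caller maps timeouts and errors to None).
    """
    if robots_txt is None:
        return False  # fail closed: an incomplete check means skip, never scrape
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


rules = "User-agent: *\nDisallow: /private/\n"
ok_public = allowed_fail_closed(rules, "https://example.com/news")
ok_private = allowed_fail_closed(rules, "https://example.com/private/page")
ok_unknown = allowed_fail_closed(None, "https://example.com/news")
```

The design choice is that the absence of an answer is treated as a "no", which is the opposite of many scrapers that proceed when robots.txt is unreachable.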

Pricing

Pay per result: $0.50 per 1,000 dataset items — one item is one paper, post, or article. Sources that come back robots-blocked, erroring, or empty add nothing to the dataset and cost nothing — you pay only for delivered records. A typical default run of around 100 items costs about $0.05.
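The arithmetic is simple enough to budget in a few lines. A minimal sketch, assuming the listed rate of $0.50 per 1,000 items applies to your plan; the helper names are hypothetical.

```python
PRICE_PER_1000_ITEMS_USD = 0.50  # listed rate; assumption: no plan discount applies


def run_cost_usd(delivered_items: int) -> float:
    """Cost of one run: only items that reach the dataset are billed."""
    return delivered_items * PRICE_PER_1000_ITEMS_USD / 1000


def scheduled_cost_usd(items_per_run: int, runs_per_day: int, days: int = 30) -> float:
    """Projected cost of a schedule, assuming every run delivers items_per_run items."""
    return run_cost_usd(items_per_run * runs_per_day * days)


single_run = run_cost_usd(100)              # about 0.05
hourly_month = scheduled_cost_usd(100, 24)  # 72,000 items, about 36.00
```

Since each run is a full snapshot, an hourly schedule bills for repeated items across runs, so the run frequency drives cost more than the number of sources does.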

Honest limits

  • Titles and canonical links only — no abstracts, authors, publication dates, or article text. fetched_at is the run timestamp, not the publish date.
  • Extraction expects headline-shaped link text (at least 4 words and 24 characters), so very short titles can be missed and an occasional non-article link can slip through.
  • Only same-domain links are collected from each source page.
  • Pages that render their listings entirely with JavaScript yield zero items; the run flags them in HEALTH rather than failing.
  • No cross-run deduplication or diff detection — each run is a full snapshot. Dedupe by url downstream if you ingest continuously.
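For continuous ingestion, deduplicating by url downstream can be as small as the sketch below. The function name is hypothetical, and how you persist the set of seen URLs between runs (file, database, key-value store) is left to your pipeline.

```python
from typing import Iterable, Iterator


def new_items(items: Iterable[dict], seen_urls: set) -> Iterator[dict]:
    """Yield only records whose url has not been seen, updating seen_urls in place."""
    for item in items:
        url = item["url"]
        if url in seen_urls:
            continue  # already ingested in this or an earlier run
        seen_urls.add(url)
        yield item


# Two illustrative snapshots that overlap on one URL.
run_1 = [{"url": "https://huggingface.co/papers/a"}, {"url": "https://huggingface.co/papers/b"}]
run_2 = [{"url": "https://huggingface.co/papers/b"}, {"url": "https://huggingface.co/papers/c"}]

seen = set()
first_batch = list(new_items(run_1, seen))
second_batch = list(new_items(run_2, seen))
```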

FAQ

Can I use this as an ML papers API? Yes. Trigger runs on a schedule through the Apify API and read the dataset as JSON or CSV — a lightweight ML papers API without maintaining your own scraper.
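As an illustration, the sketch below builds a request against the Apify API's synchronous "run actor and get dataset items" endpoint using only the standard library. The actor ID shown is a placeholder, since this page does not state the real one, and the helper names are hypothetical; check the Apify API reference for current limits on synchronous runs.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Build a POST that starts a run and returns its dataset items as JSON.

    actor_id uses the "username~actor-name" form; the value passed in the
    example below is a placeholder, not this actor's real ID.
    """
    url = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items?format=json"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def fetch_items(request: urllib.request.Request) -> list:
    """Send the request and decode the JSON array of records (performs network I/O)."""
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.load(response)


# Placeholder actor ID and token; nothing is sent until fetch_items is called.
request = build_run_request(
    "your-username~your-actor-name",
    "YOUR_APIFY_TOKEN",
    {"maxItemsPerSource": 25},
)
```

Calling fetch_items(request) would return a list of records shaped like the output example. For recurring runs, a scheduler on your side or an Apify schedule can trigger the same call.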

How fresh is the AI research feed? Each run is a live snapshot of the source pages at run time. Schedule the actor hourly or daily to keep an always-current AI news feed.

Can I add my own sources? Yes. sources accepts any list of {url, category} pages. The robots check and selector-free extraction apply to every source you add; blog-style listing pages work best.

Does it return abstracts or full article text? No — titles and canonical links only. Pair it with Topsail's Site to Markdown actor when you need full LLM-ready page content.

What happens when a source site redesigns? Usually nothing: extraction keys on link-text shape and URL structure, not page-specific selectors. If a source still drops to zero items, the run flags it as zero_yield_check_layout in the HEALTH report.

More compliant data feeds from Topsail