
Адилет Айылчиев

web-architect

MD & Python Developer. I bring 30+ years of surgical precision to code and data extraction, building reliable, high-quality tools for the AI era.

ACTOR STATS

1 public Actor

2 total users

1 monthly user

>99% runs succeeded

🤖 Universal AI Article Scraper

Turn any news article or blog post into clean, structured data for AI.

Building a Custom GPT, RAG pipeline, or market research bot? You need clean text, not messy HTML with ads and navigation menus. This Actor extracts the core content from URLs, strips away the noise, and returns JSON ready for LLM training.

✨ Key Features

  • AI-Ready Output: Get pure text, automatically cleaned of ads, pop-ups, and sidebars.
  • Smart NLP: Automatically generates a Summary and extracts Keywords for every article.
  • Cost-Effective: Uses standard HTTP requests, so runs are lightweight and fast.
  • Bulk Processing: Scrape hundreds of articles in one run.

🎯 Perfect For

  • RAG Pipelines: Feed your vector database with high-quality, noise-free context.
  • Custom GPTs: Create knowledge bases from industry news.
  • Competitor Monitoring: Track what competitors are publishing without visiting their sites.
  • Content Curation: Auto-summarize news for Telegram/Discord bots.

🚀 How to Use

  1. Click Run.
  2. Enter the URLs you want to scrape (e.g., from TechCrunch, BBC, Medium, specialized blogs).
  3. (Optional) Set the maximum number of items or the crawl depth if you are crawling a listing page.
  4. Download the results in JSON (for developers) or Excel (for analysis).
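If the downloaded JSON is headed for a RAG pipeline, the article text usually needs to be split into chunks before embedding. The sketch below shows one way to do that with overlapping character windows. The output schema is not documented here, so the `url` and `text` field names, the `items_to_chunks` helper, and the `results.json` file name are all assumptions for illustration; adjust them to match the actual dataset.

```python
import json


def chunk_text(text, max_chars=500, overlap=50):
    """Split text into character windows that overlap by `overlap` chars."""
    if max_chars <= overlap:
        raise ValueError("max_chars must be larger than overlap")
    text = text.strip()
    if not text:
        return []
    chunks = []
    start = 0
    step = max_chars - overlap
    while True:
        chunks.append(text[start:start + max_chars])
        # Stop once the current window reaches the end of the text.
        if start + max_chars >= len(text):
            break
        start += step
    return chunks


def items_to_chunks(items, text_field="text", max_chars=500, overlap=50):
    """Turn scraped items into chunk records ready for embedding.

    `text_field` and the "url" key are hypothetical field names.
    """
    records = []
    for item in items:
        text = item.get(text_field) or ""
        for index, chunk in enumerate(chunk_text(text, max_chars, overlap)):
            records.append(
                {"url": item.get("url"), "chunk_index": index, "text": chunk}
            )
    return records


if __name__ == "__main__":
    # "results.json" is a placeholder for the file downloaded in step 4.
    with open("results.json", encoding="utf-8") as handle:
        scraped = json.load(handle)
    for record in items_to_chunks(scraped):
        print(record["url"], record["chunk_index"], len(record["text"]))
```

Character windows are the simplest option; splitting on sentences or tokens may suit a given embedding model better.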

📦 Input Example

{
  "start_urls": [
    { "url": "https://techcrunch.com/artificial-intelligence/" },
    { "url": "https://www.bbc.com/news/technology" }
  ]
}
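When the URL list comes from a script rather than being typed by hand, the same payload can be generated programmatically. The sketch below builds the `start_urls` structure shown in the example above; the `build_actor_input` helper and its filtering rules (dropping blanks, duplicates, and non-HTTP URLs) are assumptions for illustration, not part of the Actor.

```python
import json
from urllib.parse import urlparse


def build_actor_input(urls):
    """Build an input payload in the shape of the example above."""
    start_urls = []
    seen = set()
    for raw in urls:
        url = raw.strip()
        parsed = urlparse(url)
        # Keep only absolute http(s) URLs.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            continue
        # Skip URLs that were already added.
        if url in seen:
            continue
        seen.add(url)
        start_urls.append({"url": url})
    return {"start_urls": start_urls}


if __name__ == "__main__":
    payload = build_actor_input(
        [
            "https://techcrunch.com/artificial-intelligence/",
            "https://www.bbc.com/news/technology",
        ]
    )
    print(json.dumps(payload, indent=2))
```

The printed JSON can be pasted into the Actor's input editor.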
