Pricing

$15.00/month + usage

Try for free

Go to Apify Store

Articles Extractor

Try for free

The Article Extractor is an enterprise-grade web scraping solution designed specifically for extracting structured data from news articles, blog posts, and online publications. Our advanced HTML parsing engine delivers unmatched accuracy in content extraction across thousands of websites.

Pricing

$15.00/month + usage

Rating

5.0

(2)

Developer

Web Harvester

Actor stats

Bookmarked

732

Total users

Monthly active users

a year ago

Last modified

Articles Extractor API

Below, you can find a list of relevant HTTP API endpoints for calling the Articles Extractor Actor. For this, you’ll need an Apify account. Replace <YOUR_API_TOKEN> in the URLs with your Apify API token, which you can find under Integrations in Apify Console. For details, see the API reference.

Run Actor

POST
https://api.apify.com/v2/acts/web.harvester~articles-extractor/runs?token=<YOUR_API_TOKEN>

Note: By adding the method=POST query parameter, this API endpoint can be called using a GET request and thus used in third-party webhooks. Please refer to our Run Actor API documentation.

Run Actor synchronously and get dataset items

POST
https://api.apify.com/v2/acts/web.harvester~articles-extractor/run-sync-get-dataset-items?token=<YOUR_API_TOKEN>

Note: This endpoint supports both POST and GET request methods. However, only the POST method allows you to pass input data. For more information, please refer to our Run Actor synchronously and get dataset items API documentation.

Get Actor

GET
https://api.apify.com/v2/acts/web.harvester~articles-extractor?token=<YOUR_API_TOKEN>

For more information, please refer to our Get Actor API documentation.

Actors can be used to scrape web pages, extract data, or automate browser tasks. Use the Articles Extractor API programmatically via the Apify API.

You can choose from:

Articles Extractor API in Python

Articles Extractor API in JavaScript

Articles Extractor API through CLI

Articles Extractor OpenAPI definition

You can start Articles Extractor with the Apify API by sending an HTTP POST request to the Run Actorendpoint. An Actor’s input and its content type can be passed as a payload of the POST request, and additional options can be specified using URL query parameters. The Articles Extractor is identified within the API by its ID, which is the creator’s username and the name of the Actor.

When the Articles Extractor run finishes you can list the data from its default dataset(storage) via the API or you can preview the data directly on Apify Console.

Smart Article Extractor

lukaskrivka/article-extractor-smart

📰 Smart Article Extractor extracts articles from any scientific, academic, or news website with just one click. The extractor crawls the whole website and automatically distinguishes articles from other web pages. Download your data as HTML table, JSON, Excel, RSS feed, and more.

Lukáš Křivka

7.4K

4.1

Article Content Extractor 📄

easyapi/article-content-extractor

Extract clean article content, metadata and structured information from any web page. Supports multiple URLs and returns well-formatted JSON with title, description, content, author, publish date and more. 🔍📄

EasyApi

106

Smart Article Extractor

datapilot/smart-article-extractor

News Article Extractor Actor fetches article URLs and extracts structured content using Requests, , and Newspaper3k. It collects title, author, publish date, text, summary, keywords, images, and word count. Supports proxy use and outputs clean JSON results.

Data Pilot

News Website Crawler & Article Extractor

xtech/news-source-crawler

Scrape all articles from any news website. Extract full text, metadata, keywords, and summaries. Ideal for content analysis, research, and news aggregation.

Xtech

371

4.8

News Article Scraper for Feeding LLM

proscraper/newsarticlescraper

Scrape news articles metadata to feed into LLM models. Returns article body, published date, article title, author etc.

Owais Nazir

169

Article Extractor & News Scraper

web.harvester/article-extractor-news-scraper

Extract articles from any news site, blog, or webpage. Get title, full text, author, date, images & metadata using 7 extraction engines (Newspaper4k, Trafilatura, Goose3). Anti-bot bypass, proxy rotation, automatic fallback. Perfect for news monitoring, NLP datasets & content aggregation.

Web Harvester

5.0

Surfline Forecast

fortuitous_pirate/surfline-forecast

Fortuitous Pirate

Universal News Article Intelligence Agent

workhard3000/news-intelligence-rag-extractor

High-fidelity news normalization for AI & Agentic RAG. Extract clean Markdown, full-text, and metadata from premium domains (Bloomberg, Wall Street Journal, Financial Times, New York Times, Washington Post, etc.). Success-only billing, only pay when full-text is verified.

WorkHard3000

5.0

New York Times Scraper

theo/new-york-times-scraper

Scrape news data from nytimes.com with this unofficial API. Extract articles, monitor their popularity and performance and automate the fight against fake news. Filter the results by authors, topics, categories, or publication dates. Preview or download the results in your preferred format.

Theo Vasilis

224

Universal Article Scraper

universal_scraping/universal-article-scraper

Universal article scraper for news websites, blogs, etc. It can scrape articles from multiple websites simultaneously, including metadata such as title, content, publication date, image, and author.