Pricing

$15.00/month + usage

Smart Article Scraper - Text, Data & Insights

Developed by

Xtech

Unlock valuable insights from any article! Get clean text, publication data, keywords, summaries, and more. Ideal for research, content marketing, and competitive analysis. Fast, reliable, and easy to use.

Runs succeeded

>99%

Last modified

6 months ago

News

SEO tools

Jobs

Article Scraper & News Content Extractor 📰🚀

Extract clean, structured data from news articles and blog posts with this powerful Apify Actor. Get article text, metadata, keywords, summaries, and more – perfect for content analysis, market research, news aggregation, and SEO monitoring. No coding required!

Features ✨

Comprehensive Article Extraction 📰: Get the full article text, cleanly extracted from the webpage
Key Metadata 📅: Retrieve publication date, author(s), and source URL
SEO & Content Analysis 🔍: Extract keywords, meta descriptions, and automatically generated summaries
Multimedia Extraction 🖼️: Get links to the main image, all images, and embedded videos
Language Detection 🌐: Automatically identifies the language of the article
Flexible Input 🔗: Use a list of URLs to scrape multiple articles
Proxy Support ⚙️: Use Apify Proxy or custom proxy URLs for reliable scraping
Customizable ⚙️: Set request timeout and user agent
Analysis-Ready Data (JSON) 💾: Structured data output, perfect for analysis and integration
Error Handling ✅: Robust error handling with informative messages

Why Use This Article Scraper? 🤔

This Actor is your one-stop solution for extracting valuable data from online articles. Whether you're a marketer tracking brand mentions, a researcher collecting data for analysis, or a developer building a news aggregation app, this tool saves you time and effort.

Designed for:

Speed: Get data quickly and efficiently
Accuracy: Reliable data extraction, even from complex websites
Ease of Use: No coding required – just provide the URLs
Scalability: Handles both small and large scraping tasks

Data Output 📦

The Actor returns a JSON dataset with the following fields for each article:

Field	Description
`articleURL`	The URL of the scraped article
`sourceURL`	The base URL of the website
`articleLanguage`	The language of the article (e.g., "en", "es")
`articleTitle`	The title of the article
`articleAuthors`	A comma-separated list of the article's authors
`articlePublishDate`	The publication date of the article (ISO 8601 format)
`articleText`	The full text content of the article
`articleTopImage`	The URL of the main image of the article
`articleAllImages`	A comma-separated list of URLs for all images found
`articleVideos`	A comma-separated list of URLs for embedded videos
`articleKeywords`	A comma-separated list of keywords extracted
`articleSummary`	A concise summary of the article
`scrapedAt`	The timestamp of when the article was scraped
`scrapeSuccess`	Boolean indicating scraping success
`articleMetaDescription`	The meta description of the article
`articleMetaKeywords`	A comma-separated list of the meta keywords
`scrapeErrorMessage`	An error message if `scrapeSuccess` is `false`
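
Because every record carries `scrapeSuccess` and `scrapeErrorMessage`, failed URLs can be set aside before any analysis. The following is a minimal Python sketch, assuming the dataset has already been downloaded as a list of dictionaries. The `split_results` helper, the sample records, and the error text are hypothetical and only illustrate the field names from the table above.

```python
def split_results(items):
    """Separate successfully scraped records from failed ones."""
    succeeded, failed = [], []
    for item in items:
        # A missing flag is treated as a failure so incomplete
        # records are not analyzed by accident.
        if item.get("scrapeSuccess") is True:
            succeeded.append(item)
        else:
            failed.append(item)
    return succeeded, failed


# Hypothetical sample records; the error message is made up for illustration.
items = [
    {"articleURL": "https://www.example.com/news/article1", "scrapeSuccess": True},
    {
        "articleURL": "https://www.example.com/news/missing",
        "scrapeSuccess": False,
        "scrapeErrorMessage": "Request timed out",
    },
]

succeeded, failed = split_results(items)
for item in failed:
    # Log each failed URL with its reason so it can be retried later.
    print(item["articleURL"], "->", item.get("scrapeErrorMessage", "unknown error"))
```

The failed list can be fed back into a later run as `startUrls`, for example with a longer `requestTimeout`.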

Example Output

[
  {
    "articleURL": "https://www.example.com/news/article1",
    "sourceURL": "https://www.example.com",
    "articleLanguage": "en",
    "articleTitle": "Example News Article",
    "articleAuthors": "John Doe, Jane Smith",
    "articlePublishDate": "2024-07-27T10:00:00Z",
    "articleText": "This is the full text of the example news article...",
    "articleTopImage": "https://www.example.com/images/article1.jpg",
    "articleAllImages": "https://www.example.com/images/article1.jpg,https://www.example.com/images/article2.png",
    "articleVideos": "",
    "articleKeywords": "news, example, article",
    "articleSummary": "A brief summary of the example news article.",
    "scrapedAt": "2024-07-27T12:34:56Z",
    "scrapeSuccess": true,
    "articleMetaDescription": "An example article for demonstration.",
    "articleMetaKeywords": "example, article, news, demo"
  }
]
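
Several fields in the example arrive as comma-separated strings and the dates as ISO 8601 text, so a small normalization step usually helps before analysis. The sketch below is one possible approach in Python, not part of the Actor: `split_csv_field`, `parse_timestamp`, `normalize`, and `LIST_FIELDS` are hypothetical names, and the record is a shortened copy of the example output.

```python
from datetime import datetime

# Fields documented above as comma-separated lists.
LIST_FIELDS = (
    "articleAuthors",
    "articleAllImages",
    "articleVideos",
    "articleKeywords",
    "articleMetaKeywords",
)
DATE_FIELDS = ("articlePublishDate", "scrapedAt")


def split_csv_field(value):
    """Turn a comma-separated string into a list, dropping empty entries."""
    return [part.strip() for part in (value or "").split(",") if part.strip()]


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp; return None when the field is empty."""
    if not value:
        return None
    # Older Python versions reject a trailing "Z", so rewrite it as an offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def normalize(record):
    """Return a copy of the record with list and date fields converted."""
    result = dict(record)
    for field in LIST_FIELDS:
        if field in result:
            result[field] = split_csv_field(result[field])
    for field in DATE_FIELDS:
        if field in result:
            result[field] = parse_timestamp(result[field])
    return result


record = {
    "articleAuthors": "John Doe, Jane Smith",
    "articlePublishDate": "2024-07-27T10:00:00Z",
    "articleAllImages": "https://www.example.com/images/article1.jpg,https://www.example.com/images/article2.png",
    "articleVideos": "",
    "scrapedAt": "2024-07-27T12:34:56Z",
}
article = normalize(record)
print(article["articleAuthors"])
print(article["articlePublishDate"])
```

One caveat with this approach: a URL that itself contains a comma would be split in two, so image and video lists are worth spot-checking.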

Use Cases 💡

Content Marketing & SEO 📢

Competitor Analysis: Track what your competitors are writing about
Content Audits: Analyze your own website's content
Keyword Research: Identify trending topics and keywords
Backlink Monitoring: Check scraped articles for references to your content
Brand Monitoring: Scan scraped article text for mentions of your brand
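
As a rough illustration of the keyword research and monitoring ideas above, the dataset can be post-processed with a few lines of Python. This is a sketch under assumptions: `keyword_counts` and `find_mentions` are hypothetical helpers, the sample records are invented, and the Actor itself does not perform this analysis.

```python
from collections import Counter


def keyword_counts(items):
    """Count in how many articles each extracted keyword appears."""
    counts = Counter()
    for item in items:
        raw = item.get("articleKeywords") or ""
        # A set ensures a keyword is counted once per article.
        keywords = {kw.strip().lower() for kw in raw.split(",") if kw.strip()}
        counts.update(keywords)
    return counts


def find_mentions(items, term):
    """Return the URLs of articles whose text contains the term (case-insensitive)."""
    needle = term.lower()
    return [
        item["articleURL"]
        for item in items
        if needle in (item.get("articleText") or "").lower()
    ]


# Invented sample records using the documented field names.
items = [
    {
        "articleURL": "https://www.example.com/a",
        "articleKeywords": "pricing, launch, saas",
        "articleText": "Acme announced new pricing today.",
    },
    {
        "articleURL": "https://www.example.com/b",
        "articleKeywords": "Pricing, funding",
        "articleText": "A funding round was closed this week.",
    },
]

counts = keyword_counts(items)
print(counts.most_common(3))
print(find_mentions(items, "acme"))
```

Plain substring matching is deliberately simple; it will also match the term inside longer words, which may or may not be acceptable for a given brand name.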

Market Research & Business Intelligence 📊

News Aggregation: Build your own news feed
Trend Analysis: Identify emerging trends and topics
Sentiment Analysis: Analyze the tone and sentiment of articles
Information Gathering: Collect data about specific niches
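
A simple news feed or per-source overview can be built from the dataset by sorting on `articlePublishDate` and grouping on `sourceURL`. The Python sketch below assumes the items were already downloaded as dictionaries. `published_at`, `build_feed`, and `articles_per_source` are hypothetical helper names, and timestamps without a timezone are assumed to be UTC.

```python
from collections import Counter
from datetime import datetime, timezone


def published_at(item):
    """Parse the publish date; return None when it is missing or malformed."""
    value = item.get("articlePublishDate")
    if not value:
        return None
    try:
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return None
    # Assumption: timestamps without a timezone are treated as UTC.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def build_feed(items, limit=20):
    """Return successfully scraped articles, newest first."""
    dated = [(published_at(i), i) for i in items if i.get("scrapeSuccess")]
    dated = [(when, i) for when, i in dated if when is not None]
    dated.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in dated[:limit]]


def articles_per_source(items):
    """Count successfully scraped articles per source website."""
    return Counter(i.get("sourceURL") for i in items if i.get("scrapeSuccess"))


# Invented sample records using the documented field names.
items = [
    {"articleTitle": "Older", "sourceURL": "https://www.example.com",
     "articlePublishDate": "2024-07-26T08:00:00Z", "scrapeSuccess": True},
    {"articleTitle": "Newer", "sourceURL": "https://www.example.com",
     "articlePublishDate": "2024-07-27T10:00:00Z", "scrapeSuccess": True},
    {"articleTitle": "Failed", "sourceURL": "https://www.example.org",
     "articlePublishDate": "", "scrapeSuccess": False},
]

feed = build_feed(items)
for entry in feed:
    print(entry["articlePublishDate"], entry["articleTitle"])
print(articles_per_source(items))
```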

Academic Research 🎓

Data Collection: Gather data for research papers
Text Analysis: Analyze large volumes of text data

Other Applications 🌐

Machine Learning: Train ML models with scraped article data
Content Curation: Find and share relevant articles with your audience
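
For machine learning work, scraped articles are often converted to JSON Lines, with one record per line. The following Python sketch shows one way to do that. The `to_jsonl` helper, the output keys, and the `min_chars` threshold are assumptions chosen for illustration, not something the Actor provides.

```python
import json


def to_jsonl(items, min_chars=200):
    """Serialize usable articles as JSON Lines, skipping failed or very short ones."""
    lines = []
    for item in items:
        text = (item.get("articleText") or "").strip()
        if not item.get("scrapeSuccess") or len(text) < min_chars:
            continue
        row = {
            "url": item.get("articleURL"),
            "language": item.get("articleLanguage"),
            "title": item.get("articleTitle"),
            "text": text,
        }
        # ensure_ascii=False keeps non-English text readable in the file.
        lines.append(json.dumps(row, ensure_ascii=False))
    return "\n".join(lines)


# Invented sample records using the documented field names.
items = [
    {"articleURL": "https://www.example.com/long", "articleLanguage": "en",
     "articleTitle": "Long article", "articleText": "word " * 100, "scrapeSuccess": True},
    {"articleURL": "https://www.example.com/short", "articleLanguage": "en",
     "articleTitle": "Short article", "articleText": "Too short.", "scrapeSuccess": True},
    {"articleURL": "https://www.example.com/failed", "scrapeSuccess": False},
]

print(to_jsonl(items))
```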

Getting Started 🚀

1. Find the "Article Scraper & News Content Extractor" in the Apify Store
2. Configure the input:
   - `startUrls`: An array of URLs to scrape
   - `language`: (Optional) The expected language of the articles (default: "en")
   - `requestTimeout`: (Optional) The timeout for each request, in seconds (default: 7)
   - `fetchImages`: (Optional) Whether to fetch images (default: true)
   - `proxyConfiguration`: Select a proxy configuration
   - `browserUserAgent`: (Optional) Custom User-Agent
3. Run the Actor
4. Access results in JSON, CSV, Excel, or other formats
5. Optional: Schedule automatic runs, set up webhooks, or integrate with other Apify Actors
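
The same steps can also be scripted. The sketch below assembles the input object in Python and, optionally, starts a run through the `apify-client` package. Several details are assumptions rather than facts from this page: the Actor ID is a placeholder to be copied from the Actor's page, `startUrls` is sent as plain strings (some Actors expect `{"url": ...}` objects instead), the proxy setting uses the common `useApifyProxy` shape, and the token is read from a hypothetical `APIFY_TOKEN` environment variable.

```python
import json
import os

# Placeholder: this page does not state the Actor's ID; copy it from the Actor's page.
ACTOR_ID = "<username>/<actor-name>"


def build_input(urls, language="en", request_timeout=7, fetch_images=True, user_agent=None):
    """Assemble the input object from the fields listed above."""
    if not urls:
        raise ValueError("startUrls must contain at least one URL")
    run_input = {
        # Assumption: plain URL strings, as described in the input list.
        "startUrls": list(urls),
        "language": language,
        "requestTimeout": request_timeout,  # seconds
        "fetchImages": fetch_images,
        # Assumption: the common Apify proxy input shape.
        "proxyConfiguration": {"useApifyProxy": True},
    }
    if user_agent:
        run_input["browserUserAgent"] = user_agent
    return run_input


def run_actor(run_input):
    """Start the Actor, wait for it to finish, and return the dataset items."""
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if run is None:
        raise RuntimeError("The Actor run did not return any run details")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


run_input = build_input(["https://www.example.com/news/article1"])
print(json.dumps(run_input, indent=2))
```

Calling `run_actor(run_input)` requires a valid API token and a real Actor ID, and it consumes platform usage, so the sketch only prints the input by default.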

Key Benefits 🏆

Data Quality

✅ Reliable & Accurate: Uses the robust newspaper3k library
✅ Clean Data: Extracts only the relevant information
✅ Structured Format: Easy to use and integrate

Platform Advantages

✅ Scalable & Serverless: Handles large scraping tasks without infrastructure management
✅ Cost-Effective: Pay only for what you use
✅ Full Apify Integration: Seamlessly connects with other Apify tools
✅ User-Friendly: No coding required
✅ Automated Updates: The Actor is maintained and updated regularly

Start extracting valuable data from articles today! ➡️
