Smart Article Scraper - Text, Data & Insights

1 day trial then $15.00/month - No credit card required now

xtech/article-extractor
Unlock valuable insights from any article! Get clean text, publication data, keywords, summaries, and more. Ideal for research, content marketing, and competitive analysis. Fast, reliable, and easy to use.

Article Scraper & News Content Extractor 📰🚀

Extract clean, structured data from news articles and blog posts with this powerful Apify Actor. Get article text, metadata, keywords, summaries, and more – perfect for content analysis, market research, news aggregation, and SEO monitoring. No coding required!

Features ✨

  • Comprehensive Article Extraction 📰: Get the full article text, cleanly extracted from the webpage
  • Key Metadata 📅: Retrieve publication date, author(s), and source URL
  • SEO & Content Analysis 🔍: Extract keywords, meta descriptions, and automatically generated summaries
  • Multimedia Extraction 🖼️: Get links to the main image, all images, and embedded videos
  • Language Detection 🌐: Automatically identifies the language of the article
  • Flexible Input 🔗: Use a list of URLs to scrape multiple articles
  • Proxy Support ⚙️: Use Apify Proxy or custom proxy URLs for reliable scraping
  • Customizable ⚙️: Set request timeout and user agent
  • Analysis-Ready Data (JSON) 💾: Structured data output, perfect for analysis and integration
  • Error Handling ✅: Robust error handling with informative messages

Why Use This Article Scraper? 🤔

This Actor is your one-stop solution for extracting valuable data from online articles. Whether you're a marketer tracking brand mentions, a researcher collecting data for analysis, or a developer building a news aggregation app, this tool saves you time and effort.

Designed for:

  • Speed: Get data quickly and efficiently
  • Accuracy: Reliable data extraction, even from complex websites
  • Ease of Use: No coding required – just provide the URLs
  • Scalability: Handles both small and large scraping tasks

Data Output 📦

The Actor returns a JSON dataset with the following fields for each article:

| Field | Description |
|---|---|
| articleURL | The URL of the scraped article |
| sourceURL | The base URL of the website |
| articleLanguage | The language of the article (e.g., "en", "es") |
| articleTitle | The title of the article |
| articleAuthors | A comma-separated list of the article's authors |
| articlePublishDate | The publication date of the article (ISO 8601 format) |
| articleText | The full text content of the article |
| articleTopImage | The URL of the main image of the article |
| articleAllImages | A comma-separated list of URLs for all images found |
| articleVideos | A comma-separated list of URLs for embedded videos |
| articleKeywords | A comma-separated list of keywords extracted |
| articleSummary | A concise summary of the article |
| scrapedAt | The timestamp of when the article was scraped |
| scrapeSuccess | Boolean indicating scraping success |
| articleMetaDescription | The meta description of the article |
| articleMetaKeywords | A comma-separated list of the meta keywords |
| scrapeErrorMessage | An error message if scrapeSuccess is false |

Example Output

[
  {
    "articleURL": "https://www.example.com/news/article1",
    "sourceURL": "https://www.example.com",
    "articleLanguage": "en",
    "articleTitle": "Example News Article",
    "articleAuthors": "John Doe, Jane Smith",
    "articlePublishDate": "2024-07-27T10:00:00Z",
    "articleText": "This is the full text of the example news article...",
    "articleTopImage": "https://www.example.com/images/article1.jpg",
    "articleAllImages": "https://www.example.com/images/article1.jpg,https://www.example.com/images/article2.png",
    "articleVideos": "",
    "articleKeywords": "news, example, article",
    "articleSummary": "A brief summary of the example news article.",
    "scrapedAt": "2024-07-27T12:34:56Z",
    "scrapeSuccess": true,
    "articleMetaDescription": "An example article for demonstration.",
    "articleMetaKeywords": "example, article, news, demo"
  }
]
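
Several output fields arrive as comma-separated strings and ISO 8601 timestamps, so downstream code usually wants them parsed. The following is a minimal sketch of one way to do that, using only the field names listed above. The helper names (split_csv, parse_iso, normalize, split_results) are hypothetical and not part of the Actor, and the sketch assumes that individual URLs and author names contain no commas.

```python
from datetime import datetime

# Fields documented as comma-separated strings in the output table.
LIST_FIELDS = ("articleAuthors", "articleAllImages", "articleVideos",
               "articleKeywords", "articleMetaKeywords")
# Fields documented as timestamps.
DATE_FIELDS = ("articlePublishDate", "scrapedAt")


def split_csv(value):
    """Split a comma-separated string into a list, dropping empty entries."""
    return [part.strip() for part in (value or "").split(",") if part.strip()]


def parse_iso(value):
    """Parse an ISO 8601 timestamp; return None if it is missing or malformed."""
    if not value:
        return None
    try:
        # Python versions before 3.11 reject a trailing "Z", so rewrite it.
        return datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return None


def normalize(item):
    """Return a copy of one dataset item with list and date fields parsed."""
    record = dict(item)
    for field in LIST_FIELDS:
        record[field] = split_csv(item.get(field))
    for field in DATE_FIELDS:
        record[field] = parse_iso(item.get(field))
    return record


def split_results(items):
    """Separate successful items from failures (URL plus error message)."""
    ok = [normalize(i) for i in items if i.get("scrapeSuccess")]
    failed = [(i.get("articleURL"), i.get("scrapeErrorMessage"))
              for i in items if not i.get("scrapeSuccess")]
    return ok, failed
```

Calling split_results on the parsed JSON array gives a list of normalized records and a list of (URL, error message) pairs for the items where scrapeSuccess is false.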

Use Cases 💡

Content Marketing & SEO 📢

  • Competitor Analysis: Track what your competitors are writing about
  • Content Audits: Analyze your own website's content
  • Keyword Research: Identify trending topics and keywords
  • Backlink Monitoring: Check scraped articles for links to your content
  • Brand Monitoring: Check scraped articles for mentions of your brand

Market Research & Business Intelligence 📊

  • News Aggregation: Build your own news feed
  • Trend Analysis: Identify emerging trends and topics
  • Sentiment Analysis: Analyze the tone and sentiment of articles
  • Information Gathering: Collect data about specific niches
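
As one illustration of trend analysis, the articleKeywords field can be tallied across a batch of scraped articles. This is a rough sketch rather than a feature of the Actor: keyword_counts is a hypothetical helper, and it counts how many successfully scraped articles mention each keyword (case-insensitively), not how often a keyword appears in the text.

```python
from collections import Counter


def keyword_counts(items):
    """Count, per keyword, the number of successful articles that list it."""
    counts = Counter()
    for item in items:
        if not item.get("scrapeSuccess"):
            continue  # failed scrapes carry no usable keywords
        raw = item.get("articleKeywords") or ""
        # A set removes duplicates within one article, so each article
        # contributes at most 1 to any keyword.
        keywords = {k.strip().lower() for k in raw.split(",") if k.strip()}
        counts.update(keywords)
    return counts
```

The result is a standard Counter, so counts.most_common(10) lists the ten keywords shared by the most articles.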

Academic Research 🎓

  • Data Collection: Gather data for research papers
  • Text Analysis: Analyze large volumes of text data

Other Applications 🌐

  • Machine Learning: Train ML models with scraped article data
  • Content Curation: Find and share relevant articles with your audience
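
For the machine learning use case, scraped items typically need to be reduced to a compact training format first. The sketch below converts dataset items to JSON Lines. The to_jsonl helper, the output keys (title, text, language), and the 200-character minimum are all illustrative assumptions, not something the Actor provides.

```python
import json


def to_jsonl(items, min_chars=200):
    """Serialize successful, sufficiently long articles as JSON Lines."""
    lines = []
    for item in items:
        text = (item.get("articleText") or "").strip()
        # Skip failed scrapes and very short texts (the threshold is arbitrary).
        if not item.get("scrapeSuccess") or len(text) < min_chars:
            continue
        record = {
            "title": item.get("articleTitle"),
            "text": text,
            "language": item.get("articleLanguage"),
        }
        # ensure_ascii=False keeps non-English text readable in the file.
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)
```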

Getting Started 🚀

  1. Find the Actor in the Apify Store, where it is listed as "Smart Article Scraper - Text, Data & Insights" (xtech/article-extractor)

  2. Configure the input:

    • startUrls: An array of URLs to scrape
    • language: (Optional) The expected language of the articles (default: "en")
    • requestTimeout: (Optional) The timeout for each request (default: 7 seconds)
    • fetchImages: (Optional) Whether to fetch images (default: true)
    • proxyConfiguration: Select a proxy configuration
    • browserUserAgent: (Optional) Custom User-Agent
  3. Run the Actor

  4. Access results in JSON, CSV, Excel, or other formats

  5. Optional: Schedule automatic runs, set up webhooks, or integrate with other Apify Actors
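
The steps above can also be scripted. The sketch below builds an input object from the documented fields and runs the Actor through the apify-client Python package (installed separately with pip install apify-client). Several details are assumptions to verify against the Actor's input schema: that startUrls accepts plain URL strings rather than objects, that proxyConfiguration uses the common {"useApifyProxy": ...} shape, and that requestTimeout is given in seconds. The function names build_run_input and run_actor are hypothetical.

```python
def build_run_input(urls, language="en", request_timeout=7,
                    fetch_images=True, use_apify_proxy=True, user_agent=None):
    """Assemble the Actor input from the documented fields and defaults."""
    urls = list(urls)
    if not urls:
        raise ValueError("startUrls must contain at least one URL")
    run_input = {
        "startUrls": urls,  # assumption: plain URL strings are accepted
        "language": language,
        "requestTimeout": request_timeout,  # assumption: value is in seconds
        "fetchImages": fetch_images,
        # assumption: the usual Apify proxy configuration shape
        "proxyConfiguration": {"useApifyProxy": use_apify_proxy},
    }
    if user_agent:
        run_input["browserUserAgent"] = user_agent
    return run_input


def run_actor(token, run_input, actor_id="xtech/article-extractor"):
    """Run the Actor, wait for it to finish, and return its dataset items."""
    # Imported here so build_run_input works without the package installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

A typical call would be run_actor(token, build_run_input(["https://www.example.com/news/article1"])), where token is your own Apify API token.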

Key Benefits 🏆

Data Quality

  • ✅ Reliable & Accurate: Uses the robust newspaper3k library
  • ✅ Clean Data: Extracts only the relevant information
  • ✅ Structured Format: Easy to use and integrate

Platform Advantages

  • ✅ Scalable & Serverless: Handles large scraping tasks without infrastructure management
  • ✅ Cost-Effective: Pay only for what you use
  • ✅ Full Apify Integration: Seamlessly connects with other Apify tools
  • ✅ User-Friendly: No coding required
  • ✅ Automated Updates: The Actor is maintained and updated regularly

Start extracting valuable data from articles today! ➡️

Developer
Maintained by Community

Actor Metrics

  • 1 monthly user

  • No stars yet

  • Created in Feb 2025

  • Modified 14 hours ago