Smart Article Scraper - Text, Data & Insights

Pricing

$15.00/month + usage

Developed by

Xtech

Maintained by Community

Article Scraper & Content Extractor - Extract clean text, metadata, keywords & summaries from any web article or blog post. Perfect for research, competitive analysis & content marketing.

Article Scraper & News Content Extractor 📰🚀

Extract clean, structured data from news articles and blog posts with this Apify Actor. It returns article text, metadata, keywords, summaries, and media links for content analysis, market research, news aggregation, and SEO monitoring. No coding required!

Features ✨

  • Comprehensive Article Extraction 📰: Get the full article text, cleanly extracted from the webpage
  • Key Metadata 📅: Retrieve publication date, author(s), and source URL
  • SEO & Content Analysis 🔍: Extract keywords, meta descriptions, and automatically generated summaries
  • Multimedia Extraction 🖼️: Get links to the main image, all images, and embedded videos
  • Language Detection 🌐: Automatically identifies the language of the article
  • Flexible Input 🔗: Use a list of URLs to scrape multiple articles
  • Proxy Support ⚙️: Use Apify Proxy or custom proxy URLs for reliable scraping
  • Customizable ⚙️: Set request timeout and user agent
  • Analysis-Ready Data (JSON) 💾: Structured data output, perfect for analysis and integration
  • Error Handling ✅: Robust error handling with informative messages

Why Use This Article Scraper? 🤔

This Actor extracts structured data from online articles in a single run. Whether you're a marketer tracking brand mentions, a researcher collecting data for analysis, or a developer building a news aggregation app, it saves you the time and effort of extracting that data by hand.

Designed for:

  • Speed: Get data quickly and efficiently
  • Accuracy: Reliable data extraction, even from complex websites
  • Ease of Use: No coding required; just provide the URLs
  • Scalability: Handles both small and large scraping tasks

Data Output 📦

The Actor returns a JSON dataset with the following fields for each article:

| Field | Description |
| --- | --- |
| articleURL | The URL of the scraped article |
| sourceURL | The base URL of the website |
| articleLanguage | The language of the article (e.g., "en", "es") |
| articleTitle | The title of the article |
| articleAuthors | A comma-separated list of the article's authors |
| articlePublishDate | The publication date of the article (ISO 8601 format) |
| articleText | The full text content of the article |
| articleTopImage | The URL of the main image of the article |
| articleAllImages | A comma-separated list of URLs for all images found |
| articleVideos | A comma-separated list of URLs for embedded videos |
| articleKeywords | A comma-separated list of keywords extracted |
| articleSummary | A concise summary of the article |
| scrapedAt | The timestamp of when the article was scraped |
| scrapeSuccess | Boolean indicating scraping success |
| articleMetaDescription | The meta description of the article |
| articleMetaKeywords | A comma-separated list of the meta keywords |
| scrapeErrorMessage | An error message if scrapeSuccess is false |
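
The scrapeSuccess and scrapeErrorMessage fields make it possible to separate usable records from failed ones before any analysis. The following is a minimal sketch in Python, assuming the dataset has already been loaded as a list of dictionaries; the helper name `split_by_success`, the sample URLs, and the sample error text are hypothetical.

```python
from typing import Iterable


def split_by_success(records: Iterable[dict]) -> tuple[list[dict], list[dict]]:
    """Separate successfully scraped records from failed ones."""
    succeeded, failed = [], []
    for record in records:
        # Assumption: a record without a scrapeSuccess flag counts as failed.
        if record.get("scrapeSuccess") is True:
            succeeded.append(record)
        else:
            failed.append(record)
    return succeeded, failed


# Hypothetical sample records; the error text is invented for illustration.
sample = [
    {"articleURL": "https://www.example.com/news/article1", "scrapeSuccess": True},
    {
        "articleURL": "https://www.example.com/news/missing",
        "scrapeSuccess": False,
        "scrapeErrorMessage": "hypothetical error text",
    },
]

ok, errors = split_by_success(sample)
for record in errors:
    print(record["articleURL"], "->", record.get("scrapeErrorMessage", "unknown error"))
```

Keeping the failed records rather than discarding them lets you retry those URLs later, for example with a longer timeout or a different proxy.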

Example Output

[
  {
    "articleURL": "https://www.example.com/news/article1",
    "sourceURL": "https://www.example.com",
    "articleLanguage": "en",
    "articleTitle": "Example News Article",
    "articleAuthors": "John Doe, Jane Smith",
    "articlePublishDate": "2024-07-27T10:00:00Z",
    "articleText": "This is the full text of the example news article...",
    "articleTopImage": "https://www.example.com/images/article1.jpg",
    "articleAllImages": "https://www.example.com/images/article1.jpg,https://www.example.com/images/article2.png",
    "articleVideos": "",
    "articleKeywords": "news, example, article",
    "articleSummary": "A brief summary of the example news article.",
    "scrapedAt": "2024-07-27T12:34:56Z",
    "scrapeSuccess": true,
    "articleMetaDescription": "An example article for demonstration.",
    "articleMetaKeywords": "example, article, news, demo"
  }
]
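
Several fields in this output arrive as comma-separated strings, and the dates arrive as ISO 8601 strings, so a small normalization step is often useful before analysis. The sketch below is one possible approach in Python, not part of the Actor; the names `split_csv`, `parse_iso`, and `normalize` are hypothetical, and it assumes that individual values (such as image URLs) do not themselves contain commas.

```python
from datetime import datetime

# Fields documented as comma-separated lists.
LIST_FIELDS = (
    "articleAuthors",
    "articleAllImages",
    "articleVideos",
    "articleKeywords",
    "articleMetaKeywords",
)
# Fields documented as timestamps.
DATE_FIELDS = ("articlePublishDate", "scrapedAt")


def split_csv(value):
    """Split a comma-separated string into trimmed, non-empty items."""
    return [item.strip() for item in (value or "").split(",") if item.strip()]


def parse_iso(value):
    """Parse an ISO 8601 timestamp; return None when the value is empty."""
    if not value:
        return None
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def normalize(record):
    """Return a copy of the record with list and date fields converted."""
    result = dict(record)
    for field in LIST_FIELDS:
        result[field] = split_csv(record.get(field))
    for field in DATE_FIELDS:
        result[field] = parse_iso(record.get(field))
    return result


example = {
    "articleAuthors": "John Doe, Jane Smith",
    "articlePublishDate": "2024-07-27T10:00:00Z",
    "articleAllImages": "https://www.example.com/images/article1.jpg,https://www.example.com/images/article2.png",
    "articleVideos": "",
    "articleKeywords": "news, example, article",
    "scrapedAt": "2024-07-27T12:34:56Z",
    "articleMetaKeywords": "example, article, news, demo",
}

clean = normalize(example)
print(clean["articleAuthors"], clean["articlePublishDate"].isoformat())
```

If a URL in your data does contain a comma, the simple split above will cut it in two, so treat the list fields with care for sites that use such URLs.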

Use Cases 💡

Content Marketing & SEO 📢

  • Competitor Analysis: Track what your competitors are writing about
  • Content Audits: Analyze your own website's content
  • Keyword Research: Identify trending topics and keywords
  • Backlink Monitoring: Check whether the articles you scrape mention or cite your content
  • Brand Monitoring: Scan scraped articles for mentions of your brand
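
Keyword research and brand monitoring can both be approached with a few lines of post-processing on the dataset. The sketch below, in Python, counts how many articles carry each keyword and lists the articles whose text mentions a given term. The function names and the sample records (including the brand "Acme Corp") are hypothetical, and the matching is a plain case-insensitive substring search rather than anything the Actor provides.

```python
from collections import Counter


def keyword_counts(records):
    """Count, per keyword, how many articles list it in articleKeywords."""
    counts = Counter()
    for record in records:
        raw = record.get("articleKeywords") or ""
        # A set ensures each keyword is counted once per article.
        keywords = {k.strip().lower() for k in raw.split(",") if k.strip()}
        counts.update(keywords)
    return counts


def mentions(records, term):
    """Return the URLs of articles whose text contains the term."""
    needle = term.lower()
    return [
        record["articleURL"]
        for record in records
        if needle in (record.get("articleText") or "").lower()
    ]


# Hypothetical sample data for illustration only.
records = [
    {
        "articleURL": "https://www.example.com/news/article1",
        "articleKeywords": "news, example, article",
        "articleText": "Acme Corp announced a new product today.",
    },
    {
        "articleURL": "https://www.example.com/news/article2",
        "articleKeywords": "News, demo",
        "articleText": "Nothing relevant here.",
    },
]

print(keyword_counts(records).most_common(3))
print(mentions(records, "acme corp"))
```

Substring matching will also hit partial words, so a short brand name may need a word-boundary check before the results are trusted.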

Market Research & Business Intelligence 📊

  • News Aggregation: Build your own news feed
  • Trend Analysis: Identify emerging trends and topics
  • Sentiment Analysis: Analyze the tone and sentiment of articles
  • Information Gathering: Collect data about specific niches
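
As an illustration of the news aggregation case, the sketch below groups successfully scraped articles by sourceURL and orders each group newest first. It is a minimal Python example with hypothetical names and sample data. It assumes every articlePublishDate uses the same UTC ISO 8601 format as the example output, since only then do the strings sort chronologically.

```python
from collections import defaultdict


def build_feed(records):
    """Group successful records by source, newest article first."""
    feed = defaultdict(list)
    for record in records:
        if record.get("scrapeSuccess"):
            feed[record["sourceURL"]].append(record)
    for articles in feed.values():
        # Assumption: uniform UTC ISO 8601 strings, which sort chronologically.
        # Records without a date sort last.
        articles.sort(key=lambda r: r.get("articlePublishDate") or "", reverse=True)
    return dict(feed)


# Hypothetical sample data for illustration only.
records = [
    {
        "sourceURL": "https://www.example.com",
        "articleTitle": "Older story",
        "articlePublishDate": "2024-07-26T08:00:00Z",
        "scrapeSuccess": True,
    },
    {
        "sourceURL": "https://www.example.com",
        "articleTitle": "Newer story",
        "articlePublishDate": "2024-07-27T10:00:00Z",
        "scrapeSuccess": True,
    },
    {
        "sourceURL": "https://www.example.org",
        "articleTitle": "Failed scrape",
        "scrapeSuccess": False,
    },
]

feed = build_feed(records)
for source, articles in feed.items():
    print(source, [article["articleTitle"] for article in articles])
```

For sources that publish dates with mixed time zones, parse the timestamps into datetime objects before sorting instead of comparing strings.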

Academic Research 🎓

  • Data Collection: Gather data for research papers
  • Text Analysis: Analyze large volumes of text data

Other Applications 🌐

  • Machine Learning: Train ML models with scraped article data
  • Content Curation: Find and share relevant articles with your audience

Getting Started 🚀

  1. Find the "Article Scraper & News Content Extractor" in the Apify Store

  2. Configure the input:

    • startUrls: An array of URLs to scrape
    • language: (Optional) The expected language of the articles (default: "en")
    • requestTimeout: (Optional) The timeout for each request (default: 7 seconds)
    • fetchImages: (Optional) Whether to fetch images (default: true)
    • proxyConfiguration: Select a proxy configuration
    • browserUserAgent: (Optional) Custom User-Agent
  3. Run the Actor

  4. Access results in JSON, CSV, Excel, or other formats

  5. Optional: Schedule automatic runs, set up webhooks, or integrate with other Apify Actors
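
The input options from step 2 can be assembled into a single JSON object. The sketch below builds one in Python using the option names listed above. The helper `build_run_input` is hypothetical, and two shapes are assumptions to verify against the Actor's input schema: start URLs written as `{"url": ...}` objects and the proxy setting written as `{"useApifyProxy": true}`, both of which follow common Apify conventions.

```python
import json


def build_run_input(urls, language="en", request_timeout=7,
                    fetch_images=True, user_agent=None):
    """Assemble an input object from the documented option names."""
    urls = [u.strip() for u in urls if u and u.strip()]
    if not urls:
        raise ValueError("startUrls needs at least one URL")
    run_input = {
        # Assumed shape: many Apify Actors take start URLs as {"url": ...} objects.
        "startUrls": [{"url": u} for u in urls],
        "language": language,
        "requestTimeout": request_timeout,  # seconds; the documented default is 7
        "fetchImages": fetch_images,
        # Assumed shape: the usual Apify proxy input object.
        "proxyConfiguration": {"useApifyProxy": True},
    }
    if user_agent:
        run_input["browserUserAgent"] = user_agent
    return run_input


run_input = build_run_input(["https://www.example.com/news/article1"])
print(json.dumps(run_input, indent=2))
```

The printed JSON can be pasted into the Actor's input editor in Apify Console or passed as the run input through an Apify API client. If the Actor expects plain URL strings instead of objects, pass the list of strings directly.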

Key Benefits 🏆

Data Quality

  • ✅ Reliable & Accurate: Uses the robust newspaper3k library
  • ✅ Clean Data: Extracts only the relevant information
  • ✅ Structured Format: Easy to use and integrate

Platform Advantages

  • ✅ Scalable & Serverless: Handles large scraping tasks without infrastructure management
  • ✅ Cost-Effective: A flat $15.00/month fee plus usage-based costs
  • ✅ Full Apify Integration: Seamlessly connects with other Apify tools
  • ✅ User-Friendly: No coding required
  • ✅ Automated Updates: The Actor is maintained and updated regularly

Start extracting valuable data from articles today! ➡️