Pricing

from $0.00001 / result

Generic Articles Main Content Extractor

Extract the main content of articles. Input can be article links or pages from which article links are identified and extracted. Articles are scraped and cleaned to extract the main text and useful metadata. Search-term and publication-date post-filters can be applied, and highlighted snippets produced.

Rating

0.0

(0)

Developer

LilaK

Actor stats

Last modified

21 days ago

Description

The tool extracts the main content of articles. The input can be direct article URLs or URLs of pages from which article links are extracted. The tool uses dedicated algorithms to identify relevant article links and discard navigation links. Each article is scraped and cleaned (unimportant text such as navigation links and menus is removed) to extract the main text and useful metadata.

Main features

✅ Scrapes article URLs
✅ Scrapes link pages and identifies relevant article links (customizable feature)
✅ For each scraped article, extracts the main text (plain text or Markdown format) and various metadata (title, description, author, date, categories, tags)
✅ Searches for given terms within the content of each article and produces highlighted snippets
✅ Checks if an article has been published since a given date
✅ Outputs results in CSV/JSON
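
To make the search-term and date features more concrete, here is a minimal sketch of how highlighted snippets and a "published since" check could work. It is not the Actor's actual implementation: the function names `highlight_snippets` and `published_since`, the `width` and `marker` parameters, and the sample text are all assumptions made for illustration.

```python
import re
from datetime import date


def highlight_snippets(text, terms, width=40, marker="**"):
    """Return one snippet per match, with the matched term wrapped in `marker`.

    Hypothetical helper: `width` is the number of context characters kept
    on each side of the match.
    """
    snippets = []
    for term in terms:
        # re.escape keeps terms containing punctuation from being read as regex syntax.
        for match in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
            start = max(0, match.start() - width)
            end = min(len(text), match.end() + width)
            snippet = (
                text[start:match.start()]
                + marker + match.group(0) + marker
                + text[match.end():end]
            )
            snippets.append(snippet.strip())
    return snippets


def published_since(article_date, since):
    """True when the article date is on or after the cutoff date."""
    return article_date >= since


sample = "Solar power is growing fast. New solar farms opened this year."
for snippet in highlight_snippets(sample, ["solar"], width=10):
    print(snippet)
print(published_since(date(2024, 5, 1), date(2024, 1, 1)))
```

Matching here is case-insensitive and purely literal; the Actor may apply different matching rules.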

Usage

☑️ Monitor selected websites for technological or economic intelligence
☑️ Keep up to date with the latest trends on a particular topic by monitoring specific websites
☑️ Crawl news or blog websites and build text corpora for various purposes (academic research, machine learning, etc.)

Main Input

➡️ A list of article URLs and/or a list of pages with article links (required)
➡️ A set of search terms to look for in each article content (optional)
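
The input rule above (at least one list of URLs is required, search terms are optional) can be sketched as a small validation helper. The field names `article_urls`, `link_pages` and `search_terms` are hypothetical and are not taken from the Actor's input schema; check the Actor's Input tab for the real names.

```python
def build_input(article_urls=None, link_pages=None, search_terms=None):
    """Assemble an input payload, enforcing the 'at least one URL list' rule.

    All field names below are hypothetical placeholders.
    """
    article_urls = list(article_urls or [])
    link_pages = list(link_pages or [])
    if not article_urls and not link_pages:
        raise ValueError("Provide at least one article URL or one page with article links.")
    return {
        "article_urls": article_urls,              # direct article URLs
        "link_pages": link_pages,                  # pages from which article links are extracted
        "search_terms": list(search_terms or []),  # optional
    }


payload = build_input(
    link_pages=["https://example.com/blog"],
    search_terms=["solar", "wind"],
)
print(payload)
```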

General Input Configuration

Post Filtering Options Configuration

Output

➡️ A dataset of articles including the main text content and various metadata. The output can be found in the default dataset storage in many formats (JSON, CSV, XML, Excel, RSS, etc.).
➡️ Each article includes the following properties: url, title, description, author, source (source name), domain (website domain), date (publication or last-updated date), categories (a list of detected categories), tags (a list of detected tags), search_terms (search terms found), search_highlights (highlighted text snippets), valid_date (whether the article was published since the given input date), valid (whether the article passes the post-filters), text (main content in plain text or Markdown format, depending on the input options).

➡️ If the compute_stats option is set, a dataset is built with the total article count for each occurring category, tag, and search term. This dataset can be displayed by selecting Stats View in the Output tab.
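
As an illustration of what such statistics contain, the sketch below counts articles per category, tag, and search term from exported records. It uses the property names listed above (categories, tags, search_terms) and assumes each holds a list of strings; the function name `compute_stats` and the sample records are hypothetical, and the Actor's own stats dataset may be shaped differently.

```python
from collections import Counter


def compute_stats(articles):
    """Count how many articles mention each category, tag, and search term."""
    stats = {"categories": Counter(), "tags": Counter(), "search_terms": Counter()}
    for article in articles:
        for key, counter in stats.items():
            # set() ensures a value repeated within one article is counted once.
            counter.update(set(article.get(key) or []))
    return {key: dict(counter) for key, counter in stats.items()}


# Hypothetical records using the documented property names.
articles = [
    {"url": "https://example.com/a", "categories": ["Energy"], "tags": ["solar"], "search_terms": ["solar"]},
    {"url": "https://example.com/b", "categories": ["Energy"], "tags": ["wind"], "search_terms": []},
]
print(compute_stats(articles))
```

To restrict the counts to articles that pass the post-filters, the list can be filtered on the valid property before calling the function.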

Here are some output examples:

Articles table view

Statistics JSON view

Your feedback

If you have technical feedback, a bug to report, or a suggestion for improving the Actor, please create an issue on the Actor’s Issues tab.
