Any Website URL to Article Summarizer
Pricing
from $4.99 / 1,000 results
Extract and summarize articles from any website URL. Returns title, author, publish date, word count, reading time, full text, and a concise AI-style summary using extractive summarization.
Developer
Coding Frontned
Maintained by Community
Extract and summarize article content from any website URL. Works on news sites, blogs, Wikipedia, Medium, documentation, and more.
It uses extractive summarization, selecting the most important sentences from the article itself, so no external AI API key is required.
Features
- Extract articles from any URL: news sites, blogs, Wikipedia, Medium, etc.
- Automatic extractive summary generation (selects the most informative sentences)
- Key points: the top 5 most important sentences from the article
- Extracts title, author, publish date, description, and hero image
- Word count and reading time estimation
- No AI API key required: all summarization happens locally
- Supports multiple URLs per run
- Optionally includes the full cleaned article text in the output
How Summarization Works
This actor uses extractive summarization:
- Article text is cleaned and split into sentences
- Each sentence is scored by word frequency (TF-style scoring)
- A position bias boosts early sentences (intros are typically more important)
- The top-scoring sentences are selected and returned in their original reading order
Because it relies only on word frequency and sentence position, this approach needs no LLM or external API and is not tied to a particular language or domain.
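The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the actor's actual code: the sentence splitter, the length-based stop-word filter, the `position_boost` weight, and all function names are assumptions. Only the sentence counts per summary length come from the Input table.

```python
import re
from collections import Counter

# Sentence counts per summaryLength, as listed in the Input table.
SENTENCES_PER_LENGTH = {"short": 3, "medium": 5, "long": 8}


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(text: str) -> list[str]:
    # Crude stop-word filter (assumption): ignore words of three letters or fewer.
    return [w for w in re.findall(r"\w+", text.lower()) if len(w) > 3]


def summarize(text: str, summary_length: str = "medium",
              position_boost: float = 0.2) -> list[str]:
    sentences = split_sentences(text)
    freq = Counter(tokenize(text))
    if not sentences or not freq:
        return sentences
    top = max(freq.values())

    scored = []
    for i, sentence in enumerate(sentences):
        tokens = tokenize(sentence)
        # TF-style score: average normalized frequency of the sentence's words.
        score = sum(freq[t] for t in tokens) / (len(tokens) * top) if tokens else 0.0
        # Position bias: a boost that decays linearly from the first sentence.
        score += position_boost * (1 - i / len(sentences))
        scored.append((score, i))

    k = SENTENCES_PER_LENGTH[summary_length]
    # Pick the k best sentences, then restore original reading order.
    best = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:k]
    return [sentences[i] for i in sorted(i for _, i in best)]
```

Sorting the selected indices before returning is what keeps the summary in reading order rather than score order.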
Input
| Field | Type | Default | Description |
|---|---|---|---|
| `urls` | array | required | List of article URLs to summarize |
| `summaryLength` | string | `"medium"` | `short` (3 sentences), `medium` (5), `long` (8) |
| `includeFullText` | boolean | `false` | Include full cleaned article text in output |
| `maxItems` | integer | `10` | Maximum number of articles to process |
Example Input
{"urls": ["https://en.wikipedia.org/wiki/Artificial_intelligence","https://techcrunch.com/2024/01/01/sample-article/"],"summaryLength": "medium","includeFullText": false,"maxItems": 10}
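An input like the one above can also be assembled and checked in code before a run is started. The sketch below is a hypothetical helper (`build_input` is not part of the actor); the field names, defaults, and allowed `summaryLength` values are taken from the Input table, while the validation rules themselves are assumptions about what a sensible input looks like.

```python
import json

# Allowed summaryLength values and their sentence counts, per the Input table.
SUMMARY_LENGTHS = {"short": 3, "medium": 5, "long": 8}


def build_input(urls, summary_length="medium", include_full_text=False, max_items=10):
    """Return a run-input dict using the field names from the Input table."""
    if not urls:
        raise ValueError("urls is required and must contain at least one URL")
    if summary_length not in SUMMARY_LENGTHS:
        raise ValueError(f"summaryLength must be one of {sorted(SUMMARY_LENGTHS)}")
    if max_items < 1:
        raise ValueError("maxItems must be a positive integer")
    return {
        "urls": list(urls),
        "summaryLength": summary_length,
        "includeFullText": include_full_text,
        "maxItems": max_items,
    }


if __name__ == "__main__":
    run_input = build_input(
        ["https://en.wikipedia.org/wiki/Artificial_intelligence"],
        summary_length="short",
    )
    print(json.dumps(run_input, indent=2))
```

Failing early on an unknown `summaryLength` avoids spending a run on an input the actor may reject or silently default.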
Output
Each dataset record represents one summarized article:
| Field | Type | Description |
|---|---|---|
| `position` | integer | Position in results |
| `url` | string | Article URL |
| `domain` | string | Website domain (e.g. "techcrunch.com") |
| `title` | string | Article title |
| `author` | string or null | Article author name |
| `publishDate` | string or null | Publish date (ISO format or raw string) |
| `description` | string or null | Meta description or excerpt |
| `summary` | string | Extractive summary of the article |
| `keyPoints` | array | Top 5 key sentences from the article |
| `wordCount` | integer | Total word count |
| `readingTime` | string | Estimated reading time (e.g. "5 min read") |
| `image` | string or null | Hero image URL (`og:image`) |
| `siteName` | string or null | Website name (`og:site_name`) |
| `language` | string or null | Document language code |
| `fullText` | string | Full cleaned article text (only if `includeFullText` is `true`) |
| `scrapedAt` | string | ISO 8601 scrape timestamp |
Example Output
{"position": 1,"url": "https://en.wikipedia.org/wiki/Machine_learning","domain": "en.wikipedia.org","title": "Machine learning - Wikipedia","author": null,"publishDate": null,"description": "Machine learning (ML) is a field of study...","summary": "Machine learning is a subset of artificial intelligence...","keyPoints": ["Machine learning models are often vulnerable to...", "..."],"wordCount": 9653,"readingTime": "48 min read","image": null,"siteName": null,"language": "en","scrapedAt": "2025-08-01T12:00:00.000Z"}
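Two of the derived fields, `domain` and `readingTime`, are simple to reproduce. The sketch below is an illustration only: the 200 words-per-minute rate is an assumption inferred from the example record (9653 words shown as "48 min read"), and the function names are hypothetical. The actor's real rate, rounding rule, and handling of prefixes such as `www.` are not documented here.

```python
from urllib.parse import urlparse

# Assumed reading speed; consistent with the example record but not confirmed.
WORDS_PER_MINUTE = 200


def domain_of(url: str) -> str:
    # The hostname part of the URL, or an empty string if it cannot be parsed.
    return urlparse(url).hostname or ""


def reading_time(word_count: int, wpm: int = WORDS_PER_MINUTE) -> str:
    # Round to the nearest minute, but never report less than one minute.
    minutes = max(1, round(word_count / wpm))
    return f"{minutes} min read"


if __name__ == "__main__":
    print(domain_of("https://en.wikipedia.org/wiki/Machine_learning"))
    print(reading_time(9653))
```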
Dataset Views
- Articles Overview: table with title, author, date, word count, reading time, URL, and summary
- Summaries: focused view showing title, summary, key points, URL, and domain
Technical Notes
- Uses a real Google Chrome browser (via Playwright) to handle JavaScript-rendered pages
- Injects browser fingerprints for natural-looking browser behavior
- Article content is extracted using a multi-selector heuristic that prioritizes `<article>`, `[itemprop="articleBody"]`, and common blog/CMS CSS classes
- Wikipedia `[edit]` and footnote `[1]` markers are automatically removed
- Reference sections (`.reflist`, `.references`) are removed from Wikipedia pages
- For paywalled articles, only publicly visible content is extracted
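The Wikipedia marker cleanup described in the notes above can be approximated with a regular expression. This is a sketch under assumptions: the pattern and the function name `strip_wiki_markers` are illustrative, and the actor may instead remove these markers at the DOM level before extracting text.

```python
import re

# Matches "[edit]" section links and numeric footnote markers such as "[1]" or "[23]".
WIKI_MARKER = re.compile(r"\[(?:edit|\d+)\]")


def strip_wiki_markers(text: str) -> str:
    cleaned = WIKI_MARKER.sub("", text)
    # Collapse any runs of spaces or tabs left behind by the removed markers.
    return re.sub(r"[ \t]{2,}", " ", cleaned).strip()


if __name__ == "__main__":
    print(strip_wiki_markers("History[edit] The field was founded in 1956.[1][2]"))
```

Restricting the pattern to `edit` and digits leaves other bracketed text, such as editorial insertions in quotations, untouched.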
License
Apache-2.0