Article Content Extractor 📄

2-hour free trial, then $19.99/month. No credit card required to start.

easyapi/article-content-extractor

Extract clean article content, metadata and structured information from any web page. Supports multiple URLs and returns well-formatted JSON with title, description, content, author, publish date and more. 🔍📄

This actor automatically extracts clean article content and metadata from any web page, giving you structured content from news sites, blogs, and other article-based websites.

Features ✨

  • Extract article content and metadata from any URL
  • Support batch processing of multiple URLs
  • Clean and structured JSON output
  • Built-in rate limiting to avoid overloading target sites
  • Robust error handling and validation
  • Fast and efficient processing

Output Data Structure 📊

The actor extracts the following information from each article:

  • Title
  • Description
  • Main content (both HTML and plain text)
  • Author
  • Publication date
  • Source domain
  • Featured image URL
  • Related links
  • Tags
  • Scraping timestamp
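
The fields above can be modelled as a typed record on the consumer side. The following is a minimal sketch, not part of the actor: the key names (url, publishedDate, scrapedAt, and so on) are taken from the output sample on this page, while the ArticleRecord class and its from_item helper are hypothetical names introduced for illustration. The sample shows only an HTML "content" key, so no plain-text key is assumed here.

```python
from dataclasses import dataclass, field, fields
from typing import Any


@dataclass
class ArticleRecord:
    """One dataset item; key names follow the output sample on this page."""
    url: str
    title: str = ""
    description: str = ""
    content: str = ""        # main content as HTML
    author: str = ""         # empty string when the page exposes no author
    publishedDate: str = ""  # empty string when no date was found
    source: str = ""         # source domain, e.g. "fancode.com"
    image: str = ""          # featured image URL
    links: list[str] = field(default_factory=list)
    tags: list[str] = field(default_factory=list)
    scrapedAt: str = ""      # ISO 8601 timestamp string

    @classmethod
    def from_item(cls, item: dict[str, Any]) -> "ArticleRecord":
        # Ignore keys this sketch does not know about instead of failing.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in item.items() if k in known})
```

Unknown keys are dropped rather than rejected, so the record keeps working if the actor adds fields later.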

Use Cases 💡

  • Content aggregation and syndication
  • News monitoring and analysis
  • Research and data collection
  • Content migration
  • SEO analysis
  • Digital archiving

Limitations ⚠️

  • Respects robots.txt and implements polite scraping
  • 2-second delay between requests to avoid overwhelming target servers
  • URLs must be valid and accessible
  • Content extraction quality depends on page structure
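
To illustrate what "polite scraping" means in practice, here is a small sketch using only the Python standard library. It is not the actor's implementation, which this page does not show: is_allowed and paced are hypothetical helper names, and the only details taken from the text above are the robots.txt check and the 2-second delay.

```python
import time
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Check a URL against the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def paced(urls, delay_seconds: float = 2.0, sleep=time.sleep):
    """Yield URLs one at a time, pausing between consecutive items."""
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # no pause before the first URL
        yield url
```

The sleep function is injectable so the pacing logic can be exercised without actually waiting.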

Tips for Best Results 💪

  1. Provide valid, accessible URLs
  2. Use for public content only
  3. Check the target website's terms of service
  4. Monitor execution logs for any issues

Need help or have questions? Feel free to reach out!

Input Example

The input is a JSON object with a single "urls" array listing the pages to extract:

{
    "urls": [
        "https://cleartax.in/s/gst-hsn-lookup",
        "https://www.fancode.com/pickleball/schedule"
    ]
}
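
As a sketch of how this input might be built and submitted from Python: build_input and run_actor are hypothetical helpers, and the URL checks in build_input are an assumption about what "valid" means, not the actor's own validation. run_actor relies on the third-party apify-client package and needs your own API token; the actor ID is the one shown at the top of this page.

```python
from urllib.parse import urlparse

ACTOR_ID = "easyapi/article-content-extractor"


def build_input(urls):
    """Return the actor input dict, rejecting non-http(s) URLs and dropping duplicates."""
    seen = set()
    cleaned = []
    for raw in urls:
        url = raw.strip()
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not a valid http(s) URL: {raw!r}")
        if url not in seen:
            seen.add(url)
            cleaned.append(url)
    return {"urls": cleaned}


def run_actor(token, urls):
    """Run the actor and return its dataset items (requires network access)."""
    from apify_client import ApifyClient  # third-party: pip install apify-client

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=build_input(urls))
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Validating locally before starting a run avoids spending trial time on URLs that could never be fetched.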

Output sample

The results are stored in a dataset, which you can always find in the Storage tab. Here is an excerpt of the data you'd get with the input parameters above, shown as JSON. You can choose the format in which to download your data: JSON, JSONL, Excel spreadsheet, HTML table, CSV, or XML.

[
    {
        "url": "https://www.fancode.com/pickleball/schedule",
        "title": "Pickleball Schedule - Check International and Domestic matches on FanCode",
        "description": "ABOUT FANCODEIndia's Premium Live Streaming, Live Scores & Sports Merchandise Shopping platform FanCode has grown to become one of the most loved and followed all-sports destination in the last few years....",
        "content": "<div><p><label>ABOUT FANCODE</label><label>India's Premium Live Streaming, Live Scores &amp; Sports Merchandise Shopping platform FanCode has grown to become one of the most loved and followed all-sports destination in the last few years. The FanCode app has been downloaded by more than 3+ crore users. It offers interactive live streaming of all major sporting events, premier cricket tournaments, women's cricket, live football, basketball, baseball, wrestling, badminton, and other major sports. It also offer real-time match highlights, match videos, cricket videos, India cricket highlights, highlights of today's match, highlights of yesterday's match, cricket data, statistics, cricket analysis, fantasy insights, cricket updates, breaking news from India cricket and world of sports. It also offers sports merchandise for all major sporting leagues and teams from across the world.</label></p></div>",
        "author": "",
        "publishedDate": "",
        "source": "fancode.com",
        "image": "https://www.fancode.com/skillup-uploads/fc-web/home-page-new-arc/hero-image/v1/hero-image-dweb-v4.png",
        "links": [
            "https://www.fancode.com/pickleball/schedule"
        ],
        "tags": [],
        "scrapedAt": "2025-02-05T07:19:26.119Z"
    },
    ...
]

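
A downloaded item like the one above usually needs light post-processing. The sketch below is one possible approach using only the standard library: html_to_text and summarize are hypothetical helpers, the key names come from the sample, and treating empty strings as missing values is an assumption based on the empty "author" and "publishedDate" in that sample.

```python
from datetime import datetime
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect text nodes; entities such as &amp; are decoded by default."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html: str) -> str:
    """Strip tags from the "content" HTML and collapse whitespace."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return " ".join(" ".join(collector.parts).split())


def summarize(item: dict) -> dict:
    """Normalize one dataset item for downstream use."""
    return {
        "url": item["url"],
        # Empty strings in the sample mean "not found"; map them to None.
        "author": item.get("author") or None,
        "publishedDate": item.get("publishedDate") or None,
        "text": html_to_text(item.get("content", "")),
        # Replace the trailing "Z" so older Python versions can parse it.
        "scrapedAt": datetime.fromisoformat(item["scrapedAt"].replace("Z", "+00:00")),
    }
```

Text nodes are joined with a space, so adjacent elements do not run together the way "ABOUT FANCODEIndia's" does in the sample description.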
Developer
Maintained by Community

Actor Metrics

  • 1 monthly user
  • No stars yet
  • Created in Feb 2025
  • Modified 13 hours ago