Smart Article Extractor

Developed by Lukáš Křivka

Pricing: Pay per usage

📰 Smart Article Extractor extracts articles from any scientific, academic, or news website with just one click. The extractor crawls the whole website and automatically distinguishes articles from other web pages. Download your data as an HTML table, JSON, Excel, RSS feed, and more.

Rating: 4.7 (6)
Total users: 5.4K
Monthly users: 345
Runs succeeded: >99%
Issues response: 16 days
Last modified: 4 months ago

News


Empty results from medium.com

Closed

pzubkiewicz opened this issue

Nothing is returned from that page

2024-04-04T09:58:45.807Z INFO  Adding article URL: https://medium.com/@daniel.castillo_48013/streamlining-event-driven-architecture-documentation-with-event-catalog-a-step-by-step-guide-6d10d95abaa1
2024-04-04T09:58:46.294Z INFO  CheerioCrawler: Starting the crawler.
2024-04-04T09:58:47.526Z WARN  IS NOT VALID ARTICLE --- Reasons: [Article has too few words: 145 (should be at least 150)] --- https://medium.com/@daniel.castillo_48013/streamlining-event-driven-architecture-documentation-with-event-catalog-a-step-by-step-guide-6d10d95abaa1
2024-04-04T09:58:47.599Z INFO  CheerioCrawler: All requests from the queue have been processed, the crawler will shut down.
2024-04-04T09:58:47.799Z INFO  CheerioCrawler: Final request statistics: {"requestsFinished":1,"requestsFailed":0,"retryHistogram":[1],"requestAvgFailedDurationMillis":null,"requestAvgFinishedDurationMillis":1195,"requestsFinishedPerMinute":37,"requestsFailedPerMinute":0,"requestTotalDurationMillis":1195,"requestsTotal":1,"crawlerRuntimeMillis":1616}
2024-04-04T09:58:47.804Z INFO  CheerioCrawler: Finished! Total 1 requests: 1 succeeded, 0 failed. {"terminal":true}
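The warning in the log rejects the page because the extracted text has 145 words, below a minimum of 150. The sketch below shows a check of that shape. It rests on stated assumptions: the helper names (`countWords`, `validateArticle`, `MIN_WORDS`) are hypothetical, and counting words by splitting on whitespace is a guess. The actor's real implementation is not shown in this thread.

```javascript
// Hypothetical sketch of a minimum-length check like the one reported in the log.
// Assumption: words are whitespace-separated tokens; the actor's real rule may differ.
const MIN_WORDS = 150; // threshold taken from the log message

// Count whitespace-separated tokens, ignoring leading/trailing whitespace
function countWords(text) {
  return text.split(/\s+/).filter(Boolean).length;
}

// Return a validity flag plus a reason string shaped like the log warning
function validateArticle(text) {
  const words = countWords(text);
  if (words < MIN_WORDS) {
    return {
      valid: false,
      reason: `Article has too few words: ${words} (should be at least ${MIN_WORDS})`,
    };
  }
  return { valid: true, reason: null };
}

// Usage: 145 words (e.g. only the first paragraph was captured) fails the check
const firstParagraphOnly = Array(145).fill('word').join(' ');
console.log(validateArticle(firstParagraphOnly).reason);
// prints: Article has too few words: 145 (should be at least 150)
```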

Lukáš Křivka (lukaskrivka)

Hello,

Thanks for the report. Unfortunately, Medium pages can be quite complex to parse, and the actor only extracted the text of the first paragraph, which is why the article fell short of the 150-word minimum in the log.

You can work around this with the Extend Output Function by pointing it at the elements that hold the article text.

For example:

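($) => {
    // Take each paragraph and ordered list inside <article> separately and
    // join them with blank lines; calling .text() on the whole selection
    // would glue adjacent paragraphs together with no whitespace.
    const blocks = $('article p, article ol')
        .map((i, el) => $(el).text().trim())
        .get();
    return { text: blocks.filter(Boolean).join('\n\n') };
}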

https://console.apify.com/view/runs/ClHPxNgQaEw3pOYSM

Getting really clean text would require a more precise selector, though.
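One detail worth knowing when writing that selector: in Cheerio, as in jQuery, calling `.text()` on a selection of several elements concatenates their text with no separator. The last word of one paragraph therefore fuses with the first word of the next, which lowers the word count. The plain-JavaScript sketch below imitates this with an array of strings standing in for the matched elements, so no Cheerio is required. The sample sentences are made up, and counting words by splitting on whitespace is again an assumption.

```javascript
// Stand-in for the text of three matched <p> elements (hypothetical sample data)
const paragraphs = [
  'First paragraph ends here.',
  'Second paragraph starts here.',
  'Third one follows.',
];

// Assumed word-count rule: whitespace-separated tokens
const countWords = (text) => text.split(/\s+/).filter(Boolean).length;

// What `.text()` on the whole selection effectively does: join with no separator
const glued = paragraphs.join('');
// Taking each element's text separately and joining with blank lines keeps words apart
const separated = paragraphs.join('\n\n');

console.log(countWords(glued));     // 9  ("here.Second" and "here.Third" count as one word each)
console.log(countWords(separated)); // 11
```

Inside the Extend Output Function, you get the same effect by mapping each matched element to its own text before joining: `$('article p, article ol').map((i, el) => $(el).text().trim()).get().join('\n\n')`.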


News Website Crawler & Article Extractor

xtech/news-source-crawler

Scrape all articles from any news website. Extract full text, metadata, keywords, and summaries. Ideal for content analysis, research, and news aggregation.

Xtech


Articles Extractor

web.harvester/articles-extractor

The Article Extractor is an enterprise-grade web scraping solution designed specifically for extracting structured data from news articles, blog posts, and online publications. Our advanced HTML parsing engine delivers unmatched accuracy in content extraction across thousands of websites.

Web Harvester


Smart Article Scraper - Text, Data & Insights

xtech/article-extractor

Unlock valuable insights from any article! Get clean text, publication data, keywords, summaries, and more. Ideal for research, content marketing, and competitive analysis. Fast, reliable, and easy to use.

Xtech

Article Content Extractor 📄

easyapi/article-content-extractor

Extract clean article content, metadata and structured information from any web page. Supports multiple URLs and returns well-formatted JSON with title, description, content, author, publish date and more. 🔍📄

EasyApi

Advanced News Scraper

dorcy/advanced-news-scraper

This scraper is crafted to extract the latest news articles based on custom search queries, providing a wealth of information, including article titles, sources, publication dates, full article text, and AI-generated summary.

Dorcy Shema


Tech News Article Scraper

inquisitive_sarangi/news-article-scraper

Tech News Article Scraper is a simple yet powerful tool to extract news articles from a variety of popular news websites. Supported sites: The Verge, CNET, Wired, TechCrunch, and Ars Technica.

API Master

News Article Scraper for Feeding LLM

proscraper/newsarticlescraper

Scrape news article metadata to feed into LLMs. Returns the article body, published date, article title, author, etc.

Owais Nazir

News Articles Scraper

proscraper/news-articles-scraper

Scrape data for news articles. Takes a list of URLs in start_urls and returns the data. Can be used to feed LLMs or for training.

Owais Nazir

🤖 Any Website URL to Article Summarizer

easyapi/any-website-url-to-article-summarizer

Transform any article, blog post, or web content into concise, AI-powered summaries. Get key insights and main points instantly with smart text analysis and markdown formatting. Perfect for researchers, content creators, and busy professionals who need quick, accurate content digests.

EasyApi


Ultimate Articles Extractor

web.harvester/ultimate-articles-extractor

A powerful and modular web scraping tool designed to extract content from any webpage, article, or news site. Get clean, structured data from any website with optimized extraction algorithms, anti-bot detection avoidance, and proxy support.

Web Harvester
