Smart Article Extractor
📰 Smart Article Extractor extracts articles from any scientific, academic, or news website with just one click. The extractor crawls the whole website and automatically distinguishes articles from other web pages. Download your data as an HTML table, JSON, Excel, RSS feed, and more.
OK, so TechCrunch works fine, and so do most of the well-known sites.
But for example this article is returning no text: https://www.netokracija.rs/startech-2024-217152
And this is taking too long: https://www.netokracija.rs/garaza-open-source-documentation-218678
Here's another job: https://console.apify.com/view/runs/d43yXragnqMqyK4CD
I've removed the "what is an article" JSON, which had the number of dashes and the date set, but still nothing. The job above returns 1 word.
Hi, thanks for your patience.
The extractor is not perfect and sometimes struggles to locate all fields automatically, especially if the content has a non-standard structure.
There is a way to override the parser using the Extend Output Function field, like this:
($) => {
    const result = {};
    result.text = $('.post__content').text();

    return result;
}
This should yield the correct result for this website.
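If the same override has to cover pages whose markup differs, one option is to try several selectors in order and keep the first one that yields text. The following is only a sketch under assumptions: `$` is taken to be the Cheerio-style handle shown in the snippet above, `.post__content` is the only selector taken from this thread, and the `article` and `main` fallbacks as well as the name `extendOutputFunction` are hypothetical and used purely for illustration.

```javascript
// Sketch of an Extend Output Function that tries several selectors in order.
// Assumption: `$` behaves like the Cheerio handle used in the reply above.
const extendOutputFunction = ($) => {
    // '.post__content' comes from the thread; 'article' and 'main' are
    // hypothetical fallbacks and should be adapted to the target site.
    const candidateSelectors = ['.post__content', 'article', 'main'];
    const result = {};

    for (const selector of candidateSelectors) {
        // Collapse whitespace runs so the extracted text is easier to inspect.
        const text = $(selector).text().replace(/\s+/g, ' ').trim();
        if (text.length > 0) {
            result.text = text;
            break;
        }
    }

    // If no selector matched, `result` stays empty and no `text` override is returned.
    return result;
};
```

Only the arrow function itself would go into the Extend Output Function field; the `const` binding is there so the sketch can be run and checked on its own.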
Actor Metrics
197 monthly users
65 stars
>99% runs succeeded
1.2 days response time
Created in Nov 2019
Modified 4 months ago