Smart Article Extractor

Pricing

Pay per usage

lukaskrivka/article-extractor-smart

Developed by

Lukáš Křivka

Maintained by Apify

📰 Smart Article Extractor extracts articles from any scientific, academic, or news website with just one click. The extractor crawls the whole website and automatically distinguishes articles from other web pages. Download your data as HTML table, JSON, Excel, RSS feed, and more.

4.7 (6)

103

Monthly users: 280

Runs succeeded: >99%

Response time: 36 days

Last modified: 6 days ago

Not returning any results for article on a particular website

Closed
ralic opened this issue 9 months ago

Ok so TechCrunch works fine and most of the famous ones as well.

But for example this article is returning no text: https://www.netokracija.rs/startech-2024-217152

And this is taking too long: https://www.netokracija.rs/garaza-open-source-documentation-218678

ralic

9 months ago

Here's another job: https://console.apify.com/view/runs/d43yXragnqMqyK4CD

I've removed the "what is an article" JSON, which had the number of dashes and the date set, but still nothing. The job above returns one word.

milunnn

Hi, thanks for your patience.

The extractor is not perfect and sometimes struggles to locate all fields automatically, especially if the content has a non-standard structure.

There is a way to override the parser using the Extend Output Function field, like this:

($) => {
    const result = {};
    // Read the article body directly from the site's content container.
    result.text = $('.post__content').text();

    return result;
}

This should yield the correct result for this website.
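If the same override needs to cover more than one page layout, a slightly more defensive variant might help. The sketch below is only an illustration: it assumes `$` behaves like a Cheerio/jQuery selector function, as in the snippet above, and the name `extendOutput` plus the extra selectors `article` and `main` are hypothetical placeholders, not something verified against this site.

```javascript
// Sketch of a more defensive override. Only '.post__content' comes from the
// reply above; 'article' and 'main' are hypothetical fallbacks.
const extendOutput = ($) => {
    const result = {};
    const candidateSelectors = ['.post__content', 'article', 'main'];

    for (const selector of candidateSelectors) {
        // Collapse runs of whitespace and trim the ends of the extracted text.
        const text = $(selector).text().replace(/\s+/g, ' ').trim();
        if (text.length > 0) {
            result.text = text;
            break; // Stop at the first selector that yields any text.
        }
    }

    // If no selector matched, no text field is returned.
    return result;
};
```

To try this in the Extend Output Function field, paste only the arrow function itself (everything after `const extendOutput = ` and without the trailing semicolon), since the field takes a single function expression as in the original snippet.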

Pricing

Pricing model

Pay per usage

This Actor is free to use; you pay only for Apify platform usage.