Smart Article Extractor

Pricing

Pay per usage

lukaskrivka/article-extractor-smart

Developed by

Lukáš Křivka

Maintained by Apify

📰 Smart Article Extractor extracts articles from any scientific, academic, or news website with just one click. The extractor crawls the whole website and automatically distinguishes articles from other web pages. Download your data as an HTML table, JSON, Excel, RSS feed, and more.

4.7 (6)

103

Monthly users: 277

Runs succeeded: >99%

Response time: 35 days

Last modified: 6 days ago

Timeout and Storage check

Open
hectorlca opened this issue
3 months ago

Could you please help me understand how to pass the timeout parameter using the API? I never see it reflected, even when using the web interface. Additionally, I believe the Actor occasionally fails to check the articles that are already stored.

I have used this Actor for many months now, and in recent weeks some of my runs have been running indefinitely. I don't know what happened; until now I didn't have to monitor the runs every day because the results were as expected. Now I've had some terrible experiences with the Actor running indefinitely, costing me a lot of money.

Edit: these two runs have identical input:

  1. https://console.apify.com/actors/runs/TrPjXGgDm5PHNiM48#output - 3,099 results. (Not desired)
  2. https://console.apify.com/actors/runs/SJcFnVa96s3OFZy2w#output - 66 results. (Desired).
ondrejklinovsky

Hey,

when starting a new run through the API, you can define the timeout with the timeout query parameter. For example, to start a run with a timeout of 60 seconds: POST https://api.apify.com/v2/acts/<ACTOR>/runs?timeout=60. Here's the docs.
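As a rough illustration of the request described above, here is a minimal Python sketch that builds the run URL with the timeout query parameter and posts the Actor input as JSON. The helper names (build_run_url, start_run), the bearer-token header, and the way the Actor identifier is written are assumptions made for this example, not something stated in the thread; check the API docs before relying on them.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_url(actor: str, timeout_secs: int) -> str:
    """Return the run-start URL with the timeout passed as a query parameter."""
    query = urllib.parse.urlencode({"timeout": timeout_secs})
    return f"{API_BASE}/acts/{urllib.parse.quote(actor, safe='~')}/runs?{query}"


def start_run(actor: str, run_input: dict, token: str, timeout_secs: int) -> dict:
    """POST the Actor input as JSON. This performs a real network call."""
    request = urllib.request.Request(
        build_run_url(actor, timeout_secs),
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the API token is sent as a bearer token.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Only builds the URL; no request is sent here.
print(build_run_url("<ACTOR>", 60))
```

Note that the timeout goes into the URL's query string, not into the JSON input body, which may be why it never seemed to be reflected when set elsewhere.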

The issue with the run is that the website changed the URLs of the articles: they added an ?amp query parameter. This caused the Actor to scrape them again, because their URLs were different from those that were already stored. We'll need to figure out how to avoid situations like this. We cannot ignore query parameters completely because they may identify the article (e.g. ?articleId=edasdas), so this will require more thinking. Thank you for the report, we'll let you know when we have any updates on this. Let me know if you have any questions.
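One possible direction, sketched here purely for illustration and not as the Actor's actual behavior, is to compare stored and newly found URLs by a normalized key that drops only parameters known not to identify an article. The denylist IGNORED_PARAMS and the function dedup_key are hypothetical names invented for this example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical denylist: parameters assumed not to identify an article.
IGNORED_PARAMS = {"amp"}


def dedup_key(url: str) -> str:
    """Return the URL without denylisted query parameters and without a fragment."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in IGNORED_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


# The ?amp variant maps to the same key as the plain URL,
# while an identifying parameter such as articleId is preserved.
print(dedup_key("https://example.com/news/story?amp"))
print(dedup_key("https://example.com/news?articleId=edasdas&amp=1"))
```

A denylist keeps identifying parameters intact by default, at the cost of having to add each new tracking or rendering parameter as sites introduce it.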

HuongFi

2 months ago

I can't scrape the posts of a profile on X.com.

Pricing

Pricing model

Pay per usage

This Actor is paid per platform usage. The Actor is free to use, and you only pay for the Apify platform usage.