Article Text Extractor

Pricing

Pay per usage

Developed by

Marek Trunkát

Maintained by Community

Extracts the article text and other metadata from a given URL. Uses https://github.com/ageitgey/node-unfluff, a Node.js implementation of https://github.com/grangier/python-goose.
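
To give a rough idea of what "extracting article text" means, here is a minimal sketch using only Python's standard library. It is a toy illustration, not node-unfluff's or python-goose's algorithm: it assumes the article body is simply the text of the `<p>` elements, and the names `ArticleTextParser`, `extract_article`, and `SAMPLE_HTML` are hypothetical.

```python
from html.parser import HTMLParser


class ArticleTextParser(HTMLParser):
    """Toy extractor: collects the <title> and the text of <p> elements."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._in_title = False
        self._in_paragraph = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "p":
            # Start collecting text for a new paragraph.
            self._in_paragraph = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "p" and self._in_paragraph:
            self._in_paragraph = False
            # Join the collected fragments and collapse runs of whitespace.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_paragraph:
            self._buffer.append(data)


def extract_article(html):
    """Return the page title and the paragraph text as a dict."""
    parser = ArticleTextParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "text": "\n\n".join(parser.paragraphs)}


# Assumed sample input; a real run would fetch the HTML from the given URL.
SAMPLE_HTML = """
<html><head><title>Example headline</title></head>
<body>
  <nav><a href="/">Home</a></nav>
  <p>First paragraph of the <b>article</b>.</p>
  <p>Second paragraph.</p>
</body></html>
"""

result = extract_article(SAMPLE_HTML)
print(result["title"])
print(result["text"])
```

Text outside `<p>` elements, such as the navigation link, is skipped. Real extractors rely on much richer heuristics to separate the article from the surrounding page.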

5.0 (1)

12

Total users: 1K

Monthly users: 13

Runs succeeded: 99%

Last modified: 2 years ago

Possible to preserve images in-line?

Closed

AmateurHr opened this issue a year ago

The article text extraction is working great. Is it possible to also extract any images/charts in the article and have those come through as well? Perhaps this could be a toggle in the article extractor?

mtrunkat

Hello! It's not currently possible. If you want, I can open-source the code so you can add the functionality there.

But honestly, it's quite an old approach, and I'd recommend trying the following Actor, which supports image extraction: https://apify.com/apify/website-content-crawler

AmateurHr

a year ago

Thanks for the suggestion!