Article Text Extractor

Pricing

Pay per usage

mtrunkat/article-text-extractor

Developed by

Marek Trunkát

Maintained by Community

Extracts the article text and other metadata from a given URL. Uses node-unfluff (https://github.com/ageitgey/node-unfluff), a Node.js implementation of python-goose (https://github.com/grangier/python-goose).
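The Actor's source is not published here, so the following is only a rough sketch of how an unfluff-style extraction step could look, not the Actor's actual code. It assumes the extractor is called as `extract(html, language)` and returns an object with fields such as `title`, `text`, and `image`, which is how the node-unfluff README describes its default export. `extractArticle` and `fakeExtract` are hypothetical names introduced for illustration. The stand-in extractor exists only so the sketch runs without installing anything.

```javascript
// Hypothetical helper (not taken from the Actor): runs an unfluff-style
// extractor over raw HTML and keeps a few of the returned fields.
// `extract` is assumed to behave like node-unfluff's default export:
//   extract(html, language) -> { title, text, image, ... }
function extractArticle(html, extract, language = 'en') {
  const data = extract(html, language);
  return {
    title: data.title || null,
    text: data.text || '',
    // Per its README, unfluff reports one main image for the document,
    // not images positioned inside the article text.
    image: data.image || null,
  };
}

// Stand-in extractor so the sketch is self-contained.
// With the real library one would instead pass: require('unfluff')
function fakeExtract(html) {
  const title = (html.match(/<title>(.*?)<\/title>/i) || [])[1];
  const text = html
    .replace(/<title>.*?<\/title>/gi, '') // drop the title element
    .replace(/<[^>]+>/g, ' ')             // strip remaining tags
    .replace(/\s+/g, ' ')                 // collapse whitespace
    .trim();
  return { title, text };
}

const sample =
  '<html><head><title>Hello</title></head>' +
  '<body><p>First paragraph.</p></body></html>';

const result = extractArticle(sample, fakeExtract);
console.log(result);
```

If the real library is installed, passing `require('unfluff')` in place of `fakeExtract` should exercise the same code path, assuming the README's call signature still holds.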

5.0 (1)

12

Monthly users: 16

Runs succeeded: 98%

Last modified: a year ago

Possible to preserve images in-line?

Closed
AmateurHr opened this issue 9 months ago

The article text extraction is working great. Is it possible to also extract any images/charts in the article and have those come through as well? Perhaps this could be a toggle in the article extractor?

mtrunkat

Hello! It's not currently possible - if you want, then I can open-source the code, and you can add the functionality there.

But honestly, it's quite an old approach, and I'd recommend trying the following Actor, which supports image extraction: https://apify.com/apify/website-content-crawler

AmateurHr

9 months ago

Thanks for the suggestion!

Pricing

Pricing model

Pay per usage

This Actor is paid per platform usage. The Actor is free to use, and you only pay for the Apify platform usage.