
Article Text Extractor
No credit card required

Extracts the article text and other metadata from a given URL. It uses https://github.com/ageitgey/node-unfluff, a Node.js implementation of https://github.com/grangier/python-goose.
Possible to preserve images in-line?
The article text extraction is working great. Is it possible to also extract any images/charts in the article and have those come through as well? Perhaps this could be a toggle in the article extractor?

Hello! That's not currently possible. If you'd like, I can open-source the code so you can add the functionality yourself.
Honestly, though, it's quite an old approach, and I'd recommend trying the following Actor, which supports image extraction: https://apify.com/apify/website-content-crawler
AmateurHr
Thanks for the suggestion!
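As the maintainer notes, this Actor does not return inline images. Purely to illustrate what an image toggle might do, here is a minimal TypeScript sketch, assuming you already have the article's HTML and its page URL. `extractImageUrls` is a hypothetical helper, not part of this Actor or node-unfluff. It uses a regular expression to stay dependency-free, so it reads only quoted `src` attributes and ignores `srcset`, lazy-loading attributes such as `data-src`, and malformed markup that a real HTML parser would handle.

```typescript
// Hypothetical helper: NOT part of the Article Text Extractor Actor or node-unfluff.
// Returns the src of every <img> tag in document order, resolved against pageUrl.
// Regex-based to stay dependency-free; a real HTML parser is more robust.
function extractImageUrls(html: string, pageUrl: string): string[] {
  const urls: string[] = [];
  // Matches each <img ...> tag, case-insensitively (recreated per call, so lastIndex starts at 0).
  const imgTag = /<img\b[^>]*>/gi;
  // Captures a quoted src value; requiring leading whitespace skips data-src and similar.
  const srcAttr = /\ssrc\s*=\s*(["'])(.*?)\1/i;
  let tag: RegExpExecArray | null;
  while ((tag = imgTag.exec(html)) !== null) {
    const src = srcAttr.exec(tag[0]);
    if (src === null || src[2] === "") continue; // no usable src attribute
    try {
      // Resolve relative paths like "/img/chart.png" against the article URL.
      urls.push(new URL(src[2], pageUrl).href);
    } catch {
      // Skip values that are not valid URLs.
    }
  }
  return urls;
}
```

For example, `extractImageUrls('<img src="/img/chart.png">', "https://example.com/post/1")` returns `["https://example.com/img/chart.png"]`.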
Actor Metrics
33 monthly users
12 bookmarks
>99% runs succeeded
Created in Mar 2018
Modified a year ago