Article Text Extractor

mtrunkat/article-text-extractor

Extracts the article text and other metadata from a given URL. Uses https://github.com/ageitgey/node-unfluff, a Node.js implementation of https://github.com/grangier/python-goose.

Possible to preserve images in-line?

Closed

AmateurHr opened this issue
2 months ago

The article text extraction is working great. Is it possible to also extract any images/charts in the article and have those come through as well? Perhaps this could be a toggle in the article extractor?

mtrunkat

Hello! It's not currently possible - if you want, then I can open-source the code, and you can add the functionality there.

But honestly, it's quite an old approach, and I'd recommend trying the following Actor, which supports image extraction: https://apify.com/apify/website-content-crawler

AmateurHr

2 months ago

Thanks for the suggestion!

Developer
Maintained by Community
Actor metrics
  • 23 monthly users
  • 9 stars
  • 94.1% runs succeeded
  • 7.3 hours response time
  • Created in Mar 2018
  • Modified 11 months ago
Categories