Smart Article Extractor
📰 Smart Article Extractor extracts articles from any scientific, academic, or news website with just one click. The extractor crawls the whole website and automatically distinguishes articles from other web pages. Download your data as an HTML table, JSON, Excel, RSS feed, and more.
I've tried to run this scraper on Hebrew websites and it didn't work. Is there a way to add Hebrew?
Hi Makan,
I'm sorry, but this scraper will very likely perform poorly on Hebrew, although it might work on some websites.
For less detailed parsing, you can try https://apify.com/apify/website-content-crawler, which should work better for non-English content.
Thank you for your quick response, I do need all of the parsing. It is an amazing tool and exactly what I need. (I tried it in English with great results).
Could this help with adjusting the scraper to work in Hebrew? https://stackoverflow.com/questions/1365510/where-can-i-find-a-list-of-hebrew-stop-words
I don't think that will be enough. There is quite a complicated parsing library behind it. If you know of a good parsing library for Hebrew, we could add it as a backend.
Will this help? https://github.com/amir-zeldes/HebPipe Thank you
Hi, Do you think this https://github.com/amir-zeldes/HebPipe could help? I tried to run the scraper with no minimum words per article and got some information but not the full text. https://console.apify.com/view/runs/A4SU2iteF1ZF8Mkg3
I would appreciate it if this could work for Hebrew.
Hi, Is there any update on this? Thank you!
Hi Makam, I'm sorry but this is a big feature and we don't see enough demand to justify the development cost just yet.
Hi, Is there a way for us to pay for such a development?
Hi Makam, you have two options:
- Ask some of our partners or freelancers to build this from scratch.
- Reach out to https://apify.com/enterprise
The best bet for Hebrew is probably a GPT-based approach, so https://apify.com/drobnikj/gpt-scraper might help here.
I will close this issue now. We will keep your request in mind for this scraper in case an easier path forward is found.
Actor Metrics
278 monthly users
80 stars
>99% runs succeeded
2.3 days response time
Created in Nov 2019
Modified a month ago