Smart Article Extractor

lukaskrivka/article-extractor-smart

📰 Smart Article Extractor extracts articles from any scientific, academic, or news website with just one click. The extractor crawls the whole website and automatically distinguishes articles from other web pages. Download your data as an HTML table, JSON, Excel, RSS feed, and more.

Does not support Hebrew

Closed

abiding_flare opened this issue
a year ago

I've tried to run this scraper on Hebrew websites and it didn't work. Is there a way to add Hebrew?

lukaskrivka

Hi Makan,

I'm sorry, but this scraper will very likely perform poorly on Hebrew (though it might work on some websites).

For less detailed parsing, you can try https://apify.com/apify/website-content-crawler, which should work better for non-English content.

abiding_flare

a year ago

Thank you for your quick response. I do need all of the parsing. It is an amazing tool and exactly what I need (I tried it in English with great results).

abiding_flare

a year ago

Could this help with adjusting the scraper to work in Hebrew? https://stackoverflow.com/questions/1365510/where-can-i-find-a-list-of-hebrew-stop-words

lukaskrivka

I don't think that will be enough. There is quite a complicated parsing library behind it. If you know of a good one for Hebrew, we could add it as a backend.

abiding_flare

a year ago

Will this help? https://github.com/amir-zeldes/HebPipe Thank you

abiding_flare

a year ago

Hi, Do you think this https://github.com/amir-zeldes/HebPipe could help? I tried to run the scraper with no minimum words per article and got some information but not the full text. https://console.apify.com/view/runs/A4SU2iteF1ZF8Mkg3

I would appreciate it if this could work for Hebrew.

abiding_flare

a year ago

Hi, Is there any update on this? Thank you!

lukaskrivka

Hi Makam, I'm sorry but this is a big feature and we don't see enough demand to justify the development cost just yet.

abiding_flare

a year ago

Hi, Is there a way for us to pay for such a development?

lukaskrivka

Hi Makam, you have 2 options:

  1. Ask some of our partners or freelancers to build this from scratch.
  2. Reach out to https://apify.com/enterprise

The best bet for Hebrew is probably GPT, so https://apify.com/drobnikj/gpt-scraper might help here.

lukaskrivka

I will close this issue now. We will keep your request in mind for this scraper in case an easier path forward is found.

Developer
Maintained by Apify
Actor metrics
  • 194 monthly users
  • 47 stars
  • 99.9% runs succeeded
  • 1.9 days response time
  • Created in Nov 2019
  • Modified 15 days ago