Smart Article Extractor

lukaskrivka/article-extractor-smart

📰 Smart Article Extractor extracts articles from any scientific, academic, or news website with just one click. The extractor crawls the whole website and automatically distinguishes articles from other web pages. Download your data as HTML table, JSON, Excel, RSS feed, and more.

Re: previous issue

Closed

pzubkiewicz opened this issue
6 months ago

Sorry, I can't reply to your response.

I tried something similar, but it didn't scrape any content.

{text: $("//*[@id='middle-panel']/article/div[5] | //div[@class='pb-5']").text().trim()}

However, in Chrome this XPath finds the correct section of a webpage.

Could you please explain why?

pzubkiewicz

6 months ago

I am using | in the XPath so it can handle different HTML structures on this particular page.

lukaskrivka

Hello,

You should be able to reply to the closed issue as well. I cannot get the XPath to work, but I don't have much experience with it. I will see if any colleagues can give me advice. How exactly do you run it in Chrome?

lukaskrivka

I realized that we don't use a browser to run the parser, so only CSS selectors are available. That is why the XPath expression matches in Chrome but returns nothing here.
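
Since only CSS selectors are available, the XPath union from the question could be rewritten roughly as follows. This is a minimal sketch, not a tested fix: it assumes `$` is a Cheerio-style (jQuery-like) handle, the names `ARTICLE_SELECTOR` and `extractText` are hypothetical, and the selector has not been checked against the actual page.

```javascript
// CSS counterpart of the XPath from the question:
//   //*[@id='middle-panel']/article/div[5] | //div[@class='pb-5']
//
// - XPath `|` (union) maps to the CSS selector list separator `,`.
// - XPath `div[5]` means "the fifth <div> child", counting only <div>
//   siblings, so it maps to `div:nth-of-type(5)`, not `:nth-child(5)`.
// - XPath `@class='pb-5'` is an exact match on the whole class attribute,
//   which `div[class='pb-5']` mirrors; `div.pb-5` would be a looser
//   alternative that also matches elements carrying extra classes.
const ARTICLE_SELECTOR =
    "#middle-panel > article > div:nth-of-type(5), div[class='pb-5']";

// Hypothetical extraction function. `$` is assumed to accept a CSS
// selector and return an object with a `.text()` method.
const extractText = ($) => ({
    text: $(ARTICLE_SELECTOR).text().trim(),
});
```

If both branches of the selector match on the same page, `.text()` concatenates the text of every matched element, so it may be worth checking that only one branch applies per page layout.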

Developer
Maintained by Apify

Actor Metrics

  • 197 monthly users

  • 65 stars

  • >99% runs succeeded

  • 1.2 days response time

  • Created in Nov 2019

  • Modified 4 months ago
