WSJ Scraper
Deprecated

This Actor is unavailable because the developer has decided to deprecate it.

lucier/wsj-scraper

Scrape news data from wsj.com with this unofficial API. Extract articles, monitor their popularity and performance and automate the fight against fake news. Filter the results by authors, topics, categories, or publication dates. Preview or download the results in your preferred format.

Start URLs

startUrls (array, required)

Can be the main page URL or any category URL. Article pages linked from these URLs are found and added to the crawl queue.
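To illustrate how article pages might be found and enqueued from a start URL, here is a minimal sketch using only the Python standard library. It is not the Actor's actual code: the function name find_article_urls is hypothetical, and it assumes that WSJ article pages live under the /articles/ path and that query strings can be dropped when de-duplicating.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href values of all <a> tags on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_article_urls(page_url: str, html: str) -> list[str]:
    """Return absolute, de-duplicated article URLs linked from a start page.

    Hypothetical helper; assumes article pages use the /articles/ path.
    """
    collector = LinkCollector()
    collector.feed(html)
    seen: set[str] = set()
    articles: list[str] = []
    for href in collector.hrefs:
        parsed = urlparse(urljoin(page_url, href))
        host = parsed.hostname or ""
        # Keep only links that stay on wsj.com or one of its subdomains.
        if host != "wsj.com" and not host.endswith(".wsj.com"):
            continue
        if not parsed.path.startswith("/articles/"):
            continue  # category or other non-article page
        # Drop query string and fragment so each article is enqueued once.
        clean = parsed._replace(query="", fragment="").geturl()
        if clean not in seen:
            seen.add(clean)
            articles.append(clean)
    return articles
```

For example, a category page linking to the same article twice (once with a tracking parameter), to another category, and to an external site yields a single article URL.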

Maximum number of articles

maxArticlesPerCrawl (integer, optional)

Maximum number of valid articles to scrape. The crawler stops automatically once it reaches this number.
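The stopping rule can be sketched as a queue-driven loop that counts only valid articles. This is an illustration under assumptions, not the Actor's implementation: fetch_article is a hypothetical callback that returns a dict for a valid article and None for anything else, and None for max_articles is assumed to mean "no limit".

```python
from collections import deque
from typing import Callable, Optional


def crawl(
    article_urls: list[str],
    fetch_article: Callable[[str], Optional[dict]],
    max_articles: Optional[int] = None,
) -> list[dict]:
    """Scrape queued URLs until max_articles valid articles are collected.

    fetch_article is hypothetical: it returns a dict for a valid article
    and None for pages that should not count as articles.
    """
    queue = deque(article_urls)
    results: list[dict] = []
    while queue:
        if max_articles is not None and len(results) >= max_articles:
            break  # limit reached: stop the crawl early
        url = queue.popleft()
        article = fetch_article(url)
        if article is None:
            continue  # invalid pages do not count toward the limit
        results.append(article)
    return results
```

Because invalid pages are skipped before the counter is checked again, the limit applies to valid articles only, matching the description above.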

Date from

dateFrom (string, optional)

Only articles published from this date to the present are scraped. If empty, all articles are scraped. Use either an absolute date in YYYY-MM-DD format, e.g. 2019-12-31, or a relative period, e.g. 1 week or 20 days.
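One way to interpret both accepted formats is to turn dateFrom into a cutoff date and keep articles published on or after it. The sketch below is hypothetical (parse_date_from and is_in_range are illustrative names). It assumes relative values have the form "<number> <unit>" with only days and weeks as units, since those are the only examples given; the Actor may accept others.

```python
import re
from datetime import date, datetime, timedelta
from typing import Optional

# Assumption: relative values look like "1 week" or "20 days".
_RELATIVE = re.compile(r"^\s*(\d+)\s*(days?|weeks?)\s*$", re.IGNORECASE)


def parse_date_from(value: Optional[str], today: date) -> Optional[date]:
    """Turn a dateFrom input into a cutoff date (None means no cutoff)."""
    if not value or not value.strip():
        return None  # empty: scrape all articles
    match = _RELATIVE.match(value)
    if match:
        amount = int(match.group(1))
        unit = match.group(2).lower()
        days = amount * 7 if unit.startswith("week") else amount
        return today - timedelta(days=days)
    # Otherwise expect an absolute YYYY-MM-DD date; raises ValueError if not.
    return datetime.strptime(value.strip(), "%Y-%m-%d").date()


def is_in_range(published: date, cutoff: Optional[date]) -> bool:
    """Keep articles published on or after the cutoff day."""
    return cutoff is None or published >= cutoff
```

Passing today explicitly keeps the relative calculation deterministic and easy to test.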

Only new articles

onlyNewArticlesPerDomain (boolean, optional)

If true, each run scrapes only new articles: every URL found is compared with the URLs saved from previous runs, and already scraped ones are skipped. Scraped articles are saved in one dataset per domain, named 'ARTICLES-SCRAPED-domain'.

Default value: false.
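The per-domain comparison can be sketched with an in-memory dictionary standing in for the named datasets that persist between runs. Everything here is an assumption for illustration: the helper names are hypothetical, and so is how the domain appears in the dataset name (here the host name with a leading "www." removed).

```python
from urllib.parse import urlparse


def dataset_name_for(url: str) -> str:
    """Build a per-domain dataset name, e.g. ARTICLES-SCRAPED-wsj.com.

    Assumption: the domain is the host name without a leading "www.".
    """
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    return f"ARTICLES-SCRAPED-{host}"


def filter_new(urls: list[str], saved: dict[str, set[str]]) -> list[str]:
    """Return only URLs not seen in earlier runs and record them as seen.

    `saved` maps dataset names to previously scraped URLs; it stands in
    for the named datasets the Actor keeps between runs.
    """
    fresh: list[str] = []
    for url in urls:
        seen = saved.setdefault(dataset_name_for(url), set())
        if url not in seen:
            seen.add(url)
            fresh.append(url)
    return fresh
```

Running filter_new twice with overlapping URL lists returns, on the second call, only the URLs that the first call had not already recorded.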

Developer
Maintained by Community