Article Extractor scrapes detailed data from all articles on any website. It automatically recognizes which pages are articles. It can be used to extract news from BBC, CNN, Bloomberg, and other popular news websites.
- Users: 683
- Runs: 84,607
Optional
boolean
This option is only viable for smaller runs. If you plan to use this at large scale, use `onlyNewArticlesPerDomain` instead. If true, only new articles are scraped each time you run it. All URLs you have scraped are saved in a dataset called `articles-state` and are compared with new ones.
Optional
boolean
If true, only new articles are scraped each time you run it. All URLs you have scraped are saved and compared with new ones. Scraped articles are saved in one dataset per domain; the datasets are named 'ARTICLES-SCRAPED-domain'.
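The per-domain comparison described above can be pictured with a small sketch. This is only an illustration of the idea, not the actor's actual implementation: the function names `dataset_name_for` and `filter_new_articles` are hypothetical, a plain in-memory dict stands in for the stored datasets, and taking the domain directly from the URL's host part (including any `www.` prefix) is an assumption.

```python
from urllib.parse import urlparse


def dataset_name_for(url: str) -> str:
    """Build a per-domain dataset name following the 'ARTICLES-SCRAPED-domain' pattern.

    Assumption: the domain is the URL's host part, lowercased.
    """
    domain = urlparse(url).netloc.lower()
    return f"ARTICLES-SCRAPED-{domain}"


def filter_new_articles(candidate_urls, seen_by_dataset):
    """Return only the URLs that were not seen before and record them as seen.

    seen_by_dataset is a dict {dataset_name: set_of_urls}; it is a hypothetical
    in-memory stand-in for the datasets that persist between runs.
    """
    new_urls = []
    for url in candidate_urls:
        seen = seen_by_dataset.setdefault(dataset_name_for(url), set())
        if url not in seen:
            seen.add(url)
            new_urls.append(url)
    return new_urls


# Two simulated runs sharing the same state.
state = {}
first_run = filter_new_articles(
    ["https://www.bbc.com/news/a", "https://edition.cnn.com/b"], state
)
second_run = filter_new_articles(
    ["https://www.bbc.com/news/a", "https://www.bbc.com/news/c"], state
)
print(first_run)
print(second_run)
```

In this sketch the second run keeps only the URL that the first run had not stored, and each domain's URLs are tracked separately, which is presumably what makes the per-domain option more practical for large runs than a single shared state.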
Optional
string