Article Crawler (FoxNews, CNBC, etc.)
Under maintenance
This Actor is under maintenance.
This Actor may be unreliable while under maintenance.
vee.theengineer/article-crawler-2
Scraping articles from FoxNews and CNBC (more extensibility in the fur…)
News Article Scraper
Web scraper for extracting structured article data from news websites (currently supports Fox News and CNBC).
Features
- Handles different article page types (main, business, weather)
- Filters undesired URLs/category pages
- Extracts article content, metadata, and related links
- Built with Apify Platform and Puppeteer
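As a rough illustration of the URL filtering feature, the sketch below separates category pages from article pages by looking at the URL path. The function name `isLikelyArticleUrl` and the heuristic itself (at least two path segments, ending in a hyphenated slug) are assumptions made for this example; the Actor's actual rules are not documented here.

```typescript
// Hypothetical heuristic, not the Actor's actual rule: a URL is treated as an
// article when its path has at least two segments and the last one looks like
// a hyphenated slug.
function isLikelyArticleUrl(rawUrl: string): boolean {
  // Capture the path portion of an absolute http(s) URL, without query or hash.
  const match = /^https?:\/\/[^\/?#]+([^?#]*)/.exec(rawUrl);
  if (match === null) return false; // not an absolute http(s) URL

  const path = match[1] ?? "";
  const segments = path.split("/").filter((s) => s.length > 0);

  // A single segment such as /politics is assumed to be a category page.
  if (segments.length < 2) return false;

  const slug = segments[segments.length - 1] ?? "";
  return slug.indexOf("-") !== -1;
}
```

Under this assumption, `https://www.foxnews.com/politics` is skipped as a category page, while a deeper path ending in a hyphenated slug is kept.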
Input Format
    {
      "urls": [
        { "url": "https://www.foxnews.com/politics", "site": "foxnews" },
        { "url": "https://www.cnbc.com/technology", "site": "cnbc" }
      ],
      "maxPages": 100,
      "skippingCategoryPages": true
    }
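The input shape above could be typed and checked before a run along these lines. This is a minimal sketch: the names `Site`, `StartUrl`, `CrawlerInput`, and `parseInput` are hypothetical, and treating all three fields as required is an assumption, since the README does not say which fields are optional or what their defaults are.

```typescript
// Assumption: only the two sites named in this README are valid.
type Site = "foxnews" | "cnbc";

interface StartUrl {
  url: string;
  site: Site;
}

interface CrawlerInput {
  urls: StartUrl[];
  maxPages: number;
  skippingCategoryPages: boolean;
}

function isSite(value: unknown): value is Site {
  return value === "foxnews" || value === "cnbc";
}

// Hypothetical validator: throws on the first problem it finds.
function parseInput(raw: unknown): CrawlerInput {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("input must be an object");
  }
  const obj = raw as Record<string, unknown>;

  const urlsRaw = obj["urls"];
  if (!Array.isArray(urlsRaw) || urlsRaw.length === 0) {
    throw new Error("urls must be a non-empty array");
  }
  const urls = urlsRaw.map((entry: unknown, index: number): StartUrl => {
    if (typeof entry !== "object" || entry === null) {
      throw new Error(`urls[${index}] must be an object`);
    }
    const e = entry as Record<string, unknown>;
    const url = e["url"];
    const site = e["site"];
    if (typeof url !== "string") throw new Error(`urls[${index}].url must be a string`);
    if (!isSite(site)) throw new Error(`urls[${index}].site must be "foxnews" or "cnbc"`);
    return { url, site };
  });

  const maxPages = obj["maxPages"];
  if (typeof maxPages !== "number" || maxPages < 1 || maxPages % 1 !== 0) {
    throw new Error("maxPages must be a positive integer");
  }
  const skippingCategoryPages = obj["skippingCategoryPages"];
  if (typeof skippingCategoryPages !== "boolean") {
    throw new Error("skippingCategoryPages must be a boolean");
  }
  return { urls, maxPages, skippingCategoryPages };
}
```

Passing the example input through `parseInput` returns it unchanged in typed form, while an entry with an unsupported `site` value raises an error before any crawling starts.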
Output Schema
Extracted data structure:
    type RawArticlePage = {
      url: string;
      page_title: string;
      article_title_element?: string;
      content_elements?: string[];
      author_elements?: string[];
      article_published_datetime_element?: string;
      article_updated_datetime_element?: string;
      tag_elements?: string[];
      tags?: string[];
      category_element?: string;
      article_title?: string;
      content?: string[];
      authors?: string[];
      article_published_datetime?: string;
      article_updated_datetime?: string;
      category?: string;
      other_article_links?: string[];
      other_links?: string[];
      scrape_status: ScrapeStatus;
      scraped_at: DateTime;
    };
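Because most fields in the output are optional, code that consumes a dataset item may want fallbacks. The sketch below is one possible approach, not part of the Actor: `ArticleLike` is a hypothetical subset of `RawArticlePage`, `toPlainText` is a hypothetical helper, and it assumes that each entry in `content` is one paragraph of text.

```typescript
// Hypothetical subset of RawArticlePage with only the fields used here.
type ArticleLike = {
  url: string;
  page_title: string;
  article_title?: string;
  content?: string[];
};

// Builds a plain-text version of an article: the title, then each paragraph,
// separated by blank lines. Falls back to page_title when article_title is
// missing, and to no body when content is missing.
function toPlainText(article: ArticleLike): string {
  const title = article.article_title ?? article.page_title;
  const paragraphs = article.content ?? [];
  return [title].concat(paragraphs).join("\n\n");
}
```

An item with no extracted `article_title` or `content` then yields just its `page_title`, so downstream code does not need to check each optional field itself.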