News Website Crawler & Article Extractor

2-hour trial, then $20.00/month. No credit card required.

xtech/news-source-crawler

Scrape all articles from any news website. Extract full text, metadata, keywords, and summaries. Ideal for content analysis, research, and news aggregation.

Developer
Maintained by Community

Actor Metrics

  • 21 monthly users

  • No reviews yet

  • 2 bookmarks

  • >99% runs succeeded

  • 13 days response time

  • Created in Feb 2025

  • Modified 10 days ago

News Source URL

url (string, required)

The URL of the news website you want to crawl (e.g., https://www.cnn.com).

Maximum Articles

maxArticles (integer, optional)

The maximum number of articles to scrape. Leave blank for no limit.

Default value of this property is 100

Language

language (string, optional)

The language of the articles as a two-letter ISO 639-1 code (e.g., 'en', 'es').

Default value of this property is "en"

Keyword Search [Optional]

keywordSearch (string, optional)

Keep only articles that contain specific keywords. Leave blank to get all articles. For advanced searches, you can combine keywords with the AND, OR, and NOT operators and parentheses (e.g., 'climate AND (change OR crisis) NOT politics').
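The page does not document how the operators are parsed, so the following is only a sketch of one plausible reading of the example query, not the actor's actual implementation. It assumes case-insensitive whole-word matching, the precedence NOT > AND > OR, and that a bare NOT after a term means "AND NOT" (as in the example). The function name `matches` is hypothetical.

```python
import re

TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")  # parentheses or runs of non-space text


def matches(query, text):
    """Return True if `text` satisfies the boolean keyword `query` (assumed semantics)."""
    tokens = TOKEN_RE.findall(query)
    if not tokens:
        return True  # blank query: keep every article
    words = set(re.findall(r"\w+", text.lower()))
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def parse_or():
        value = parse_and()
        while peek() == "OR":
            advance()
            rhs = parse_and()  # parse first so tokens are always consumed
            value = value or rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() in ("AND", "NOT"):
            if peek() == "AND":
                advance()  # a bare NOT is treated as an implicit AND NOT
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():
        if peek() == "NOT":
            advance()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        tok = peek()
        if tok is None or tok in ("AND", "OR", ")"):
            raise ValueError(f"unexpected token in query: {tok!r}")
        advance()
        if tok == "(":
            value = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            advance()
            return value
        return tok.lower() in words  # plain keyword: whole-word lookup

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token in query: {tokens[pos]!r}")
    return result


query = "climate AND (change OR crisis) NOT politics"
keep = matches(query, "Climate change is accelerating")
drop = matches(query, "Climate crisis dominates politics")
```

Under these assumptions the first headline is kept and the second is dropped because it contains "politics". Whole-word matching means "political" would not trigger the NOT clause; the real actor may behave differently.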

Extract Summary & Keywords

extractSummary (boolean, optional)

Generate article summaries and keywords using NLP.

Default value of this property is true

Include Images

includeImages (boolean, optional)

Whether to include image URLs in the results.

Default value of this property is true

Minimum Word Count

minWordCount (integer, optional)

Skip articles that have fewer words than this number. Use 0 for no minimum.

Default value of this property is 0
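As a rough illustration of the threshold, the sketch below counts whitespace-separated words and keeps an article only when it has at least the minimum number. How the actor actually counts words is not stated on this page, so the counting rule and the helper name `passes_min_word_count` are assumptions.

```python
def passes_min_word_count(text, min_word_count=0):
    """Keep text with at least `min_word_count` whitespace-separated words.

    A value of 0 disables the filter, since every text has >= 0 words.
    """
    return len(text.split()) >= min_word_count


# Hypothetical sample data, for illustration only.
articles = [
    {"title": "Brief", "text": "Too short to keep."},
    {"title": "Feature", "text": "word " * 300},
]

kept_titles = [a["title"] for a in articles if passes_min_word_count(a["text"], 200)]
all_titles = [a["title"] for a in articles if passes_min_word_count(a["text"], 0)]
```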

Advanced Options

advancedOptions (object, optional)

Additional configuration options for advanced users.

Proxy Configuration

proxyConfiguration (object, optional)

Proxy settings. Use Apify Proxy for automatic proxy management.

Default value of this property is {}
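The fields above can be assembled into a single run input. The sketch below builds that input as a Python dictionary using the property names and defaults listed on this page. Two things are assumptions: the `useApifyProxy` key follows the usual Apify proxy input shape but is not confirmed here, and `advancedOptions` is left out because its keys are not documented. The helper names `build_run_input` and `run_actor` are hypothetical, and `run_actor` assumes the third-party `apify-client` package is installed.

```python
import json


def build_run_input(url, max_articles=100, language="en", keyword_search="",
                    extract_summary=True, include_images=True,
                    min_word_count=0, use_apify_proxy=True):
    """Assemble an input object from the documented fields and defaults."""
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must be an absolute http(s) URL")
    if len(language) != 2 or not language.isalpha():
        raise ValueError("language must be a two-letter code such as 'en'")
    run_input = {
        "url": url,
        "maxArticles": max_articles,
        "language": language.lower(),
        "extractSummary": extract_summary,
        "includeImages": include_images,
        "minWordCount": min_word_count,
        # Assumed shape of the proxy object; not confirmed by this page.
        "proxyConfiguration": {"useApifyProxy": use_apify_proxy},
    }
    if keyword_search:
        run_input["keywordSearch"] = keyword_search  # omit to get all articles
    return run_input


def run_actor(token, run_input):
    """Start the actor and return its dataset items (needs network access)."""
    from apify_client import ApifyClient  # third-party: pip install apify-client

    client = ApifyClient(token)
    run = client.actor("xtech/news-source-crawler").call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


example_input = build_run_input(
    "https://www.cnn.com",
    max_articles=25,
    keyword_search="climate AND (change OR crisis) NOT politics",
)
example_json = json.dumps(example_input, indent=2)
```

The `example_json` string can be pasted into a JSON input editor, while `run_actor` shows how the same dictionary might be submitted programmatically with an API token.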