News Website Crawler & Article Extractor

Pricing

$20.00/month + usage

Developed by

Xtech

Maintained by Community

Scrape all articles from any news website. Extract full text, metadata, keywords, and summaries. Ideal for content analysis, research, and news aggregation.

0.0 (0)

3

Total users: 94

Monthly users: 33

Runs succeeded: >99%

Last modified: 8 days ago

News Source URL

url (string, required)

The URL of the news website you want to crawl (e.g., https://www.cnn.com).

Maximum Articles

maxArticles (integer, optional)

The maximum number of articles to scrape. Set to 0 for no limit.

Default value of this property is 100

Language

language (enum, optional)

The language of the articles. Choose from supported languages.

Value options:

"en", "es", "fr", "de", "it", "pt", "ru", "zh", "ja", "ko", "ar", "nl", "sv", "da", "no", "fi", "pl", "he", "tr", "hu", "el", "uk", "vi", "id", "sw", "fa", "hi", "hr", "bg", "et", "mk", "be", "sl", "sr", "ro" (all strings)

Default value of this property is "en"

Keyword Search [Optional]

keywordSearch (string, optional)

Filter articles containing specific keywords. Leave blank to get all articles. Supports boolean operators: AND, OR, NOT with parentheses (e.g., 'climate AND (change OR crisis) NOT politics').
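To make the boolean syntax concrete, here is a minimal Python sketch of how an expression such as the example above could be evaluated against a piece of text. It is an illustration only: the actor's real parser is not documented here, and the function name `matches`, the substring matching, and the treatment of `NOT` as an implicit `AND NOT` are all assumptions.

```python
import re

# Splits an expression into parentheses and whitespace-separated words.
TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")


def matches(expr, text, case_sensitive=False):
    """Return True if `text` satisfies the boolean keyword expression.

    Hypothetical helper: operators must be upper-case (AND, OR, NOT),
    and a keyword matches when it occurs anywhere in the text.
    """
    if not expr.strip():
        return True  # a blank filter keeps every article
    hay = text if case_sensitive else text.lower()
    tokens = TOKEN_RE.findall(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def parse_or():
        nonlocal pos
        value = parse_and()
        while peek() == "OR":
            pos += 1
            rhs = parse_and()  # parse first so tokens are always consumed
            value = value or rhs
        return value

    def parse_and():
        nonlocal pos
        value = parse_not()
        # Explicit AND, or an implicit AND before NOT, a keyword, or "(".
        while peek() is not None and peek() not in ("OR", ")"):
            if peek() == "AND":
                pos += 1
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():
        nonlocal pos
        if peek() == "NOT":
            pos += 1
            return not parse_not()
        return parse_atom()

    def parse_atom():
        nonlocal pos
        tok = peek()
        if tok is None or tok in ("AND", "OR", ")"):
            raise ValueError(f"unexpected token: {tok!r}")
        pos += 1
        if tok == "(":
            value = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        needle = tok if case_sensitive else tok.lower()
        return needle in hay

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token: {tokens[pos]!r}")
    return result
```

With this sketch, `matches("climate AND (change OR crisis) NOT politics", "Climate crisis deepens")` accepts the text, while the same expression rejects "Climate change politics". Because matching is by substring, "change" would also match "exchange"; a word-boundary match may be closer to what a real filter does.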

Search in Article Titles

searchInTitle (boolean, optional)

Apply keyword search to article titles.

Default value of this property is true

Search in Article Content

searchInContent (boolean, optional)

Apply keyword search to article text content.

Default value of this property is true

Case Sensitive Search

caseSensitive (boolean, optional)

Make keyword search case sensitive.

Default value of this property is false
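The three search options (searchInTitle, searchInContent, caseSensitive) together decide which text a keyword is compared against. The Python sketch below shows one plausible way to combine them; the helper name `searchable_text` and the behavior when both scopes are disabled are assumptions, not documented actor behavior.

```python
def searchable_text(title, content, search_in_title=True,
                    search_in_content=True, case_sensitive=False):
    """Build the text a keyword filter would be applied to (hypothetical helper)."""
    parts = []
    if search_in_title:
        parts.append(title)
    if search_in_content:
        parts.append(content)
    # Assumption: with both scopes disabled there is nothing to search,
    # so the result is an empty string and no keyword can match.
    text = "\n".join(parts)
    return text if case_sensitive else text.casefold()
```

Case folding is applied only when the search is case insensitive, so a case-sensitive search sees the title and content exactly as extracted.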

Minimum Word Count

minWordCount (integer, optional)

Skip articles that have fewer words than this number. Use 0 for no minimum.

Default value of this property is 0
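As a rough illustration of the minimum word count rule, the Python sketch below counts whitespace-separated words and treats 0 as "no minimum". How the actor actually counts words is not stated, so the helper name `passes_min_word_count` and the whitespace-based counting are assumptions; that approach would undercount languages written without spaces, such as Chinese or Japanese.

```python
def passes_min_word_count(text, min_word_count=0):
    """Return True if the article is long enough to keep (hypothetical helper)."""
    if min_word_count <= 0:
        return True  # 0 means no minimum
    # Assumption: a "word" is any run of non-whitespace characters.
    return len(text.split()) >= min_word_count
```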

Extract Summary & Keywords

extractSummary (boolean, optional)

Generate article summaries and keywords using Natural Language Processing (NLP). Note: This increases processing time.

Default value of this property is true

Include Images

includeImages (boolean, optional)

Extract and include image URLs from articles.

Default value of this property is true

Request Timeout (seconds)

requestTimeout (integer, optional)

Timeout for each article request in seconds.

Default value of this property is 7

Concurrent Requests

concurrency (integer, optional)

Number of articles to process simultaneously. Higher values speed up crawling but may trigger rate limits on the target site.

Default value of this property is 5
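The interaction between a per-request timeout and a concurrency limit can be sketched in Python with asyncio. This is a generic pattern, not the actor's implementation: `process_all` and the caller-supplied `fetch` coroutine are hypothetical names, and returning None for a timed-out article is an assumption.

```python
import asyncio


async def process_all(urls, fetch, concurrency=5, request_timeout=7):
    """Fetch every URL with at most `concurrency` requests in flight.

    `fetch` is any coroutine function taking a URL. A request that runs
    longer than `request_timeout` seconds yields None instead of a result.
    """
    semaphore = asyncio.Semaphore(concurrency)

    async def process_one(url):
        async with semaphore:  # waits here while the limit is reached
            try:
                return await asyncio.wait_for(fetch(url), timeout=request_timeout)
            except asyncio.TimeoutError:
                return None  # assumption: timed-out articles are skipped

    # gather preserves the order of `urls` in the returned list.
    return await asyncio.gather(*(process_one(u) for u in urls))
```

Raising the concurrency shortens the total run time only while the target site keeps answering; once it starts throttling, more requests hit the timeout and come back empty.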

Proxy Configuration

proxyConfiguration (object, optional)

Configure proxy settings for reliable scraping. Apify Proxy is recommended for best results.

Default value of this property is {"useApifyProxy":false}
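Putting the fields together, the Python sketch below assembles a run input from the documented defaults and prints it as JSON. The field names and default values come from the listing above; the helper `build_input` and its validation rules are hypothetical, and keywordSearch is left out of the defaults because no default is stated for it.

```python
import copy
import json

# Defaults as stated in the input schema above.
DEFAULTS = {
    "maxArticles": 100,
    "language": "en",
    "searchInTitle": True,
    "searchInContent": True,
    "caseSensitive": False,
    "minWordCount": 0,
    "extractSummary": True,
    "includeImages": True,
    "requestTimeout": 7,
    "concurrency": 5,
    "proxyConfiguration": {"useApifyProxy": False},
}
ALLOWED = set(DEFAULTS) | {"keywordSearch"}


def build_input(url, **overrides):
    """Return a run-input dict: required url, defaults, then overrides."""
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must start with http:// or https://")
    unknown = set(overrides) - ALLOWED
    if unknown:
        raise ValueError(f"unknown input fields: {sorted(unknown)}")
    # Deep copy so callers cannot mutate the shared defaults.
    return {"url": url, **copy.deepcopy(DEFAULTS), **overrides}


if __name__ == "__main__":
    run_input = build_input(
        "https://www.cnn.com",
        maxArticles=20,
        keywordSearch="climate AND (change OR crisis) NOT politics",
        proxyConfiguration={"useApifyProxy": True},
    )
    print(json.dumps(run_input, indent=2))
```

The resulting dictionary could then be supplied as the run input when starting the actor, for example through the Apify console or an API client.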