
News Website Crawler & Article Extractor
Pricing
$20.00/month + usage

Scrape all articles from any news website. Extract full text, metadata, keywords, and summaries. Ideal for content analysis, research, and news aggregation.
3
Total users: 94
Monthly users: 33
Runs succeeded: >99%
Last modified: 8 days ago
News Source URL
url
string (required)
The URL of the news website you want to crawl (e.g., https://www.cnn.com).
Maximum Articles
maxArticles
integer (optional)
The maximum number of articles to scrape. Set to 0 for no limit.
Default value of this property is 100
Language
language
enum (optional)
The language of the articles. Choose from supported languages.
Value options (all strings):
"en", "es", "fr", "de", "it", "pt", "ru", "zh", "ja", "ko", "ar", "nl", "sv", "da", "no", "fi", "pl", "he", "tr", "hu", "el", "uk", "vi", "id", "sw", "fa", "hi", "hr", "bg", "et", "mk", "be", "sl", "sr", "ro"
Default value of this property is "en"
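
Because language accepts only a fixed set of codes, it can help to check the value before starting a run. The following is a minimal sketch, not part of the actor: the helper name normalize_language is hypothetical, and lowercasing the input before comparison is an assumption made for convenience.

```python
from typing import Optional

# Codes copied from the "Value options" list above.
SUPPORTED_LANGUAGES = frozenset(
    "en es fr de it pt ru zh ja ko ar nl sv da no fi pl he tr hu el uk "
    "vi id sw fa hi hr bg et mk be sl sr ro".split()
)

DEFAULT_LANGUAGE = "en"  # documented default


def normalize_language(code: Optional[str] = None) -> str:
    """Return a supported language code, or the default if none is given.

    Hypothetical helper: trimming and lowercasing are assumptions, not
    documented behavior of the actor.
    """
    if code is None or not code.strip():
        return DEFAULT_LANGUAGE
    cleaned = code.strip().lower()
    if cleaned not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {code!r}")
    return cleaned
```

Rejecting unknown codes locally avoids paying for a run that would fail or fall back to an unintended language.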
Keyword Search [Optional]
keywordSearch
string (optional)
Filter articles containing specific keywords. Leave blank to get all articles. Supports boolean operators: AND, OR, NOT with parentheses (e.g., 'climate AND (change OR crisis) NOT politics').
Search in Article Titles
searchInTitle
boolean (optional)
Apply keyword search to article titles.
Default value of this property is true
Search in Article Content
searchInContent
boolean (optional)
Apply keyword search to article text content.
Default value of this property is true
Case Sensitive Search
caseSensitive
boolean (optional)
Make keyword search case sensitive.
Default value of this property is false
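
To make the keyword options above concrete, here is a rough Python sketch of how a boolean query such as 'climate AND (change OR crisis) NOT politics' could be evaluated together with searchInTitle, searchInContent, and caseSensitive. It is an illustration only, not the actor's implementation. Several points are assumptions: operators are recognized only in uppercase, NOT binds tighter than AND, which binds tighter than OR, adjacent operands are joined by an implicit AND, terms match as substrings, and title and content are searched as one combined text. The names compile_query and article_matches are hypothetical.

```python
import re
from typing import Callable, Optional

Predicate = Callable[[str], bool]


def both(a: Predicate, b: Predicate) -> Predicate:
    return lambda text: a(text) and b(text)


def either(a: Predicate, b: Predicate) -> Predicate:
    return lambda text: a(text) or b(text)


def negate(a: Predicate) -> Predicate:
    return lambda text: not a(text)


def compile_query(query: str, case_sensitive: bool = False) -> Predicate:
    """Compile a boolean keyword query into a predicate (assumed semantics)."""
    tokens = re.findall(r"\(|\)|[^\s()]+", query)
    if not tokens:
        return lambda text: True  # blank query keeps every article
    pos = 0

    def peek() -> Optional[str]:
        return tokens[pos] if pos < len(tokens) else None

    def take() -> str:
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def parse_or() -> Predicate:
        left = parse_and()
        while peek() == "OR":
            take()
            left = either(left, parse_and())
        return left

    def parse_and() -> Predicate:
        left = parse_not()
        # Anything that is not OR, ")" or end of input continues the AND chain,
        # so "a NOT b" is read as "a AND NOT b" (assumption).
        while peek() not in (None, ")", "OR"):
            if peek() == "AND":
                take()
            left = both(left, parse_not())
        return left

    def parse_not() -> Predicate:
        if peek() == "NOT":
            take()
            return negate(parse_not())
        return parse_atom()

    def parse_atom() -> Predicate:
        tok = peek()
        if tok is None or tok in (")", "AND", "OR"):
            raise ValueError(f"unexpected token: {tok!r}")
        take()
        if tok == "(":
            inner = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            take()
            return inner
        needle = tok if case_sensitive else tok.lower()
        return lambda text: needle in text  # substring match (assumption)

    expr = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected token: {peek()!r}")
    return lambda text: expr(text if case_sensitive else text.lower())


def article_matches(title: str, content: str, query: str,
                    search_in_title: bool = True,
                    search_in_content: bool = True,
                    case_sensitive: bool = False) -> bool:
    """Apply the query to whichever fields are enabled, as one combined text."""
    parts = [(title, search_in_title), (content, search_in_content)]
    haystack = " ".join(text for text, enabled in parts if enabled)
    return compile_query(query, case_sensitive)(haystack)
```

Under these assumptions, an article titled "Climate crisis deepens" passes the example query, while "Climate change politics" is rejected by the trailing NOT clause.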
Minimum Word Count
minWordCount
integer (optional)
Skip articles that have fewer words than this number. Use 0 for no minimum.
Default value of this property is 0
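
The minimum word count rule can be sketched in a few lines. This assumes words are counted by splitting on whitespace, which may differ from how the actor counts them. The function name passes_min_word_count is hypothetical.

```python
def passes_min_word_count(text: str, min_word_count: int = 0) -> bool:
    """Keep an article unless it has fewer words than min_word_count.

    Assumption: a "word" is any whitespace-separated token.
    A value of 0 (the documented default) disables the check.
    """
    if min_word_count <= 0:
        return True
    return len(text.split()) >= min_word_count
```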
Extract Summary & Keywords
extractSummary
boolean (optional)
Generate article summaries and keywords using Natural Language Processing (NLP). Note: This increases processing time.
Default value of this property is true
Include Images
includeImages
boolean (optional)
Extract and include image URLs from articles.
Default value of this property is true
Request Timeout (seconds)
requestTimeout
integer (optional)
Timeout for each article request in seconds.
Default value of this property is 7
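
Taken together, the parameters above can be assembled into a single run input object. The sketch below fills in the documented defaults and applies a few basic sanity checks. The helper build_input and its checks (an http(s) URL, non-negative counts, a positive timeout) are assumptions for illustration and not validation rules published for the actor.

```python
import json
from typing import Any, Dict

# Defaults as documented for each optional property.
DEFAULTS: Dict[str, Any] = {
    "maxArticles": 100,       # 0 means no limit
    "language": "en",
    "searchInTitle": True,
    "searchInContent": True,
    "caseSensitive": False,
    "minWordCount": 0,        # 0 means no minimum
    "extractSummary": True,   # increases processing time
    "includeImages": True,
    "requestTimeout": 7,      # seconds per article request
}

# keywordSearch has no documented default; omit it to get all articles.
OPTIONAL_WITHOUT_DEFAULT = {"keywordSearch"}


def build_input(url: str, **overrides: Any) -> Dict[str, Any]:
    """Merge overrides onto the documented defaults (hypothetical helper)."""
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must start with http:// or https://")
    unknown = set(overrides) - set(DEFAULTS) - OPTIONAL_WITHOUT_DEFAULT
    if unknown:
        raise ValueError(f"unknown properties: {sorted(unknown)}")
    run_input: Dict[str, Any] = {"url": url, **DEFAULTS, **overrides}
    for key in ("maxArticles", "minWordCount"):
        if run_input[key] < 0:
            raise ValueError(f"{key} must be 0 or greater")
    if run_input["requestTimeout"] <= 0:
        raise ValueError("requestTimeout must be positive")
    return run_input


example = build_input(
    "https://www.cnn.com",
    maxArticles=25,
    keywordSearch="climate AND (change OR crisis) NOT politics",
    extractSummary=False,
)
print(json.dumps(example, indent=2))
```

The resulting dictionary serializes to JSON that uses the same property names as the input fields listed above.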