SlashDot Crawler
Pricing
$10.00 / 1,000 results
Extract comprehensive data from SlashDot.org, the premier technology news aggregator. This actor scrapes detailed article content, author information, publication dates, comment counts, popularity indicators, source links, and department tags from SlashDot's main sections.
SlashDot Technology News Scraper
This Apify actor scrapes technology news articles from SlashDot.org, extracting comprehensive information about articles, their content, engagement metrics, and community discussions.
Features
- Comprehensive Article Data: Scrapes detailed information about technology news articles
- Content Analysis: Extracts full article content, summaries, and metadata
- Engagement Metrics: Collects comment counts, scores, views, and ratings
- Community Features: Gathers comments, discussions, and user interactions
- Categorization: Extracts sections, tags, and topic classifications
- Related Content: Finds related articles and cross-references
- Filtering Options: Supports filtering by sections and sorting methods
- HTML Debugging: Saves HTML content for selector analysis during development
Input Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| maxArticles | Integer | 100 | Maximum number of articles to scrape |
| scrapeDetails | Boolean | true | Whether to scrape detailed article pages |
| sections | Array | [] | List of sections to filter by |
| sortBy | String | "latest" | Sort method (latest, popular, most_commented) |
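As a minimal sketch of how the parameters in the table could be read and validated, the following applies the documented defaults and rejects unknown sort methods. The `ScraperInput` class and `parse_input` function are hypothetical names used for illustration, and stripping whitespace from section names is an assumption; the actor's real input handling is not shown in this README.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any

# Sort methods listed in the parameter table.
ALLOWED_SORTS = ("latest", "popular", "most_commented")


@dataclass
class ScraperInput:
    """Hypothetical container mirroring the documented defaults."""
    max_articles: int = 100
    scrape_details: bool = True
    sections: list[str] = field(default_factory=list)
    sort_by: str = "latest"


def parse_input(raw: dict[str, Any] | None) -> ScraperInput:
    """Apply defaults and basic validation to a raw actor input dict."""
    raw = raw or {}
    max_articles = int(raw.get("maxArticles", 100))
    if max_articles < 1:
        raise ValueError("maxArticles must be a positive integer")
    sort_by = raw.get("sortBy", "latest")
    if sort_by not in ALLOWED_SORTS:
        raise ValueError(f"sortBy must be one of {ALLOWED_SORTS}, got {sort_by!r}")
    # Assumption: surrounding whitespace is ignored and empty names are dropped.
    sections = [str(s).strip() for s in raw.get("sections", []) if str(s).strip()]
    return ScraperInput(
        max_articles=max_articles,
        scrape_details=bool(raw.get("scrapeDetails", True)),
        sections=sections,
        sort_by=sort_by,
    )
```

Calling `parse_input(None)` yields the defaults from the table, which makes an empty input behave the same as omitting every field.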
Output Data
Each article record includes:
Basic Information
- article_id: Unique article identifier
- title: Article title
- summary: Article summary/teaser
- url: URL to the full article
- image_url: Article thumbnail/preview image URL
Author and Publication
- author: Article author name
- published_date: When the article was published
- section: Article section/category
Categorization
- tags: Array of tags and labels
Engagement Metrics
- comment_count: Number of comments
- score: Article score/rating
- views: Number of views
Timestamps
- scraped_at: When the data was scraped
Detailed Information (if scrapeDetails=true)
- full_content: Complete article content
- paragraphs: Array of article paragraphs
- related_articles: Array of related articles with title and URL
- comments: Array of comments with text, author, date, and score
- media_files: Array of media files with URL, type, and alt text
- source_links: Array of external source links
- metadata: Article metadata from meta tags
Metadata
- source: Source website (slashdot.org)
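The listing-level fields above could be modelled roughly as follows. This is a sketch only: the `ArticleRecord` class is a hypothetical name, and the field types (for example integer counts and an ISO 8601 string for `scraped_at`) are assumptions, since the README lists field names but not their types. Optional fields default to `None` so that a missing element on the page does not break the record.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any


def _utc_now_iso() -> str:
    """Current UTC time as an ISO 8601 string (assumed timestamp format)."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class ArticleRecord:
    """Hypothetical shape of one listing-level record; types are assumed."""
    article_id: str
    title: str
    url: str
    summary: str | None = None
    image_url: str | None = None
    author: str | None = None
    published_date: str | None = None
    section: str | None = None
    tags: list[str] = field(default_factory=list)
    comment_count: int | None = None
    score: int | None = None
    views: int | None = None
    scraped_at: str = field(default_factory=_utc_now_iso)
    source: str = "slashdot.org"

    def to_item(self) -> dict[str, Any]:
        """Convert the record to a plain dict suitable for a dataset row."""
        return asdict(self)
```

The detail fields (`full_content`, `comments`, and so on) are left out here; they could be added as further optional attributes when `scrapeDetails` is enabled.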
Usage Examples
Basic Usage
{"maxArticles": 50,"scrapeDetails": true}
Filtered by Section
{"maxArticles": 200,"scrapeDetails": true,"sections": ["technology", "science"],"sortBy": "popular"}
Most Commented Articles
{"maxArticles": 100,"scrapeDetails": true,"sortBy": "most_commented"}
Quick Scraping (No Details)
{"maxArticles": 500,"scrapeDetails": false,"sortBy": "latest"}
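One way to send inputs like these from Python is the `apify-client` package. The sketch below assumes that package is installed, and `ACTOR_ID` is a placeholder because this README does not state the actor's identifier; replace it with the value shown on the actor's store page. `build_run_input` and `run_actor` are hypothetical helper names.

```python
from __future__ import annotations

from typing import Any

# Placeholder: the real actor identifier is not given in this README.
ACTOR_ID = "<username>/<actor-name>"


def build_run_input(
    max_articles: int = 100,
    scrape_details: bool = True,
    sections: list[str] | None = None,
    sort_by: str = "latest",
) -> dict[str, Any]:
    """Build an input dict using the parameter names from the table above."""
    run_input: dict[str, Any] = {
        "maxArticles": max_articles,
        "scrapeDetails": scrape_details,
        "sortBy": sort_by,
    }
    if sections:
        run_input["sections"] = list(sections)
    return run_input


def run_actor(token: str, run_input: dict[str, Any]) -> list[dict[str, Any]]:
    """Start the actor, wait for it to finish, and return its dataset items."""
    # Imported lazily so the helper above works without the package installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return run details")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

For example, `run_actor(token, build_run_input(200, True, ["technology", "science"], "popular"))` would correspond to the "Filtered by Section" input.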
Development Features
HTML Debugging
During development, the scraper saves HTML content to the key-value store for selector analysis:
debug_slashdot_html: Contains the HTML content of the main page
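A rough sketch of how the page HTML might be written under that key is shown below. The `save_debug_html` helper is hypothetical; it takes the store's write function as an argument so it can be exercised without the Apify SDK. Inside an actor, `Actor.set_value` from the Apify Python SDK could presumably be passed in, on the assumption that the scraper uses the default key-value store.

```python
import asyncio
from typing import Awaitable, Callable

# Key named in this README for the main page's HTML.
DEBUG_HTML_KEY = "debug_slashdot_html"


async def save_debug_html(
    html: str,
    set_value: Callable[..., Awaitable[None]],
) -> None:
    """Store the raw page HTML so selectors can be inspected later.

    `set_value` is any async callable taking (key, value, content_type=...),
    for example `Actor.set_value` when running inside an Apify actor.
    """
    await set_value(DEBUG_HTML_KEY, html, content_type="text/html")
```

Passing the writer in keeps the helper independent of the storage backend, which is convenient when running the scraper locally.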
Error Handling
- Comprehensive error handling with detailed logging
- Graceful handling of missing elements
- Retry logic for failed requests
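The README does not describe the retry strategy, so the following is only an illustrative sketch using exponential backoff, which is a common choice rather than the actor's confirmed behavior. The `retry` helper, its attempt count, and its delays are all assumptions.

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger("slashdot_scraper")  # hypothetical logger name


def retry(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying on any exception with exponential backoff.

    Delays are base_delay, 2 * base_delay, 4 * base_delay, and so on.
    The last failure is re-raised once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as exc:
            if attempt == attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            logger.warning(
                "Attempt %d/%d failed (%s); retrying in %.1fs",
                attempt, attempts, exc, delay,
            )
            sleep(delay)
    raise AssertionError("unreachable")
```

The `sleep` parameter exists so the backoff can be tested without actually waiting.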
Browser Automation
- Uses Playwright for reliable browser automation
- Handles dynamic content loading
- Implements proper delays and waits
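As a sketch of what these points could look like with Playwright's async Python API, the function below loads a page, waits for a selector, pauses briefly, and returns the rendered HTML. The `"article"` selector, the timeout, and the delay values are assumptions for illustration, not selectors or timings taken from the actor's source.

```python
from __future__ import annotations

import asyncio
import random
from typing import Callable


def polite_delay(
    base: float = 1.0,
    jitter: float = 0.5,
    rng: Callable[[], float] = random.random,
) -> float:
    """Return a delay in seconds: `base` plus up to `jitter` of random slack."""
    return base + jitter * rng()


async def fetch_rendered_html(
    url: str,
    wait_selector: str = "article",  # assumed selector, not from the source
    timeout_ms: int = 30_000,
) -> str:
    """Load `url` in headless Chromium and return the page's HTML."""
    # Imported lazily so polite_delay can be used without Playwright installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        try:
            page = await browser.new_page()
            await page.goto(url, wait_until="domcontentloaded", timeout=timeout_ms)
            # Wait until dynamically loaded content matching the selector appears.
            await page.wait_for_selector(wait_selector, timeout=timeout_ms)
            # Pause briefly before returning, to space out consecutive requests.
            await asyncio.sleep(polite_delay())
            return await page.content()
        finally:
            await browser.close()
```

A call such as `asyncio.run(fetch_rendered_html("https://slashdot.org/"))` requires the Chromium browser installed via `playwright install chromium`.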
Installation
- Install dependencies:
$ pip install -r requirements.txt
- Install Playwright browsers:
$ playwright install chromium
- Run the scraper:
$ python -m src
Docker Usage
docker build -t slashdot-scraper .
docker run -e APIFY_TOKEN=your_token slashdot-scraper
Notes
- The scraper respects rate limits and implements delays between requests
- HTML content is saved for debugging purposes during development
- The scraper handles various article listing layouts and structures
- All URLs are properly resolved and normalized
- Comment extraction includes author information and engagement metrics
- The scraper can handle both article listings and detailed article pages
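The note on URL resolution can be illustrated with the standard library. This sketch assumes that "resolved and normalized" means turning relative and protocol-relative links into absolute `https` URLs, lowercasing the host, and dropping the fragment; the actor's actual rules are not documented here, and `normalize_url` is a hypothetical name.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

BASE_URL = "https://slashdot.org/"


def normalize_url(href: str, base: str = BASE_URL) -> str:
    """Resolve `href` against `base` and return a cleaned absolute URL."""
    # urljoin handles relative paths and protocol-relative links ("//host/path").
    absolute = urljoin(base, href.strip())
    parts = urlsplit(absolute)
    # Assumption: fragments such as "#comments" are not part of the identity.
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, parts.query, "")
    )
```

Under these assumptions, two links that differ only in their fragment normalize to the same URL, which helps avoid scraping the same article twice.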
