SlashDot Crawler

Developed by Crawler Bros
Maintained by Community

Pricing: $10.00 / 1,000 results

Extract comprehensive data from SlashDot.org, the premier technology news aggregator. This actor scrapes detailed article content, author information, publication dates, comment counts, popularity indicators, source links, and department tags from SlashDot's main sections.


SlashDot Technology News Scraper

This Apify actor scrapes technology news articles from SlashDot.org, extracting comprehensive information about articles, their content, engagement metrics, and community discussions.

Features

  • Comprehensive Article Data: Scrapes detailed information about technology news articles
  • Content Analysis: Extracts full article content, summaries, and metadata
  • Engagement Metrics: Collects comment counts, scores, views, and ratings
  • Community Features: Gathers comments, discussions, and user interactions
  • Categorization: Extracts sections, tags, and topic classifications
  • Related Content: Finds related articles and cross-references
  • Filtering Options: Supports filtering by sections and sorting methods
  • HTML Debugging: Saves HTML content for selector analysis during development

Input Parameters

Parameter      Type     Default    Description
maxArticles    Integer  100        Maximum number of articles to scrape
scrapeDetails  Boolean  true       Whether to scrape detailed article pages
sections       Array    []         List of sections to filter by
sortBy         String   "latest"   Sort method (latest, popular, most_commented)
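
These parameters map one-to-one onto the actor's JSON run input. As a quick orientation (not part of the actor itself), this is roughly how the input could be passed when starting a run with the apify-client Python package; the actor ID and token are placeholders you would replace with your own values:

# Hedged sketch: starting a run with the documented input parameters via the
# official apify-client package. "<ACTOR_ID>" and the token are placeholders.
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

run_input = {
    "maxArticles": 100,       # default: 100
    "scrapeDetails": True,    # default: true
    "sections": [],           # e.g. ["technology", "science"]
    "sortBy": "latest",       # "latest", "popular", or "most_commented"
}

run = client.actor("<ACTOR_ID>").call(run_input=run_input)
print("Dataset with results:", run["defaultDatasetId"])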

Output Data

Each article record includes:

Basic Information

  • article_id: Unique article identifier
  • title: Article title
  • summary: Article summary/teaser
  • url: URL to the full article
  • image_url: Article thumbnail/preview image URL

Author and Publication

  • author: Article author name
  • published_date: When the article was published
  • section: Article section/category

Categorization

  • tags: Array of tags and labels

Engagement Metrics

  • comment_count: Number of comments
  • score: Article score/rating
  • views: Number of views

Timestamps

  • scraped_at: When the data was scraped

Detailed Information (if scrapeDetails=true)

  • full_content: Complete article content
  • paragraphs: Array of article paragraphs
  • related_articles: Array of related articles with title and URL
  • comments: Array of comments with text, author, date, and score
  • media_files: Array of media files with URL, type, and alt text
  • source_links: Array of external source links
  • metadata: Article metadata from meta tags

Metadata

  • source: Source website (slashdot.org)
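
To illustrate how these fields come back, here is a short, hedged sketch of reading records from a finished run's dataset with the apify-client Python package; the dataset ID placeholder would come from the run object:

# Hedged sketch: iterating over the result dataset and reading a few of the
# fields documented above. "<DATASET_ID>" is a placeholder.
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

for item in client.dataset("<DATASET_ID>").iterate_items():
    print(item["title"], "-", item["url"])
    print("  comments:", item.get("comment_count"), "| score:", item.get("score"))
    # Detail fields are only present when the run used scrapeDetails=true.
    if item.get("full_content"):
        print("  paragraphs:", len(item.get("paragraphs", [])))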

Usage Examples

Basic Usage

{
  "maxArticles": 50,
  "scrapeDetails": true
}

Filtered by Section

{
  "maxArticles": 200,
  "scrapeDetails": true,
  "sections": ["technology", "science"],
  "sortBy": "popular"
}

Most Commented Articles

{
  "maxArticles": 100,
  "scrapeDetails": true,
  "sortBy": "most_commented"
}

Quick Scraping (No Details)

{
  "maxArticles": 500,
  "scrapeDetails": false,
  "sortBy": "latest"
}

Development Features

HTML Debugging

During development, the scraper saves HTML content to the key-value store for selector analysis:

  • debug_slashdot_html: Contains the HTML content of the main page
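
If you need to inspect that record after a run, it can be fetched from the run's default key-value store, for example with the apify-client Python package (a sketch with a placeholder run ID, not code from this actor):

# Hedged sketch: downloading the saved debug HTML for selector analysis.
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

record = client.run("<RUN_ID>").key_value_store().get_record("debug_slashdot_html")
if record is not None:
    with open("slashdot_debug.html", "w", encoding="utf-8") as f:
        f.write(record["value"])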

Error Handling

  • Comprehensive error handling with detailed logging
  • Graceful handling of missing elements
  • Retry logic for failed requests
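
The exact retry behavior is internal to the actor; purely as an illustration of the pattern the last bullet describes, a generic retry wrapper might look like this (assumptions: any callable fetch function and a linearly growing back-off):

# Generic retry sketch, not the actor's actual code.
import time

def fetch_with_retries(fetch, url, max_retries=3, base_delay=2.0):
    """Call fetch(url), retrying with a growing delay on failure."""
    for attempt in range(1, max_retries + 1):
        try:
            return fetch(url)
        except Exception as exc:
            if attempt == max_retries:
                raise
            print(f"Attempt {attempt} for {url} failed ({exc}); retrying...")
            time.sleep(base_delay * attempt)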

Browser Automation

  • Uses Playwright for reliable browser automation
  • Handles dynamic content loading
  • Implements proper delays and waits
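
For context, this is the general shape of a Playwright fetch with explicit waits and a polite delay (a standalone sketch; the "article" selector and the timings are assumptions, not values taken from this actor):

# Hedged Playwright sketch showing dynamic-content handling with waits.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://slashdot.org/", wait_until="domcontentloaded")
    page.wait_for_selector("article", timeout=15000)  # wait until stories render
    page.wait_for_timeout(1000)                       # polite delay before parsing
    html = page.content()
    browser.close()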

Installation

  1. Install dependencies:

     $ pip install -r requirements.txt

  2. Install Playwright browsers:

     $ playwright install chromium

  3. Run the scraper:

     $ python -m src
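
The src package itself is not reproduced here; purely as orientation, a minimal Apify Python SDK entry point (a generic skeleton, not this actor's source) has roughly this shape:

# Generic Apify SDK skeleton illustrating what an entry point run with
# "python -m src" typically does; NOT this actor's actual code.
import asyncio
from apify import Actor

async def main() -> None:
    async with Actor:
        actor_input = await Actor.get_input() or {}
        max_articles = actor_input.get("maxArticles", 100)
        # ... crawl SlashDot listings here, up to max_articles ...
        await Actor.push_data({"source": "slashdot.org", "title": "example"})

if __name__ == "__main__":
    asyncio.run(main())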

Docker Usage

docker build -t slashdot-scraper .
docker run -e APIFY_TOKEN=your_token slashdot-scraper

Notes

  • The scraper respects rate limits and implements delays between requests
  • HTML content is saved for debugging purposes during development
  • The scraper handles various article listing layouts and structures
  • All URLs are properly resolved and normalized
  • Comment extraction includes author information and engagement metrics
  • The scraper can handle both article listings and detailed article pages