Hackernews Job Scraper

Pricing

$10.00 / 1,000 jobs

Automatically scrapes and extracts structured job listings from Hacker News 'Who is hiring?' monthly posts. Uses Algolia search to find recent posts, fetches job comments from the Hacker News API, and leverages OpenAI to parse unstructured job postings into structured data.

Developer

Kutay

Maintained by Community

7 days ago

Last modified

Hacker News Job Scraper

Scrapes and parses job listings from Hacker News "Who is hiring?" posts using Algolia search and OpenAI for structured extraction.

Description

This Apify actor finds the Hacker News "Who is hiring?" monthly threads and extracts structured job listings from them. It uses Algolia's search API to locate recent hiring posts, fetches the job comments from the Hacker News API, and uses OpenAI's language models to turn the unstructured postings into standardized data suitable for job boards, recruitment platforms, or market analysis.

The actor begins by querying Algolia's search index for "Ask HN: Who is hiring?" posts and keeps only posts created within the lookback window (daysBack, default 30 days). Adjusting that window lets users focus on recent opportunities or reach further back. The actor then takes the most recent matching post and fetches its job comments from the Hacker News API, treating each comment as a potential job posting.
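
The search and filtering step might look roughly like the sketch below. It assumes the public HN Search API endpoint (which is backed by Algolia) and its `created_at_i` and `title` hit fields; the actor itself takes an Algolia API key and may query the index differently. The function names `build_search_url` and `pick_latest_post` are hypothetical.

```python
import time
from typing import Optional
from urllib.parse import urlencode

# Assumption: the public HN Search API; the actor may call Algolia directly.
SEARCH_URL = "https://hn.algolia.com/api/v1/search_by_date"
SECONDS_PER_DAY = 86400


def build_search_url(days_back: int, now: Optional[float] = None) -> str:
    """Build a query for hiring threads created inside the lookback window."""
    now = time.time() if now is None else now
    cutoff = int(now - days_back * SECONDS_PER_DAY)
    params = {
        "query": '"Ask HN: Who is hiring?"',
        "tags": "story",
        "numericFilters": f"created_at_i>{cutoff}",
    }
    return f"{SEARCH_URL}?{urlencode(params)}"


def pick_latest_post(hits: list, days_back: int, now: Optional[float] = None) -> Optional[dict]:
    """Return the newest hiring thread inside the window, or None if there is none."""
    now = time.time() if now is None else now
    cutoff = now - days_back * SECONDS_PER_DAY
    recent = [
        hit for hit in hits
        if hit.get("created_at_i", 0) >= cutoff
        # Excludes sibling threads such as "Who wants to be hired?"
        and "who is hiring" in (hit.get("title") or "").lower()
    ]
    return max(recent, key=lambda hit: hit["created_at_i"], default=None)
```

Filtering again on the client side is redundant with `numericFilters`, but it keeps the selection logic correct if the search returns near matches.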

Before extraction, a text cleaning step removes HTML entities, formatting artifacts, and extraneous whitespace from the raw Hacker News comment text, so that the OpenAI model receives plain text rather than markup. The extraction step then uses structured prompts to identify the key job attributes: company name, job title, location, employment type, salary, work arrangement, description, and the application and company URLs.
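
As an illustration of the cleaning step, here is a minimal sketch using only the Python standard library. The exact rules the actor applies are not documented, so the function name `clean_comment_text` and the specific regular expressions are assumptions.

```python
import html
import re

_TAG_RE = re.compile(r"<[^>]+>")


def clean_comment_text(raw: str) -> str:
    """Turn raw HN comment HTML into plain text."""
    # HN separates paragraphs with <p>; keep them as blank lines.
    text = re.sub(r"<p>", "\n\n", raw, flags=re.IGNORECASE)
    # Drop remaining tags such as <a> and <i>, keeping their inner text.
    text = _TAG_RE.sub("", text)
    # Decode entities after stripping tags: &amp; -> &, &#x2F; -> /
    text = html.unescape(text)
    # Collapse runs of spaces and tabs, then trim spaces around newlines.
    text = re.sub(r"[ \t]+", " ", text)
    text = re.sub(r" ?\n ?", "\n", text)
    # Allow at most one blank line between paragraphs.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

Tags are stripped before entities are decoded so that an escaped `&lt;` in a posting is not mistaken for markup. Note that this sketch keeps only the link text, not the `href` value.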

The actor processes jobs sequentially to respect API rate limits. If extraction fails for an individual job, the failure is logged and processing continues with the next job instead of halting the run. The daysBack and maxJobs parameters control the scope of a run, which makes the actor suitable for both one-time data collection and ongoing monitoring of new opportunities. The output is structured JSON that can be integrated with job aggregation platforms, applicant tracking systems, or custom analytics dashboards. Typical users are recruiters tracking the tech job market, researchers analyzing hiring trends, and developers building job search applications.
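
The sequential, failure-tolerant loop described above could be sketched as follows. The `extract` callable stands in for the OpenAI call and is hypothetical, as is the choice to count attempted comments (rather than successful extractions) against `max_jobs`.

```python
import logging
from typing import Callable, Iterable

logger = logging.getLogger("hn_jobs")


def process_jobs(
    comments: Iterable[dict],
    extract: Callable[[str], dict],
    max_jobs: int = 100,
) -> list:
    """Extract jobs one at a time, skipping comments whose extraction fails."""
    results = []
    for attempted, comment in enumerate(comments):
        if attempted >= max_jobs:
            break
        try:
            job = extract(comment["text"])
        except Exception as exc:
            # Log the failure and move on so one bad posting does not stop the run.
            logger.warning("extraction failed for comment %s: %s", comment.get("id"), exc)
            continue
        results.append({**job, "jobId": comment.get("id")})
    return results
```

Because each call finishes before the next begins, the request rate is bounded by the latency of the extraction call itself, with no explicit throttling.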

What it does

  1. Searches the Algolia API for "Ask HN: Who is hiring?" posts
  2. Filters posts from the last N days (default: 30)
  3. Fetches the latest post and all of its job comment replies from the Hacker News API
  4. Cleans HTML entities and formatting from job postings
  5. Extracts structured data using OpenAI (company, title, location, type, salary, description, URLs)
  6. Outputs structured JSON to an Apify dataset
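
Step 3 can be sketched against the official Hacker News API, where each item is served as JSON and a post lists its direct replies under `kids`. This is a sketch under assumptions: the helper names are hypothetical, and whether the actor skips deleted or dead comments is not stated in the description.

```python
import json
from typing import Callable, Optional
from urllib.request import urlopen

ITEM_URL = "https://hacker-news.firebaseio.com/v0/item/{item_id}.json"


def fetch_item(item_id: int) -> Optional[dict]:
    """Fetch one item from the Hacker News API; a JSON null becomes None."""
    with urlopen(ITEM_URL.format(item_id=item_id), timeout=30) as resp:
        return json.load(resp)


def fetch_job_comments(
    post_id: int,
    fetch: Callable[[int], Optional[dict]] = fetch_item,
    limit: int = 100,
) -> list:
    """Return the direct replies to a post, skipping deleted, dead, or empty ones."""
    post = fetch(post_id) or {}
    comments = []
    for kid_id in post.get("kids", []):
        if len(comments) >= limit:
            break
        item = fetch(kid_id)
        if not item or item.get("deleted") or item.get("dead") or not item.get("text"):
            continue
        comments.append(item)
    return comments
```

Passing `fetch` as a parameter keeps the traversal testable without network access, for example by supplying a dictionary's `get` method.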

Input

  • algoliaApiKey (required): Algolia API key, which can be copied from the network requests made by hn.algolia.com
  • openAiApiKey (required): OpenAI API key for data extraction
  • model (optional): OpenAI model, default gpt-4o-mini
  • daysBack (optional): Days to look back, default 30
  • maxJobs (optional): Max jobs to process, default 100
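
To illustrate how the documented defaults combine with user input, here is a small sketch. The parameter names and default values come from the list above; the `resolve_input` helper and its validation behavior are assumptions, not the actor's actual code.

```python
# Defaults as documented in the input list.
DEFAULTS = {"model": "gpt-4o-mini", "daysBack": 30, "maxJobs": 100}
REQUIRED = ("algoliaApiKey", "openAiApiKey")


def resolve_input(raw: dict) -> dict:
    """Check required keys and fill in defaults for the optional ones."""
    missing = [key for key in REQUIRED if not raw.get(key)]
    if missing:
        raise ValueError(f"missing required input: {', '.join(missing)}")
    # Values supplied by the user override the defaults.
    return {**DEFAULTS, **raw}
```

For example, `resolve_input({"algoliaApiKey": "...", "openAiApiKey": "...", "daysBack": 7})` would look back one week while keeping the default model and job limit.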

Output

Each job listing includes:

  • company, title, location, type, work_location, salary, description, apply_url, company_url - Extracted job data
  • jobId - Hacker News comment ID
  • postId - Hacker News post ID
  • postTitle - Post title
  • postDate - Post creation date
  • rawText - Cleaned job posting text
  • extractedAt - Extraction timestamp
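
A record with the fields listed above could be assembled as in the sketch below. It assumes the model replies with a JSON object, that missing fields are stored as null, and that dates are ISO 8601 strings in UTC; none of these details are confirmed by the description, and `build_record` is a hypothetical helper.

```python
import json
from datetime import datetime, timezone

EXTRACTED_FIELDS = (
    "company", "title", "location", "type", "work_location",
    "salary", "description", "apply_url", "company_url",
)


def build_record(model_reply: str, comment: dict, post: dict, raw_text: str) -> dict:
    """Merge the model's extracted fields with Hacker News metadata."""
    parsed = json.loads(model_reply)
    # Keep only the documented fields; anything the model omitted becomes None.
    record = {field: parsed.get(field) for field in EXTRACTED_FIELDS}
    record.update({
        "jobId": comment["id"],
        "postId": post["id"],
        "postTitle": post["title"],
        # HN items carry a Unix timestamp in the "time" field.
        "postDate": datetime.fromtimestamp(post["time"], tz=timezone.utc).isoformat(),
        "rawText": raw_text,
        "extractedAt": datetime.now(timezone.utc).isoformat(),
    })
    return record
```

Restricting the record to a fixed field list keeps the dataset schema stable even if the model returns extra keys.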