Hackernews Job Scraper
Pricing
$10.00 / 1,000 jobs
Hackernews Job Scraper
Automatically scrapes and extracts structured job listings from Hacker News 'Who is hiring?' monthly posts. Uses Algolia search to find recent posts, fetches job comments from the Hacker News API, and leverages OpenAI to parse unstructured job postings into structured data.
Pricing
$10.00 / 1,000 jobs
Rating
0.0
(0)
Developer

Kutay
Actor stats
0
Bookmarked
1
Total users
0
Monthly active users
7 days ago
Last modified
Categories
Share
Hacker News Job Scraper
Scrapes and parses job listings from Hacker News "Who is hiring?" posts using Algolia search and OpenAI for structured extraction.
Description
This Apify actor automates the process of discovering and extracting structured job listings from Hacker News "Who is hiring?" monthly threads. The actor leverages Algolia's search API to find recent hiring posts, then uses OpenAI's language models to transform unstructured job postings into clean, standardized data suitable for job boards, recruitment platforms, or market analysis.
The actor begins by querying Algolia's search index for "Ask HN: Who is hiring?" posts, filtering results to include only posts from the specified time period. This ensures users can focus on recent opportunities while maintaining flexibility to adjust the lookback window. Once the most recent post is identified, the actor fetches all associated job comment threads from the Hacker News API, processing each comment as a potential job posting.
A critical component of the actor is its text cleaning pipeline, which removes HTML entities, formatting artifacts, and extraneous whitespace from raw Hacker News comment text. This preprocessing step significantly improves the quality of data extraction by presenting clean text to OpenAI's models. The extraction process uses structured prompts to identify key job attributes including company names, job titles, locations, employment types, salary information, work arrangements, and application URLs.
The actor is designed with reliability and efficiency in mind. It processes jobs sequentially to respect API rate limits while implementing robust error handling that allows individual job extraction failures to be logged without halting the entire process. Configurable parameters enable users to control the scope of scraping through date ranges and maximum job limits, making it suitable for both one-time data collection and ongoing monitoring of new opportunities. The output is structured JSON data ready for integration with job aggregation platforms, applicant tracking systems, or custom analytics dashboards, making it ideal for recruiters tracking tech job markets, researchers analyzing hiring trends, or developers building job search applications.
What it does
- Searches Algolia API for "Ask HN: Who is hiring?" posts
- Filters posts from the last N days (default: 30)
- Fetches the latest post and all job comment replies from Hacker News API
- Cleans HTML entities and formatting from job postings
- Extracts structured data using OpenAI (company, title, location, type, salary, description, URLs)
- Outputs structured JSON to Apify dataset
Input
algoliaApiKey(required): Algolia API key from hn.algolia.com network requestsopenAiApiKey(required): OpenAI API key for data extractionmodel(optional): OpenAI model, defaultgpt-4o-minidaysBack(optional): Days to look back, default30maxJobs(optional): Max jobs to process, default100
Output
Each job listing includes:
company,title,location,type,work_location,salary,description,apply_url,company_url- Extracted job datajobId- Hacker News comment IDpostId- Hacker News post IDpostTitle- Post titlepostDate- Post creation daterawText- Cleaned job posting textextractedAt- Extraction timestamp