# Hacker News Stories, Comments & Users Scraper (`crawlerbros/hacker-news-scraper`) Actor

Scrape Hacker News - search stories and comments, fetch top/new/best stories, get user profiles and submission history. Uses the official Algolia HN Search API and Hacker News Firebase API.

- **URL**: https://apify.com/crawlerbros/hacker-news-scraper.md
- **Developed by:** [Crawler Bros](https://apify.com/crawlerbros) (community)
- **Categories:** News, Automation, Developer tools
- **Stats:** 2 total users, 1 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

from $3.00 / 1,000 results

This Actor is paid per event and usage. You are charged both a fixed price for specific events and for Apify platform usage.
Because this Actor supports Apify Store discounts, the price decreases as your subscription plan tier increases.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
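
As a rough worked example of the per-result part of the bill, the sketch below multiplies a result count by the listed starting rate of $3.00 per 1,000 results. It assumes that rate applies unchanged to your plan and ignores Apify platform usage charges, so treat the output as an estimate only. The function name `estimate_result_cost` is hypothetical.

```python
def estimate_result_cost(num_results: int, price_per_1000_usd: float = 3.00) -> float:
    """Estimate the per-result charge in USD, excluding platform usage.

    Assumption: the listed "from $3.00 / 1,000 results" rate applies as-is.
    """
    if num_results < 0:
        raise ValueError("num_results must be non-negative")
    return num_results * price_per_1000_usd / 1000
```

For the default `maxItems` of 50, this gives roughly $0.15 in result charges under the stated assumption.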

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

An integration should adapt to the project's stack and be safe, well-documented, and production-ready.
The recommended ways to integrate Actors are as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Hacker News Stories, Comments & Users Scraper

Extract stories, comments, and user data from [Hacker News](https://news.ycombinator.com) using the official [Algolia HN Search API](https://hn.algolia.com/api) and [Hacker News Firebase API](https://github.com/HackerNews/API). Both APIs are public, so no authentication or proxy is required.

### Features

- **Search stories** by keyword with relevance or date sorting
- **Search comments** by keyword across all HN discussions
- **Top / New / Best stories** — real-time front page and trending feeds
- **User submissions** — all stories and comments by any HN user
- **Item lookup** — fetch any item by ID or URL
- Filter by date range, minimum points, and story type (Ask HN, Show HN, Job)

### Input

| Field | Type | Description |
|-------|------|-------------|
| `mode` | select | What to scrape (see modes below) |
| `searchQuery` | string | Keyword to search (modes: searchStories, searchComments) |
| `username` | string | HN username (mode: byUser) |
| `startUrls` | array | Item IDs or URLs (mode: getItem) |
| `storyType` | select | story, ask_hn, show_hn, job, all |
| `sortBy` | select | relevance or date |
| `dateFrom` | string | Filter from date (YYYY-MM-DD) |
| `dateTo` | string | Filter to date (YYYY-MM-DD) |
| `minPoints` | integer | Minimum points filter |
| `maxItems` | integer | Maximum records to return (default 50) |

#### Modes

| Mode | Description |
|------|-------------|
| `searchStories` | Full-text search across all HN stories |
| `searchComments` | Full-text search across all HN comments |
| `topStories` | Current front page top stories |
| `newStories` | Latest submitted stories |
| `bestStories` | All-time best stories |
| `byUser` | All submissions (stories + comments) by a username |
| `getItem` | Fetch specific items by ID or URL |
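
The pairing of modes and fields in the tables above can be checked before a run is started. The following is a minimal sketch: the helper `build_input` and the mapping `REQUIRED_FIELD_BY_MODE` are hypothetical, and the "required field per mode" rule is inferred from the field descriptions, not from the Actor's own validation (the published schema only marks `mode` as required).

```python
VALID_MODES = {
    "searchStories", "searchComments", "topStories",
    "newStories", "bestStories", "byUser", "getItem",
}

# Assumption: the field each mode needs, inferred from the input table.
REQUIRED_FIELD_BY_MODE = {
    "searchStories": "searchQuery",
    "searchComments": "searchQuery",
    "byUser": "username",
    "getItem": "startUrls",
}


def build_input(mode: str, max_items: int = 50, **fields) -> dict:
    """Build an Actor input dict, failing early on obvious mistakes."""
    if mode not in VALID_MODES:
        raise ValueError(f"Unknown mode: {mode!r}")
    required = REQUIRED_FIELD_BY_MODE.get(mode)
    if required and not fields.get(required):
        raise ValueError(f"Mode {mode!r} expects a non-empty {required!r}")
    # The 1..5000 range comes from the maxItems limits in the input schema.
    if not 1 <= max_items <= 5000:
        raise ValueError("max_items must be between 1 and 5000")
    return {"mode": mode, "maxItems": max_items, **fields}
```

For example, `build_input("byUser", username="pg")` yields an input for the `byUser` mode, while `build_input("getItem")` raises because no `startUrls` were given.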

### Output

#### Story record

```json
{
  "storyId": 39876543,
  "type": "story",
  "title": "Show HN: I built a faster way to index large codebases",
  "url": "https://example.com/faster-indexing",
  "hnUrl": "https://news.ycombinator.com/item?id=39876543",
  "author": "johndoe",
  "points": 312,
  "commentsCount": 87,
  "createdAt": "2024-03-15T14:22:00+00:00",
  "recordType": "story",
  "scrapedAt": "2024-03-15T15:00:00+00:00"
}
````

#### Comment record

```json
{
  "commentId": 39876999,
  "type": "comment",
  "text": "This is really impressive. How does it handle...",
  "hnUrl": "https://news.ycombinator.com/item?id=39876999",
  "author": "janedoe",
  "points": 42,
  "storyId": 39876543,
  "storyTitle": "Show HN: I built a faster way to index large codebases",
  "storyUrl": "https://example.com/faster-indexing",
  "storyHnUrl": "https://news.ycombinator.com/item?id=39876543",
  "createdAt": "2024-03-15T14:45:00+00:00",
  "recordType": "comment",
  "scrapedAt": "2024-03-15T15:00:00+00:00"
}
```
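
Because a single dataset can mix both record shapes, it may help to separate them by `recordType` and attach comments to their parent story through `storyId`. The sketch below assumes records look like the two samples above; the function name `split_records` is hypothetical.

```python
from collections import defaultdict


def split_records(items):
    """Split dataset items into stories and comments grouped by story.

    Returns (stories_by_id, comments_by_story_id). Records whose
    recordType is neither "story" nor "comment" are ignored.
    """
    stories = {}
    comments_by_story = defaultdict(list)
    for item in items:
        record_type = item.get("recordType")
        if record_type == "story":
            stories[item["storyId"]] = item
        elif record_type == "comment":
            comments_by_story[item["storyId"]].append(item)
    return stories, dict(comments_by_story)
```

The `items` argument can be any iterable of dicts, for example the items read from the run's default dataset.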

### Use Cases

- **Market research** — track mentions of products, technologies, or companies
- **Trend analysis** — monitor top stories over time by category
- **Competitor intelligence** — search for mentions of competitors or alternatives
- **Content research** — find highly-upvoted discussions on specific topics
- **Lead generation** — identify users active in your target domain

### FAQ

**Does this require authentication?**\
No. Both the Algolia HN API and Firebase API are fully public. No API key needed.

**Is there a rate limit?**\
The Algolia API allows generous non-commercial usage. This Actor respects rate limits and retries failed requests automatically.
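
The source does not describe how the automatic retries work. As an illustration only, a common approach is exponential backoff, sketched below with a hypothetical helper `call_with_retries`; the attempt count and delays are assumptions, not the Actor's actual settings.

```python
import time


def call_with_retries(func, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on any exception with exponential backoff.

    Waits base_delay, then 2x, 4x, ... between attempts and re-raises
    the last error once max_attempts is exhausted.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter is injectable so the helper can be exercised without real waiting.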

**How far back does the search history go?**\
The Algolia HN Search API indexes all HN content since the site's founding in 2006.

**Can I search for Ask HN or Show HN posts specifically?**\
Yes — set `storyType` to `ask_hn` or `show_hn` in searchStories mode.

**How many results can I fetch?**\
Up to 5,000 records per run (the `maxItems` maximum). The Actor pages through Algolia results automatically.
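
To illustrate how a record limit and pagination typically interact, here is a minimal sketch that keeps requesting pages until the limit is reached or a page comes back empty. The names `collect_items` and `fetch_page` and the page size of 100 are hypothetical; this is not the Actor's actual implementation.

```python
def collect_items(fetch_page, max_items=50, page_size=100):
    """Collect up to max_items records from a paginated source.

    fetch_page(page, page_size) must return a list of records for the
    zero-based page number, or an empty list when no results remain.
    """
    items = []
    page = 0
    while len(items) < max_items:
        hits = fetch_page(page, page_size)
        if not hits:
            break
        # Trim the last page so the result never exceeds max_items.
        items.extend(hits[: max_items - len(items)])
        page += 1
    return items
```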

# Actor input Schema

## `mode` (type: `string`):

What to scrape.

## `searchQuery` (type: `string`):

Keyword or phrase to search for (modes: searchStories, searchComments).

## `username` (type: `string`):

Hacker News username to fetch submissions for (mode: byUser).

## `startUrls` (type: `array`):

Hacker News item IDs (numeric) or URLs like https://news.ycombinator.com/item?id=12345 (mode: getItem).
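
Since `startUrls` accepts both numeric IDs and item URLs, client code that needs the ID itself (for de-duplication, say) can normalize both forms. The helper `extract_item_id` below is a hypothetical sketch and does not necessarily match how the Actor parses its input.

```python
from urllib.parse import parse_qs, urlparse


def extract_item_id(value: str) -> int:
    """Return the numeric item ID from a bare ID or an HN item URL."""
    value = value.strip()
    if value.isdigit():
        return int(value)
    parsed = urlparse(value)
    ids = parse_qs(parsed.query).get("id", [])
    if parsed.hostname == "news.ycombinator.com" and ids and ids[0].isdigit():
        return int(ids[0])
    raise ValueError(f"Not a Hacker News item ID or URL: {value!r}")
```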

## `storyType` (type: `string`):

Filter by story type (applies to searchStories).

## `sortBy` (type: `string`):

Sort order for search results.

## `dateFrom` (type: `string`):

Only return items created on or after this date (ISO format, e.g. 2024-01-01). Applies to search modes.

## `dateTo` (type: `string`):

Only return items created on or before this date (ISO format, e.g. 2024-12-31). Applies to search modes.

## `minPoints` (type: `integer`):

Only emit stories/comments with at least this many points.
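
For readers curious how such filters can map onto the Algolia HN Search API, that API accepts a `numericFilters` parameter over fields such as `created_at_i` (Unix seconds) and `points`. The sketch below builds such a string from the three inputs above. It assumes dates are interpreted in UTC and that `dateTo` is inclusive of the whole day; whether the Actor does exactly this is not stated in the source, and `build_numeric_filters` is a hypothetical name.

```python
from datetime import date, datetime, timedelta, timezone


def _start_of_day_utc(day: date) -> int:
    """Unix timestamp of 00:00:00 UTC on the given day (UTC is an assumption)."""
    return int(datetime(day.year, day.month, day.day, tzinfo=timezone.utc).timestamp())


def build_numeric_filters(date_from=None, date_to=None, min_points=None) -> str:
    """Build an Algolia numericFilters string from YYYY-MM-DD dates and points."""
    filters = []
    if date_from:
        filters.append(f"created_at_i>={_start_of_day_utc(date.fromisoformat(date_from))}")
    if date_to:
        # "On or before" the date: everything strictly before the next day starts.
        next_day = date.fromisoformat(date_to) + timedelta(days=1)
        filters.append(f"created_at_i<{_start_of_day_utc(next_day)}")
    if min_points is not None:
        filters.append(f"points>={min_points}")
    return ",".join(filters)
```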

## `maxItems` (type: `integer`):

Maximum number of records to emit.

## Actor input object example

```json
{
  "mode": "searchStories",
  "searchQuery": "AI startup",
  "startUrls": [],
  "storyType": "all",
  "sortBy": "relevance",
  "maxItems": 50
}
```

# Actor output Schema

## `records` (type: `string`):

Dataset containing all scraped Hacker News records (stories, comments, or user data depending on mode).

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "mode": "searchStories",
    "searchQuery": "AI startup",
    "startUrls": [],
    "storyType": "all",
    "sortBy": "relevance",
    "maxItems": 50
};

// Run the Actor and wait for it to finish
const run = await client.actor("crawlerbros/hacker-news-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "mode": "searchStories",
    "searchQuery": "AI startup",
    "startUrls": [],
    "storyType": "all",
    "sortBy": "relevance",
    "maxItems": 50,
}

# Run the Actor and wait for it to finish
run = client.actor("crawlerbros/hacker-news-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "mode": "searchStories",
  "searchQuery": "AI startup",
  "startUrls": [],
  "storyType": "all",
  "sortBy": "relevance",
  "maxItems": 50
}' |
apify call crawlerbros/hacker-news-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=crawlerbros/hacker-news-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Hacker News Stories, Comments & Users Scraper",
        "description": "Scrape Hacker News - search stories and comments, fetch top/new/best stories, get user profiles and submission history. Uses the official Algolia HN Search API and Hacker News Firebase API.",
        "version": "1.0",
        "x-build-id": "ADC8d1soKtXhnV7gf"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/crawlerbros~hacker-news-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-crawlerbros-hacker-news-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/crawlerbros~hacker-news-scraper/runs": {
            "post": {
                "operationId": "runs-sync-crawlerbros-hacker-news-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/crawlerbros~hacker-news-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-crawlerbros-hacker-news-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "mode"
                ],
                "properties": {
                    "mode": {
                        "title": "Mode",
                        "enum": [
                            "searchStories",
                            "searchComments",
                            "topStories",
                            "newStories",
                            "bestStories",
                            "byUser",
                            "getItem"
                        ],
                        "type": "string",
                        "description": "What to scrape.",
                        "default": "searchStories"
                    },
                    "searchQuery": {
                        "title": "Search query",
                        "type": "string",
                        "description": "Keyword or phrase to search for (modes: searchStories, searchComments)."
                    },
                    "username": {
                        "title": "Username",
                        "type": "string",
                        "description": "Hacker News username to fetch submissions for (mode: byUser)."
                    },
                    "startUrls": {
                        "title": "Item IDs or URLs",
                        "type": "array",
                        "description": "Hacker News item IDs (numeric) or URLs like https://news.ycombinator.com/item?id=12345 (mode: getItem).",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "storyType": {
                        "title": "Story type filter",
                        "enum": [
                            "all",
                            "story",
                            "ask_hn",
                            "show_hn",
                            "job"
                        ],
                        "type": "string",
                        "description": "Filter by story type (applies to searchStories).",
                        "default": "all"
                    },
                    "sortBy": {
                        "title": "Sort by",
                        "enum": [
                            "relevance",
                            "date"
                        ],
                        "type": "string",
                        "description": "Sort order for search results.",
                        "default": "relevance"
                    },
                    "dateFrom": {
                        "title": "Date from (YYYY-MM-DD)",
                        "type": "string",
                        "description": "Only return items created on or after this date (ISO format, e.g. 2024-01-01). Applies to search modes."
                    },
                    "dateTo": {
                        "title": "Date to (YYYY-MM-DD)",
                        "type": "string",
                        "description": "Only return items created on or before this date (ISO format, e.g. 2024-12-31). Applies to search modes."
                    },
                    "minPoints": {
                        "title": "Minimum points",
                        "minimum": 0,
                        "maximum": 100000,
                        "type": "integer",
                        "description": "Only emit stories/comments with at least this many points."
                    },
                    "maxItems": {
                        "title": "Max items",
                        "minimum": 1,
                        "maximum": 5000,
                        "type": "integer",
                        "description": "Maximum number of records to emit.",
                        "default": 50
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
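
If the Apify client library is not available, the `run-sync-get-dataset-items` path from the specification above can be called directly over HTTP. The following is a minimal standard-library sketch: the helper names are hypothetical, the 300-second timeout is an arbitrary choice, and the response is assumed to be a JSON array of dataset items.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/crawlerbros~hacker-news-scraper/run-sync-get-dataset-items"


def build_run_sync_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Build the POST request; the token goes in the query string per the spec."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        url=f"{API_BASE}{ACTOR_PATH}?{query}",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sync_get_items(token: str, actor_input: dict, timeout: float = 300):
    """Run the Actor synchronously and return the parsed response body."""
    request = build_run_sync_request(token, actor_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `run_sync_get_items("<YOUR_API_TOKEN>", {"mode": "topStories", "maxItems": 10})` starts a billable run, so the request builder is kept separate to allow inspecting the URL and body first.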
