# Hacker News Scraper: Stories, Comments, Users & Search (`perconey/hackernews-scraper`) Actor

Scrape Hacker News via the official Firebase API + Algolia search. Top/new/best/ask/show/jobs stories, full comment trees, user profiles with karma, free-text search. No browser, no proxies, no auth. Pay only per result item.

- **URL**: https://apify.com/perconey/hackernews-scraper.md
- **Developed by:** [Perconey](https://apify.com/perconey) (community)
- **Categories:** Developer tools, News, Lead generation
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

$1.00 / 1,000 result items

This Actor is paid per event. You are not charged for the Apify platform usage, but only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, built for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

### What does Hacker News Scraper do?

**Hacker News Scraper** pulls structured data from [Hacker News](https://news.ycombinator.com) using **two public APIs**: the official Firebase API (the canonical source of HN items and user profiles) and the Algolia full-text search API (which powers HN's own search box). No browser, no proxies, no cookies, no anti-bot workarounds. The Firebase API currently has no rate limit, no quota and no authentication: it is a Google Firebase Realtime Database mirror that the YC team keeps open for everyone.

Try it instantly: pick **getTopStories**, leave queries empty, click Start. You get the current HN frontpage with author, score, comment count, URL, and posting time in under 10 seconds for $0.03.

### Why use Hacker News Scraper?

- **Tech recruiters**: Find high-karma engineers on niche topics. `searchStories` with `tags: comment,author_dhh` returns the comments a specific developer has written (up to Algolia's 1,000-hit limit per query). `getUserProfile` with `includeSubmissions: true` returns the user's 30 most recent stories and comments, a real signal of expertise.
- **Startup researchers**: Track "Show HN" launches with `getShowStories`. Watch what's gaining traction in `getBestStories` (highly upvoted recent stories). Build a daily digest with Apify Scheduler.
- **DevRel teams**: Monitor stories mentioning your product. `searchStories` with your project name as the query and `sortByDate: true` gives a newest-first feed. Pipe it into Slack or Discord via Apify Integrations.
- **Content marketers**: Reverse-engineer what HN upvotes. Running `getTopStories` over time and clustering the story URLs shows which domains and topics get traction.
- **AI/ML pipelines**: Hacker News comments are one of the highest-quality public conversation datasets. `getItemDetail` with `includeComments: true` returns full threaded discussions in clean JSON.
- **Hiring managers**: Use `getJobStories` to track Y Combinator portfolio hiring activity.

### How to use Hacker News Scraper

1. Open the **Input** tab.
2. Pick an **action** from the dropdown. `getTopStories` is the simplest starting point.
3. Story-list actions (Top/New/Best/Ask/Show/Jobs) need no queries - leave that field empty.
4. For `getItemDetail` or `getUserProfile`, enter the item id (e.g. `48106024`) or HN username (e.g. `pg`) in queries.
5. For `searchStories`, type the search query and optionally set `tags` (e.g. `story,ask_hn`) and `sortByDate`.
6. Set **maxItems** to cap the run. Default 30.
7. Toggle `includeComments` for getItemDetail or `includeSubmissions` for getUserProfile if you want the deep dive.
8. Click **Start**. Results stream to the dataset.

### Input

Field | Required | Description
--- | --- | ---
`action` | yes | Which API call to make. Nine options.
`queries` | sometimes | Required for getItemDetail / getUserProfile / searchStories. Empty for story-list actions.
`maxItems` | no | Max items per query. Default 30. For getItemDetail+includeComments this caps the BFS comment walk.
`includeComments` | no | For getItemDetail: walk the full comment tree (each comment counts as a result-item).
`includeSubmissions` | no | For getUserProfile: also fetch the user's most recent submissions.
`sortByDate` | no | For searchStories: use Algolia's newest-first ranking instead of relevance.
`tags` | no | For searchStories: Algolia tags filter (e.g. `story`, `comment`, `ask_hn`, `show_hn`, `author_USERNAME`).
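Before calling the Actor from code, it can help to check an input against the rules in the table above. The sketch below is a hypothetical client-side helper: `build_input` is not part of the Actor or of `apify-client`. The action names and option fields come from the input schema, and the `maxItems` bounds from its `minimum`/`maximum`.

```python
from typing import Optional

# Hypothetical helper, not part of the Actor: mirrors the documented input rules.
STORY_LIST_ACTIONS = {
    "getTopStories", "getNewStories", "getBestStories",
    "getAskStories", "getShowStories", "getJobStories",
}
QUERY_ACTIONS = {"getItemDetail", "getUserProfile", "searchStories"}
OPTION_FIELDS = {"maxItems", "includeComments", "includeSubmissions", "sortByDate", "tags"}


def build_input(action: str, queries: Optional[list] = None, **options) -> dict:
    """Return an input dict for perconey/hackernews-scraper after checking the documented rules."""
    queries = list(queries or [])
    if action in STORY_LIST_ACTIONS:
        if queries:
            raise ValueError(f"{action} takes no queries; leave the list empty")
    elif action in QUERY_ACTIONS:
        if not queries:
            raise ValueError(f"{action} needs at least one query")
    else:
        raise ValueError(f"unknown action: {action!r}")
    unknown = set(options) - OPTION_FIELDS
    if unknown:
        raise ValueError(f"unknown input fields: {sorted(unknown)}")
    max_items = options.get("maxItems", 30)  # schema default is 30
    if not isinstance(max_items, int) or not 0 <= max_items <= 100_000:
        raise ValueError("maxItems must be an integer between 0 and 100000")
    return {"action": action, "queries": queries, **options}


# Newest-first Show HN posts mentioning "sqlite", capped at 100 items
run_input = build_input("searchStories", ["sqlite"], tags="show_hn", sortByDate=True, maxItems=100)
```

Pass the returned dict as `run_input` to `client.actor("perconey/hackernews-scraper").call(...)`. The helper only mirrors the documented rules; it does not replace validation on the Apify side.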

### Output

Every dataset item carries `_type` (`story` / `comment` / `job` / `poll` / `user` / `error`) plus `_action` for filtering.

```json
{
    "_type": "story",
    "_action": "getTopStories",
    "id": 48106024,
    "type": "story",
    "title": "Learning Software Architecture",
    "url": "https://example.com/article",
    "score": 189,
    "descendants": 47,
    "by": "surprisetalk",
    "time": "2026-05-12T11:23:11.000Z",
    "kids": [48106101, 48106220, 48106305],
    "hn_url": "https://news.ycombinator.com/item?id=48106024"
}
````

You can download the dataset in JSON, CSV, XML, Excel, RSS or HTML format from the Output tab or the Apify API.

### Data fields

Type | Key fields
--- | ---
`story` / `job` / `poll` | `id`, `type`, `title`, `url`, `text`, `score`, `descendants` (comment count), `by` (author), `time`, `kids` (comment ids), `hn_url`
`comment` | `id`, `type`, `text`, `by`, `time`, `parent`, `kids`, `hn_url`, `score`
`user` | `id` (username), `karma`, `about` (HTML bio), `created`, `submitted_count`, `submitted_ids`, `hn_url`
`error` | `_action`, `_query`, `error`, `status`
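Because every item carries `_type`, a mixed dataset (for example a story followed by its comments, or a run where some queries failed) can be split on the client side. A minimal sketch: `split_by_type` and `summarize_errors` are hypothetical helpers, and the error fields are taken from the table above, so the exact contents of `status` and `error` are not specified here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def split_by_type(items: Iterable[dict]) -> Dict[str, List[dict]]:
    """Group dataset items by `_type` (story, comment, job, poll, user, error)."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for item in items:
        groups[item.get("_type", "unknown")].append(item)
    return dict(groups)


def summarize_errors(groups: Dict[str, List[dict]]) -> List[str]:
    """One line per error item, built from its `_action`, `_query`, `status` and `error` fields."""
    return [
        f"{e.get('_action')} {e.get('_query')!r}: {e.get('status')} {e.get('error')}"
        for e in groups.get("error", [])
    ]
```

With the Python client, `split_by_type(client.dataset(run["defaultDatasetId"]).iterate_items())` gives you, for example, `groups.get("comment", [])` for a comment tree and `summarize_errors(groups)` for a quick failure report.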

### Pricing

**Pay-per-result: $0.001 per item.** Each story = one event. Each comment in a tree = one event. Each user profile = one event. No flat monthly fee.

Cost examples:

- Daily HN frontpage (30 stories): **$0.03**
- Full comment tree of a 200-comment story: **$0.20**
- 500 stories matching a search query: **$0.50**
- 100 user profiles + 30 submissions each (3,100 items total): **$3.10**
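Since billing is a flat $0.001 per result item, every estimate above is an item count times the price. A sketch for budgeting runs: `estimate_cost` and `max_cost` are hypothetical helpers, and treating a story-list action (which takes no queries) as a single query for the `maxItems` cap is an assumption. `Decimal` avoids floating-point rounding in money arithmetic.

```python
from decimal import Decimal

PRICE_PER_ITEM_USD = Decimal("0.001")  # $1.00 per 1,000 result items


def estimate_cost(items: int) -> Decimal:
    """Charge in USD for a run that pushes `items` result items (stories, comments, profiles)."""
    return items * PRICE_PER_ITEM_USD


def max_cost(num_queries: int, max_items: int) -> Decimal:
    """Upper bound for one run: maxItems applies per query.

    Assumption: a story-list action (no queries) behaves like one query.
    """
    return max(num_queries, 1) * max_items * PRICE_PER_ITEM_USD


print(estimate_cost(30))   # 0.030
print(estimate_cost(200))  # 0.200
```

`max_cost` is useful before a run: four search queries with `maxItems: 100` cannot cost more than $0.40.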

### Tips

- **The Firebase API currently has no rate limit**, so running several Actor runs in parallel is fine. The Actor itself still batches 5 concurrent fetches per page out of politeness.
- **Algolia returns up to 50 hits per page and at most 1,000 hits per query.** For exhaustive search over a high-volume term, segment by date: query Algolia directly with `numericFilters=created_at_i>...` (this Actor's input has no `numericFilters` field), or enable `sortByDate` and paginate backwards by date.
- **Use `tags: author_USERNAME` in searchStories** to get every post or comment by a specific user. Compare with `getUserProfile + includeSubmissions: true` - same data, different angle.
- **HN item ids are global and immutable.** Stories, comments, jobs and polls share one id space, so an item id is unique across all of them; users are identified by username instead. The `_type` field tells you which shape an item has.
- **For real-time monitoring**, set up an Apify Schedule running `getNewStories` every 5 minutes. Combine with an Apify Integration (Slack, Discord, webhook) to get an instant feed.
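The date-segmentation tip above can be applied against Algolia's public endpoint directly, outside this Actor. A sketch, assuming Algolia's documented HN Search API: `search_by_date` returns hits newest first, each hit has `objectID` and `created_at_i` (Unix seconds), and `numericFilters` accepts `created_at_i<=N`. Instead of the `page` parameter it moves a date cursor backwards, so no single query has to page past the 1,000-hit limit. `segment_url` and `iter_all_hits` are hypothetical, and `fetch` is injected so you can plug in any HTTP client.

```python
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode

SEARCH_BY_DATE = "https://hn.algolia.com/api/v1/search_by_date"


def segment_url(query: str, tags: str, before: Optional[int], hits_per_page: int = 50) -> str:
    """Newest-first search URL, optionally limited to items created at or before `before`."""
    params = {"query": query, "tags": tags, "hitsPerPage": hits_per_page}
    if before is not None:
        params["numericFilters"] = f"created_at_i<={before}"
    return f"{SEARCH_BY_DATE}?{urlencode(params)}"


def iter_all_hits(query: str, fetch: Callable[[str], dict], tags: str = "story") -> Iterator[dict]:
    """Yield every hit once, moving a date cursor backwards one page at a time."""
    seen = set()
    before = None
    while True:
        hits = fetch(segment_url(query, tags, before)).get("hits", [])
        fresh = [hit for hit in hits if hit["objectID"] not in seen]
        if not fresh:
            return  # empty page, or only repeats from the boundary second
        for hit in fresh:
            seen.add(hit["objectID"])
            yield hit
        # Inclusive bound: items sharing the oldest timestamp are fetched again and de-duplicated.
        before = min(hit["created_at_i"] for hit in hits)
```

With the standard library, `fetch` can be as simple as `lambda url: json.load(urllib.request.urlopen(url, timeout=30))`. The inclusive `<=` bound trades one repeated hit per page for not losing items posted in the same second as the cursor.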

### FAQ, disclaimers, support

**Is this legal?** The Actor calls Hacker News's public APIs: the official Firebase API (hacker-news.firebaseio.com), published by Y Combinator, and the HN Search API (hn.algolia.com), operated by Algolia and used by HN's own search box. It does not scrape HTML, bypass authentication, or play rate-limit games, and it sends a clear User-Agent identifying the Actor.

**What about deleted/flagged content?** The Firebase API marks such items with `deleted: true` or `dead: true`. The comment-tree walker skips both to keep your dataset clean, but a `getItemDetail` lookup of a single item still returns it with the flags, so you can audit deletions if needed.
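The comment-tree walk described above (breadth-first, capped by `maxItems`, skipping `deleted` and `dead` comments) can be sketched as follows. This is not the Actor's source: `walk_comments` is hypothetical, `fetch_item` stands in for a GET of `https://hacker-news.firebaseio.com/v0/item/<id>.json` and reads from an in-memory dict so the example runs offline, and counting only comments (not the story) toward the cap is an assumption. The README does not say whether replies under a skipped comment are still visited; this sketch visits them, since a deleted parent can still have live replies.

```python
from collections import deque
from typing import Callable, Dict, List, Optional


def walk_comments(
    story_id: int,
    fetch_item: Callable[[int], Optional[Dict]],
    max_items: int,
) -> List[Dict]:
    """Breadth-first walk of a story's comment tree, keeping at most max_items comments."""
    kept: List[Dict] = []
    story = fetch_item(story_id) or {}
    queue = deque(story.get("kids", []))  # ids of top-level comments
    while queue and len(kept) < max_items:
        item = fetch_item(queue.popleft())
        if item is None:
            continue  # the API may return null for an id it cannot resolve
        queue.extend(item.get("kids", []))  # replies wait until this level is done
        if item.get("deleted") or item.get("dead"):
            continue  # skipped, but its replies stay queued
        kept.append(item)
    return kept


# Offline stand-in for the Firebase item endpoint
ITEMS = {
    1: {"id": 1, "type": "story", "kids": [2, 3]},
    2: {"id": 2, "type": "comment", "kids": [4]},
    3: {"id": 3, "type": "comment", "deleted": True, "kids": [5]},
    4: {"id": 4, "type": "comment"},
    5: {"id": 5, "type": "comment", "dead": True},
}
print([c["id"] for c in walk_comments(1, ITEMS.get, max_items=10)])  # [2, 4]
```

Breadth-first order means a low `maxItems` keeps the top-level discussion rather than one deep reply chain.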

**Does it return downvotes?** No. HN doesn't expose downvote counts publicly. Only net `score` is available.

**Why no realtime stream?** The Firebase API supports realtime subscriptions, but this Actor runs in batch mode: each run fetches a snapshot and finishes. For a near-realtime feed, run it on an Apify Schedule with a tight interval (every 1-5 minutes).
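With scheduled `getNewStories` runs, consecutive results overlap. Because HN item ids are assigned from one increasing sequence, a high-water mark on `id` should be enough to keep only stories you have not processed yet. A minimal sketch: `new_since` is hypothetical, and where you persist `last_seen_id` between runs (for example a key-value store record) is left out.

```python
from typing import Iterable, List, Tuple


def new_since(items: Iterable[dict], last_seen_id: int) -> Tuple[List[dict], int]:
    """Keep items whose id is above the high-water mark; return them oldest first plus the new mark."""
    fresh = sorted(
        # Error items carry no `id`, so they are ignored here.
        (item for item in items if isinstance(item.get("id"), int) and item["id"] > last_seen_id),
        key=lambda item: item["id"],
    )
    new_mark = fresh[-1]["id"] if fresh else last_seen_id
    return fresh, new_mark
```

Returning items oldest first means a Slack or Discord feed built on top of it posts stories in the order they appeared on HN.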

**Bug or feature request?** Open an Issue on the Actor's Issues tab. Usually responded to within a day.

**Need a scraper for Reddit, Lobsters, Lemmy, Tildes?** See my other Actors at https://apify.com/perconey, or open an Issue.

# Actor input Schema

## `action` (type: `string`):

Story-list actions need no queries. getItemDetail and getUserProfile need a single id/username. searchStories needs a search query.

## `queries` (type: `array`):

Depends on action. Story-list actions: leave empty. getItemDetail: HN item id (e.g. 48106024) or full URL (https://news.ycombinator.com/item?id=...). getUserProfile: HN username (e.g. pg, dhh). searchStories: free-text query.
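For `getItemDetail`, `queries` accepts either a bare item id or a full item URL. If you assemble queries from mixed sources, normalizing them to ids first makes deduplication easy. A sketch: `to_item_id` is a hypothetical helper that handles only the two forms described above.

```python
from urllib.parse import parse_qs, urlparse


def to_item_id(query: str) -> int:
    """Return the numeric id from '48106024' or 'https://news.ycombinator.com/item?id=48106024'."""
    query = query.strip()
    if query.isdigit():
        return int(query)
    parsed = urlparse(query)
    ids = parse_qs(parsed.query).get("id", [])
    if parsed.netloc == "news.ycombinator.com" and parsed.path == "/item" and ids and ids[0].isdigit():
        return int(ids[0])
    raise ValueError(f"not an HN item id or item URL: {query!r}")


# Mixed inputs collapse to unique ids, passed back as strings in `queries`
raw = ["48106024", "https://news.ycombinator.com/item?id=48106024", " 1 "]
queries = [str(item_id) for item_id in sorted({to_item_id(q) for q in raw})]
```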

## `maxItems` (type: `integer`):

Stop after this many items. For getItemDetail+includeComments, this caps the comment tree (BFS).

## `includeComments` (type: `boolean`):

Walk and fetch the full comment tree of the story. Each comment counts as one result-item. Capped by maxItems.

## `includeSubmissions` (type: `boolean`):

After the user profile, also fetch their most recent submissions. Counts toward maxItems.

## `sortByDate` (type: `boolean`):

Algolia default is relevance. Enable to get newest-first instead.

## `tags` (type: `string`):

Restrict search results. Examples: 'story', 'comment', 'story,author\_pg', 'ask\_hn', 'show\_hn'. See https://hn.algolia.com/api for the full list.
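To preview what a `tags` filter matches before starting a paid run, you can query the public Algolia HN Search API yourself. A sketch using only the standard library; it assumes Algolia's documented endpoints (`/api/v1/search` for relevance, `/api/v1/search_by_date` for newest first) and that the Actor's `sortByDate` switches between the same two, which is an assumption. `search_url` and `fetch_hits` are hypothetical helpers. In Algolia's tag syntax, comma-separated tags are ANDed and a parenthesized group such as `(story,poll)` is ORed.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ALGOLIA_HN = "https://hn.algolia.com/api/v1"


def search_url(query: str, tags: str = "", sort_by_date: bool = False, page: int = 0) -> str:
    """Relevance-ranked searches use /search; newest-first searches use /search_by_date."""
    endpoint = "search_by_date" if sort_by_date else "search"
    params = {"query": query, "page": page}
    if tags:
        params["tags"] = tags  # "story,author_pg" means: type story AND author pg
    return f"{ALGOLIA_HN}/{endpoint}?{urlencode(params)}"


def fetch_hits(url: str) -> list:
    """Download one page of results and return its `hits` list."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)["hits"]


# Preview before a paid run (performs a network request):
# for hit in fetch_hits(search_url("", tags="story,author_pg", sort_by_date=True)):
#     print(hit["created_at"], hit.get("title"))
```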

## Actor input object example

```json
{
  "action": "getTopStories",
  "queries": [],
  "maxItems": 30,
  "includeComments": false,
  "includeSubmissions": false,
  "sortByDate": false
}
```

# Actor output Schema

## `dataset` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "action": "getTopStories",
    "queries": []
};

// Run the Actor and wait for it to finish
const run = await client.actor("perconey/hackernews-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "action": "getTopStories", "queries": [] }

# Run the Actor and wait for it to finish
run = client.actor("perconey/hackernews-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "action": "getTopStories",
  "queries": []
}' |
apify call perconey/hackernews-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=perconey/hackernews-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Hacker News Scraper: Stories, Comments, Users & Search",
        "description": "Scrape Hacker News via the official Firebase API + Algolia search. Top/new/best/ask/show/jobs stories, full comment trees, user profiles with karma, free-text search. No browser, no proxies, no auth. Pay only per result item.",
        "version": "0.1",
        "x-build-id": "Gh4JJoTWGpYqSh51A"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/perconey~hackernews-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-perconey-hackernews-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/perconey~hackernews-scraper/runs": {
            "post": {
                "operationId": "runs-sync-perconey-hackernews-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/perconey~hackernews-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-perconey-hackernews-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "action"
                ],
                "properties": {
                    "action": {
                        "title": "What do you want to scrape?",
                        "enum": [
                            "getTopStories",
                            "getNewStories",
                            "getBestStories",
                            "getAskStories",
                            "getShowStories",
                            "getJobStories",
                            "getItemDetail",
                            "getUserProfile",
                            "searchStories"
                        ],
                        "type": "string",
                        "description": "Story-list actions need no queries. getItemDetail and getUserProfile need a single id/username. searchStories needs a search query.",
                        "default": "getTopStories"
                    },
                    "queries": {
                        "title": "Queries",
                        "type": "array",
                        "description": "Depends on action. Story-list actions: leave empty. getItemDetail: HN item id (e.g. 48106024) or full URL (https://news.ycombinator.com/item?id=...). getUserProfile: HN username (e.g. pg, dhh). searchStories: free-text query.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "maxItems": {
                        "title": "Max items per query",
                        "minimum": 0,
                        "maximum": 100000,
                        "type": "integer",
                        "description": "Stop after this many items. For getItemDetail+includeComments, this caps the comment tree (BFS).",
                        "default": 30
                    },
                    "includeComments": {
                        "title": "Include comment tree (getItemDetail only)",
                        "type": "boolean",
                        "description": "Walk and fetch the full comment tree of the story. Each comment counts as one result-item. Capped by maxItems.",
                        "default": false
                    },
                    "includeSubmissions": {
                        "title": "Include recent submissions (getUserProfile only)",
                        "type": "boolean",
                        "description": "After the user profile, also fetch their most recent submissions. Counts toward maxItems.",
                        "default": false
                    },
                    "sortByDate": {
                        "title": "Sort search by date (searchStories only)",
                        "type": "boolean",
                        "description": "Algolia default is relevance. Enable to get newest-first instead.",
                        "default": false
                    },
                    "tags": {
                        "title": "Algolia tags filter (searchStories only)",
                        "type": "string",
                        "description": "Restrict search results. Examples: 'story', 'comment', 'story,author_pg', 'ask_hn', 'show_hn'. See https://hn.algolia.com/api for the full list."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
