# Hacker News Scraper: Stories, Comments, Users & Search (`nominated_tupelo/hacker-news-scraper`) Actor

Scrape Hacker News stories, comments, user profiles, and search by keyword using the official HN Firebase API and Algolia search API. No auth required.

- **URL**: https://apify.com/nominated_tupelo/hacker-news-scraper.md
- **Developed by:** [kade](https://apify.com/nominated_tupelo) (community)
- **Categories:** News, Developer tools, AI
- **Stats:** 2 total users, 1 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is paid per platform usage. The Actor itself is free to use; you pay only for the Apify platform usage, which gets cheaper on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

### What does Hacker News Scraper do?

**Hacker News Scraper** extracts stories, comments, user profiles, and full-text search results from [Hacker News](https://news.ycombinator.com) — the legendary tech community run by Y Combinator. It uses the **official Firebase API** (no rate limits) and the **Algolia search API** (full archive search) to pull clean, structured JSON data. No API key required. No browser. No proxy needed.

Use it to monitor trending discussions, research historical topics, track user activity, analyze YC startup trends, or build datasets for LLM fine-tuning and sentiment analysis.
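
For orientation, the two public APIs named above can also be queried directly. The sketch below only builds request URLs for the HN Firebase API and the Algolia HN Search API; the helper names are hypothetical, and how the Actor itself maps scrape modes to these endpoints is an assumption, not something this README documents.

```python
from typing import Optional
from urllib.parse import urlencode

FIREBASE_BASE = "https://hacker-news.firebaseio.com/v0"
ALGOLIA_BASE = "https://hn.algolia.com/api/v1"


def feed_url(feed: str) -> str:
    """URL of a feed such as 'topstories' or 'jobstories' (returns a JSON list of item IDs)."""
    return f"{FIREBASE_BASE}/{feed}.json"


def item_url(item_id: int) -> str:
    """URL of a single story, comment, or job item."""
    return f"{FIREBASE_BASE}/item/{item_id}.json"


def user_url(username: str) -> str:
    """URL of a user profile."""
    return f"{FIREBASE_BASE}/user/{username}.json"


def search_url(query: str, by_date: bool = False, tag: Optional[str] = None) -> str:
    """Algolia search URL; 'search_by_date' sorts newest first, 'tags' narrows to story or comment."""
    endpoint = "search_by_date" if by_date else "search"
    params = {"query": query}
    if tag:
        params["tags"] = tag
    return f"{ALGOLIA_BASE}/{endpoint}?{urlencode(params)}"
```

Fetching any of these URLs with an HTTP client returns JSON without authentication.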

### Why use Hacker News Scraper?

- **Research**: Find every HN discussion about a technology, company, or topic across all time
- **Monitoring**: Track daily top/new/best stories and alert on keywords
- **Data science**: Build training datasets from high-quality technical discussion
- **Competitive intel**: Monitor what the dev community says about your product or competitors
- **Hiring & HR**: Find Ask HN job threads and talent signals
- **Podcast & newsletter**: Auto-curate top HN content for weekly digests

### How to use Hacker News Scraper

1. Go to the **Input** tab and select a **Scrape Mode**
2. For **Top/New/Best/Ask/Show/Jobs**: set Max Items and optional Minimum Score filter
3. For **Keyword Search**: enter a query, choose sort (relevance or date), and optionally restrict to stories or comments
4. For **Thread**: paste a story ID or HN URL to get the story + all comments
5. For **User**: enter a username to get their profile and submissions
6. Click **Start** and find your results in the **Output** tab as clean JSON

### Input

| Field | Type | Description |
|---|---|---|
| `scrapeMode` | enum | `topStories`, `newStories`, `bestStories`, `askStories`, `showStories`, `jobStories`, `search`, `thread`, `user` |
| `searchQuery` | string | Keywords to search (Algolia full-text). Used with `search` mode |
| `searchSortBy` | enum | `relevance` or `date` (newest first) |
| `searchType` | enum | `story`, `comment`, or `all` |
| `storyId` | string | Story ID or HN URL for `thread` mode |
| `username` | string | HN username for `user` mode |
| `maxItems` | integer | Max items to return (0 = no limit, default: 100) |
| `includeComments` | boolean | Fetch full comment trees for each story in feed/search modes |
| `maxCommentsPerStory` | integer | Max comments per story (default: 50) |
| `minScore` | integer | Filter stories below this score (default: 0 = no filter) |
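
As a rough illustration of how the fields in the table pair with each mode, the following sketch assembles an input object and rejects a mode whose companion field is missing. `build_input` is a hypothetical helper, and treating `searchQuery`, `storyId`, and `username` as mandatory for their modes is an assumption of this sketch rather than documented Actor behavior.

```python
FEED_MODES = {
    "topStories", "newStories", "bestStories",
    "askStories", "showStories", "jobStories",
}

# Field each non-feed mode relies on, per the Input table.
# Assumption: this sketch treats that field as required.
MODE_FIELD = {"search": "searchQuery", "thread": "storyId", "user": "username"}


def build_input(scrape_mode: str, **fields) -> dict:
    """Return an Actor input dict for the given mode plus any optional fields."""
    if scrape_mode not in FEED_MODES and scrape_mode not in MODE_FIELD:
        raise ValueError(f"unknown scrapeMode: {scrape_mode!r}")
    key = MODE_FIELD.get(scrape_mode)
    if key and not fields.get(key):
        raise ValueError(f"{scrape_mode!r} mode needs {key!r}")
    return {"scrapeMode": scrape_mode, **fields}


# Example inputs for a feed run and a date-sorted search run.
feed_input = build_input("topStories", maxItems=50, minScore=100)
search_input = build_input("search", searchQuery="rust", searchSortBy="date")
```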

### Output

Each scraped item is a JSON object. You can download the dataset in JSON, CSV, HTML, or Excel format.

#### Story example

```json
{
  "type": "story",
  "id": 40123456,
  "title": "Show HN: I built a tool that does X",
  "by": "username",
  "score": 342,
  "descendants": 87,
  "url": "https://example.com/article",
  "text": null,
  "createdAt": "2026-06-01T14:30:00+00:00",
  "hnUrl": "https://news.ycombinator.com/item?id=40123456",
  "commentIds": [40123457, 40123458]
}
````

#### Comment example

```json
{
  "type": "comment",
  "id": 40123457,
  "parentId": 40123456,
  "storyId": 40123456,
  "by": "commenter",
  "text": "<p>This is really interesting because...</p>",
  "depth": 0,
  "createdAt": "2026-06-01T14:45:00+00:00",
  "hnUrl": "https://news.ycombinator.com/item?id=40123457"
}
```

#### User example

```json
{
  "type": "user",
  "id": "pg",
  "karma": 184923,
  "about": "<p>Co-founder of Y Combinator.</p>",
  "createdAt": "2006-10-09T18:00:00+00:00",
  "submittedCount": 1245,
  "hnUrl": "https://news.ycombinator.com/user?id=pg"
}
```

### Data fields

| Field | Description |
|---|---|
| `type` | `story`, `comment`, `user`, or `job` |
| `id` | HN item/user ID |
| `title` | Story headline |
| `by` | Author username |
| `score` | Story points (upvotes) |
| `descendants` | Total comment count |
| `url` | External article URL |
| `text` | Self-post body or comment text (HTML) |
| `createdAt` | ISO 8601 timestamp |
| `hnUrl` | Direct HN link |
| `commentIds` | IDs of top-level replies |
| `depth` | Comment nesting depth |
| `karma` | User karma score (user items only) |
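
Because `text` arrives as HTML and `createdAt` as an ISO 8601 string, downstream code usually normalizes both. Here is a minimal sketch using only the standard library; the helper names are hypothetical, and the regex-based tag stripping is a simplification that assumes the lightweight markup shown in the examples above.

```python
import html
import re
from datetime import datetime


def plain_text(markup):
    """Convert an item's HTML `text` to plain text; None or empty becomes ''."""
    if not markup:
        return ""
    text = re.sub(r"<p>", "\n\n", markup)  # paragraph starts become blank lines
    text = re.sub(r"<[^>]+>", "", text)    # drop every remaining tag
    return html.unescape(text).strip()     # decode entities such as &amp;


def created_at(item: dict) -> datetime:
    """Parse the item's `createdAt` field into a timezone-aware datetime."""
    return datetime.fromisoformat(item["createdAt"])
```

Applied to the comment example above, `plain_text` returns the sentence without its `<p>` wrapper.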

### Pricing

This Actor uses the **Pay Per Event** model. You are charged per item scraped:

- \~500 stories from top feed: ~$0.05
- 1,000 search results: ~$0.05
- Full thread with 200 comments: ~$0.02

Hacker News uses an open public API with no rate limits, so runs complete quickly and cheaply.

### Tips

- **Monitoring**: Use `topStories` with `minScore: 50` to get only high-signal stories
- **Research**: `search` mode with `searchSortBy: date` gives you chronological archives
- **Comment depth**: `includeComments: false` with feed modes keeps costs minimal; each story still reports its comment count (`descendants`) and top-level `commentIds`
- **Thread analysis**: `thread` mode gives the full discussion tree including nested replies
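
For thread analysis, the flat list of comment items can be folded back into a nested structure. The sketch below assumes every comment carries `id` and `parentId` as in the comment example, and that a nested reply's `parentId` is the ID of the comment it answers; `build_tree` and the `replies` key are hypothetical names introduced here.

```python
from collections import defaultdict


def build_tree(story_id: int, comments: list) -> list:
    """Nest flat comment items under their parents, starting at the story."""
    children = defaultdict(list)
    for comment in comments:
        children[comment["parentId"]].append(comment)

    def attach(parent_id):
        # Copy each comment and add its replies recursively, keeping input order.
        return [
            {**comment, "replies": attach(comment["id"])}
            for comment in children.get(parent_id, [])
        ]

    return attach(story_id)
```

Calling `build_tree(story["id"], comments)` yields the top-level comments, each with a `replies` list.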

### FAQ & disclaimers

**Is this legal?** Yes. This Actor uses the official public Hacker News Firebase API provided by Y Combinator/Firebase and the public Algolia search API. No ToS violation occurs.

**Can I get all HN data ever?** HN has over 40 million items. The Algolia search API covers the full archive. For bulk exports use `search` mode with broad queries.

**Something broken?** Open an issue on the Actor's Issues tab.

# Actor input Schema

## `scrapeMode` (type: `string`):

What to scrape: top/new/best/ask/show/job story feeds, full-text keyword search, story thread with comments, or user profile.

## `searchQuery` (type: `string`):

Keyword(s) to search HN (Algolia full-text search). Used when scrapeMode=search.

## `searchSortBy` (type: `string`):

Sort search results by relevance or by date (newest first). Only applies to scrapeMode=search.

## `searchType` (type: `string`):

Restrict search to stories only, comments only, or all content.

## `storyId` (type: `string`):

HN item ID or full URL (e.g. https://news.ycombinator.com/item?id=43186765). Used when scrapeMode=thread.
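
Since this field accepts either form, a caller may want to normalize it before building the input. This is a minimal sketch of one way to do that; `parse_story_id` is a hypothetical helper, and the Actor's own parsing rules are not documented here.

```python
from urllib.parse import parse_qs, urlparse


def parse_story_id(value: str) -> int:
    """Accept a bare numeric ID or an HN item URL and return the numeric ID."""
    value = value.strip()
    if value.isdigit():
        return int(value)
    # Otherwise expect a URL carrying exactly one numeric ?id=... parameter.
    ids = parse_qs(urlparse(value).query).get("id", [])
    if len(ids) == 1 and ids[0].isdigit():
        return int(ids[0])
    raise ValueError(f"cannot extract a story ID from {value!r}")
```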

## `username` (type: `string`):

Hacker News username to fetch profile and submissions. Used when scrapeMode=user.

## `maxItems` (type: `integer`):

Maximum number of items to return. 0 = no limit.

## `includeComments` (type: `boolean`):

Fetch full comment trees for each story when using feed or search modes. Makes extra API calls — keep maxItems low.

## `maxCommentsPerStory` (type: `integer`):

Maximum comments to fetch per story when includeComments=true.

## `minScore` (type: `integer`):

Only include stories with at least this many points. 0 = include all.

## Actor input object example

```json
{
  "scrapeMode": "topStories",
  "searchQuery": "python",
  "searchSortBy": "relevance",
  "searchType": "story",
  "storyId": "43186765",
  "username": "pg",
  "maxItems": 100,
  "includeComments": false,
  "maxCommentsPerStory": 50,
  "minScore": 0
}
```
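
Before starting a run, an input object like the one above can be checked locally. The following sketch takes its bounds and enum values from the OpenAPI specification later in this document; `validate_input` is a hypothetical helper, and the platform performs its own validation regardless.

```python
# (minimum, maximum) per integer field; None means no upper bound.
LIMITS = {
    "maxItems": (0, 50000),
    "maxCommentsPerStory": (1, 2000),
    "minScore": (0, None),
}
ENUMS = {
    "searchSortBy": {"relevance", "date"},
    "searchType": {"all", "story", "comment"},
}


def validate_input(run_input: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    errors = []
    if "scrapeMode" not in run_input:
        errors.append("scrapeMode is required")
    for key, (low, high) in LIMITS.items():
        if key not in run_input:
            continue
        value = run_input[key]
        is_int = isinstance(value, int) and not isinstance(value, bool)
        if not is_int or value < low or (high is not None and value > high):
            errors.append(f"{key}={value!r} is outside the allowed range")
    for key, allowed in ENUMS.items():
        if key in run_input and run_input[key] not in allowed:
            errors.append(f"{key}={run_input[key]!r} is not one of {sorted(allowed)}")
    return errors
```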

# Actor output Schema

## `results` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "scrapeMode": "topStories",
    "searchQuery": "python",
    "storyId": "43186765",
    "username": "pg",
    "maxItems": 100,
    "maxCommentsPerStory": 50
};

// Run the Actor and wait for it to finish
const run = await client.actor("nominated_tupelo/hacker-news-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "scrapeMode": "topStories",
    "searchQuery": "python",
    "storyId": "43186765",
    "username": "pg",
    "maxItems": 100,
    "maxCommentsPerStory": 50,
}

# Run the Actor and wait for it to finish
run = client.actor("nominated_tupelo/hacker-news-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "scrapeMode": "topStories",
  "searchQuery": "python",
  "storyId": "43186765",
  "username": "pg",
  "maxItems": 100,
  "maxCommentsPerStory": 50
}' |
apify call nominated_tupelo/hacker-news-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=nominated_tupelo/hacker-news-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Hacker News Scraper: Stories, Comments, Users & Search",
        "description": "Scrape Hacker News stories, comments, user profiles, and search by keyword using the official HN Firebase API and Algolia search API. No auth required.",
        "version": "0.1",
        "x-build-id": "e7FUq6KDXUqcqsv3I"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/nominated_tupelo~hacker-news-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-nominated_tupelo-hacker-news-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/nominated_tupelo~hacker-news-scraper/runs": {
            "post": {
                "operationId": "runs-sync-nominated_tupelo-hacker-news-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/nominated_tupelo~hacker-news-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-nominated_tupelo-hacker-news-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "scrapeMode"
                ],
                "properties": {
                    "scrapeMode": {
                        "title": "Scrape Mode",
                        "enum": [
                            "topStories",
                            "newStories",
                            "bestStories",
                            "askStories",
                            "showStories",
                            "jobStories",
                            "search",
                            "thread",
                            "user"
                        ],
                        "type": "string",
                        "description": "What to scrape: top/new/best/ask/show/job story feeds, full-text keyword search, story thread with comments, or user profile.",
                        "default": "topStories"
                    },
                    "searchQuery": {
                        "title": "Search Query",
                        "type": "string",
                        "description": "Keyword(s) to search HN (Algolia full-text search). Used when scrapeMode=search."
                    },
                    "searchSortBy": {
                        "title": "Search Sort Order",
                        "enum": [
                            "relevance",
                            "date"
                        ],
                        "type": "string",
                        "description": "Sort search results by relevance or by date (newest first). Only applies to scrapeMode=search.",
                        "default": "relevance"
                    },
                    "searchType": {
                        "title": "Search Content Type",
                        "enum": [
                            "all",
                            "story",
                            "comment"
                        ],
                        "type": "string",
                        "description": "Restrict search to stories only, comments only, or all content.",
                        "default": "story"
                    },
                    "storyId": {
                        "title": "Story ID or HN URL",
                        "type": "string",
                        "description": "HN item ID or full URL (e.g. https://news.ycombinator.com/item?id=43186765). Used when scrapeMode=thread."
                    },
                    "username": {
                        "title": "HN Username",
                        "type": "string",
                        "description": "Hacker News username to fetch profile and submissions. Used when scrapeMode=user."
                    },
                    "maxItems": {
                        "title": "Max Items",
                        "minimum": 0,
                        "maximum": 50000,
                        "type": "integer",
                        "description": "Maximum number of items to return. 0 = no limit.",
                        "default": 100
                    },
                    "includeComments": {
                        "title": "Include Comments",
                        "type": "boolean",
                        "description": "Fetch full comment trees for each story when using feed or search modes. Makes extra API calls — keep maxItems low.",
                        "default": false
                    },
                    "maxCommentsPerStory": {
                        "title": "Max Comments Per Story",
                        "minimum": 1,
                        "maximum": 2000,
                        "type": "integer",
                        "description": "Maximum comments to fetch per story when includeComments=true.",
                        "default": 50
                    },
                    "minScore": {
                        "title": "Minimum Score",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Only include stories with at least this many points. 0 = include all.",
                        "default": 0
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
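
For projects without the client library, the `run-sync-get-dataset-items` path from the specification above can be called with the Python standard library. This is a sketch under stated assumptions: the helper names are hypothetical, the token is passed as the `token` query parameter as the specification describes, and the 300-second client timeout is an arbitrary choice for illustration.

```python
import json
import urllib.request
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "nominated_tupelo~hacker-news-scraper"


def build_request(token: str, run_input: dict) -> urllib.request.Request:
    """Prepare a POST that runs the Actor and returns its dataset items."""
    url = f"{API_BASE}/acts/{ACTOR_PATH}/run-sync-get-dataset-items?{urlencode({'token': token})}"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_and_fetch(token: str, run_input: dict, timeout: float = 300.0):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_request(token, run_input), timeout=timeout) as resp:
        return json.load(resp)
```

For example, `run_and_fetch("<YOUR_API_TOKEN>", {"scrapeMode": "topStories", "maxItems": 10})` would start a run and return the decoded response once it finishes.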
