# Hacker News Scraper — Stories, Comments, Users (`knotless_cadence/hacker-news-scraper`) Actor

Scrape Hacker News stories, comments, and user profiles. Get title, score, author, URL, comment text, karma. Track trending tech topics and developer sentiment. Export JSON/CSV. No API key needed. Need custom scraping? Email spinov001@gmail.com. Tips: t.me/scraping\_ai

- **URL**: https://apify.com/knotless\_cadence/hacker-news-scraper.md
- **Developed by:** [Alex](https://apify.com/knotless_cadence) (community)
- **Categories:** News, Developer tools, Open source
- **Stats:** 2 total users, 0 monthly users, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is paid per platform usage. The Actor itself is free to use; you pay only for Apify platform usage, which gets cheaper on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Hacker News Scraper

Scrape stories and comments from Hacker News — extract top, new, best, Ask HN, Show HN, and job posts with full comment threads. Uses the official HN API and Algolia search for fast, reliable data extraction.

### Features

- **6 story types** — top, new, best, Ask HN, Show HN, and job stories
- **Full comment threads** — nested comments with author, text, timestamp, depth level, and child count (up to 3 levels deep)
- **Algolia search** — find stories by keyword with relevance ranking across all of Hacker News history
- **Score filtering** — set a minimum score threshold to extract only high-quality stories
- **Batch processing** — fetches stories in parallel batches of 10 for maximum speed
- **Domain extraction** — automatically extracts the domain from story URLs
- **Real-time data** — uses the official Firebase HN API for live scores and comment counts

### Output Example

```json
{
  "id": 39876543,
  "title": "Show HN: I built an open-source alternative to Notion",
  "url": "https://github.com/user/project",
  "author": "developer_123",
  "score": 487,
  "commentCount": 234,
  "time": "2026-03-17T14:20:00.000Z",
  "type": "story",
  "hnUrl": "https://news.ycombinator.com/item?id=39876543",
  "domain": "github.com",
  "source": "top",
  "comments": [
    {
      "id": 39876600,
      "author": "tech_reviewer",
      "text": "This is impressive! I especially like the...",
      "time": "2026-03-17T14:35:00.000Z",
      "depth": 0,
      "childCount": 5
    }
  ],
  "scrapedAt": "2026-03-18T12:00:00.000Z"
}
````
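
A record shaped like the example above could be condensed for analysis roughly as follows. This is a sketch that assumes every item carries the fields shown in the example and that timestamps always use the `YYYY-MM-DDTHH:MM:SS.mmmZ` form; `parse_time` and `summarize` are hypothetical helper names.

```python
from datetime import datetime, timezone


def parse_time(value):
    """Parse timestamps shaped like '2026-03-17T14:20:00.000Z' as UTC."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def summarize(item):
    """Condense one story record into a flat dict."""
    comments = item.get("comments", [])
    posted = parse_time(item["time"])
    scraped = parse_time(item["scrapedAt"])
    return {
        "id": item["id"],
        "title": item["title"],
        "score": item["score"],
        # Hours between submission and the moment the record was scraped
        "age_hours": (scraped - posted).total_seconds() / 3600,
        # Comments with depth 0 are direct replies to the story
        "top_level_comments": sum(1 for c in comments if c.get("depth") == 0),
    }
```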

### Use Cases

- **Tech trend monitoring** — track what topics, tools, and technologies the developer community is discussing
- **Content research** — discover high-performing content topics and formats that resonate with technical audiences
- **Competitive intelligence** — monitor mentions of your product, competitors, and industry on the #1 tech news site
- **Startup discovery** — scrape Show HN posts to find new product launches and early-stage startups
- **Job market analysis** — extract HN job postings to analyze hiring trends, salaries, and in-demand skills

### Input Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `scrapeType` | String | `"top"` | Story type: top, new, best, ask, show, job, search |
| `searchQueries` | Array | `[]` | Keywords to search across HN history (via Algolia) |
| `maxStories` | Number | `100` | Maximum stories to extract |
| `includeComments` | Boolean | `true` | Whether to extract comment threads |
| `maxCommentsPerStory` | Number | `30` | Maximum comments per story (includes nested replies) |
| `minScore` | Number | `0` | Minimum score threshold (filter out low-scoring stories) |

### Cost Estimation

- \~$0.50 per 100 stories without comments
- \~$2.00 per 100 stories with full comment threads
- Free tier: up to 30 stories with Apify free plan
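
As a rough planning aid, the two rates above can be turned into a small calculator. The sketch assumes cost scales linearly with story count, which the estimates above do not guarantee; `estimate_cost_usd` is a hypothetical helper.

```python
def estimate_cost_usd(stories, with_comments):
    """Approximate run cost in USD from the per-100-story rates quoted above."""
    rate_per_100 = 2.00 if with_comments else 0.50
    # Assumption: linear scaling between the quoted data points
    return stories / 100 * rate_per_100
```

Under these assumptions, 250 stories with full comment threads come to roughly $5.00, and the same 250 stories without comments to roughly $1.25.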

### FAQ

**Q: Can I search across all of Hacker News history?**
A: Yes. The search feature uses Algolia's HN Search API, which indexes all Hacker News stories and comments from the beginning. You can find stories from any time period.
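
For orientation, a keyword search against the public Algolia HN Search API can be expressed as a URL like the one built below. Whether the Actor sends exactly these parameters is an assumption; `build_search_url` is a hypothetical helper, and the score filter is mapped onto Algolia's `numericFilters` parameter.

```python
from urllib.parse import urlencode

ALGOLIA_SEARCH = "https://hn.algolia.com/api/v1/search"


def build_search_url(query, min_score=0, hits_per_page=100):
    """Build a relevance-ranked story search URL for the HN Algolia API."""
    params = {"query": query, "tags": "story", "hitsPerPage": hits_per_page}
    if min_score > 0:
        # Restrict results to stories with at least `min_score` points
        params["numericFilters"] = f"points>={min_score}"
    return f"{ALGOLIA_SEARCH}?{urlencode(params)}"
```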

**Q: Why are comments more expensive to scrape?**
A: Each comment requires a separate API call to the HN Firebase API. A story with 200 comments can require 30+ individual requests to fetch its top-level comments and nested replies.
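
The per-comment request cost can be illustrated with a depth-limited walk over the `kids` ids that the public HN Firebase API returns for each item. This is a sketch under assumptions, not the Actor's code: `fetch_item` and `collect_comments` are hypothetical names, the output keys mirror the output example, and the limits of 30 comments and 3 levels are taken from the defaults and features described in this README.

```python
import json
from urllib.request import urlopen

ITEM_URL = "https://hacker-news.firebaseio.com/v0/item/{id}.json"


def fetch_item(item_id):
    """Fetch one item from the public HN Firebase API (one HTTP request)."""
    with urlopen(ITEM_URL.format(id=item_id), timeout=10) as resp:
        return json.load(resp)


def collect_comments(story, fetch=fetch_item, max_comments=30, max_depth=3):
    """Walk `kids` ids depth-first, stopping at the comment or depth limit."""
    collected = []

    def visit(ids, depth):
        for item_id in ids:
            if len(collected) >= max_comments:
                return
            item = fetch(item_id)  # every comment costs one request
            if not item or item.get("deleted") or item.get("dead"):
                continue
            kids = item.get("kids", [])
            collected.append({
                "id": item["id"],
                "author": item.get("by"),
                "text": item.get("text"),
                "depth": depth,
                "childCount": len(kids),
            })
            if depth + 1 < max_depth:
                visit(kids, depth + 1)

    visit(story.get("kids", []), 0)
    return collected
```

The `fetch` parameter lets the traversal be exercised with a stub instead of live HTTP calls.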

**Q: What's the difference between "top" and "best" stories?**
A: "Top" shows the current front page ranking (changes frequently). "Best" shows the highest-scoring stories over a longer period. "New" shows the most recently submitted stories regardless of score.
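
Each story type corresponds to an id-list endpoint of the public HN Firebase API. The mapping below is a sketch of how a `scrapeType` value might be resolved; it assumes the Actor reads these lists directly, and `list_url` is a hypothetical helper.

```python
BASE = "https://hacker-news.firebaseio.com/v0"

# Public HN API endpoints that return arrays of story ids
LISTS = {
    "top": "topstories",
    "new": "newstories",
    "best": "beststories",
    "ask": "askstories",
    "show": "showstories",
    "job": "jobstories",
}


def list_url(scrape_type):
    """Return the id-list URL for a story type, or raise ValueError."""
    if scrape_type not in LISTS:
        raise ValueError(f"unsupported story type: {scrape_type!r}")
    return f"{BASE}/{LISTS[scrape_type]}.json"
```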

# Actor input Schema

## `scrapeType` (type: `string`):

Type of stories to scrape: top, new, best, ask HN, show HN, or jobs

## `searchQueries` (type: `array`):

Search Hacker News for specific topics using Algolia-powered search

## `maxStories` (type: `integer`):

Maximum number of stories to extract per source

## `includeComments` (type: `boolean`):

Extract comment threads for each story (increases run time)

## `maxCommentsPerStory` (type: `integer`):

Maximum number of comments to extract per story

## `minScore` (type: `integer`):

Only include stories with at least this many upvotes/points

## Actor input object example

```json
{
  "scrapeType": "top",
  "searchQueries": [],
  "maxStories": 100,
  "includeComments": true,
  "maxCommentsPerStory": 30,
  "minScore": 0
}
```
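
An input object like the one above can be assembled and checked before a run. The sketch below takes its bounds from the `inputSchema` in the OpenAPI specification (1 to 500 stories, 1 to 200 comments per story, non-negative score). Note that the parameter table also lists `search` as a `scrapeType` value while the schema's enum omits it; this sketch follows the enum. `build_input` is a hypothetical helper.

```python
SCRAPE_TYPES = {"top", "new", "best", "ask", "show", "job"}


def build_input(scrape_type="top", search_queries=None, max_stories=100,
                include_comments=True, max_comments_per_story=30, min_score=0):
    """Return an Actor input dict, raising ValueError on out-of-range values."""
    if scrape_type not in SCRAPE_TYPES:
        raise ValueError(f"scrapeType must be one of {sorted(SCRAPE_TYPES)}")
    if not 1 <= max_stories <= 500:
        raise ValueError("maxStories must be between 1 and 500")
    if not 1 <= max_comments_per_story <= 200:
        raise ValueError("maxCommentsPerStory must be between 1 and 200")
    if min_score < 0:
        raise ValueError("minScore must be 0 or greater")
    return {
        "scrapeType": scrape_type,
        "searchQueries": list(search_queries or []),
        "maxStories": max_stories,
        "includeComments": include_comments,
        "maxCommentsPerStory": max_comments_per_story,
        "minScore": min_score,
    }
```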

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {};

// Run the Actor and wait for it to finish
const run = await client.actor("knotless_cadence/hacker-news-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {}

# Run the Actor and wait for it to finish
run = client.actor("knotless_cadence/hacker-news-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```
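
A variant of the example above that passes a non-empty input and avoids hardcoding the token might look like this. It is a sketch: the environment variable name `APIFY_TOKEN` and the helpers `run_and_collect` and `make_client` are assumptions, and the run is accessed as a dictionary exactly as in the example above.

```python
import os

ACTOR_ID = "knotless_cadence/hacker-news-scraper"


def run_and_collect(client, run_input):
    """Run the Actor through an ApifyClient-like object and return all items."""
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


def make_client():
    """Create a client from the APIFY_TOKEN environment variable (assumed name)."""
    from apify_client import ApifyClient  # imported lazily; requires apify-client
    return ApifyClient(os.environ["APIFY_TOKEN"])


# Example call (performs a real, billable run):
# items = run_and_collect(make_client(), {"scrapeType": "best", "maxStories": 20,
#                                         "includeComments": False, "minScore": 100})
```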

## CLI example

```bash
echo '{}' |
apify call knotless_cadence/hacker-news-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=knotless_cadence/hacker-news-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Hacker News Scraper — Stories, Comments, Users",
        "description": "Scrape Hacker News stories, comments, and user profiles. Get title, score, author, URL, comment text, karma. Track trending tech topics and developer sentiment. Export JSON/CSV. No API key needed. Need custom scraping? Email spinov001@gmail.com. Tips: t.me/scraping_ai",
        "version": "1.0",
        "x-build-id": "U6i9bbhkbM8ctNYN7"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/knotless_cadence~hacker-news-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-knotless_cadence-hacker-news-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/knotless_cadence~hacker-news-scraper/runs": {
            "post": {
                "operationId": "runs-sync-knotless_cadence-hacker-news-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/knotless_cadence~hacker-news-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-knotless_cadence-hacker-news-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "scrapeType": {
                        "title": "Story Type",
                        "enum": [
                            "top",
                            "new",
                            "best",
                            "ask",
                            "show",
                            "job"
                        ],
                        "type": "string",
                        "description": "Type of stories to scrape: top, new, best, ask HN, show HN, or jobs",
                        "default": "top"
                    },
                    "searchQueries": {
                        "title": "Search Queries",
                        "type": "array",
                        "description": "Search Hacker News for specific topics using Algolia-powered search",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "maxStories": {
                        "title": "Max Stories",
                        "minimum": 1,
                        "maximum": 500,
                        "type": "integer",
                        "description": "Maximum number of stories to extract per source",
                        "default": 100
                    },
                    "includeComments": {
                        "title": "Include Comments",
                        "type": "boolean",
                        "description": "Extract comment threads for each story (increases run time)",
                        "default": true
                    },
                    "maxCommentsPerStory": {
                        "title": "Max Comments Per Story",
                        "minimum": 1,
                        "maximum": 200,
                        "type": "integer",
                        "description": "Maximum number of comments to extract per story",
                        "default": 30
                    },
                    "minScore": {
                        "title": "Minimum Score",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Only include stories with at least this many upvotes/points",
                        "default": 0
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
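
For projects without a client library, the `run-sync-get-dataset-items` operation from the specification above can be called with the Python standard library alone. The sketch below only builds the request, using the path, the `token` query parameter, and the JSON body described in the specification; `build_sync_request` is a hypothetical helper.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "knotless_cadence~hacker-news-scraper"


def build_sync_request(token, actor_input):
    """Build a POST request that runs the Actor and returns its dataset items."""
    query = urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_PATH}/run-sync-get-dataset-items?{query}"
    body = json.dumps(actor_input).encode("utf-8")
    return Request(url, data=body, method="POST",
                   headers={"Content-Type": "application/json"})


# Sending it (performs a real, billable run):
# from urllib.request import urlopen
# with urlopen(build_sync_request("<YOUR_API_TOKEN>", {"scrapeType": "top"})) as resp:
#     items = json.load(resp)
```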
