# 🟧 Hacker News Scraper — Stories, Comments & Search by Keyword (`iskoren/hacker-news-scraper`) Actor

Search and scrape Hacker News stories, comments, and polls by keyword — points, authors, comment counts, dates, and links. Powered by the official HN API.

- **URL**: https://apify.com/iskoren/hacker-news-scraper.md
- **Developed by:** [Is Koren](https://apify.com/iskoren) (community)
- **Categories:** News, Developer tools
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

from $0.01 / 1,000 results

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
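
As a rough illustration of the arithmetic, the sketch below turns the listed starting price of $0.01 per 1,000 results into a lower-bound estimate. The function name is hypothetical, and the sketch assumes results are the only billed event; the Actor's pricing page is the authority on which events are actually charged.

```python
from decimal import Decimal

# Starting price listed above: $0.01 per 1,000 results.
# Assumption: results are the only billed event; other events may apply.
PRICE_PER_1000_RESULTS = Decimal("0.01")


def estimate_min_cost_usd(result_count: int) -> Decimal:
    """Return a lower-bound cost estimate in USD for `result_count` results."""
    if result_count < 0:
        raise ValueError("result_count must be non-negative")
    return PRICE_PER_1000_RESULTS * Decimal(result_count) / Decimal(1000)
```

`Decimal` is used instead of `float` so that small per-result amounts do not pick up binary rounding noise.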

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation as a [Markdown index](https://docs.apify.com/llms.txt) or as [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## 🟧 Hacker News Scraper — Stories, Comments & Search by Keyword

Scrape **Hacker News** at scale with a fast, reliable **Hacker News scraper** built on the official
**Algolia HN Search API**. Search Hacker News by keyword to pull matching **stories**, **comments**,
and **polls**, or grab the current **front page** and **newest** items — no API key, no login, and
no anti-bot headaches. Every result is emitted as one clean, structured JSON record ready for
analysis, dashboards, alerting, or AI pipelines.

Whether you are tracking what Hacker News says about your product, monitoring keywords like
"artificial intelligence", or building a dataset of top stories, this Hacker News scraper gets you
there in seconds.

### ✨ Features

- 🔎 **Keyword search** across Hacker News stories, comments, and polls.
- 📰 **Front page mode** — scrape the current HN front page without a query.
- 🏆 **Sort by relevance or date** (newest first).
- 🎯 **Numeric filters** — only keep items above a minimum points or comment count.
- 📄 **Automatic pagination** up to the Algolia ~1000-result cap.
- 🧱 **Flat, structured output** — one record per result, ready for CSV/JSON/Excel export.
- 🛡️ **No anti-bot issues** — uses the public Algolia HN API, so runs are cheap and stable.

### 🚀 Quick start

Paste this input to scrape the top 10 stories about artificial intelligence:

```json
{
    "query": "artificial intelligence",
    "contentType": "story",
    "sortBy": "relevance",
    "maxItems": 10
}
````

Scrape the current front page (no keyword needed):

```json
{
    "query": "",
    "contentType": "front_page",
    "sortBy": "relevance",
    "maxItems": 30
}
```

Find the newest highly-upvoted discussions about a topic:

```json
{
    "query": "rust programming",
    "contentType": "story",
    "sortBy": "date",
    "minPoints": 50,
    "maxItems": 100
}
```

### ⚙️ Input

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `query` | string | `"artificial intelligence"` | Keyword or phrase to search for. Leave empty to fetch the latest items / front page. |
| `contentType` | select | `story` | What to scrape: `story`, `comment`, `poll`, or `front_page`. |
| `sortBy` | select | `relevance` | `relevance` (best match) or `date` (newest first). |
| `maxItems` | integer | `50` | Maximum total results (1–1000; Algolia caps near 1000). |
| `minPoints` | integer | — | Only keep items with at least this many points. |
| `minComments` | integer | — | Only keep items with at least this many comments. |
| `proxyConfiguration` | proxy | `{ "useApifyProxy": true }` | Proxy settings. Datacenter proxies work fine here. |
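
The constraints in the table above can be checked locally before a run is started. The following is a minimal sketch, not part of the Actor: `build_input` and its keyword names are hypothetical helpers, and the lower bound of 0 for the two filters is taken from the OpenAPI schema further down this page.

```python
from typing import Any, Dict, Optional

CONTENT_TYPES = {"story", "comment", "poll", "front_page"}
SORT_OPTIONS = {"relevance", "date"}


def build_input(
    query: str = "",
    content_type: str = "story",
    sort_by: str = "relevance",
    max_items: int = 50,
    min_points: Optional[int] = None,
    min_comments: Optional[int] = None,
) -> Dict[str, Any]:
    """Validate arguments against the input table and return an input dict."""
    if content_type not in CONTENT_TYPES:
        raise ValueError(f"contentType must be one of {sorted(CONTENT_TYPES)}")
    if sort_by not in SORT_OPTIONS:
        raise ValueError(f"sortBy must be one of {sorted(SORT_OPTIONS)}")
    if not 1 <= max_items <= 1000:
        raise ValueError("maxItems must be between 1 and 1000")
    run_input: Dict[str, Any] = {
        "query": query,
        "contentType": content_type,
        "sortBy": sort_by,
        "maxItems": max_items,
    }
    # Optional filters are left out entirely when unset, meaning "no filter".
    for key, value in (("minPoints", min_points), ("minComments", min_comments)):
        if value is not None:
            if value < 0:
                raise ValueError(f"{key} must be >= 0")
            run_input[key] = value
    return run_input
```

The returned dict can be passed as the run input to any of the clients in the API section; failing early on a bad `contentType` or `maxItems` avoids starting a run that the platform would reject.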

### 📤 Output

Each result is pushed as one record to the dataset. Example story record:

```json
{
    "query": "artificial intelligence",
    "objectID": "39038064",
    "title": "The rise of artificial intelligence agents",
    "url": "https://example.com/ai-agents",
    "author": "pg",
    "points": 412,
    "numComments": 187,
    "createdAt": "2026-01-12T09:33:00.000Z",
    "createdAtTimestamp": 1768210380,
    "hnUrl": "https://news.ycombinator.com/item?id=39038064",
    "storyText": null,
    "tags": ["story", "author_pg", "story_39038064"]
}
```

Comment records additionally include `commentText`, `storyId`, and `parentId`.

| Field | Description |
|-------|-------------|
| `query` | The search query used for the run. |
| `objectID` | Unique Hacker News item ID. |
| `title` | Story/poll title (null for comments). |
| `url` | External link (null for Ask/Show HN and text posts). |
| `author` | Hacker News username of the author. |
| `points` | Score / upvotes. |
| `numComments` | Number of comments on the item. |
| `createdAt` | ISO 8601 creation timestamp. |
| `createdAtTimestamp` | Unix creation timestamp. |
| `hnUrl` | Canonical Hacker News discussion URL. |
| `storyText` | HTML-stripped self/Ask HN text (if any). |
| `tags` | Algolia `_tags` array. |
| `commentText` | (Comments only) HTML-stripped comment body. |
| `storyId` | (Comments only) ID of the parent story. |
| `parentId` | (Comments only) ID of the direct parent item. |
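
As an illustration of working with these fields, the sketch below converts `createdAtTimestamp` to a timezone-aware datetime and distinguishes comment records from stories and polls. Both helper names are hypothetical, and the comment check assumes that non-comment records either omit `commentText` or set it to null.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def created_at_utc(record: Dict[str, Any]) -> datetime:
    """Return the record's creation time as a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(record["createdAtTimestamp"], tz=timezone.utc)


def is_comment(record: Dict[str, Any]) -> bool:
    """Treat a record as a comment when it carries a non-null `commentText`."""
    # Assumption: stories and polls omit this key or set it to null.
    return record.get("commentText") is not None
```

For the example story record above, `created_at_utc` yields the same instant as its `createdAt` string, so either field can be used for sorting or bucketing by time.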

### ❓ FAQ

**Do I need a Hacker News API key?**
No. This Hacker News scraper uses the free, public Algolia HN Search API — no key or login.

**How many results can I get?**
The Algolia HN API caps results at roughly 1000 per query. Set `maxItems` accordingly.

**Why is `url` sometimes null?**
Ask HN, Show HN, and text posts have no external link, so `url` is null. Use `hnUrl` for the
discussion page and `storyText` for the body.
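
A small sketch of that fallback, assuming records shaped like the example in the Output section (`best_link` is a hypothetical helper name):

```python
from typing import Any, Dict


def best_link(record: Dict[str, Any]) -> str:
    """Prefer the external `url`; fall back to the HN discussion page."""
    # `url` is null (None) for Ask HN, Show HN, and text posts.
    return record.get("url") or record["hnUrl"]
```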

**Can I scrape only comments?**
Yes — set `contentType` to `comment`. Records will include `commentText`, `storyId`, and `parentId`.

**Will I get rate-limited or blocked?**
The Algolia HN API is very tolerant and has no anti-bot protection, so datacenter proxies are fine.

### 💡 Tips

- Use `sortBy: "date"` with `minPoints` to build a feed of fresh, already-popular discussions.
- Combine `query` with `contentType: "comment"` to mine sentiment and opinions on a topic.
- Leave `query` empty and set `contentType: "front_page"` to snapshot the HN front page on a schedule.
- Schedule this Actor to run hourly to monitor a keyword and feed results into Slack or a webhook.
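
For the scheduled-monitoring tip, consecutive runs will usually return overlapping results, so a consumer typically forwards only items it has not seen before. The sketch below shows one way to do that by tracking `objectID` values; `select_new_items` is a hypothetical helper, and how `seen_ids` is persisted between runs (a file, a key-value store, a database) is left open.

```python
from typing import Any, Dict, Iterable, List, Set


def select_new_items(
    items: Iterable[Dict[str, Any]], seen_ids: Set[str]
) -> List[Dict[str, Any]]:
    """Return items whose objectID is not in `seen_ids`, marking them as seen."""
    fresh: List[Dict[str, Any]] = []
    for item in items:
        object_id = item["objectID"]
        if object_id not in seen_ids:
            seen_ids.add(object_id)  # also drops duplicates within one batch
            fresh.append(item)
    return fresh
```

Only the returned `fresh` list would then be posted to Slack or a webhook, which keeps hourly runs from re-announcing the same stories.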

# Actor input Schema

## `query` (type: `string`):

Keyword or phrase to search Hacker News for. Leave empty to fetch the latest items (or the front page) for the chosen content type.

## `contentType` (type: `string`):

What to scrape: `story` returns stories, `comment` returns comments, `poll` returns polls, and `front_page` returns the current Hacker News front page.

## `sortBy` (type: `string`):

`relevance` uses the Algolia `/search` endpoint (best match); `date` uses `/search_by_date` (newest first).
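
To make the endpoint mapping concrete, here is a sketch of how such a request URL could be assembled against the public Algolia HN Search API. It is an illustration only: the Actor's internal request building is not documented here, `build_search_url` is a hypothetical name, and the use of `tags`, `numericFilters`, `page`, and `hitsPerPage` (with a page size of 100) follows the public API rather than anything stated about this Actor.

```python
from typing import Dict, Optional, Union
from urllib.parse import urlencode

ALGOLIA_BASE = "https://hn.algolia.com/api/v1"
# sortBy value -> Algolia endpoint, as described above.
ENDPOINTS = {"relevance": "search", "date": "search_by_date"}


def build_search_url(
    query: str,
    content_type: str = "story",
    sort_by: str = "relevance",
    min_points: Optional[int] = None,
    min_comments: Optional[int] = None,
    page: int = 0,
    hits_per_page: int = 100,  # assumption: page size is not stated in this document
) -> str:
    """Build an Algolia HN Search URL for the given options."""
    params: Dict[str, Union[str, int]] = {
        "query": query,
        "tags": content_type,
        "page": page,
        "hitsPerPage": hits_per_page,
    }
    filters = []
    if min_points is not None:
        filters.append(f"points>={min_points}")
    if min_comments is not None:
        filters.append(f"num_comments>={min_comments}")
    if filters:
        params["numericFilters"] = ",".join(filters)
    return f"{ALGOLIA_BASE}/{ENDPOINTS[sort_by]}?{urlencode(params)}"
```

An unknown `sort_by` value raises `KeyError`, which mirrors the fact that only the two listed options exist.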

## `maxItems` (type: `integer`):

Maximum total number of results to return. Algolia caps results at roughly 1000.
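
The interaction between `maxItems` and the cap can be sketched as a page plan: which pages to request and how many items to keep from each. This is a hypothetical illustration, not the Actor's code; `plan_pages`, the page size of 100, and treating the cap as exactly 1000 are all assumptions.

```python
from typing import List, Tuple

ALGOLIA_RESULT_CAP = 1000  # the document states "roughly 1000"; exact value assumed


def plan_pages(max_items: int, hits_per_page: int = 100) -> List[Tuple[int, int]]:
    """Return (page_index, items_to_keep) pairs covering up to `max_items` results."""
    if max_items < 1 or hits_per_page < 1:
        raise ValueError("max_items and hits_per_page must be positive")
    remaining = min(max_items, ALGOLIA_RESULT_CAP)
    plan: List[Tuple[int, int]] = []
    page = 0
    while remaining > 0:
        take = min(hits_per_page, remaining)
        plan.append((page, take))
        remaining -= take
        page += 1
    return plan
```

A real run can stop earlier than the plan suggests, since a query may simply have fewer matches than `maxItems`.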

## `minPoints` (type: `integer`):

Only return items with at least this many points (upvotes). Leave empty for no filter.

## `minComments` (type: `integer`):

Only return items with at least this many comments. Leave empty for no filter.

## `proxyConfiguration` (type: `object`):

Proxy settings. Apify Proxy is used by default; the Algolia HN API has no anti-bot protection, so datacenter proxies are fine.

## Actor input object example

```json
{
  "query": "artificial intelligence",
  "contentType": "story",
  "sortBy": "relevance",
  "maxItems": 50,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
```

# Actor output Schema

## `results` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "query": "artificial intelligence"
};

// Run the Actor and wait for it to finish
const run = await client.actor("iskoren/hacker-news-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "query": "artificial intelligence" }

# Run the Actor and wait for it to finish
run = client.actor("iskoren/hacker-news-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "query": "artificial intelligence"
}' |
apify call iskoren/hacker-news-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=iskoren/hacker-news-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "🟧 Hacker News Scraper — Stories, Comments & Search by Keyword",
        "description": "Search and scrape Hacker News stories, comments, and polls by keyword — points, authors, comment counts, dates, and links. Powered by the official HN API.",
        "version": "0.1",
        "x-build-id": "Q4fc05l7UZh289p13"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/iskoren~hacker-news-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-iskoren-hacker-news-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/iskoren~hacker-news-scraper/runs": {
            "post": {
                "operationId": "runs-sync-iskoren-hacker-news-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/iskoren~hacker-news-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-iskoren-hacker-news-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "query": {
                        "title": "Search query",
                        "type": "string",
                        "description": "Keyword or phrase to search Hacker News for. Leave empty to fetch the latest items (or the front page) for the chosen content type."
                    },
                    "contentType": {
                        "title": "Content type",
                        "enum": [
                            "story",
                            "comment",
                            "poll",
                            "front_page"
                        ],
                        "type": "string",
                        "description": "What to scrape. 'story' returns stories, 'comment' returns comments, 'poll' returns polls, and 'front_page' returns the current Hacker News front page.",
                        "default": "story"
                    },
                    "sortBy": {
                        "title": "Sort by",
                        "enum": [
                            "relevance",
                            "date"
                        ],
                        "type": "string",
                        "description": "'relevance' uses the Algolia /search endpoint (best match). 'date' uses /search_by_date (newest first).",
                        "default": "relevance"
                    },
                    "maxItems": {
                        "title": "Max items",
                        "minimum": 1,
                        "maximum": 1000,
                        "type": "integer",
                        "description": "Maximum total number of results to return. Algolia caps results at roughly 1000.",
                        "default": 50
                    },
                    "minPoints": {
                        "title": "Minimum points",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Only return items with at least this many points (upvotes). Leave empty for no filter."
                    },
                    "minComments": {
                        "title": "Minimum comments",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Only return items with at least this many comments. Leave empty for no filter."
                    },
                    "proxyConfiguration": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Proxy settings. Apify Proxy is used by default; the Algolia HN API has no anti-bot, so datacenter proxies are fine.",
                        "default": {
                            "useApifyProxy": true
                        }
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
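
For projects that call the REST API directly rather than through a client library, the `run-sync-get-dataset-items` path from the specification above can be used with nothing but the Python standard library. The sketch below is a minimal illustration under stated assumptions: the helper names are hypothetical, the token is passed as the `token` query parameter as the specification describes, and the response is assumed to be a JSON array of dataset items.

```python
import json
import urllib.request
from typing import Any, Dict, List
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/iskoren~hacker-news-scraper/run-sync-get-dataset-items"


def build_run_request(token: str, run_input: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) a POST request for the synchronous run endpoint."""
    url = f"{API_BASE}{ACTOR_PATH}?{urlencode({'token': token})}"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_and_fetch_items(token: str, run_input: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Send the request and decode the response (this starts a real, billable run)."""
    with urllib.request.urlopen(build_run_request(token, run_input)) as response:
        return json.load(response)
```

Splitting request construction from sending keeps the URL and body easy to inspect or log before any run is started. In production code, the token would normally come from an environment variable or secret store rather than a literal.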
