# Hacker News Scraper (`klondikeking/hacker-news-scraper`) Actor

- **URL**: https://apify.com/klondikeking/hacker-news-scraper.md
- **Developed by:** [Pierrick McD0nald](https://apify.com/klondikeking) (community)
- **Categories:** Developer tools, Social media, AI
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

Pay per event

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server that can be used as a website, an API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Hacker News Scraper — Extract Stories, Comments & Engagement Data

Scrape Hacker News stories, comments, and engagement metrics using the official Algolia Search API. Filter by keywords, content type, date range, and sort order to get exactly the data you need for trend analysis, content curation, market research, and competitive intelligence.
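For orientation, the sketch below shows how a query against the public HN Algolia Search API can be assembled. It is an illustration only: the helper name `build_search_url` and its parameters are hypothetical, and how the Actor itself builds its requests is an assumption, not something this README confirms.

```python
from urllib.parse import urlencode

# Public HN Algolia API base URL; which endpoints the Actor calls is an assumption.
ALGOLIA_BASE = "https://hn.algolia.com/api/v1"


def build_search_url(query, tags="story", sort_by="relevance",
                     created_after=None, page=0, hits_per_page=50):
    """Build an HN Algolia search URL (hypothetical helper, not the Actor's code)."""
    # `search` ranks by relevance; `search_by_date` returns the newest items first.
    endpoint = "search_by_date" if sort_by == "date" else "search"
    params = {"query": query, "tags": tags, "page": page, "hitsPerPage": hits_per_page}
    if created_after is not None:
        # created_at_i holds the item's creation time as a Unix timestamp.
        params["numericFilters"] = f"created_at_i>{int(created_after)}"
    return f"{ALGOLIA_BASE}/{endpoint}?{urlencode(params)}"


url = build_search_url("artificial intelligence", created_after=1700000000)
print(url)
```

The function only builds the URL and sends nothing, so it is safe to run when experimenting with filters.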

### Use Cases

- **Trend Analysis** — Track what technologies, products, or topics are gaining traction on Hacker News.
- **Content Curation** — Discover high-performing stories in your niche for newsletter curation or social sharing.
- **Market Research** — Monitor discussions about your product, competitor, or industry to gauge sentiment and interest.
- **Competitive Intelligence** — Identify which stories get the most engagement and understand what resonates with the HN community.
- **Academic Research** — Collect large datasets of tech community discussions for NLP, sentiment analysis, or social science research.

### Input

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `searchQueries` | Array | No | Keywords to search for. Leave empty to fetch top/front page stories. |
| `tags` | String | No | Content type filter: `story`, `comment`, `poll`, `pollopt`, `show_hn`, `ask_hn`, `front_page` (default: `story`) |
| `timeRange` | String | No | Filter by time: `all`, `last24h`, `lastWeek`, `lastMonth`, `lastYear` (default: `all`) |
| `sortBy` | String | No | Sort order: `relevance`, `date`, `points` (default: `relevance`) |
| `maxItems` | Number | No | Maximum results to return, 1–1000 (default: 100) |
| `proxyConfiguration` | Object | No | Optional proxy configuration for redundancy |
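A minimal sketch of checking an input object against the values listed in this document before starting a run. The helper `build_input` is hypothetical and not part of any Apify library; the allowed values and the 1 to 1000 range are taken from the table above and the input schema.

```python
ALLOWED_TAGS = {"story", "comment", "poll", "pollopt", "show_hn", "ask_hn", "front_page"}
ALLOWED_TIME_RANGES = {"all", "last24h", "lastWeek", "lastMonth", "lastYear"}
ALLOWED_SORT_ORDERS = {"relevance", "date", "points"}


def build_input(search_queries=None, tags="story", time_range="all",
                sort_by="relevance", max_items=100):
    """Build an Actor input dict (hypothetical helper), raising ValueError on bad values."""
    if tags not in ALLOWED_TAGS:
        raise ValueError(f"unknown tags value: {tags!r}")
    if time_range not in ALLOWED_TIME_RANGES:
        raise ValueError(f"unknown timeRange value: {time_range!r}")
    if sort_by not in ALLOWED_SORT_ORDERS:
        raise ValueError(f"unknown sortBy value: {sort_by!r}")
    if not 1 <= max_items <= 1000:
        raise ValueError("maxItems must be between 1 and 1000")
    return {
        "searchQueries": list(search_queries or []),
        "tags": tags,
        "timeRange": time_range,
        "sortBy": sort_by,
        "maxItems": max_items,
    }


actor_input = build_input(["rust"], time_range="lastWeek", sort_by="date", max_items=200)
print(actor_input)
```

The resulting dictionary can be passed as `run_input` to the Python client. Validating locally catches typos such as `lastweek` before a run is started.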

### Output

The Actor outputs a dataset with the following fields:

```json
{
  "objectID": "21530860",
  "title": "John Carmack: I'm going to work on artificial general intelligence",
  "url": "https://www.facebook.com/100006735798590/posts/2547632585471243/",
  "author": "jbredeche",
  "points": 1574,
  "numComments": 889,
  "createdAt": "2019-11-13T23:17:23Z",
  "storyText": "",
  "tags": ["story", "author_jbredeche", "story_21530860"],
  "searchQuery": "artificial intelligence"
}
````
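As a rough illustration of working with one dataset item, the sketch below derives a few convenience fields from the sample record. The helper `summarize` is hypothetical, and the `createdAt` format is assumed from the single example shown here; other items may differ.

```python
from datetime import datetime, timezone


def summarize(item):
    """Derive convenience fields from one dataset item (hypothetical helper)."""
    # Assumes createdAt looks like "2019-11-13T23:17:23Z", as in the sample record.
    created = datetime.strptime(item["createdAt"], "%Y-%m-%dT%H:%M:%SZ")
    created = created.replace(tzinfo=timezone.utc)
    points = item.get("points") or 0
    comments = item.get("numComments") or 0
    return {
        "title": item.get("title"),
        # Hacker News discussion pages are addressed by item ID.
        "discussionUrl": f"https://news.ycombinator.com/item?id={item['objectID']}",
        "createdAt": created,
        # Avoid dividing by zero for items that have no points.
        "commentsPerPoint": round(comments / points, 2) if points else None,
    }


sample = {
    "objectID": "21530860",
    "title": "John Carmack: I'm going to work on artificial general intelligence",
    "points": 1574,
    "numComments": 889,
    "createdAt": "2019-11-13T23:17:23Z",
}
summary = summarize(sample)
print(summary["discussionUrl"], summary["commentsPerPoint"])
```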

### Pricing

Pay per event: **$0.001 per story scraped**.

No hidden fees. You only pay for the data you extract. The Algolia Search API is free and public, so there are no additional API costs.
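The arithmetic is simple enough to sketch. The example below assumes that the per-story event at $0.001 is the only charge; any other events the Actor might bill are not covered. The helper names are hypothetical.

```python
from decimal import Decimal

# Price per scraped story, taken from the pricing line above.
PRICE_PER_STORY_USD = Decimal("0.001")


def estimate_cost(num_items):
    """Estimated charge in USD for a run that returns num_items stories."""
    return PRICE_PER_STORY_USD * num_items


def max_items_for_budget(budget_usd):
    """Largest number of stories that fits within budget_usd (rounded down)."""
    return int(Decimal(str(budget_usd)) / PRICE_PER_STORY_USD)


print(estimate_cost(50))
print(max_items_for_budget("0.25"))
```

`Decimal` is used instead of `float` so that small per-item prices add up without rounding drift.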

### Limitations

- The Algolia HN Search API returns up to 1,000 hits per query.
- Very old stories may not be indexed.
- The API does not support sorting by points directly; `sortBy: points` falls back to relevance.
- Rate limits are generous but not officially documented; the Actor uses conservative concurrency to avoid issues.
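Because `sortBy: points` falls back to relevance, one workaround is to sort the returned items yourself. The sketch below is a client-side ordering applied after the run, not a feature of the Actor; the helper name `sort_by_points` is hypothetical.

```python
def sort_by_points(items, descending=True):
    """Return items ordered by their `points` field (client-side sort)."""
    # Items with a missing or null `points` value are treated as 0.
    return sorted(items, key=lambda item: item.get("points") or 0, reverse=descending)


items = [
    {"title": "a", "points": 3},
    {"title": "b", "points": None},
    {"title": "c", "points": 10},
]
ranked = sort_by_points(items)
print([item["title"] for item in ranked])
```

This orders only the items that were actually returned. With the cap of 1,000 hits per query, the highest-scoring stories overall may not be in the result set at all.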

### FAQ

**Q: Do I need an API key?**
A: No. The Hacker News Algolia Search API is completely free and requires no authentication.

**Q: Can I scrape comments instead of stories?**
A: Yes. Set `tags` to `comment` to extract comments matching your search queries.

**Q: How far back does the data go?**
A: The Algolia index covers the full history of Hacker News, but very old items may have incomplete metadata.

**Q: What happens if I reach my spending limit?**
A: The Actor stops gracefully and saves all data collected up to that point.

### Changelog

- **v1.0.0** — Initial release with search, filtering, and pagination support.

# Actor input Schema

## `searchQueries` (type: `array`):

Keywords to search for on Hacker News. Leave empty to fetch top/front page stories.

## `tags` (type: `string`):

Filter by content type: story, comment, poll, pollopt, show\_hn, ask\_hn, front\_page

## `timeRange` (type: `string`):

Filter results by time range

## `sortBy` (type: `string`):

Sort order for results

## `maxItems` (type: `integer`):

Maximum number of results to return (max 1000)

## `proxyConfiguration` (type: `object`):

Optional proxy configuration for redundancy

## Actor input object example

```json
{
  "searchQueries": [
    "artificial intelligence"
  ],
  "tags": "story",
  "timeRange": "all",
  "sortBy": "relevance",
  "maxItems": 50,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
```

# Actor output Schema

## `dataset` (type: `string`):

Link to the dataset containing scraped Hacker News stories

## `stats` (type: `string`):

Link to the key-value store containing run statistics

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "searchQueries": [
        "artificial intelligence"
    ],
    "tags": "story",
    "timeRange": "all",
    "sortBy": "relevance",
    "maxItems": 50,
    "proxyConfiguration": {
        "useApifyProxy": true
    }
};

// Run the Actor and wait for it to finish
const run = await client.actor("klondikeking/hacker-news-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "searchQueries": ["artificial intelligence"],
    "tags": "story",
    "timeRange": "all",
    "sortBy": "relevance",
    "maxItems": 50,
    "proxyConfiguration": { "useApifyProxy": True },
}

# Run the Actor and wait for it to finish
run = client.actor("klondikeking/hacker-news-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "searchQueries": [
    "artificial intelligence"
  ],
  "tags": "story",
  "timeRange": "all",
  "sortBy": "relevance",
  "maxItems": 50,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}' |
apify call klondikeking/hacker-news-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=klondikeking/hacker-news-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Hacker News Scraper",
        "version": "1.0",
        "x-build-id": "WdlcxPO98o8LzKLus"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/klondikeking~hacker-news-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-klondikeking-hacker-news-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/klondikeking~hacker-news-scraper/runs": {
            "post": {
                "operationId": "runs-sync-klondikeking-hacker-news-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/klondikeking~hacker-news-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-klondikeking-hacker-news-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "searchQueries": {
                        "title": "Search Queries",
                        "type": "array",
                        "description": "Keywords to search for on Hacker News. Leave empty to fetch top/front page stories.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "tags": {
                        "title": "Content Type Filter",
                        "type": "string",
                        "description": "Filter by content type: story, comment, poll, pollopt, show_hn, ask_hn, front_page",
                        "default": "story"
                    },
                    "timeRange": {
                        "title": "Time Range",
                        "enum": [
                            "all",
                            "last24h",
                            "lastWeek",
                            "lastMonth",
                            "lastYear"
                        ],
                        "type": "string",
                        "description": "Filter results by time range",
                        "default": "all"
                    },
                    "sortBy": {
                        "title": "Sort By",
                        "enum": [
                            "relevance",
                            "date",
                            "points"
                        ],
                        "type": "string",
                        "description": "Sort order for results",
                        "default": "relevance"
                    },
                    "maxItems": {
                        "title": "Max Results",
                        "minimum": 1,
                        "maximum": 1000,
                        "type": "integer",
                        "description": "Maximum number of results to return (max 1000)",
                        "default": 100
                    },
                    "proxyConfiguration": {
                        "title": "Proxy Configuration",
                        "type": "object",
                        "description": "Optional proxy configuration for redundancy"
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
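
For projects that call the REST API directly, the sketch below builds the `run-sync-get-dataset-items` request described in the specification above, using only the Python standard library. The helper names are hypothetical, and `run_and_fetch_items` assumes the response body is JSON. Calling it starts a real, billable run.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/klondikeking~hacker-news-scraper/run-sync-get-dataset-items"


def build_run_request(token, actor_input):
    """Build the POST request: token in the query string, input as a JSON body."""
    url = f"{API_BASE}{ACTOR_PATH}?{urlencode({'token': token})}"
    body = json.dumps(actor_input).encode("utf-8")
    return Request(url, data=body, headers={"Content-Type": "application/json"}, method="POST")


def run_and_fetch_items(token, actor_input, timeout=300):
    """Send the request and decode the response (assumes a JSON response body)."""
    with urlopen(build_run_request(token, actor_input), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request sends nothing; only run_and_fetch_items touches the network.
request = build_run_request("<YOUR_API_TOKEN>", {"searchQueries": ["artificial intelligence"], "maxItems": 5})
print(request.get_method(), request.full_url)
```

Keeping request construction separate from sending makes the URL and body easy to inspect or unit-test without starting a run.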
