# Hacker News Search Scraper (`logiover/hacker-news-search-scraper`) Actor

Scrape Hacker News stories, comments and polls at scale via the Algolia API — title, author, URL, text, points, comment count and tags. Search by keyword, filter by points, thousands per run. Schedule it for a continuous HN feed.

- **URL**: https://apify.com/logiover/hacker-news-search-scraper.md
- **Developed by:** [Logiover](https://apify.com/logiover) (community)
- **Categories:** News, Developer tools
- **Stats:** 2 total users, 1 monthly user, 0.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $3.00 / 1,000 results

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
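
As a rough illustration of the listed price, the arithmetic can be sketched as below. The helper name `estimate_cost_usd` is hypothetical, and the default of $3.00 per 1,000 results is only the "from" price shown above; the actual charge depends on the Actor's event pricing and may be higher.

```python
def estimate_cost_usd(results: int, price_per_1000: float = 3.00) -> float:
    """Estimate the cost of a run from the number of results.

    Assumes a flat per-result price; 3.00 is the listed starting price.
    """
    if results < 0:
        raise ValueError("results must be non-negative")
    return results / 1000 * price_per_1000


if __name__ == "__main__":
    # 2,500 results at the listed starting price
    print(f"${estimate_cost_usd(2500):.2f}")
```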

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server that can be used as a website, an API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## 🟠 Hacker News Search Scraper — Scrape HN Stories & Comments at Scale

Scrape **Hacker News** stories, comments and polls at scale through the official **HN Algolia search API**. This Hacker News scraper extracts title, author, URL, text, points, comment count, tags and created date — searchable by keyword and filterable by points. **No login, no API key, no blocking.** Thanks to date-windowed pagination it goes beyond Algolia's 1,000-result cap, returning thousands of items per run as JSON, CSV or Excel.

### ✨ What this Actor does / Key features

- ⚙️ **Official HN Algolia API** — fast, reliable, and not subject to anti-bot blocking.
- 📈 **Beyond the 1,000-result cap** — date-windowed pagination pulls tens of thousands of items per run.
- 🗂️ **Every item type** — stories, comments, polls, Show HN, Ask HN and front-page items.
- 🔎 **Keyword search** — track any topic, product, company or technology across all of Hacker News.
- ⭐ **Points filter** — return only items with at least a minimum number of points.
- 📦 **Rich data per item** — title, author, URL, text, points, comment count, tags, created date and HN link.
- ♾️ **Unlimited mode** — set `maxItems` to 0 to pull everything matching your query.
- ⏱️ **Schedule-ready** — built for recurring runs to maintain a continuously fresh HN feed.

### 🔍 Input

| Field | Type | Description |
|-------|------|-------------|
| `query` | string | Keyword to search across Hacker News (e.g. "AI", "startup", "rust"). Leave empty to scrape all items. |
| `itemType` | string | What to scrape: `story`, `comment`, `poll`, `show_hn`, `ask_hn` or `front_page`. |
| `minPoints` | integer | Only items with at least this many points. 0 = no filter. |
| `maxItems` | integer | Maximum items to save. Uses date-windowed pagination to go beyond Algolia's 1,000-result cap. 0 = all. |

### 🚀 Example input

```json
{
  "query": "AI",
  "itemType": "story",
  "minPoints": 10,
  "maxItems": 0
}
````
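
If you assemble the input in code, a small validating helper can catch mistakes before a paid run starts. The sketch below is an assumption-laden example: `build_input` and `ITEM_TYPES` are hypothetical names, while the field names, allowed item types and minimum values are taken from the Actor's input schema.

```python
from typing import Optional

# Allowed values for `itemType`, as listed in the input table.
ITEM_TYPES = ("story", "comment", "poll", "show_hn", "ask_hn", "front_page")


def build_input(
    query: str = "",
    item_type: str = "story",
    min_points: int = 0,
    max_items: Optional[int] = None,
) -> dict:
    """Return an input object for the Actor, validating values first."""
    if item_type not in ITEM_TYPES:
        raise ValueError(f"itemType must be one of {ITEM_TYPES}")
    if min_points < 0:
        raise ValueError("minPoints must be >= 0")
    if max_items is not None and max_items < 0:
        raise ValueError("maxItems must be >= 0 (0 = all)")

    run_input = {"query": query, "itemType": item_type, "minPoints": min_points}
    # Only send maxItems when the caller set it explicitly.
    if max_items is not None:
        run_input["maxItems"] = max_items
    return run_input


if __name__ == "__main__":
    print(build_input("AI", "story", min_points=10, max_items=0))
```

The returned dictionary can be passed as `run_input` to the Python client.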

### 📦 Output

Each item is saved as a structured record in the dataset. Export to JSON, CSV, Excel or XML, or pull via the Apify API.

| Field | Description |
|-------|-------------|
| `objectId` | Unique Hacker News item ID |
| `type` | Item type (story, comment, poll, etc.) |
| `title` | Item title |
| `author` | Username of the author |
| `url` | External URL linked by the item (for stories) |
| `text` | Item text body (for comments, Ask HN, polls) |
| `points` | Number of points / upvotes |
| `numComments` | Number of comments |
| `tags` | Array of HN/Algolia tags |
| `createdAt` | Date the item was created |
| `hnUrl` | Direct link to the item on Hacker News |
| `scrapedAt` | Scrape timestamp (ISO 8601) |
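
When post-processing records, it may be safer to coerce numeric fields defensively, since the generated output schema lists `points` and `numComments` as strings. The sketch below ranks items by points; `as_int` and `top_by_points` are hypothetical helper names, not part of the Actor or the Apify client.

```python
from typing import Any, Iterable


def as_int(value: Any, default: int = 0) -> int:
    """Convert a value such as "12", 12 or None to an int, falling back to default."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return default


def top_by_points(items: Iterable[dict], limit: int = 10) -> list:
    """Return the highest-scoring items, treating missing points as 0."""
    ranked = sorted(items, key=lambda item: as_int(item.get("points")), reverse=True)
    return ranked[:limit]


if __name__ == "__main__":
    # Made-up sample records that only illustrate the field names.
    sample = [
        {"title": "a", "points": "5"},
        {"title": "b", "points": 12},
        {"title": "c", "points": None},
    ]
    for item in top_by_points(sample, limit=2):
        print(item["title"], as_int(item["points"]))
```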

### 💡 Use cases

- **Brand & topic monitoring** — track every Hacker News mention of your product, company or technology.
- **Trend research** — analyze what the tech community is discussing and how interest shifts over time.
- **Datasets & newsletters** — build a continuously updated Hacker News dataset for analysis or curation.
- **Sentiment & PR** — catch discussions about your brand or industry early.
- **Developer & market research** — study which tools, languages and startups gain traction on HN.
- **AI / LLM training data** — collect structured tech-discussion text for model training and analysis.

### ❓ Frequently Asked Questions

**Is it legal to scrape Hacker News?**
The Actor reads from the official public HN Algolia search API and only collects publicly available content. You are responsible for using the data in compliance with Hacker News' terms and applicable laws.

**Do I need an API key or a login?**
No. There is no Hacker News account, login or API key required. You only need an Apify account to run the Actor.

**How much data can I get?**
The Algolia API normally caps results at 1,000 per query, but this Actor uses date-windowed pagination to break past that limit — letting you pull tens of thousands of items per run. Set `maxItems` to 0 to capture everything matching your query.

**Can I scrape comments, not just stories?**
Yes. The `itemType` field supports stories, comments, polls, Show HN, Ask HN and front-page items.

**Can I filter by popularity?**
Yes. Use the `minPoints` filter to return only items above a points threshold.

**How fresh is the data and can I schedule it?**
Data is pulled live at run time. Schedule the Actor on Apify with a keyword query for a continuously fresh feed of new Hacker News activity.

**What output formats are supported?**
Results are stored in a structured Apify dataset and can be exported as JSON, CSV, Excel or XML, or accessed via the Apify API.
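
The README does not show how the Actor implements date-windowed pagination, so the following is only a sketch of the general technique under stated assumptions: fetch the newest page, note the oldest timestamp on it, then ask again for items at or before that timestamp, de-duplicating by ID. The names `algolia_fetch` and `iter_windowed` are hypothetical. The endpoint and the `tags`, `numericFilters`, `hitsPerPage`, `created_at_i` and `objectID` names come from the public HN Algolia search API; their use here is an assumption about how such a scraper could work, not a description of this Actor's code.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator, Optional

# Public HN Algolia endpoint that returns hits sorted by date, newest first.
SEARCH_BY_DATE = "https://hn.algolia.com/api/v1/search_by_date"

# A fetch function takes an inclusive upper bound on created_at_i (or None)
# and returns one page of hits, newest first.
Fetch = Callable[[Optional[int]], list]


def algolia_fetch(query: str = "", tag: str = "story") -> Fetch:
    """Build a fetch function for one query (it performs network calls when used)."""

    def fetch(upper: Optional[int]) -> list:
        params = {"query": query, "tags": tag, "hitsPerPage": 1000}
        if upper is not None:
            params["numericFilters"] = f"created_at_i<={upper}"
        url = f"{SEARCH_BY_DATE}?{urllib.parse.urlencode(params)}"
        with urllib.request.urlopen(url, timeout=30) as response:
            return json.load(response)["hits"]

    return fetch


def iter_windowed(fetch: Fetch, max_items: int = 0) -> Iterator[dict]:
    """Yield hits window by window, walking backwards in time.

    max_items == 0 means no limit, mirroring the Actor's `maxItems` input.
    """
    seen = set()
    upper = None
    count = 0
    while True:
        hits = fetch(upper)
        fresh = [hit for hit in hits if hit["objectID"] not in seen]
        if not fresh:
            # Nothing new in this window: either the data is exhausted or a
            # single timestamp holds more items than one page can return.
            return
        for hit in fresh:
            seen.add(hit["objectID"])
            yield hit
            count += 1
            if max_items and count >= max_items:
                return
        # The next window ends at the oldest timestamp seen (inclusive), so
        # items sharing that second are not skipped; duplicates are filtered.
        upper = min(hit["created_at_i"] for hit in hits)
```

Usage would look like `for hit in iter_windowed(algolia_fetch("rust"), max_items=2500): ...`. The inclusive bound plus de-duplication trades a few repeated hits per window for not losing items that share a boundary timestamp.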

### ⏰ Scheduling & integration

Schedule this Actor on Apify to run on any cadence for a continuously fresh Hacker News feed. Export results to JSON, CSV or Excel, sync to Google Sheets, or push to dashboards, Slack/Discord alerts and webhooks through the Apify API.
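
Recurring runs with the same query will likely return items that earlier runs already saved. One way to keep a feed free of repeats is to track the `objectId` values seen so far and forward only new records. The helper below is a minimal sketch; `select_new` is a hypothetical name, and how you persist `seen_ids` between runs (a file, a database, a key-value store) is left open.

```python
from typing import Iterable


def select_new(items: Iterable[dict], seen_ids: set) -> list:
    """Return items whose objectId has not been seen, updating seen_ids in place."""
    new_items = []
    for item in items:
        item_id = item.get("objectId")
        # Skip records without an ID and records already delivered.
        if item_id is None or item_id in seen_ids:
            continue
        seen_ids.add(item_id)
        new_items.append(item)
    return new_items


if __name__ == "__main__":
    seen = set()
    # Made-up records standing in for the datasets of two scheduled runs.
    first_run = [{"objectId": "1"}, {"objectId": "2"}]
    second_run = [{"objectId": "2"}, {"objectId": "3"}]
    print(len(select_new(first_run, seen)))
    print([item["objectId"] for item in select_new(second_run, seen)])
```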

# Actor input Schema

## `query` (type: `string`):

Keyword to search across Hacker News, e.g. 'AI', 'startup', 'rust'. Leave empty to scrape all items.

## `itemType` (type: `string`):

What to scrape.

## `minPoints` (type: `integer`):

Only items with at least this many points. 0 = no filter.

## `maxItems` (type: `integer`):

Maximum items to save. Uses date-windowed pagination to go beyond Algolia's 1,000-result cap. 0 = all.

## Actor input object example

```json
{
  "query": "",
  "itemType": "story",
  "minPoints": 0
}
```

# Actor output Schema

## `title` (type: `string`):

Item title.

## `author` (type: `string`):

Username of the author.

## `points` (type: `string`):

Number of points / upvotes.

## `numComments` (type: `string`):

Number of comments.

## `createdAt` (type: `string`):

Date the item was created.

## `url` (type: `string`):

External URL linked by the item (for stories).

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input (values mirror the Actor input object example)
const input = {
    query: '',
    itemType: 'story',
    minPoints: 0,
};

// Run the Actor and wait for it to finish
const run = await client.actor("logiover/hacker-news-search-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input (values mirror the Actor input object example)
run_input = {
    "query": "",
    "itemType": "story",
    "minPoints": 0,
}

# Run the Actor and wait for it to finish
run = client.actor("logiover/hacker-news-search-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{}' |
apify call logiover/hacker-news-search-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=logiover/hacker-news-search-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Hacker News Search Scraper",
        "description": "Scrape Hacker News stories, comments and polls at scale via the Algolia API — title, author, URL, text, points, comment count and tags. Search by keyword, filter by points, thousands per run. Schedule it for a continuous HN feed.",
        "version": "0.1",
        "x-build-id": "CY6fcSii5IHbEzNQu"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/logiover~hacker-news-search-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-logiover-hacker-news-search-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/logiover~hacker-news-search-scraper/runs": {
            "post": {
                "operationId": "runs-sync-logiover-hacker-news-search-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/logiover~hacker-news-search-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-logiover-hacker-news-search-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "query": {
                        "title": "Search Query",
                        "type": "string",
                        "description": "Keyword to search across Hacker News, e.g. 'AI', 'startup', 'rust'. Leave empty to scrape all items.",
                        "default": ""
                    },
                    "itemType": {
                        "title": "Item Type",
                        "enum": [
                            "story",
                            "comment",
                            "poll",
                            "show_hn",
                            "ask_hn",
                            "front_page"
                        ],
                        "type": "string",
                        "description": "What to scrape.",
                        "default": "story"
                    },
                    "minPoints": {
                        "title": "Minimum Points",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Only items with at least this many points. 0 = no filter.",
                        "default": 0
                    },
                    "maxItems": {
                        "title": "Max Items",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Maximum items to save. Uses date-windowed pagination to go beyond Algolia's 1,000-result cap. 0 = all."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
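
For projects that cannot use a client library, the `run-sync-get-dataset-items` path from the specification can be called directly. The sketch below uses only the Python standard library. The function names `build_run_request` and `run_actor` are hypothetical, and it assumes the endpoint responds with a JSON array of dataset items, which the specification's bare `200 OK` response does not spell out.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/logiover~hacker-news-search-scraper/run-sync-get-dataset-items"


def build_run_request(token: str, run_input: dict) -> urllib.request.Request:
    """Build the POST request; the token goes in the query string, as in the spec."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        f"{API_BASE}{ACTOR_PATH}?{query}",
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_actor(token: str, run_input: dict, timeout: float = 300):
    """Run the Actor synchronously and return the parsed response body."""
    request = build_run_request(token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Replace the placeholder with a real token before running.
    items = run_actor("<YOUR_API_TOKEN>", {"query": "rust", "itemType": "story", "maxItems": 50})
    print(len(items))
```

Synchronous calls hold the HTTP connection open for the whole run, so for large or unlimited runs the asynchronous `/runs` path followed by a dataset read is probably the better fit.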
