# Hacker News Data Scraper (`kawsar/hacker-news-data-scraper`) Actor

Hacker News scraper that pulls stories, jobs, Ask HN and Show HN posts from news.ycombinator.com, so developers and SEO teams can track tech trends and job listings without manual browsing.

- **URL**: https://apify.com/kawsar/hacker-news-data-scraper.md
- **Developed by:** [Kawsar](https://apify.com/kawsar) (community)
- **Categories:** Developer tools, Jobs, News
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $2.99 / 1,000 results

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.
Because this Actor supports Apify Store discounts, the price decreases on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
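
As a rough budgeting aid, the listed starting price can be turned into a per-run estimate. This is a minimal sketch that assumes every result is charged at the "from" price of $2.99 per 1,000 results and that no other events are billed; the function name `estimate_cost_usd` is hypothetical, and the actual charge depends on your plan and the Actor's event definitions.

```python
def estimate_cost_usd(result_count: int, price_per_1000: float = 2.99) -> float:
    """Estimate the charge for a run that yields `result_count` results.

    Assumption: a flat price per 1,000 results, taken from the listed
    starting price. Real pay-per-event pricing may differ.
    """
    if result_count < 0:
        raise ValueError("result_count must be non-negative")
    return round(result_count / 1000 * price_per_1000, 4)


if __name__ == "__main__":
    for count in (30, 300, 1000):
        print(count, "results ->", estimate_cost_usd(count), "USD")
```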

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Hacker News Data Scraper: extract stories, jobs, and posts from news.ycombinator.com

Pulls structured data from news.ycombinator.com. Covers all six feeds (top, new, best, ask, show, and jobs), returns post titles, URLs, points, authors, comment counts, and post types, and pages through automatically until you hit your item limit. Works with any HN feed URL — paste a URL like `https://news.ycombinator.com/show?p=5` into Start URLs and it will paginate forward from that page.

### What data does this Actor return?

| Field | Type | Description | Example |
|-------|------|-------------|---------|
| `itemId` | integer | Hacker News item ID | 48031684 |
| `rank` | integer | Position in the feed | 1 |
| `storyTitle` | string | Post title | "Agents can now create Cloudflare accounts" |
| `url` | string | Linked URL (internal HN link for Ask/Show) | https://blog.cloudflare.com/... |
| `domain` | string | Domain extracted from the linked URL | cloudflare.com |
| `points` | integer\|null | Upvote score (null for job posts) | 200 |
| `author` | string\|null | Submitter username (null for job posts) | rolph |
| `commentCount` | integer\|null | Number of comments (null for job posts) | 108 |
| `commentsUrl` | string\|null | HN discussion thread URL | https://news.ycombinator.com/item?id=... |
| `age` | string | Post age as displayed on HN | 3 hours ago |
| `postType` | string | One of: story, job, ask, show, launch | story |
| `scrapedAt` | string | ISO 8601 UTC timestamp | 2026-05-06T10:00:00+00:00 |

### How to use

#### Option 1: Scrape a feed

1. Open the input tab
2. Pick a feed type: top, new, best, ask, show, or jobs
3. Set your item limit (up to 1000)
4. Click Run

The Actor pages through HN automatically (30 items per page) until it hits your limit.

#### Option 2: Start from a specific page

Add any HN feed URL to the Start URLs field. The Actor detects the page number from the URL and paginates forward from there.

Examples:
- `https://news.ycombinator.com/show?p=3` — starts at Show HN page 3 and pages forward
- `https://news.ycombinator.com/newest` — scrapes the New feed from page 1
- `https://news.ycombinator.com/ask?p=10` — starts at Ask HN page 10

Multiple URLs are supported. The Actor processes them in order and stops once the total reaches your item limit.

### Input

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `feedType` | string (select) | `top` | Feed to scrape when no Start URLs are set |
| `startUrls` | array of strings | `[]` | HN URLs to start paginating from. Overrides Feed type. |
| `maxItems` | integer | `30` | Max items to collect per run (1 to 1000) |
| `requestTimeoutSecs` | integer | `300` | Per-request timeout in seconds (50 to 1200) |

#### Feed type options

| Value | URL | Description |
|-------|-----|-------------|
| `top` | news.ycombinator.com/ | Front page — highest-voted recent stories |
| `new` | news.ycombinator.com/newest | Newest submissions, unfiltered |
| `best` | news.ycombinator.com/best | Highest-voted of all time |
| `ask` | news.ycombinator.com/ask | Ask HN posts only |
| `show` | news.ycombinator.com/show | Show HN and Launch HN posts only |
| `jobs` | news.ycombinator.com/jobs | YC startup job listings |

### Example output

```json
[
  {
    "itemId": 48031684,
    "rank": 1,
    "storyTitle": "Agents can now create Cloudflare accounts, buy domains, and deploy products",
    "url": "https://blog.cloudflare.com/agents-stripe-projects/",
    "domain": "cloudflare.com",
    "points": 200,
    "author": "rolph",
    "commentCount": 108,
    "commentsUrl": "https://news.ycombinator.com/item?id=48031684",
    "age": "3 hours ago",
    "postType": "story",
    "scrapedAt": "2026-05-06T10:00:00.000000+00:00"
  },
  {
    "itemId": 48025244,
    "rank": 1,
    "storyTitle": "Proliferate (YC S25) Is Hiring",
    "url": "https://www.ycombinator.com/companies/proliferate/jobs/...",
    "domain": "ycombinator.com",
    "points": null,
    "author": null,
    "commentCount": null,
    "commentsUrl": "https://news.ycombinator.com/item?id=48025244",
    "age": "13 hours ago",
    "postType": "job",
    "scrapedAt": "2026-05-06T10:00:00.000000+00:00"
  }
]
````

### How pagination works

Each HN feed page returns 30 items. The Actor increments the `?p=` query parameter and fetches the next page until either your `maxItems` limit is reached or there are no more items. For example, with `maxItems` set to 300, the Actor fetches up to 10 pages (300 / 30) automatically.

When you use Start URLs with a page number (e.g. `?p=5`), the Actor starts at that page and paginates forward; it does not go back to page 1.
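
The page arithmetic above can be sketched in a few lines of Python. This is an illustration of the described behaviour, not the Actor's actual source: the helper name `page_urls` is hypothetical, and the sketch only computes the maximum set of pages, whereas the Actor also stops early when a feed runs out of items.

```python
from math import ceil
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

PAGE_SIZE = 30  # items per HN feed page, as stated above


def page_urls(start_url: str, max_items: int, page_size: int = PAGE_SIZE) -> list[str]:
    """Return the page URLs needed to collect up to `max_items` items.

    Reads the starting page from the `p` query parameter (default 1)
    and paginates forward only, never back towards page 1.
    """
    parts = urlsplit(start_url)
    query = parse_qs(parts.query)
    first_page = int(query.get("p", ["1"])[0])
    page_count = ceil(max_items / page_size)

    urls = []
    for page in range(first_page, first_page + page_count):
        query["p"] = [str(page)]
        new_query = urlencode(query, doseq=True)
        urls.append(urlunsplit(parts._replace(query=new_query)))
    return urls


if __name__ == "__main__":
    for url in page_urls("https://news.ycombinator.com/show?p=5", max_items=90):
        print(url)
```

With a start URL of `https://news.ycombinator.com/show?p=5` and `max_items=90`, the sketch yields pages 5, 6, and 7.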

### Use cases

- SEO research: track which tech topics trend on HN and use that to shape your content calendar
- Job market monitoring: collect startup listings from the jobs feed and compare them week over week
- Show HN and Launch HN watching: see what new products the community pays attention to
- Content curation: pull top stories automatically for newsletters or internal feeds
- Dataset building: collect community engagement data (points, comment counts) across thousands of posts over time
- Competitive intelligence: monitor mentions of competitor products or technologies in trending discussions

### Scheduling

To collect HN data on a recurring schedule, use Apify's built-in scheduler:

1. Go to your Actor page and click Schedules
2. Set a cron expression (e.g. `0 9 * * *` for 9am daily)
3. Configure the input (feed type, item limit)
4. Each run's results land in a separate dataset

This works well for building historical trend datasets over days or weeks.

### Limitations

- Max 1000 items per run (HN has no API rate limit, but this keeps run costs predictable)
- Comment content is not extracted — post-level data only
- Job posts return null for `points`, `author`, and `commentCount` (HN does not display these for jobs)
- HN's "best" feed is relatively small — it may return fewer than 200 unique items before repeating
- The `age` field is a human-readable string from HN ("3 hours ago"), not a parsed timestamp

### FAQ

**What feeds are supported?**
Top, new, best, ask, show, and jobs.

**How many items can I collect per run?**
Up to 1000. Each page has 30 items and the Actor pages through automatically.

**Can I start scraping from a specific page?**
Yes. Add a URL like `https://news.ycombinator.com/show?p=5` to Start URLs. The Actor reads the page number from the URL and paginates forward from there.

**Can I scrape multiple feeds in one run?**
Yes. Add multiple feed URLs to Start URLs (e.g. both `/show` and `/ask`) and the Actor will scrape each in sequence until `maxItems` is reached.

**Does it scrape comments?**
No. Post-level only: title, URL, points, author, comment count. Comment text is not extracted.

**Do job posts include points and author?**
No. HN job posts do not show vote counts or usernames. Those fields come back null.

**How does post type detection work?**
Title prefix: "Ask HN:" becomes ask, "Show HN:" becomes show, "Launch HN:" becomes launch. Posts from the jobs feed are always tagged job. Everything else is story.
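
The rule described in this answer can be sketched as follows. This is an illustration only: the function name `detect_post_type` and the `from_jobs_feed` flag are hypothetical, and the case-sensitive prefix match is an assumption, since the Actor's exact matching logic is not documented here.

```python
# Title prefixes and the post type each one maps to, per the FAQ answer above.
_PREFIX_TYPES = (
    ("Ask HN:", "ask"),
    ("Show HN:", "show"),
    ("Launch HN:", "launch"),
)


def detect_post_type(title: str, from_jobs_feed: bool = False) -> str:
    """Classify a post as job, ask, show, launch, or story."""
    if from_jobs_feed:
        # Posts from the jobs feed are always tagged "job".
        return "job"
    for prefix, post_type in _PREFIX_TYPES:
        if title.startswith(prefix):
            return post_type
    return "story"


if __name__ == "__main__":
    print(detect_post_type("Show HN: A tiny HTTP server"))
    print(detect_post_type("Proliferate (YC S25) Is Hiring", from_jobs_feed=True))
```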

**Can I export results to CSV or Excel?**
Yes. In the Apify dataset view, click Export and choose CSV, Excel, JSON, or JSONL.
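
If you already hold the items in memory (for example after fetching them with an API client), a CSV can also be produced locally. This is a minimal sketch using only the Python standard library; the helper name `items_to_csv` is hypothetical, and the column order simply follows the output field table in this README.

```python
import csv
import io

# Column order taken from the output field table in this README.
FIELDS = [
    "itemId", "rank", "storyTitle", "url", "domain", "points", "author",
    "commentCount", "commentsUrl", "age", "postType", "scrapedAt",
]


def items_to_csv(items: list[dict]) -> str:
    """Serialize dataset items to CSV text.

    Null or missing values (e.g. `points` on job posts) become empty cells;
    keys not listed in FIELDS are ignored.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(items)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = [{"itemId": 48025244, "storyTitle": "Proliferate (YC S25) Is Hiring",
               "points": None, "postType": "job"}]
    print(items_to_csv(sample))
```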

# Actor input Schema

## `feedType` (type: `string`):

Which Hacker News feed to scrape. Top returns the front page; New returns the newest submissions; Best returns highest-voted all-time; Ask, Show, and Jobs filter by post type.

## `startUrls` (type: `array`):

Optional list of specific Hacker News page URLs to scrape instead of a feed. When provided, Feed type is ignored.

## `maxItems` (type: `integer`):

Maximum number of items to collect per run. Each feed page returns 30 items.

## `requestTimeoutSecs` (type: `integer`):

Per-request timeout in seconds. Increase if you encounter timeout errors.

## Actor input object example

```json
{
  "feedType": "top",
  "startUrls": [
    "https://news.ycombinator.com/",
    "https://news.ycombinator.com/jobs"
  ],
  "maxItems": 30,
  "requestTimeoutSecs": 300
}
```
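
Before starting a run, an input object can be checked against the constraints listed in the OpenAPI specification further down (required `feedType`, `maxItems` between 1 and 1000, `requestTimeoutSecs` between 50 and 1200). The sketch below is a local pre-check under those stated bounds; `validate_input` is a hypothetical helper, and the platform performs its own validation regardless.

```python
FEED_TYPES = {"top", "new", "best", "ask", "show", "jobs"}


def _is_int(value) -> bool:
    """True for real integers; bool is excluded even though it subclasses int."""
    return isinstance(value, int) and not isinstance(value, bool)


def validate_input(actor_input: dict) -> list[str]:
    """Return a list of problems found in `actor_input` (empty if none)."""
    errors = []

    if actor_input.get("feedType") not in FEED_TYPES:
        errors.append("feedType is required and must be one of: "
                      + ", ".join(sorted(FEED_TYPES)))

    start_urls = actor_input.get("startUrls", [])
    if not isinstance(start_urls, list) or not all(isinstance(u, str) for u in start_urls):
        errors.append("startUrls must be a list of strings")

    max_items = actor_input.get("maxItems", 30)
    if not _is_int(max_items) or not 1 <= max_items <= 1000:
        errors.append("maxItems must be an integer between 1 and 1000")

    timeout = actor_input.get("requestTimeoutSecs", 300)
    if not _is_int(timeout) or not 50 <= timeout <= 1200:
        errors.append("requestTimeoutSecs must be an integer between 50 and 1200")

    return errors


if __name__ == "__main__":
    print(validate_input({"feedType": "top", "maxItems": 30, "requestTimeoutSecs": 300}))
```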

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "feedType": "top",
    "startUrls": []
};

// Run the Actor and wait for it to finish
const run = await client.actor("kawsar/hacker-news-data-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "feedType": "top",
    "startUrls": [],
}

# Run the Actor and wait for it to finish
run = client.actor("kawsar/hacker-news-data-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "feedType": "top",
  "startUrls": []
}' |
apify call kawsar/hacker-news-data-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=kawsar/hacker-news-data-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Hacker News Data Scraper",
        "description": "Hacker News scraper that pulls stories, jobs, Ask HN and Show HN posts from news.ycombinator.com, so developers and SEO teams can track tech trends and job listings without manual browsing.",
        "version": "0.0",
        "x-build-id": "wDXCFqhvPj8vE6rrI"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/kawsar~hacker-news-data-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-kawsar-hacker-news-data-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/kawsar~hacker-news-data-scraper/runs": {
            "post": {
                "operationId": "runs-sync-kawsar-hacker-news-data-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/kawsar~hacker-news-data-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-kawsar-hacker-news-data-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "feedType"
                ],
                "properties": {
                    "feedType": {
                        "title": "Feed type",
                        "enum": [
                            "top",
                            "new",
                            "best",
                            "ask",
                            "show",
                            "jobs"
                        ],
                        "type": "string",
                        "description": "Which Hacker News feed to scrape. Top returns the front page; New returns the newest submissions; Best returns highest-voted all-time; Ask, Show, and Jobs filter by post type.",
                        "default": "top"
                    },
                    "startUrls": {
                        "title": "Start URLs",
                        "type": "array",
                        "description": "Optional list of specific Hacker News page URLs to scrape instead of a feed. When provided, Feed type is ignored.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "maxItems": {
                        "title": "Max items",
                        "minimum": 1,
                        "maximum": 1000,
                        "type": "integer",
                        "description": "Maximum number of items to collect per run. Each feed page returns 30 items.",
                        "default": 30
                    },
                    "requestTimeoutSecs": {
                        "title": "Request timeout (seconds)",
                        "minimum": 50,
                        "maximum": 1200,
                        "type": "integer",
                        "description": "Per-request timeout in seconds. Increase if you encounter timeout errors.",
                        "default": 300
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
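
The `run-sync-get-dataset-items` path in the specification above can also be called without a client library. The following is a minimal sketch using only the Python standard library. The helper names `build_run_request` and `run_and_fetch_items` are hypothetical, and the sketch assumes the response body is a JSON array of dataset items, which the specification implies but does not spell out.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/kawsar~hacker-news-data-scraper/run-sync-get-dataset-items"


def build_run_request(token: str, actor_input: dict) -> Request:
    """Build the POST request: token in the query string, input as JSON body."""
    url = f"{API_BASE}{ACTOR_PATH}?{urlencode({'token': token})}"
    body = json.dumps(actor_input).encode("utf-8")
    return Request(url, data=body,
                   headers={"Content-Type": "application/json"}, method="POST")


def run_and_fetch_items(token: str, actor_input: dict, timeout: float = 600):
    """Run the Actor synchronously and return the parsed response body.

    Assumption: the body is a JSON array of dataset items.
    """
    with urlopen(build_run_request(token, actor_input), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    request = build_run_request("<YOUR_API_TOKEN>", {"feedType": "top", "maxItems": 30})
    print(request.get_method(), request.full_url)
```

Passing the token as a query parameter mirrors the specification; keep such URLs out of logs, since they contain your credential.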
