# HackerNews Insights Scraper — Stories, Comments & Velocity (`brilliant_gum/hackernews-insights-scraper`) Actor

Hacker News stories, full comment trees, user karma and contact info, story velocity tracking, history deltas. Search all 3.7M stories with filters for points, karma, domain, dates, keywords. For VCs hunting Show HN, recruiters mining talent, journalists tracking tech, and AI/RAG pipelines.

- **URL**: https://apify.com/brilliant\_gum/hackernews-insights-scraper.md
- **Developed by:** [Yuliia Kulakova](https://apify.com/brilliant_gum) (community)
- **Categories:** Lead generation, Developer tools, News
- **Stats:** 2 total users, 1 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

from $0.005 / story scraped

This Actor is paid per event and usage. You are charged both the fixed price for specific events and for Apify platform usage.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are a software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

````bash
# MacOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```bash

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## HackerNews Insights Scraper

Stories, comments, user karma, velocity tracking, and contact intelligence — turn Hacker News into a structured intelligence feed.

![HackerNews Insights Scraper](https://i.imgur.com/TMJWz8P.png)

---

### Why this scraper

Hacker News is the single highest-signal community in tech: where launches break, where engineers vent, where investors hunt. But the site itself gives you a ranked list and a thread view — no filters, no trends, no exports, no way to track a story's momentum or pull a list of senior commenters by domain expertise.

This scraper turns Hacker News into a structured data feed you can pipe straight into your CRM, dashboard, LLM, or spreadsheet. Search the entire 3.7M-story archive, filter by score / karma / domain / keywords, pull full comment trees with depth and reply analytics, enrich author profiles with contact info, and track how stories grow between runs.

---

### What you get

**Stories with rich context**
- Top, Best, New, Ask HN, Show HN, and Job listings
- Title, URL, domain, body text, author, score, comment count, submission time
- Auto-tagged: `story`, `show_hn`, `ask_hn`, `job`
- Permalinks straight back to news.ycombinator.com

**Full-text search across all 3.7M Hacker News stories**
- Search by keyword across the entire HN archive (2007 → today)
- Filter by tag, date range, score threshold, and author karma
- Sort by popularity or newest

**Complete comment trees with analytics**
- Recursive thread fetch with configurable depth (1–10 tiers)
- Hard cap per story so viral threads don't explode your budget
- Per-comment: author, body text, parent, depth, reply count, timestamps
- Per-story analytics: max depth, average replies per node, top 5 commenters by reply count

**User intelligence**
- Karma, account age, total submission count
- Recent submission IDs (last 50)
- Dominant domains the user posts about (signal for "what they're into")
- Average score on their last 20 stories

**Contact extraction from user bios**
- Emails, X/Twitter handles, GitHub, LinkedIn, Mastodon
- Personal websites (auto-classified separately from social profiles)
- Smart parsing that ignores false positives (an email like `thomas@fly.io` won't produce a fake `@fly` Twitter handle)

**Story velocity tracking**
- Points per hour and comments per hour since submission
- Points-to-comments ratio (viral vs. controversial signal)
- Story age in hours

**History delta — track stories across runs**
- Persistent snapshot store keyed by story ID
- On every subsequent run, each story includes: `scoreDelta`, `commentsDelta`, `scorePerHour`, `commentsPerHour`, `trend` (up/down/flat)
- See exactly how a Show HN gained traction overnight or how a controversial post peaked and stalled

**Input-time filters (the headline differentiator)**
- Minimum score, minimum comments, minimum author karma
- Date range (from / to)
- Keyword include list (case-insensitive title + body match)
- Domain whitelist (only stories pointing to `github.com`, `arxiv.org`, etc.)
- Filters apply *before* expensive fetches — you only pay for records that pass

---

### Use cases

| Who | What they pull |
|---|---|
| **VCs & angel investors** | Show HN deal flow — every product launch with > 100 points + maker contact + velocity since launch |
| **Recruiters** | High-karma authors who post about specific domains (Rust, ML, infrastructure) — with surfaced contact info |
| **Tech journalists** | Trending stories from arxiv.org, github.com, or competitor domains; sentiment via comment trees |
| **PR & comms teams** | Track when your company / product gets mentioned; full comment thread for response strategy |
| **AI / RAG engineers** | High-signal, opinion-rich training and retrieval data — full comments, not just titles |
| **Startup founders** | Competitor monitoring; see what users are saying about adjacent products in threads |
| **Product managers** | Pull all "Ask HN: how do you…" threads in your category for organic user research |
| **Open-source maintainers** | Find every HN discussion of your project across years; see which features users actually care about |

---

### Quick start

Drop this into the Input panel and run:

```json
{
    "lists": ["top"],
    "maxStoriesPerList": 30
}
````

You'll get 30 top stories with velocity analytics and tag classification — typically in under 15 seconds.

***

### Common input examples

**All Show HN with at least 500 points from the last year**

```json
{
    "tagSearchOnly": true,
    "tags": ["show_hn"],
    "minPoints": 500,
    "dateFrom": "2025-06-01",
    "maxStoriesPerQuery": 100,
    "sortBy": "popularity"
}
```

**Track a topic and gather author contacts**

```json
{
    "queries": ["LLM observability", "rust async"],
    "minPoints": 50,
    "includeAuthorProfiles": true,
    "includeContactInfo": true,
    "maxStoriesPerQuery": 30
}
```

**Pull a single story with the full comment tree**

```json
{
    "storyIds": ["48513806"],
    "includeComments": true,
    "commentDepth": 5,
    "maxCommentsPerStory": 500,
    "includeAuthorProfiles": true
}
```

**Daily monitor with growth tracking**

```json
{
    "lists": ["best"],
    "maxStoriesPerList": 50,
    "enableHistory": true,
    "minPoints": 100
}
```

Run on a schedule. Every run after the first includes `history.delta` showing how each story has grown.

**Domain-specific intelligence (arxiv papers on the front page)**

```json
{
    "queries": ["AI", "machine learning"],
    "domains": ["arxiv.org"],
    "minPoints": 100,
    "dateFrom": "2025-01-01",
    "maxStoriesPerQuery": 50
}
```

**Look up specific power users**

```json
{
    "userIds": ["pg", "tptacek", "patio11", "dang"],
    "includeContactInfo": true
}
```

***

### Output overview

Three record types in the dataset:

#### Story

| Field | Description |
|---|---|
| `type` | `"story"` or `"job"` |
| `id` | Hacker News story ID |
| `title` | Story title |
| `url` | External link (null for Ask HN / Tell HN self-posts) |
| `domain` | Apex domain of the URL |
| `text` | Body text for Ask HN / Show HN / Tell HN (HTML stripped) |
| `by` | Author username |
| `score` | Current points |
| `descendants` | Total comment count |
| `tag` | `story`, `show_hn`, `ask_hn`, or `job` |
| `createdAt` | ISO timestamp |
| `permalink` | Link to the story on news.ycombinator.com |
| `analytics` | `pointsPerHour`, `commentsPerHour`, `pointsToCommentsRatio`, `ageHours`, plus comment-tree shape stats when comments are fetched |
| `history` | `scoreDelta`, `commentsDelta`, `trend`, snapshot series — present when history tracking is on |

#### Comment

| Field | Description |
|---|---|
| `type` | `"comment"` |
| `id`, `parent`, `storyId` | Comment ID, immediate parent, root story |
| `by`, `text` | Author and full comment body (HTML stripped) |
| `depth` | 1 = direct reply, 2 = reply-to-reply, etc. |
| `replyCount` | Number of direct child replies |
| `createdAt`, `permalink` | Timestamp and link |

#### User

| Field | Description |
|---|---|
| `type` | `"user"` |
| `username` | HN handle |
| `karma` | Total karma |
| `about`, `aboutHtml` | Bio (cleaned and original) |
| `createdAt` | When the account was created |
| `submittedCount` | Lifetime submission count |
| `recentSubmittedIds` | Last 50 submission IDs |
| `contactInfo` | `emails`, `twitter`, `github`, `linkedin`, `mastodon`, `websites` |
| `activity` | Recent activity sample, dominant domains, average story score |

***

### Pricing

| Charge | Cost |
|---|---|
| Actor start | $0.01 per run |
| Story scraped | $0.005 per story (or job listing) |
| Comment scraped | $0.001 per comment |
| User profile scraped | $0.005 per user |

Records are only counted **after** filters pass — you don't pay for stories that get dropped by `minPoints`, `domains`, or `dateRange`. Comment trees and author profiles are opt-in.

**Worked examples:**

| Scenario | Stories | Comments | Users | Cost |
|---|---|---|---|---|
| 50 top stories, no comments | 50 | 0 | 0 | $0.26 |
| 100 Show HN historical search | 100 | 0 | 0 | $0.51 |
| 30 stories + full comment trees (~30 avg) | 30 | ~900 | 0 | $1.06 |
| 1 viral story + 500 comments + author profile | 1 | 500 | 1 | $0.52 |
| Daily best-of-50 monitor with author profiles | 50 | 0 | 50 | $0.51 |
| Deep weekly review: 100 stories + 5000 comments + 100 authors | 100 | 5000 | 100 | $6.01 |

Comments are intentionally priced low so that comment-tree analytics and AI/RAG workloads stay affordable.

***

### Proxies

Proxies are included and configured automatically. No setup required.

***

### FAQ

**Does this work with the Hacker News API directly?**
You don't need an API key or any setup. Pass an input, get a dataset. We handle the upstream calls.

**Can I get comments from before HN existed?**
Comments and stories go back to HN's launch in 2007. Full-text search covers the entire archive.

**Will this hit rate limits if I run it often?**
Hacker News exposes a generous public data surface for scrapers. Per-request throttling is built in. You can safely schedule this every 15 minutes for monitoring use cases.

**Can I track sentiment?**
The scraper returns full comment text. Sentiment is something you'd run downstream (an LLM call, your own classifier, etc.). We don't bundle sentiment to keep pricing flat and the data unopinionated.

**Why don't I see Twitter handles for `tptacek` even though his bio has email addresses?**
The contact parser is intentionally strict: it won't extract `@sockpuppet` from the email `thomas@sockpuppet.org` because that would be a false positive. Real Twitter handles (text like `@username` written as a standalone mention, or a `twitter.com/username` URL) are extracted reliably.

**How does history tracking work?**
Turn on `enableHistory: true` and pick a `historyStoreName`. On every run, each story's current score and comment count are snapshotted under that name. From the second run onward, every story includes a `history.delta` block with the change since the previous run, expressed as raw deltas and as per-hour rates.

**Why might a story I expected to see not appear in the output?**
Most often a filter dropped it. Check the log: it prints active filters at the start of every run. Common gotchas: `domains` set with a self-post (Ask HN has no URL → automatically dropped), `dateFrom` cutoff too aggressive, or `minAuthorKarma` filtering out new accounts.

**Does this fetch reply chains under deeply nested comments?**
Yes, up to `commentDepth` levels (default 3). HN threads sometimes go 8–10 levels deep; raise the limit if you need the full tree, but expect cost to scale with thread size.

**Can I export to CSV / XML / RSS?**
Apify supports all of those formats out of the box — pick your format in the "Export results" panel after a run finishes.

**What about private / dead / deleted content?**
Deleted comments and stories are skipped (you won't see hollow placeholder records). Reply chains beneath a deleted comment are still traversed when present.

**Will this work on a free Apify plan?**
Yes. Typical runs cost cents, well within the free tier's monthly compute budget.

***

### Limits (the honest list)

- **Show HN / Ask HN classification** is taken from the title prefix and from HN's own tags. Stories that aren't formally tagged Show HN but include "Show HN" in casual text will be classified as Show HN; this matches HN's own behavior.
- **Comment trees are capped** by `maxCommentsPerStory` (default 200). On the most viral threads (Anthropic-acquires-Bun-tier discussions with 1000+ comments) you'll get the top 200 by BFS order, not every leaf.
- **Comment sentiment / topic extraction is not included.** You get the raw text — sentiment is a downstream concern.
- **User contact extraction is best-effort.** It scans the bio the user wrote about themselves; if they didn't put their email in there, we can't surface it.
- **Real-time push / streaming is not supported.** This is a batch scraper. Schedule it on Apify's cron and pipe to a webhook for "almost-real-time" workflows.
- **No login-required content.** Everything we return is public — HN doesn't gate content behind auth in any meaningful way, so this is rarely a problem.

***

Maintained by **brilliant\_gum** on the Apify platform. Open an issue on the actor page for bugs, feature requests, or pricing questions.

# Actor input Schema

## `lists` (type: `array`):

Built-in HN front-page lists. 'top' = top stories, 'best' = highest rated this period, 'new' = newest, 'ask' = Ask HN, 'show' = Show HN, 'job' = job posts.

## `queries` (type: `array`):

Full-text search over Hacker News history via Algolia. Each query returns up to maxStoriesPerQuery matches. Searches across 3.7M stories from 2007 onwards.

## `tags` (type: `array`):

Restrict search results to specific HN tags. Only applied when 'queries' is non-empty or 'algoliaTagsOnly' is true.

## `tagSearchOnly` (type: `boolean`):

When checked, the scraper runs a tag-only search using the 'tags' field above — useful for fetching, say, 'all Show HN posts with > 100 points'. No 'queries' needed.

## `storyIds` (type: `array`):

Direct Hacker News story IDs to fetch (https://news.ycombinator.com/item?id=XXX).

## `userIds` (type: `array`):

Hacker News usernames to enrich (returns karma, about, contact info, dominant topics, submission velocity).

## `minPoints` (type: `integer`):

Skip stories below this score. 0 = no filter.

## `minComments` (type: `integer`):

Skip stories with fewer comments than this. 0 = no filter.

## `minAuthorKarma` (type: `integer`):

Skip stories whose author karma is below this. Requires extra user lookups (slower).

## `dateFrom` (type: `string`):

Only include stories submitted on/after this date.

## `dateTo` (type: `string`):

Only include stories submitted on/before this date.

## `keywords` (type: `array`):

Filter stories whose title contains ANY of these keywords (case-insensitive). Applied after fetch.

## `domains` (type: `array`):

Only keep stories whose URL domain matches one of these (e.g. 'arxiv.org', 'github.com'). Self-posts (Ask HN, Show HN text) skipped when this is set.

## `includeComments` (type: `boolean`):

Fetch every comment under each story (charged per comment). Off by default to keep costs predictable.

## `commentDepth` (type: `integer`):

Limit recursion depth for comment trees. 1 = top-level only, 3 = top + 2 reply tiers.

## `maxCommentsPerStory` (type: `integer`):

Hard cap on the number of comments fetched per story (after depth filter). Protects cost on viral threads.

## `includeAuthorProfiles` (type: `boolean`):

Add a user record per story author (karma, about, contact info, dominant topics).

## `maxStoriesPerList` (type: `integer`):

Cap stories taken from each live list (top/best/new/ask/show/job).

## `maxStoriesPerQuery` (type: `integer`):

Cap stories returned per Algolia search query.

## `sortBy` (type: `string`):

Algolia search ranking. 'popularity' = highest engagement first. 'date' = newest first.

## `includeAnalytics` (type: `boolean`):

Add per-story velocity (pointsPerHour, commentsPerHour), comment tree depth, top commenters, domain stats.

## `includeContactInfo` (type: `boolean`):

Scan user.about for emails, Twitter/X handles, GitHub, LinkedIn, websites. Applied to author profiles and userIds.

## `enableHistory` (type: `boolean`):

Persist a per-story snapshot of score and comment count between runs. Each story output then includes 'history.delta' (scoreDelta, commentsDelta, scorePerHour, isNew). Stored in the 'hackernews-history' KV store.

## `historyStoreName` (type: `string`):

Named key-value store for history snapshots.

## `maxConcurrency` (type: `integer`):

Maximum parallel fetches (item + comment lookups).

## `minDelayMs` (type: `integer`):

Floor delay between requests per worker. Firebase tolerates very high rates; default is conservative.

## `proxyConfiguration` (type: `object`):

Optional. Proxies are included and configured automatically — leave this empty.

## Actor input object example

```json
{
  "lists": [
    "top"
  ],
  "queries": [],
  "tags": [],
  "tagSearchOnly": false,
  "storyIds": [],
  "userIds": [],
  "minPoints": 0,
  "minComments": 0,
  "minAuthorKarma": 0,
  "dateFrom": "",
  "dateTo": "",
  "keywords": [],
  "domains": [],
  "includeComments": false,
  "commentDepth": 3,
  "maxCommentsPerStory": 200,
  "includeAuthorProfiles": false,
  "maxStoriesPerList": 30,
  "maxStoriesPerQuery": 50,
  "sortBy": "popularity",
  "includeAnalytics": true,
  "includeContactInfo": true,
  "enableHistory": false,
  "historyStoreName": "hackernews-history",
  "maxConcurrency": 6,
  "minDelayMs": 30
}
```

# Actor output Schema

## `summary` (type: `string`):

Counts of scraped records and pointers to the dataset and historical snapshot store.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "lists": [
        "top"
    ],
    "queries": [],
    "tags": [],
    "storyIds": [],
    "userIds": [],
    "minPoints": 0,
    "minComments": 0,
    "minAuthorKarma": 0,
    "keywords": [],
    "domains": []
};

// Run the Actor and wait for it to finish
const run = await client.actor("brilliant_gum/hackernews-insights-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "lists": ["top"],
    "queries": [],
    "tags": [],
    "storyIds": [],
    "userIds": [],
    "minPoints": 0,
    "minComments": 0,
    "minAuthorKarma": 0,
    "keywords": [],
    "domains": [],
}

# Run the Actor and wait for it to finish
run = client.actor("brilliant_gum/hackernews-insights-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "lists": [
    "top"
  ],
  "queries": [],
  "tags": [],
  "storyIds": [],
  "userIds": [],
  "minPoints": 0,
  "minComments": 0,
  "minAuthorKarma": 0,
  "keywords": [],
  "domains": []
}' |
apify call brilliant_gum/hackernews-insights-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=brilliant_gum/hackernews-insights-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "HackerNews Insights Scraper — Stories, Comments & Velocity",
        "description": "Hacker News stories, full comment trees, user karma and contact info, story velocity tracking, history deltas. Search all 3.7M stories with filters for points, karma, domain, dates, keywords. For VCs hunting Show HN, recruiters mining talent, journalists tracking tech, and AI/RAG pipelines.",
        "version": "1.0",
        "x-build-id": "drdCrEWn8lykNR9tf"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/brilliant_gum~hackernews-insights-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-brilliant_gum-hackernews-insights-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/brilliant_gum~hackernews-insights-scraper/runs": {
            "post": {
                "operationId": "runs-sync-brilliant_gum-hackernews-insights-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/brilliant_gum~hackernews-insights-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-brilliant_gum-hackernews-insights-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "lists": {
                        "title": "Live lists",
                        "uniqueItems": true,
                        "type": "array",
                        "description": "Built-in HN front-page lists. 'top' = top stories, 'best' = highest rated this period, 'new' = newest, 'ask' = Ask HN, 'show' = Show HN, 'job' = job posts.",
                        "items": {
                            "type": "string",
                            "enum": [
                                "top",
                                "best",
                                "new",
                                "ask",
                                "show",
                                "job"
                            ],
                            "enumTitles": [
                                "Top stories (top)",
                                "Best stories (best)",
                                "New stories (new)",
                                "Ask HN (ask)",
                                "Show HN (show)",
                                "Job posts (job)"
                            ]
                        },
                        "default": []
                    },
                    "queries": {
                        "title": "Search queries",
                        "type": "array",
                        "description": "Full-text search over Hacker News history via Algolia. Each query returns up to maxStoriesPerQuery matches. Searches across 3.7M stories from 2007 onwards.",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "tags": {
                        "title": "Algolia tag filter",
                        "uniqueItems": true,
                        "type": "array",
                        "description": "Restrict search results to specific HN tags. Only applied when 'queries' is non-empty or 'algoliaTagsOnly' is true.",
                        "items": {
                            "type": "string",
                            "enum": [
                                "story",
                                "comment",
                                "poll",
                                "pollopt",
                                "show_hn",
                                "ask_hn",
                                "front_page",
                                "job"
                            ],
                            "enumTitles": [
                                "Story",
                                "Comment",
                                "Poll",
                                "Poll option",
                                "Show HN",
                                "Ask HN",
                                "Made the front page",
                                "Job"
                            ]
                        },
                        "default": []
                    },
                    "tagSearchOnly": {
                        "title": "Search by tags only (no query needed)",
                        "type": "boolean",
                        "description": "When checked, the scraper runs a tag-only search using the 'tags' field above — useful for fetching, say, 'all Show HN posts with > 100 points'. No 'queries' needed.",
                        "default": false
                    },
                    "storyIds": {
                        "title": "Story IDs",
                        "type": "array",
                        "description": "Direct Hacker News story IDs to fetch (https://news.ycombinator.com/item?id=XXX).",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "userIds": {
                        "title": "User IDs",
                        "type": "array",
                        "description": "Hacker News usernames to enrich (returns karma, about, contact info, dominant topics, submission velocity).",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "minPoints": {
                        "title": "Minimum points (score)",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Skip stories below this score. 0 = no filter.",
                        "default": 0
                    },
                    "minComments": {
                        "title": "Minimum comments",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Skip stories with fewer comments than this. 0 = no filter.",
                        "default": 0
                    },
                    "minAuthorKarma": {
                        "title": "Minimum author karma",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Skip stories whose author karma is below this. Requires extra user lookups (slower).",
                        "default": 0
                    },
                    "dateFrom": {
                        "title": "From date (ISO YYYY-MM-DD)",
                        "type": "string",
                        "description": "Only include stories submitted on/after this date.",
                        "default": ""
                    },
                    "dateTo": {
                        "title": "To date (ISO YYYY-MM-DD)",
                        "type": "string",
                        "description": "Only include stories submitted on/before this date.",
                        "default": ""
                    },
                    "keywords": {
                        "title": "Keywords (any-match, OR)",
                        "type": "array",
                        "description": "Filter stories whose title contains ANY of these keywords (case-insensitive). Applied after fetch.",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "domains": {
                        "title": "URL domains filter",
                        "type": "array",
                        "description": "Only keep stories whose URL domain matches one of these (e.g. 'arxiv.org', 'github.com'). Self-posts (Ask HN, Show HN text) skipped when this is set.",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "includeComments": {
                        "title": "Include full comment trees",
                        "type": "boolean",
                        "description": "Fetch every comment under each story (charged per comment). Off by default to keep costs predictable.",
                        "default": false
                    },
                    "commentDepth": {
                        "title": "Max comment depth",
                        "minimum": 1,
                        "maximum": 10,
                        "type": "integer",
                        "description": "Limit recursion depth for comment trees. 1 = top-level only, 3 = top + 2 reply tiers.",
                        "default": 3
                    },
                    "maxCommentsPerStory": {
                        "title": "Max comments per story",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Hard cap on the number of comments fetched per story (after depth filter). Protects cost on viral threads.",
                        "default": 200
                    },
                    "includeAuthorProfiles": {
                        "title": "Enrich with author profiles",
                        "type": "boolean",
                        "description": "Add a user record per story author (karma, about, contact info, dominant topics).",
                        "default": false
                    },
                    "maxStoriesPerList": {
                        "title": "Max stories per list",
                        "minimum": 1,
                        "maximum": 500,
                        "type": "integer",
                        "description": "Cap stories taken from each live list (top/best/new/ask/show/job).",
                        "default": 30
                    },
                    "maxStoriesPerQuery": {
                        "title": "Max stories per query",
                        "minimum": 1,
                        "maximum": 1000,
                        "type": "integer",
                        "description": "Cap stories returned per Algolia search query.",
                        "default": 50
                    },
                    "sortBy": {
                        "title": "Algolia sort",
                        "enum": [
                            "popularity",
                            "date"
                        ],
                        "type": "string",
                        "description": "Algolia search ranking. 'popularity' = highest engagement first. 'date' = newest first.",
                        "default": "popularity"
                    },
                    "includeAnalytics": {
                        "title": "Compute story analytics",
                        "type": "boolean",
                        "description": "Add per-story velocity (pointsPerHour, commentsPerHour), comment tree depth, top commenters, domain stats.",
                        "default": true
                    },
                    "includeContactInfo": {
                        "title": "Extract user contact info",
                        "type": "boolean",
                        "description": "Scan user.about for emails, Twitter/X handles, GitHub, LinkedIn, websites. Applied to author profiles and userIds.",
                        "default": true
                    },
                    "enableHistory": {
                        "title": "Track score/comment growth across runs",
                        "type": "boolean",
                        "description": "Persist a per-story snapshot of score and comment count between runs. Each story output then includes 'history.delta' (scoreDelta, commentsDelta, scorePerHour, isNew). Stored in the 'hackernews-history' KV store.",
                        "default": false
                    },
                    "historyStoreName": {
                        "title": "History KV store name",
                        "type": "string",
                        "description": "Named key-value store for history snapshots.",
                        "default": "hackernews-history"
                    },
                    "maxConcurrency": {
                        "title": "Concurrency",
                        "minimum": 1,
                        "maximum": 20,
                        "type": "integer",
                        "description": "Maximum parallel fetches (item + comment lookups).",
                        "default": 6
                    },
                    "minDelayMs": {
                        "title": "Per-request throttle (ms)",
                        "minimum": 0,
                        "maximum": 5000,
                        "type": "integer",
                        "description": "Floor delay between requests per worker. Firebase tolerates very high rates; default is conservative.",
                        "default": 30
                    },
                    "proxyConfiguration": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Optional. Proxies are included and configured automatically — leave this empty."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
