# Hackage Scraper - Haskell Package Metadata (`klondikeking/hackage-scraper`) Actor

Extract metadata for Haskell packages from Hackage. Get package names, versions, descriptions, licenses, authors, maintainers, categories, and tags.

- **URL**: https://apify.com/klondikeking/hackage-scraper.md
- **Developed by:** [Pierrick McD0nald](https://apify.com/klondikeking) (community)
- **Categories:** Developer tools, AI
- **Stats:** 2 total users, 1 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is paid per platform usage. The Actor is free to use, and you only pay for the Apify platform usage, which gets cheaper the higher subscription plan you have.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are a software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

````bash
# MacOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```bash

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Hackage Scraper — Haskell Package Metadata & Dependencies

Extract comprehensive metadata for Haskell packages from [Hackage](https://hackage.haskell.org/), the central package archive for the Haskell community. This Actor retrieves package names, versions, synopses, descriptions, licenses, authors, maintainers, categories, tags, and deprecation status.

### Use Cases

- **Package Discovery** — Browse and filter Haskell packages by category, license, or tags to find libraries for your project.
- **Dependency Analysis** — Audit package metadata, licenses, and deprecation status to ensure compliance and avoid unmaintained dependencies.
- **Ecosystem Monitoring** — Track new releases, deprecated packages, and trending libraries in the Haskell ecosystem.
- **Academic Research** — Collect structured metadata about Haskell packages for software engineering research and analysis.

### Input

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `packageNames` | Array | No | List of Hackage package names to scrape. Leave empty to scrape all packages. |
| `maxItems` | Number | No | Maximum packages to scrape (default: 100, max: 5000). |
| `includeDetails` | Boolean | No | Include full metadata (synopsis, description, license, etc.) (default: true). |
| `proxyConfiguration` | Object | No | Proxy configuration for requests. |

### Output

The Actor outputs a dataset with the following fields:

```json
{
  "packageName": "Cabal",
  "version": "3.16.1.0",
  "synopsis": "A framework for packaging Haskell software",
  "description": "The Haskell Common Architecture for Building Applications and Libraries...",
  "license": "BSD-3-Clause",
  "author": "Cabal Development Team",
  "maintainer": "cabal-devel@haskell.org",
  "category": "Distribution",
  "homepage": "http://www.haskell.org/cabal/",
  "bugReports": "https://github.com/haskell/cabal/issues",
  "url": "https://hackage.haskell.org/package/Cabal",
  "tags": ["bsd3", "distribution", "library"],
  "versionsCount": 5,
  "latestVersion": "3.16.1.0",
  "deprecated": false
}
````

### Pricing

Pay per event: **$0.001 per package extracted**. Proxy usage is included by default.

### Limitations

- Large full-catalog scrapes (20,000+ packages) may take significant time. Use `maxItems` to limit scope.
- Package metadata is sourced from Hackage HTML pages; structural changes may affect extraction.
- Very new or removed packages may not have complete metadata.

### FAQ

**Q: Can I scrape all Hackage packages at once?**
A: Yes. Leave `packageNames` empty and set `maxItems` to 0 (unlimited). Be aware this will process over 20,000 packages.

**Q: How do I find packages by category?**
A: Scrape all packages and filter the output dataset by the `category` field.

**Q: What happens if a package is deprecated?**
A: The `deprecated` field will be set to `true` and the `latestVersion` will reflect the latest available version.

### Changelog

- **v1.0.0** — Initial release

# Actor input Schema

## `packageNames` (type: `array`):

List of Hackage package names to scrape. Leave empty to scrape all packages.

## `maxItems` (type: `integer`):

Maximum number of packages to scrape. Set to 0 for unlimited.

## `includeDetails` (type: `boolean`):

Include full metadata for each package (synopsis, description, license, etc.)

## `proxyConfiguration` (type: `object`):

Configure proxy settings for requests

## Actor input object example

```json
{
  "packageNames": [
    "Cabal",
    "aeson",
    "lens"
  ],
  "maxItems": 100,
  "includeDetails": true,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
```

# Actor output Schema

## `dataset` (type: `string`):

Link to the dataset containing scraped Hackage packages

## `stats` (type: `string`):

Run statistics stored in key-value store

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "packageNames": [
        "Cabal",
        "aeson",
        "lens"
    ],
    "maxItems": 100,
    "includeDetails": true,
    "proxyConfiguration": {
        "useApifyProxy": true
    }
};

// Run the Actor and wait for it to finish
const run = await client.actor("klondikeking/hackage-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "packageNames": [
        "Cabal",
        "aeson",
        "lens",
    ],
    "maxItems": 100,
    "includeDetails": True,
    "proxyConfiguration": { "useApifyProxy": True },
}

# Run the Actor and wait for it to finish
run = client.actor("klondikeking/hackage-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "packageNames": [
    "Cabal",
    "aeson",
    "lens"
  ],
  "maxItems": 100,
  "includeDetails": true,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}' |
apify call klondikeking/hackage-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=klondikeking/hackage-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Hackage Scraper - Haskell Package Metadata",
        "description": "Extract metadata for Haskell packages from Hackage. Get package names, versions, descriptions, licenses, authors, maintainers, categories, and tags.",
        "version": "1.0",
        "x-build-id": "d4a2BA82julenrOEt"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/klondikeking~hackage-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-klondikeking-hackage-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/klondikeking~hackage-scraper/runs": {
            "post": {
                "operationId": "runs-sync-klondikeking-hackage-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/klondikeking~hackage-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-klondikeking-hackage-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "packageNames": {
                        "title": "Package Names",
                        "type": "array",
                        "description": "List of Hackage package names to scrape. Leave empty to scrape all packages.",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "maxItems": {
                        "title": "Maximum Results",
                        "minimum": 0,
                        "maximum": 5000,
                        "type": "integer",
                        "description": "Maximum number of packages to scrape. Set to 0 for unlimited.",
                        "default": 100
                    },
                    "includeDetails": {
                        "title": "Include Details",
                        "type": "boolean",
                        "description": "Include full metadata for each package (synopsis, description, license, etc.)",
                        "default": true
                    },
                    "proxyConfiguration": {
                        "title": "Proxy Configuration",
                        "type": "object",
                        "description": "Configure proxy settings for requests"
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
