# TOR Scraper - Dark Web & .onion Site Data Extractor (`igolaizola/tor-dark-web-scraper`) Actor

Scrape .onion hidden services and websites anonymously over TOR. Provide a list of URLs or search the dark web by keyword, extract page content, and pull any data using CSS selectors. No setup required. TOR runs automatically in the cloud.

- **URL**: https://apify.com/igolaizola/tor-dark-web-scraper.md
- **Developed by:** [Iñigo Garcia Olaizola](https://apify.com/igolaizola) (community)
- **Categories:** Developer tools, Automation, Integrations
- **Stats:** 2 total users, 1 monthly user, 75.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $0.70 / 1,000 results

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.
Because this Actor supports Apify Store discounts, the price decreases on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
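
As a rough illustration of the per-result pricing above, the sketch below estimates the cost of a run. It assumes the listed starting price of $0.70 per 1,000 results applies unchanged; the actual rate depends on your subscription plan and on which events are charged. `estimate_cost` is a hypothetical helper, not part of any Apify library.

```python
from decimal import Decimal

# Listed starting price; the real rate depends on your plan (assumption).
PRICE_PER_1000_RESULTS = Decimal("0.70")


def estimate_cost(results: int, price_per_1000: Decimal = PRICE_PER_1000_RESULTS) -> Decimal:
    """Estimate the cost in USD of a run that produces `results` dataset items."""
    if results < 0:
        raise ValueError("results must be non-negative")
    # Decimal avoids binary floating-point rounding in currency arithmetic.
    return price_per_1000 * results / 1000


# Estimate for 2,500 results at the listed starting price.
print(estimate_cost(2500))
```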

## What's an Apify Actor?

Actors are software tools running on the Apify platform for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## 🧅 TOR Scraper – Dark Web & .onion Site Data Extractor

Extract data from any `.onion` hidden service or website anonymously over the TOR network. Scrape URLs directly or **search the dark web by keyword** (one or more queries) and let the Actor discover and scrape matching hidden services for you. No setup, no proxies to configure, and no technical knowledge required. Just run.

### ⚡ What can it do?

- **Scrape any .onion address** or regular website routed through TOR
- **Search the dark web** with a keyword and automatically scrape matching hidden services
- **Extract custom data** from any page using CSS selectors — links, headlines, prices, metadata, anything
- **Capture full content** — page title, plain text, and raw HTML for every URL
- **Never lose a result** — even failed pages are saved with an error message so you always know what happened

### 🤔 Why use this Actor?

Whether you're a researcher monitoring dark web forums, a journalist tracking hidden services, or a security professional gathering threat intelligence, this Actor handles the hard part: accessing TOR, fetching pages, parsing content, and delivering clean structured data, ready to export as JSON, CSV, or Excel.

No TOR installation. No proxy configuration. Just run.

### 🚀 How it works

1. **Paste your URLs** — enter any `.onion` or regular `https://` address
2. **Or search the dark web** — enter one or more keywords to discover and scrape matching hidden services automatically
3. **Optionally define extraction rules** — use CSS selectors to pull specific data points from every page
4. **Get your data** — results are saved to a structured dataset the moment each page is scraped

### 📥 Input

| Field | Key | Type | Description |
|-------|----------|------|-------------|
| **URLs** | `urls` | string[] | One or more URLs to scrape directly. Supports both `.onion` hidden service addresses and regular `https://` URLs. |
| **Search queries** | `queries` | string[] | One or more keywords. The Actor searches the dark web for each query and scrapes the matching hidden services. Can be combined with URLs. |
| **Max Items** | `maxItems` | integer | Maximum number of pages to scrape across all queries and URLs. Set to `0` for no limit. Note that the OpenAPI schema in the API section declares a default of `100`, so set this value explicitly if the limit matters. |
| **CSS selectors** | `selectors` | object[] | Optional extraction rules. See [Extracting specific data with CSS selectors](#-extracting-specific-data-with-css-selectors) for details and examples. |
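
The input keys in the table above can be assembled and sanity-checked before a run. The following is a minimal sketch: `build_input` is a hypothetical helper, and the rule that at least one URL or query must be present is an assumption, not something the Actor documents.

```python
from typing import Any, Dict, List, Optional


def build_input(
    urls: Optional[List[str]] = None,
    queries: Optional[List[str]] = None,
    max_items: int = 0,
    selectors: Optional[List[Dict[str, Any]]] = None,
) -> Dict[str, Any]:
    """Assemble an input object using the keys from the input table."""
    urls = urls or []
    queries = queries or []
    selectors = selectors or []
    # Assumption: a run needs at least one URL or query to have work to do.
    if not urls and not queries:
        raise ValueError("provide at least one URL or search query")
    # The input schema declares a minimum of 0 for maxItems.
    if max_items < 0:
        raise ValueError("maxItems must be 0 (no limit) or a positive integer")
    for rule in selectors:
        if not rule.get("name") or not rule.get("selector"):
            raise ValueError(f"selector rule needs 'name' and 'selector': {rule!r}")
    return {"urls": urls, "queries": queries, "maxItems": max_items, "selectors": selectors}


actor_input = build_input(
    queries=["data leak"],
    max_items=100,
    selectors=[{"name": "links", "selector": "a::attr(href)"}],
)
print(actor_input)
```

The returned dictionary can be passed as the run input in the Python client example in the API section.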

### 📦 Output

Every scraped URL produces one dataset row:

| Field | Description |
|-------|-------------|
| **URL** | The address that was scraped |
| **Status code** | HTTP response code (0 if the server was unreachable) |
| **Title** | Page `<title>` |
| **Text** | Full plain text — scripts, styles, and markup removed |
| **HTML** | Raw HTML source |
| **Fields** | Arrays of values extracted with your CSS selectors |
| **Search title** | Title of the result as returned by the dark web search engine (only when using keyword search) |
| **Search description** | Snippet returned by the dark web search engine (only when using keyword search) |
| **Error** | Error message if the page failed to load |
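
Because failed pages are saved alongside successful ones, it is usually worth separating the two after a run. The sketch below shows one way to do that. The JSON key names (`url`, `statusCode`, `error`) are assumptions, since the table above lists display names only; inspect one real dataset item and adjust the constants. The sample rows are made up for illustration.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Assumed dataset keys; the output table lists display names, so the actual
# JSON keys may differ. Check one real item and adjust these constants.
STATUS_KEY = "statusCode"
ERROR_KEY = "error"

Row = Dict[str, Any]


def split_results(items: Iterable[Row]) -> Tuple[List[Row], List[Row]]:
    """Separate rows that loaded from rows with an error or status code 0."""
    ok: List[Row] = []
    failed: List[Row] = []
    for item in items:
        if item.get(ERROR_KEY) or item.get(STATUS_KEY, 0) == 0:
            failed.append(item)
        else:
            ok.append(item)
    return ok, failed


# Hypothetical sample rows for illustration only.
sample = [
    {"url": "http://example.onion/", "statusCode": 200, "title": "Example"},
    {"url": "http://example.onion/missing", "statusCode": 0, "error": "connection timed out"},
]
ok, failed = split_results(sample)
print(len(ok), "loaded;", len(failed), "failed")
```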

### 🎯 Extracting specific data with CSS selectors

Want to pull just the links, headlines, or prices from each page? Define named extraction rules as JSON:

```json
[
  { "name": "heading",     "selector": "h1:first-of-type",                          "single": true },
  { "name": "links",       "selector": "a::attr(href)" },
  { "name": "images",      "selector": "img::attr(src)" },
  { "name": "description", "selector": "meta[name=description]::attr(content)",     "single": true },
  { "name": "navLinks",    "selector": "nav a::attr(href)" },
  { "name": "ogImage",     "selector": "meta[property='og:image']::attr(content)",  "single": true },
  { "name": "feed",        "selector": "link[type='application/rss+xml']::attr(href)", "single": true }
]
````

Each rule supports three properties:

| Property | Description |
|----------|-------------|
| `name` | Key used in the `fields` output |
| `selector` | CSS selector. Append `::attr(name)` to extract an HTML attribute instead of text |
| `single` | If `true`, returns only the first match as a plain string instead of an array |

**Returning arrays vs single values:** by default every rule returns all matching elements as an array. Set `"single": true` when you expect exactly one result (a page title, a canonical URL, a meta description) and want a plain string in the output.
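
To make the rule semantics concrete, here is a minimal sketch of how a rule could be interpreted. It is not the Actor's implementation: `split_selector` and `shape_result` are hypothetical helpers, and returning `None` when a `single` rule has no match is an assumption.

```python
import re
from typing import List, Optional, Tuple, Union

# Matches a trailing ::attr(name) suffix on a selector string.
_ATTR_SUFFIX = re.compile(r"::attr\(([^)]+)\)\s*$")


def split_selector(selector: str) -> Tuple[str, Optional[str]]:
    """Split 'a::attr(href)' into ('a', 'href'); plain selectors give (selector, None)."""
    match = _ATTR_SUFFIX.search(selector)
    if match is None:
        return selector.strip(), None
    return selector[: match.start()].strip(), match.group(1).strip()


def shape_result(values: List[str], single: bool = False) -> Union[str, List[str], None]:
    """Apply the `single` flag: the first match as a string, otherwise the full list."""
    if single:
        # Assumption: no match yields None rather than an empty string.
        return values[0] if values else None
    return values


css, attribute = split_selector("meta[property='og:image']::attr(content)")
print(css, attribute)
print(shape_result(["First heading", "Second heading"], single=True))
```

Splitting the suffix off first means the remaining string is an ordinary CSS selector that any standard selector engine can evaluate.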

**Picking a specific element:** use standard CSS pseudo-classes — `:first-of-type`, `:last-of-type`, `:nth-of-type(n)` — to target a particular occurrence before extracting:

```json
{ "name": "secondHeading", "selector": "h2:nth-of-type(2)", "single": true }
```

**Example output:**

```json
{
  "heading":     "Welcome to the dark web",
  "links":       ["http://example.onion/about", "http://example.onion/contact"],
  "description": "An anonymous service for ...",
  "navLinks":    ["http://example.onion/home", "http://example.onion/faq"]
}
```

Results appear in the `fields` column of your dataset.

### 💡 Use cases

- **Dark web research** — monitor `.onion` forums, blogs, and marketplaces for content and changes
- **Threat intelligence** — track leaked credentials, infrastructure, and activity across hidden services
- **Investigative journalism** — access and archive TOR-only sources securely and anonymously
- **Academic research** — map and analyze dark web content at scale
- **Anonymous data collection** — scrape any website without exposing your IP address

### ❓ Frequently asked questions

**Do I need to install TOR or configure anything?**
No. TOR runs automatically in the cloud inside the Actor, so no setup is required.

**Can it scrape regular websites as well as .onion sites?**
Yes. Any `https://` URL works alongside `.onion` addresses in the same run.

**What happens if a page fails to load?**
The page is still saved to the dataset with the error message and status code, so you never lose track of which URLs were attempted.

**How do I find .onion sites to scrape?**
Use the built-in dark web search: enter one or more keywords and the Actor discovers matching hidden services and scrapes them automatically, with no manual URL hunting needed.

**How many pages can I scrape?**
By default there is no limit. Use **Max Items** to cap the number of pages scraped in a single run.

# Actor input Schema

## `urls` (type: `array`):

List of URLs to scrape directly. Supports both .onion hidden service URLs and regular URLs accessed via TOR.

## `queries` (type: `array`):

One or more search queries. The Actor searches for matching .onion services for each query and scrapes the results. Use together with Max Items to control how many pages are fetched.

## `maxItems` (type: `integer`):

Maximum number of pages to scrape. Set to 0 (default) for no limit.

## `selectors` (type: `array`):

Optional list of named CSS extraction rules. Each rule matches all elements on the page and returns their text content as an array. Use ::attr(name) to extract an attribute value instead (e.g. "a::attr(href)" to collect all links). Set single to true to return only the first match as a plain string.

## Actor input object example

```json
{
  "urls": [],
  "queries": [
    "data leak"
  ],
  "maxItems": 100,
  "selectors": [
    {
      "name": "heading",
      "selector": "h1:first-of-type",
      "single": true
    },
    {
      "name": "links",
      "selector": "a::attr(href)"
    }
  ]
}
```

# Actor output Schema

## `results` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "urls": [],
    "queries": [
        "data leak"
    ],
    "selectors": [
        {
            "name": "heading",
            "selector": "h1:first-of-type",
            "single": true
        },
        {
            "name": "links",
            "selector": "a::attr(href)"
        }
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("igolaizola/tor-dark-web-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "urls": [],
    "queries": ["data leak"],
    "selectors": [
        {
            "name": "heading",
            "selector": "h1:first-of-type",
            "single": True,
        },
        {
            "name": "links",
            "selector": "a::attr(href)",
        },
    ],
}

# Run the Actor and wait for it to finish
run = client.actor("igolaizola/tor-dark-web-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "urls": [],
  "queries": [
    "data leak"
  ],
  "selectors": [
    {
      "name": "heading",
      "selector": "h1:first-of-type",
      "single": true
    },
    {
      "name": "links",
      "selector": "a::attr(href)"
    }
  ]
}' |
apify call igolaizola/tor-dark-web-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=igolaizola/tor-dark-web-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "TOR Scraper - Dark Web & .onion Site Data Extractor",
        "description": "Scrape .onion hidden services and websites anonymously over TOR. Provide a list of URLs or search the dark web by keyword, extract page content, and pull any data using CSS selectors. No setup required. TOR runs automatically in the cloud.",
        "version": "0.0",
        "x-build-id": "1YYBvsKVYZJyXJrBy"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/igolaizola~tor-dark-web-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-igolaizola-tor-dark-web-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/igolaizola~tor-dark-web-scraper/runs": {
            "post": {
                "operationId": "runs-sync-igolaizola-tor-dark-web-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/igolaizola~tor-dark-web-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-igolaizola-tor-dark-web-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "urls": {
                        "title": "URLs",
                        "type": "array",
                        "description": "List of URLs to scrape directly. Supports both .onion hidden service URLs and regular URLs accessed via TOR.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "queries": {
                        "title": "Search queries",
                        "type": "array",
                        "description": "One or more search queries. The actor will search for matching .onion services for each query and scrape the results. Use together with Max Items to control how many pages are fetched.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "maxItems": {
                        "title": "Max Items",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Maximum number of pages to scrape. Set to 0 (default) for no limit.",
                        "default": 100
                    },
                    "selectors": {
                        "title": "CSS selectors",
                        "type": "array",
                        "description": "Optional list of named CSS extraction rules. Each rule matches all elements on the page and returns their text content as an array. Use ::attr(name) to extract an attribute value instead (e.g. \"a::attr(href)\" to collect all links). Set single to true to return only the first match as a plain string."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
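
For projects that call the REST API directly, the `run-sync-get-dataset-items` path from the specification above can be exercised with the Python standard library alone. This is a minimal sketch: `build_request` is a hypothetical helper, the request is built but not sent, and the commented-out part assumes the endpoint responds with a JSON array of dataset items.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict

# Server URL and path as listed in the OpenAPI specification.
API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/igolaizola~tor-dark-web-scraper/run-sync-get-dataset-items"


def build_request(token: str, actor_input: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) the POST request described by the spec."""
    # The spec passes the API token as a required `token` query parameter.
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        url=f"{API_BASE}{ACTOR_PATH}?{query}",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request("<YOUR_API_TOKEN>", {"queries": ["data leak"], "maxItems": 10})
print(request.get_method(), request.full_url.split("?")[0])

# To actually run the Actor, send the request (needs a valid token):
# with urllib.request.urlopen(request, timeout=600) as response:
#     items = json.load(response)  # assumes a JSON array of dataset items
```

A synchronous call blocks until the run finishes, so for long crawls the `/runs` endpoint, which returns run information immediately, may be the better fit.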
