Darkweb Scraper

Pricing: from $1.00 / 1,000 results

Crawl dark web .onion sites via Tor. Extract links, emails, phone numbers, cryptocurrency wallet addresses, social media handles, and API keys from hidden services.

Rating: 0.0 (0 reviews)

Developer: Crawler Bros (maintained by Community)

Actor stats: 1 bookmark, 60 total users, 9 monthly active users, last modified a day ago

Crawl dark web .onion sites via Tor and extract sensitive data including emails, phone numbers, cryptocurrency wallet addresses, social media handles, and exposed API keys. Ideal for OSINT, threat intelligence, and security research.

What is Darkweb Scraper?

Darkweb Scraper is an Apify actor that accesses Tor hidden services (.onion sites) and extracts structured data from them. It bundles a Tor daemon internally, so you don't need any special setup or proxy configuration. Provide direct .onion URLs or a search keyword, and the scraper crawls the hidden services and returns one structured record per crawled page.

What can this actor do?

  • Crawl .onion sites — Navigate through dark web pages with configurable crawl depth and page limits
  • Search the dark web — Enter any keyword and discover relevant .onion sites automatically via dark web search engines (best-effort; search engines are frequently offline)
  • Extract emails — Find email addresses embedded in dark web pages
  • Extract phone numbers — Detect phone numbers in international formats
  • Extract cryptocurrency addresses — Identify Bitcoin, Ethereum, Monero, Litecoin, Bitcoin Cash, and Ripple wallet addresses
  • Extract social media handles — Find Twitter/X, Instagram, and Telegram usernames and links
  • Detect exposed API keys — Discover exposed AWS keys, Google API keys, and generic credential strings
  • Keyword matching — Check whether your search term appears on each crawled page
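The email and phone extractors are described only at a high level. The following is a minimal sketch of how such extractors might work, assuming plain regular expressions; the patterns, the hypothetical FALSE_POSITIVE_DOMAINS set, and the filtering thresholds are illustrative assumptions, not the actor's actual rules.

```python
import re

# Illustrative patterns only; the actor's real regexes are not documented.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Digits, spaces, dots, dashes and parentheses, not glued to letters on either side.
PHONE_RE = re.compile(r"(?<!\w)\+?\d[\d\s().-]{5,}\d(?!\w)")

# Hypothetical false-positive list, e.g. retina image names like logo@2x.png.
FALSE_POSITIVE_DOMAINS = {"example.com", "2x.png", "3x.png"}


def extract_emails(text: str) -> list[str]:
    """Return unique, lower-cased email addresses in order of first appearance."""
    found = []
    for match in EMAIL_RE.findall(text):
        email = match.lower()
        domain = email.split("@", 1)[1]
        if domain in FALSE_POSITIVE_DOMAINS or email in found:
            continue
        found.append(email)
    return found


def extract_phones(text: str, min_digits: int = 7) -> list[str]:
    """Return plausible phone numbers, dropping short and repeated-digit matches."""
    found = []
    for match in PHONE_RE.findall(text):
        candidate = match.strip()
        digits = re.sub(r"\D", "", candidate)
        if len(digits) < min_digits:  # too short to be a phone number
            continue
        if len(set(digits)) == 1:  # e.g. 0000000 or 1111111
            continue
        if candidate not in found:
            found.append(candidate)
    return found
```

The lookarounds in PHONE_RE stand in for the "letter-containing matches filtered" rule: digit runs embedded in longer alphanumeric strings, such as wallet addresses, are never matched in the first place.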

Use cases

  • Threat intelligence — Monitor the dark web for leaked credentials, stolen data, or mentions of your organization
  • Brand protection — Detect unauthorized use of your brand name or products on hidden services
  • Security research — Discover exposed API keys, wallet addresses, and sensitive data on .onion sites
  • OSINT investigations — Map dark web site structures, discover linked hidden services, and extract contact information
  • Cryptocurrency tracking — Find wallet addresses associated with dark web activity

Input

| Field | Type | Default | Description |
|---|---|---|---|
| Mode | Select | crawl | How to discover pages: crawl (start URLs only; most reliable), search (query dark web engines), or searchAndCrawl (merge search seeds with start URLs). |
| Search Keyword | String | | Keyword to query dark web search engines (modes search and searchAndCrawl). |
| Start URLs | URL List | BBC News onion | Direct .onion URLs to crawl (modes crawl and searchAndCrawl). Must be valid Tor hidden service addresses. |
| Max Crawl Depth | Integer | 1 | Maximum link depth to follow from seed pages. 0 = only the provided URLs. |
| Max Pages to Crawl | Integer | 5 | Maximum number of pages to fetch during the crawl. |
| Max Output Items | Integer | 5 | Maximum number of items to include in the output dataset. |
| Extract emails | Boolean | true | Find email addresses on each page. |
| Extract phone numbers | Boolean | true | Detect phone numbers on each page. |
| Extract cryptocurrency addresses | Boolean | true | Identify BTC / ETH / XMR / LTC / BCH / XRP wallet addresses. |
| Extract social media handles | Boolean | true | Find Twitter/X, Instagram, and Telegram handles. |
| Detect exposed API keys | Boolean | true | Discover AWS, Google, and generic API key strings. |

Note: mode=crawl requires at least one Start URLs entry. mode=search requires a Search Keyword. mode=searchAndCrawl requires at least one of the two.
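These mode rules can be checked before a run starts. The following sketch assumes the input keys used in the example inputs (mode, search, startUrls). The .onion host check is an added assumption based on the Start URLs description, and validate_input is a hypothetical helper, not part of the actor.

```python
from urllib.parse import urlparse

VALID_MODES = {"crawl", "search", "searchAndCrawl"}


def validate_input(run_input: dict) -> list[str]:
    """Return a list of validation errors (empty if the input is usable)."""
    errors = []
    mode = run_input.get("mode", "crawl")  # crawl is the documented default
    has_urls = bool(run_input.get("startUrls"))
    has_keyword = bool((run_input.get("search") or "").strip())

    if mode not in VALID_MODES:
        errors.append(f"unknown mode: {mode!r}")
    elif mode == "crawl" and not has_urls:
        errors.append("mode=crawl requires at least one Start URLs entry")
    elif mode == "search" and not has_keyword:
        errors.append("mode=search requires a Search Keyword")
    elif mode == "searchAndCrawl" and not (has_urls or has_keyword):
        errors.append("mode=searchAndCrawl requires a keyword or at least one start URL")

    # Start URLs must point at Tor hidden services.
    for entry in run_input.get("startUrls", []):
        host = urlparse(entry.get("url", "")).hostname or ""
        if not host.endswith(".onion"):
            errors.append(f"not a .onion URL: {entry.get('url')!r}")
    return errors
```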

Example input — Crawl mode (default, most reliable)

{
  "mode": "crawl",
  "startUrls": [
    { "url": "http://bbcnewsd73hkzno2ini43t4gblxvycyac5aw4gnv7t2rfl3d5jakber2iniad.onion/" }
  ],
  "maxDepth": 1,
  "maxPages": 5,
  "maxItems": 5
}

Example input — Search mode

{
  "mode": "search",
  "search": "marketplace",
  "maxDepth": 2,
  "maxPages": 20,
  "maxItems": 20
}

Example input — Search + crawl combined

{
  "mode": "searchAndCrawl",
  "search": "forum",
  "startUrls": [
    { "url": "http://xjfbpuj56rdazx4iolylxplbvyft2onuerjeimlcqwaihp3s6r4xebqd.onion/" }
  ],
  "maxDepth": 2,
  "maxPages": 30,
  "maxItems": 30
}
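Any of these inputs can also be submitted programmatically with the official apify-client Python package. This sketch assumes apify-client is installed, that an APIFY_TOKEN environment variable holds your API token, and that you pass the actor's real ID, which this page does not give. build_run_input and run_actor are hypothetical helpers.

```python
import os


def build_run_input(start_urls, max_depth=1, max_pages=5, max_items=5):
    """Build a crawl-mode input shaped like the JSON examples above."""
    return {
        "mode": "crawl",
        "startUrls": [{"url": url} for url in start_urls],
        "maxDepth": max_depth,
        "maxPages": max_pages,
        "maxItems": max_items,
    }


def run_actor(run_input, actor_id):
    """Start the actor, wait for it to finish, and return its dataset items."""
    # Imported lazily so build_run_input works without the client installed.
    from apify_client import ApifyClient

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("actor run did not start")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Usage would look like `run_actor(build_run_input(["http://...onion/"]), actor_id)`, where actor_id is the ID shown on the actor's Store page.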

Output

Each crawled page produces one item in the output dataset with the following fields:

| Field | Type | Description |
|---|---|---|
| url | String | The .onion page URL that was scraped |
| sourceUrl | String | Canonical source URL (same as url; included for consistency with downstream tooling) |
| title | String | The page title extracted from the HTML <title> tag |
| links | Array | All links discovered on the page (both .onion and clearnet), resolved to absolute URLs |
| onionLinks | Array | Subset of links that point to .onion hosts |
| emails | Array | Email addresses found on the page (false-positive domains filtered) |
| phones | Array | Phone numbers found on the page (short / repeated / letter-containing matches filtered) |
| cryptoAddresses | Object | Cryptocurrency wallet addresses grouped by type (bitcoin, ethereum, monero, litecoin, bitcoinCash, ripple) |
| misc | Object | Social media handles (twitter, instagram, telegram) and exposed apiKeys |
| searchKeywordFound | Boolean | Whether the search keyword was found on this page (only present when a keyword was supplied) |
| recordType | String | Always "page" |
| scrapedAt | String | UTC ISO-8601 timestamp of when the page was scraped |

Empty arrays / objects / strings are never emitted — any field that would be empty is omitted entirely from the record.
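The links and onionLinks fields can be derived from a page's anchors by resolving each href against the page URL. The following is a minimal standard-library sketch that assumes only <a href> tags are considered and that non-HTTP schemes such as mailto: are dropped. The actor's actual parser is not documented.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect raw href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html: str, page_url: str) -> tuple[list[str], list[str]]:
    """Return (links, onion_links): unique absolute http(s) URLs and their .onion subset."""
    parser = LinkCollector()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative paths like /faq/
        if urlparse(absolute).scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, etc.
        if absolute not in links:
            links.append(absolute)
    onion_links = [u for u in links if (urlparse(u).hostname or "").endswith(".onion")]
    return links, onion_links
```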

Sample output

{
  "url": "http://xjfbpuj56rdazx4iolylxplbvyft2onuerjeimlcqwaihp3s6r4xebqd.onion/",
  "sourceUrl": "http://xjfbpuj56rdazx4iolylxplbvyft2onuerjeimlcqwaihp3s6r4xebqd.onion/",
  "title": "Dark Market - Home",
  "links": [
    "http://xjfbpuj56rdazx4iolylxplbvyft2onuerjeimlcqwaihp3s6r4xebqd.onion/faq/",
    "http://xjfbpuj56rdazx4iolylxplbvyft2onuerjeimlcqwaihp3s6r4xebqd.onion/support/",
    "http://phobosxilamwcg75xt22id7aywkzol6q6rfl2flipcqoc4e4ahima5id.onion/"
  ],
  "onionLinks": [
    "http://xjfbpuj56rdazx4iolylxplbvyft2onuerjeimlcqwaihp3s6r4xebqd.onion/faq/",
    "http://xjfbpuj56rdazx4iolylxplbvyft2onuerjeimlcqwaihp3s6r4xebqd.onion/support/",
    "http://phobosxilamwcg75xt22id7aywkzol6q6rfl2flipcqoc4e4ahima5id.onion/"
  ],
  "emails": ["contact@darkservice.onion"],
  "phones": ["+1-555-0123"],
  "cryptoAddresses": {
    "bitcoin": ["1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa"],
    "monero": ["4AdUndXHHZ6cfufTMvppY6JwXNouMBzSkbLYfpAV5Usx3skxNgYeYTRj5UzqtReoS44qo9mtmXCqY45DJ852K5Jv2684Rge"]
  },
  "misc": {
    "twitter": ["@darkmarket"],
    "telegram": ["@darkmarketgroup"]
  },
  "searchKeywordFound": true,
  "recordType": "page",
  "scrapedAt": "2026-06-18T12:00:00.000000+00:00"
}

How to use

Crawl specific .onion sites

  1. Add one or more .onion URLs to the Start URLs field
  2. Set Mode to crawl
  3. Set Max Crawl Depth to 0 to only scrape the provided pages, or higher to follow links
  4. Click Start
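The depth and page limits behave like a breadth-first search: seed pages are at depth 0, links found on them are at depth 1, and so on, until either limit is reached. This sketch assumes that Max Pages counts fetch attempts, including failures, and that fetch_links is a hypothetical callback wrapping the Tor download. The actor's real scheduler may differ.

```python
from collections import deque
from typing import Callable, Optional


def crawl(
    seeds: list[str],
    fetch_links: Callable[[str], Optional[list[str]]],
    max_depth: int = 1,
    max_pages: int = 5,
) -> list[str]:
    """Breadth-first crawl; returns the URLs fetched successfully, in order.

    fetch_links(url) is a hypothetical callback that downloads a page and
    returns its .onion links, or None if the page was unreachable.
    """
    seeds = list(dict.fromkeys(seeds))  # drop duplicate seeds, keep order
    queue = deque((url, 0) for url in seeds)
    seen = set(seeds)
    fetched = []
    attempts = 0
    while queue and attempts < max_pages:
        url, depth = queue.popleft()
        attempts += 1
        links = fetch_links(url)
        if links is None:
            continue  # unreachable: skip it and move on
        fetched.append(url)
        if depth >= max_depth:
            continue  # depth limit reached: do not enqueue children
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return fetched
```

With max_depth=0, only the seed pages are fetched, which matches the "0 = only the provided URLs" rule.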

Search the dark web (best-effort)

  1. Set Mode to search
  2. Enter a keyword in Search Keyword (e.g., "marketplace", "forum", "leaked data")
  3. Set Max Crawl Depth to control how deep the crawler follows links (0 = only the search results; 1 or higher = also follow links from the results, up to that many levels)
  4. Set Max Pages to limit the crawl scope
  5. Click Start and wait for results

Combine both modes

Set Mode to searchAndCrawl with both a keyword and start URLs. The scraper merges all discovered URLs and crawls them together, removing duplicates.
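The merge-and-deduplicate step can be sketched as follows. The normalization rules (case-folding the scheme and host, adding a missing "/" path, dropping fragments) and the choice to list start URLs before search results are assumptions. The actor does not document how it compares URLs.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Lower-case scheme and host, default an empty path to "/", drop the fragment."""
    parts = urlsplit(url.strip())
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def merge_seeds(search_results: list[str], start_urls: list[str]) -> list[str]:
    """Merge both seed lists, keeping the first occurrence of each normalized URL."""
    merged: dict[str, None] = {}  # dicts preserve insertion order
    for url in [*start_urls, *search_results]:
        merged.setdefault(normalize(url), None)
    return list(merged)
```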

Tips

  • Start with crawl mode — mode=crawl with a known-good .onion URL is the most reliable path. Search engines on the dark web are frequently offline.
  • Start small — Use maxDepth: 0 and maxPages: 5 for your first run to see how the actor works
  • Dark web sites are unreliable — Many .onion sites go offline frequently. If a site is unreachable, the scraper will skip it and continue with other URLs
  • Tor is slow — Connecting through the Tor network adds latency. Expect each page to take 5-30 seconds to load
  • No proxy needed — The actor bundles its own Tor daemon, so you don't need to configure any proxy or pay extra for proxy services
  • Keyword search — Use specific, relevant keywords for better results. Generic terms may return many unrelated pages
  • Toggle extractors off — Disable individual extractors (extractEmails, extractPhones, etc.) to speed up runs when you only care about one data type
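Disabling an extractor saves time because its regular expressions never run over the page text. This sketch uses a table that maps each input flag to an extractor. Only extractEmails is a flag name from this document; extractApiKeys is hypothetical. The two API-key patterns are the widely published AWS access-key-ID and Google API key formats, not the actor's own rules.

```python
import re

# Well-known public key formats; "generic" secrets need far looser heuristics.
API_KEY_PATTERNS = {
    "aws": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "google": re.compile(r"AIza[0-9A-Za-z_-]{35}"),
}
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_api_keys(text: str) -> list[str]:
    """Return every AWS or Google API key string found in the text."""
    keys = []
    for pattern in API_KEY_PATTERNS.values():
        keys.extend(pattern.findall(text))
    return keys


# Input flag -> (output field, extractor). "extractApiKeys" is a hypothetical
# flag name used only for illustration.
EXTRACTORS = {
    "extractEmails": ("emails", EMAIL_RE.findall),
    "extractApiKeys": ("apiKeys", find_api_keys),
}


def run_extractors(text: str, run_input: dict) -> dict:
    """Run only the extractors whose flag is true (flags default to true)."""
    results = {}
    for flag, (field, extractor) in EXTRACTORS.items():
        if run_input.get(flag, True):
            results[field] = extractor(text)
    return results
```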

Reliability

  • Crawl mode (mode=crawl) is the reliable default. The daily test run on Apify uses crawl mode against a stable .onion seed and is expected to produce at least one record.
  • Search mode (mode=search) is best-effort. Dark web search engines (Ahmia, Torch, Haystack) rotate addresses and go offline frequently. The actor tries multiple engines in order, but search mode may return zero results on any given run. This is expected and not a bug.
  • Retry behavior. A page fetch that fails with a timeout or network error is retried up to 2 times (3 attempts in total) with exponential backoff. HTTP 4xx/5xx responses are logged and skipped, and the crawl continues with the next URL.
  • Tor bootstrap. The actor waits up to 120 seconds for Tor to bootstrap. If bootstrap fails, the run fails with a clear status message suggesting a retry.
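The retry policy can be sketched as a wrapper around a single fetch. Timeouts and network errors are retried with waits that grow exponentially, and HTTP errors are logged and skipped immediately. The 2-second base delay, the HttpError class, and the fetch callback are assumptions for illustration. In a real implementation, fetch would perform the request through Tor's SOCKS proxy and raise TimeoutError or ConnectionError on network failures.

```python
import time
from typing import Callable, Optional


class HttpError(Exception):
    """Raised by the (hypothetical) fetch callback for HTTP 4xx/5xx responses."""


def fetch_with_retries(
    fetch: Callable[[str], str],
    url: str,
    retries: int = 2,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Return the page body, or None if the page should be skipped."""
    for attempt in range(retries + 1):
        try:
            return fetch(url)
        except HttpError as exc:
            print(f"HTTP error for {url}: {exc}; skipping")
            return None  # 4xx/5xx: log and move on, no retry
        except (TimeoutError, ConnectionError) as exc:
            if attempt == retries:
                print(f"giving up on {url} after {retries + 1} attempts: {exc}")
                return None
            sleep(base_delay * 2 ** attempt)  # 2 s, then 4 s with the defaults
    return None
```

Injecting sleep keeps the backoff schedule testable without actually waiting.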

Data Source

This actor scrapes Tor hidden services (.onion websites) via the Tor network. Tor access is free and requires no paid API keys, no residential proxy groups, and no user-supplied credentials: the actor bundles its own Tor daemon and uses Apify's free datacenter network only during the build. As a result, any user on the Apify free plan can run this actor with no configuration beyond the input fields.

For search mode, the actor queries the public dark web search engines Ahmia (ahmia.fi), Torch, and Haystack. These are public indexes of .onion sites; no account or API key is required.

Limitations

  • The actor can only access .onion (Tor hidden service) URLs. Regular clearnet websites are not crawled, although clearnet links found on .onion pages are still recorded in the links field
  • Dark web search engine results depend on what has been indexed. Not all .onion sites are discoverable via search
  • Some hidden services use CAPTCHAs or anti-bot measures that may prevent scraping
  • Tor circuit establishment typically takes 10-30 seconds at the start of each run
  • The actor does not render JavaScript. Sites that require JavaScript for content display may return incomplete data

FAQ

Is it legal to scrape the dark web? Accessing the dark web via Tor is legal in most jurisdictions. However, the legality depends on what you do with the data. This tool is intended for security research, OSINT, and threat intelligence. Always comply with applicable laws.

Do I need a proxy to use this actor? No. The actor includes a built-in Tor daemon that handles all network routing automatically. No additional proxy configuration is needed and no paid Apify proxy groups are used.

How fast is the scraping? Tor connections are inherently slower than regular internet. Expect 5-30 seconds per page depending on the hidden service's responsiveness. A typical run with 10 pages completes in 2-5 minutes.

Why are some pages not scraped? Dark web sites have high failure rates. Sites may be temporarily offline, overloaded, or have moved to a new .onion address. The scraper will skip unreachable pages and continue with others.

Why did my search run return zero results? Dark web search engines (Ahmia, Torch, Haystack) frequently go offline or rotate their .onion addresses. The actor tries them in order and falls back gracefully, but on any given day search mode may return zero seeds. Use mode=crawl with known .onion URLs for reliable results.

What cryptocurrency addresses are detected? The scraper identifies Bitcoin (BTC), Ethereum (ETH), Monero (XMR), Litecoin (LTC), Bitcoin Cash (BCH), and Ripple (XRP) wallet addresses.
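These address types have recognizable shapes, so a first-pass detector can be built from regular expressions. This simplified sketch covers three of the six types (Bitcoin legacy and bech32, Ethereum, and Monero standard addresses), grouped the same way as the cryptoAddresses output field. The patterns check only the alphabet and length, not checksums, so they will produce false positives. They are not the actor's documented rules.

```python
import re

# Simplified patterns: character set and length only, no checksum validation.
CRYPTO_PATTERNS = {
    # Base58 legacy/P2SH (starts with 1 or 3) or bech32 (bc1...).
    "bitcoin": re.compile(r"\b(?:[13][1-9A-HJ-NP-Za-km-z]{25,34}|bc1[02-9ac-hj-np-z]{11,71})\b"),
    # 0x followed by 40 hex digits.
    "ethereum": re.compile(r"\b0x[0-9a-fA-F]{40}\b"),
    # 95-character Base58 standard address starting with 4.
    "monero": re.compile(r"\b4[0-9AB][1-9A-HJ-NP-Za-km-z]{93}\b"),
}


def extract_crypto(text: str) -> dict[str, list[str]]:
    """Group unique matches by currency, omitting currencies with no matches."""
    found = {}
    for name, pattern in CRYPTO_PATTERNS.items():
        matches = list(dict.fromkeys(pattern.findall(text)))  # dedupe, keep order
        if matches:
            found[name] = matches
    return found
```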

Can I scrape a specific .onion site deeply? Yes. Add the site URL to Start URLs, set Mode to crawl, set Max Crawl Depth to 3-5, and increase Max Pages to allow thorough crawling of the site's internal pages.

What happens if Tor fails to connect? The actor will wait up to 2 minutes for Tor to establish a connection. If it fails, the run will end with an error message suggesting you retry.
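A bounded wait like this can be sketched by watching Tor's log output for its "Bootstrapped N%" progress lines. This sketch assumes that the actor launches tor as a subprocess and reads its notice-level log. wait_for_bootstrap is a hypothetical helper. The deadline is only checked between lines, so a production version would need a non-blocking read.

```python
import re
import time
from typing import Iterable

BOOTSTRAP_RE = re.compile(r"Bootstrapped (\d+)%")


def wait_for_bootstrap(log_lines: Iterable[str], timeout: float = 120.0) -> bool:
    """Consume Tor log lines until 100% bootstrap or the timeout elapses."""
    deadline = time.monotonic() + timeout
    for line in log_lines:
        match = BOOTSTRAP_RE.search(line)
        if match and int(match.group(1)) == 100:
            return True
        if time.monotonic() > deadline:
            return False  # took longer than the allowed window
    return False  # log ended (Tor exited) before bootstrapping
```

For example, with `proc = subprocess.Popen(["tor"], stdout=subprocess.PIPE, text=True)`, calling `wait_for_bootstrap(proc.stdout)` returns False if Tor has not finished bootstrapping within two minutes.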

Why are empty fields missing from my records? The actor uses an omit-empty contract: any field that would be null, "", [], or {} is removed from the record entirely before push. This keeps the dataset clean and downstream-friendly. Only fields with real data are present.
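The omit-empty contract can be applied as a final step before each record is pushed. This sketch assumes that nested objects such as cryptoAddresses and misc are cleaned recursively, so an object whose lists are all empty also disappears. Whether the actor recurses this way is an assumption. Boolean false is kept because it is not one of the four empty values, which matches the behavior of searchKeywordFound.

```python
def omit_empty(record: dict) -> dict:
    """Recursively drop None, "", [] and {} values; keep False and 0."""
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, dict):
            value = omit_empty(value)  # clean nested objects first
        if value is None or value == "" or value == [] or value == {}:
            continue
        cleaned[key] = value
    return cleaned
```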