# urlscan.io Threat Intelligence Scraper (`parseforge/urlscan-scraper`) Actor

Search the urlscan.io public scan database with Lucene queries (domain, page.url, hash, IP, ASN, tag) and export scan metadata: page URL, IP, ASN, server, TLS, screenshot, redirect chain, country, brand, verdict.

- **URL**: https://apify.com/parseforge/urlscan-scraper.md
- **Developed by:** [ParseForge](https://apify.com/parseforge) (community)
- **Categories:** Developer tools, News, Automation
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $26.62 / 1,000 results

This Actor is paid per event. You are not charged for the Apify platform usage, but only a fixed price for specific events.
Since this Actor supports Apify Store discounts, the price gets lower the higher subscription plan you have.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
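
As a back-of-envelope illustration of per-result pricing, the helper below multiplies a result count by the listed starting price of $26.62 per 1,000 results. It assumes that result events are the only charge and ignores Store discounts and any other events the Actor may bill, so treat the figure as an estimate at the base rate, not a quote. `estimate_cost` is a hypothetical helper name.

```python
from decimal import Decimal, ROUND_HALF_UP

# Starting price from the Pricing section; Store discounts can lower it.
PRICE_PER_1000_RESULTS = Decimal("26.62")


def estimate_cost(results: int, price_per_1000: Decimal = PRICE_PER_1000_RESULTS) -> Decimal:
    """Estimate the USD charge for `results` dataset items at a per-1,000 rate.

    Assumption: only per-result events are billed; other events and discounts are ignored.
    """
    if results < 0:
        raise ValueError("results must be non-negative")
    cost = price_per_1000 * Decimal(results) / Decimal(1000)
    # Round to whole cents for display.
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(estimate_cost(500))  # 13.31
print(estimate_cost(10))   # 0.27
```

`Decimal` is used instead of `float` so that cent amounts round predictably.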

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).
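
The HTTP call itself is language-agnostic. The sketch below shows its shape using only Python's standard library: it builds a POST request for the `run-sync-get-dataset-items` endpoint listed in the OpenAPI specification in this README, which runs the Actor and returns its dataset items in one response. The token is a placeholder and `build_run_request` is a hypothetical helper name; the request is only built, not sent.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_ID = "parseforge~urlscan-scraper"  # "username~actor-name" form used in API paths


def build_run_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Build (but do not send) a run-sync-get-dataset-items POST request."""
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items?{query}"
    body = json.dumps(actor_input).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_run_request("<YOUR_API_TOKEN>", {"query": "domain:apify.com", "maxItems": 10})
print(req.get_method(), req.full_url)

# To actually run the Actor (needs a real token and network access):
# with urllib.request.urlopen(req, timeout=300) as resp:
#     items = json.load(resp)  # list of dataset items
```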

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation as a [Markdown index](https://docs.apify.com/llms.txt) and as [Markdown full text](https://docs.apify.com/llms-full.txt).


# README

![ParseForge Banner](https://github.com/ParseForge/apify-assets/raw/main/banner.jpg)

## 🛡️ urlscan.io Threat Intelligence Scraper

> 🚀 **Export urlscan.io scan results in seconds.** Run **Lucene-style queries** across the public urlscan.io scan database and pull back domain, IP, ASN, TLS, brand, verdict, and screenshot metadata. No API key and no manual JSON parsing required.

> 🕒 **Last updated:** 2026-05-13 · **📊 31 fields** per record · **🛡️ Phishing + malware feed** · **🌐 Any domain, IP, ASN, or tag**

The **urlscan.io Threat Intelligence Scraper** queries the urlscan.io public search API with full Lucene syntax (`domain:`, `page.url:`, `task.tags:phishing`, `page.asn:`, `brand.name:`, `verdicts.overall.malicious:true`, plus `AND`, `OR`, `NOT`, wildcards, and date ranges) and returns one row per scan. Each row carries the page URL, apex domain, IP, ASN, server software, TLS issuer, redirect chain, country, page title, request count, brand attribution, and the malicious verdict score, plus links to the rendered screenshot and the full urlscan report.

Coverage spans the entire urlscan.io public corpus, which adds **millions of new scans every week** across phishing kits, brand impersonation, malware C2s, fast-flux infrastructure, and regular web pages. Every field maps directly to the upstream API so you can join scans to your own SIEM, takedown queue, or brand-protection workflow.

| 🎯 Target Audience | 💡 Primary Use Cases |
|---|---|
| Threat intel teams, SOC analysts, brand-protection engineers, takedown vendors, anti-phishing researchers, OSINT investigators | Phishing kit discovery, brand impersonation monitoring, IP / ASN attribution, malware infrastructure mapping, screenshot enrichment, indicator-of-compromise hunting |

---

### 📋 What the urlscan.io Scraper does

Five intel workflows in one Actor:

- 🎣 **Phishing discovery.** Pull every scan tagged `phishing` for a brand or apex domain.
- 🏢 **Brand impersonation monitoring.** Watch `brand.name:<your-brand>` across the global scan feed.
- 🌐 **Infrastructure attribution.** Pivot on `page.ip:`, `page.asn:`, or `page.server:` to map hosting clusters.
- 🔁 **Redirect-chain analysis.** Trace landing-page redirects and final URLs across recent scans.
- 🖼️ **Visual enrichment.** Every record links to a public urlscan screenshot and the full result report.
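
The infrastructure-attribution workflow boils down to grouping scan records by a network field. A minimal sketch, assuming records shaped like the Output schema in this README (`page_asn`, `page_domain`), counts distinct domains per ASN so the densest hosting clusters surface first. `cluster_by_asn` is a hypothetical helper, not part of the Actor, and the sample domains are made up.

```python
from collections import defaultdict


def cluster_by_asn(records: list[dict]) -> list[tuple[str, int, list[str]]]:
    """Group scan records by ASN and return (asn, distinct_domain_count, domains),
    sorted with the largest clusters first."""
    domains_by_asn: dict[str, set[str]] = defaultdict(set)
    for rec in records:
        asn = rec.get("page_asn")
        domain = rec.get("page_domain")
        if asn and domain:  # skip scans missing network or page data
            domains_by_asn[asn].add(domain)
    clusters = [(asn, len(doms), sorted(doms)) for asn, doms in domains_by_asn.items()]
    # Largest cluster first; ties broken by ASN for stable output.
    clusters.sort(key=lambda c: (-c[1], c[0]))
    return clusters


scans = [
    {"page_asn": "AS139341", "page_domain": "a.edgeone.dev"},
    {"page_asn": "AS139341", "page_domain": "b.edgeone.dev"},
    {"page_asn": "AS13335", "page_domain": "apify.com"},
    {"page_asn": "AS139341", "page_domain": "a.edgeone.dev"},  # repeat scan of the same domain
]
for asn, count, domains in cluster_by_asn(scans):
    print(asn, count, domains)
```

Counting distinct domains rather than raw scans keeps a single heavily rescanned site from dominating the ranking.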

Each scan record carries scan metadata (UUID, visibility, method, time, tags), page facts (URL, domain, apex, country, IP, ASN, server, status, title, MIME), TLS context (issuer, valid days), traffic stats (unique IPs, unique countries, request count, data length), brand attribution, and the urlscan verdict (score, malicious flag, categories), plus deep links to the screenshot and report page.

> 💡 **Why it matters:** brand-protection and SOC teams burn hours stitching together phishing-kit pivots from raw urlscan JSON. This Actor flattens the response into a spreadsheet-ready table, so triage, takedown filings, and dashboards can all start from a single query.
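
Flattening a nested scan document into one spreadsheet row can be sketched with a small recursive helper that joins nested keys with underscores. This illustrates the general idea only, not the Actor's actual mapping: its column names (for example `page_apex_domain`) suggest it also renames some keys, and the nested input below is a made-up fragment, not a verbatim urlscan response.

```python
def flatten(doc: dict, prefix: str = "", sep: str = "_") -> dict:
    """Recursively flatten nested dicts into a single level, joining keys with `sep`.

    Lists are kept as-is so multi-valued fields (such as tags) stay intact.
    """
    flat: dict = {}
    for key, value in doc.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


nested = {
    "task": {"visibility": "public", "tags": ["phishing"]},
    "page": {"ip": "43.174.247.29", "asn": "AS139341"},
    "verdicts": {"overall": {"malicious": True, "score": 100}},
}
print(flatten(nested))
```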

---

### 🎬 Full Demo

_🚧 Coming soon: a 3-minute walkthrough showing a phishing query, pivot to ASN, and Slack alert._

---

### ⚙️ Input

<table>
<thead>
<tr><th>Input</th><th>Type</th><th>Default</th><th>Behavior</th></tr>
</thead>
<tbody>
<tr><td><code>query</code></td><td>string</td><td><code>"domain:apify.com"</code></td><td>Lucene query. Required. Supports <code>domain:</code>, <code>page.url:</code>, <code>page.ip:</code>, <code>page.asn:</code>, <code>task.tags:</code>, <code>brand.name:</code>, <code>verdicts.overall.malicious:true</code>, <code>hash:</code>, <code>filename:</code>, plus <code>AND</code>, <code>OR</code>, <code>NOT</code>, wildcards, and date ranges.</td></tr>
<tr><td><code>maxItems</code></td><td>integer</td><td><code>10</code></td><td>Records to return. Free plan caps at 10, paid plan at 1,000,000.</td></tr>
<tr><td><code>pageSize</code></td><td>integer</td><td><code>100</code></td><td>Results per API request (urlscan allows up to 10,000). Lower values are friendlier to free-tier rate limits.</td></tr>
</tbody>
</table>
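
Before starting a run from code, it can help to check an input object against the limits in the input schema: `query` is required, `maxItems` ranges from 1 to 1,000,000, and `pageSize` ranges from 1 to 10,000 with a default of 100. The sketch below encodes those limits (it does not model the free-plan cap of 10 items) and computes how many paged search requests are needed to reach `maxItems`, assuming every page comes back full. `validate_input` and `search_requests_needed` are hypothetical helpers.

```python
MAX_ITEMS_LIMIT = 1_000_000  # maximum of maxItems in the input schema
PAGE_SIZE_LIMIT = 10_000     # maximum of pageSize (urlscan's per-request cap)
DEFAULT_MAX_ITEMS = 10       # default shown in the Input table
DEFAULT_PAGE_SIZE = 100      # default of pageSize in the input schema


def validate_input(actor_input: dict) -> dict:
    """Apply defaults and check limits; raise ValueError on bad input."""
    query = actor_input.get("query")
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query is required")
    max_items = actor_input.get("maxItems", DEFAULT_MAX_ITEMS)
    page_size = actor_input.get("pageSize", DEFAULT_PAGE_SIZE)
    for name, value, limit in (
        ("maxItems", max_items, MAX_ITEMS_LIMIT),
        ("pageSize", page_size, PAGE_SIZE_LIMIT),
    ):
        if not isinstance(value, int) or not 1 <= value <= limit:
            raise ValueError(f"{name} must be an integer between 1 and {limit}")
    return {"query": query, "maxItems": max_items, "pageSize": page_size}


def search_requests_needed(actor_input: dict) -> int:
    """Search requests needed to collect maxItems when every page comes back full."""
    checked = validate_input(actor_input)
    # Integer ceiling division: ceil(maxItems / pageSize).
    return -(-checked["maxItems"] // checked["pageSize"])


print(search_requests_needed({"query": "page.asn:AS139341 AND verdicts.overall.malicious:true", "maxItems": 200}))  # 2
```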

**Example: every phishing scan against PayPal in the last seven days.**

```json
{
    "query": "page.domain:paypal.com AND task.tags:phishing AND date:>now-7d",
    "maxItems": 500
}
````

**Example: malicious verdicts hosted on a specific ASN.**

```json
{
    "query": "page.asn:AS139341 AND verdicts.overall.malicious:true",
    "maxItems": 200,
    "pageSize": 100
}
```

> ⚠️ **Good to Know:** urlscan.io rate-limits anonymous search and may return partial results for very broad queries. Narrow with `date:>now-30d` or an apex domain when running bulk pulls, and keep `pageSize` modest on the free tier.
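
The narrowing advice above can be applied mechanically: wrap the existing query in parentheses and AND a relative date range, using the `date:>now-Nd` form from this README's examples. `with_recent_window` is a hypothetical helper; pick a window that suits your sweep.

```python
def with_recent_window(query: str, days: int = 30) -> str:
    """Limit a urlscan Lucene query to scans from the last `days` days.

    The original query is wrapped in parentheses so the added AND applies to
    the whole expression rather than depending on how the parser groups mixed
    AND/OR operators.
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    return f"({query.strip()}) AND date:>now-{days}d"


print(with_recent_window("domain:example.com OR domain:example.org"))
# (domain:example.com OR domain:example.org) AND date:>now-30d
```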

***

### 📊 Output

Each scan record carries **31 fields**. Download the dataset as CSV, Excel, JSON, or XML.

#### 🧾 Schema

| Field | Type | Example |
|---|---|---|
| 🆔 `uuid` | string | `"019e2370-d463-72c7-a1ef-3f07c7db0e75"` |
| 🔗 `task_url` | string | `"https://classai-jdssb5uo04.edgeone.dev/"` |
| 👁️ `task_visibility` | string | `"public"` |
| 🛠️ `task_method` | string | `"api"` |
| 🕒 `task_time` | ISO 8601 | `"2026-05-13T22:24:25.440Z"` |
| 🏷️ `task_tags` | string\[] | `["phishing","malicious"]` |
| 🌐 `page_url` | string | `"https://classai-jdssb5uo04.edgeone.dev/"` |
| 🌍 `page_domain` | string | `"classai-jdssb5uo04.edgeone.dev"` |
| 🪪 `page_apex_domain` | string | `"edgeone.dev"` |
| 🏳️ `page_country` | string | `"SG"` |
| 🖥️ `page_server` | string | `"edgeone-pages"` |
| 📡 `page_ip` | string | `"43.174.247.29"` |
| 🛰️ `page_asn` | string | `"AS139341"` |
| 🏢 `page_asn_name` | string | `"ACE-AS-AP ACE, SG"` |
| 🪞 `page_ptr` | string \| null | `null` |
| 📟 `page_status` | string | `"200"` |
| 🔐 `page_tlsValidDays` | number | `364` |
| 🏷️ `page_tlsIssuer` | string | `"DigiCert Secure Site OV G2 TLS CN RSA4096 SHA256 2022 CA1"` |
| 🔁 `page_redirected` | string \| null | `null` |
| 📰 `page_title` | string | `"欢迎来到信息科技宇宙"` |
| 📄 `page_mime_type` | string | `"text/html"` |
| 🌐 `page_language` | string \| null | `null` |
| 📅 `domain_age_days` | number | `1273` |
| 🌐 `unique_ips` | number | `1` |
| 🗺️ `unique_countries` | number | `1` |
| 📊 `request_count` | number | `2` |
| 📦 `data_length` | number | `10882` |
| 🏷️ `brand_name` | string | `"PayPal"` |
| 🚨 `verdict_score` | number | `100` |
| ⚠️ `verdicts_overall_malicious` | boolean | `true` |
| 🖼️ `screenshot` | string | `"https://urlscan.io/screenshots/<uuid>.png"` |
| 📄 `report_url` | string | `"https://urlscan.io/result/<uuid>/"` |
| 🕒 `scrapedAt` | ISO 8601 | `"2026-05-13T22:25:22.027Z"` |
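
The `screenshot` and `report_url` examples in the schema follow a fixed pattern keyed on the scan `uuid`. Assuming that pattern holds for every record (it matches the sample records that follow, but the Actor may fill these fields differently in edge cases), both links can be rebuilt from a UUID alone, for example when only UUIDs were stored in a SIEM. `scan_links` is a hypothetical helper.

```python
import uuid as uuid_lib

URLSCAN_BASE = "https://urlscan.io"


def scan_links(scan_uuid: str) -> dict:
    """Rebuild the screenshot and report URLs for a urlscan scan UUID."""
    # Validate and normalize the UUID (raises ValueError on malformed input).
    normalized = str(uuid_lib.UUID(scan_uuid))
    return {
        "screenshot": f"{URLSCAN_BASE}/screenshots/{normalized}.png",
        "report_url": f"{URLSCAN_BASE}/result/{normalized}/",
    }


print(scan_links("019e2370-d463-72c7-a1ef-3f07c7db0e75"))
```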

#### 📦 Sample records

<details>
<summary><strong>🎣 Phishing scan tagged by the urlscan community</strong></summary>

```json
{
    "uuid": "019e2370-d463-72c7-a1ef-3f07c7db0e75",
    "task_url": "https://classai-jdssb5uo04.edgeone.dev/",
    "task_visibility": "public",
    "task_method": "api",
    "task_time": "2026-05-13T22:24:25.440Z",
    "task_tags": ["phishing", "malicious"],
    "page_url": "https://classai-jdssb5uo04.edgeone.dev/",
    "page_domain": "classai-jdssb5uo04.edgeone.dev",
    "page_apex_domain": "edgeone.dev",
    "page_country": "SG",
    "page_server": "edgeone-pages",
    "page_ip": "43.174.247.29",
    "page_asn": "AS139341",
    "page_asn_name": "ACE-AS-AP ACE, SG",
    "page_status": "200",
    "page_tlsValidDays": 364,
    "page_tlsIssuer": "DigiCert Secure Site OV G2 TLS CN RSA4096 SHA256 2022 CA1",
    "page_title": "欢迎来到信息科技宇宙",
    "page_mime_type": "text/html",
    "domain_age_days": 1273,
    "unique_ips": 1,
    "unique_countries": 1,
    "request_count": 2,
    "data_length": 10882,
    "screenshot": "https://urlscan.io/screenshots/019e2370-d463-72c7-a1ef-3f07c7db0e75.png",
    "report_url": "https://urlscan.io/result/019e2370-d463-72c7-a1ef-3f07c7db0e75/",
    "scrapedAt": "2026-05-13T22:25:22.027Z"
}
```

</details>

<details>
<summary><strong>🏢 Benign brand scan (corporate site)</strong></summary>

```json
{
    "uuid": "019e2310-aabb-72c7-9c01-feedface0001",
    "task_url": "https://apify.com/",
    "task_visibility": "public",
    "task_method": "automatic",
    "task_time": "2026-05-13T20:11:02.000Z",
    "page_url": "https://apify.com/",
    "page_domain": "apify.com",
    "page_apex_domain": "apify.com",
    "page_country": "US",
    "page_server": "cloudflare",
    "page_ip": "104.21.32.10",
    "page_asn": "AS13335",
    "page_asn_name": "CLOUDFLARENET, US",
    "page_status": "200",
    "page_tlsIssuer": "Google Trust Services",
    "page_title": "Apify · Full-stack web scraping & automation",
    "request_count": 84,
    "screenshot": "https://urlscan.io/screenshots/019e2310-aabb-72c7-9c01-feedface0001.png",
    "report_url": "https://urlscan.io/result/019e2310-aabb-72c7-9c01-feedface0001/",
    "scrapedAt": "2026-05-13T22:25:22.027Z"
}
```

</details>
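
The two sample records above do not carry the same keys: the benign scan omits `task_tags`, `page_tlsValidDays`, and the verdict fields entirely. Code that consumes the dataset should therefore treat most fields as optional. A minimal sketch of normalizing a record follows. It assumes that missing tags mean an empty list and that a missing verdict means "not flagged" (a judgment call, not a documented default), and it converts the string `page_status` into an integer. `normalize` is a hypothetical helper.

```python
def normalize(record: dict) -> dict:
    """Return a copy of a scan record with commonly missing fields filled in."""
    out = dict(record)
    # Missing or null tags become an empty list.
    out["task_tags"] = list(record.get("task_tags") or [])
    # Assumption: a missing verdict becomes False ("not flagged"); it does not prove the page is benign.
    out["verdicts_overall_malicious"] = bool(record.get("verdicts_overall_malicious", False))
    out["verdict_score"] = record.get("verdict_score")  # None when absent
    # page_status is a string in the schema ("200"); convert it to an int when possible.
    status = record.get("page_status")
    if isinstance(status, int):
        out["page_status"] = status
    elif isinstance(status, str) and status.isdigit():
        out["page_status"] = int(status)
    else:
        out["page_status"] = None
    return out


benign = {"uuid": "019e2310-aabb-72c7-9c01-feedface0001", "page_status": "200"}
print(normalize(benign))
```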

***

### ✨ Why choose this Actor

| | Capability |
|---|---|
| 🛡️ | **Lucene-native search.** Every urlscan search operator works as-is: `domain:`, `page.ip:`, `task.tags:`, `brand.name:`, `verdicts.overall.malicious:true`, date ranges, wildcards, boolean logic. |
| 🌐 | **Public corpus.** Searches the global pool of public scans contributed by the urlscan community and automated submitters. |
| 🖼️ | **Screenshot + report links.** Every record points at the rendered PNG and the full urlscan report page for analyst review. |
| 🎯 | **Brand & verdict attribution.** Includes urlscan's own brand match, verdict score, and malicious flag where present. |
| ⚡ | **Fast pagination.** Server-side `search_after` cursor walks the full result set without timing out. |
| 🚫 | **No API key required.** Uses the public search endpoint. Plug it in and run. |
| 🔁 | **Always fresh.** Every run hits the live urlscan index. |

> 📊 The urlscan.io public corpus is one of the most cited threat-intel data sources in modern SOC tooling, takedown vendor pipelines, and brand-protection products.

***

### 📈 How it compares to alternatives

| Approach | Cost | Coverage | Refresh | Filters | Setup |
|---|---|---|---|---|---|
| **⭐ urlscan.io Scraper** *(this Actor)* | $5 free credit, then pay-per-use | Public urlscan corpus | **Live per run** | Full Lucene syntax | ⚡ 2 min |
| urlscan PRO subscription | $200+/month per seat | Public + private | Live | Full Lucene | 🐢 Vendor onboarding |
| Build your own integration | Engineering time | Same | Same | Same | 🕒 Days |
| Commercial brand-protection suite | $$$ | Curated | Hourly | Vendor-defined | ⏳ Weeks |

Pick this Actor when you want urlscan firepower without the seat licenses or the parser code.

***

### 🚀 How to use

1. 📝 **Sign up.** [Create a free account with $5 credit](https://console.apify.com/sign-up?fpr=vmoqkp) (takes 2 minutes).
2. 🌐 **Open the Actor.** Go to the urlscan.io Threat Intelligence Scraper page on the Apify Store.
3. 🎯 **Set the query.** Try `domain:yourbrand.com AND task.tags:phishing` and set `maxItems`.
4. 🚀 **Run it.** Click **Start** and let the Actor walk the search index.
5. 📥 **Download.** Grab results in the **Dataset** tab as CSV, Excel, JSON, or XML.

> ⏱️ Total time from signup to a phishing feed export: **3-5 minutes.** No coding required.

***

### 💼 Business use cases

<table>
<tr>
<td width="50%" valign="top">

#### 🎣 Brand Protection & Anti-Phishing

- Daily `brand.name:<yourbrand>` pulls feeding a takedown queue
- Phishing kit fingerprinting via `hash:` and `filename:`
- Lookalike-domain monitoring with apex wildcards
- Screenshot evidence packs for legal filings

</td>
<td width="50%" valign="top">

#### 🕵️ SOC & Threat Intel

- IOC enrichment with IP, ASN, server, and verdict fields
- Pivot from a single phish to the entire C2 cluster
- Watchlists for high-risk ASNs and fast-flux ranges
- Daily exports into MISP, OpenCTI, or Splunk

</td>
</tr>
<tr>
<td width="50%" valign="top">

#### 🏢 Marketplace & Platform Trust

- Detect impersonation of your sellers, creators, or merchants
- Catch fake login pages targeting your customers
- Monitor counterfeit storefronts and clone sites
- Feed risk scores into payment and onboarding flows

</td>
<td width="50%" valign="top">

#### 📰 Investigative Journalism & OSINT

- Map infrastructure behind disinformation campaigns
- Document phishing waves around elections or breaches
- Pivot from one screenshot to a network of related scans
- Build evidence dossiers with linkable urlscan report URLs

</td>
</tr>
</table>

***

### 🔌 Automating urlscan.io Scraper

Control the scraper programmatically for scheduled runs and pipeline integrations:

- 🟢 **Node.js.** Install the `apify-client` NPM package.
- 🐍 **Python.** Use the `apify-client` PyPI package.
- 📚 See the [Apify API documentation](https://docs.apify.com/api/v2) for full details.

The [Apify Schedules feature](https://docs.apify.com/platform/schedules) lets you trigger this Actor on any cron interval. Hourly phishing sweeps, daily brand watches, and weekly ASN audits keep your downstream SIEM, takedown vendor, or Slack channel in sync.
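
Recurring sweeps usually overlap: a daily `date:>now-7d` query returns many of the previous day's scans again. One way to forward only new scans downstream is to remember which `uuid` values have already been sent. The sketch below keeps that state in a local JSON file purely for illustration (`seen_uuids.json` and `new_scans` are hypothetical names); in a real pipeline the state might live in your SIEM, a database, or an Apify key-value store.

```python
import json
import tempfile
from pathlib import Path


def new_scans(records: list[dict], state_file: Path) -> list[dict]:
    """Return records whose uuid has not been seen before, and persist the updated set."""
    seen: set[str] = set()
    if state_file.exists():
        seen = set(json.loads(state_file.read_text(encoding="utf-8")))
    fresh = []
    for rec in records:
        scan_id = rec.get("uuid")
        if scan_id and scan_id not in seen:
            fresh.append(rec)
            seen.add(scan_id)  # also de-duplicates within the same batch
    state_file.write_text(json.dumps(sorted(seen)), encoding="utf-8")
    return fresh


# Two simulated scheduled runs sharing one state file.
with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / "seen_uuids.json"
    print(len(new_scans([{"uuid": "a"}, {"uuid": "b"}], state)))  # 2
    print(len(new_scans([{"uuid": "b"}, {"uuid": "c"}], state)))  # 1
```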

***

### 🌟 Beyond business use cases

Threat intel data feeds far more than commercial SOCs. The same structured records support research, civic transparency, and personal security projects.

<table>
<tr>
<td width="50%">

#### 🎓 Research and academia

- Phishing-kit ecosystem studies and longitudinal analyses
- TLS issuer and ASN reputation papers
- Reproducible datasets for security ML classifiers
- Coursework on indicator-of-compromise pivoting

</td>
<td width="50%">

#### 🎨 Personal and creative

- Hobbyist scam-spotting blogs and Mastodon feeds
- Personal early-warning systems for your own apex domain
- Visualizations of phishing campaigns over time
- Portfolio dashboards for security analysts

</td>
</tr>
<tr>
<td width="50%">

#### 🤝 Non-profit and civic

- Tracking scams that target vulnerable populations
- NGO digital-safety operations for activists and journalists
- Civic transparency on hosting providers harboring abuse
- Election integrity monitoring of fake-vote sites

</td>
<td width="50%">

#### 🧪 Experimentation

- Train phishing-classifier ML models on real labels
- Benchmark detection engines against fresh scans
- Prototype agent workflows that triage IOCs end-to-end
- Stress-test takedown automations with live feeds

</td>
</tr>
</table>

***

### 🤖 Ask an AI assistant about this scraper

Open a ready-to-send prompt about this ParseForge Actor in the AI of your choice:

- 💬 [**ChatGPT**](https://chat.openai.com/?q=How%20do%20I%20use%20the%20urlscan.io%20Threat%20Intelligence%20Scraper%20by%20ParseForge%20on%20Apify%3F%20Show%20me%20Lucene%20query%20examples%2C%20output%20fields%2C%20common%20use%20cases%2C%20and%20how%20to%20integrate%20it%20into%20a%20SOC%20workflow.)
- 🧠 [**Claude**](https://claude.ai/new?q=How%20do%20I%20use%20the%20urlscan.io%20Threat%20Intelligence%20Scraper%20by%20ParseForge%20on%20Apify%3F%20Show%20me%20Lucene%20query%20examples%2C%20output%20fields%2C%20common%20use%20cases%2C%20and%20how%20to%20integrate%20it%20into%20a%20SOC%20workflow.)
- 🔍 [**Perplexity**](https://perplexity.ai/search?q=How%20do%20I%20use%20the%20urlscan.io%20Threat%20Intelligence%20Scraper%20by%20ParseForge%20on%20Apify%3F%20Show%20me%20Lucene%20query%20examples%2C%20output%20fields%2C%20common%20use%20cases%2C%20and%20how%20to%20integrate%20it%20into%20a%20SOC%20workflow.)
- 🅒 [**Copilot**](https://copilot.microsoft.com/?q=How%20do%20I%20use%20the%20urlscan.io%20Threat%20Intelligence%20Scraper%20by%20ParseForge%20on%20Apify%3F%20Show%20me%20Lucene%20query%20examples%2C%20output%20fields%2C%20common%20use%20cases%2C%20and%20how%20to%20integrate%20it%20into%20a%20SOC%20workflow.)

***

### ❓ Frequently Asked Questions

#### 🧩 How does it work?

Drop a Lucene query into the input form, click Start, and the Actor walks the urlscan.io public search API with a cursor-based pager. Each scan is flattened into 31 columns covering page, network, TLS, brand, and verdict data, plus links to the screenshot and the full report.
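
The cursor-based pager can be sketched as a loop that feeds the sort key of the last result back into the next request. The sketch assumes the urlscan.io search response shape described in urlscan's API documentation: a `results` list, a `has_more` flag, a `sort` array on each result, and a `search_after` request parameter that takes the last result's `sort` values joined by commas. The HTTP call is hidden behind a `fetch_page` callable so the loop runs offline; `iterate_scans` and `fake_fetch` are hypothetical names, and the Actor's own implementation may differ.

```python
from typing import Callable, Iterator, Optional

# fetch_page(query, size, search_after) -> parsed JSON response (dict)
FetchPage = Callable[[str, int, Optional[str]], dict]


def iterate_scans(fetch_page: FetchPage, query: str, max_items: int, page_size: int = 100) -> Iterator[dict]:
    """Yield up to max_items results, following the search_after cursor."""
    cursor: Optional[str] = None
    yielded = 0
    while yielded < max_items:
        size = min(page_size, max_items - yielded)
        response = fetch_page(query, size, cursor)
        results = response.get("results", [])
        for result in results:
            yield result
            yielded += 1
            if yielded >= max_items:
                return
        if not results or not response.get("has_more"):
            return  # no more pages
        # Next page starts after the sort key of the last result received.
        cursor = ",".join(str(v) for v in results[-1]["sort"])


def fake_fetch(query: str, size: int, search_after: Optional[str]) -> dict:
    """Hypothetical in-memory stand-in for the search endpoint (5 scans, newest first)."""
    corpus = [{"_id": f"scan-{i}", "sort": [1000 - i, f"scan-{i}"]} for i in range(5)]
    start = 0
    if search_after is not None:
        keys = [",".join(str(v) for v in r["sort"]) for r in corpus]
        start = keys.index(search_after) + 1
    return {"results": corpus[start:start + size], "has_more": start + size < len(corpus)}


print([r["_id"] for r in iterate_scans(fake_fetch, "domain:example.com", max_items=4, page_size=2)])
# ['scan-0', 'scan-1', 'scan-2', 'scan-3']
```

A cursor, unlike an offset, does not require the server to skip over earlier results on every request, which is why it suits deep result sets.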

#### 🔍 What query syntax can I use?

Anything that works in urlscan's own search bar. Common fields: `domain:`, `page.url:`, `page.domain:`, `page.ip:`, `page.asn:`, `page.country:`, `task.tags:`, `brand.name:`, `verdicts.overall.malicious:true`, `hash:`, `filename:`, plus `AND`, `OR`, `NOT`, wildcards, and date ranges like `date:>now-7d`.
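
When queries are assembled from outside values (a brand list, a feed of domains), reserved characters such as `:` or `/` inside a value can change what the query means. A minimal sketch, assuming urlscan's search accepts standard Lucene query-string escaping (a backslash before each reserved character): `field_term` escapes a value and pairs it with a field name, and `any_of` ORs several terms together. Both names are hypothetical. Wildcards are escaped too, so write the term by hand when you want `*` to act as a wildcard; values containing spaces are outside this sketch.

```python
# Lucene query-string reserved characters (assumption: urlscan accepts backslash escaping).
_RESERVED = set('+-!(){}[]^"~*?:\\/&|')


def escape_value(value: str) -> str:
    """Backslash-escape Lucene reserved characters in a search value."""
    return "".join("\\" + ch if ch in _RESERVED else ch for ch in value)


def field_term(field: str, value: str) -> str:
    """Build `field:value` with the value escaped."""
    return f"{field}:{escape_value(value)}"


def any_of(terms: list[str]) -> str:
    """OR several terms together inside parentheses."""
    return "(" + " OR ".join(terms) + ")"


query = any_of([field_term("page.domain", d) for d in ["paypal.com", "apple.com"]]) + " AND task.tags:phishing"
print(query)
# (page.domain:paypal.com OR page.domain:apple.com) AND task.tags:phishing
print(field_term("page.url", "https://example.com/login"))
# page.url:https\:\/\/example.com\/login
```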

#### 📏 How accurate is the data?

Every field maps to a urlscan.io public API response. urlscan is widely cited across SOC and brand-protection tooling, but its tags and verdicts come from a mix of community submissions and automated heuristics. Treat verdicts as one input among several when making takedown decisions.
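
Treating the verdict as one input among several can be made concrete with a small rule that escalates a scan only when separate signals agree. The rule is purely illustrative: the thresholds (a domain younger than 30 days, a `verdict_score` of at least 50) are hypothetical values chosen for the example, not urlscan or ParseForge recommendations. Field names come from the Output schema; `needs_review` is a hypothetical helper.

```python
def needs_review(record: dict, max_domain_age_days: int = 30, min_score: int = 50) -> bool:
    """Escalate only when at least two separate signals point the same way.

    Thresholds are illustrative defaults, not recommendations.
    """
    signals = 0
    if record.get("verdicts_overall_malicious") or (record.get("verdict_score") or 0) >= min_score:
        signals += 1  # urlscan's own verdict
    if "phishing" in (record.get("task_tags") or []):
        signals += 1  # community or submitter tag
    age = record.get("domain_age_days")
    if age is not None and age < max_domain_age_days:
        signals += 1  # recently registered domain
    return signals >= 2


print(needs_review({"verdicts_overall_malicious": True, "task_tags": ["phishing"], "domain_age_days": 1273}))  # True
print(needs_review({"verdicts_overall_malicious": True, "domain_age_days": 1273}))  # False
```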

#### 🔁 How fresh is the data?

Every run hits the live urlscan index, so results reflect scans submitted up to the moment the run started.

#### 🚫 Do I need a urlscan API key?

No. This Actor uses the public search endpoint. For very high-volume use cases, consider a urlscan PRO subscription alongside this Actor.

#### ⏰ Can I schedule daily phishing sweeps?

Yes. Use Apify Schedules to trigger the Actor on any cron interval and pipe results into Slack, email, a webhook, or your warehouse.

#### 🖼️ Are screenshots included?

Yes. Every record includes a public screenshot URL and the full urlscan report URL.

#### ⚖️ Is this data legal to use?

urlscan.io publishes scan results publicly. Use the data in line with urlscan's terms and your local regulations. For takedowns and legal filings, follow standard evidence-handling practices.

#### 💳 Do I need a paid Apify plan?

No. The free plan covers small runs (10 records). A paid plan unlocks higher limits, scheduling, and concurrency.

#### 🆘 What if I need help?

Reach out via the contact form below to request a custom intel pipeline, a private workflow, or a feature.

***

### 🔌 Integrate with any app

urlscan.io Threat Intelligence Scraper connects to any cloud service via [Apify integrations](https://apify.com/integrations):

- [**Make**](https://docs.apify.com/platform/integrations/make) - Automate multi-step phishing workflows
- [**Zapier**](https://docs.apify.com/platform/integrations/zapier) - Connect with 5,000+ apps
- [**Slack**](https://docs.apify.com/platform/integrations/slack) - Get phishing alerts in your channels
- [**Airbyte**](https://docs.apify.com/platform/integrations/airbyte) - Pipe scans into your warehouse
- [**GitHub**](https://docs.apify.com/platform/integrations/github) - Trigger runs from commits or issues
- [**Google Drive**](https://docs.apify.com/platform/integrations/drive) - Export datasets straight to Sheets

You can also use webhooks to trigger downstream actions when a run finishes. Push new phishing scans into your takedown queue or alert your SOC in Slack.
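
As one example of such a downstream action, a script that receives newly scraped scans could post a short summary to a Slack incoming webhook, whose basic payload is a JSON object with a `text` field. The sketch only builds the payload from records shaped like the Output schema; the webhook URL you would post it to is created in Slack and is not shown, and `slack_payload` is a hypothetical helper.

```python
import json


def slack_payload(records: list[dict], limit: int = 5) -> str:
    """Build a Slack incoming-webhook JSON body summarizing malicious scans."""
    flagged = [r for r in records if r.get("verdicts_overall_malicious")]
    lines = [f"{len(flagged)} malicious scan(s) in this run"]
    for rec in flagged[:limit]:
        # Each line links to the urlscan report for analyst review.
        lines.append(f"- {rec.get('page_domain', '?')} ({rec.get('page_ip', '?')}): {rec.get('report_url', '')}")
    return json.dumps({"text": "\n".join(lines)})


body = slack_payload([
    {"verdicts_overall_malicious": True, "page_domain": "classai-jdssb5uo04.edgeone.dev",
     "page_ip": "43.174.247.29", "report_url": "https://urlscan.io/result/019e2370-d463-72c7-a1ef-3f07c7db0e75/"},
    {"verdicts_overall_malicious": False, "page_domain": "apify.com"},
])
print(body)
# POST `body` with Content-Type: application/json to your webhook URL, e.g. via urllib.request.
```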

***

### 🔗 Recommended Actors

- [**🌐 RDAP Domain Lookup Scraper**](https://apify.com/parseforge/rdap-domain-lookup-scraper) - Modern WHOIS replacement via the RDAP protocol
- [**🏢 GSA eLibrary Scraper**](https://apify.com/parseforge/gsa-elibrary-scraper) - U.S. federal contract vendor and price data
- [**🏗️ Hubspot Marketplace Scraper**](https://apify.com/parseforge/hubspot-marketplace-scraper) - Marketplace app and integration catalog
- [**📰 PR Newswire Scraper**](https://apify.com/parseforge/pr-newswire-scraper) - Press release feed with publish dates
- [**🤗 Hugging Face Model Scraper**](https://apify.com/parseforge/hugging-face-model-scraper) - AI model registry metadata

> 💡 **Pro Tip:** browse the complete [ParseForge collection](https://apify.com/parseforge) for more reference-data and intel scrapers.

***

**🆘 Need Help?** [**Open our contact form**](https://tally.so/r/BzdKgA) to request a new scraper, propose a custom data project, or report an issue.

***

> **⚠️ Disclaimer:** this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by urlscan.io GmbH or any of its partners. All trademarks mentioned are the property of their respective owners. Only publicly available scan data from the urlscan.io public search API is collected.

# Actor input Schema

## `query` (type: `string`):

urlscan.io Lucene query. Examples:

- `domain:example.com`: all scans for a domain
- `page.url:google.com`: match against the final URL
- `page.domain:paypal.com AND task.tags:phishing`: phishing scans on PayPal
- `page.ip:8.8.8.8`: scans hitting an IP
- `page.asn:AS15169`: scans on Google's ASN
- `hash:sha256...`: scans containing a known JS/file hash
- `filename:wp-login.php`: scans referencing a filename
- `task.tags:malware`: tagged as malware
- `page.country:RU AND date:>now-7d`: Russian-hosted, last 7 days

See https://urlscan.io/docs/search/ for the full syntax (supports AND, OR, NOT, wildcards, ranges).

## `maxItems` (type: `integer`):

Free users are limited to 10 items (preview). For paid users this field is optional, with a maximum of 1,000,000.

## `pageSize` (type: `integer`):

Results per API request (urlscan's maximum is 10,000; default 100). Lower values are friendlier to free-tier rate limits.

## Actor input object example

```json
{
  "query": "domain:apify.com",
  "maxItems": 10,
  "pageSize": 100
}
```

# Actor output Schema

## `overview` (type: `string`):

Overview of scraped data

## `fullData` (type: `string`):

Complete dataset

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and the Apify CLI, followed by the MCP server setup and the OpenAPI specification.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "query": "domain:apify.com",
    "maxItems": 10,
    "pageSize": 100
};

// Run the Actor and wait for it to finish
const run = await client.actor("parseforge/urlscan-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "query": "domain:apify.com",
    "maxItems": 10,
    "pageSize": 100,
}

# Run the Actor and wait for it to finish
run = client.actor("parseforge/urlscan-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "query": "domain:apify.com",
  "maxItems": 10,
  "pageSize": 100
}' |
apify call parseforge/urlscan-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=parseforge/urlscan-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "urlscan.io Threat Intelligence Scraper",
        "description": "Search the urlscan.io public scan database with Lucene queries (domain, page.url, hash, IP, ASN, tag) and export scan metadata: page URL, IP, ASN, server, TLS, screenshot, redirect chain, country, brand, verdict.",
        "version": "0.0",
        "x-build-id": "XDVCK1BJ3fW8fylg2"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/parseforge~urlscan-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-parseforge-urlscan-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/parseforge~urlscan-scraper/runs": {
            "post": {
                "operationId": "runs-sync-parseforge-urlscan-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/parseforge~urlscan-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-parseforge-urlscan-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "query"
                ],
                "properties": {
                    "query": {
                        "title": "Lucene Search Query",
                        "type": "string",
                        "description": "urlscan.io Lucene query. Examples:\n  - domain:example.com — all scans for a domain\n  - page.url:google.com — match against the final URL\n  - page.domain:paypal.com AND task.tags:phishing — phishing scans on paypal\n  - page.ip:8.8.8.8 — scans hitting an IP\n  - page.asn:AS15169 — scans on Google ASN\n  - hash:sha256... — scans containing a known JS/file hash\n  - filename:wp-login.php — scans referencing a filename\n  - task.tags:malware — tagged as malware\n  - page.country:RU AND date:>now-7d — Russian-hosted, last 7 days\nSee https://urlscan.io/docs/search/ for the full syntax (supports AND, OR, NOT, wildcards, ranges)."
                    },
                    "maxItems": {
                        "title": "Max Items",
                        "minimum": 1,
                        "maximum": 1000000,
                        "type": "integer",
                        "description": "Free users: limited to 10 items (preview). Paid users: optional, up to 1,000,000."
                    },
                    "pageSize": {
                        "title": "Page Size",
                        "minimum": 1,
                        "maximum": 10000,
                        "type": "integer",
                        "description": "Results per API request (urlscan.io allows at most 10,000; the default is 100). Lower values are gentler on free-tier rate limits.",
                        "default": 100
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
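The input schema above requires only `query`. It bounds `maxItems` to 1–1,000,000 and `pageSize` to 1–10,000, with a default of 100. The run response exposes `status`, `defaultDatasetId` and `usageTotalUsd`. The sketch below checks an input dict against those bounds locally before you submit it, then picks out the run fields you would need to fetch results. `build_input` and `summarize_run` are hypothetical helper names. Rejecting a blank `query` is an extra assumption, because the schema only requires a string. The code does not call Apify. In practice you would pass the returned dict as `run_input` to the official Python client (for example `ApifyClient(token).actor("parseforge/urlscan-scraper").call(run_input=...)`).

```python
from typing import Any, Dict, Optional

# Bounds copied from the inputSchema above.
PAGE_SIZE_DEFAULT = 100
PAGE_SIZE_RANGE = (1, 10_000)
MAX_ITEMS_RANGE = (1, 1_000_000)


def build_input(query: str,
                max_items: Optional[int] = None,
                page_size: Optional[int] = None) -> Dict[str, Any]:
    """Hypothetical helper: build an Actor input that satisfies inputSchema."""
    # Assumption: a blank query is rejected (the schema only says "required string").
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query is required and must be a non-empty string")
    run_input: Dict[str, Any] = {"query": query}

    # maxItems is optional; only include it when the caller sets it.
    if max_items is not None:
        lo, hi = MAX_ITEMS_RANGE
        if not lo <= max_items <= hi:
            raise ValueError(f"maxItems must be between {lo} and {hi}")
        run_input["maxItems"] = max_items

    # pageSize falls back to the schema default of 100.
    size = PAGE_SIZE_DEFAULT if page_size is None else page_size
    lo, hi = PAGE_SIZE_RANGE
    if not lo <= size <= hi:
        raise ValueError(f"pageSize must be between {lo} and {hi}")
    run_input["pageSize"] = size
    return run_input


def summarize_run(response: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: pick runsResponseSchema fields needed to read results."""
    data = response.get("data", {})
    return {
        "runId": data.get("id"),
        "status": data.get("status"),
        "datasetId": data.get("defaultDatasetId"),
        "usageTotalUsd": data.get("usageTotalUsd", 0.0),
    }


if __name__ == "__main__":
    print(build_input("domain:example.com"))
    # -> {'query': 'domain:example.com', 'pageSize': 100}
    print(build_input("page.domain:paypal.com AND task.tags:phishing",
                      max_items=10, page_size=50))
```

Validating locally catches an out-of-range `pageSize` or `maxItems` before a paid run starts. Keeping `pageSize` at or below the default is the more conservative choice when you are on urlscan.io's free tier.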
