urlscan.io Threat Intelligence Scraper

Pricing

from $26.62 / 1,000 results

Search the urlscan.io public scan database with Lucene queries (domain, page.url, hash, IP, ASN, tag) and export scan metadata: page URL, IP, ASN, server, TLS, screenshot, redirect chain, country, brand, verdict.

Developer: ParseForge (maintained by Community)

๐Ÿ›ก๏ธ urlscan.io Threat Intelligence Scraper

Export urlscan.io scan results in seconds. Run Lucene-style queries across the public urlscan.io scan database and pull back domain, IP, ASN, TLS, brand, verdict, and screenshot metadata. No API key and no manual JSON parsing.

Last updated: 2026-05-13 · 31 fields per record · Phishing + malware feed · Any domain, IP, ASN, or tag

The urlscan.io Threat Intelligence Scraper queries the urlscan.io public search API with full Lucene syntax (domain:, page.url:, task.tags:phishing, page.asn:, brand.name:, verdicts.overall.malicious:true, plus AND, OR, NOT, wildcards, and date ranges) and returns one row per scan. Each row carries the page URL, apex domain, IP, ASN, server software, TLS issuer, redirect chain, country, page title, request count, brand attribution, and the malicious verdict score, plus links to the rendered screenshot and the full urlscan report.

Coverage spans the entire urlscan.io public corpus, which adds millions of new scans every week across phishing kits, brand impersonation, malware C2s, fast-flux infrastructure, and regular web pages. Every field maps directly to the upstream API so you can join scans to your own SIEM, takedown queue, or brand-protection workflow.

| Target Audience | Primary Use Cases |
| --- | --- |
| Threat intel teams, SOC analysts, brand-protection engineers, takedown vendors, anti-phishing researchers, OSINT investigators | Phishing kit discovery, brand impersonation monitoring, IP / ASN attribution, malware infrastructure mapping, screenshot enrichment, indicator-of-compromise hunting |

What the urlscan.io Scraper does

Five intel workflows in one Actor:

  • Phishing discovery. Pull every scan tagged phishing for a brand or apex domain.
  • Brand impersonation monitoring. Watch brand.name:<your-brand> across the global scan feed.
  • Infrastructure attribution. Pivot on page.ip:, page.asn:, or page.server: to map hosting clusters.
  • Redirect-chain analysis. Trace landing-page redirects and final URLs across recent scans.
  • Visual enrichment. Every record links to a public urlscan screenshot and the full result report.

Each scan record carries scan metadata (UUID, visibility, method, time, tags), page facts (URL, domain, apex, country, IP, ASN, server, status, title, MIME), TLS context (issuer, valid days), traffic stats (unique IPs, unique countries, request count, data length), brand attribution, and the urlscan verdict (score, malicious flag, categories), plus deep links to the screenshot and report page.

Why it matters: brand-protection and SOC teams burn hours stitching together phishing kit pivots from raw urlscan JSON. This Actor flattens the response into a spreadsheet-ready table so triage, takedown filings, and dashboards land in one query.
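
To make the flattening step concrete, here is a minimal Python sketch of how a nested search hit could be turned into one flat row. The nested paths in `FIELD_MAP` (for example `task.uuid` and `page.asn`) are assumptions about the upstream JSON layout, not a statement of how the Actor is implemented, and only a handful of the columns are shown. The screenshot and report link patterns are taken from the schema examples in this README.

```python
from typing import Any, Dict


def get_path(obj: Dict[str, Any], path: str, default: Any = None) -> Any:
    """Walk a dotted path such as 'page.asn' through nested dicts."""
    current: Any = obj
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Assumed mapping from flat column name to nested path in a search hit.
FIELD_MAP = {
    "uuid": "task.uuid",
    "task_url": "task.url",
    "task_time": "task.time",
    "task_tags": "task.tags",
    "page_url": "page.url",
    "page_domain": "page.domain",
    "page_ip": "page.ip",
    "page_asn": "page.asn",
    "page_country": "page.country",
}


def flatten_hit(hit: Dict[str, Any]) -> Dict[str, Any]:
    """Return one spreadsheet-style row; missing paths become None."""
    row = {column: get_path(hit, path) for column, path in FIELD_MAP.items()}
    uuid = row["uuid"]
    # Link patterns follow the schema examples shown in this README.
    row["screenshot"] = f"https://urlscan.io/screenshots/{uuid}.png" if uuid else None
    row["report_url"] = f"https://urlscan.io/result/{uuid}/" if uuid else None
    return row
```

Missing values are kept as `None` rather than dropped, so every row has the same columns and the export stays rectangular.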


Full Demo

Coming soon: a 3-minute walkthrough showing a phishing query, pivot to ASN, and Slack alert.


Input

| Input | Type | Default | Behavior |
| --- | --- | --- | --- |
| query | string | "domain:apify.com" | Lucene query. Required. Supports domain:, page.url:, page.ip:, page.asn:, task.tags:, brand.name:, verdicts.overall.malicious:true, hash:, filename:, plus AND, OR, NOT, wildcards, and date ranges. |
| maxItems | integer | 10 | Records to return. Free plan caps at 10, paid plan at 1,000,000. |
| pageSize | integer | 100 | Results per API request. Lower values are friendlier to free-tier rate limits. |

Example: up to 500 phishing-tagged scans of pages on paypal.com from the last seven days.

{
  "query": "page.domain:paypal.com AND task.tags:phishing AND date:>now-7d",
  "maxItems": 500
}

Example: up to 200 scans with a malicious verdict hosted on a specific ASN.

{
  "query": "page.asn:AS139341 AND verdicts.overall.malicious:true",
  "maxItems": 200,
  "pageSize": 100
}

โš ๏ธ Good to Know: urlscan.io rate-limits anonymous search and may return partial results for very broad queries. Narrow with date:>now-30d or an apex domain when running bulk pulls, and keep pageSize modest on the free tier.


Output

Each scan record carries 31 fields. Download the dataset as CSV, Excel, JSON, or XML.

Schema

| Field | Type | Example |
| --- | --- | --- |
| uuid | string | "019e2370-d463-72c7-a1ef-3f07c7db0e75" |
| task_url | string | "https://classai-jdssb5uo04.edgeone.dev/" |
| task_visibility | string | "public" |
| task_method | string | "api" |
| task_time | ISO 8601 | "2026-05-13T22:24:25.440Z" |
| task_tags | string[] | ["phishing","malicious"] |
| page_url | string | "https://classai-jdssb5uo04.edgeone.dev/" |
| page_domain | string | "classai-jdssb5uo04.edgeone.dev" |
| page_apex_domain | string | "edgeone.dev" |
| page_country | string | "SG" |
| page_server | string | "edgeone-pages" |
| page_ip | string | "43.174.247.29" |
| page_asn | string | "AS139341" |
| page_asn_name | string | "ACE-AS-AP ACE, SG" |
| page_ptr | string or null | null |
| page_status | string | "200" |
| page_tlsValidDays | number | 364 |
| page_tlsIssuer | string | "DigiCert Secure Site OV G2 TLS CN RSA4096 SHA256 2022 CA1" |
| page_redirected | string or null | null |
| page_title | string | "欢迎来到信息科技宇宙" |
| page_mime_type | string | "text/html" |
| page_language | string or null | null |
| domain_age_days | number | 1273 |
| unique_ips | number | 1 |
| unique_countries | number | 1 |
| request_count | number | 2 |
| data_length | number | 10882 |
| brand_name | string | "PayPal" |
| verdict_score | number | 100 |
| verdicts_overall_malicious | boolean | true |
| screenshot | string | "https://urlscan.io/screenshots/<uuid>.png" |
| report_url | string | "https://urlscan.io/result/<uuid>/" |
| scrapedAt | ISO 8601 | "2026-05-13T22:25:22.027Z" |
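
As an illustration of working with these columns, the sketch below counts malicious records per ASN from a list of exported rows (for instance the JSON download parsed into dicts). The function name `malicious_by_asn` and the `min_score` threshold are hypothetical; only the field names `verdicts_overall_malicious`, `verdict_score`, and `page_asn` come from the schema above.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def malicious_by_asn(records: Iterable[Dict[str, Any]], min_score: int = 0) -> Counter:
    """Count records flagged malicious, grouped by page_asn."""
    counts: Counter = Counter()
    for record in records:
        if not record.get("verdicts_overall_malicious"):
            continue
        # Treat a missing or null score as 0 so the comparison never fails.
        score = record.get("verdict_score") or 0
        if score < min_score:
            continue
        counts[record.get("page_asn") or "unknown"] += 1
    return counts


if __name__ == "__main__":
    rows = [
        {"page_asn": "AS139341", "verdict_score": 100, "verdicts_overall_malicious": True},
        {"page_asn": "AS139341", "verdict_score": 0, "verdicts_overall_malicious": False},
    ]
    print(malicious_by_asn(rows).most_common(5))
```

Sorting the result with `most_common()` gives a quick shortlist of hosting clusters worth a closer look.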


Why choose this Actor

  • Lucene-native search. Every urlscan search operator works as-is: domain:, page.ip:, task.tags:, brand.name:, verdicts.overall.malicious:true, date ranges, wildcards, boolean logic.
  • Public corpus. Searches the global pool of public scans contributed by the urlscan community and automated submitters.
  • Screenshot + report links. Every record points at the rendered PNG and the full urlscan report page for analyst review.
  • Brand & verdict attribution. Includes urlscan's own brand match, verdict score, and malicious flag where present.
  • Fast pagination. Server-side search_after cursor walks the full result set without timing out.
  • No API key required. Uses the public search endpoint. Plug it in and run.
  • Always fresh. Every run hits the live urlscan index.

The urlscan.io public corpus is one of the most cited threat-intel data sources in modern SOC tooling, takedown vendor pipelines, and brand-protection products.


How it compares to alternatives

| Approach | Cost | Coverage | Refresh | Filters | Setup |
| --- | --- | --- | --- | --- | --- |
| urlscan.io Scraper (this Actor) | $5 free credit, then pay-per-use | Public urlscan corpus | Live per run | Full Lucene syntax | 2 min |
| urlscan PRO subscription | $200+/month per seat | Public + private | Live | Full Lucene | Vendor onboarding |
| Build your own integration | Engineering time | Same | Same | Same | Days |
| Commercial brand-protection suite | $$$ | Curated | Hourly | Vendor-defined | Weeks |

Pick this Actor when you want urlscan firepower without the seat licenses or the parser code.


How to use

  1. Sign up. Create a free account with $5 credit (takes 2 minutes).
  2. Open the Actor. Go to the urlscan.io Threat Intelligence Scraper page on the Apify Store.
  3. Set the query. Try domain:yourbrand.com AND task.tags:phishing and set maxItems.
  4. Run it. Click Start and let the Actor walk the search index.
  5. Download. Grab results in the Dataset tab as CSV, Excel, JSON, or XML.

Total time from signup to a phishing feed export: 3-5 minutes. No coding required.


Business use cases

Brand Protection & Anti-Phishing

  • Daily brand.name:<yourbrand> pulls feeding a takedown queue
  • Phishing kit fingerprinting via hash: and filename:
  • Lookalike-domain monitoring with apex wildcards
  • Screenshot evidence packs for legal filings

๐Ÿ•ต๏ธ SOC & Threat Intel

  • IOC enrichment with IP, ASN, server, and verdict fields
  • Pivot from a single phish to the entire C2 cluster
  • Watchlists for high-risk ASNs and fast-flux ranges
  • Daily exports into MISP, OpenCTI, or Splunk

๐Ÿข Marketplace & Platform Trust

  • Detect impersonation of your sellers, creators, or merchants
  • Catch fake login pages targeting your customers
  • Monitor counterfeit storefronts and clone sites
  • Feed risk scores into payment and onboarding flows

Investigative Journalism & OSINT

  • Map infrastructure behind disinformation campaigns
  • Document phishing waves around elections or breaches
  • Pivot from one screenshot to a network of related scans
  • Build evidence dossiers with linkable urlscan report URLs

Automating urlscan.io Scraper

Control the scraper programmatically for scheduled runs and pipeline integrations:

  • Node.js. Install the apify-client NPM package.
  • Python. Use the apify-client PyPI package.
  • See the Apify API documentation for full details.

The Apify Schedules feature lets you trigger this Actor on any cron interval. Hourly phishing sweeps, daily brand watches, and weekly ASN audits keep your downstream SIEM, takedown vendor, or Slack channel in sync.
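
A programmatic run from Python might look like the sketch below. `ACTOR_ID` is a hypothetical placeholder (copy the real identifier from the Actor's page), and `build_input` and `run_actor` are illustrative helper names. The client calls shown (`ApifyClient`, `actor(...).call(run_input=...)`, `dataset(...).iterate_items()`) are from the apify-client PyPI package as I understand it; check the Apify API documentation for the version you install.

```python
from typing import Any, Dict, List

# Hypothetical placeholder; replace with the identifier shown on the Actor page.
ACTOR_ID = "<username>/<actor-name>"


def build_input(query: str, max_items: int = 10, page_size: int = 100) -> Dict[str, Any]:
    """Assemble the Actor input using the three fields from the Input table."""
    if not query.strip():
        raise ValueError("query is required")
    return {"query": query, "maxItems": max_items, "pageSize": page_size}


def run_actor(token: str, run_input: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Start a run, wait for it to finish, and return the dataset items."""
    # Imported lazily so build_input works without the package installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if run is None:
        raise RuntimeError("the Actor run did not return a result")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


if __name__ == "__main__":
    import os

    rows = run_actor(
        os.environ["APIFY_TOKEN"],
        build_input("domain:apify.com", max_items=10),
    )
    print(f"fetched {len(rows)} records")
```

Reading the token from an environment variable keeps it out of source control. `APIFY_TOKEN` is simply the variable name chosen for this sketch.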


Beyond business use cases

Threat intel data feeds far more than commercial SOCs. The same structured records support research, civic transparency, and personal security projects.

Research and academia

  • Phishing-kit ecosystem studies and longitudinal analyses
  • TLS issuer and ASN reputation papers
  • Reproducible datasets for security ML classifiers
  • Coursework on indicator-of-compromise pivoting

Personal and creative

  • Hobbyist scam-spotting blogs and Mastodon feeds
  • Personal early-warning systems for your own apex domain
  • Visualizations of phishing campaigns over time
  • Portfolio dashboards for security analysts

๐Ÿค Non-profit and civic

  • Tracking scams that target vulnerable populations
  • NGO digital-safety operations for activists and journalists
  • Civic transparency on hosting providers harboring abuse
  • Election integrity monitoring of fake-vote sites

Experimentation

  • Train phishing-classifier ML models on real labels
  • Benchmark detection engines against fresh scans
  • Prototype agent workflows that triage IOCs end-to-end
  • Stress-test takedown automations with live feeds


โ“ Frequently Asked Questions

๐Ÿงฉ How does it work?

Drop a Lucene query into the input form, click Start, and the Actor walks the urlscan.io public search API with a cursor-based pager. Each scan is flattened into 31 columns covering page, network, TLS, brand, and verdict data, plus links to the screenshot and the full report.

๐Ÿ” What query syntax can I use?

Anything that works in urlscan's own search bar. Common fields: domain:, page.url:, page.domain:, page.ip:, page.asn:, page.country:, task.tags:, brand.name:, verdicts.overall.malicious:true, hash:, filename:, plus AND, OR, NOT, wildcards, and date ranges like date:>now-7d.

๐Ÿ“ How accurate is the data?

Every field maps to a urlscan.io public API response. urlscan is widely cited across SOC and brand-protection tooling, though tags and verdicts are crowd plus heuristic in origin. Treat verdicts as one input among several when making takedown decisions.
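
To illustrate treating the verdict as one signal among several, here is a small triage sketch. The rule set, the thresholds, and the function name `triage_priority` are all hypothetical choices made for this example; only the field names come from the output schema.

```python
from typing import Any, Dict, Iterable


def triage_priority(
    record: Dict[str, Any],
    watched_brands: Iterable[str],
    young_domain_days: int = 30,
) -> str:
    """Combine independent signals into a coarse priority label."""
    signals = 0
    if record.get("verdicts_overall_malicious"):
        signals += 1
    if "phishing" in (record.get("task_tags") or []):
        signals += 1
    brand = (record.get("brand_name") or "").lower()
    if brand and brand in {name.lower() for name in watched_brands}:
        signals += 1
    age = record.get("domain_age_days")
    if age is not None and age <= young_domain_days:
        signals += 1
    # Hypothetical cut-offs: three or more signals is "high", exactly two is "medium".
    if signals >= 3:
        return "high"
    if signals == 2:
        return "medium"
    return "low"
```

A record flagged malicious with no corroborating tag, brand match, or young domain lands in "low", so a single upstream verdict never escalates on its own.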

๐Ÿ” How fresh is the data?

Every run hits the live urlscan index, so results reflect scans submitted up to the moment the run started.

Do I need a urlscan API key?

No. This Actor uses the public search endpoint. For very high-volume use cases consider a urlscan PRO subscription on top of this Actor.

โฐ Can I schedule daily phishing sweeps?

Yes. Use Apify Schedules to trigger the Actor on any cron interval and pipe results into Slack, email, a webhook, or your warehouse.

๐Ÿ–ผ๏ธ Are screenshots included?

Yes. Every record includes a public screenshot URL and the full urlscan report URL.

urlscan.io publishes scan results publicly. Use the data in line with urlscan's terms and your local regulations. For takedowns and legal filings, follow standard evidence-handling practices.

Do I need a paid Apify plan?

No. The free plan covers small runs (up to 10 records). A paid plan unlocks higher limits, scheduling, and concurrency.

What if I need help?

Reach out via the contact form below to request a custom intel pipeline, a private workflow, or a feature.


Integrate with any app

urlscan.io Threat Intelligence Scraper connects to any cloud service via Apify integrations:

  • Make - Automate multi-step phishing workflows
  • Zapier - Connect with 5,000+ apps
  • Slack - Get phishing alerts in your channels
  • Airbyte - Pipe scans into your warehouse
  • GitHub - Trigger runs from commits or issues
  • Google Drive - Export datasets straight to Sheets

You can also use webhooks to trigger downstream actions when a run finishes. Push new phishing scans into your takedown queue or alert your SOC in Slack.
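
On the receiving side, a webhook handler mostly needs to pull the dataset ID out of the request body. The sketch below assumes Apify's default webhook payload, with an `eventType` field and a `resource` object holding the run's `defaultDatasetId`; if you customise the payload template these keys will differ. The function names are hypothetical.

```python
import json
from typing import Optional


def dataset_id_from_webhook(body: str) -> Optional[str]:
    """Return the run's dataset ID for successful runs, otherwise None."""
    payload = json.loads(body)
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    resource = payload.get("resource") or {}
    return resource.get("defaultDatasetId")


def items_url(dataset_id: str) -> str:
    """Build the Apify API URL for downloading the dataset items as JSON."""
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?format=json"


if __name__ == "__main__":
    sample = json.dumps(
        {"eventType": "ACTOR.RUN.SUCCEEDED", "resource": {"defaultDatasetId": "abc123"}}
    )
    dataset_id = dataset_id_from_webhook(sample)
    if dataset_id:
        print(items_url(dataset_id))
```

Ignoring every event type except a successful run keeps failed or aborted runs from pushing empty batches into a takedown queue.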


Pro Tip: browse the complete ParseForge collection for more reference-data and intel scrapers.


Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.


Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by urlscan.io GmbH or any of its partners. All trademarks mentioned are the property of their respective owners. Only publicly available scan data from the urlscan.io public search API is collected.