Have I Been Pwned Breaches Catalog Scraper

Pricing: from $3.00 / 1,000 results

Pull the entire Have I Been Pwned breach catalog with company logos, breach dates, account counts, and the categories of data exposed like email addresses, passwords, and IP addresses. Filter by domain or fetch one breach by name. Built for breach awareness and security research.

Developer: ParseForge (maintained by Community)

Actor stats: 0 bookmarks · 2 total users · 1 monthly active user · last modified 2 days ago

🛡 Have I Been Pwned Breaches Catalog Scraper

🚀 Export the full Have I Been Pwned breach catalog in one run. Pull all 1,001+ documented data breaches with company logos, breach dates, account counts, and the exact categories of data exposed.

🕒 Last updated: 2026-06-04 · 📊 18 fields per record · 1,001+ breaches in the catalog · keyless public breach data

Have I Been Pwned (HIBP) is the most widely trusted public registry of known data breaches, maintained by security researcher Troy Hunt. This Actor reads the public breach catalog and turns it into clean, structured records you can analyze, monitor, and feed into your own security tooling. It is built for defensive security work and breach awareness, not for looking up individual people.

Coverage: every breach in the public HIBP catalog (over 1,001 entries at last refresh), including the company logo, the date the breach occurred, when it was added to HIBP, the number of accounts affected, a full description, and the list of data classes exposed such as email addresses, passwords, IP addresses, and phone numbers. You can list the entire catalog, filter by a single domain, or pull one breach by name.

| 🎯 Target Audience | 💡 Primary Use Cases |
| --- | --- |
| Security analysts and blue teams | Track new and historical breaches affecting your domains |
| Threat intelligence researchers | Build a structured breach dataset for analysis |
| Compliance and risk teams | Gather evidence for vendor and third-party risk reviews |
| Developers and data engineers | Power dashboards, alerts, and awareness tooling |

📋 What the HIBP Breaches Catalog Scraper does

This Actor connects to the public Have I Been Pwned breach catalog and returns structured breach records. It supports three modes:

  • All breaches: lists every breach in the catalog.
  • Breaches by domain: returns only the breaches tied to a domain you provide, for example adobe.com.
  • Single breach by name: fetches one breach by its HIBP name, for example Adobe or LinkedIn.

Each record leads with the breached company logo and includes the breach date, the number of accounts affected, a description, and the categories of data exposed. The Actor only reads the breach catalog, which is public and keyless. It never looks up individual email addresses or accounts.

🎬 Full Demo (🚧 Coming soon)

⚙️ Input

| Field | Type | Description |
| --- | --- | --- |
| mode | select | "all" lists every breach, "domain" filters by a domain, "breach" fetches a single breach by name. Default: "all". |
| domain | string | Used with domain mode. A domain to filter by, for example adobe.com. |
| breachName | string | Used with breach mode. The exact HIBP breach name, for example Adobe. |
| maxItems | integer | Maximum number of records to return. Free users are capped at 10. |

Example 1. List the catalog (default).

{
  "mode": "all",
  "maxItems": 50
}

Example 2. Breaches for a single domain.

{
  "mode": "domain",
  "domain": "adobe.com"
}

⚠️ Good to Know: the domain field is only applied in domain mode and the breachName field is only applied in breach mode. A small number of catalog entries (around 5 percent) have no associated domain, so the domain field can be empty for those records. This is expected and reflects the source data.
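The field rules above can be checked before a run starts. The following is a minimal Python sketch, not the Actor's own code: the function name normalize_input and the choice to drop fields that the selected mode ignores are assumptions made for illustration.

```python
from typing import Any

VALID_MODES = {"all", "domain", "breach"}


def normalize_input(raw: dict[str, Any]) -> dict[str, Any]:
    """Return a cleaned copy of an Actor input dict, or raise ValueError.

    Hypothetical helper: it mirrors the rules described in this README,
    not the Actor's internal validation.
    """
    mode = raw.get("mode", "all")  # the README states that "all" is the default
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")

    cleaned: dict[str, Any] = {"mode": mode}
    if mode == "domain":
        # Domain names are case-insensitive, so normalize to lowercase.
        domain = str(raw.get("domain") or "").strip().lower()
        if not domain:
            raise ValueError("domain mode needs a non-empty 'domain'")
        cleaned["domain"] = domain
    elif mode == "breach":
        name = str(raw.get("breachName") or "").strip()
        if not name:
            raise ValueError("breach mode needs a non-empty 'breachName'")
        cleaned["breachName"] = name

    if "maxItems" in raw:
        max_items = int(raw["maxItems"])
        if max_items < 1:
            raise ValueError("maxItems must be a positive integer")
        cleaned["maxItems"] = max_items
    return cleaned
```

With this sketch, an input such as {"mode": "all", "domain": "adobe.com"} is reduced to {"mode": "all"}, which makes it obvious that the domain would not have been applied.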

📊 Output

| Field | Description |
| --- | --- |
| 🖼 imageUrl | Full URL of the breached company logo |
| 📌 title | Display title of the breach |
| 🏷 name | Stable HIBP breach name used for single lookups |
| 🌐 domain | Primary domain associated with the breach (can be empty) |
| 📅 breachDate | Date the breach is believed to have occurred |
| addedDate | Date the breach was added to HIBP |
| ✏️ modifiedDate | Date the breach record was last modified |
| 👥 pwnCount | Number of accounts affected |
| 📝 description | Full description of the breach |
| 🗂 dataClasses | Array of data categories exposed, such as email addresses and passwords |
| isVerified | Whether the breach has been verified |
| 🧪 isFabricated | Whether the breach is considered fabricated |
| 🔒 isSensitive | Whether the breach is flagged sensitive |
| 📦 isRetired | Whether the breach has been retired |
| 📨 isSpamList | Whether the entry is a spam list |
| 🦠 isMalware | Whether the entry relates to malware |
| 🆓 isSubscriptionFree | Whether the breach is searchable without a subscription |
| 🕒 scrapedAt | Timestamp when the record was collected |
| error | Error message if a record failed, otherwise null |

Sample records from a live run (the description and most of the boolean flags are omitted here):

{
  "imageUrl": "https://logos.haveibeenpwned.com/000webhost.png",
  "title": "000webhost",
  "name": "000webhost",
  "domain": "000webhost.com",
  "breachDate": "2015-03-01",
  "addedDate": "2015-10-26T23:35:45Z",
  "modifiedDate": "2017-12-10T21:44:27Z",
  "pwnCount": 14936670,
  "dataClasses": ["Email addresses", "IP addresses", "Names", "Passwords"],
  "isVerified": true,
  "scrapedAt": "2026-06-04T19:46:11.282Z",
  "error": null
}
{
  "imageUrl": "https://logos.haveibeenpwned.com/123RF.png",
  "title": "123RF",
  "name": "123RF",
  "domain": "123rf.com",
  "breachDate": "2020-03-22",
  "addedDate": "2020-11-15T00:59:50Z",
  "modifiedDate": "2020-11-15T01:07:10Z",
  "pwnCount": 8661578,
  "dataClasses": ["Email addresses", "IP addresses", "Names", "Passwords", "Phone numbers", "Physical addresses", "Usernames"],
  "isVerified": true,
  "scrapedAt": "2026-06-04T19:46:11.427Z",
  "error": null
}
{
  "imageUrl": "https://logos.haveibeenpwned.com/126.png",
  "title": "126",
  "name": "126",
  "domain": "126.com",
  "breachDate": "2012-01-01",
  "addedDate": "2016-10-08T07:46:05Z",
  "modifiedDate": "2016-10-08T07:46:05Z",
  "pwnCount": 6414191,
  "dataClasses": ["Email addresses", "Passwords"],
  "isVerified": false,
  "scrapedAt": "2026-06-04T19:46:11.462Z",
  "error": null
}
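Records like the ones above are plain JSON, so dates arrive as strings. As a rough illustration of how a consumer might turn them into typed values, here is a Python sketch. The Breach class and parse_breach function are hypothetical names and only cover a subset of the output fields.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Any, Optional


@dataclass(frozen=True)
class Breach:
    """Hypothetical typed view over a subset of the output fields."""
    name: str
    domain: Optional[str]
    breach_date: date
    added_at: datetime
    pwn_count: int
    data_classes: tuple[str, ...]


def parse_breach(record: dict[str, Any]) -> Breach:
    return Breach(
        name=record["name"],
        # The README notes that domain can be empty; map that to None.
        domain=record.get("domain") or None,
        breach_date=date.fromisoformat(record["breachDate"]),
        # Swap the trailing "Z" for an explicit offset so that
        # fromisoformat also accepts it on Python versions before 3.11.
        added_at=datetime.fromisoformat(record["addedDate"].replace("Z", "+00:00")),
        pwn_count=int(record["pwnCount"]),
        data_classes=tuple(record.get("dataClasses", [])),
    )
```

Keeping breach_date as a date and added_at as a timezone-aware datetime makes later comparisons, such as "added in the last 30 days", less error-prone than comparing strings.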

✨ Why choose this Actor

  • Complete catalog. Pulls the entire public HIBP breach list, over 1,001 entries, in a single run.
  • Logo first. Every record leads with the breached company logo for clean visual dashboards.
  • Three query modes. List everything, filter by domain, or fetch one breach by name.
  • Rich data classes. See exactly what was exposed in each breach, from passwords to physical addresses.
  • Defensive by design. Reads only the public breach catalog. No personal account lookups.

📈 How it compares to alternatives

| Approach | Setup | Structured output | Logos included | Domain filter |
| --- | --- | --- | --- | --- |
| This Actor | None, just run | Yes | Yes | Yes |
| Manual browsing of the HIBP website | High, copy by hand | No | No | Limited |
| Writing your own script | Medium, code and maintain | Depends on you | Depends on you | Depends on you |

🚀 How to use

  1. Create a free Apify account.
  2. Open the HIBP Breaches Catalog Scraper.
  3. Pick a mode. Leave it on all to list the entire catalog, or choose domain or breach and fill the matching field.
  4. Set maxItems if you want fewer records, then click Start.
  5. Download your results or connect them to another app through the Apify API and integrations.
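The same steps can be scripted instead of clicked through. The sketch below assumes the official apify-client Python package and an API token in an APIFY_TOKEN environment variable. The Actor ID is a placeholder because this page does not state it, and fetch_breaches is a hypothetical helper name.

```python
from typing import Any


def fetch_breaches(client: Any, actor_id: str, run_input: dict[str, Any]) -> list[dict[str, Any]]:
    """Start the Actor, wait for it to finish, and return its dataset items.

    `client` is expected to behave like apify_client.ApifyClient.
    """
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return a result")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


def main() -> None:
    import os

    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    # Placeholder ID: copy the real "<username>/<actor-name>" from the Actor page.
    items = fetch_breaches(
        client,
        "<username>/<actor-name>",
        {"mode": "domain", "domain": "adobe.com"},
    )
    for item in items:
        print(item["name"], item["pwnCount"])
```

Calling main() runs one domain query and prints the breach name and account count for each record. Passing the client in as a parameter keeps fetch_breaches easy to exercise with a stub.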

💼 Business use cases

Third party and vendor risk

| Goal | How this helps |
| --- | --- |
| Screen vendors for known breaches | Filter the catalog by a vendor domain and review what was exposed |
| Document risk evidence | Keep a structured record of breach dates, account counts, and data classes |

Security operations and threat intel

| Goal | How this helps |
| --- | --- |
| Track the breach landscape | Pull the full catalog on a schedule and watch for new additions |
| Prioritize response | Use pwnCount and dataClasses to gauge severity |
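One possible way to gauge severity from pwnCount and dataClasses is sketched below in Python. The set of "high impact" data classes and the ordering rule are assumptions chosen for illustration, not a standard and not part of the Actor.

```python
from typing import Any

# Hypothetical choice of which exposed data classes count as high impact.
HIGH_IMPACT = {"Passwords", "Physical addresses", "Phone numbers"}


def triage_key(record: dict[str, Any]) -> tuple[int, int]:
    """Rank first by number of high-impact classes, then by accounts affected."""
    exposed = set(record.get("dataClasses", []))
    return (len(exposed & HIGH_IMPACT), int(record.get("pwnCount", 0)))


def prioritize(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return records ordered from most to least severe under triage_key."""
    return sorted(records, key=triage_key, reverse=True)
```

Applied to the three sample records in the Output section, this ordering puts 123RF first (three high-impact classes), followed by 000webhost and 126, which each expose one high-impact class and are then ordered by pwnCount.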

Awareness and training

| Goal | How this helps |
| --- | --- |
| Build awareness content | Pull real breach examples with logos and descriptions |
| Brief leadership | Summarize recent breaches affecting your sector |

Research and reporting

| Goal | How this helps |
| --- | --- |
| Study breach trends | Analyze breach dates and exposed data classes over time |
| Publish findings | Build datasets from a trusted public source |

🔌 Automating HIBP Breaches Catalog Scraper

Connect the Actor output to the tools your team already uses:

  • Make and Zapier to trigger workflows when a run finishes.
  • Slack to post a summary of new breaches to a security channel.
  • Airbyte to load breach records into a warehouse.
  • GitHub to commit dataset snapshots for versioned tracking.
  • Google Drive to archive run outputs for your records.

🌟 Beyond business use cases

  • Research. Academics and journalists can study how breaches evolve over time using a trusted public source.
  • Personal projects. Build a personal dashboard that tracks breaches affecting the brands you use.
  • Non-profit. Community defenders and digital rights groups can monitor the breach landscape at no infrastructure cost.
  • Experimentation. Practice data engineering and visualization with a clean, real world dataset.

🤖 Ask an AI assistant

Drop your dataset into your favorite assistant and ask it to summarize, cluster, or chart the results.

❓ Frequently Asked Questions

Does this Actor look up individual email addresses or passwords? No. It reads only the public breach catalog, which describes breaches at the company level. It never queries individual accounts and needs no API key.

Do I need a Have I Been Pwned API key? No. The breach catalog endpoints used here are public and keyless. Only per account lookups, which this Actor does not perform, require a key.

How many breaches are in the catalog? At the last refresh there were over 1,001 breaches. The number grows as new breaches are documented.

Can I filter to a specific company or domain? Yes. Use domain mode and provide a domain such as adobe.com to return only the breaches tied to that domain.

Can I fetch just one breach? Yes. Use breach mode and provide the exact HIBP breach name, for example Adobe or LinkedIn.

Why is the domain field sometimes empty? A small share of catalog entries have no associated domain in the source data. The Actor keeps the field and leaves it empty for those records rather than dropping them.

What are data classes? They are the categories of information exposed in a breach, such as email addresses, passwords, IP addresses, names, and phone numbers. Each record includes them as an array.

How current is the data? The Actor reads the live catalog at run time, so each run reflects the current state of the public HIBP database.

Is this affiliated with Have I Been Pwned? No. This is an independent tool that reads publicly available catalog data. It is not affiliated with or endorsed by Have I Been Pwned.

Can free users run it? Yes. Free runs are limited to 10 records as a preview. Paid plans raise the limit substantially.

What if a breach name is not found? The Actor returns a single record with an error message explaining that no breach was found for that name.
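Because a failed lookup still produces a record, downstream code may want to separate usable records from error records before processing. A minimal Python sketch, assuming only that the error field is null (None) on success as described in the Output table; split_results is a hypothetical helper name:

```python
from typing import Any


def split_results(items: list[dict[str, Any]]) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Split dataset items into (successful records, error records)."""
    ok = [item for item in items if not item.get("error")]
    failed = [item for item in items if item.get("error")]
    return ok, failed
```

A caller can then log or alert on the failed list and pass only the ok list to reports or dashboards.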

Can I schedule recurring runs? Yes. Use the Apify scheduler to run the Actor on an interval and track changes to the catalog over time.
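To track changes between two scheduled runs, the datasets can be compared by the name field, which the Output table describes as the stable HIBP breach name. The Python sketch below is one way to do that. The function names are hypothetical, and treating a different modifiedDate as "changed" is an assumption about how edits show up in the data.

```python
from typing import Any

Record = dict[str, Any]


def new_breaches(previous: list[Record], current: list[Record]) -> list[Record]:
    """Records whose name appears in the current run but not in the previous one."""
    seen = {record["name"] for record in previous}
    return [record for record in current if record["name"] not in seen]


def changed_breaches(previous: list[Record], current: list[Record]) -> list[Record]:
    """Records present in both runs whose modifiedDate differs."""
    old = {record["name"]: record.get("modifiedDate") for record in previous}
    return [
        record
        for record in current
        if record["name"] in old and record.get("modifiedDate") != old[record["name"]]
    ]
```

Feeding the output of new_breaches into a Slack or email notification gives a simple "what was added since the last run" alert.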

🔌 Integrate with any app

Every run produces a structured dataset you can pull through the Apify API, webhooks, or the built in integrations. Push breach records straight into spreadsheets, databases, BI tools, or your own security platform.

💡 Pro Tip: browse the complete ParseForge collection.

🆘 Need Help? Open our contact form

⚠️ Disclaimer: independent tool, not affiliated with Have I Been Pwned. Only publicly available breach catalog data is collected. This Actor is intended for defensive security, breach awareness, and research, and does not perform lookups of individual people.