Cybersecurity Intelligence Directory Scraper
Pricing: from $0.001 / actor start
Scrapes the Cybersecurity Intelligence Supplier Directory (cybersecurityintelligence.com) for company profiles including name, website, description, location, phone, and category tags.
Developer: Jon Froemming
Apify Actor that scrapes the Cybersecurity Intelligence Supplier Directory (cybersecurityintelligence.com) using Python, Crawlee, and Playwright (Chromium).
It collects company listings from category pages, optionally follows each company’s detail page for richer fields (website, phone, tags), and writes structured rows to the default dataset.
This file is linked from .actor/actor.json as the Actor readme shown on the Apify platform.
Table of contents
- What it scrapes
- Input
- Output
- Deduplication
- How a run completes on Apify
- Local development
- Deploy
- Legal and etiquette
- Project layout
What it scrapes
| Phase | Description |
|---|---|
| Categories | If you do not pass `categories`, the Actor opens `browse_categories.php`, collects every supplier-directory category link, and enqueues them. Blog `/category/` links are ignored so only real listing URLs are used. |
| Listings | For each category URL, it parses `.listingsWrapper` blocks (name, short description, address snippet) and follows pagination via `ul.pagination`. |
| Details (optional) | When `scrapeDetailPages` is `true`, each company link is enqueued as a detail request; the handler extracts full profile data and pushes one dataset item per company. |
Country filter: If `country` is set, a location segment is appended to category URLs (for example `US` → `location/usa/`) using a small built-in code → slug map. Leave `country` empty to scrape all locations.
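The country filter described above could be sketched like this. The map contents and function name are placeholders for illustration, not the Actor's actual identifiers:

```python
# Hypothetical sketch of the code -> slug country filter. The real map in the
# Actor's source is larger and its entries may differ.
COUNTRY_SLUGS = {"US": "usa", "UK": "uk", "DE": "germany"}

def with_country(category_url: str, country: str) -> str:
    """Append a location segment to a category URL, or return it unchanged."""
    slug = COUNTRY_SLUGS.get(country.strip().upper()) if country else None
    if not slug:
        # Empty or unrecognized code -> scrape all locations.
        return category_url
    return f"{category_url.rstrip('/')}/location/{slug}/"
```

An unknown or empty code falls through to the unfiltered URL, matching the "leave empty for worldwide" behavior.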
Input
Configure the Actor in the Apify console or via JSON input. All fields are optional unless noted.
| Field | Type | Default | Description |
|---|---|---|---|
| `categories` | string[] | `[]` | Category slugs only (e.g. `cloud-security`, `managed-security-services`). Empty = scrape all categories from the browse index. |
| `country` | string | `""` | Filter by country code (`US`, `UK`, `DE`, …) or leave empty for worldwide. |
| `maxPagesPerCategory` | integer | `0` | Cap on listing pages per category. `0` = unlimited (follow "next" until none). Maximum allowed by the schema: 500. |
| `scrapeDetailPages` | boolean | `true` | `true`: visit each company detail page (website, phone, tags). `false`: only data visible on listing cards (faster, fewer fields). |
| `maxConcurrency` | integer | `3` | Playwright concurrency (1–10). Raise carefully on Apify; higher values increase load on the target site and memory use. |
Example input (full directory, details on)
```json
{
  "categories": [],
  "country": "",
  "maxPagesPerCategory": 0,
  "scrapeDetailPages": true,
  "maxConcurrency": 3
}
```
Example input (specific categories, US only)
```json
{
  "categories": ["cloud-security", "managed-security-services"],
  "country": "US",
  "maxPagesPerCategory": 0,
  "scrapeDetailPages": true,
  "maxConcurrency": 2
}
```
Output
Results are stored in the default dataset (see the Actor Output tab in Apify for the dataset items link).
Each item is one company. Typical fields:
| Field | Description |
|---|---|
| `company_name` | Display name |
| `website` | Company site URL, when found on the detail page |
| `domain` | Hostname derived from `website` |
| `description` | Longer text from the detail page (truncated in code for safety) |
| `location` | Address / region text |
| `phone` | Phone number, if present (from `tel:` links) |
| `industry_tags` | Comma-separated category/tag strings from the page |
| `source_url` | Page URL used for this row |
| `directory_source` | Constant label identifying this directory |
| `date_scraped` | UTC date (`YYYY-MM-DD`) |
Field presence depends on `scrapeDetailPages` and what the site exposes for each company.
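For orientation, one dataset item might look like the example below. All values are made up for illustration; actual values depend on the scraped page:

```json
{
  "company_name": "Acme Security",
  "website": "https://acme-security.example",
  "domain": "acme-security.example",
  "description": "Acme Security provides managed detection and response services...",
  "location": "Austin, Texas, USA",
  "phone": "+1 512 555 0100",
  "industry_tags": "Cloud Security, Managed Security Services",
  "source_url": "https://www.cybersecurityintelligence.com/...",
  "directory_source": "cybersecurityintelligence.com",
  "date_scraped": "2025-01-01"
}
```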
Deduplication
The directory lists the same supplier profile URL under multiple categories. Without dedupe you would get repeated rows for one company.
| Mode | Behavior |
|---|---|
| `scrapeDetailPages: true` | Immediately before each `push_data`, the Actor checks a normalized profile URL (scheme, host, path; UTM query params stripped). The first successful extraction for that URL is written to the dataset; later handler invocations for the same URL skip output and log `Skip duplicate profile output`. |
| `scrapeDetailPages: false` | Listing-only rows apply the same rule to the company link URL from the category page, so each company appears at most once per run. |
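The reserve-then-release gate around `push_data` could be sketched as follows. The set and function names are placeholders, not the Actor's actual identifiers:

```python
# Hypothetical sketch of the in-memory output gate. State lives only for the
# current run, matching the dedupe behavior described in this section.
emitted: set[str] = set()

def try_reserve(key: str) -> bool:
    """Reserve a normalized profile URL; return False if already emitted."""
    if key in emitted:
        return False
    emitted.add(key)
    return True

def release(key: str) -> None:
    """Undo a reservation (e.g. when push_data fails) so a retry can emit."""
    emitted.discard(key)
```

In a handler, `try_reserve` would run just before `push_data`, with `release` in the failure path so a Crawlee retry can still produce the row.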
Details:
- Normalization uses the same logic as the scraper's `clean_url` helper (e.g. trailing slashes and `utm_*` query parameters removed).
- Dedupe state is kept in memory for the current run only. A new Apify run starts with an empty set, so the default dataset for that run can contain one row per company again (expected for a fresh dataset).
- Reservations are released if `push_data` fails, so Crawlee retries can still emit a row for that profile URL.
- The Crawlee request queue may still drop duplicate detail URLs by URL key; the output gate is an extra guarantee when listing cards or retries could otherwise double-emit.
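The normalization rule described above could look roughly like this. It is a hypothetical reimplementation of the `clean_url` helper, not the Actor's actual code:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def clean_url(url: str) -> str:
    """Normalize a profile URL: lowercase host, strip trailing slash and utm_* params."""
    parts = urlsplit(url)
    query = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if not k.lower().startswith("utm_")
    ]
    path = parts.path.rstrip("/") or "/"
    # Drop the fragment entirely; it never identifies a distinct profile.
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, urlencode(query), ""))
```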
How a run completes on Apify
The entrypoint is `python -m my_actor`, which calls `crawler.run()` once. Crawlee drains the request queue for that run: categories → listing pages → detail pages (if enabled). You do not need a shell loop on the platform for a full crawl.
For local development, `scripts/run_until_done.sh` can repeat `apify run` if you want to retry until the local queue reports zero pending requests (optional; see the script's header comments).
Local development
Requirements: Python 3.x, the Apify CLI, and Docker (so `apify run` uses the same image as production).
$ cd cybersecurity-intelligence-scraper
$ apify login
$ apify run
Optional full local loop:
$ ./scripts/run_until_done.sh
Environment variables used by the helper script: `MAX_ATTEMPTS` (default 200), `SLEEP_SECONDS` (default 5).
Deploy
From this directory:
$ apify push
Ensure `.actor/actor.json`, `input_schema.json`, `output_schema.json`, and `dataset_schema.json` stay valid; Apify validates them at build time.
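For orientation, a minimal `.actor/actor.json` might have the shape sketched below. The values here are placeholders; consult the Apify Actor specification for the authoritative field list:

```json
{
  "actorSpecification": 1,
  "name": "cybersecurity-intelligence-scraper",
  "version": "0.1",
  "input": "./input_schema.json",
  "storages": {
    "dataset": "./dataset_schema.json"
  }
}
```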
Legal and etiquette
Only run this Actor in compliance with the target site’s terms of service, robots.txt, and applicable law. Use reasonable concurrency; the defaults are conservative.
Project layout
| Path | Role |
|---|---|
| `my_actor/` | `main.py` (crawler setup, start URLs), `routes.py` (handlers) |
| `.actor/` | Actor manifest, input/output/dataset schemas |
| `Dockerfile` | `apify/actor-python-playwright` base, `CMD python -m my_actor` |
| `scripts/run_until_done.sh` | Optional local multi-attempt runner |