Webinar Event Discovery Scraper
Pricing
Pay per event
Discover public webinars, workshops, demos, and B2B event landing pages from domains or URLs with dates, speakers, CTAs, and evidence.
Developer
Stas Persiianenko
Maintained by Community
Discover public webinars, workshops, virtual events, product demos, summits, and on-demand event landing pages from company domains or known event URLs.
The actor crawls public B2B marketing pages, follows same-domain event links, extracts structured data when available, and saves normalized records with dates, CTAs, speakers, topics, evidence snippets, and confidence scores.
Use it when you need repeatable event intelligence for lead generation, competitive monitoring, partnership research, content calendars, or CRM enrichment.
What does Webinar Event Discovery Scraper do?
It turns a list of public websites into a clean dataset of webinar and event opportunities.
- Starts from supplied URLs or domains.
- Follows same-domain links that look like webinars, events, demos, workshops, summits, conferences, or training pages.
- Reads structured Event data from JSON-LD when websites publish it.
- Falls back to page titles, metadata, headings, body text, CTAs, and speaker blocks when structured data is missing.
- Normalizes dates and status hints.
- Finds registration URLs and CTA text.
- Adds evidence snippets and confidence scores so humans can review borderline pages quickly.
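The structured-data step above can be sketched with the Python standard library. This is a minimal illustration, not the actor's actual implementation: the names `JsonLdCollector` and `extract_events` are hypothetical, and the sketch matches only the exact type `Event`, so subtypes such as `BusinessEvent` would need extra handling.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def extract_events(html):
    """Return JSON-LD objects whose @type is, or includes, 'Event'."""
    collector = JsonLdCollector()
    collector.feed(html)
    events = []
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed JSON-LD is skipped rather than failing the page
        # A block may hold a single object, a list, or an @graph wrapper.
        if isinstance(data, list):
            candidates = data
        elif isinstance(data, dict):
            graph = data.get("@graph")
            candidates = graph if isinstance(graph, list) else [data]
        else:
            continue
        for item in candidates:
            if not isinstance(item, dict):
                continue
            types = item.get("@type", [])
            types = [types] if isinstance(types, str) else types
            if "Event" in types:
                events.append(item)
    return events
```

When `extract_events` returns an empty list, a crawler would fall back to the heuristic signals listed above (titles, headings, CTAs, and speaker blocks).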
Who is it for?
Demand generation teams
Track upcoming webinars across target accounts, partners, competitors, and industry publishers.
Sales development teams
Find timely campaign hooks and company initiatives to use in outbound messaging.
Competitive intelligence teams
Monitor competitor webinars, demos, workshops, and product education pages.
Partner marketing teams
Build shared event calendars from partner and ecosystem websites.
Content operations teams
Audit old and new event assets across many public resource centers.
Why use it?
Manual webinar research is repetitive, and events are easy to miss.
This actor helps you:
- Save hours of manual browsing.
- Find registration pages before events happen.
- Detect on-demand webinars that can become lead magnets.
- Standardize messy event pages into one schema.
- Feed CRM, spreadsheets, Slack alerts, or BI dashboards.
- Re-run the same sources every week or month.
What data can it extract?
| Field | Description |
|---|---|
| sourceUrl | Page where the event was found |
| eventTitle | Webinar or event title |
| eventType | Structured type or inferred event type |
| status | upcoming, on-demand, past, or unknown |
| startDate | ISO start date when detected |
| endDate | ISO end date when detected |
| startTime | Time portion from the start date |
| endTime | Time portion from the end date |
| timezone | Timezone hint when present |
| hostCompany | Organizer, publisher, site name, or domain |
| speakers | Speaker or presenter names detected on the page |
| description | Metadata or structured event description |
| agenda | Short agenda-like evidence line |
| topics | Topic keywords inferred from the page |
| registrationUrl | Best registration, watch, or CTA URL |
| ctaText | CTA anchor/button text |
| gatedFormProvider | Hints such as Marketo, HubSpot, ON24, Zoom, or Cvent |
| evidenceSnippets | Short snippets proving why the page matched |
| pageHash | Stable hash for change detection |
| discoveredAt | Extraction timestamp |
| confidence | 0-1 score for review prioritization |
How much does it cost to discover webinar event leads?
This actor uses pay-per-event pricing.
You pay a small start fee for each run and then a per-record event fee for saved webinar/event records.
This is useful for monitoring workflows because small tests stay cheap while larger recurring crawls scale with the amount of useful data returned.
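As a rough illustration of this pricing model, the cost of one run is a fixed start fee plus a fee for each saved record. The function name `estimate_run_cost` and the fee values below are placeholders chosen for the example, not the actor's real prices; check the actor's pricing details for actual figures.

```python
def estimate_run_cost(saved_records, start_fee_usd, per_record_fee_usd):
    """Estimate one run's cost: a fixed start fee plus a fee per saved record."""
    if saved_records < 0:
        raise ValueError("saved_records must be non-negative")
    return start_fee_usd + saved_records * per_record_fee_usd


# Placeholder fees for illustration only; they are NOT the actor's real prices.
small_test = estimate_run_cost(10, start_fee_usd=0.01, per_record_fee_usd=0.002)
weekly_crawl = estimate_run_cost(500, start_fee_usd=0.01, per_record_fee_usd=0.002)
print(f"small test: ${small_test:.2f}, weekly crawl: ${weekly_crawl:.2f}")
```

Because the per-record term dominates for large runs, capping the maximum number of event records is the most direct way to bound spend.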
How to use it
- Open the actor on Apify.
- Add public event hub URLs or company domains.
- Keep the default crawl caps for the first run.
- Optionally set include or exclude keywords.
- Run the actor.
- Export the dataset as CSV, JSON, Excel, or via API.
- Schedule it weekly or monthly for monitoring.
Input options
Start URLs
Use this for known event hubs, resource centers, webinar listing pages, or landing pages.
Examples:
https://www.salesforce.com/events/webinars/
https://www.hubspot.com/resources/webinars
https://example.com/events
Domains
Use this when you know the company but not the exact event URL.
Examples:
salesforce.com
hubspot.com
example.com
The actor starts at the homepage and follows event-like same-domain links.
Maximum pages per domain
Controls crawler sprawl.
For a quick test, use 5-15 pages per domain.
For a deeper crawl, use 50-100 pages per domain.
Maximum event records
Stops the run after this many saved records.
Use a low value for testing and a higher value for scheduled discovery.
Include keywords
The default keywords target webinars, events, workshops, demos, summits, conferences, training pages, and on-demand pages.
Add vertical terms like security, AI, data, or CRM when you want a tighter dataset.
Exclude keywords
Use this to drop irrelevant pages.
Common examples:
careers
investor
press release
sponsorship
Status filters
Keep upcoming events, on-demand assets, past events, or unknown pages.
Unknown is useful because many landing pages hide dates inside images or scripts.
Date filters
Use fromDate and toDate when you only want records with detected dates in a specific window.
Date filters apply only to records with a detected date. Pages without a detected date are kept.
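The filter behaviour described here can be sketched as follows. This is an illustration of the documented semantics under stated assumptions, not the actor's code: the helper names `parse_iso` and `apply_date_filter` are hypothetical, bounds are treated as inclusive, and a date-only bound such as `2026-07-31` is read as midnight UTC on that day.

```python
from datetime import datetime, timezone


def parse_iso(value):
    """Parse an ISO 8601 string such as '2026-07-10T17:00:00.000Z' into an aware datetime."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    # Date-only or offset-free values are assumed to be UTC.
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def apply_date_filter(records, from_date=None, to_date=None):
    """Drop records dated outside the window; keep records with no detected date."""
    lower = parse_iso(from_date) if from_date else None
    upper = parse_iso(to_date) if to_date else None
    kept = []
    for record in records:
        start = record.get("startDate")
        if not start:
            kept.append(record)  # no detected date: the filter does not apply
            continue
        when = parse_iso(start)
        if lower and when < lower:
            continue
        if upper and when > upper:
            continue
        kept.append(record)
    return kept
```

To exclude undated pages as well, combine the date window with a status filter that leaves out unknown records.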
Proxy option
Public B2B marketing pages usually work without a proxy.
Enable Apify Proxy only when a target blocks normal requests.
Output example
{
    "sourceUrl": "https://www.example.com/events/ai-webinar",
    "eventTitle": "AI Automation Webinar",
    "eventType": "Webinar",
    "status": "upcoming",
    "startDate": "2026-07-10T17:00:00.000Z",
    "endDate": null,
    "startTime": "17:00:00",
    "endTime": null,
    "timezone": "UTC",
    "hostCompany": "example.com",
    "speakers": ["Jane Smith"],
    "description": "Join our AI automation webinar for marketing teams.",
    "agenda": "Learn how to automate webinar follow-up workflows.",
    "topics": ["webinar", "ai", "automation", "marketing"],
    "registrationUrl": "https://www.example.com/events/ai-webinar/register",
    "ctaText": "Register now",
    "gatedFormProvider": "marketo",
    "evidenceSnippets": ["Register now for our AI automation webinar"],
    "pageHash": "abc123def4567890",
    "discoveredAt": "2026-06-25T00:00:00.000Z",
    "confidence": 0.9
}
Tips for better results
- Start with known webinar or event hub URLs when possible.
- Use domains for discovery, then save high-quality event hubs for recurring runs.
- Keep page caps conservative for large enterprise websites.
- Add industry keywords to reduce noisy general event pages.
- Keep unknown status enabled if you care about pages where dates are hidden in scripts.
- Use pageHash to detect changed pages over time.
Common workflows
Competitive webinar monitoring
Add competitor domains and schedule the actor weekly.
Export upcoming and on-demand records to a spreadsheet or CRM.
Account-based marketing research
Add target account domains.
Find recent webinars and demos that reveal strategic priorities.
Partner event calendar
Add partner domains.
Collect upcoming workshops and webinars in one dataset.
Lead magnet discovery
Keep on-demand status enabled.
Find gated webinars and resource pages for content intelligence.
Integrations
The dataset can be sent to:
- Google Sheets for campaign planning.
- Airtable for editorial calendars.
- HubSpot or Salesforce for account enrichment.
- Slack for new event alerts.
- Snowflake, BigQuery, or S3 for historical analysis.
- Zapier or Make for no-code automations.
API usage
Node.js
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

const run = await client.actor('automation-lab/webinar-event-discovery-scraper').call({
    startUrls: [{ url: 'https://www.salesforce.com/events/webinars/' }],
    maxPagesPerDomain: 20,
    maxResults: 100
});

console.log(run.defaultDatasetId);
Python
from apify_client import ApifyClient

client = ApifyClient("<APIFY_TOKEN>")

run = client.actor("automation-lab/webinar-event-discovery-scraper").call(run_input={
    "domains": ["salesforce.com"],
    "maxPagesPerDomain": 20,
    "maxResults": 100,
})

print(run["defaultDatasetId"])
cURL
curl -X POST "https://api.apify.com/v2/acts/automation-lab~webinar-event-discovery-scraper/runs?token=$APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"startUrls":[{"url":"https://www.salesforce.com/events/webinars/"}],"maxResults":100}'
MCP usage
Use the Apify MCP server with this actor in Claude Code, Claude Desktop, or compatible MCP clients.
MCP URL:
https://mcp.apify.com/?tools=automation-lab/webinar-event-discovery-scraper
Claude Code setup:
claude mcp add apify-webinar-events "https://mcp.apify.com/?tools=automation-lab/webinar-event-discovery-scraper"
Claude Desktop JSON config:
{
    "mcpServers": {
        "apify-webinar-events": {
            "url": "https://mcp.apify.com/?tools=automation-lab/webinar-event-discovery-scraper"
        }
    }
}
Example prompts:
- "Find upcoming webinars from these five competitor domains and summarize the topics."
- "Run the webinar event discovery scraper for this partner list and export registration URLs."
- "Compare this week's discovered webinar pages with last week's page hashes."
Scheduling
Schedule the actor weekly for competitor monitoring or monthly for broad account research.
Use a stable source list and compare pageHash, eventTitle, and startDate between runs.
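A minimal sketch of such a comparison, assuming each run's dataset has been exported as a list of record dictionaries with at most one record per sourceUrl. The function name `diff_runs` is hypothetical and not part of the actor.

```python
def diff_runs(previous, current):
    """Classify source URLs as new, changed, unchanged, or removed between two runs."""
    tracked = ("pageHash", "eventTitle", "startDate")
    before = {record["sourceUrl"]: record for record in previous}
    report = {"new": [], "changed": [], "unchanged": [], "removed": []}
    seen = set()
    for record in current:
        url = record["sourceUrl"]
        seen.add(url)
        old = before.get(url)
        if old is None:
            report["new"].append(url)
        elif any(old.get(field) != record.get(field) for field in tracked):
            report["changed"].append(url)
        else:
            report["unchanged"].append(url)
    # URLs present last time but missing now may have been taken down or moved.
    report["removed"] = [url for url in before if url not in seen]
    return report
```

The "new" and "changed" lists are the natural inputs for alerts, while "removed" is worth a manual check because a page can also disappear from a run when the crawl cap is reached before it is visited.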
Data quality notes
Websites publish event data inconsistently.
The actor combines structured data and heuristics.
Confidence scores help you decide which records need review.
A missing date does not always mean the page is irrelevant.
Some pages hide dates in images, third-party widgets, or scripts.
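One way to act on the confidence score is to split records into buckets for automatic use, manual review, and low priority. The thresholds 0.8 and 0.5 and the name `triage_by_confidence` are assumptions made for this sketch; the actor does not prescribe cut-offs, so tune them against a sample of your own results.

```python
def triage_by_confidence(records, accept_at=0.8, review_at=0.5):
    """Split records into accept / review / low buckets by confidence score."""
    buckets = {"accept": [], "review": [], "low": []}
    for record in records:
        score = record.get("confidence") or 0.0  # a missing score is treated as 0
        if score >= accept_at:
            buckets["accept"].append(record)
        elif score >= review_at:
            buckets["review"].append(record)
        else:
            buckets["low"].append(record)
    # Show the most promising borderline pages first.
    buckets["review"].sort(key=lambda r: r.get("confidence") or 0.0, reverse=True)
    return buckets
```

Records in the review bucket can be checked quickly against their evidenceSnippets and sourceUrl, including those with no detected date.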
FAQ
What websites work best?
Public B2B webinar hubs, event pages, demo pages, and resource centers work best. Private portals, login walls, and heavily scripted widgets may need exact URLs or future browser fallback.
Can I monitor the same sources repeatedly?
Yes. Schedule the actor and compare sourceUrl, eventTitle, startDate, and pageHash across runs.
Troubleshooting
The run saved fewer records than expected
Increase maxPagesPerDomain, keep unknown status enabled, and add broader include keywords such as event, demo, and training.
A page is skipped
The page may be non-HTML, blocked, private, or outside the same domain crawl scope.
Try adding the exact landing page as a start URL.
Dates are missing
The site may render dates with JavaScript or images.
Use the evidence snippets and source URL to review those records manually.
The crawler visits too many pages
Lower maxPagesPerDomain, use more specific start URLs, and add include keywords.
Legality and responsible use
This actor is designed for public pages only.
Do not use it to bypass login walls, private communities, paywalls, or access controls.
Review each target site's terms and your local laws before running large crawls.
Respect robots.txt, rate limits, and data privacy rules.
Related scrapers
Related Automation Lab actors that can complement this workflow:
- Website Contact Finder for company contact enrichment.
- Domain to company enrichment actors for account lists.
- Search result scrapers for finding new webinar hubs.
- LinkedIn or company profile scrapers where allowed by their terms.
Limitations
- Generic crawling cannot guarantee that every event page on every website is found.
- JavaScript-only widgets may require future browser fallback work.
- Registration forms may be embedded by third-party providers.
- Some sites use regional redirects or consent walls.
- Speaker extraction depends on page structure.
Best practices
- Use high-quality seed URLs.
- Run a small sample first.
- Review confidence scores.
- Export and deduplicate by source URL and title.
- Re-run on a schedule for monitoring.
- Keep source lists organized by segment or competitor set.
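The deduplication step mentioned above could look like the sketch below. It assumes exported records are plain dictionaries, treats URLs that differ only by a trailing slash as the same page, and compares titles case-insensitively with collapsed whitespace. The name `dedupe_records` is hypothetical, and the choice to keep the highest-confidence copy is one reasonable policy rather than the actor's behaviour.

```python
def dedupe_records(records):
    """Keep one record per (source URL, title), preferring the highest confidence."""
    best = {}
    for record in records:
        url = (record.get("sourceUrl") or "").rstrip("/")
        # Collapse whitespace and ignore case so near-identical titles match.
        title = " ".join((record.get("eventTitle") or "").split()).casefold()
        key = (url, title)
        current = best.get(key)
        if current is None or (record.get("confidence") or 0) > (current.get("confidence") or 0):
            best[key] = record
    return list(best.values())
```

Results keep the order in which each URL and title pair was first seen, which makes the output stable across repeated exports of the same dataset.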
Example source list
{
    "startUrls": [
        { "url": "https://www.salesforce.com/events/webinars/" },
        { "url": "https://www.hubspot.com/resources/webinars" }
    ],
    "maxPagesPerDomain": 25,
    "maxResults": 100,
    "statuses": ["upcoming", "on-demand", "unknown"]
}
Changelog
0.1
Initial private build for public webinar and event discovery.
Support
If a target website consistently blocks public access, reduce crawl depth, add exact event URLs, or enable proxy for that site.
For best commercial results, keep runs focused on public B2B marketing and event pages.