Webinar Landing Page Extractor
Pricing
Pay per event
Extract webinar titles, dates, speakers, registration links, platform hints, and evidence from public event hubs and landing pages.
Developer
Stas Persiianenko
Maintained by Community
Extract structured webinar intelligence from public webinar pages, event hubs, registration pages, and on-demand landing pages.
Use this actor when you need a clean dataset of webinar titles, hosts, dates, speakers, registration links, topics, platform hints, and evidence snippets without manually copying data from marketing pages.
What does Webinar Landing Page Extractor do?
Webinar Landing Page Extractor crawls public pages that promote webinars, demos, virtual events, workshops, and on-demand sessions.
It reads the HTML, JSON-LD event metadata, headings, CTA links, and visible page text.
Then it returns normalized webinar records that are ready for spreadsheets, CRM enrichment, competitive research, and content calendars.
The actor does not bypass logins, private forms, paywalls, or registrant lists.
It only extracts information visible on public landing pages.
Who is it for?
Demand generation teams use it to monitor competitor webinar programs.
Revenue operations teams use it to build a repeatable feed of upcoming registration pages.
SDR teams use it to find topical events and timing signals for outreach.
Content marketers use it to audit webinar hubs and repurpose event topics.
Agencies use it to compare webinar calendars across many client competitors.
Analysts use it to normalize public event pages from many different website templates.
Why use it?
Public webinar pages are inconsistent.
One site uses JSON-LD Event markup.
Another hides the date in body copy.
Another uses a generic event hub with registration cards.
This actor combines structured extraction with resilient heuristics and evidence snippets, so you can review ambiguous records quickly.
What data can it extract?
| Field | Description |
|---|---|
| title | Webinar or event title |
| host | Website, organization, or event organizer |
| dateText | Raw date/time text found on the page |
| startDateIso | Parsed structured start date when available |
| timezone | Timezone abbreviation when visible |
| status | upcoming, on-demand, past, or unknown |
| registrationUrl | Best registration, watch, reserve, or join CTA link |
| ctaText | Visible text for the registration CTA |
| speakers | Speaker or presenter names when visible |
| topics | Headings and topic snippets from the page |
| platformHints | Zoom, ON24, Webex, Airmeet, BrightTALK, and similar hints |
| confidence | Extraction confidence score from 0 to 1 |
| evidenceText | Raw evidence used to support the record |
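If you consume the dataset from Python, it can help to pin the fields down as a typed structure. The sketch below mirrors the field names in the table; the Python types and the `validate_record` helper are assumptions for illustration, not part of the actor.

```python
from typing import List, Optional, TypedDict


class Speaker(TypedDict, total=False):
    name: str
    role: str


class WebinarRecord(TypedDict, total=False):
    # Field names follow the table above; the types are assumptions.
    title: str
    host: str
    dateText: str
    startDateIso: Optional[str]
    timezone: Optional[str]
    status: str  # "upcoming", "on-demand", "past", or "unknown"
    registrationUrl: Optional[str]
    ctaText: Optional[str]
    speakers: List[Speaker]
    topics: List[str]
    platformHints: List[str]
    confidence: float
    evidenceText: str


VALID_STATUSES = {"upcoming", "on-demand", "past", "unknown"}


def validate_record(record: dict) -> List[str]:
    """Return a list of problems found in a record (an empty list means OK)."""
    problems = []
    if record.get("status", "unknown") not in VALID_STATUSES:
        problems.append("unexpected status")
    confidence = record.get("confidence")
    if not isinstance(confidence, (int, float)) or not 0 <= confidence <= 1:
        problems.append("confidence outside 0..1")
    return problems
```

Running `validate_record` over each item before loading it into a CRM or warehouse is a cheap way to catch schema drift.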
How much does it cost to extract webinar landing pages?
This actor uses pay-per-event pricing.
You pay a small start fee per run, plus a per-record charge for each webinar record saved.
The default input is intentionally small so first tests stay inexpensive.
For large webinar hubs, increase maxPagesPerStartUrl and maxItems after a smoke test.
Quick start
- Open the actor on Apify.
- Paste one or more public webinar, event, or registration URLs into startUrls.
- Keep discoverLinks enabled if the URL is a hub page.
- Set maxItems to the number of webinar records you want.
- Run the actor.
- Export the dataset as JSON, CSV, Excel, or connect it to your workflow.
Input options
startUrls
Use public webinar landing pages, webinar hubs, event pages, or registration pages.
Examples:
- https://www.salesforce.com/resources/webinars/
- https://www.semrush.com/webinars/
- A competitor webinar registration URL
- A product demo event page
maxItems
Stops the run after this many webinar records are saved.
Use a small number for testing.
Use a larger number for complete hub extraction.
discoverLinks
When enabled, the actor follows same-domain links that look like webinar, event, demo, workshop, register, or on-demand pages.
Disable this when you only want the exact URLs you provided.
maxPagesPerStartUrl
Caps how many pages the actor fetches for each start URL.
This protects your run budget and avoids crawling an entire website.
includeKeywords
Optional keywords that must appear in the extracted title, host, or evidence.
Use this for topic-specific monitoring, such as AI, security, or SEO.
excludeKeywords
Optional keywords that exclude matching records.
Use this to remove careers pages, unrelated conferences, or archived content.
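To reason about how the two keyword options interact, here is a minimal sketch of an equivalent filter you could apply to exported records. It assumes case-insensitive substring matching, that exclusions win over inclusions, and that a record passes when any include keyword matches; the actor's real matching rules may differ.

```python
from typing import Iterable


def matches_keywords(
    record: dict,
    include: Iterable[str] = (),
    exclude: Iterable[str] = (),
) -> bool:
    # Searchable text: title, host, and evidence, as described for includeKeywords.
    haystack = " ".join(
        str(record.get(field) or "") for field in ("title", "host", "evidenceText")
    ).lower()
    include = [keyword.lower() for keyword in include]
    exclude = [keyword.lower() for keyword in exclude]
    # Assumption: any exclude match drops the record, even if an include matches.
    if any(keyword in haystack for keyword in exclude):
        return False
    # Assumption: with no include keywords, every remaining record passes.
    return not include or any(keyword in haystack for keyword in include)
```

Substring matching is permissive: a short keyword such as AI also matches words like "email", so longer phrases tend to give cleaner results.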
Output example
{
  "sourceUrl": "https://www.example.com/webinars/",
  "pageUrl": "https://www.example.com/webinars/ai-demo",
  "title": "AI Demo Webinar",
  "host": "Example",
  "dateText": "July 30, 2026 at 2 PM EST",
  "status": "upcoming",
  "registrationUrl": "https://www.example.com/register/ai-demo",
  "speakers": [{ "name": "Jane Doe", "role": "VP Marketing" }],
  "topics": ["AI Demo Webinar", "How teams automate workflows"],
  "platformHints": ["zoom"],
  "confidence": 0.85,
  "evidenceText": "AI Demo Webinar | July 30, 2026 at 2 PM EST | Register now"
}
Discovery mode
Discovery mode is designed for webinar hub pages.
The actor fetches the hub page first.
Then it follows same-domain links whose URL or anchor text suggests webinar, event, demo, workshop, registration, or on-demand content.
It does not crawl off-domain links during discovery.
This keeps the run focused on the website you provided.
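The link-selection rule described above can be sketched roughly as follows. The hint list and the exact-host comparison are assumptions made for illustration; the actor's own definition of "same-domain" and its keyword set are not documented here.

```python
from urllib.parse import urljoin, urlparse

# Assumed hint list, based on the content types named above.
DISCOVERY_HINTS = (
    "webinar", "event", "demo", "workshop", "regist", "on-demand", "ondemand",
)


def should_follow(hub_url: str, href: str, anchor_text: str = "") -> bool:
    """Decide whether a link found on a hub page looks worth fetching."""
    target = urlparse(urljoin(hub_url, href))
    if target.scheme not in ("http", "https"):
        return False
    # Assumption: "same-domain" means an identical host name.
    if target.netloc.lower() != urlparse(hub_url).netloc.lower():
        return False
    candidate = f"{target.path} {anchor_text}".lower()
    return any(hint in candidate for hint in DISCOVERY_HINTS)
```

For a hub at `https://www.example.com/webinars/`, a relative link such as `/webinars/ai-demo` would be followed, while `/careers` or a link to another host would be skipped.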
Date and timezone handling
The actor prefers structured JSON-LD event dates when a page provides them.
If no structured date exists, it looks for visible date and time text.
Date parsing can be ambiguous across regions and page templates.
For that reason, the actor always includes dateText and evidenceText so you can audit important records.
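The preference order above (structured date first, visible text second) might look like this in outline. The function name, the input shape (already-parsed JSON-LD objects plus the page's visible text), and the single date pattern are all assumptions; real pages use far more date formats than this one regex covers.

```python
import re

MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)
# Assumed pattern: matches dates such as "July 30, 2026" only.
DATE_RE = re.compile(rf"\b(?:{MONTHS}) \d{{1,2}}, \d{{4}}\b")


def pick_date(json_ld_blocks: list, visible_text: str) -> dict:
    """Prefer a JSON-LD Event startDate; always keep the raw visible date text."""
    match = DATE_RE.search(visible_text)
    date_text = match.group(0) if match else None

    start_iso = None
    for block in json_ld_blocks:
        if (
            isinstance(block, dict)
            and block.get("@type") == "Event"
            and block.get("startDate")
        ):
            start_iso = block["startDate"]
            break

    return {"startDateIso": start_iso, "dateText": date_text}
```

Keeping `dateText` even when a structured date exists is what makes later auditing possible: the two values can be compared when a record looks suspicious.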
Speaker extraction
Speaker data is extracted from JSON-LD performer/speaker fields when available.
The actor also checks common speaker and presenter sections in the HTML.
Because every website template is different, speaker fields may be empty for some pages.
Use evidenceText and topics to review ambiguous pages.
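As a rough illustration of the JSON-LD path, the sketch below normalizes `performer` and `speaker` values from an already-parsed Event object into the `name`/`role` shape used in the output example. Mapping `jobTitle` to `role` is an assumption, and the function name is hypothetical.

```python
def speakers_from_json_ld(event: dict) -> list:
    """Collect speaker entries from an Event's performer/speaker fields."""
    speakers = []
    for key in ("performer", "speaker"):
        value = event.get(key) or []
        # JSON-LD allows a single object, a plain string, or a list of either.
        if not isinstance(value, list):
            value = [value]
        for person in value:
            if isinstance(person, str):
                entry = {"name": person.strip()}
            elif isinstance(person, dict) and isinstance(person.get("name"), str):
                entry = {"name": person["name"].strip()}
                if person.get("jobTitle"):
                    # Assumption: jobTitle is what ends up as "role".
                    entry["role"] = person["jobTitle"]
            else:
                continue
            if entry["name"] and entry not in speakers:
                speakers.append(entry)
    return speakers
```

An empty result here is normal: many landing pages publish no structured speaker data at all, which is why the HTML sections are checked as well.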
Registration link extraction
The actor searches visible links and buttons for labels such as:
- Register
- Save my spot
- Reserve
- Sign up
- Watch now
- View webinar
- On demand
- Join
The best matching URL is returned as registrationUrl.
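One plausible way to pick a "best" link from those labels is to rank them and keep the highest-ranked match. The ranking by list order below is an assumption for illustration; the actor's actual scoring is not documented.

```python
# Labels from the list above, in an assumed priority order.
CTA_LABELS = (
    "register", "save my spot", "reserve", "sign up",
    "watch now", "view webinar", "on demand", "join",
)


def pick_registration_link(links):
    """links: iterable of (visible_text, href) pairs. Returns a dict or None."""
    best = None
    for text, href in links:
        label = " ".join(text.lower().split())  # normalize case and whitespace
        for rank, cta in enumerate(CTA_LABELS):
            if cta in label:
                if best is None or rank < best[0]:
                    best = (rank, text.strip(), href)
                break
    if best is None:
        return None
    return {"ctaText": best[1], "registrationUrl": best[2]}
```

Ranking matters because generic labels collide with navigation: a "Join our team" careers link contains "join", so a more specific label such as "Save my spot" should win when both are present.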
Platform hints
The actor scans public page text and links for common webinar platform hints.
Examples include Zoom, ON24, GoToWebinar, Webex, Microsoft Teams, Airmeet, BrightTALK, Demio, Livestorm, BigMarker, Hopin, Goldcast, and Bizzabo.
These hints are useful for routing events into the right operational workflow.
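A simple version of this scan is a substring lookup over page text and link URLs. The needle table below is a partial, assumed mapping (the lowercase labels follow the `"zoom"` value in the output example), not the actor's real list.

```python
# Partial, assumed mapping from output label to substrings that suggest it.
PLATFORM_NEEDLES = {
    "zoom": ("zoom",),
    "on24": ("on24",),
    "gotowebinar": ("gotowebinar",),
    "webex": ("webex",),
    "microsoft-teams": ("microsoft teams", "teams.microsoft.com"),
    "brighttalk": ("brighttalk",),
    "livestorm": ("livestorm",),
}


def detect_platforms(page_text: str, link_urls: list) -> list:
    """Return a sorted list of platform labels hinted at by text or links."""
    haystack = " ".join([page_text, *link_urls]).lower()
    return sorted(
        label
        for label, needles in PLATFORM_NEEDLES.items()
        if any(needle in haystack for needle in needles)
    )
```

These are hints rather than facts: a phrase like "zoom in on the details" would also trigger the `zoom` label under this approach.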
Confidence score
confidence is a simple extraction-quality score.
It increases when the actor finds a title, date, registration CTA, speakers, structured event metadata, and platform hints.
Low-confidence records are still saved because public pages can be messy.
Use the score to prioritize manual review.
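To make the idea of an additive score concrete, here is a sketch with made-up weights. The signal names follow the list above, but the weights and the rounding are hypothetical and will not reproduce the actor's actual numbers.

```python
# Hypothetical weights that sum to 1.0; the actor's real weights are not published here.
SIGNAL_WEIGHTS = {
    "title": 0.25,
    "date": 0.20,
    "registrationUrl": 0.20,
    "speakers": 0.15,
    "structuredData": 0.10,
    "platformHints": 0.10,
}


def score_confidence(signals: dict) -> float:
    """Sum the weights of the signals that were found, capped at 1.0."""
    total = sum(
        weight for name, weight in SIGNAL_WEIGHTS.items() if signals.get(name)
    )
    return round(min(total, 1.0), 2)
```

Under these assumed weights, a page with a title, a date, and a registration CTA but no speakers or structured data would land in the middle of the range, which is the kind of record worth a quick manual look.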
Tips for better results
Start with official webinar hubs rather than generic homepages.
Keep maxPagesPerStartUrl modest for first runs.
Use includeKeywords when monitoring a narrow product category.
Use excludeKeywords to remove archived topics or unrelated event pages.
Review evidenceText for any record that will trigger an automated action.
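The last tip can be enforced in code by routing records into an automated path and a manual-review path. The threshold of 0.7 and the requirement of a registration URL are assumptions you should tune to your own tolerance for errors.

```python
def split_for_review(records, threshold=0.7):
    """Split records into (automated, needs_review) lists."""
    automated, needs_review = [], []
    for record in records:
        trusted = (
            record.get("confidence", 0) >= threshold
            and bool(record.get("registrationUrl"))
        )
        (automated if trusted else needs_review).append(record)
    return automated, needs_review
```

Records in the second list still carry `evidenceText`, so a reviewer can check them without reopening the source page.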
Integrations
Send extracted webinar records to Google Sheets for editorial calendars.
Push upcoming events into a CRM enrichment queue.
Sync registration links to Slack alerts for competitive-intel teams.
Store webinar topics in a warehouse for trend analysis.
Feed public webinar evidence into an LLM workflow for summarization.
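For spreadsheet-style destinations, nested fields such as speakers usually need flattening first. The sketch below uses only the Python standard library; the column selection and the `"; "` separator are arbitrary choices, not something the actor prescribes.

```python
import csv
import io

# Assumed column selection for an editorial calendar sheet.
COLUMNS = [
    "title", "host", "dateText", "status",
    "registrationUrl", "speakers", "confidence",
]


def records_to_csv(records) -> str:
    """Flatten webinar records into CSV text with one row per record."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=COLUMNS, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for record in records:
        row = dict(record)
        # Join speaker names into one cell; roles are dropped in this sketch.
        row["speakers"] = "; ".join(
            speaker.get("name", "") for speaker in record.get("speakers") or []
        )
        writer.writerow(row)
    return buffer.getvalue()
```

The resulting text can be saved to a file and imported into Google Sheets, or pasted into any tool that accepts CSV.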
API usage
Node.js
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

const run = await client.actor('automation-lab/webinar-landing-page-extractor').call({
    startUrls: [{ url: 'https://www.salesforce.com/resources/webinars/' }],
    maxItems: 20,
    discoverLinks: true,
});

console.log(run.defaultDatasetId);
Python
import os

from apify_client import ApifyClient

# Read the API token from the environment, as in the Node.js example.
client = ApifyClient(os.environ['APIFY_TOKEN'])

run = client.actor('automation-lab/webinar-landing-page-extractor').call(run_input={
    'startUrls': [{'url': 'https://www.salesforce.com/resources/webinars/'}],
    'maxItems': 20,
    'discoverLinks': True,
})

print(run['defaultDatasetId'])
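The run object only gives you a dataset ID, so a typical next step is to read the records back. The helper below is a small sketch that assumes `client` is an `ApifyClient` instance and relies on its `dataset(...).iterate_items()` method; the function name and the status filter are illustrative.

```python
def upcoming_webinars(client, dataset_id: str) -> list:
    """Collect records whose status is 'upcoming' from a run's dataset.

    `client` is expected to be an apify_client.ApifyClient instance.
    """
    items = client.dataset(dataset_id).iterate_items()
    return [item for item in items if item.get("status") == "upcoming"]
```

With the Python example above, you would call it as `upcoming_webinars(client, run['defaultDatasetId'])`.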
cURL
curl -X POST 'https://api.apify.com/v2/acts/automation-lab~webinar-landing-page-extractor/runs?token=YOUR_APIFY_TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{"startUrls":[{"url":"https://www.salesforce.com/resources/webinars/"}],"maxItems":20,"discoverLinks":true}'
MCP usage
Use the Apify MCP server with Claude Code, Claude Desktop, or another MCP client.
MCP URL:
https://mcp.apify.com/?tools=automation-lab/webinar-landing-page-extractor
Claude Code setup:
claude mcp add apify-webinar-extractor 'https://mcp.apify.com/?tools=automation-lab/webinar-landing-page-extractor'
Claude Desktop JSON config:
{
  "mcpServers": {
    "apify-webinar-extractor": {
      "url": "https://mcp.apify.com/?tools=automation-lab/webinar-landing-page-extractor"
    }
  }
}
Example prompts:
- "Extract upcoming webinars from these competitor event hubs and return registration URLs."
- "Find AI-related webinars on these public marketing sites."
- "Summarize the speakers and dates from this webinar dataset."
Legality and compliance
This actor extracts publicly visible webpage content.
Do not use it to bypass login walls, gated forms, private registrant lists, or access controls.
Review the target website terms and applicable laws before using scraped data in production.
Avoid storing personal data unless you have a lawful basis and clear business need.
FAQ
Can it extract private attendee or registrant lists?
No. It only extracts public landing-page information and does not bypass forms, accounts, or registration gates.
Troubleshooting
Why is dateText filled but startDateIso empty?
Some websites publish human-readable dates without machine-readable dates.
The actor keeps the raw date evidence so you can review or parse it downstream.
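If you want to parse `dateText` downstream yourself, a narrow parser for one known format is often safer than a general-purpose one. The sketch below handles only strings shaped like the output example ("July 30, 2026 at 2 PM EST"); the pattern, the function name, and the fixed-offset table are assumptions, and abbreviations such as CST are ambiguous outside the US.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed fixed UTC offsets for a few abbreviations (US interpretation).
TZ_OFFSET_HOURS = {
    "UTC": 0, "GMT": 0, "EST": -5, "EDT": -4,
    "CST": -6, "CDT": -5, "PST": -8, "PDT": -7,
}
DATE_TEXT_RE = re.compile(
    r"^([A-Za-z]+ \d{1,2}, \d{4}) at (\d{1,2})(?::(\d{2}))? (AM|PM) ([A-Z]{2,4})$"
)


def parse_date_text(date_text: str) -> Optional[str]:
    """Return an ISO 8601 string, or None when the text does not fit the pattern."""
    match = DATE_TEXT_RE.match(date_text.strip())
    if not match:
        return None
    day_text, hour_text, minute_text, meridiem, tz_name = match.groups()
    hour = int(hour_text)
    if tz_name not in TZ_OFFSET_HOURS or not 1 <= hour <= 12:
        return None
    hour = hour % 12 + (12 if meridiem == "PM" else 0)  # 12 AM -> 0, 12 PM -> 12
    try:
        # %B expects English month names under the default C locale.
        parsed = datetime.strptime(day_text, "%B %d, %Y").replace(
            hour=hour,
            minute=int(minute_text or 0),
            tzinfo=timezone(timedelta(hours=TZ_OFFSET_HOURS[tz_name])),
        )
    except ValueError:
        return None
    return parsed.isoformat()
```

Returning `None` for anything unexpected keeps ambiguous dates out of automated workflows and leaves them for manual review.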
Why are speakers empty?
Some landing pages hide speakers in images, scripts, or late-loaded widgets.
Try using a specific webinar detail URL instead of a high-level hub page.
Why did a hub return unrelated topics?
Hub pages contain navigation, product names, and general marketing copy.
Use includeKeywords or excludeKeywords, or lower maxPagesPerStartUrl, to focus the run.
Related scrapers
Explore other Automation Lab actors at https://apify.com/automation-lab/ for lead research, content extraction, website auditing, and market-intelligence workflows.
Use this actor alongside generic website crawlers when you need normalized event fields instead of raw page text.
Changelog
Initial version extracts public webinar landing-page data with HTTP, Cheerio, JSON-LD parsing, heuristic date/CTA/speaker detection, link discovery, confidence scoring, and evidence snippets.