Webinar Landing Page Extractor

Extract webinar titles, dates, speakers, registration links, platform hints, and evidence from public event hubs and landing pages.

Pricing: Pay per event

Developer: Stas Persiianenko (maintained by Community)

Extract structured webinar intelligence from public webinar pages, event hubs, registration pages, and on-demand landing pages.

Use this actor when you need a clean dataset of webinar titles, hosts, dates, speakers, registration links, topics, platform hints, and evidence snippets without manually copying data from marketing pages.

What does Webinar Landing Page Extractor do?

Webinar Landing Page Extractor crawls public pages that promote webinars, demos, virtual events, workshops, and on-demand sessions.

It reads the HTML, JSON-LD event metadata, headings, CTA links, and visible page text.

Then it returns normalized webinar records that are ready for spreadsheets, CRM enrichment, competitive research, and content calendars.

The actor does not bypass logins, private forms, paywalls, or registrant lists.

It only extracts information visible on public landing pages.
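
As a rough illustration of the JSON-LD step described above, the following sketch collects Event objects from a page's application/ld+json script tags using only the Python standard library. It is not the actor's actual code; JsonLdCollector and extract_events are hypothetical names, and the sketch ignores @graph wrappers and Event subtypes such as EducationEvent.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.blocks.append("".join(self._buffer))
            self._in_json_ld = False


def extract_events(html):
    """Return JSON-LD objects whose @type is, or includes, "Event"."""
    collector = JsonLdCollector()
    collector.feed(html)
    events = []
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # Malformed markup is common on marketing pages; skip it.
        candidates = data if isinstance(data, list) else [data]
        for item in candidates:
            if not isinstance(item, dict):
                continue
            types = item.get("@type", [])
            if isinstance(types, str):
                types = [types]
            if "Event" in types:
                events.append(item)
    return events
```

Pages without usable JSON-LD simply yield an empty list, which is where the heading, CTA, and visible-text heuristics would take over.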

Who is it for?

Demand generation teams use it to monitor competitor webinar programs.

Revenue operations teams use it to build a repeatable feed of upcoming registration pages.

SDR teams use it to find topical events and timing signals for outreach.

Content marketers use it to audit webinar hubs and repurpose event topics.

Agencies use it to compare webinar calendars across many client competitors.

Analysts use it to normalize public event pages from many different website templates.

Why use it?

Public webinar pages are inconsistent.

One site uses JSON-LD Event markup.

Another hides the date in body copy.

Another uses a generic event hub with registration cards.

This actor combines structured extraction with resilient heuristics and evidence snippets, so you can review ambiguous records quickly.

What data can it extract?

| Field | Description |
| --- | --- |
| title | Webinar or event title |
| host | Website, organization, or event organizer |
| dateText | Raw date/time text found on the page |
| startDateIso | Parsed structured start date when available |
| timezone | Timezone abbreviation when visible |
| status | upcoming, on-demand, past, or unknown |
| registrationUrl | Best registration, watch, reserve, or join CTA link |
| ctaText | Visible text for the registration CTA |
| speakers | Speaker or presenter names when visible |
| topics | Headings and topic snippets from the page |
| platformHints | Zoom, ON24, Webex, Airmeet, BrightTALK, and similar hints |
| confidence | Extraction confidence score from 0 to 1 |
| evidenceText | Raw evidence used to support the record |

How much does it cost to extract webinar landing pages?

This actor uses pay-per-event pricing.

You pay a small start fee for each run, plus a per-record fee for each webinar record saved.

The default input is intentionally small so first tests stay inexpensive.

For large webinar hubs, increase maxPagesPerStartUrl and maxItems after a smoke test.
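
The pricing model above reduces to simple arithmetic: one start fee plus a per-record fee multiplied by the number of records saved. The sketch below shows that calculation. The function name is hypothetical, and the fees in the usage line are placeholders for illustration only; check the actor's pricing tab for the real values.

```python
def estimate_run_cost(records_saved, start_fee_usd, per_record_fee_usd):
    """Estimate the cost of one run: start fee + records saved x per-record fee."""
    if records_saved < 0:
        raise ValueError("records_saved must not be negative")
    return start_fee_usd + records_saved * per_record_fee_usd


# Placeholder fees, NOT the actor's real prices.
estimate = estimate_run_cost(20, start_fee_usd=0.01, per_record_fee_usd=0.002)
print(f"Estimated cost: ${estimate:.3f}")
```

Because maxItems caps the number of saved records, it also caps the per-record part of this estimate.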

Quick start

  1. Open the actor on Apify.
  2. Paste one or more public webinar, event, or registration URLs into startUrls.
  3. Keep discoverLinks enabled if the URL is a hub page.
  4. Set maxItems to the number of webinar records you want.
  5. Run the actor.
  6. Export the dataset as JSON, CSV, Excel, or connect it to your workflow.

Input options

startUrls

Use public webinar landing pages, webinar hubs, event pages, or registration pages.

Examples:

  • https://www.salesforce.com/resources/webinars/
  • https://www.semrush.com/webinars/
  • A competitor webinar registration URL
  • A product demo event page

maxItems

Stops the run after this many webinar records are saved.

Use a small number for testing.

Use a larger number for complete hub extraction.

discoverLinks

When enabled, the actor follows same-domain links that look like webinar, event, demo, workshop, register, or on-demand pages.

Disable this when you only want the exact URLs you provided.

maxPagesPerStartUrl

Caps how many pages the actor fetches for each start URL.

This protects your run budget and avoids crawling an entire website.

includeKeywords

Optional keywords that must appear in the extracted title, host, or evidence.

Use this for topic-specific monitoring, such as AI, security, or SEO.

excludeKeywords

Optional keywords that exclude matching records.

Use this to remove careers pages, unrelated conferences, or archived content.
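
The keyword options above can be pictured as a simple post-extraction filter. This is a minimal sketch under stated assumptions, not the actor's implementation: it assumes case-insensitive substring matching, that any one include keyword is enough, and that exclude keywords are checked against the same three fields. matches_keywords is a hypothetical name.

```python
def matches_keywords(record, include_keywords=(), exclude_keywords=()):
    """Return True if the record passes the include/exclude keyword filters."""
    # Assumption: both filters look at title, host, and evidenceText.
    haystack = " ".join(
        str(record.get(field) or "") for field in ("title", "host", "evidenceText")
    ).lower()
    if any(keyword.lower() in haystack for keyword in exclude_keywords):
        return False
    if include_keywords:
        return any(keyword.lower() in haystack for keyword in include_keywords)
    return True
```

Substring matching is deliberately loose: a short keyword such as AI would also match words like "email", so longer or more specific keywords tend to filter more cleanly.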

Output example

{
  "sourceUrl": "https://www.example.com/webinars/",
  "pageUrl": "https://www.example.com/webinars/ai-demo",
  "title": "AI Demo Webinar",
  "host": "Example",
  "dateText": "July 30, 2026 at 2 PM EST",
  "status": "upcoming",
  "registrationUrl": "https://www.example.com/register/ai-demo",
  "speakers": [{ "name": "Jane Doe", "role": "VP Marketing" }],
  "topics": ["AI Demo Webinar", "How teams automate workflows"],
  "platformHints": ["zoom"],
  "confidence": 0.85,
  "evidenceText": "AI Demo Webinar | July 30, 2026 at 2 PM EST | Register now"
}
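
A record like the one above contains nested lists (speakers, topics, platformHints), which can be awkward in a spreadsheet. The following sketch shows one possible way to flatten such records into CSV rows downstream. flatten_record and to_csv are hypothetical helpers, and the "; " separator is an arbitrary choice.

```python
import csv
import io


def flatten_record(record):
    """Return a copy of the record with list fields joined into plain strings."""
    row = dict(record)
    speaker_labels = []
    for speaker in record.get("speakers") or []:
        name = speaker.get("name", "")
        role = speaker.get("role")
        speaker_labels.append(f"{name} ({role})" if role else name)
    row["speakers"] = "; ".join(speaker_labels)
    row["topics"] = "; ".join(record.get("topics") or [])
    row["platformHints"] = "; ".join(record.get("platformHints") or [])
    return row


def to_csv(records):
    """Serialize records to CSV text, using the first record's keys as columns."""
    rows = [flatten_record(record) for record in records]
    fieldnames = list(rows[0].keys()) if rows else []
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

The built-in CSV and Excel exports already cover most cases; a helper like this is mainly useful when the records pass through custom code first.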

Discovery mode

Discovery mode is designed for webinar hub pages.

The actor fetches the hub page first.

Then it follows same-domain links whose URL or anchor text suggests webinar, event, demo, workshop, registration, or on-demand content.

It does not crawl off-domain links during discovery.

This keeps the run focused on the website you provided.
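
The discovery rules above can be sketched as a small link filter. This is an illustration only: the hint list, the exact-host comparison used for "same domain", and the names DISCOVERY_HINTS and discover_links are assumptions, not the actor's actual logic.

```python
from urllib.parse import urljoin, urlparse

# Assumed URL/anchor-text hints; "regist" covers both "register" and "registration".
DISCOVERY_HINTS = ("webinar", "event", "demo", "workshop", "regist", "on-demand", "ondemand")


def discover_links(hub_url, links, max_pages=10):
    """Select same-host links that look like webinar pages.

    links is an iterable of (href, anchor_text) pairs found on the hub page.
    """
    hub_host = urlparse(hub_url).netloc.lower()
    seen = set()
    selected = []
    for href, anchor_text in links:
        absolute = urljoin(hub_url, href)
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue
        if parsed.netloc.lower() != hub_host:
            continue  # Never leave the start URL's host.
        haystack = f"{parsed.path} {anchor_text}".lower()
        if not any(hint in haystack for hint in DISCOVERY_HINTS):
            continue
        normalized = absolute.split("#", 1)[0]  # Ignore in-page anchors.
        if normalized == hub_url or normalized in seen:
            continue
        seen.add(normalized)
        selected.append(normalized)
        if len(selected) >= max_pages:
            break  # Mirrors the idea behind maxPagesPerStartUrl.
    return selected
```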

Date and timezone handling

The actor prefers structured JSON-LD event dates when a page provides them.

If no structured date exists, it looks for visible date and time text.

Date parsing can be ambiguous across regions and page templates.

For that reason, the actor always includes dateText and evidenceText so you can audit important records.
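
The preference order described above (structured date first, visible text as a fallback, raw text always kept) might look roughly like the sketch below. The regular expression is deliberately narrow and only covers US-style text such as "July 30, 2026 at 2 PM EST"; it and the resolve_date helper are assumptions for illustration, not the actor's parser.

```python
import re
from datetime import datetime

# Narrow pattern for text like "July 30, 2026 at 2 PM EST" (illustrative only).
DATE_TEXT_PATTERN = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|September|"
    r"October|November|December)\s+\d{1,2},\s+\d{4}"
    r"(?:\s+at\s+\d{1,2}(?::\d{2})?\s*(?:AM|PM|am|pm)(?:\s+[A-Z]{2,4}\b)?)?"
)


def resolve_date(json_ld_event, visible_text):
    """Return (startDateIso, dateText), preferring the structured date."""
    start_date_iso = None
    raw = (json_ld_event or {}).get("startDate")
    if isinstance(raw, str):
        try:
            # Note: Python versions before 3.11 reject a trailing "Z" here.
            start_date_iso = datetime.fromisoformat(raw).isoformat()
        except ValueError:
            start_date_iso = None
    match = DATE_TEXT_PATTERN.search(visible_text or "")
    date_text = match.group(0) if match else None
    return start_date_iso, date_text
```

Keeping both values means an unparseable or ambiguous date still leaves dateText available for manual review.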

Speaker extraction

Speaker data is extracted from JSON-LD performer/speaker fields when available.

The actor also checks common speaker and presenter sections in the HTML.

Because every website template is different, speaker fields may be empty for some pages.

Use evidenceText and topics to review ambiguous pages.

Registration link detection

The actor searches visible links and buttons for labels such as:

  • Register
  • Save my spot
  • Reserve
  • Sign up
  • Watch now
  • View webinar
  • On demand
  • Join

The best matching URL is returned as registrationUrl.
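
One way to picture "best matching" is a priority list over the labels above. The sketch below assumes that the listed order doubles as the ranking and that matching is a case-insensitive substring test. Both are assumptions, as is the pick_registration_link name.

```python
# Assumed priority order: earlier labels win over later ones.
CTA_LABELS = [
    "register",
    "save my spot",
    "reserve",
    "sign up",
    "watch now",
    "view webinar",
    "on demand",
    "join",
]


def pick_registration_link(links):
    """Return (registrationUrl, ctaText) for the best CTA, or (None, None).

    links is an iterable of (href, visible_text) pairs.
    """
    best = None  # (rank, href, text)
    for href, text in links:
        label = " ".join(text.lower().split())  # Normalize case and whitespace.
        for rank, cta in enumerate(CTA_LABELS):
            if cta in label:
                if best is None or rank < best[0]:
                    best = (rank, href, text.strip())
                break  # Use the highest-priority label this link matches.
    if best is None:
        return None, None
    return best[1], best[2]
```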

Platform hints

The actor scans public page text and links for common webinar platform hints.

Examples include Zoom, ON24, GoToWebinar, Webex, Microsoft Teams, Airmeet, BrightTALK, Demio, Livestorm, BigMarker, Hopin, Goldcast, and Bizzabo.

These hints are useful for routing events into the right operational workflow.
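
Platform detection of this kind can be approximated with a lookup of known names and domains. The sketch below covers only a few of the platforms listed above. The lowercase "zoom" value follows the output example; the other slugs, the match strings, and the detect_platform_hints name are assumptions.

```python
# Assumed slugs and match strings for a subset of platforms.
PLATFORM_PATTERNS = {
    "zoom": ("zoom.us", "zoom"),
    "on24": ("on24",),
    "gotowebinar": ("gotowebinar",),
    "webex": ("webex",),
    "microsoft-teams": ("teams.microsoft.com", "microsoft teams"),
    "airmeet": ("airmeet",),
    "brighttalk": ("brighttalk",),
    "livestorm": ("livestorm",),
}


def detect_platform_hints(page_text, link_urls):
    """Return platform slugs whose match strings appear in the text or links."""
    haystack = " ".join([page_text, *link_urls]).lower()
    return [
        slug
        for slug, needles in PLATFORM_PATTERNS.items()
        if any(needle in haystack for needle in needles)
    ]
```

A plain substring test can produce false positives (for example, "zoom in on the data"), which is one reason these values are best treated as hints rather than facts.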

Confidence score

confidence is a simple extraction-quality score.

It increases when the actor finds a title, date, registration CTA, speakers, structured event metadata, and platform hints.

Low-confidence records are still saved because public pages can be messy.

Use the score to prioritize manual review.
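
A score built from the signals listed above could be as simple as a weighted checklist. The weights below are invented for illustration and do not reproduce the actor's real scoring; SIGNAL_WEIGHTS and score_confidence are hypothetical names.

```python
# Hypothetical weights; they sum to 1.0 so a fully populated record scores 1.0.
SIGNAL_WEIGHTS = {
    "title": 0.25,
    "date": 0.20,
    "registrationUrl": 0.20,
    "speakers": 0.15,
    "structuredData": 0.10,
    "platformHints": 0.10,
}


def score_confidence(record, has_structured_data=False):
    """Sum the weights of the signals present in the record, capped at 1.0."""
    signals = {
        "title": bool(record.get("title")),
        "date": bool(record.get("startDateIso") or record.get("dateText")),
        "registrationUrl": bool(record.get("registrationUrl")),
        "speakers": bool(record.get("speakers")),
        "structuredData": has_structured_data,
        "platformHints": bool(record.get("platformHints")),
    }
    total = sum(weight for name, weight in SIGNAL_WEIGHTS.items() if signals[name])
    return round(min(total, 1.0), 2)
```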

Tips for better results

Start with official webinar hubs rather than generic homepages.

Keep maxPagesPerStartUrl modest for first runs.

Use includeKeywords when monitoring a narrow product category.

Use excludeKeywords to remove archived topics or unrelated event pages.

Review evidenceText for any record that will trigger an automated action.
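
The last tip can be turned into a small gate in front of any automation. In this sketch, records with a low confidence or no registrationUrl are routed to manual review; the 0.6 threshold and the split_for_review helper are assumptions to adjust for your own workflow.

```python
def split_for_review(records, threshold=0.6):
    """Split records into (automatable, needs_review) lists.

    A record is automatable only if its confidence meets the threshold
    and it has a registrationUrl; everything else goes to manual review.
    """
    automatable = []
    needs_review = []
    for record in records:
        confidence = record.get("confidence") or 0
        if confidence >= threshold and record.get("registrationUrl"):
            automatable.append(record)
        else:
            needs_review.append(record)
    return automatable, needs_review
```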

Integrations

Send extracted webinar records to Google Sheets for editorial calendars.

Push upcoming events into a CRM enrichment queue.

Sync registration links to Slack alerts for competitive-intel teams.

Store webinar topics in a warehouse for trend analysis.

Feed public webinar evidence into an LLM workflow for summarization.

API usage

Node.js

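import { ApifyClient } from 'apify-client';

// Authenticate with the API token from the APIFY_TOKEN environment variable.
const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

// Start the actor and wait for the run to finish.
const run = await client.actor('automation-lab/webinar-landing-page-extractor').call({
    startUrls: [{ url: 'https://www.salesforce.com/resources/webinars/' }],
    maxItems: 20,
    discoverLinks: true,
});

// The saved webinar records live in this dataset.
console.log(run.defaultDatasetId);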

Python

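import os

from apify_client import ApifyClient

# Authenticate with the API token from the APIFY_TOKEN environment variable.
client = ApifyClient(os.environ['APIFY_TOKEN'])

# Start the actor and wait for the run to finish.
run = client.actor('automation-lab/webinar-landing-page-extractor').call(run_input={
    'startUrls': [{'url': 'https://www.salesforce.com/resources/webinars/'}],
    'maxItems': 20,
    'discoverLinks': True,
})
print(run['defaultDatasetId'])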

cURL

curl -X POST 'https://api.apify.com/v2/acts/automation-lab~webinar-landing-page-extractor/runs?token=YOUR_APIFY_TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{"startUrls":[{"url":"https://www.salesforce.com/resources/webinars/"}],"maxItems":20,"discoverLinks":true}'

MCP usage

Use the Apify MCP server with Claude Code, Claude Desktop, or another MCP client.

MCP URL:

https://mcp.apify.com/?tools=automation-lab/webinar-landing-page-extractor

Claude Code setup:

claude mcp add --transport http apify-webinar-extractor "https://mcp.apify.com/?tools=automation-lab/webinar-landing-page-extractor"

Claude Desktop JSON config:

{
  "mcpServers": {
    "apify-webinar-extractor": {
      "url": "https://mcp.apify.com/?tools=automation-lab/webinar-landing-page-extractor"
    }
  }
}

Example prompts:

  • "Extract upcoming webinars from these competitor event hubs and return registration URLs."
  • "Find AI-related webinars on these public marketing sites."
  • "Summarize the speakers and dates from this webinar dataset."

Legality and compliance

This actor extracts publicly visible webpage content.

Do not use it to bypass login walls, gated forms, private registrant lists, or access controls.

Review the target website terms and applicable laws before using scraped data in production.

Avoid storing personal data unless you have a lawful basis and clear business need.

FAQ

Can it extract private attendee or registrant lists?

No. It only extracts public landing-page information and does not bypass forms, accounts, or registration gates.

Troubleshooting

Why is dateText filled but startDateIso empty?

Some websites publish human-readable dates without machine-readable dates.

The actor keeps the raw date evidence so you can review or parse it downstream.

Why are speakers empty?

Some landing pages hide speakers in images, scripts, or late-loaded widgets.

Try using a specific webinar detail URL instead of a high-level hub page.

Why did a hub return unrelated topics?

Hub pages contain navigation, product names, and general marketing copy.

Use includeKeywords, excludeKeywords, or lower maxPagesPerStartUrl to focus the run.

Explore other Automation Lab actors at https://apify.com/automation-lab/ for lead research, content extraction, website auditing, and market-intelligence workflows.

Use this actor alongside generic website crawlers when you need normalized event fields instead of raw page text.

Changelog

Initial version extracts public webinar landing-page data with HTTP, Cheerio, JSON-LD parsing, heuristic date/CTA/speaker detection, link discovery, confidence scoring, and evidence snippets.