Spotify Podcast Guest Graph Scraper
Pricing
Pay per event
Extract a structured guest-history graph from Spotify podcast episodes via the Spotify API — episode metadata plus NLP-detected guest names, roles, and confidence scores — export to JSON or CSV. Uses your own Spotify Developer credentials; metadata only, no audio.
Developer
DevilScrapes
🎯 What this scrapes
Spotify publishes every show's episode list at api.spotify.com/v1/shows/{id}/episodes. This Actor authenticates with your Spotify Developer credentials, pages through episodes for one or more shows, runs spaCy + regex over each episode description, and emits one row per episode × guest pair — name, inferred role (host / cohost / guest), and a confidence score. The result is a clean, exportable podcast guest database ready for research, lead-gen, or AI training pipelines.
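The paging step described above can be sketched roughly as follows. This is an illustration, not the Actor's actual code: `iter_episodes` and the injected `fetch_json` callable are hypothetical names, and the sketch assumes the endpoint's usual paging shape (an `items` list plus a `next` URL that is null on the last page) with a page size of at most 50.

```python
from typing import Callable, Iterator

API_BASE = "https://api.spotify.com/v1"
PAGE_SIZE = 50  # assumed maximum page size for the episodes endpoint


def iter_episodes(
    show_id: str,
    fetch_json: Callable[[str], dict],  # hypothetical: performs an authenticated GET, returns parsed JSON
    max_episodes: int = 20,
    market: str = "US",
) -> Iterator[dict]:
    """Yield up to max_episodes episode objects for one show by following `next` links."""
    limit = min(PAGE_SIZE, max_episodes)
    url = f"{API_BASE}/shows/{show_id}/episodes?market={market}&limit={limit}&offset=0"
    yielded = 0
    while url and yielded < max_episodes:
        page = fetch_json(url)
        for episode in page.get("items", []):
            if episode is None:  # defensive: skip empty placeholders, if any
                continue
            yield episode
            yielded += 1
            if yielded >= max_episodes:
                return
        url = page.get("next")  # None on the last page ends the loop
```

Injecting `fetch_json` keeps the paging logic testable without network access; in a real run it would wrap an HTTP client that sends the bearer token.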
🔥 What we handle for you
- 🛡️ Browser fingerprint rotation: `curl-cffi` impersonates real Chrome / Firefox / Safari TLS handshakes so the endpoint sees a browser, not Python.
- 🌐 Residential proxy rotation via Apify Proxy: fresh session and exit IP on every block or rate-limit response.
- 🔁 Retries with exponential backoff on `408 / 429 / 5xx`: up to 5 attempts per page, `Retry-After` honoured.
- 🧱 Rate-limit-aware pacing: when the target pushes back, we slow down and surface partial progress rather than going silent.
- 🧊 Clean, typed dataset rows: Pydantic-validated, ISO-8601 timestamps, stable IDs, JSON / CSV / Excel export straight from the Apify Console.
- 💰 Pay-Per-Event pricing: you only pay for results that hit your dataset. No data, no charge.
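To make the retry behaviour in the list above concrete, here is a minimal sketch of exponential backoff that honours `Retry-After`. The function names, the 1-second base delay, and the 30-second cap are assumptions for illustration; only the status codes and the 5-attempt limit come from the description above. It also assumes `Retry-After` arrives as a whole number of seconds under that exact header key.

```python
import time
from typing import Callable, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str], bytes]  # (status, headers, body)
MAX_ATTEMPTS = 5


def is_retryable(status: int) -> bool:
    """408, 429 and any 5xx are treated as transient."""
    return status in (408, 429) or 500 <= status < 600


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 1.0, cap: float = 30.0) -> float:
    """Delay before the next try; `attempt` is zero-based."""
    if retry_after is not None and retry_after.strip().isdigit():
        return float(retry_after)  # the server's hint wins over our own schedule
    return min(cap, base * (2 ** attempt))  # 1s, 2s, 4s, 8s, ...


def send_with_retries(send: Callable[[], Response],
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `send` up to MAX_ATTEMPTS times, sleeping between retryable failures."""
    response = send()
    for attempt in range(MAX_ATTEMPTS - 1):
        status, headers, _ = response
        if not is_retryable(status):
            break
        sleep(backoff_delay(attempt, headers.get("Retry-After")))
        response = send()
    return response
```

The last response is returned even when every attempt fails, so a caller can log the final status and surface partial progress.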
💡 Use cases
- Podcast guest research — build a podcast guest database of who has appeared on which shows to find cross-show patterns and booking opportunities.
- Booking agencies — surface guests that match a target podcast's profile to pitch your client into similar shows; replaces hours of manual research.
- Media intel — track when a public figure (CEO, author, researcher) makes podcast appearances; export a podcast guest list for any show in seconds.
- Entity-graph AI training — feed (person, show, date) triples into a knowledge graph for downstream ML or recommendation systems.
- Journalists — quickly trace a person's recent podcast circuit to map their messaging and audience reach.
- Founder-led PR — research which shows align with your topic before you pitch, using real guest history data rather than guessing.
⚙️ How to use it
- Create a free Spotify Developer app at https://developer.spotify.com/dashboard (takes 60 seconds). Copy your `client_id` and `client_secret`.
- Click Try for free on the Apify Store page.
- Paste your `client_id` and `client_secret` into the input form. Add the Spotify show ID(s) you want to scrape: the 22-character string after `/show/` in any `open.spotify.com` URL.
- Click Start. Results stream into the run's dataset in real time.
- Export from Storage → Dataset as JSON, CSV, or Excel — or fetch via the Apify API.
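If you have a list of show URLs rather than bare IDs, a small helper can pull the ID out for you. This is a sketch; `extract_show_id` is a hypothetical name, and it assumes show IDs are 22 characters drawn from digits and ASCII letters.

```python
import re
from typing import Optional

_BARE_ID = re.compile(r"[0-9A-Za-z]{22}")
_ID_IN_URL = re.compile(r"/show/([0-9A-Za-z]{22})(?![0-9A-Za-z])")


def extract_show_id(value: str) -> Optional[str]:
    """Return the 22-character show ID from a URL or a bare ID, else None."""
    value = value.strip()
    if _BARE_ID.fullmatch(value):
        return value
    match = _ID_IN_URL.search(value)
    return match.group(1) if match else None


print(extract_show_id("https://open.spotify.com/show/2MAi0BvDc6GTFvKFPXnkCL?si=abc"))
```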
📥 Input
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| `show_ids` | array | no | `["2MAi0BvDc6GTFvKFPXnkCL"]` | List of 22-character Spotify show IDs (the part after `/show/` in any `open.spotify.com` URL). Provide this OR `showSearchQuery`. |
| `showSearchQuery` | string | no | (none) | Free-text query passed to `/v1/search`. The top-matching show is used. Ignored when `show_ids` is set. |
| `maxEpisodesPerShow` | integer | no | `20` | Newest episodes first. Hard cap 200. |
| `clientId` | string | yes | (none) | From your Spotify Developer Dashboard application. Required. |
| `clientSecret` | string | yes | (none) | From the same Spotify Developer application. Stored as an Apify Secret and never logged. |
| `market` | string | no | `"US"` | ISO 3166-1 alpha-2 country code for availability filtering on Spotify episode endpoints. |
| `proxyConfiguration` | object | no | `{"useApifyProxy": false}` | Proxy settings. Enable residential proxies for high-volume runs to avoid rate-limiting. |
Example input

    {
      "show_ids": ["2MAi0BvDc6GTFvKFPXnkCL"],
      "maxEpisodesPerShow": 3,
      "clientId": "your_client_id_here",
      "clientSecret": "your_client_secret_here",
      "market": "US",
      "proxyConfiguration": {"useApifyProxy": false}
    }
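If you generate inputs programmatically, it can help to check them against the rules in the table before starting a run. The sketch below is an assumption about how those rules could be enforced client-side, not the Actor's own validation: `normalize_input` is a hypothetical helper, and clamping `maxEpisodesPerShow` to 200 (rather than rejecting larger values) is a guess.

```python
import re

SHOW_ID_RE = re.compile(r"[0-9A-Za-z]{22}")
DEFAULT_SHOW_IDS = ["2MAi0BvDc6GTFvKFPXnkCL"]  # default from the input table
MAX_EPISODES_CAP = 200


def normalize_input(raw: dict) -> dict:
    """Apply the documented defaults and limits to a raw input dict."""
    for key in ("clientId", "clientSecret"):
        if not raw.get(key):
            raise ValueError(f"{key} is required")

    show_ids = list(raw.get("show_ids") or [])
    query = raw.get("showSearchQuery")
    if show_ids:
        query = None  # the search query is ignored when show_ids is set
    elif not query:
        show_ids = list(DEFAULT_SHOW_IDS)

    bad = [s for s in show_ids if not SHOW_ID_RE.fullmatch(s)]
    if bad:
        raise ValueError(f"not 22-character show IDs: {bad}")

    # Assumption: values above the cap are clamped rather than rejected.
    max_eps = max(1, min(int(raw.get("maxEpisodesPerShow", 20)), MAX_EPISODES_CAP))

    market = str(raw.get("market", "US")).upper()
    if not re.fullmatch(r"[A-Z]{2}", market):
        raise ValueError(f"market must be a two-letter country code: {market!r}")

    return {
        "show_ids": show_ids,
        "showSearchQuery": query,
        "maxEpisodesPerShow": max_eps,
        "market": market,
    }
```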
📤 Output
Every row is one dataset item representing a single episode × guest pair.
| Field | Type | Notes |
|---|---|---|
| `show_id` | string | Spotify show ID. |
| `show_name` | string | Show name as returned by `/v1/shows/{id}`. |
| `episode_id` | string | Spotify episode ID. |
| `episode_name` | string | Episode title. |
| `episode_release_date` | string or null | ISO date (YYYY-MM-DD). |
| `episode_duration_ms` | integer or null | Episode length in milliseconds. |
| `guest_name` | string or null | NLP-extracted guest name. Null when no guest could be identified (host-only row preserves episode metadata). |
| `guest_role` | string | One of `host`, `cohost`, `guest`. |
| `confidence` | number | Heuristic 0.0–1.0: regex matches score 0.85–1.00, bare NER 0.55–0.75, host-fallback 0.0. |
| `episode_url` | string | Public episode URL on `open.spotify.com`. |
| `scraped_at` | string | ISO 8601 UTC timestamp at row creation. |
Example output

    {
      "show_id": "2MAi0BvDc6GTFvKFPXnkCL",
      "show_name": "Lex Fridman Podcast",
      "episode_id": "5kF8w2Q9pNeLBxXxNH1mxJ",
      "episode_name": "#412 — Demis Hassabis: AGI and the Future of AI",
      "episode_release_date": "2026-04-30",
      "episode_duration_ms": 9384210,
      "guest_name": "Demis Hassabis",
      "guest_role": "guest",
      "confidence": 0.92,
      "episode_url": "https://open.spotify.com/episode/5kF8w2Q9pNeLBxXxNH1mxJ",
      "scraped_at": "2026-05-16T12:00:00Z"
    }
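Rows in this shape can be folded into (person, show, date) triples and then into a simple cross-show view. The following is one possible downstream sketch, not part of the Actor; `guest_triples` and `cross_show_guests` are hypothetical helpers, and the 0.8 threshold is the filter suggested elsewhere on this page.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, Optional[str]]  # (person, show, release date)


def guest_triples(rows: Iterable[dict], min_confidence: float = 0.8) -> List[Triple]:
    """Keep confident guest rows and reduce them to unique, sorted triples."""
    triples = set()
    for row in rows:
        if row.get("guest_role") != "guest" or not row.get("guest_name"):
            continue  # skips host, cohost and host-fallback rows
        if row.get("confidence", 0.0) < min_confidence:
            continue
        triples.add((row["guest_name"], row["show_name"], row.get("episode_release_date")))
    # Dates may be null, so sort on an empty string in that case.
    return sorted(triples, key=lambda t: (t[0], t[1], t[2] or ""))


def cross_show_guests(triples: Iterable[Triple]) -> Dict[str, List[str]]:
    """Map each person who appeared on more than one show to those shows."""
    shows = defaultdict(set)
    for person, show, _ in triples:
        shows[person].add(show)
    return {p: sorted(s) for p, s in shows.items() if len(s) > 1}
```

Feeding `cross_show_guests(guest_triples(rows))` the items of a multi-show run gives a quick list of people who recur across shows.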
💰 Pricing
Pay-Per-Event — you pay only when these events fire:
| Event | USD | What it is |
|---|---|---|
| `actor-start` | $0.05 | One-off warm-up charge per run |
| `result` | $0.005 | Per dataset item |
Example: 1,000 results at the rates above cost about $5.05 ($0.05 start charge + 1,000 × $0.005). No subscription, no minimum, no card to start: Apify gives every new account $5 of free credit. A typical PR firm researching 100 shows spends roughly $50, which corresponds to about 10,000 rows (around 100 rows per show), and gets back the hours their team would have spent doing it manually.
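To estimate a run before starting it, the two event prices can be combined in a few lines. `run_cost_usd` is a hypothetical helper; it works in thousandths of a dollar to avoid floating-point drift.

```python
ACTOR_START_MILLS = 50  # $0.05 per run, in thousandths of a dollar
RESULT_MILLS = 5        # $0.005 per dataset item


def run_cost_usd(results: int, runs: int = 1) -> float:
    """Estimated charge for `results` dataset items spread over `runs` runs."""
    return (runs * ACTOR_START_MILLS + results * RESULT_MILLS) / 1000


print(run_cost_usd(1000))  # 5.05
```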
🚧 Limitations
Metadata-only: no audio, no transcripts. Guest extraction quality depends on episode description content — shows with sparse or non-English descriptions yield lower recall. Spotify catalog endpoints occasionally return 404 for regionally restricted episodes; these are logged and skipped. The spaCy en_core_web_sm NER model covers English well; multi-language support is planned for v2. Confidence scores are heuristic — filter on confidence >= 0.8 for high-precision use cases.
❓ FAQ
Why do I need a Spotify client_id and client_secret?
Spotify's Web API requires OAuth 2.0 — even for public catalog data. Creating a free app on the Spotify Developer Dashboard takes about 60 seconds: https://developer.spotify.com/dashboard. The Actor uses the client_credentials grant, which gives it read-only access to the public catalog and never touches any user account. Your credentials stay encrypted in Apify Secrets.
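For reference, a client-credentials token request looks roughly like this with the Python standard library. `build_token_request` is a hypothetical helper that only constructs the request; nothing is sent until you pass it to `urllib.request.urlopen`.

```python
import base64
import urllib.parse
import urllib.request

TOKEN_URL = "https://accounts.spotify.com/api/token"


def build_token_request(client_id: str, client_secret: str) -> urllib.request.Request:
    """Build the POST for the client_credentials grant (HTTP Basic auth, form body)."""
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode()
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
```

The JSON response carries an `access_token` that is then sent as `Authorization: Bearer <token>` on catalog requests. No user login or scopes are involved in this grant.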
Are episode transcripts included?
No. Spotify does not expose transcripts via the public Web API; downloading episode audio would violate Spotify's Terms of Service. This Actor is metadata-only — episode title, description, release date, plus NLP-extracted guest names from the description text.
How accurate is the guest extraction?
The Actor first sweeps regex patterns common in podcast show notes (with guest X, interview with X, #412 — X, etc.) at confidence 0.85–1.00. It then falls back to spaCy's en_core_web_sm PERSON entity recognizer at confidence 0.55–0.75. Each row carries its confidence score — filter on confidence >= 0.8 for high-precision research.
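The two-tier idea can be illustrated with a small sketch. Everything here is an assumption made for illustration: the three patterns, their scores, the fixed 0.6 score for NER hits, and the choice to call NER only when the regexes find nothing are not taken from the Actor's source. The `ner` argument is a hypothetical callable that stands in for a spaCy PERSON-entity pass.

```python
import re
from typing import Callable, Iterable, List, Optional, Tuple

# One capitalised word followed by one or two more, e.g. "Jane Doe".
NAME = r"([A-Z][\w'-]+(?: [A-Z][\w'-]+){1,2})"

# Hypothetical patterns and scores inside the documented 0.85-1.00 band.
PATTERNS = [
    (re.compile(r"\b[Ww]ith guest " + NAME), 0.95),
    (re.compile(r"\b[Ii]nterview with " + NAME), 0.90),
    (re.compile(r"^#\d+\s*[\u2014\u2013-]\s*" + NAME), 0.85),  # episode-number prefix in the title
]


def extract_guests(
    title: str,
    description: str,
    ner: Optional[Callable[[str], Iterable[str]]] = None,
) -> List[Tuple[str, float]]:
    """Return (name, confidence) pairs, highest confidence first."""
    text = f"{title}\n{description}"
    found = {}
    for pattern, score in PATTERNS:
        for match in pattern.finditer(text):
            name = match.group(1)
            found[name] = max(found.get(name, 0.0), score)
    if not found and ner is not None:
        for name in ner(text):
            found.setdefault(name, 0.6)  # assumed score inside the 0.55-0.75 band
    return sorted(found.items(), key=lambda kv: (-kv[1], kv[0]))
```

Keeping the regex tier separate from the NER tier is what makes the confidence bands meaningful: a pattern hit says the text explicitly framed the person as a guest, while a bare entity only says a person was mentioned.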
Can I export a podcast guest list to CSV or Excel?
Yes. Once the run completes, go to Storage → Dataset in the Apify Console and click Export. JSON, CSV, and Excel are all supported. You can also fetch the full dataset via the Apify API with your token — useful for piping directly into Clay, Apollo, or your CRM.
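As a sketch of the API route, the request below targets the dataset items endpoint with a `format` parameter and an optional `fields` filter. `dataset_items_request` is a hypothetical helper, and `DATASET_ID` and `TOKEN` in the usage line are placeholders for your own values; check the Apify API reference for the full parameter list.

```python
import urllib.parse
import urllib.request
from typing import Optional, Sequence

APIFY_BASE = "https://api.apify.com/v2"


def dataset_items_request(
    dataset_id: str,
    token: str,
    fmt: str = "csv",
    fields: Optional[Sequence[str]] = None,
) -> urllib.request.Request:
    """Build a GET for a dataset's items in the chosen export format."""
    params = {"format": fmt}
    if fields:
        params["fields"] = ",".join(fields)  # only export these columns
    url = f"{APIFY_BASE}/datasets/{dataset_id}/items?{urllib.parse.urlencode(params)}"
    # The token travels in a header so it does not end up in logged URLs.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


req = dataset_items_request("DATASET_ID", "TOKEN", fields=["guest_name", "show_name"])
print(req.full_url)
```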
What if a show is in a language other than English?
Regex patterns are English-only in v1. The spaCy NER fallback still surfaces PERSON entities but with reduced accuracy on non-English text. Multi-language NLP is planned for v2 — open an issue on the Actor's Issues tab with your language to upvote.
Why is one row sometimes emitted with guest_name: null?
If no guest can be extracted (solo episode or sparse description), the Actor still emits a single host-role row so the episode metadata is preserved in your dataset — useful for downstream joins. The confidence field is 0.0 in this case.
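The row-emission rule can be pictured like this. It is a simplified sketch with only a few of the output fields; `rows_for_episode` is a hypothetical name, and `episode` is assumed to be a dict with `id` and `name` keys.

```python
from typing import List, Sequence, Tuple


def rows_for_episode(episode: dict, guests: Sequence[Tuple[str, float]]) -> List[dict]:
    """One row per guest, or a single host-role row when no guest was found."""
    base = {"episode_id": episode["id"], "episode_name": episode["name"]}
    if not guests:
        # Fallback row keeps the episode in the dataset for later joins.
        return [{**base, "guest_name": None, "guest_role": "host", "confidence": 0.0}]
    return [
        {**base, "guest_name": name, "guest_role": "guest", "confidence": score}
        for name, score in guests
    ]
```

Because every episode produces at least one row, counting distinct `episode_id` values gives the number of episodes scanned, while filtering out rows with a null `guest_name` gives the guest appearances.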
How large can a podcast guest database run get?
The hard cap is 200 episodes per show. A show with 200 episodes averaging 2–3 guests per episode yields roughly 400–600 rows, which costs about $2.00–$3.00 in result events plus the $0.05 start charge. For multi-show runs, results stream in real time so you can monitor progress and stop early if needed.
💬 Your feedback
Spotted a bug, hit a weird edge case, or need a new field? Open an issue on the Actor's Issues tab on Apify Console — we ship fixes weekly and we read every report.