Biospace Jobs Scraper
Pricing: Pay per usage
Developer: Saad Belcaid
BioSpace Jobs — The Niche Biotech Board Big Lead-Gen Misses
Built for Jean (SSM) by Saad Belcaid.
This scraper finds biotech and pharma hiring managers who post on BioSpace.com — the specialist board most lead-gen tools ignore because it doesn't show up in Google Jobs aggregation.
If you sell to biotech (CRO services, biostats, regulatory consulting, recruitment, lab services, software, equipment), the companies hiring on BioSpace are the ones nobody else is calling.
The dumb-simple version
Big lead-gen tools (the ones every SDR spams) feed off Google Jobs. Google Jobs aggregates Indeed, LinkedIn, Glassdoor — but not specialist boards like BioSpace. So when a small biotech posts only on BioSpace, every lead-gen agency in your inbox misses them. You don't.
This scraper:
- Reads BioSpace's job board (cookie-paginated, no scraping fragility — Madgex platform exposes job IDs cleanly)
- Pulls the structured `JobPosting` data each listing publishes (title, company, location, description, dates)
- Tags every listing with three classifiers: therapeutic area, drug modality, job function
- Calculates hiring urgency from how long the role has been open
- Optionally enriches each company's size via Apollo's free people-search endpoint
- Outputs one row per listing with a one-line signal you can paste into a CRM
UK and US listings. No browser. No proxies. Just JSON-LD.
Why hiring = buying
A biotech hiring a VP of Clinical Operations is buying CRO services within 90 days. A biotech hiring a Head of Regulatory is about to file an IND or BLA. A biotech hiring Head of CMC is scaling manufacturing — they need contract manufacturers, quality consultants, process equipment.
Every job posting is a demand signal with a 60-180 day decision window. Catch them while they're staffing up, before the role closes and the budget goes elsewhere.
Read this before you run anything
Don't scrape it all in one go. Run it daily.
Yes, you can set maxJobs: 5000 and pull all 3,000+ active BioSpace listings in one run. Don't. Here's why:
A biotech hiring manager who posted a role today picks up the phone. One who posted 30 days ago has either filled the role, given up, or already heard from every other agency. The job's age is the dial.
| Days since the role was posted | What's happening |
|---|---|
| Day 0–2 (fresh) | Hiring manager is excited. Calling now beats every competitor by 2 weeks. |
| Day 3–13 (normal) | Normal pipeline cadence. |
| Day 14–29 (high) | Recruiter is starting to sweat. Now they need help. |
| Day 30+ (critical) | They CAN'T fill it. They will pay anyone who can. Hot. |
| Day 60+ | Either filled or dead. Don't waste outreach. |
Math: a daily drip delivers roughly 10× the lead value of a quarterly bulk dump.
200 fresh leads/day × 30 days = 6,000 leads, every single one in its hottest 2-week window. One bulk dump every quarter = 3,000 leads with mixed ages, half of them already cold. Daily wins.
There's exactly one reason to do a big bulk run: the very first time, to map the universe once and seed your CRM. After that, daily.
Max-value playbook
THE main loop — daily drip (set this up first)
    maxJobs: 200
    maxDaysOld: 7
    politeDelayMs: 300
    # leave filters empty for full coverage; narrow in the dataset
Schedule it: cron `0 9 * * 1-5` (9am UTC, Mon–Fri). Apify will run it every weekday morning. Each run pushes 100–200 fresh-this-week biotech listings into your dataset (`maxJobs` caps it at 200). Your CRM imports them. You call them while they're hot.
That's it. That's the system.
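Because every weekday run looks back `maxDaysOld: 7` days, consecutive runs overlap and the same listing can land in several datasets. A minimal sketch of de-duplicating before CRM import, assuming rows are dicts shaped like the output table and that `apply_url` stays stable for a given listing (an assumption; the BioSpace job ID is not an output field). `new_rows` is a hypothetical helper, and `seen_urls` stands in for whatever your CRM already holds.

```python
from typing import Iterable


def new_rows(rows: Iterable[dict], seen_urls: set[str]) -> list[dict]:
    """Return rows whose apply_url has not been imported yet.

    Assumes apply_url uniquely identifies a listing across runs (hypothetical
    choice of key). Mutates seen_urls so repeated calls across runs stay idempotent.
    """
    fresh: list[dict] = []
    for row in rows:
        url = row.get("apply_url")
        if not url or url in seen_urls:
            continue  # skip rows with no URL and listings already imported
        seen_urls.add(url)
        fresh.append(row)
    return fresh


if __name__ == "__main__":
    seen: set[str] = set()
    monday = [{"apply_url": "https://example.com/a"}, {"apply_url": "https://example.com/b"}]
    tuesday = [{"apply_url": "https://example.com/b"}, {"apply_url": "https://example.com/c"}]
    print(len(new_rows(monday, seen)))   # 2
    print(len(new_rows(tuesday, seen)))  # 1 (b was already imported on Monday)
```

Rows without an `apply_url` are dropped here; if your data has them, keying on `(company_name, job_title)` is a reasonable fallback.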
Triage in your CRM (or directly in the dataset)
| If you sell... | Filter rows by... |
|---|---|
| CRO / clinical trial services | function_bucket = clinical_operations AND hiring_urgency ∈ {high, critical} |
| Regulatory consulting | function_bucket = regulatory_affairs |
| Biostats / data services | function_bucket ∈ {biostatistics, data_bioinformatics} |
| Manufacturing / CDMO | function_bucket = manufacturing_cmc |
| Recruitment / executive search | hiring_urgency = critical (open 30+ days = stuck = will pay) |
| Lab equipment / reagents | function_bucket = research AND modality ∈ {gene_therapy, cell_therapy, mrna_lnp} |
| Tax / legal / corp services | function_bucket ∈ {finance_legal, business_development} |
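The triage table maps directly onto predicates over dataset rows. A minimal sketch, assuming rows are plain dicts keyed by the output field names (`function_bucket`, `hiring_urgency`, `modality`); the keys of `TRIAGE` and the `triage` helper are hypothetical names for the table's "If you sell..." column.

```python
from typing import Callable

Row = dict
Rule = Callable[[Row], bool]

# One predicate per row of the triage table; offer names are hypothetical.
TRIAGE: dict[str, Rule] = {
    "cro_clinical": lambda r: (
        r.get("function_bucket") == "clinical_operations"
        and r.get("hiring_urgency") in {"high", "critical"}
    ),
    "regulatory": lambda r: r.get("function_bucket") == "regulatory_affairs",
    "biostats_data": lambda r: r.get("function_bucket") in {"biostatistics", "data_bioinformatics"},
    "manufacturing_cdmo": lambda r: r.get("function_bucket") == "manufacturing_cmc",
    "recruitment": lambda r: r.get("hiring_urgency") == "critical",
    "lab_equipment": lambda r: (
        r.get("function_bucket") == "research"
        and r.get("modality") in {"gene_therapy", "cell_therapy", "mrna_lnp"}
    ),
    "corp_services": lambda r: r.get("function_bucket") in {"finance_legal", "business_development"},
}


def triage(rows: list[Row], offer: str) -> list[Row]:
    """Keep only the rows that match the rule for what you sell."""
    rule = TRIAGE[offer]
    return [r for r in rows if rule(r)]
```

For example, `triage(rows, "cro_clinical")` keeps clinical-operations roles that are already 14+ days old. Keeping rules as data makes it easy to add your own offer without touching the filtering code.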
Optional — one-time bulk seed
Only do this once, on day zero, to backfill your CRM with the existing universe. Then never again — the daily loop takes over.
    maxJobs: 5000
    maxDaysOld: 30
    politeDelayMs: 300
Wait ~45 min. Expect 2,500–3,500 rows. Import to CRM. Then enable the daily schedule and forget this exists.
Filter recipes (copy-paste)
"I want oncology biotech hiring this week"
    therapeuticAreas: ["oncology"]
    maxDaysOld: 7
    maxJobs: 300
"I want gene/cell therapy companies hiring at any function"
    modalities: ["gene_therapy", "cell_therapy", "mrna_lnp"]
    maxJobs: 500
"I want struggling-to-fill clinical ops roles (sell CROs)"
    functionBuckets: ["clinical_operations"]
    maxDaysOld: 60
    # After: filter dataset by hiring_urgency = critical (30+ days open)
"I want UK biotech only"
    location: "United Kingdom"
    maxJobs: 500
"I want commercial/BD roles at antibody companies (deal flow)"
    modalities: ["antibody_biologic"]
    functionBuckets: ["business_development", "commercial"]
"I want CMC / manufacturing struggles (sell to operations)"
    functionBuckets: ["manufacturing_cmc", "quality"]
    maxDaysOld: 45
What each row tells you
| Field | Example |
|---|---|
| job_title | VP, Clinical Operations |
| company_name | Acme Therapeutics |
| company_size | 51-200 (with Apollo key) |
| company_employee_count | estimated headcount (with Apollo key) |
| company_domain | acmetx.com (with Apollo key) |
| therapeutic_area | oncology |
| modality | antibody_biologic |
| function_bucket | clinical_operations |
| location | Cambridge, Massachusetts, US |
| locality, region, country, postal_code | breakdown for filtering |
| remote | true / false |
| employment_type | FULL_TIME |
| salary_min, salary_max, salary_currency | when listed |
| date_posted, valid_through | ISO dates |
| days_listed | 14 |
| hiring_urgency | fresh / normal / high / critical |
| description_text | full plain-text job description |
| apply_url | direct link to apply |
| signal | "Acme Therapeutics (antibody biologic, oncology) hiring clinical operations (VP, Clinical Operations) — open 14 days, struggling to fill" |
| scraped_at | ISO timestamp |
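Rows are flat, so getting them into a CRM is a CSV export away. A minimal stdlib sketch, assuming you already hold the dataset items as a list of dicts (for example downloaded as JSON from the run's dataset); `CRM_COLUMNS` is a hypothetical starting selection, and fields missing from a row are written as blanks.

```python
import csv
import io

# Hypothetical column choice; any field from the output table can be added.
CRM_COLUMNS = ["company_name", "company_domain", "job_title", "function_bucket",
               "hiring_urgency", "days_listed", "apply_url", "signal"]


def rows_to_csv(rows: list[dict], columns: list[str] = CRM_COLUMNS) -> str:
    """Serialise dataset rows to CSV text, keeping only the chosen columns."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops fields not in columns; restval fills missing ones.
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore", restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

The `csv` module handles quoting, so titles such as "VP, Clinical Operations" survive the round trip intact.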
Therapeutic areas (the dial for who they treat)
| Label | Catches |
|---|---|
| oncology | cancer, tumor, leukemia, lymphoma, melanoma, metastatic |
| immunology | autoimmune, lupus, allergy, asthma, atopic, psoriasis |
| neurology_cns | Alzheimer, Parkinson, ALS, MS, depression, epilepsy |
| cardiovascular | heart failure, coronary, thrombosis, hypertension |
| metabolic | diabetes, obesity, NASH, fatty liver, endocrine |
| rare_disease | orphan drug, ultra-rare |
| ophthalmology | retina, macular, glaucoma, vision |
| gastroenterology | Crohn, ulcerative colitis, IBD |
| respiratory | COPD, pulmonary |
| dermatology | skin disease |
| infectious | antiviral, antibiotic, HIV, hepatitis, COVID |
| womens_health | gynecology, fertility, menopause |
| pediatric | pediatric, neonatal |
| hematology | sickle cell, hemophilia, thalassemia |
| nephrology | renal, kidney, dialysis |
| urology | urology |
Modalities (the dial for what they make)
| Label | Catches |
|---|---|
| mrna_lnp | mRNA, lipid nanoparticle |
| gene_therapy | AAV, CRISPR, gene editing, viral vector |
| cell_therapy | CAR-T, CAR-NK, allogeneic, autologous, TCR-T |
| antibody_biologic | mAb, bispecific, ADC, biosimilar |
| vaccine | vaccine, immunization, adjuvant |
| small_molecule | small molecule, kinase inhibitor, PROTAC |
| oligonucleotide | antisense, siRNA, ASO |
| medical_device | wearable, implantable |
| diagnostic | IVD, companion diagnostic, liquid biopsy |
| digital_health | digital therapeutic, SaMD |
Function buckets (the dial for who you sell to)
| Label | Catches |
|---|---|
| clinical_operations | clinical trial, CRA, study manager |
| regulatory_affairs | RA, FDA submission, IND, BLA, NDA |
| medical_affairs | medical affairs, MSL |
| biostatistics | biostat, SAS programmer |
| pharmacovigilance | PV, drug safety |
| quality | QA, QC, GxP, GMP |
| manufacturing_cmc | CMC, process development, fermentation |
| research | research scientist, PI, discovery, preclinical |
| business_development | BD, licensing, alliance management |
| commercial | sales, marketing, market access, brand manager |
| finance_legal | CFO, general counsel, IP counsel |
| data_bioinformatics | bioinformatics, computational biology, ML |
| hr_talent | talent acquisition, HR, recruiter |
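The three tables above are keyword rules, and the scraper describes classification as a deterministic keyword match on title plus description. The sketch below shows one way such a matcher can work, assuming prefix matching from a word boundary, case-sensitive matching for all-caps acronyms, and "first label in table order wins" when several match. All three are assumptions, and `FUNCTION_RULES` is an abbreviated excerpt of the function-bucket table, not the curated lists the actor ships.

```python
import re
from typing import Optional

# Abbreviated excerpt of the function-bucket table (hypothetical subset).
FUNCTION_RULES: list[tuple[str, list[str]]] = [
    ("clinical_operations", ["clinical trial", "CRA", "study manager"]),
    ("regulatory_affairs", ["RA", "FDA submission", "IND", "BLA", "NDA"]),
    ("biostatistics", ["biostat", "SAS programmer"]),
    ("manufacturing_cmc", ["CMC", "process development", "fermentation"]),
    ("research", ["research scientist", "discovery", "preclinical"]),
]


def _pattern(keyword: str) -> re.Pattern:
    # Match from a word boundary onward (prefix match), so "biostat" also hits
    # "Biostatistician". All-caps acronyms are case-sensitive so that "IND"
    # does not fire on ordinary words such as "industry".
    flags = 0 if keyword.isupper() else re.IGNORECASE
    return re.compile(r"\b" + re.escape(keyword), flags)


_COMPILED = [(label, [_pattern(k) for k in kws]) for label, kws in FUNCTION_RULES]


def classify_function(title: str, description: str) -> Optional[str]:
    """Return the first bucket (in table order) with any keyword hit, else None."""
    text = f"{title}\n{description}"
    for label, patterns in _COMPILED:
        if any(p.search(text) for p in patterns):
            return label
    return None
```

The same shape works for therapeutic areas and modalities by swapping in their tables. Precompiling the patterns once keeps the per-listing cost to a handful of regex searches.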
Hiring urgency
| Urgency | Days listed | What it means |
|---|---|---|
| fresh | 0–2 | Just posted: ride the energy |
| normal | 3–13 | Active search |
| high | 14–29 | Recruiter struggling |
| critical | 30+ | They CAN'T fill this role. Call them today |
critical rows are the ones to call first. A 30-day-open VP-level biotech role means the hiring manager is in pain. They will pick up your phone.
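The urgency thresholds translate into a few comparisons. A minimal sketch, assuming `date_posted` is an ISO date or timestamp string as the output table describes and that days are counted in whole calendar days, ignoring time zones; how the actor itself rounds is not documented here, and both function names are hypothetical.

```python
from datetime import date


def days_listed(date_posted: str, today: date) -> int:
    """Whole days between the ISO date_posted and today (never negative)."""
    posted = date.fromisoformat(date_posted[:10])  # keep only YYYY-MM-DD of a timestamp
    return max((today - posted).days, 0)


def hiring_urgency(days: int) -> str:
    """Map days listed onto the thresholds in the urgency table."""
    if days <= 2:
        return "fresh"
    if days <= 13:
        return "normal"
    if days <= 29:
        return "high"
    return "critical"
```

For example, a role posted on 2024-05-01 and checked on 2024-05-15 is 14 days old, which lands in `high`.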
Input options
| Field | Default | Description |
|---|---|---|
searchKeywords | "" | Free-text BioSpace search. e.g. "oncology", "CMC" |
location | "" | Location filter. e.g. "Cambridge, MA" |
maxJobs | 200 | Cap on detail pages fetched per run |
maxDaysOld | 14 | Drop listings older than N days. 0 = no age filter |
therapeuticAreas | [] | Restrict to e.g. ["oncology","immunology"] |
modalities | [] | Restrict to e.g. ["gene_therapy","mrna_lnp"] |
functionBuckets | [] | Restrict to e.g. ["clinical_operations"] |
apolloApiKey | (empty) | Optional. Free Apollo key for company size enrichment |
politeDelayMs | 400 | Throttle between detail fetches |
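If you start runs programmatically rather than from the console, it helps to assemble the input with the documented defaults filled in and a few sanity checks. A minimal sketch: the defaults are copied from the table, the empty string for `apolloApiKey` is an assumption, and `build_run_input` plus its checks are hypothetical. The actor's own input schema remains the authority on what it accepts.

```python
from typing import Any

# Defaults copied from the input-options table ("(empty)" assumed to mean "").
DEFAULTS: dict[str, Any] = {
    "searchKeywords": "",
    "location": "",
    "maxJobs": 200,
    "maxDaysOld": 14,
    "therapeuticAreas": [],
    "modalities": [],
    "functionBuckets": [],
    "apolloApiKey": "",
    "politeDelayMs": 400,
}


def build_run_input(**overrides: Any) -> dict[str, Any]:
    """Merge overrides onto the defaults and reject obviously bad values."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown input field(s): {sorted(unknown)}")
    # Copy list defaults so callers cannot mutate DEFAULTS by accident.
    run_input = {k: (list(v) if isinstance(v, list) else v) for k, v in DEFAULTS.items()}
    run_input.update(overrides)
    if run_input["maxJobs"] <= 0:
        raise ValueError("maxJobs must be positive")
    if run_input["maxDaysOld"] < 0:
        raise ValueError("maxDaysOld must be >= 0 (0 disables the age filter)")
    if run_input["politeDelayMs"] < 0:
        raise ValueError("politeDelayMs must be >= 0")
    return run_input
```

`build_run_input(maxJobs=200, maxDaysOld=7, politeDelayMs=300)` reproduces the daily-drip config. The resulting dict is what you would pass as `run_input` to `ApifyClient(token).actor(...).call(run_input=...)` from the `apify-client` package, with your own actor ID filled in.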
How it works under the hood
- Search: hits `https://jobs.biospace.com/jobs/?Keywords=X&Page=N`. Each response sets a cookie `JobSearchResultIds=ID1|ID2|…|ID20` with the IDs on that page. We read the cookie, so no HTML scraping is needed for the listing page.
- Detail: each `/job/{ID}` page embeds a `<script type="application/ld+json">` block with the full schema.org `JobPosting` (title, company, location, description, dates, employment type, salary). We parse that; no DOM scraping.
- Classification: deterministic keyword match on title + description. Each rule's keyword list is curated. No AI in the hot path.
- Enrichment: if you pass an Apollo key, we hit `mixed_people/search` (the free Apollo endpoint, 0 credits) and read `organization.estimated_num_employees`. Results are cached per company in memory, so there are no duplicate calls.
- Polite throttling: 400 ms default delay between detail fetches, tunable via `politeDelayMs`.
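Both the search and detail steps come down to plain parsing. A minimal stdlib sketch, assuming the relevant `Set-Cookie` header arrives as a single string such as `JobSearchResultIds=101|102|103; path=/` (possibly percent-encoded) and that the JSON-LD block holds either one object or a list of objects. HTTP fetching, retries, `@graph` wrappers and the `politeDelayMs` sleep are left out, and the function and class names are hypothetical.

```python
import json
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import unquote


def ids_from_set_cookie(header: str, name: str = "JobSearchResultIds") -> list[str]:
    """Pull the pipe-separated job IDs out of one Set-Cookie header value."""
    first = header.split(";", 1)[0]  # drop attributes such as path=/ or expires=
    key, sep, value = first.partition("=")
    if not sep or key.strip() != name:
        return []
    return [i for i in unquote(value.strip()).split("|") if i]


class _LdJsonCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> element."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[str] = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data  # script text may arrive in several chunks


def job_posting_from_html(html: str) -> Optional[dict]:
    """Return the first schema.org JobPosting object embedded in the page, if any."""
    collector = _LdJsonCollector()
    collector.feed(html)
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip a malformed block rather than failing the whole page
        for obj in data if isinstance(data, list) else [data]:
            if isinstance(obj, dict) and obj.get("@type") == "JobPosting":
                return obj
    return None
```

Reading structured data this way is what makes the approach robust: a page redesign that moves HTML around does not change the cookie or the JSON-LD payload.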
Apollo enrichment (optional but worth it)
Add a free Apollo key to enrich every row with company_size, company_domain, company_employee_count. The endpoint we use (mixed_people/search) costs 0 credits — Apollo's free tier is more than enough for tens of thousands of company lookups.
Sign up at apollo.io → Settings → API Keys → paste in the input.
Without it: every row still has therapeutic area, modality, function, urgency, and the signal sentence. Apollo just adds size and domain.
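The per-company cache and the size column can be sketched without calling Apollo at all. A minimal sketch, assuming lookups are keyed by a company domain or name and that a hypothetical `fetch` callable returns a dict carrying an `organization` object with `estimated_num_employees`, as described above. The band edges other than `51-200` (the one shown in the example row) are LinkedIn-style assumptions, and no real Apollo request is made here.

```python
from typing import Callable, Optional

# Hypothetical band edges; only "51-200" is confirmed by the example output row.
BANDS = [(10, "1-10"), (50, "11-50"), (200, "51-200"), (500, "201-500"),
         (1000, "501-1000"), (5000, "1001-5000"), (10000, "5001-10000")]


def size_band(employees: Optional[int]) -> Optional[str]:
    """Bucket an employee count into a company_size label."""
    if employees is None:
        return None
    for upper, label in BANDS:
        if employees <= upper:
            return label
    return "10001+"


class CompanySizeCache:
    """Memoise one lookup per company so repeated listings cost no extra calls."""

    def __init__(self, fetch: Callable[[str], dict]) -> None:
        self._fetch = fetch  # hypothetical stand-in for the real Apollo request
        self._cache: dict[str, Optional[int]] = {}
        self.calls = 0

    def employees(self, company_key: str) -> Optional[int]:
        key = company_key.strip().lower()  # normalise so "AcmeTx.com " hits the same entry
        if key not in self._cache:
            self.calls += 1
            org = self._fetch(key).get("organization") or {}
            self._cache[key] = org.get("estimated_num_employees")
        return self._cache[key]
```

Misses are cached too (stored as `None`), so a company Apollo doesn't know is looked up only once per run.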
Costs
- BioSpace: free (public job board, polite scraping)
- Apollo (optional): free tier is enough
- Apify compute: ~5–15 min per 200-job run at 1024 MB ≈ 0.08–0.25 compute units (1 CU = 1 GB of memory for 1 hour), i.e. trivial
Connector OS Station integration
Pipe the dataset into Station as the demand side. Each row's signal field plugs straight into the I Layer for evaluation against your supply network (CROs, regulatory consultants, biostats firms, etc.).
Flow: scrape → dataset → paste dataset ID into Station → match against your supply → scored introductions.
Built by Saad Belcaid for Jean's SSM workflow. Data sourced from BioSpace.com (public listings) and Apollo's free People Search endpoint. Polite, no proxies, no DOM brittleness.