Pricing

Pay per usage

BioSpace Jobs Scraper

This scraper finds biotech and pharma hiring managers who post on BioSpace.com — the specialist board most lead-gen tools ignore because it doesn't show up in Google Jobs aggregation.

Developer

Saad Belcaid

Last modified: 2 months ago

BioSpace Jobs — The Niche Biotech Board Big Lead-Gen Misses

Built for Jean (SSM) by Saad Belcaid.

If you sell to biotech (CRO services, biostats, regulatory consulting, recruitment, lab services, software, equipment), the companies hiring on BioSpace are the ones nobody else is calling.

The dumb-simple version

Big lead-gen tools (the ones every SDR uses to spam you) feed off Google Jobs. Google Jobs aggregates Indeed, LinkedIn, and Glassdoor, but not specialist boards like BioSpace. So when a small biotech posts only on BioSpace, every lead-gen agency in your inbox misses them. You don't.

This scraper:

Reads BioSpace's job board (cookie-paginated, no scraping fragility — Madgex platform exposes job IDs cleanly)
Pulls the structured JobPosting data each listing publishes (title, company, location, description, dates)
Tags every listing with three classifiers — therapeutic area, drug modality, job function
Calculates hiring urgency from how long the role has been open
Optionally enriches each company's size via Apollo's free people-search endpoint
Outputs one row per listing with a one-line signal you can paste into a CRM

UK and US listings. No browser. No proxies. Just JSON-LD.

Why hiring = buying

A biotech hiring a VP of Clinical Operations is buying CRO services within 90 days. A biotech hiring a Head of Regulatory is about to file an IND or BLA. A biotech hiring Head of CMC is scaling manufacturing — they need contract manufacturers, quality consultants, process equipment.

Every job posting is a demand signal with a 60-180 day decision window. Catch them while they're staffing up, before the role closes and the budget goes elsewhere.

Read this before you run anything

Don't scrape it all in one go. Run it daily.

Yes, you can set maxJobs: 5000 and pull all 3,000+ active BioSpace listings in one run. Don't. Here's why:

A biotech hiring manager who posted a role today picks up the phone. One who posted 60 days ago has either filled the role, given up, or already heard from every other agency. The job's age is the dial.

Days since posting	What's happening
Day 0–2 (`fresh`)	Hiring manager is excited. Calling now beats every competitor by 2 weeks.
Day 3–13 (`normal`)	Normal pipeline cadence.
Day 14–29 (`high`)	Recruiter is starting to sweat. Now they need help.
Day 30–59 (`critical`)	They CAN'T fill it. They will pay anyone who can. Hot.
Day 60+ (still tagged `critical`)	Either filled or dead. Don't waste outreach.

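Rough math: a daily drip delivers on the order of 10× the lead value of a quarterly bulk dump.

At the 200-per-run cap, 200 fresh leads/day × 30 days = 6,000 leads a month, or about 18,000 a quarter, every single one in its hottest 2-week window. One bulk dump every quarter = 3,000 leads with mixed ages, half of them already cold, so about 1,500 worth calling. Even on a weekday-only schedule (about 13,000 a quarter), that is roughly 10×. Daily wins.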

There's exactly one reason to do a big bulk run: the very first time, to map the universe once and seed your CRM. After that, daily.

Max-value playbook

THE main loop — daily drip (set this up first)

maxJobs: 200
maxDaysOld: 7
politeDelayMs: 300
# leave filters empty for full coverage; narrow in the dataset

Schedule it: cron 0 9 * * 1-5 (9am UTC, Mon–Fri). Apify will run it every weekday morning. Each run pushes up to 200 fresh-this-week biotech listings (the maxJobs cap) into your dataset. Your CRM imports them. You call them while they're hot.

That's it. That's the system.
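
One thing to watch with the daily loop: with maxDaysOld: 7, a listing posted on Monday is still inside the window on Tuesday, so consecutive runs can return the same row. This page doesn't say whether the actor deduplicates across runs, so here is a minimal sketch of doing it at import time. It assumes rows arrive as dicts and uses `apply_url` as the key; `new_rows` and `seen_urls` are hypothetical names, not part of the actor.

```python
from typing import Iterable, Iterator


def new_rows(rows: Iterable[dict], seen_urls: set[str]) -> Iterator[dict]:
    """Yield only rows whose apply_url has not been imported before.

    `seen_urls` is a set you persist between runs (a file, a CRM lookup,
    a key-value store). It is updated in place.
    """
    for row in rows:
        url = row.get("apply_url")
        if not url or url in seen_urls:
            continue  # skip rows without a key and rows already imported
        seen_urls.add(url)
        yield row


# Two consecutive runs with one overlapping listing (made-up data).
monday = [{"apply_url": "https://example.com/job/1", "job_title": "VP, Clinical Operations"}]
tuesday = monday + [{"apply_url": "https://example.com/job/2", "job_title": "Head of CMC"}]

seen: set[str] = set()
monday_new = list(new_rows(monday, seen))
tuesday_new = list(new_rows(tuesday, seen))  # only the Head of CMC row is new
```

Keying on `apply_url` is a judgment call; any stable per-listing identifier would do the same job.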

Triage in your CRM (or directly in the dataset)

If you sell...	Filter rows by...
CRO / clinical trial services	`function_bucket = clinical_operations` AND `hiring_urgency ∈ {high, critical}`
Regulatory consulting	`function_bucket = regulatory_affairs`
Biostats / data services	`function_bucket ∈ {biostatistics, data_bioinformatics}`
Manufacturing / CDMO	`function_bucket = manufacturing_cmc`
Recruitment / executive search	`hiring_urgency = critical` (open 30+ days = stuck = will pay)
Lab equipment / reagents	`function_bucket = research` AND `modality ∈ {gene_therapy, cell_therapy, mrna_lnp}`
Tax / legal / corp services	`function_bucket ∈ {finance_legal, business_development}`
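
If you'd rather triage in code than in a CRM view, the table above boils down to a set-membership check per field. A minimal sketch, assuming each dataset row is a dict and that `function_bucket`, `hiring_urgency`, and `modality` each hold a single string (the `matches` helper is hypothetical):

```python
from typing import Collection


def matches(row: dict,
            function_buckets: Collection[str] = (),
            urgencies: Collection[str] = (),
            modalities: Collection[str] = ()) -> bool:
    """Return True if the row passes every non-empty filter.

    An empty filter means "no restriction" on that field.
    """
    checks = (
        ("function_bucket", function_buckets),
        ("hiring_urgency", urgencies),
        ("modality", modalities),
    )
    return all(not allowed or row.get(field) in allowed for field, allowed in checks)


# Made-up rows for illustration.
rows = [
    {"company_name": "Acme Therapeutics", "function_bucket": "clinical_operations",
     "hiring_urgency": "high", "modality": "antibody_biologic"},
    {"company_name": "Beta Bio", "function_bucket": "clinical_operations",
     "hiring_urgency": "fresh", "modality": "cell_therapy"},
    {"company_name": "Gamma Gene", "function_bucket": "research",
     "hiring_urgency": "normal", "modality": "gene_therapy"},
]

# "CRO / clinical trial services" recipe from the table.
cro_leads = [r for r in rows
             if matches(r, function_buckets={"clinical_operations"},
                        urgencies={"high", "critical"})]

# "Lab equipment / reagents" recipe from the table.
lab_leads = [r for r in rows
             if matches(r, function_buckets={"research"},
                        modalities={"gene_therapy", "cell_therapy", "mrna_lnp"})]
```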

Optional — one-time bulk seed

Only do this once, on day zero, to backfill your CRM with the existing universe. Then never again — the daily loop takes over.

maxJobs: 5000
maxDaysOld: 30
politeDelayMs: 300

Wait ~45 min. Expect 2,500–3,500 rows. Import to CRM. Then enable the daily schedule and forget this exists.

Filter recipes (copy-paste)

"I want oncology biotech hiring this week"

therapeuticAreas: ["oncology"]
maxDaysOld: 7
maxJobs: 300

"I want gene/cell therapy companies hiring at any function"

modalities: ["gene_therapy", "cell_therapy", "mrna_lnp"]
maxJobs: 500

"I want struggling-to-fill clinical ops roles (sell CROs)"

functionBuckets: ["clinical_operations"]
maxDaysOld: 60
# After: filter dataset by hiring_urgency = critical (30+ days open)

"I want UK biotech only"

location: "United Kingdom"
maxJobs: 500

"I want commercial/BD roles at antibody companies (deal flow)"

modalities: ["antibody_biologic"]
functionBuckets: ["business_development", "commercial"]

"I want CMC / manufacturing struggles (sell to operations)"

functionBuckets: ["manufacturing_cmc", "quality"]
maxDaysOld: 45

What each row tells you

Field	Example
`job_title`	VP, Clinical Operations
`company_name`	Acme Therapeutics
`company_size`	51-200 (with Apollo key)
`company_domain`	acmetx.com
`therapeutic_area`	oncology
`modality`	antibody_biologic
`function_bucket`	clinical_operations
`location`	Cambridge, Massachusetts, US
`locality`, `region`, `country`, `postal_code`	breakdown for filtering
`remote`	true / false
`employment_type`	FULL_TIME
`salary_min`, `salary_max`, `salary_currency`	when listed
`date_posted`, `valid_through`	ISO dates
`days_listed`	14
`hiring_urgency`	fresh / normal / high / critical
`description_text`	full plain-text job description
`apply_url`	direct link to apply
`signal`	"Acme Therapeutics (antibody biologic, oncology) hiring clinical operations (VP, Clinical Operations) — open 14 days, struggling to fill"
`scraped_at`	ISO timestamp

Therapeutic areas (the dial for who they treat)

Label	Catches
`oncology`	cancer, tumor, leukemia, lymphoma, melanoma, metastatic
`immunology`	autoimmune, lupus, allergy, asthma, atopic, psoriasis
`neurology_cns`	Alzheimer, Parkinson, ALS, MS, depression, epilepsy
`cardiovascular`	heart failure, coronary, thrombosis, hypertension
`metabolic`	diabetes, obesity, NASH, fatty liver, endocrine
`rare_disease`	orphan drug, ultra-rare
`ophthalmology`	retina, macular, glaucoma, vision
`gastroenterology`	Crohn, ulcerative colitis, IBD
`respiratory`	COPD, pulmonary
`dermatology`	skin disease
`infectious`	antiviral, antibiotic, HIV, hepatitis, COVID
`womens_health`	gynecology, fertility, menopause
`pediatric`	pediatric, neonatal
`hematology`	sickle cell, hemophilia, thalassemia
`nephrology`	renal, kidney, dialysis
`urology`	urology

Modalities (the dial for what they make)

Label	Catches
`mrna_lnp`	mRNA, lipid nanoparticle
`gene_therapy`	AAV, CRISPR, gene editing, viral vector
`cell_therapy`	CAR-T, CAR-NK, allogeneic, autologous, TCR-T
`antibody_biologic`	mAb, bispecific, ADC, biosimilar
`vaccine`	vaccine, immunization, adjuvant
`small_molecule`	small molecule, kinase inhibitor, PROTAC
`oligonucleotide`	antisense, siRNA, ASO
`medical_device`	wearable, implantable
`diagnostic`	IVD, companion diagnostic, liquid biopsy
`digital_health`	digital therapeutic, SaMD

Function buckets (the dial for who you sell to)

Label	Catches
`clinical_operations`	clinical trial, CRA, study manager
`regulatory_affairs`	RA, FDA submission, IND, BLA, NDA
`medical_affairs`	medical affairs, MSL
`biostatistics`	biostat, SAS programmer
`pharmacovigilance`	PV, drug safety
`quality`	QA, QC, GxP, GMP
`manufacturing_cmc`	CMC, process development, fermentation
`research`	research scientist, PI, discovery, preclinical
`business_development`	BD, licensing, alliance management
`commercial`	sales, marketing, market access, brand manager
`finance_legal`	CFO, general counsel, IP counsel
`data_bioinformatics`	bioinformatics, computational biology, ML
`hr_talent`	talent acquisition, HR, recruiter
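
To make the "Catches" columns concrete, here is a rough sketch of a deterministic keyword classifier over title plus description. It is not the actor's code: the keyword lists are small subsets copied from the tables above, and two choices are assumptions made for this example, namely that the first matching label wins and that all-caps acronyms are matched case-sensitively as whole words.

```python
import re
from typing import Optional

# Illustrative subsets of the "Catches" column; the actor's full lists are not shown on this page.
FUNCTION_KEYWORDS = {
    "clinical_operations": ["clinical trial", "CRA", "study manager"],
    "regulatory_affairs": ["FDA submission", "IND", "BLA", "NDA"],
    "biostatistics": ["biostat", "SAS programmer"],
}


def classify(text: str, rules: dict[str, list[str]]) -> Optional[str]:
    """Return the first label with a keyword found in the text, else None."""
    for label, keywords in rules.items():
        for kw in keywords:
            if kw.isupper():
                # Acronyms: whole word, case-sensitive, so "IND" does not
                # fire on "industry" or "independent".
                pattern, flags = rf"\b{re.escape(kw)}\b", 0
            else:
                # Phrases and stems: case-insensitive, anchored at a word start,
                # so "biostat" also catches "Biostatistician".
                pattern, flags = rf"\b{re.escape(kw)}", re.IGNORECASE
            if re.search(pattern, text, flags):
                return label
    return None


label = classify("VP, Clinical Operations. Oversee clinical trial delivery.", FUNCTION_KEYWORDS)
```

The same function works for therapeutic areas and modalities by swapping in a different rules dict.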

Hiring urgency

Urgency	Days listed	What it means
`fresh`	0–2	Just posted — ride the energy
`normal`	3–13	Active search
`high`	14–29	Recruiter struggling
`critical`	30+	They CAN'T fill this role — call them today

critical rows are the ones to call first. A 30-day-open VP-level biotech role means the hiring manager is in pain. They will pick up the phone.
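
The bucket boundaries in the Hiring urgency table translate directly into a small function. A sketch, assuming `date_posted` is an ISO string that may or may not carry a time component; the function names are made up for this example.

```python
from datetime import date


def days_listed(date_posted: str, today: date) -> int:
    """Whole days between an ISO date_posted value and today."""
    posted = date.fromisoformat(date_posted[:10])  # keep only YYYY-MM-DD if a time is attached
    return (today - posted).days


def hiring_urgency(days: int) -> str:
    """Map days listed to the urgency labels in the table above."""
    if days < 0:
        raise ValueError("days must be >= 0")
    if days <= 2:
        return "fresh"
    if days <= 13:
        return "normal"
    if days <= 29:
        return "high"
    return "critical"


age = days_listed("2024-01-01", date(2024, 1, 15))
urgency = hiring_urgency(age)
```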

Input options

Field	Default	Description
`searchKeywords`	`""`	Free-text BioSpace search. e.g. "oncology", "CMC"
`location`	`""`	Location filter. e.g. "Cambridge, MA"
`maxJobs`	`200`	Cap on detail pages fetched per run
`maxDaysOld`	`14`	Drop listings older than N days. 0 = no age filter
`therapeuticAreas`	`[]`	Restrict to e.g. `["oncology","immunology"]`
`modalities`	`[]`	Restrict to e.g. `["gene_therapy","mrna_lnp"]`
`functionBuckets`	`[]`	Restrict to e.g. `["clinical_operations"]`
`apolloApiKey`	(empty)	Optional. Free Apollo key for company size enrichment
`politeDelayMs`	`400`	Throttle between detail fetches
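
If you build run inputs in code, it can help to keep the defaults from this table in one place. A minimal sketch: `ScraperInput` is a hypothetical helper (not something the actor ships), its field names mirror the input keys above, and the non-negative check is an assumption for this example.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class ScraperInput:
    """Input keys and defaults copied from the table above."""
    searchKeywords: str = ""
    location: str = ""
    maxJobs: int = 200
    maxDaysOld: int = 14            # 0 = no age filter
    therapeuticAreas: list[str] = field(default_factory=list)
    modalities: list[str] = field(default_factory=list)
    functionBuckets: list[str] = field(default_factory=list)
    apolloApiKey: str = ""          # optional
    politeDelayMs: int = 400

    def __post_init__(self) -> None:
        # Reject obviously invalid numbers before starting a run.
        for name in ("maxJobs", "maxDaysOld", "politeDelayMs"):
            if getattr(self, name) < 0:
                raise ValueError(f"{name} must be >= 0")

    def to_dict(self) -> dict:
        """Plain dict, ready to serialize as the run input JSON."""
        return asdict(self)


# The daily-drip settings from the playbook.
daily_drip = ScraperInput(maxJobs=200, maxDaysOld=7, politeDelayMs=300).to_dict()
```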

How it works under the hood

Search: hits https://jobs.biospace.com/jobs/?Keywords=X&Page=N. Each response sets a cookie JobSearchResultIds=ID1|ID2|…|ID20 with the IDs on that page. We read the cookie — no HTML scraping needed for the listing page.
Detail: each /job/{ID} page embeds a <script type="application/ld+json"> block with the full schema.org JobPosting (title, company, location, description, dates, employment type, salary). We parse that — no DOM scrapes.
Classification: deterministic keyword match on (title + description). Each rule's keyword list is curated. No AI in the hot path.
Enrichment: if you pass an Apollo key, we hit mixed_people/search (the free Apollo endpoint, 0 credits) and read organization.estimated_num_employees. Per-company cache in memory, so no duplicate calls.
Polite throttling: 400ms default delay between detail fetches. Tunable.
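
The Search and Detail steps above can be sketched offline on plain strings. This is an illustration, not the actor's source. It assumes the cookie value may arrive percent-encoded, and it pulls the JSON-LD block out with a regular expression rather than a full HTML parser. The function names and sample data are made up.

```python
import json
import re
from typing import Optional
from urllib.parse import unquote


def parse_job_ids(cookie_value: str) -> list[str]:
    """Split a JobSearchResultIds cookie value into job IDs."""
    # Decode first in case "|" arrives as "%7C", then drop empty parts.
    return [part for part in unquote(cookie_value).split("|") if part]


LD_JSON_RE = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.IGNORECASE | re.DOTALL,
)


def extract_job_posting(html: str) -> Optional[dict]:
    """Return the first schema.org JobPosting object embedded in the page, if any."""
    for match in LD_JSON_RE.finditer(html):
        try:
            data = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # ignore malformed blocks and keep looking
        candidates = data if isinstance(data, list) else [data]
        for item in candidates:
            if isinstance(item, dict) and item.get("@type") == "JobPosting":
                return item
    return None


sample_html = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "JobPosting",
 "title": "VP, Clinical Operations",
 "hiringOrganization": {"@type": "Organization", "name": "Acme Therapeutics"},
 "datePosted": "2024-01-01"}
</script>
</head></html>
"""

ids = parse_job_ids("101%7C102%7C103")
posting = extract_job_posting(sample_html)
```

In a real run the cookie value would come from the search response and the HTML from each /job/{ID} page, with the polite delay between detail fetches.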

Apollo enrichment (optional but worth it)

Add a free Apollo key to enrich every row with company_size, company_domain, company_employee_count. The endpoint we use (mixed_people/search) costs 0 credits — Apollo's free tier is more than enough for tens of thousands of company lookups.

Without it: every row still has therapeutic area, modality, function, urgency, and the signal sentence. Apollo just adds size and domain.
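
The per-company in-memory cache is what keeps enrichment cheap when one company has many listings. A sketch of the idea, with the actual API call left out: `lookup` is a hypothetical function you supply, and normalizing the company name for the cache key is an assumption for this example.

```python
from typing import Callable, Optional


class CompanySizeCache:
    """Memoize company-size lookups so each company is fetched at most once per run."""

    def __init__(self, lookup: Callable[[str], Optional[int]]):
        self._lookup = lookup  # hypothetical function that calls the enrichment API
        self._cache: dict[str, Optional[int]] = {}

    def get(self, company_name: str) -> Optional[int]:
        key = company_name.strip().lower()  # "Acme " and "acme" share one entry
        if key not in self._cache:
            self._cache[key] = self._lookup(company_name)
        return self._cache[key]


# Stand-in lookup that records how often it is called (no network involved).
calls: list[str] = []


def fake_lookup(name: str) -> Optional[int]:
    calls.append(name)
    return 120


cache = CompanySizeCache(fake_lookup)
first = cache.get("Acme Therapeutics")
second = cache.get("acme therapeutics ")  # served from the cache
```

Misses (a lookup that returns None) are cached too, so a company the API doesn't know is not retried on every listing.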

Costs

BioSpace: free (public job board, polite scraping)
Apollo (optional): free tier is enough
Apify compute: ~5–15 min per 200-job run at 1024 MB ≈ 0.08–0.25 compute units (1 CU = 1 GB of memory for 1 hour)

Connector OS Station integration

Pipe the dataset into Station as the demand side. Each row's signal field plugs straight into the I Layer for evaluation against your supply network (CROs, regulatory consultants, biostats firms, etc.).

Flow: scrape → dataset → paste dataset ID into Station → match against your supply → scored introductions.

Built by Saad Belcaid for Jean's SSM workflow. Data sourced from BioSpace.com (public listings) and Apollo's free People Search endpoint. Polite, no proxies, no DOM brittleness.
