SEPE Spain Job Scraper – Ofertas de Empleo ES

Under maintenance

Pricing

from $1.00 / 1,000 results

Scrapes job offers from SEPE Spain (sepe.es) with stealth Camoufox, proxy rotation, skills filtering (regex + TF-IDF ML), deduplication, change-detection alerts, and Prometheus metrics export. Supports province/CCAA filtering with Vizcaya/Bizkaia focus.

Rating

0.0

(0)

Developer

David Cortes

Maintained by Community

Actor stats

Bookmarked: 0
Total users: 2
Monthly active users: 1
Last modified: 7 days ago

SEPE Spain Job Scraper – Ofertas de Empleo ES Pro

An Apify Actor for scraping SEPE Spain job offers with stealth browsing, smart skills filtering, and Kubernetes-ready Prometheus metrics.

  • Anti-bot max: Camoufox (Firefox stealth) + Apify residential proxies + random delays + cookie handling
  • Smart skills filter: regex + scikit-learn TF-IDF cosine similarity (catches "contenedores" → Docker, "orquestación" → Kubernetes)
  • Change-detection alerts: compares every run against the previous one → new / changed / removed offers
  • Deduplication: SHA-256 hash per offer, persisted across runs
  • K8s-ready: Prometheus metrics exported to KV store (scrapeable by any Prometheus server)
  • Province focus: Vizcaya / Bizkaia by default, all 52 Spanish provinces supported

Output Example

{
  "url": "https://www.sepe.es/HomeSepe/Personas/encontrar-empleo/...",
  "titulo_oferta": "DevOps Engineer – Kubernetes / AWS",
  "empresa": "Tecnología Vasca S.L.",
  "provincia": "Vizcaya",
  "salario": "35.000 € - 50.000 €/año",
  "skills_requeridas": ["Kubernetes", "Docker", "AWS", "Terraform", "Linux", "CI/CD"],
  "fecha_publicacion": "2026-04-15",
  "enlace_aplicar": "https://www.sepe.es/HomeSepe/Personas/encontrar-empleo/.../solicitar"
}

Input Schema

| Field | Type | Default | Description |
|---|---|---|---|
| start_urls | array | SEPE national pages | Extra entry-point URLs |
| provincias | array | ["Vizcaya","Bizkaia"] | Province/CCAA filter (52 provinces supported) |
| skills | array | ["Kubernetes","Docker","Python"] | Skills to filter by (regex + ML) |
| max_pages | int | 50 | Max listing pages per entry-point |
| use_ml_skills | bool | true | Enable TF-IDF ML skill matching |
| use_proxy | bool | true | Use Apify residential proxy |
| proxy_groups | array | ["RESIDENTIAL"] | Proxy groups |
| proxy_country | string | "ES" | Proxy country (ES = Spanish IP) |
| headless | bool | true | Headless browser mode |
| min_delay | float | 2.0 | Min delay between requests (s) |
| max_delay | float | 5.0 | Max delay between requests (s) |
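
The defaults in the table can be mirrored in code. The following is a minimal sketch, not the Actor's own implementation: `ActorInput` and `parse_input` are hypothetical names, and `start_urls` defaults to an empty list here because the table only says "SEPE national pages" without listing the URLs.

```python
from __future__ import annotations

from dataclasses import dataclass, field, fields
from typing import Any


@dataclass
class ActorInput:
    """Hypothetical container mirroring the input schema table."""

    # Assumption: the real default URLs are not listed in the table.
    start_urls: list[str] = field(default_factory=list)
    provincias: list[str] = field(default_factory=lambda: ["Vizcaya", "Bizkaia"])
    skills: list[str] = field(default_factory=lambda: ["Kubernetes", "Docker", "Python"])
    max_pages: int = 50
    use_ml_skills: bool = True
    use_proxy: bool = True
    proxy_groups: list[str] = field(default_factory=lambda: ["RESIDENTIAL"])
    proxy_country: str = "ES"
    headless: bool = True
    min_delay: float = 2.0
    max_delay: float = 5.0


def parse_input(raw: dict[str, Any] | None) -> ActorInput:
    """Apply the table defaults to a raw input dict and sanity-check it."""
    known = {f.name for f in fields(ActorInput)}
    # Unknown keys are ignored rather than treated as errors.
    cfg = ActorInput(**{k: v for k, v in (raw or {}).items() if k in known})
    if cfg.min_delay < 0 or cfg.max_delay < cfg.min_delay:
        raise ValueError("expected 0 <= min_delay <= max_delay")
    if cfg.max_pages < 1:
        raise ValueError("max_pages must be at least 1")
    return cfg
```

Calling `parse_input({"max_pages": 10})` keeps every other field at its documented default.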

Quick Start

Run locally

# 1. Clone / enter the project
cd sepe-empleo-es-pro
# 2. Create virtual env and install dependencies
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
# 3. Install Camoufox browser binaries
python -m camoufox fetch
# 4. Put test input in place
cp test_input.json storage/key_value_stores/default/INPUT.json
# 5. Run
python -m my_actor

Run on Apify

# Login (one-time)
apify login
# Push and deploy
apify push
# Run on the platform with test input (apify run would execute the Actor locally instead)
apify call --input-file test_input.json

Run via API

curl -X POST \
"https://api.apify.com/v2/acts/YOUR_USERNAME~sepe-empleo-es-pro/runs" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_TOKEN" \
-d '{
"provincias": ["Vizcaya", "Bizkaia"],
"skills": ["Kubernetes", "Python", "DevOps"],
"max_pages": 50
}'
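
The same request can be built from Python with the standard library only. This is a sketch under the same placeholders as the curl command (`YOUR_USERNAME`, `YOUR_TOKEN`); `build_run_request` is a hypothetical helper, and it only constructs the request so that nothing is sent until you choose to.

```python
import json
import urllib.request


def build_run_request(username: str, token: str, actor_input: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts an Actor run."""
    url = f"https://api.apify.com/v2/acts/{username}~sepe-empleo-es-pro/runs"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


req = build_run_request(
    "YOUR_USERNAME",
    "YOUR_TOKEN",
    {
        "provincias": ["Vizcaya", "Bizkaia"],
        "skills": ["Kubernetes", "Python", "DevOps"],
        "max_pages": 50,
    },
)
# To actually start the run: urllib.request.urlopen(req)
```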

Architecture

my_actor/
├── main.py # Actor entry point, crawler setup, post-processing
├── routes.py # Crawlee router: NAV / LIST / DETAIL handlers + XHR interception
├── extractors.py # Multi-selector SEPE data extraction with JSON-LD + regex fallbacks
├── skills_matcher.py # Regex + TF-IDF scikit-learn skills detection (30+ tech skills)
├── dedup.py # SHA-256 offer deduplication, cross-run persistence
├── alerts.py # Change-detection: new / changed / removed offers diff
├── metrics.py # Prometheus metrics (counters, gauges, histograms)
└── config.py # Province codes (52), SEPE URLs, CSS selectors, rate-limit settings
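
To illustrate the role of dedup.py, here is a minimal sketch of SHA-256 offer deduplication. It is an assumption-laden example rather than the module's actual code: the README does not say which fields feed the hash, so `IDENTITY_FIELDS`, `offer_hash`, and `filter_new` are hypothetical.

```python
import hashlib
import json

# Assumption: these fields identify an offer; the Actor may hash a different set.
IDENTITY_FIELDS = ("url", "titulo_oferta", "empresa", "provincia")


def offer_hash(offer: dict) -> str:
    """Return a stable SHA-256 hex digest over the identity fields of an offer."""
    identity = {key: str(offer.get(key, "")).strip() for key in IDENTITY_FIELDS}
    # sort_keys gives a canonical serialization, so key order never changes the hash.
    payload = json.dumps(identity, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def filter_new(offers: list, seen: set) -> list:
    """Keep offers whose hash is not in `seen`, adding new hashes to `seen` in place."""
    fresh = []
    for offer in offers:
        digest = offer_hash(offer)
        if digest not in seen:
            seen.add(digest)
            fresh.append(offer)
    return fresh
```

For cross-run persistence, the `seen` set could be saved as a JSON list in the Key-Value Store at the end of a run and loaded again at the start of the next one.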

Anti-bot Stack

| Layer | Technology | Config |
|---|---|---|
| Browser fingerprint | Camoufox (Firefox stealth) | os=windows/macos, locale=es-ES, geoip=true |
| IP rotation | Apify Residential proxies | countryCode=ES (Spanish IPs) |
| Timing | Random delays | 2–5 s per request (configurable) |
| Detection evasion | Cookie auto-accept | Handles SEPE's cookie banner |
| CAPTCHA detection | Text/title heuristics | Auto-retry on fresh session |
| Header generation | Camoufox built-in | Realistic browser headers |
| Scrolling | JS scroll simulation | Triggers lazy-loaded content |
| XHR interception | Playwright response hook | Catches SEPE's JSON API calls |
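
The Timing and CAPTCHA detection rows can be sketched in a few lines. This is an illustration only: the function names and the marker strings in `CAPTCHA_MARKERS` are hypothetical, since the Actor's real heuristics are not documented here.

```python
import asyncio
import random

# Hypothetical marker strings; the Actor's real heuristics may differ.
CAPTCHA_MARKERS = ("captcha", "no soy un robot", "verificación de seguridad")


def pick_delay(min_delay: float = 2.0, max_delay: float = 5.0) -> float:
    """Draw a uniformly random pause, in seconds, between two requests."""
    return random.uniform(min_delay, max_delay)


async def polite_pause(min_delay: float = 2.0, max_delay: float = 5.0) -> float:
    """Sleep for a random interval without blocking the event loop."""
    delay = pick_delay(min_delay, max_delay)
    await asyncio.sleep(delay)
    return delay


def looks_like_captcha(title: str, body_text: str) -> bool:
    """Flag a page whose title or text contains a known CAPTCHA marker."""
    haystack = f"{title}\n{body_text}".casefold()
    return any(marker in haystack for marker in CAPTCHA_MARKERS)
```

A handler would typically await `polite_pause()` before each navigation and, when `looks_like_captcha` returns true, retire the session and retry the request on a fresh one.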

Skills Matching Pipeline

          Input text (job description)
                        │
          ┌─────────────┴────────────┐
          ▼                          ▼
┌───────────────────┐   ┌──────────────────────────┐
│ Regex matcher     │   │ TF-IDF cosine similarity │
│ (30+ skills,      │ + │ (scikit-learn, threshold │
│ 50+ aliases)      │   │ 0.25, ngram 1-2)         │
└─────────┬─────────┘   └────────────┬─────────────┘
          │                          │
          └─────────────┬────────────┘
                        ▼
              Canonical skill names
        ["Kubernetes","Docker","Python"]
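
The two-stage pipeline above can be sketched as follows. To keep the example dependency-free, scikit-learn's `TfidfVectorizer` is swapped for a small hand-written TF-IDF over word 1-grams and 2-grams, using the smoothed idf formula ln((1 + n) / (1 + df)) + 1. The three-entry `SKILLS` taxonomy and all function names are hypothetical; the real taxonomy covers 30+ skills.

```python
import math
import re
from collections import Counter

# Hypothetical mini-taxonomy: canonical name -> (alias regex, reference description).
SKILLS = {
    "Docker": (r"\b(docker|dockerfile)\b", "docker contenedores imágenes contenedor"),
    "Kubernetes": (r"\b(kubernetes|k8s)\b", "kubernetes orquestación clúster pods"),
    "Python": (r"\bpython\b", "python django flask scripting"),
}


def ngrams(text: str, n_max: int = 2) -> list:
    """Case-folded word n-grams for n = 1..n_max."""
    tokens = re.findall(r"\w+", text.casefold())
    return [" ".join(tokens[i:i + n]) for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def fit_idf(docs: list) -> dict:
    """Smoothed inverse document frequency for every term in the reference docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def vectorize(terms: list, idf: dict) -> dict:
    """TF-IDF weights for the terms that appear in the fitted vocabulary."""
    tf = Counter(term for term in terms if term in idf)
    return {term: count * idf[term] for term, count in tf.items()}


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_skills(text: str, threshold: float = 0.25) -> list:
    """Union of regex alias hits and TF-IDF matches at or above the threshold."""
    found = {name for name, (pattern, _) in SKILLS.items()
             if re.search(pattern, text, re.IGNORECASE)}
    references = {name: ngrams(desc) for name, (_, desc) in SKILLS.items()}
    idf = fit_idf(list(references.values()))
    query = vectorize(ngrams(text), idf)
    for name, terms in references.items():
        if cosine(query, vectorize(terms, idf)) >= threshold:
            found.add(name)
    return sorted(found)
```

With this toy taxonomy, a description that mentions only "contenedores" and "orquestación" matches no regex alias, yet it scores about 0.27 against both the Docker and the Kubernetes reference descriptions. That is just above the 0.25 threshold, so both skills are returned.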

Prometheus Metrics (K8s-ready)

Metrics are exported in standard Prometheus text format to the Actor's Key-Value Store under the key prometheus_metrics. Retrieve them via:

curl "https://api.apify.com/v2/key-value-stores/STORE_ID/records/prometheus_metrics" \
-H "Authorization: Bearer YOUR_TOKEN"

Available metrics

| Metric | Type | Description |
|---|---|---|
| sepe_offers_scraped_total | Counter | Total offers stored |
| sepe_offers_new_total | Counter | New vs previous run |
| sepe_offers_changed_total | Counter | Changed offers |
| sepe_offers_removed_total | Counter | Removed offers |
| sepe_requests_total{status} | Counter | Requests by status (success/failed/retried) |
| sepe_pages_skipped_duplicates_total | Counter | Dedup skips |
| sepe_skills_matched_total{skill} | Counter | Matches per skill |
| sepe_offers_in_dataset | Gauge | Current dataset size |
| sepe_dedup_ratio | Gauge | Duplicate ratio (0–1) |
| sepe_pages_crawled_total | Gauge | Pages visited |
| sepe_proxy_errors_total | Gauge | Proxy/network errors |
| sepe_scrape_duration_seconds | Histogram | Total run duration |
| sepe_page_load_duration_seconds | Histogram | Per-page load time |

Kubernetes scraping example

# prometheus-scrape-config.yaml
- job_name: sepe_scraper
  metrics_path: /v2/key-value-stores/STORE_ID/records/prometheus_metrics
  scheme: https
  bearer_token: YOUR_APIFY_TOKEN
  static_configs:
    - targets: [api.apify.com]

Change Detection Alerts

After each run an alerts_report is saved to the Key-Value Store and also pushed to the dataset as a record with _record_type: "alerts_summary":

{
  "_record_type": "alerts_summary",
  "generated_at": "2026-04-15T10:30:00Z",
  "stats": {
    "new_count": 47,
    "changed_count": 12,
    "removed_count": 3,
    "total_current": 1024,
    "total_previous": 980
  },
  "sample_new_offers": [ ... ],
  "changed_offers": [ {"offer": {...}, "changed_fields": ["salario"]} ],
  "removed_offers": [ ... ]
}

To integrate with Zapier, Make, or Slack, set up an Apify webhook that triggers on run completion.
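
The new / changed / removed diff behind this report can be sketched as a comparison of two snapshots. This is an illustrative version, not the code in alerts.py: it assumes offers are keyed by URL and that only the fields in the hypothetical `TRACKED_FIELDS` tuple are compared.

```python
# Assumption: offers are keyed by URL and only these fields are compared.
TRACKED_FIELDS = ("titulo_oferta", "salario", "provincia")


def diff_offers(previous: dict, current: dict) -> dict:
    """Compare two {url: offer} snapshots and report new, changed and removed offers."""
    new = [offer for url, offer in current.items() if url not in previous]
    removed = [offer for url, offer in previous.items() if url not in current]
    changed = []
    for url, offer in current.items():
        if url not in previous:
            continue
        fields = [f for f in TRACKED_FIELDS if offer.get(f) != previous[url].get(f)]
        if fields:
            changed.append({"offer": offer, "changed_fields": fields})
    return {
        "_record_type": "alerts_summary",
        "stats": {
            "new_count": len(new),
            "changed_count": len(changed),
            "removed_count": len(removed),
            "total_current": len(current),
            "total_previous": len(previous),
        },
        "sample_new_offers": new[:10],  # cap chosen for illustration
        "changed_offers": changed,
        "removed_offers": removed,
    }
```

The previous snapshot would be loaded from the Key-Value Store at the start of a run and overwritten with the current one at the end.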


Province Codes Reference

All 52 Spanish provinces are supported. Examples:

| Input | Province | Code |
|---|---|---|
| "Vizcaya" or "Bizkaia" | Vizcaya / Bizkaia | 48 |
| "Madrid" | Madrid | 28 |
| "Barcelona" | Barcelona | 08 |
| "Guipúzcoa" or "Gipuzkoa" | Guipúzcoa | 20 |
| "Valencia" or "València" | Valencia | 46 |
| "Sevilla" | Sevilla | 41 |

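Resolving an input name to its province code mostly comes down to normalizing case and accents before the lookup. The sketch below covers only the six provinces listed in the table; `PROVINCE_CODES` and `province_code` are hypothetical names, and the full mapping in config.py may be organized differently.

```python
import unicodedata
from typing import Optional

# Only the provinces from the table above; the full list has 52 entries.
PROVINCE_CODES = {
    "vizcaya": "48",
    "bizkaia": "48",
    "madrid": "28",
    "barcelona": "08",
    "guipuzcoa": "20",
    "gipuzkoa": "20",
    "valencia": "46",
    "sevilla": "41",
}


def normalize(name: str) -> str:
    """Strip accents, surrounding whitespace and case differences."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_accents.casefold().strip()


def province_code(name: str) -> Optional[str]:
    """Return the two-digit province code, or None if the name is unknown."""
    return PROVINCE_CODES.get(normalize(name))
```

Codes are kept as strings so that the leading zero in "08" survives.
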
Legal and Ethical Use

  • Only public, freely accessible data is scraped
  • Rate-limited to ≤ 1 request/second (configurable)
  • Respects SEPE's robots.txt structure
  • No login, no personal data, no GDPR-protected content
  • Data comes from sepe.es, the website of a Spanish public institution

Troubleshooting

| Problem | Solution |
|---|---|
| 0 offers returned | SEPE may have changed page structure; check Actor logs for CSS selector misses |
| CAPTCHA detected | Enable Apify Residential proxy (use_proxy: true, proxy_groups: ["RESIDENTIAL"]) |
| Slow runs | Increase max_concurrency in main.py or reduce max_delay |
| Missing skills | Add aliases to SKILLS_TAXONOMY in config.py |
| Stale dedup | Delete previous_offers_hashes and previous_offers_snapshot from KV store |

Deploy to Apify

apify login
apify push
