Welcome to the Jungle Jobs: JSON/RAG Scraper
Pricing
from $3.99 / 1,000 jobs
Scrape Welcome to the Jungle job listings with salary and location data across 50+ countries. Raw JSON or RAG-ready chunks. $3.99 per 1K jobs.
Developer: GetAScraper (maintained by Community)
WTTJ Jobs Scraper: Raw JSON or RAG-Ready Chunks
Extract job listings from Welcome to the Jungle with two output modes: raw JSON or RAG-ready chunks. Get salary ranges, company data, locations, contract type, and remote policy across France, UK, US, Germany, and 50+ countries. Drop raw output into spreadsheets or pipelines. Use RAG-ready chunks directly in Qdrant, Pinecone, Weaviate, LangChain, or LlamaIndex for AI-powered job matching and resume screening.
What does Welcome to the Jungle Scraper do?
This Actor queries WTTJ's public Algolia search API to extract structured job data. It supports two output modes:
- Raw JSON - Full job listings with title, company, salary, locations, contract type, remote policy, summary, and sectors. Ready for spreadsheets, CRMs, or data warehouses.
- RAG-Ready - Job descriptions tokenized into fixed-size chunks using tiktoken cl100k_base. Configurable chunk size (default 512 tokens) with configurable overlap (default 50 tokens). Drop straight into any vector database for LLM-based job matching, resume screening, or market intelligence.
The scraper covers 10,000+ companies across Europe and the US. No login, no API key, no proxy required.
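The fixed-size chunking with overlap described for RAG-Ready mode can be sketched as below. This is a minimal illustration, not the Actor's implementation: `chunk_tokens` and `chunk_text` are hypothetical names, and a whitespace split stands in for the tiktoken cl100k_base tokenizer so the example runs without extra dependencies.

```python
from typing import Dict, List, Sequence, Union


def chunk_tokens(tokens: Sequence[str], chunk_size: int = 512, overlap: int = 50) -> List[List[str]]:
    """Split a token sequence into fixed-size windows that share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


def chunk_text(text: str, chunk_size: int = 512, overlap: int = 50) -> List[Dict[str, Union[int, str]]]:
    """Return chunks shaped like the { idx, text, tokens } objects in RAG output."""
    # Assumption: whitespace-separated words stand in for cl100k_base tokens.
    words = text.split()
    return [
        {"idx": i, "text": " ".join(chunk), "tokens": len(chunk)}
        for i, chunk in enumerate(chunk_tokens(words, chunk_size, overlap))
    ]
```

With real tokenization the token counts will differ from word counts, so treat the numbers this sketch produces as approximate.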
Why use Welcome to the Jungle Scraper for RAG?
- Dual output modes. Raw JSON for traditional pipelines, RAG-ready chunks for AI agents and vector databases. One Actor handles both workflows.
- RAG-ready chunks. Job descriptions pre-split for LLM ingestion. No HTML parsing and no custom chunking logic on your side. Works with Qdrant, Pinecone, Weaviate, pgvector, and Chroma.
- Framework-ready. Raw output drops into Google Sheets, BigQuery, n8n, or any CRM. RAG chunks drop into LangChain, LlamaIndex, Haystack, or custom LLM pipelines.
- Europe plus US coverage. France, UK, Germany, Spain, and 50+ countries in one query. Compare salary ranges across markets.
- Rich job metadata. Salary ranges (EUR, GBP, USD), remote policy, contract type, experience level, company size, and hiring velocity.
- No auth required. Uses WTTJ's public Algolia API. No cookies, no API key, no account.
How to use Welcome to the Jungle Scraper
- Open the Actor in Apify Console.
- Set a search query (e.g., data engineer, product manager, sales).
- Choose output mode: raw for standard JSON, rag for chunked text.
- Optional: Add filters for country, city, contract type, remote policy.
- Set max results (default 100, max 10,000).
- Click Start. Download as JSON, CSV, or Excel.
Input
| Field | Type | Description |
|---|---|---|
| query | string | Job title, skill, or keyword to search (required). |
| locale | string | Language: en or fr. Default: en. |
| countryCodes | array | Filter by country codes (e.g., FR, GB, US). |
| locations | array | Filter by city or region (e.g., Paris, London). |
| contractTypes | array | Filter by contract type (FULL_TIME, PART_TIME, INTERNSHIP, FREELANCE). |
| remotePolicies | array | Filter by remote policy (full, partial, punctual, no). |
| categories | array | Filter by job category (tech, data, sales, marketing). |
| maxItems | integer | Maximum jobs to return. Default: 100. Max: 10,000. |
| outputMode | string | raw (full JSON) or rag (chunked text). Default: raw. |
| chunkSize | integer | Target tokens per chunk for RAG mode. Default: 512. |
| chunkOverlap | integer | Token overlap between chunks for RAG mode. Default: 50. |
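
As an illustration of how the input fields fit together, the sketch below assembles and sanity-checks a run input before a run is started. The helper `build_run_input` is hypothetical. The field names, defaults, and the 10,000 cap come from the table above; the rule that the overlap must be smaller than the chunk size is an assumption, not documented Actor behavior.

```python
from typing import Any, Dict, List, Optional

OUTPUT_MODES = {"raw", "rag"}
MAX_ITEMS_LIMIT = 10_000  # documented maximum for maxItems


def build_run_input(
    query: str,
    output_mode: str = "raw",
    max_items: int = 100,
    country_codes: Optional[List[str]] = None,
    remote_policies: Optional[List[str]] = None,
    chunk_size: int = 512,
    chunk_overlap: int = 50,
) -> Dict[str, Any]:
    """Build a run input dict using the field names from the Input table."""
    if not query.strip():
        raise ValueError("query is required")
    if output_mode not in OUTPUT_MODES:
        raise ValueError("outputMode must be 'raw' or 'rag'")
    if not 1 <= max_items <= MAX_ITEMS_LIMIT:
        raise ValueError("maxItems must be between 1 and 10,000")

    run_input: Dict[str, Any] = {
        "query": query,
        "outputMode": output_mode,
        "maxItems": max_items,
    }
    if country_codes:
        run_input["countryCodes"] = [code.upper() for code in country_codes]
    if remote_policies:
        run_input["remotePolicies"] = list(remote_policies)
    if output_mode == "rag":
        # Assumption: an overlap at or above the chunk size makes no sense.
        if not 0 <= chunk_overlap < chunk_size:
            raise ValueError("chunkOverlap must be >= 0 and smaller than chunkSize")
        run_input["chunkSize"] = chunk_size
        run_input["chunkOverlap"] = chunk_overlap
    return run_input
```

The resulting dict can be pasted into the JSON input editor in Apify Console, or passed as `run_input` when calling the Actor through the Apify Python client.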
Output
Raw Mode Example
{
  "jobId": "2d4fbe25-352d-47a4-8280-bd6d4642cfb1",
  "title": "Data analytics engineer",
  "company": {
    "name": "Visian",
    "slug": "visian",
    "size": "50-249 employees",
    "employeeCount": 200,
    "description": "Consulting et execution technique pour projets data."
  },
  "locations": [
    { "city": "Courbevoie", "country": "France", "countryCode": "FR", "region": "Ile-de-France" }
  ],
  "primaryLocation": "Courbevoie, France",
  "contractType": "full_time",
  "remotePolicy": "partial",
  "salary": {
    "hasSalary": true,
    "min": 42000,
    "max": 55000,
    "yearlyMinimum": 42000,
    "currency": "EUR",
    "period": "yearly"
  },
  "publishedAt": "2026-06-04T09:39:54Z",
  "summary": "Rejoignez Visian, une societe de conseil specialisee en innovation...",
  "sectors": ["Artificial Intelligence / Machine Learning", "IT / Digital"],
  "url": "https://www.welcometothejungle.com/en/jobs/data-engineer-visian-xxx",
  "scrapedAt": "2026-06-06T10:00:00.000Z"
}
RAG Mode Example
{
  "jobId": "2d4fbe25-352d-47a4-8280-bd6d4642cfb1",
  "title": "Data analytics engineer",
  "company": { "name": "Visian", "slug": "visian" },
  "primaryLocation": "Courbevoie, France",
  "chunks": [
    { "idx": 0, "text": "Job Title: Data analytics engineer\nCompany: Visian\nLocation: Courbevoie, France\n\nRejoignez Visian...", "tokens": 256 },
    { "idx": 1, "text": "...SQL, Databricks, Git et Power BI...", "tokens": 128 }
  ],
  "scrapedAt": "2026-06-06T10:00:00.000Z"
}
Data Table
| Field | Description |
|---|---|
| jobId | WTTJ unique job identifier. |
| title | Job title as posted. |
| company.name | Hiring company name. |
| company.slug | Company URL slug. |
| company.size | Company size category (e.g., "50-249 employees"). |
| company.employeeCount | Total employees when available. |
| locations | Array of location objects with city, country, countryCode, region. |
| primaryLocation | Primary location as formatted string. |
| contractType | Contract type enum (full_time, part_time, internship, freelance). |
| remotePolicy | Remote work policy (full, partial, punctual, no, unknown). |
| salary | Salary object with min, max, currency, period when disclosed. |
| publishedAt | ISO date of job posting. |
| summary | Short job summary or description. |
| sectors | Array of job category sectors. |
| url | Canonical job URL on welcometothejungle.com. |
| chunks | RAG mode only: array of { idx, text, tokens } chunks. |
| scrapedAt | ISO timestamp of scrape. |
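
Before embedding, RAG-mode items usually need to be flattened into one record per chunk. The sketch below shows one possible way to do that with the fields documented above. The function name `to_vector_records`, the `jobId:idx` record ID, and the metadata layout are illustrative assumptions, not something the Actor produces.

```python
from typing import Any, Dict, Iterable, Iterator


def to_vector_records(items: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one record per chunk, carrying job-level fields as metadata."""
    for item in items:
        # Raw-mode items have no "chunks" key and are skipped.
        for chunk in item.get("chunks", []):
            yield {
                # Assumption: jobId plus chunk index is unique enough as an ID.
                "id": f'{item["jobId"]}:{chunk["idx"]}',
                "text": chunk["text"],
                "metadata": {
                    "jobId": item["jobId"],
                    "title": item.get("title"),
                    "company": (item.get("company") or {}).get("name"),
                    "primaryLocation": item.get("primaryLocation"),
                    "tokens": chunk.get("tokens"),
                },
            }
```

Some vector stores restrict ID formats (for example to integers or UUIDs), so the string ID used here may need to be hashed or remapped before upserting.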
Pricing
$3.99 per 1,000 job listings (pay-per-result).
| Volume | Estimated cost |
|---|---|
| 100 jobs | $0.40 |
| 1,000 jobs | $3.99 |
| 10,000 jobs | $39.90 |
No subscription. No minimum. You pay only for successful records.
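The estimates in the table follow from simple per-result arithmetic, which can be sketched as below. `estimate_cost` is a hypothetical helper, and rounding to the nearest cent is an assumption made to match the table; actual billing on the platform may differ.

```python
from decimal import Decimal, ROUND_HALF_UP

PRICE_PER_1000 = Decimal("3.99")  # USD per 1,000 job listings, as stated above


def estimate_cost(job_count: int) -> Decimal:
    """Estimate the pay-per-result cost in USD for a number of job listings."""
    if job_count < 0:
        raise ValueError("job_count must not be negative")
    cost = PRICE_PER_1000 * job_count / 1000
    # Assumption: round half up to whole cents, matching the table's figures.
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

For 100 jobs the unrounded figure is $0.399, which is why the table shows $0.40.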
Tips
- Use RAG mode for AI agents. Feed chunks directly into a vector database for job matching, resume screening, or market intelligence. Compatible with LangChain, LlamaIndex, Qdrant, Pinecone, and Weaviate.
- Filter by country first. WTTJ has different job densities per market. France and UK have the most listings.
- Combine with salary filters. Many jobs disclose salary ranges. Filter salary.hasSalary: true downstream for benchmarking.
- Schedule weekly for market tracking. Compare job counts and salary ranges over time to detect hiring trend shifts.
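
The downstream salary filter mentioned in the tips could look like the sketch below, which keeps only raw-mode items with disclosed salaries and computes a simple benchmark. The helper names and the choice of the median range midpoint as the benchmark are illustrative assumptions; only the salary field names come from the output schema.

```python
from statistics import median
from typing import Any, Dict, Iterable, List, Optional


def with_salary(items: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only items whose salary object reports hasSalary as true."""
    return [item for item in items if (item.get("salary") or {}).get("hasSalary")]


def median_midpoint(items: Iterable[Dict[str, Any]], currency: str) -> Optional[float]:
    """Median of (min + max) / 2 over yearly salaries in a single currency."""
    midpoints = [
        (salary["min"] + salary["max"]) / 2
        for salary in (item["salary"] for item in with_salary(items))
        if salary.get("currency") == currency
        and salary.get("period") == "yearly"
        and salary.get("min") is not None
        and salary.get("max") is not None
    ]
    # None signals that no comparable salaries were found.
    return median(midpoints) if midpoints else None
```

Restricting the benchmark to one currency and one pay period avoids mixing figures that are not directly comparable.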
FAQ, Disclaimers, and Support
Is scraping Welcome to the Jungle legal? This Actor uses WTTJ's public Algolia search API, which requires no authentication. It collects publicly visible job listing data for research, market analysis, and recruiting intelligence. Users are responsible for ensuring their use complies with applicable laws and WTTJ's terms of service.
Why are some salary fields empty? Many jobs on WTTJ do not disclose salary. The Actor only saves salary values that the source exposes. Filter salary.hasSalary: true downstream if salary data is critical.
What is RAG mode? RAG (Retrieval-Augmented Generation) mode splits job descriptions into fixed-token chunks using tiktoken cl100k_base encoding. These chunks are ready to embed and store in a vector database for LLM-based job matching or resume screening.
How is this different from other WTTJ scrapers? This is the only WTTJ scraper with dual output modes: raw JSON for traditional pipelines and RAG-ready chunks for AI agents. No other scraper offers tiktoken-based chunking optimized for LLM ingestion.
Support: Open an issue on the Actor's Issues tab in Apify Console for bug reports, feature requests, or custom-run quotes.
Built with Apify + Crawlee + TypeScript. Part of the actorstack portfolio.