
Welcome to the Jungle Jobs: JSON/RAG Scraper

Pricing

from $3.99 / 1,000 jobs


Scrape Welcome to the Jungle job listings with salary and location data across 50+ countries. Raw JSON or RAG-ready chunks. $3.99 per 1K jobs.



Developer

GetAScraper


Maintained by Community


Last modified

3 days ago


WTTJ Jobs Scraper: Raw JSON or RAG-Ready Chunks

Extract job listings from Welcome to the Jungle (WTTJ) in one of two output modes: raw JSON or RAG-ready chunks. Each listing includes the salary range (when disclosed), company data, locations, contract type, and remote policy, across 50+ countries including France, the UK, the US, and Germany. Load raw output into spreadsheets or data pipelines; load RAG-ready chunks directly into Qdrant, Pinecone, Weaviate, LangChain, or LlamaIndex for AI-powered job matching and resume screening.

What does Welcome to the Jungle Scraper do?

This Actor queries WTTJ's public Algolia search API to extract structured job data. It supports two output modes:

  1. Raw JSON - Full job listings with title, company, salary, locations, contract type, remote policy, summary, and sectors. Ready for spreadsheets, CRMs, or data warehouses.

  2. RAG-Ready - Job descriptions split into chunks measured in tokens with tiktoken's cl100k_base encoding. Chunk size is a configurable target (default 512 tokens), and consecutive chunks share a configurable overlap (default 50 tokens). Drop the chunks straight into any vector database for LLM-based job matching, resume screening, or market intelligence.
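The RAG-Ready mode above can be pictured as a sliding token window. The sketch below is a minimal illustration assuming that approach; the README names the encoding (cl100k_base) and the chunkSize/chunkOverlap defaults, but not the Actor's exact splitting rules, so treat chunk_tokens as a hypothetical helper. To stay dependency-free it works on a list of token IDs; the commented usage shows where tiktoken's encode and decode would fit.

```python
from typing import List, Sequence


def chunk_tokens(
    tokens: Sequence[int], chunk_size: int = 512, chunk_overlap: int = 50
) -> List[List[int]]:
    """Split token IDs into windows of at most chunk_size tokens.

    Consecutive windows share chunk_overlap tokens. Illustrative sliding-window
    chunker, not the Actor's actual implementation.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: List[List[int]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start : start + chunk_size]))
        # Stop once a window reaches the end, so the tail is not emitted
        # again as a short chunk made only of overlap tokens.
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Usage with a real tokenizer (requires `pip install tiktoken`):
#   import tiktoken
#   enc = tiktoken.get_encoding("cl100k_base")
#   texts = [enc.decode(c) for c in chunk_tokens(enc.encode(description))]
```

With the defaults, a 1,200-token description yields three chunks of 512, 512, and 276 tokens, each starting 462 tokens after the previous one.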

The scraper covers 10,000+ companies across Europe and the US. No login, no API key, no proxy required.

Why use Welcome to the Jungle Scraper for RAG?

  • Dual output modes. Raw JSON for traditional pipelines, RAG-ready chunks for AI agents and vector databases. One Actor handles both workflows.
  • RAG-ready chunks. Job descriptions arrive pre-split for LLM ingestion, so there is no HTML parsing or custom chunking logic on your side. Works with Qdrant, Pinecone, Weaviate, pgvector, and Chroma.
  • Framework-ready. Raw output drops into Google Sheets, BigQuery, n8n, or any CRM. RAG chunks drop into LangChain, LlamaIndex, Haystack, or custom LLM pipelines.
  • Europe plus US coverage. Listings from 50+ countries, including France, the UK, Germany, and Spain, in one query. Compare salary ranges across markets.
  • Rich job metadata. Salary ranges (EUR, GBP, USD), remote policy, contract type, experience level, company size, and hiring velocity.
  • No auth required. Uses WTTJ's public Algolia API. No cookies, no API key, no account.

How to use Welcome to the Jungle Scraper

  1. Open the Actor in Apify Console.
  2. Set a search query (e.g., data engineer, product manager, sales).
  3. Choose output mode: raw for standard JSON, rag for chunked text.
  4. Optional: Add filters for country, city, contract type, remote policy.
  5. Set max results (default 100, max 10,000).
  6. Click Start. Download as JSON, CSV, or Excel.
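Besides the Console's download buttons, a finished run's results can be fetched from the Apify API's dataset items endpoint (GET /v2/datasets/{datasetId}/items), whose format parameter selects the export type. The hypothetical helper below only builds that URL; "abc123" in the usage is a placeholder dataset ID, and private datasets additionally need your Apify API token.

```python
from urllib.parse import quote, urlencode

# Export formats matching the JSON / CSV / Excel downloads in step 6.
EXPORT_FORMATS = {"json", "csv", "xlsx"}


def dataset_items_url(dataset_id: str, fmt: str = "json") -> str:
    """Build the Apify API URL that exports a dataset's items in `fmt`."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"unsupported export format: {fmt!r}")
    return (
        "https://api.apify.com/v2/datasets/"
        f"{quote(dataset_id, safe='')}/items?{urlencode({'format': fmt})}"
    )


# dataset_items_url("abc123", "csv")
# -> "https://api.apify.com/v2/datasets/abc123/items?format=csv"
```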

Input

Field | Type | Description
--- | --- | ---
query | string | Job title, skill, or keyword to search (required).
locale | string | Language: en or fr. Default: en.
countryCodes | array | Filter by country codes (e.g., FR, GB, US).
locations | array | Filter by city or region (e.g., Paris, London).
contractTypes | array | Filter by contract type (FULL_TIME, PART_TIME, INTERNSHIP, FREELANCE).
remotePolicies | array | Filter by remote policy (full, partial, punctual, no).
categories | array | Filter by job category (tech, data, sales, marketing).
maxItems | integer | Maximum jobs to return. Default: 100. Max: 10,000.
outputMode | string | raw (full JSON) or rag (chunked text). Default: raw.
chunkSize | integer | Target tokens per chunk for RAG mode. Default: 512.
chunkOverlap | integer | Token overlap between chunks for RAG mode. Default: 50.
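If you trigger runs from a script rather than the Console form, it can help to check the input before starting a run. A minimal sketch, assuming the constraints listed in the input table; the helper name build_run_input is hypothetical, the requirement that chunkOverlap be smaller than chunkSize is an assumption, and the Actor itself remains the authority on validation.

```python
from typing import Any, Dict, List, Optional

# Allowed values taken from the input table.
CONTRACT_TYPES = {"FULL_TIME", "PART_TIME", "INTERNSHIP", "FREELANCE"}
REMOTE_POLICIES = {"full", "partial", "punctual", "no"}


def build_run_input(
    query: str,
    *,
    locale: str = "en",
    output_mode: str = "raw",
    max_items: int = 100,
    country_codes: Optional[List[str]] = None,
    contract_types: Optional[List[str]] = None,
    remote_policies: Optional[List[str]] = None,
    chunk_size: int = 512,
    chunk_overlap: int = 50,
) -> Dict[str, Any]:
    """Assemble and sanity-check an input object for one Actor run.

    Hypothetical client-side helper; the Actor performs its own validation.
    """
    if not query.strip():
        raise ValueError("query is required")
    if locale not in ("en", "fr"):
        raise ValueError("locale must be 'en' or 'fr'")
    if output_mode not in ("raw", "rag"):
        raise ValueError("outputMode must be 'raw' or 'rag'")
    if not 1 <= max_items <= 10_000:
        raise ValueError("maxItems must be between 1 and 10,000")
    if set(contract_types or []) - CONTRACT_TYPES:
        raise ValueError(f"contractTypes must be a subset of {sorted(CONTRACT_TYPES)}")
    if set(remote_policies or []) - REMOTE_POLICIES:
        raise ValueError(f"remotePolicies must be a subset of {sorted(REMOTE_POLICIES)}")

    run_input: Dict[str, Any] = {
        "query": query,
        "locale": locale,
        "outputMode": output_mode,
        "maxItems": max_items,
    }
    # Only send optional filters that were actually set.
    if country_codes:
        run_input["countryCodes"] = [c.upper() for c in country_codes]
    if contract_types:
        run_input["contractTypes"] = contract_types
    if remote_policies:
        run_input["remotePolicies"] = remote_policies
    if output_mode == "rag":
        # Assumption: the overlap must be smaller than the chunk size.
        if not 0 <= chunk_overlap < chunk_size:
            raise ValueError("chunkOverlap must be >= 0 and smaller than chunkSize")
        run_input["chunkSize"] = chunk_size
        run_input["chunkOverlap"] = chunk_overlap
    return run_input
```

The returned dict is the JSON object you would paste into the Console's input editor.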

Output

Raw Mode Example

```json
{
  "jobId": "2d4fbe25-352d-47a4-8280-bd6d4642cfb1",
  "title": "Data analytics engineer",
  "company": {
    "name": "Visian",
    "slug": "visian",
    "size": "50-249 employees",
    "employeeCount": 200,
    "description": "Consulting et execution technique pour projets data."
  },
  "locations": [
    { "city": "Courbevoie", "country": "France", "countryCode": "FR", "region": "Ile-de-France" }
  ],
  "primaryLocation": "Courbevoie, France",
  "contractType": "full_time",
  "remotePolicy": "partial",
  "salary": {
    "hasSalary": true,
    "min": 42000,
    "max": 55000,
    "yearlyMinimum": 42000,
    "currency": "EUR",
    "period": "yearly"
  },
  "publishedAt": "2026-06-04T09:39:54Z",
  "summary": "Rejoignez Visian, une societe de conseil specialisee en innovation...",
  "sectors": ["Artificial Intelligence / Machine Learning", "IT / Digital"],
  "url": "https://www.welcometothejungle.com/en/jobs/data-engineer-visian-xxx",
  "scrapedAt": "2026-06-06T10:00:00.000Z"
}
```
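Apify's own export already produces CSV, but if you post-process the raw JSON yourself, nested fields such as company and salary need flattening into one row per job. A minimal sketch; the column selection and names (salaryMin, salaryMax, and so on) are hypothetical choices, not fields the Actor emits.

```python
import csv
import io
from typing import Any, Dict, Iterable, Optional

# Hypothetical column layout for a flat, spreadsheet-friendly export.
COLUMNS = [
    "jobId", "title", "company", "primaryLocation", "contractType",
    "remotePolicy", "salaryMin", "salaryMax", "currency", "period",
    "publishedAt", "sectors", "url",
]


def flatten_job(job: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Map one raw-mode record to a single flat row (one value per column)."""
    salary = job.get("salary") or {}
    disclosed = bool(salary.get("hasSalary"))
    return {
        "jobId": job.get("jobId"),
        "title": job.get("title"),
        "company": (job.get("company") or {}).get("name"),
        "primaryLocation": job.get("primaryLocation"),
        "contractType": job.get("contractType"),
        "remotePolicy": job.get("remotePolicy"),
        # Leave salary columns empty unless the posting discloses a salary.
        "salaryMin": salary.get("min") if disclosed else None,
        "salaryMax": salary.get("max") if disclosed else None,
        "currency": salary.get("currency") if disclosed else None,
        "period": salary.get("period") if disclosed else None,
        "publishedAt": job.get("publishedAt"),
        "sectors": "; ".join(job.get("sectors") or []),
        "url": job.get("url"),
    }


def jobs_to_csv(jobs: Iterable[Dict[str, Any]]) -> str:
    """Serialize raw-mode records to CSV text; None becomes an empty cell."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    for job in jobs:
        writer.writerow(flatten_job(job))
    return buf.getvalue()
```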

RAG Mode Example

```json
{
  "jobId": "2d4fbe25-352d-47a4-8280-bd6d4642cfb1",
  "title": "Data analytics engineer",
  "company": { "name": "Visian", "slug": "visian" },
  "primaryLocation": "Courbevoie, France",
  "chunks": [
    { "idx": 0, "text": "Job Title: Data analytics engineer\nCompany: Visian\nLocation: Courbevoie, France\n\nRejoignez Visian...", "tokens": 256 },
    { "idx": 1, "text": "...SQL, Databricks, Git et Power BI...", "tokens": 128 }
  ],
  "scrapedAt": "2026-06-06T10:00:00.000Z"
}
```
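Each RAG record holds several chunks, while a vector store indexes one vector per chunk, so the record has to be expanded before embedding. A minimal sketch, assuming a generic {id, text, metadata} point shape (hypothetical; adapt the field names to your store's client). IDs are UUIDs because some stores, such as Qdrant, accept only unsigned integers or UUIDs as point IDs; uuid5 keeps them deterministic so re-scraped jobs overwrite their old chunks on upsert.

```python
import uuid
from typing import Any, Dict, List


def to_vector_points(record: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Expand one RAG-mode record into one embeddable point per chunk."""
    company = (record.get("company") or {}).get("name")
    points = []
    for chunk in record.get("chunks") or []:
        # uuid5 is deterministic: the same jobId and idx always give the same ID.
        point_id = uuid.uuid5(uuid.NAMESPACE_URL, f"{record['jobId']}:{chunk['idx']}")
        points.append({
            "id": str(point_id),
            "text": chunk["text"],  # the string you pass to your embedding model
            "metadata": {
                "jobId": record["jobId"],
                "title": record.get("title"),
                "company": company,
                "primaryLocation": record.get("primaryLocation"),
                "chunkIndex": chunk["idx"],
                "tokens": chunk["tokens"],
            },
        })
    return points
```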

Data Table

Field | Description
--- | ---
jobId | WTTJ unique job identifier.
title | Job title as posted.
company.name | Hiring company name.
company.slug | Company URL slug.
company.size | Company size category (e.g., "50-249 employees").
company.employeeCount | Total employees, when available.
company.description | Short company description.
locations | Array of location objects with city, country, countryCode, region.
primaryLocation | Primary location as a formatted string.
contractType | Contract type enum (full_time, part_time, internship, freelance).
remotePolicy | Remote work policy (full, partial, punctual, no, unknown).
salary | Salary object: a hasSalary flag, plus min, max, yearlyMinimum, currency, and period when disclosed.
publishedAt | ISO 8601 timestamp of the job posting.
summary | Short job summary or description.
sectors | Array of job category sectors.
url | Canonical job URL on welcometothejungle.com.
chunks | RAG mode only: array of { idx, text, tokens } chunks.
scrapedAt | ISO 8601 timestamp of the scrape.

Pricing

$3.99 per 1,000 job listings (pay-per-result).

Volume | Estimated cost
--- | ---
100 jobs | $0.40
1,000 jobs | $3.99
10,000 jobs | $39.90

No subscription. No minimum. You pay only for successful records.
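The estimates above follow from a linear per-result price: cost = jobs / 1,000 × $3.99, so 100 jobs come to $0.399, shown as $0.40. A sketch of that arithmetic using Decimal to avoid float rounding surprises; rounding half-up to whole cents is an assumption made for display, not a statement of how Apify bills.

```python
from decimal import ROUND_HALF_UP, Decimal

PRICE_PER_1000 = Decimal("3.99")  # USD per 1,000 job listings


def estimated_cost(jobs: int) -> Decimal:
    """Estimated pay-per-result cost in USD, rounded to cents for display."""
    if jobs < 0:
        raise ValueError("jobs must be non-negative")
    raw = PRICE_PER_1000 * jobs / 1000
    return raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# estimated_cost(100)   -> Decimal("0.40")
# estimated_cost(10000) -> Decimal("39.90")
```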

Tips

  • Use RAG mode for AI agents. Feed chunks directly into a vector database for job matching, resume screening, or market intelligence. Compatible with LangChain, LlamaIndex, Qdrant, Pinecone, and Weaviate.
  • Filter by country first. Job density varies by market on WTTJ; France and the UK have the most listings.
  • Benchmark salaries downstream. Not every job discloses a salary range, so keep only records where salary.hasSalary is true before comparing salaries.
  • Schedule weekly for market tracking. Compare job counts and salary ranges over time to detect hiring trend shifts.
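For the salary-benchmarking tip, the downstream filter can be this small. A minimal sketch that keeps only records with salary.hasSalary set and a yearly period (the only period value shown in the example output; treating other periods as out of scope is an assumption), takes each job's min/max midpoint, and reports the median per currency so EUR, GBP, and USD ranges are never mixed. Running it on each weekly run's dataset gives a simple trend line.

```python
from collections import defaultdict
from statistics import median
from typing import Any, Dict, Iterable, List


def median_yearly_salary_by_currency(jobs: Iterable[Dict[str, Any]]) -> Dict[str, float]:
    """Median salary midpoint per currency, over jobs with a disclosed yearly salary."""
    midpoints: Dict[str, List[float]] = defaultdict(list)
    for job in jobs:
        salary = job.get("salary") or {}
        # Skip jobs without a disclosed yearly salary.
        if not salary.get("hasSalary") or salary.get("period") != "yearly":
            continue
        low, high, currency = salary.get("min"), salary.get("max"), salary.get("currency")
        if low is None or high is None or currency is None:
            continue
        midpoints[currency].append((low + high) / 2)
    # Never mix currencies: each one gets its own median.
    return {currency: median(values) for currency, values in midpoints.items()}
```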

FAQ, Disclaimers, and Support

Is scraping Welcome to the Jungle legal? This Actor uses WTTJ's public Algolia search API, which requires no authentication. It collects publicly visible job listing data for research, market analysis, and recruiting intelligence. Users are responsible for ensuring their use complies with applicable laws and WTTJ's terms of service.

Why are some salary fields empty? Many jobs on WTTJ do not disclose salary. The Actor only saves salary values that the source exposes. Filter salary.hasSalary: true downstream if salary data is critical.

What is RAG mode? RAG (Retrieval-Augmented Generation) mode splits job descriptions into fixed-token chunks using tiktoken cl100k_base encoding. These chunks are ready to embed and store in a vector database for LLM-based job matching or resume screening.

How is this different from other WTTJ scrapers? This is the only WTTJ scraper with dual output modes: raw JSON for traditional pipelines and RAG-ready chunks for AI agents. No other scraper offers tiktoken-based chunking optimized for LLM ingestion.

Support: Open an issue on the Actor's Issues tab in Apify Console for bug reports, feature requests, or custom-run quotes.


Built with Apify + Crawlee + TypeScript. Part of the actorstack portfolio.