Junior Guru Job Scraper Demo
Developer: Kateřina Hroníková
StartupJobs.cz demo scraper
An Apify Actor that collects developer job listings from StartupJobs.cz using their public API.
Built as a live demo for the junior.guru community talk "Web scraping: Nechte internet pracovat za vás".
What does it do?
You give it a keyword (e.g. junior, python, javascript) and it returns a list of matching developer/engineer job offers including title, company, location, salary, and a direct link. Non-tech roles (sales, marketing, etc.) are filtered out automatically.
Results are stored in an Apify Dataset and can be exported to CSV, JSON, or other formats in one click.
Prerequisites
- Apify account (free tier is enough)
- Node.js 18+
- Apify CLI
    npm install -g apify-cli
    apify login
Step 1 — Find the API using DevTools
Before writing any code, open startupjobs.cz/nabidky in your browser and explore how it loads data.
- Press F12 to open DevTools
- Go to the Network tab
- Filter by Fetch/XHR
- Reload the page or type a keyword in the search box
- Look for a request to /api/offers
You'll see something like:
GET https://www.startupjobs.cz/api/offers?keyword=junior&limit=20&page=1
Open it in a new tab — you get clean JSON back. No HTML parsing needed. 🎉
    {
      "resultSet": [
        {
          "name": "Junior TypeScript Developer",
          "company": "Acme s.r.o.",
          "url": "/nabidka/12345/junior-typescript-developer",
          "locations": "Praha",
          "isRemote": true,
          "seniorities": ["junior"],
          "areaSlugs": ["back-end-vyvojar", "vyvoj"],
          "salary": { "min": 40000, "max": 60000, "currency": "CZK", "measure": "monthly" }
        }
      ]
    }
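The response shape above can be written down as TypeScript types, which makes the later filter step easier to follow. This is only a sketch inferred from the single sample shown here: the real API may return more fields or omit some, and `parseOffers` is a hypothetical helper, not part of the actor.

```typescript
// Types inferred from the sample response above (an assumption, not an official schema).
interface Salary {
  min: number;
  max: number;
  currency: string;
  measure: string;
}

interface Offer {
  name: string;
  company: string;
  url: string;
  locations: string;
  isRemote: boolean;
  seniorities: string[];
  areaSlugs: string[];
  salary?: Salary; // assumed optional: a listing may not publish a salary
}

// Hypothetical helper: pull the offers array out of an untyped JSON body,
// dropping entries that lack the fields the scraper relies on.
function parseOffers(body: unknown): Offer[] {
  if (typeof body !== 'object' || body === null) return [];
  const resultSet = (body as { resultSet?: unknown }).resultSet;
  if (!Array.isArray(resultSet)) return [];
  return resultSet.filter(
    (item): item is Offer =>
      typeof item === 'object' &&
      item !== null &&
      typeof item.name === 'string' &&
      typeof item.url === 'string' &&
      Array.isArray(item.areaSlugs) &&
      Array.isArray(item.seniorities),
  );
}
```

Calling `parseOffers(await response.json())` would then give the loop a typed array to work with, and a malformed body yields an empty list instead of a runtime error.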
Step 2 — Walk through the code
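The entire actor is in src/main.ts. Here's the core of it, abridged: the excerpt leaves out the declarations of API_URL, BASE_URL, DEV_AREA_SLUGS, collected, and page, as well as the counter updates.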
    await Actor.init();

    const { keyword = '', seniority = '', maxResults = 50 } = await Actor.getInput() ?? {};

    while (collected < maxResults) {
      // 1. Call the StartupJobs API: plain fetch(), JSON response
      const response = await fetch(`${API_URL}?keyword=${keyword}&page=${page}`);
      const { resultSet: offers } = await response.json();

      for (const offer of offers) {
        // 2. Skip non-developer roles (sales, marketing, etc.) and wrong seniority
        const isDevRole = offer.areaSlugs.some((slug) => DEV_AREA_SLUGS.has(slug));
        const isSeniorityMatch = !seniority || offer.seniorities.includes(seniority);
        if (!isDevRole || !isSeniorityMatch) continue;

        // 3. Pick the fields we care about and save to Apify Dataset
        await Actor.pushData({
          title: offer.name,
          company: offer.company,
          url: `${BASE_URL}${offer.url}`,
          // ...
        });
      }
    }
Three concepts, that's it: fetch → filter → save.
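For readers who want to see the three steps with the loop counters filled in, here is a minimal self-contained sketch. It is not the actor's actual code: `collectOffers`, `buildOffersUrl`, and `isWanted` are hypothetical names, the page fetcher and the save function are passed in as parameters so the logic runs without network access, and the two entries in `DEV_AREA_SLUGS` are simply the slugs from the sample response (the actor's real list is not shown in this README).

```typescript
interface Offer {
  name: string;
  company: string;
  url: string;
  seniorities: string[];
  areaSlugs: string[];
}

interface JobRecord {
  title: string;
  company: string;
  url: string;
}

const BASE_URL = 'https://www.startupjobs.cz';
const API_URL = `${BASE_URL}/api/offers`;
// Assumed slugs, taken from the sample response; the actor's real set may differ.
const DEV_AREA_SLUGS = new Set(['back-end-vyvojar', 'vyvoj']);

// Encode the keyword so spaces or characters like "+" and "&" do not break the query string.
function buildOffersUrl(keyword: string, page: number): string {
  return `${API_URL}?keyword=${encodeURIComponent(keyword)}&page=${page}`;
}

// Filter step: keep developer roles, and match seniority only when one was requested.
function isWanted(offer: Offer, seniority: string): boolean {
  const isDevRole = offer.areaSlugs.some((slug) => DEV_AREA_SLUGS.has(slug));
  const isSeniorityMatch = !seniority || offer.seniorities.includes(seniority);
  return isDevRole && isSeniorityMatch;
}

async function collectOffers(
  fetchPage: (url: string) => Promise<Offer[]>, // e.g. a thin wrapper around fetch()
  save: (record: JobRecord) => Promise<void>, // e.g. a wrapper around Actor.pushData
  keyword: string,
  seniority: string,
  maxResults: number,
): Promise<number> {
  let collected = 0;
  let page = 1;
  while (collected < maxResults) {
    const offers = await fetchPage(buildOffersUrl(keyword, page));
    if (offers.length === 0) break; // no more pages: stop instead of looping forever
    for (const offer of offers) {
      if (collected >= maxResults) break;
      if (!isWanted(offer, seniority)) continue;
      await save({ title: offer.name, company: offer.company, url: `${BASE_URL}${offer.url}` });
      collected += 1;
    }
    page += 1;
  }
  return collected;
}
```

Injecting `fetchPage` and `save` is a design choice made for this sketch only: it lets the filter and paging logic be exercised with fake data before any real request is sent.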
StartupJobs has a clean API, so we get JSON directly. If it didn't, we'd have to fetch the HTML page and extract data from it using CSS selectors — this is called parsing:
    // Without an API you'd do something like this instead:
    import * as cheerio from 'cheerio';

    const response = await fetch('https://www.startupjobs.cz/nabidky?q=javascript');
    const html = await response.text();     // raw HTML string, not JSON
    const $ = cheerio.load(html);           // parse the HTML

    $('.offer-title').each((_, el) => {     // find all elements matching a CSS selector
      const title = $(el).text().trim();    // extract the text content
      const url = $(el).attr('href');       // or an attribute
      console.log(title, url);
    });
HTML structure changes whenever the site redesigns — APIs are much more stable.
Step 3 — Run locally
    # Install dependencies
    npm install

    # Run without building (great for development)
    npm run dev

    # Or build first, then run
    npm run build
    npm start
To set a custom keyword, create storage/key_value_stores/default/INPUT.json:
    {
      "keyword": "javascript",
      "seniority": "junior",
      "maxResults": 20
    }
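Since INPUT.json is hand-edited, it can help to normalize it before use. The sketch below is one possible approach, not something the actor is known to do: `normalizeInput` and `ActorInput` are hypothetical names, and the defaults simply mirror the destructuring defaults shown in the walkthrough (empty keyword, empty seniority, 50 results).

```typescript
interface ActorInput {
  keyword: string;
  seniority: string;
  maxResults: number;
}

// Same defaults as the destructuring in the walkthrough: '', '', 50.
const DEFAULT_INPUT: ActorInput = { keyword: '', seniority: '', maxResults: 50 };

// Hypothetical helper: accept whatever INPUT.json contained and return safe values.
function normalizeInput(raw: unknown): ActorInput {
  const input = (typeof raw === 'object' && raw !== null ? raw : {}) as Partial<
    Record<keyof ActorInput, unknown>
  >;
  const keyword =
    typeof input.keyword === 'string' ? input.keyword.trim() : DEFAULT_INPUT.keyword;
  const seniority =
    typeof input.seniority === 'string' ? input.seniority.trim() : DEFAULT_INPUT.seniority;
  // Fall back to the default for anything that is not a positive integer.
  const maxResults =
    typeof input.maxResults === 'number' &&
    Number.isInteger(input.maxResults) &&
    input.maxResults > 0
      ? input.maxResults
      : DEFAULT_INPUT.maxResults;
  return { keyword, seniority, maxResults };
}
```

With this in place, `normalizeInput(await Actor.getInput())` would tolerate a missing file, a stray space in the keyword, or a negative `maxResults`.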
Step 4 — Deploy to Apify
    $ apify push
Your actor is now live at console.apify.com under My Actors.
Step 5 — Schedule & export
Run on a schedule, e.g. every weekday morning at 8:00:
- Open your actor in Apify Console
- Go to Schedules → + New Schedule
- Set cron: 0 8 * * 1-5 (Mon–Fri at 8:00)
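To make the five cron fields concrete, here is a deliberately simplified matcher. It is an illustration only, not how Apify evaluates schedules: `matchesCron` and `fieldMatches` are hypothetical helpers that understand just `*`, single numbers, and `a-b` ranges, and times are read in UTC purely to keep the example deterministic (check which timezone your schedule is set to).

```typescript
// Match one cron field ("*", "8", or "1-5") against a value.
// Lists ("1,3") and steps ("*/5") are intentionally not handled in this sketch.
function fieldMatches(field: string, value: number): boolean {
  if (field === '*') return true;
  const [start, end = start] = field.split('-');
  return value >= Number(start) && value <= Number(end);
}

// Field order: minute, hour, day of month, month, day of week (0 = Sunday).
// Simplification: all five fields must match, which is enough for "0 8 * * 1-5".
function matchesCron(expression: string, date: Date): boolean {
  const [minute, hour, dayOfMonth, month, dayOfWeek] = expression.trim().split(/\s+/);
  return (
    fieldMatches(minute, date.getUTCMinutes()) &&
    fieldMatches(hour, date.getUTCHours()) &&
    fieldMatches(dayOfMonth, date.getUTCDate()) &&
    fieldMatches(month, date.getUTCMonth() + 1) &&
    fieldMatches(dayOfWeek, date.getUTCDay())
  );
}
```

Read this way, `0 8 * * 1-5` means minute 0, hour 8, any day of the month, any month, Monday through Friday.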
Export results:
- Dataset → Export → CSV / JSON
- Or connect directly to Gmail via Apify integrations
Build your own scraper
Want to scrape a different site? You can use this repo as a starting point.
- Pick your starting point based on what the target site looks like:

  | Situation | Template |
  |---|---|
  | Site has a JSON API (like this demo) | Clone this repo |
  | No API, static HTML | ts-crawlee-cheerio |
  | No API, heavy JavaScript / dynamic content | ts-crawlee-playwright |

      $ apify create my-scraper --template ts-crawlee-cheerio

- Find the data source: open the target site in your browser, go to DevTools → Network → Fetch/XHR, and look for an API call returning JSON. If there's no API, switch to the Elements tab and find the CSS selectors for the data you need.
- Edit src/main.ts: replace the fetch() URL and the fields inside Actor.pushData({...}) with whatever your target API or page returns. The structure stays the same: fetch → filter → save.
- Update .actor/input_schema.json to define the inputs your scraper needs (keywords, URLs, limits, etc.).
- Run locally with npm run dev, then deploy with apify push.
The Apify documentation and Academy are great next steps from here.
Going further
| What | How |
|---|---|
| Compare day-over-day | Store results with a timestamp, diff on next run |
| Scrape a JS-heavy site | Switch to PlaywrightCrawler from Crawlee |
| Browse 29 000+ ready-made scrapers | apify.com/store |
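The day-over-day comparison in the table can be sketched as a set difference between two runs. This is an assumption-laden illustration: `diffRuns`, `StoredOffer`, and the `scrapedAt` field are hypothetical, and the listing URL is assumed to be a stable identifier for an offer.

```typescript
interface StoredOffer {
  title: string;
  url: string;
  scrapedAt: string; // ISO timestamp added when the run stored the record
}

interface OfferDiff {
  added: StoredOffer[];
  removed: StoredOffer[];
}

// Compare two runs by URL, assumed here to identify a listing uniquely.
function diffRuns(previous: StoredOffer[], current: StoredOffer[]): OfferDiff {
  const previousUrls = new Set(previous.map((offer) => offer.url));
  const currentUrls = new Set(current.map((offer) => offer.url));
  return {
    added: current.filter((offer) => !previousUrls.has(offer.url)),
    removed: previous.filter((offer) => !currentUrls.has(offer.url)),
  };
}
```

Feeding it yesterday's and today's dataset items would return the listings that appeared and the ones that disappeared, which is usually the part worth sending in a notification.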
Glossary
Web scraping — Automatically collecting data from websites by sending requests and extracting the relevant parts from the response (HTML or JSON).
Server — A computer (or program) that listens for requests over the internet and sends back a response. When you open a website, your browser sends a request to a server, which replies with the page content.
API (Application Programming Interface) — A formal agreement between two programs on how to exchange data: what you can ask for, how to ask it, and what format the answer comes back in. This scraper uses StartupJobs' public API, which means we get clean JSON instead of having to dig through HTML.
Parsing — Analyzing and processing structured text (HTML or JSON) to pull out specific pieces of data. When a site has no API, you parse the raw HTML to find what you need.
JS site (JavaScript-rendered site) — A site that builds its content in the browser using JavaScript. A plain HTTP request returns only an empty shell — the actual data isn't in the source HTML at all. You need a headless browser to load these properly.
Headless browser — A web browser that runs without a visible window. It works exactly like a normal browser (loads pages, runs JavaScript, processes CSS), but everything happens in memory in the background. Used to scrape JS-rendered sites.
LLM (Large Language Model) — A type of AI trained on massive amounts of text, capable of understanding and generating human-like language. In scraping, LLMs can help extract or structure data from unstructured text that would be hard to parse with code alone.
Proxy — An intermediary server between you and the target website. Your requests go through it, so the website sees the proxy's IP address instead of yours. Used to avoid IP bans when scraping at scale.
Resources
- Apify SDK for JavaScript/TypeScript
- Apify Academy — Web scraping for beginners
- junior.guru — community and handbook for junior developers in CZ/SK
- Talk slides