Czechitas is a Czech non-profit organization of women and men on a mission to promote and increase diversity in the IT industry and to reach a higher level of digital proficiency among women and the new generation. They organize programming courses and workshops and have a wide partner network that helps their graduates find the right employment opportunities in the field.
The “All roads lead to Czechitas, but some of them take too long” project was created as part of the “Czechitas Digital Academy: Data” study program. The goal of the project was to optimize the Czechitas branch structure in order to make their courses more accessible to potential participants. The project kicked off with an analysis of commuting times from all around the Czech Republic to current and possible future locations of Czechitas’ offices. To get the full picture, the second part of the plan was to collect and analyze job advertisements for junior IT positions posted in the respective cities all over the country. This was the moment they realized web scraping could simplify the task.
Apify has been a long-time partner of Czechitas and stepped in as a mentor for this part of the project. With their advice and support, Czechitas created their own actor to scrape startupjobs.cz, an advertisement platform dedicated to job search specifically in the IT industry. Czechitas started with the universal apify/web-scraper, which was then modified to fit their requirements. It took some time to fine-tune all the details, but they managed to scrape the entire content of the job advertisements on the website, preserving the following information structure:
- position name, company
- location, possibility to work remotely
- scope of work
- form of cooperation
- required language skills
- job description (full text)
- issue date of the advertisement
- URL of the advertisement
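The fields listed above could be represented as one record per advertisement. The following is only a minimal sketch in Python, not the actor's actual output schema: the class name `JobAd`, all field names, and the helper `to_json_line` are hypothetical and chosen purely for illustration.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class JobAd:
    """One scraped advertisement (hypothetical field names, not the real schema)."""
    position: str
    company: str
    location: str
    remote_possible: bool
    scope_of_work: str
    cooperation_form: str
    languages: List[str]
    description: str  # full text of the job description
    issued_on: str    # issue date, kept as text exactly as scraped
    url: str


def to_json_line(ad: JobAd) -> str:
    """Serialize one advertisement to a JSON string, keeping Czech diacritics readable."""
    return json.dumps(asdict(ad), ensure_ascii=False)
```

Keeping the issue date as plain text at this stage is a deliberate simplification: parsing and normalizing it can then happen later, in the transformation step.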
The last tricky part of the scraping process was pagination, which Czechitas figured out along the way. It turned out that the startupjobs.cz home page actually stated the number of advertisements to display, so with a set number of items per page the website was rather easy to get through.
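The pagination idea can be sketched as simple arithmetic: once the total number of advertisements and the number of items per page are known, the number of pages to visit follows directly. The sketch below is an illustration under assumptions, not the code Czechitas used: the function names and the `?page=` query parameter are hypothetical, and the real URL scheme of startupjobs.cz may differ.

```python
import math
from typing import List


def page_count(total_ads: int, per_page: int) -> int:
    """Number of listing pages needed to cover all advertisements."""
    if per_page <= 0:
        raise ValueError("per_page must be positive")
    return math.ceil(total_ads / per_page)


def page_urls(base_url: str, total_ads: int, per_page: int) -> List[str]:
    """Build one URL per listing page (the '?page=' parameter is an assumption)."""
    pages = page_count(total_ads, per_page)
    return [f"{base_url}?page={n}" for n in range(1, pages + 1)]
```

For example, 45 advertisements at 20 per page would need three pages, the last one only partially filled.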
All scraped information was downloaded in JSON format. For further cleaning, transformation and categorization, the dataset was also uploaded to Keboola (directly via its dedicated Apify extractor). All data transformations were made in Keboola via Snowflake SQL, and the optimized dataset was then exported to Tableau for the final visualizations. Once everything was working smoothly, Czechitas automated all the above-mentioned steps. They scheduled the actor in Apify to run repeatedly and set up an orchestration in Keboola to extract and transform the data. These processes ensured that the resulting .tbe file was regularly filled with new data and therefore always remained up to date.
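To give a feel for the kind of cleaning a repeatedly scheduled scrape calls for, here is a small Python sketch that works on records parsed from the JSON export. It is not the Snowflake SQL that ran in Keboola; it is a stand-in illustration, and the keys `url` and `location` as well as both function names are assumptions.

```python
from collections import Counter
from typing import Dict, List


def dedupe_by_url(ads: List[Dict]) -> List[Dict]:
    """Keep the first record for each advertisement URL; repeated runs may re-scrape the same ad."""
    seen = set()
    unique = []
    for ad in ads:
        url = ad.get("url")
        if url in seen:
            continue
        seen.add(url)
        unique.append(ad)
    return unique


def ads_per_location(ads: List[Dict]) -> Counter:
    """Count advertisements per location, trimming stray whitespace around the name."""
    return Counter((ad.get("location") or "unknown").strip() for ad in ads)
```

Deduplicating on the URL assumes that each advertisement keeps a stable address between runs, which would need to be checked against the real data.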
Thanks to this automated workflow, we can now analyze current job trends in the IT market. This helps Czechitas make the right decisions about opening new offices and targeting the right audience for their courses. It has been a wonderful experience to work with our Apify mentors, and we’re looking forward to new ambitious scraping projects.
Martina Gelnerová, Tereza Morávková
Coders for Czechitas