🛍️ Skroutz Product Scraper

Scrape 100+ products from Skroutz.gr in under 2 minutes! Perfect for price monitoring and market research. No coding required. 📊

Pricing: from $0.01 / 1,000 results

Rating: 0.0 (0 reviews)

Developer: Michalis Paignigiannis (Maintained by Community)

Actor stats

  • Bookmarked: 0
  • Total users: 3
  • Monthly active users: 1
  • Last modified: 2 hours ago


SKROUTZ SCRAPER

A specialized Apify Actor for scraping products from Skroutz.gr, built around the XHR API endpoint that returns the product JSON data. Here's what the code in main.py does:

Key Features:

  1. XHR API Interception:
     • Listens for the variations.json XHR endpoint that contains the product data
     • Automatically captures and parses the JSON responses

  2. Product Data Extraction:
     • Extracts SKU ID, product name, price, URLs (product & image), and color variations
     • Handles the nested JSON structure of the Skroutz API responses

  3. Helper Functions (see the sketch below this list):
     • parse_skroutz_products(): parses the complex API response structure
     • extract_search_query(): extracts the search term from the URL

  4. Data Enrichment:
     • Adds page URL, search query, and page title to each product record
     • Produces structured output ready for storage or processing
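The two helper functions might look roughly like the following sketch. This is a minimal illustration only: the key names it assumes for the Skroutz payload ('skus', 'id', 'name', 'price', ...) are guesses, and the actual implementation in main.py may differ.

from urllib.parse import parse_qs, urlparse

def extract_search_query(url: str) -> str:
    # Pull the search term out of a Skroutz search URL,
    # e.g. https://www.skroutz.gr/search?keyphrase=tplink+tapo+camera
    params = parse_qs(urlparse(url).query)
    return params.get('keyphrase', [''])[0]

def parse_skroutz_products(payload: dict) -> list[dict]:
    # Flatten the nested variations.json payload into flat product records.
    # The key names below ('skus', 'id', 'name', ...) are assumptions about
    # the response structure; adjust them to match the real payload.
    products = []
    for sku in payload.get('skus', []):
        products.append({
            'sku_id': sku.get('id'),
            'name': sku.get('name'),
            'price': sku.get('price'),
            'url': sku.get('url'),
            'image_url': sku.get('image_url'),
        })
    return products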

{  "start_urls": [    {"url": "https://www.skroutz.gr/search?keyphrase=tplink+tapo+camera"}  ],  "max_requests_per_crawl": 5} Output Format: Each product will be saved with:

{  "sku_id": 46360886,  "name": "TP-LINK Tapo C220 v1 IP Κάμερα Wi-Fi Full HD+ 4MP",  "price": "24,64 €",  "url": "/s/46360886/...",  "image_url": "//b.scdn.gr/images/...",  "follow": true,  "page_url": "https://www.skroutz.gr/search?keyphrase=...",  "search_query": "tplink tapo camera",  "page_title": "..."} The actor uses Playwright to handle JavaScript rendering and Cloudflare's protections, then intercepts the XHR responses to extract the product JSON data efficiently!

Python Crawlee with Playwright template

A template for scraping data from websites in Python, starting from the provided URLs. The starting URLs are passed through the Actor's input, which is defined by the input schema. The template uses Crawlee for Python for efficient web crawling, making requests via a headless browser managed by Playwright and handling each request through a user-defined handler that uses the Playwright API to extract data from the page. Enqueued URLs are managed in the request queue, and the extracted data is saved in a dataset for easy access.
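As a rough sketch, assuming a recent version of Crawlee for Python (the PlaywrightCrawler import path was crawlee.playwright_crawler in older releases) and not the template's exact code, the pieces fit together roughly like this:

from apify import Actor
from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext

async def main() -> None:
    async with Actor:
        actor_input = await Actor.get_input() or {}
        start_urls = [item['url'] for item in actor_input.get('start_urls', [])]

        crawler = PlaywrightCrawler(
            max_requests_per_crawl=actor_input.get('max_requests_per_crawl', 5),
            headless=True,
        )

        @crawler.router.default_handler
        async def request_handler(context: PlaywrightCrawlingContext) -> None:
            # Extract data from the page via the Playwright API ...
            title = await context.page.title()
            # ... store it in the default dataset ...
            await context.push_data({'url': context.request.url, 'title': title})
            # ... and enqueue further links into the request queue.
            await context.enqueue_links()

        await crawler.run(start_urls)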

Included features

  • Apify SDK - a toolkit for building Apify Actors in Python.
  • Crawlee for Python - a web scraping and browser automation library.
  • Input schema - define and validate a schema for your Actor's input.
  • Request queue - manage the URLs you want to scrape in a queue.
  • Dataset - store and access structured data extracted from web pages.
  • Playwright - a library for managing headless browsers.

Resources

Getting started

For complete information, see this article. To run the Actor, use the following command:

$ apify run

Deploy to Apify

Connect Git repository to Apify

If you've created a Git repository for the project, you can easily connect it to Apify:

  1. Go to the Actor creation page
  2. Click on the Link Git Repository button

Push project on your local machine to Apify

You can also deploy the project from your local machine to Apify without the need for a Git repository.

  1. Log in to Apify. You will need to provide your Apify API Token to complete this action.

    $ apify login
  2. Deploy your Actor. This command will deploy and build the Actor on the Apify Platform. You can find your newly created Actor under Actors -> My Actors.

    $ apify push

Documentation reference

To learn more about Apify and Actors, take a look at the following resources: