Bizi Si - Spider companies

This Actor is deprecated

This Actor is unavailable because the developer has decided to deprecate it.

potocnik-jure/bizi-si

Bizi.si Crawler - Apify Actor

This Apify Actor crawls the Bizi.si website to extract company details: title, URL, phone number, and email. Starting from the configured startUrl, it iterates through all matching browse pages and collects data from each listed company.

Features

  • Iterates through all browse pages: The crawler navigates through all pages in the browse section of Bizi.si, capturing links to individual company profiles.
  • Scrapes company details: For each company, it extracts the title, URL, phone number, and email address.
  • Handles dynamic content and pagination: The crawler follows pagination links across browse pages and is designed to cope with content that loads dynamically.
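
The pagination behaviour described above can be illustrated with a minimal sketch. This is not the Actor's actual code: the `BrowsePage` shape, the `fetch_page` callable, and the `example.test` URLs are all hypothetical stand-ins for whatever the real crawler does when it loads a browse page.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set


@dataclass
class BrowsePage:
    """Hypothetical parsed browse page: company links plus an optional next page."""
    company_links: List[str]
    next_url: Optional[str] = None


def crawl_browse_pages(start_url: str,
                       fetch_page: Callable[[str], BrowsePage]) -> List[str]:
    """Follow next-page links from start_url and collect unique company links."""
    seen_pages: Set[str] = set()
    seen_companies: Set[str] = set()
    company_links: List[str] = []
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url in seen_pages:  # guard against pagination loops
            continue
        seen_pages.add(url)
        page = fetch_page(url)
        for link in page.company_links:
            if link not in seen_companies:  # keep first occurrence only
                seen_companies.add(link)
                company_links.append(link)
        if page.next_url:
            queue.append(page.next_url)
    return company_links


# In-memory stand-in for the site (assumption: two browse pages, one overlap).
FAKE_SITE: Dict[str, BrowsePage] = {
    "https://example.test/browse?page=1": BrowsePage(
        ["https://example.test/company/a", "https://example.test/company/b"],
        "https://example.test/browse?page=2",
    ),
    "https://example.test/browse?page=2": BrowsePage(
        ["https://example.test/company/b", "https://example.test/company/c"],
    ),
}

links = crawl_browse_pages("https://example.test/browse?page=1",
                           FAKE_SITE.__getitem__)
print(links)
```

Passing the fetcher in as a callable keeps the traversal logic testable without network access; a real run would swap in a function that loads and parses the page.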

Input Configuration

The crawler can be configured with the following inputs:

  • startUrl (String, Required): The starting URL of the browse page you wish to crawl. Example: https://www.bizi.si/TSMEDIA/V/vulkanizerstvo-4940/?f=activity&cls=TSMEDIA&chr=V&actss=4940&actsd=vulkanizerstvo&rw=1.
  • maxConcurrency (Number, Optional): The maximum number of concurrent pages that the crawler will process. Default is 10.
  • proxyConfiguration (Object, Optional): Configure proxies to avoid being blocked. Default is to use Apify's proxies.
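
As a rough illustration of how these inputs and their defaults fit together, the sketch below validates a raw input object. The function name `normalize_input` and the exact validation rules are assumptions made for this example; only the field names and the default of 10 come from the description above.

```python
from typing import Any, Dict

DEFAULT_MAX_CONCURRENCY = 10  # default stated in the input description


def normalize_input(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Validate the documented inputs and fill in the documented defaults."""
    start_url = raw.get("startUrl")
    if not isinstance(start_url, str) or not start_url.strip():
        raise ValueError("startUrl is required and must be a non-empty string")

    max_concurrency = raw.get("maxConcurrency", DEFAULT_MAX_CONCURRENCY)
    # bool is a subclass of int in Python, so reject it explicitly.
    if (isinstance(max_concurrency, bool)
            or not isinstance(max_concurrency, int)
            or max_concurrency < 1):
        raise ValueError("maxConcurrency must be a positive integer")

    # Assumption: "use Apify's proxies" is expressed as useApifyProxy = true.
    proxy = raw.get("proxyConfiguration") or {"useApifyProxy": True}
    if not isinstance(proxy, dict):
        raise ValueError("proxyConfiguration must be an object")

    return {
        "startUrl": start_url.strip(),
        "maxConcurrency": max_concurrency,
        "proxyConfiguration": proxy,
    }


example = normalize_input({"startUrl": "https://www.bizi.si/"})
print(example)
```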

Example Input

    {
      "startUrl": "https://www.bizi.si/TSMEDIA/V/vulkanizerstvo-4940/?f=activity&cls=TSMEDIA&chr=V&actss=4940&actsd=vulkanizerstvo&rw=1",
      "maxConcurrency": 5,
      "proxyConfiguration": {
        "useApifyProxy": true,
        "apifyProxyGroups": ["SHADER"]
      }
    }

Output

The crawler will output a dataset with the following structure for each company:

  • title: The name of the company.
  • url: The URL of the company's profile page.
  • info_phone: The phone number listed for the company.
  • info_email: The email address listed for the company.

Example:

    {
      "title": "Vulkanizerstvo ABC",
      "url": "https://www.bizi.si/TSMEDIA/V/vulkanizerstvo-4940/?f=activity&cls=TSMEDIA&chr=V&actss=4940&actsd=vulkanizerstvo&rw=1",
      "info_phone": "01 234 5678",
      "info_email": "info@vulkanizerstvoabc.si"
    }
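
To show how a record with these four fields might be assembled, here is a minimal sketch that pulls a phone number and email out of plain profile text with regular expressions. The patterns, the `build_record` helper, and the sample text (including its labels) are assumptions for illustration; the Actor's real selectors and page markup are not documented here.

```python
import re
from typing import Dict, Optional

# Loose patterns for illustration only; real-world data may need stricter rules.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d /()-]{6,}\d")


def build_record(title: str, url: str,
                 profile_text: str) -> Dict[str, Optional[str]]:
    """Build an output item using the documented field names."""
    phone = PHONE_RE.search(profile_text)
    email = EMAIL_RE.search(profile_text)
    return {
        "title": title.strip(),
        "url": url,
        "info_phone": phone.group(0) if phone else None,
        "info_email": email.group(0) if email else None,
    }


# Hypothetical profile text; the labels are made up for this example.
sample_text = "Telefon: 01 234 5678\nE-pošta: info@vulkanizerstvoabc.si"
record = build_record(" Vulkanizerstvo ABC ",
                      "https://example.test/company/abc",
                      sample_text)
print(record)
```

Missing fields are returned as `None` rather than omitted, so every item keeps the same four keys.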

Installation and Usage

  1. Clone the repository or create a new actor on Apify using the code.
  2. Set up the input: Provide the startUrl in the input configuration to specify the browse page you want to crawl.
  3. Run the actor: You can run the actor on the Apify platform. The actor will start from the specified URL, navigate through all pages, and collect data from each company listed.

Notes

  • Captcha Handling: If the site presents captchas, you may need to integrate a captcha-solving service or manually intervene.
  • Rate Limiting: To avoid being blocked, consider lowering maxConcurrency and adding random delays between requests.
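
The rate-limiting advice can be sketched as a concurrency cap combined with a random delay before each request. This is a generic asyncio pattern, not code from the Actor; `fetch_all`, its parameters, and the delay bounds are hypothetical names and values chosen for the example.

```python
import asyncio
import random
from typing import Awaitable, Callable, List


async def fetch_all(urls: List[str],
                    fetch: Callable[[str], Awaitable[str]],
                    max_concurrency: int = 5,
                    min_delay: float = 0.5,
                    max_delay: float = 2.0) -> List[str]:
    """Fetch urls with at most max_concurrency in flight and random jitter."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(url: str) -> str:
        async with semaphore:
            # Random pause so requests do not arrive at a fixed rhythm.
            await asyncio.sleep(random.uniform(min_delay, max_delay))
            return await fetch(url)

    # gather returns results in the same order as the input urls.
    return await asyncio.gather(*(worker(u) for u in urls))


if __name__ == "__main__":
    async def demo_fetch(url: str) -> str:
        # Stand-in for a real HTTP request.
        return f"fetched {url}"

    pages = asyncio.run(fetch_all(["page-1", "page-2", "page-3"], demo_fetch,
                                  max_concurrency=2,
                                  min_delay=0.0, max_delay=0.05))
    print(pages)
```

Because the delay is taken while the semaphore is held, both the number of simultaneous requests and their pacing are bounded; lowering `max_concurrency` or widening the delay range makes the crawl gentler at the cost of speed.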

Contributing

If you'd like to contribute to this project, feel free to submit a pull request. Any improvements, bug fixes, or feature requests are welcome!

License

This project is licensed under the MIT License - see the LICENSE file for details.


Developer
Maintained by Community