Bizi Si - Spider companies
This Actor is unavailable because the developer has decided to deprecate it.
Bizi.si Crawler - Apify Actor
This Apify actor crawls the Bizi.si website to extract company details such as title, URL, phone number, and email. It iterates through all browse pages based on specified criteria and collects data from each listed company.
Features
- Iterates through all browse pages: The crawler navigates through all pages in the browse section of Bizi.si, capturing links to individual company profiles.
- Scrapes company details: For each company, it extracts the title, URL, phone number, and email address.
- Handles dynamic content and pagination: The crawler is designed to navigate through multiple pages and handle dynamic content loading.
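The pagination behaviour described above can be sketched roughly as follows. This is only an illustration, not the Actor's actual code: `collect_company_links`, the `fetch_page` callback, and the `FAKE_SITE` dictionary are hypothetical names, and the sketch assumes each browse page yields a list of company links plus an optional link to the next page.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical page shape: (company profile links, URL of the next browse page or None).
Page = Tuple[List[str], Optional[str]]


def collect_company_links(start_url: str, fetch_page: Callable[[str], Page]) -> List[str]:
    """Follow "next page" links from start_url and gather unique company links in order."""
    seen_pages = set()
    seen_links = set()
    links: List[str] = []
    url: Optional[str] = start_url
    # Stop when there is no next page, or when a page repeats (guards against loops).
    while url is not None and url not in seen_pages:
        seen_pages.add(url)
        page_links, next_url = fetch_page(url)
        for link in page_links:
            if link not in seen_links:
                seen_links.add(link)
                links.append(link)
        url = next_url
    return links


# In-memory stand-in for the website, so the sketch runs without network access.
FAKE_SITE = {
    "page-1": (["company-a", "company-b"], "page-2"),
    "page-2": (["company-b", "company-c"], None),
}

result = collect_company_links("page-1", FAKE_SITE.__getitem__)
print(result)  # ['company-a', 'company-b', 'company-c']
```

Passing the page fetcher in as a callback keeps the traversal logic separate from HTTP and proxy handling, which makes it easy to test against a fake site.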
Input Configuration
The crawler can be configured with the following inputs:
- startUrl (String, Required): The starting URL of the browse page you wish to crawl. Example: https://www.bizi.si/TSMEDIA/V/vulkanizerstvo-4940/?f=activity&cls=TSMEDIA&chr=V&actss=4940&actsd=vulkanizerstvo&rw=1
- maxConcurrency (Number, Optional): The maximum number of concurrent pages that the crawler will process. Default is 10.
- proxyConfiguration (Object, Optional): Configure proxies to avoid being blocked. Default is to use Apify's proxies.
Example Input
```json
{
  "startUrl": "https://www.bizi.si/TSMEDIA/V/vulkanizerstvo-4940/?f=activity&cls=TSMEDIA&chr=V&actss=4940&actsd=vulkanizerstvo&rw=1",
  "maxConcurrency": 5,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["SHADER"]
  }
}
```
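As a rough illustration of how the documented inputs and defaults could be applied, here is a minimal sketch. The `normalize_input` function is hypothetical and is not taken from the Actor's source. It also assumes that "use Apify's proxies" corresponds to `{"useApifyProxy": True}`.

```python
from typing import Any, Dict

DEFAULT_MAX_CONCURRENCY = 10  # default stated in the input description


def normalize_input(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Validate the raw input and fill in the documented defaults."""
    start_url = raw.get("startUrl")
    if not isinstance(start_url, str) or not start_url.startswith(("http://", "https://")):
        raise ValueError("startUrl is required and must be an http(s) URL")

    max_concurrency = raw.get("maxConcurrency", DEFAULT_MAX_CONCURRENCY)
    # bool is a subclass of int in Python, so reject it explicitly.
    if (
        isinstance(max_concurrency, bool)
        or not isinstance(max_concurrency, int)
        or max_concurrency < 1
    ):
        raise ValueError("maxConcurrency must be a positive integer")

    # Assumption: the default proxy setting is expressed as useApifyProxy = True.
    proxy = raw.get("proxyConfiguration") or {"useApifyProxy": True}

    return {
        "startUrl": start_url,
        "maxConcurrency": max_concurrency,
        "proxyConfiguration": proxy,
    }
```

Calling `normalize_input({"startUrl": "https://www.bizi.si/"})` would return the same URL with `maxConcurrency` set to 10 and the assumed default proxy configuration.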
Output
The crawler will output a dataset with the following structure for each company:
- title: The name of the company.
- url: The URL of the company's profile page.
- info_phone: The phone number listed for the company.
- info_email: The email address listed for the company.
Example:
```json
{
  "title": "Vulkanizerstvo ABC",
  "url": "https://www.bizi.si/TSMEDIA/V/vulkanizerstvo-4940/?f=activity&cls=TSMEDIA&chr=V&actss=4940&actsd=vulkanizerstvo&rw=1",
  "info_phone": "01 234 5678",
  "info_email": "info@vulkanizerstvoabc.si"
}
```
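To show how a record with these four fields might be assembled, here is a hedged sketch using only Python's standard library. This README does not describe the real page markup, so the sketch assumes the company name sits in an `<h1>` and the contact details appear as `tel:` and `mailto:` links. `CompanyParser`, `build_record`, and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser
from typing import Dict, Optional


class CompanyParser(HTMLParser):
    """Collects the first <h1> text and the first tel:/mailto: links on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.title: Optional[str] = None
        self.phone: Optional[str] = None
        self.email: Optional[str] = None
        self._in_h1 = False

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True
        elif tag == "a":
            href = dict(attrs).get("href") or ""
            if href.startswith("mailto:") and self.email is None:
                self.email = href[len("mailto:"):]
            elif href.startswith("tel:") and self.phone is None:
                self.phone = href[len("tel:"):]

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1 and self.title is None and data.strip():
            self.title = data.strip()


def build_record(url: str, html: str) -> Dict[str, Optional[str]]:
    """Return a dict with the output field names documented above."""
    parser = CompanyParser()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title,
        "url": url,
        "info_phone": parser.phone,
        "info_email": parser.email,
    }


# Made-up markup for illustration only; the real pages may be structured differently.
SAMPLE_HTML = (
    "<h1>Vulkanizerstvo ABC</h1>"
    '<a href="tel:012345678">01 234 5678</a>'
    '<a href="mailto:info@vulkanizerstvoabc.si">Email</a>'
)
record = build_record("https://example.com/company/abc", SAMPLE_HTML)
```

Fields that cannot be found stay `None`, so a missing phone number or email address does not cause the whole record to be dropped.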
Installation and Usage
- Clone the repository or create a new actor on Apify using the code.
- Set up the input: Provide the startUrl in the input configuration to specify the browse page you want to crawl.
- Run the actor: You can run the actor on the Apify platform. The actor will start from the specified URL, navigate through all pages, and collect data from each company listed.
Notes
- Captcha Handling: If the site presents captchas, you may need to integrate a captcha-solving service or manually intervene.
- Rate Limiting: To avoid being blocked, consider adjusting the concurrency settings and implementing random delays between requests.
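The random delays mentioned above could look something like the following. This is a minimal sketch, not part of the Actor: `jittered_delay` and `polite_sleep` are hypothetical helpers, and the base and jitter values are arbitrary examples to tune for your own run.

```python
import random
import time
from typing import Optional


def jittered_delay(base_seconds: float, jitter_seconds: float, rng: random.Random) -> float:
    """Return a delay between base_seconds and base_seconds + jitter_seconds."""
    return base_seconds + rng.uniform(0.0, jitter_seconds)


def polite_sleep(
    base_seconds: float = 1.0,
    jitter_seconds: float = 2.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Sleep for a randomized interval before the next request and return its length."""
    rng = rng or random.Random()
    delay = jittered_delay(base_seconds, jitter_seconds, rng)
    time.sleep(delay)
    return delay
```

Calling `polite_sleep()` before each request spaces requests out irregularly. Lowering maxConcurrency reduces the load further.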
Contributing
If you'd like to contribute to this project, feel free to submit a pull request. Any improvements, bug fixes, or feature requests are welcome!
License
This project is licensed under the MIT License - see the LICENSE file for details.