
AIPure Python Scraper
Deprecated
Pricing
$10.00 / 1,000 results

Scrapes the first page of results for every product category on https://aipure.ai/, sorted by popularity (2K+ entries).
Last modified
5 months ago
Apify Actor: Python AIPure Scraper
Overview
The Python AIPure Scraper is an Apify Actor designed to scrape categories and tools from the AIPure website. The Actor retrieves categories, fetches tools within each category, and pushes the extracted data to the Apify dataset. This scraper supports proxy integration and robust error handling to ensure efficient and reliable data collection.
Features
- Category Retrieval: Fetches all categories from the AIPure website.
- Tool Extraction: Collects details about tools within each category, including name, description, link, and tags.
- Proxy Support: Integrates with Apify Proxy to rotate IPs and avoid blocking.
- Concurrency Control: Limits the number of concurrent tasks to prevent overloading the server.
- Robust Error Handling: Logs errors during HTTP requests and continues execution for other tasks.
Workflow
- Retrieve Categories:
  - Scrapes the list of categories from https://aipure.ai/category/.
  - Each category includes a name and a URL.
- Fetch Tools for Each Category:
  - For each category, fetches tools sorted by popularity.
  - Extracts the following details for each tool:
    - Name
    - Description
    - External Link (fetched from the tool's internal page)
    - Tags
- Store Data:
  - Pushes extracted data to the Apify dataset in JSON format.
- Error Handling:
  - Logs errors for inaccessible URLs.
  - Catches exceptions during tool fetching so that other tasks continue to run.
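The category retrieval step can be illustrated with a minimal sketch. It is not the Actor's actual parser: the Actor uses BeautifulSoup, while this sketch swaps in the standard library's `html.parser` to stay dependency-free. The assumption that category links are `<a>` elements whose `href` contains `/category/`, and the names `CategoryLinkParser` and `parse_categories`, are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://aipure.ai/"


class CategoryLinkParser(HTMLParser):
    """Collects {name, url} for each <a> whose href contains '/category/'.

    The href pattern is an assumption about the page markup, not a
    confirmed detail of the AIPure site.
    """

    def __init__(self):
        super().__init__()
        self.categories = []
        self._href = None   # href of the category link currently open, if any
        self._text = []     # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        # Keep links to a specific category, skip the category index itself.
        if "/category/" in href and not href.rstrip("/").endswith("/category"):
            self._href = href
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace so multi-line link text becomes one name.
            name = " ".join("".join(self._text).split())
            if name:
                self.categories.append(
                    {"name": name, "url": urljoin(BASE_URL, self._href)}
                )
            self._href = None


def parse_categories(html):
    """Return a list of {'name': ..., 'url': ...} dicts found in `html`."""
    parser = CategoryLinkParser()
    parser.feed(html)
    return parser.categories
```

Each returned dict carries the name and absolute URL that the next step would use to fetch the tools of that category.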
Input Configuration
The Actor accepts the following input parameters:
- Proxy Configuration: Automatically retrieved from Apify Proxy settings.
Error Handling
- Logs errors for inaccessible URLs or failed tasks.
- Skips problematic tasks and continues processing others.
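The log-and-skip behaviour described above might look roughly like the following sketch. It assumes each task is an awaitable identified by a label such as its URL. The helper names `run_safely` and `process_all` are hypothetical and do not come from the Actor's source.

```python
import asyncio
import logging

logger = logging.getLogger("aipure_sketch")


async def run_safely(label, coro):
    """Await `coro`; on failure, log the error and return None instead of raising."""
    try:
        return await coro
    except Exception as exc:  # broad on purpose: one bad task must not stop the rest
        logger.error("Task %s failed: %s", label, exc)
        return None


async def process_all(jobs):
    """Run every job in `jobs` (label -> coroutine) and drop the ones that failed."""
    results = await asyncio.gather(
        *(run_safely(label, coro) for label, coro in jobs.items())
    )
    return [result for result in results if result is not None]
```

Because every exception is caught inside `run_safely`, `asyncio.gather` never sees a failure and the remaining tasks run to completion.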
Output
Data is pushed to the Apify dataset in the following format:
{
  "category": "Category Name",
  "name": "Tool Name",
  "description": "Tool Description",
  "link": "External Link",
  "tags": ["Tag1", "Tag2"]
}
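One way to keep every record in this shape is to build it from a small typed structure before pushing it to the dataset. The sketch below is an illustration only: `ToolRecord`, `to_item`, and `to_json_line` are hypothetical names, and the Actor may well assemble plain dicts directly.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ToolRecord:
    """One dataset item, with fields in the same order as the output format."""
    category: str
    name: str
    description: str
    link: str
    tags: List[str] = field(default_factory=list)


def to_item(record):
    """Convert a ToolRecord into the plain dict that would be pushed to the dataset."""
    return asdict(record)


def to_json_line(record):
    """Serialize a ToolRecord to a single JSON line, keeping non-ASCII text as-is."""
    return json.dumps(to_item(record), ensure_ascii=False)
```

A dict produced by `to_item` is JSON-serializable, so it can be handed to the dataset as-is.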
Proxy Integration
- Uses Actor.create_proxy_configuration() to enable proxy support.
- Proxy URLs are dynamically retrieved using proxy_configuration.new_url().
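The sketch below shows how a fresh proxy URL could be requested per task. It assumes that `proxy_configuration.new_url()` is a coroutine, as in current versions of the Apify SDK for Python, and that the configuration may be `None` when no proxy is set up. `next_proxy_url` and `FakeProxyConfiguration` are hypothetical, and the fake exists only so the helper can run without the Apify platform.

```python
import asyncio
from typing import List, Optional


async def next_proxy_url(proxy_configuration) -> Optional[str]:
    """Return a new proxy URL, or None when no proxy configuration is available.

    `proxy_configuration` is expected to be the object obtained from
    Actor.create_proxy_configuration(); only its new_url() method is used.
    """
    if proxy_configuration is None:
        return None
    return await proxy_configuration.new_url()


class FakeProxyConfiguration:
    """Hypothetical offline stand-in that cycles through a fixed list of URLs."""

    def __init__(self, urls: List[str]):
        self._urls = list(urls)
        self._index = 0

    async def new_url(self) -> str:
        url = self._urls[self._index % len(self._urls)]
        self._index += 1
        return url
```

Asking for a new URL before each request is what lets the proxy rotate IPs across tasks.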
Concurrency Management
- Implements an asyncio.Semaphore to limit the number of concurrent tasks.
- Default limit: 20 tasks.
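A minimal sketch of this pattern, assuming each task is created on demand from a zero-argument factory. `run_bounded` and `DEFAULT_LIMIT` are hypothetical names; only the use of `asyncio.Semaphore` and the limit of 20 come from the description above.

```python
import asyncio

DEFAULT_LIMIT = 20  # default number of concurrent tasks stated above


async def run_bounded(coro_factories, limit=DEFAULT_LIMIT):
    """Run all factories' coroutines, with at most `limit` executing at once.

    Results are returned in the same order as `coro_factories`.
    """
    semaphore = asyncio.Semaphore(limit)

    async def guarded(factory):
        # The coroutine is only created and awaited once a slot is free.
        async with semaphore:
            return await factory()

    return await asyncio.gather(*(guarded(factory) for factory in coro_factories))
```

Passing factories instead of ready-made coroutines means no request starts before the semaphore admits it.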
Example Run
- Deploy the Actor on Apify.
- Specify input parameters (if required).
- Run the Actor and monitor logs for progress.
- Access the extracted data in the Apify dataset.
Dependencies
- Apify SDK: Used for Actor functionality and proxy management.
- HTTPX: Handles asynchronous HTTP requests.
- BeautifulSoup: Parses HTML content.
- Asyncio: Manages asynchronous tasks.
Notes
- Ensure the AIPure website is accessible from your network.
- Configure the Apify Proxy settings if accessing from restricted regions.
For more details, refer to the Apify SDK Documentation.