AIPure Python Scraper

Deprecated

Pricing

$10.00 / 1,000 results

Developed by

Adhil Alcyone

Maintained by Community

Scrapes the first page of products in every category on https://aipure.ai/, sorted by popularity (2K+ entries in total).

Apify Actor: Python AIPure Scraper

Overview

The Python AIPure Scraper is an Apify Actor that scrapes categories and tools from the AIPure website. It retrieves the list of categories, fetches the tools in each category, and pushes the extracted data to the Apify dataset. Requests can be routed through Apify Proxy, and failed requests are logged and skipped so that the rest of the run continues.

Features

  • Category Retrieval: Fetches all categories from the AIPure website.
  • Tool Extraction: Collects details about tools within each category, including name, description, link, and tags.
  • Proxy Support: Routes requests through Apify Proxy to rotate IP addresses and reduce the risk of being blocked.
  • Concurrency Control: Limits the number of concurrent tasks (20 by default) so the target server is not overloaded.
  • Robust Error Handling: Logs errors during HTTP requests and continues execution for other tasks.
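The Tool Extraction step can be pictured with the minimal sketch below. It uses the standard library's html.parser instead of BeautifulSoup so it runs without extra packages, and the markup and class names (tool-card, tool-name, tool-desc, tool-tag) are hypothetical: the real AIPure HTML is not documented here. The parsed internal_url stands for the tool's AIPure page, from which the external link would be fetched in a second request.

```python
from __future__ import annotations

from html.parser import HTMLParser

# Hypothetical class names; the real AIPure markup may differ.
FIELD_CLASSES = {"tool-name": "name", "tool-desc": "description", "tool-tag": "tags"}


class ToolCardParser(HTMLParser):
    """Collects name, description, internal link and tags from tool-card markup."""

    def __init__(self) -> None:
        super().__init__()
        self.tools: list[dict] = []
        self._card: dict | None = None   # card currently being parsed
        self._field: str | None = None   # field the next text node belongs to

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if tag == "a" and "tool-card" in classes:
            # Each card links to the tool's internal AIPure page.
            self._card = {"name": "", "description": "",
                          "internal_url": attr_map.get("href"), "tags": []}
            self.tools.append(self._card)
        elif self._card is not None:
            # Map the element's class to the field its text belongs to.
            self._field = next((FIELD_CLASSES[c] for c in classes if c in FIELD_CLASSES), None)

    def handle_endtag(self, tag):
        self._field = None
        if tag == "a":
            self._card = None

    def handle_data(self, data):
        text = data.strip()
        if self._card is None or self._field is None or not text:
            return
        if self._field == "tags":
            self._card["tags"].append(text)
        else:
            self._card[self._field] += text


def parse_tool_cards(html: str) -> list[dict]:
    parser = ToolCardParser()
    parser.feed(html)
    parser.close()
    return parser.tools


# Hypothetical sample card.
SAMPLE = (
    '<a class="tool-card" href="/tool/example-ai">'
    '<h3 class="tool-name">Example AI</h3>'
    '<p class="tool-desc">Writes drafts.</p>'
    '<span class="tool-tag">Writing</span><span class="tool-tag">Free</span>'
    "</a>"
)
tools = parse_tool_cards(SAMPLE)
```

With BeautifulSoup the same lookup would typically be a few CSS selector calls; the class-to-field mapping is the part that would need to match the live site.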

Workflow

  1. Retrieve Categories:

    • Scrapes the list of categories from https://aipure.ai/category/.
    • Each category includes a name and a URL.
  2. Fetch Tools for Each Category:

    • For each category, fetches the first page of tools, sorted by popularity.
    • Extracts the following details for each tool:
      • Name
      • Description
      • External Link (fetched from the tool's detail page on AIPure)
      • Tags
  3. Store Data:

    • Pushes extracted data to the Apify dataset in JSON format.
  4. Error Handling:

    • Logs errors for inaccessible URLs.
    • Catches exceptions during tool fetching to ensure continued execution for other tasks.
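Steps 1 to 3 can be sketched as a small asyncio pipeline. The fetch and push callables are hypothetical parameters: in the Actor they would wrap HTTPX requests and Actor.push_data, while here stub functions return canned data (with placeholder example.com URLs) so the flow runs without network access. Error handling (step 4) is left out to keep the flow visible.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable


async def scrape_all(
    fetch_categories: Callable[[], Awaitable[list[dict]]],
    fetch_tools: Callable[[dict], Awaitable[list[dict]]],
    push_data: Callable[[list[dict]], Awaitable[None]],
) -> int:
    """Run steps 1-3: list categories, fetch each category's tools, push the records."""
    categories = await fetch_categories()  # step 1
    # Step 2: fetch every category concurrently; results keep category order.
    per_category = await asyncio.gather(*(fetch_tools(c) for c in categories))
    records = [
        {"category": category["name"], **tool}  # tag each tool with its category
        for category, tools in zip(categories, per_category)
        for tool in tools
    ]
    await push_data(records)  # step 3
    return len(records)


# Hypothetical stubs returning canned data instead of making HTTP requests.
async def fake_categories() -> list[dict]:
    return [{"name": "Writing", "url": "https://example.com/category/writing"},
            {"name": "Video", "url": "https://example.com/category/video"}]


async def fake_tools(category: dict) -> list[dict]:
    return [{"name": f"{category['name']} Tool", "description": "Demo entry.",
             "link": "https://example.com", "tags": ["Free"]}]


pushed: list[dict] = []


async def fake_push(records: list[dict]) -> None:
    pushed.extend(records)


count = asyncio.run(scrape_all(fake_categories, fake_tools, fake_push))
```

Injecting the three callables keeps the orchestration testable and makes it easy to swap the stubs for real HTTP and dataset calls.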

Input Configuration

The Actor accepts a single input parameter:

  • Proxy Configuration: Retrieved automatically from the Apify Proxy settings.

Error Handling

  • Logs errors for inaccessible URLs or failed tasks.
  • Skips problematic tasks and continues processing others.
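A minimal sketch of the log-and-skip behaviour, assuming each category task is wrapped so that an exception is logged and turned into an empty result instead of aborting the run. The fetch_tools stub, the simulated failure and the category names are hypothetical.

```python
from __future__ import annotations

import asyncio
import logging

logger = logging.getLogger("aipure_scraper")


async def fetch_tools(category: str) -> list[str]:
    # Hypothetical stub: one category simulates an inaccessible URL.
    if category == "broken":
        raise ConnectionError(f"could not reach category {category!r}")
    return [f"{category}-tool"]


async def safe_fetch_tools(category: str) -> list[str]:
    """Log the failure and return no tools, so the other tasks keep running."""
    try:
        return await fetch_tools(category)
    except Exception:
        logger.exception("Failed to fetch tools for %s", category)
        return []


async def main() -> list[str]:
    results = await asyncio.gather(*(safe_fetch_tools(c) for c in ["writing", "broken", "video"]))
    return [tool for tools in results for tool in tools]


collected = asyncio.run(main())
```

An alternative is asyncio.gather(..., return_exceptions=True) followed by filtering out exception objects; wrapping each task keeps the logging next to the failing request.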

Output

Data is pushed to the Apify dataset in the following format:

{
  "category": "Category Name",
  "name": "Tool Name",
  "description": "Tool Description",
  "link": "External Link",
  "tags": ["Tag1", "Tag2"]
}
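One way to keep every item in this shape is a small dataclass converted with dataclasses.asdict. The ToolRecord name is hypothetical and not taken from the Actor's source; in the Actor, the resulting dict would be passed to Actor.push_data.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ToolRecord:
    """One dataset item with the fields listed in the output format."""
    category: str
    name: str
    description: str
    link: str
    tags: list = field(default_factory=list)


record = ToolRecord(
    category="Category Name",
    name="Tool Name",
    description="Tool Description",
    link="External Link",
    tags=["Tag1", "Tag2"],
)
item = asdict(record)           # plain dict, keys in field order
print(json.dumps(item, indent=2))
```

Declaring the fields once means a missing field raises a TypeError at construction time rather than silently producing an incomplete dataset item.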

Proxy Integration

  • Uses Actor.create_proxy_configuration() to enable proxy support.
  • Proxy URLs are dynamically retrieved using proxy_configuration.new_url().
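A sketch of how a fresh proxy URL might be obtained before each request, relying only on the two calls named above: Actor.create_proxy_configuration() can return None when no proxy is configured, and new_url() is awaited on the returned object. To keep the example runnable without the Apify SDK, a hypothetical FakeProxyConfiguration with placeholder proxy addresses stands in for the real object; get_proxy_url is also a hypothetical helper.

```python
from __future__ import annotations

import asyncio
import itertools
from typing import Protocol


class ProxyConfigurationLike(Protocol):
    """The only method this sketch needs from Apify's ProxyConfiguration."""
    async def new_url(self) -> str | None: ...


class FakeProxyConfiguration:
    """Hypothetical stand-in that rotates through fixed proxy URLs."""

    def __init__(self, urls: list[str]) -> None:
        self._urls = itertools.cycle(urls)

    async def new_url(self) -> str | None:
        return next(self._urls)


async def get_proxy_url(proxy_configuration: ProxyConfigurationLike | None) -> str | None:
    """Return a new proxy URL, or None to connect directly."""
    if proxy_configuration is None:
        return None
    return await proxy_configuration.new_url()


async def demo() -> list[str | None]:
    # In the Actor: proxy_configuration = await Actor.create_proxy_configuration()
    fake = FakeProxyConfiguration(["http://proxy-a:8000", "http://proxy-b:8000"])
    return [await get_proxy_url(fake), await get_proxy_url(fake), await get_proxy_url(None)]


urls = asyncio.run(demo())
```

The returned URL would then be handed to the HTTP client for that request; how it is passed depends on the HTTPX version in use.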

Concurrency Management

  • Uses an asyncio.Semaphore to limit the number of tasks running at the same time.
  • Default limit: 20 tasks.
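A minimal sketch of the semaphore pattern with the default limit of 20. The inner fetch coroutine is a hypothetical stand-in that sleeps instead of making an HTTP request, and the function records the peak number of tasks running at once to show that the limit holds.

```python
import asyncio

MAX_CONCURRENCY = 20  # default limit described above


async def run_limited(n_tasks: int, limit: int = MAX_CONCURRENCY) -> int:
    """Run n_tasks fake requests, at most `limit` at a time; return the peak concurrency."""
    semaphore = asyncio.Semaphore(limit)
    running = 0
    peak = 0

    async def fetch() -> None:
        nonlocal running, peak
        async with semaphore:  # waits here while `limit` tasks hold the semaphore
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # stand-in for an HTTP request
            running -= 1

    # All tasks are created up front; the semaphore, not gather, caps concurrency.
    await asyncio.gather(*(fetch() for _ in range(n_tasks)))
    return peak


peak = asyncio.run(run_limited(100))
```

Creating every task immediately and gating them with the semaphore keeps the code simple; for very large task counts a fixed pool of worker tasks reading from an asyncio.Queue avoids holding thousands of pending coroutines.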

Example Run

  1. Deploy the Actor on Apify.
  2. Specify input parameters (if required).
  3. Run the Actor and monitor logs for progress.
  4. Access the extracted data in the Apify dataset.

Dependencies

  • Apify SDK: Used for Actor functionality and proxy management.
  • HTTPX: Handles asynchronous HTTP requests.
  • BeautifulSoup: Parses HTML content.
  • asyncio (Python standard library): Manages asynchronous tasks.

Notes

  • Ensure the AIPure website is accessible from your network.
  • Configure the Apify Proxy settings if accessing from restricted regions.

For more details, refer to the Apify SDK Documentation.