The Python AIPure Scraper is an Apify Actor that scrapes categories and tools from the AIPure website. The Actor retrieves the category list, fetches the tools in each category, and pushes the extracted data to the Apify dataset. It supports Apify Proxy, and it logs failed requests and continues with the remaining tasks instead of aborting the run.

Features

Category Retrieval: Fetches all categories from the AIPure website.
Tool Extraction: Collects details about tools within each category, including name, description, link, and tags.
Proxy Support: Integrates with Apify Proxy to rotate IPs and avoid blocking.
Concurrency Control: Limits the number of concurrent tasks to prevent overloading the server.
Robust Error Handling: Logs errors during HTTP requests and continues execution for other tasks.

Workflow

Retrieve Categories:
- Scrapes the list of categories from https://aipure.ai/category/.
- Each category includes a name and a URL.
Fetch Tools for Each Category:
- For each category, fetches tools sorted by popularity.
- Extracts the following details for each tool:
  - Name
  - Description
  - External Link (fetched from the tool's internal page)
  - Tags
Store Data:
- Pushes extracted data to the Apify dataset in JSON format.
Error Handling:
- Logs errors for inaccessible URLs.
- Catches exceptions during tool fetching to ensure continued execution for other tasks.
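
As a rough illustration of the category step in the workflow above, the sketch below extracts name and URL pairs from an HTML page. It uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra packages. The sample markup, the "/category/" link filter, and the names CategoryLinkParser and parse_categories are assumptions made for illustration, not the real structure of the AIPure page or the Actor's code.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://aipure.ai/category/"


class CategoryLinkParser(HTMLParser):
    """Collects {"name", "url"} dicts from <a> tags that point at a category page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.categories = []
        self._href = None  # href of the currently open <a>, if it qualifies
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            # Assumption: category links contain "/category/" followed by a slug.
            if "/category/" in href and not href.rstrip("/").endswith("/category"):
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the link text.
            name = " ".join("".join(self._text).split())
            if name:
                url = urljoin(self.base_url, self._href)
                self.categories.append({"name": name, "url": url})
            self._href = None


def parse_categories(html, base_url=BASE_URL):
    parser = CategoryLinkParser(base_url)
    parser.feed(html)
    return parser.categories


# Hypothetical markup, for illustration only.
SAMPLE_HTML = """
<ul>
  <li><a href="/category/chatbots">AI Chatbots</a></li>
  <li><a href="/category/image-tools"> Image   Tools </a></li>
  <li><a href="/about">About</a></li>
</ul>
"""

categories = parse_categories(SAMPLE_HTML)
```

Each resulting dict carries the two fields the workflow mentions for a category, a name and a URL.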

Input Configuration

The Actor accepts the following input parameters:

Proxy Configuration: Automatically retrieved from Apify Proxy settings.

Error Handling

Logs errors for inaccessible URLs or failed tasks.
Skips problematic tasks and continues processing others.
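
One way to get this "log and continue" behavior is sketched below. It is a minimal example under stated assumptions: fetch_tools is a stand-in that simulates a failing request instead of making a real HTTP call, and the function names and example URLs are hypothetical, not taken from the Actor's source.

```python
import asyncio
import logging

logger = logging.getLogger("aipure_sketch")


async def fetch_tools(category_url):
    """Stand-in for the real HTTP fetch; fails for URLs containing 'broken'."""
    await asyncio.sleep(0)
    if "broken" in category_url:
        raise ConnectionError(f"cannot reach {category_url}")
    return [{"category_url": category_url, "name": "Example Tool"}]


async def fetch_all(category_urls):
    # return_exceptions=True returns each exception as a result value
    # instead of raising the first failure to the caller.
    results = await asyncio.gather(
        *(fetch_tools(url) for url in category_urls), return_exceptions=True
    )
    tools, failed = [], []
    for url, result in zip(category_urls, results):
        if isinstance(result, Exception):
            logger.error("Skipping %s: %s", url, result)
            failed.append(url)
        else:
            tools.extend(result)
    return tools, failed


urls = [
    "https://example.com/a",
    "https://example.com/broken",
    "https://example.com/b",
]
tools, failed = asyncio.run(fetch_all(urls))
```

Here the failing URL ends up in failed and is logged, while the tools from the other two URLs are still collected.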

Output

Data is pushed to the Apify dataset in the following format:

{
    "category": "Category Name",
    "name": "Tool Name",
    "description": "Tool Description",
    "link": "External Link",
    "tags": ["Tag1", "Tag2"]
}
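
The sketch below shows one way a record in this shape could be assembled before it is pushed. The helper build_record and the whitespace trimming are assumptions for illustration; only the five field names come from the format above.

```python
import json

OUTPUT_FIELDS = ("category", "name", "description", "link", "tags")


def build_record(category, name, description, link, tags):
    """Return one dataset item in the documented shape (hypothetical helper)."""
    return {
        "category": category,
        "name": name.strip(),                # assumption: trim stray whitespace
        "description": description.strip(),
        "link": link,
        "tags": list(tags),                  # always emit a JSON array
    }


record = build_record(
    category="Category Name",
    name=" Tool Name ",
    description="Tool Description",
    link="https://example.com",
    tags=("Tag1", "Tag2"),
)

# Inside the Actor, a record like this would be handed to the dataset,
# for example with `await Actor.push_data(record)` from the Apify SDK.
serialized = json.dumps(record, indent=4)
```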

Proxy Integration

Uses Actor.create_proxy_configuration() to enable proxy support.
Proxy URLs are dynamically retrieved using proxy_configuration.new_url().
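
A minimal sketch of how these two calls might fit together follows. It assumes that both calls are awaited, as in the Apify SDK for Python, and that no proxy configuration may be available, in which case requests go out without a proxy. The helper resolve_proxy_url and the FakeProxyConfiguration stand-in are hypothetical and exist only so the example runs without the apify package.

```python
import asyncio


async def resolve_proxy_url(proxy_configuration):
    """Return a proxy URL, or None when no proxy configuration is available."""
    if proxy_configuration is None:
        return None
    return await proxy_configuration.new_url()


class FakeProxyConfiguration:
    """Stand-in for the SDK object so the sketch runs without the apify package."""

    async def new_url(self):
        return "http://proxy.example:8000"


# In the Actor this would look roughly like:
#     proxy_configuration = await Actor.create_proxy_configuration()
#     proxy_url = await resolve_proxy_url(proxy_configuration)
proxy_url = asyncio.run(resolve_proxy_url(FakeProxyConfiguration()))
no_proxy = asyncio.run(resolve_proxy_url(None))
```

The returned URL can then be passed to the HTTP client for each request.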

Concurrency Management

Implements an asyncio.Semaphore to limit the number of concurrent tasks.
Default limit: 20 tasks.
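
The pattern can be sketched as follows. This is an illustration under stated assumptions, not the Actor's code: run_limited and fake_task are hypothetical names, and the peak counter is there only to make the effect of the limit visible.

```python
import asyncio

MAX_CONCURRENCY = 20  # matches the default limit stated above


async def run_limited(coroutine_factories, limit=MAX_CONCURRENCY):
    """Run all tasks, allowing at most `limit` of them inside the semaphore at once."""
    semaphore = asyncio.Semaphore(limit)
    active = 0
    peak = 0

    async def guarded(factory):
        nonlocal active, peak
        async with semaphore:  # waits here while `limit` tasks are already running
            active += 1
            peak = max(peak, active)
            try:
                return await factory()
            finally:
                active -= 1

    results = await asyncio.gather(*(guarded(f) for f in coroutine_factories))
    return results, peak


async def fake_task():
    """Stand-in for one scraping task."""
    await asyncio.sleep(0.01)
    return "done"


results, peak = asyncio.run(run_limited([fake_task] * 50))
```

All 50 tasks complete, but peak never exceeds the limit, which is what keeps the target server from receiving every request at once.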

Example Run

Deploy the Actor on Apify.
Specify input parameters (if required).
Run the Actor and monitor logs for progress.
Access the extracted data in the Apify dataset.

Dependencies

Apify SDK: Used for Actor functionality and proxy management.
HTTPX: Handles asynchronous HTTP requests.
BeautifulSoup: Parses HTML content.
Asyncio: Manages asynchronous tasks.

Notes

Ensure the AIPure website is accessible from your network.
Configure the Apify Proxy settings if accessing from restricted regions.

For more details, refer to the Apify SDK Documentation.
