WebLink Scraper
Pricing: from $10.00 / 1,000 results
This Actor scrapes all links present on a single domain. It crawls every page and collects both third-party and same-domain URL endpoints, revealing hidden links within the domain.
Developer: Ruturaj Sharbidre
Deep Link Crawler Pro
A powerful Python-based web crawler designed for Apify. It uses Playwright to render JavaScript and extract links from HTML, scripts, CSS, and other sources.
Features
- Deep Crawling: Follows links recursively up to a specified depth.
- JavaScript Rendering: Uses Playwright to execute JS and find dynamic links.
- Advanced Filtering: Include/Exclude by file extension, regex patterns, or domain.
- Multiple Sources: Extracts from <a> tags, src attributes, plain-text regex matches, and more.
- Structured Output: Saves data in JSON, CSV, and TXT formats, organized by domain.
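The multi-source extraction can be sketched with plain regular expressions; this is an illustrative helper, not the Actor's actual code, and in the real Actor the HTML would first be rendered by Playwright so that dynamically injected links are present:

```python
import re
from urllib.parse import urljoin

# Patterns for the three sources named above: <a>/href, src attributes,
# and bare URLs in plain text. All names here are illustrative.
HREF_RE = re.compile(r"href=[\"']([^\"']+)[\"']", re.IGNORECASE)
SRC_RE = re.compile(r"src=[\"']([^\"']+)[\"']", re.IGNORECASE)
PLAIN_URL_RE = re.compile(r"https?://[^\s\"'<>]+")

def extract_links(html: str, base_url: str) -> set[str]:
    """Collect absolute URLs from href/src attributes and plain text."""
    links = set()
    for rx in (HREF_RE, SRC_RE):
        for match in rx.findall(html):
            # Resolve relative paths like "/about" against the page URL.
            links.add(urljoin(base_url, match))
    # Bare URLs in text are already absolute; the set deduplicates overlaps.
    links.update(PLAIN_URL_RE.findall(html))
    return links

html = ('<a href="/about">About</a>'
        '<script src="https://cdn.example.net/app.js"></script>'
        ' See https://example.org/docs')
print(sorted(extract_links(html, "https://example.com")))
```

A single page typically yields a mix of same-domain links (fed back into the crawl queue) and third-party links (recorded in the output).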
Installation
- Clone the repository:
  git clone <repository_url>
  cd WebLinkScraper
- Install dependencies:
  pip install -r requirements.txt
  playwright install
Usage
Local Testing
You can run the crawler locally using the provided test script:
$ python tests/test_local_crawl.py
Or run the main module (requires mocking Apify input or setting environment variables):
$ python -m src.main
Apify Deployment
- Push to Apify: Use the Apify CLI:
  $ apify push
- Configuration (Input): The Actor accepts the following input:
  {
    "startUrls": ["https://example.com"],
    "maxDepth": 3,
    "maxPagesPerDomain": 1000,
    "includeExtensions": ".pdf,.doc",
    "excludeExtensions": ".png,.jpg,.css",
    "outputFormat": ["CSV", "JSON"]
  }
Input Parameters
- startUrls: List of URLs to start crawling.
- maxDepth: Maximum recursion depth (default: 3).
- maxPagesPerDomain: Limit on pages per domain to avoid getting stuck.
- includeExtensions: Comma-separated list of extensions to include (whitelist).
- excludeExtensions: Comma-separated list of extensions to exclude (blacklist).
- csvFile: Upload a CSV file containing URLs to crawl.
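The whitelist/blacklist interaction could be implemented as below; the parameter names mirror the input schema, but the function and its precedence rule (blacklist wins, and an active whitelist drops anything not listed) are an illustrative assumption:

```python
from urllib.parse import urlparse

def filter_links(links, include_extensions="", exclude_extensions=""):
    """Apply includeExtensions/excludeExtensions-style filtering (sketch)."""
    include = {e.strip().lower() for e in include_extensions.split(",") if e.strip()}
    exclude = {e.strip().lower() for e in exclude_extensions.split(",") if e.strip()}

    def ext_of(url):
        # Extension of the URL path, e.g. "/report.pdf" -> ".pdf".
        path = urlparse(url).path.lower()
        dot = path.rfind(".")
        return path[dot:] if dot != -1 else ""

    kept = []
    for url in links:
        ext = ext_of(url)
        if exclude and ext in exclude:
            continue  # blacklist takes precedence
        if include and ext not in include:
            continue  # whitelist active: only listed extensions pass
        kept.append(url)
    return kept

links = ["https://a.com/x.pdf", "https://a.com/y.png", "https://a.com/z.doc"]
print(filter_links(links, include_extensions=".pdf,.doc",
                   exclude_extensions=".png,.jpg,.css"))
```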
Output
Results are saved in the results/ directory (locally) or the default Key-Value Store (Apify).
Structure:
results/
├── example.com/
│   ├── links.txt
│   ├── links.csv
│   └── links.json
└── ...
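The layout above could be produced as follows; the file and directory names match the README, but the code itself is an assumption rather than the Actor's implementation:

```python
import csv
import json
from pathlib import Path
from urllib.parse import urlparse

def save_results(links, out_dir="results"):
    """Group links by domain and write links.txt/.csv/.json per domain."""
    by_domain = {}
    for url in links:
        by_domain.setdefault(urlparse(url).netloc, []).append(url)
    for domain, urls in by_domain.items():
        folder = Path(out_dir) / domain
        folder.mkdir(parents=True, exist_ok=True)
        (folder / "links.txt").write_text("\n".join(urls))
        (folder / "links.json").write_text(json.dumps(urls, indent=2))
        with open(folder / "links.csv", "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["url"])  # single-column CSV, illustrative schema
            writer.writerows([u] for u in urls)

save_results(["https://example.com/a", "https://example.com/b"])
print(Path("results/example.com/links.txt").read_text())
```

On Apify, the same records would instead go to the default Key-Value Store.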
License
Author: Ruturaj Sharbidre


