WebLink Scraper

Pricing

from $10.00 / 1,000 results

This Actor extracts all links present on a single domain: it crawls every page and collects third-party links alongside the current domain's URL endpoints, revealing hidden links present in the domain.

Rating

0.0

(0)

Developer

Ruturaj Sharbidre

Maintained by Community

Actor stats

Bookmarked: 0
Total users: 3
Monthly active users: 1
Last modified: 2 months ago

Deep Link Crawler Pro

A powerful Python-based web crawler designed for Apify. It uses Playwright to render JavaScript and extract links from HTML, scripts, CSS, and other sources.

Features

  • Deep Crawling: Follows links recursively up to a specified depth.
  • JavaScript Rendering: Uses Playwright to execute JS and find dynamic links.
  • Advanced Filtering: Include/Exclude by file extension, regex patterns, or domain.
  • Multiple Sources: Extracts from <a> tags, src attributes, plain text regex, and more.
  • Structured Output: Saves data in JSON, CSV, and TXT formats, organized by domain.
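As a rough illustration of the "Multiple Sources" feature, the sketch below pulls candidate links from `<a>` href attributes, `src` attributes, and plain-text URLs using regular expressions. The helper name and patterns are illustrative assumptions, not the Actor's actual code:

```python
import re

# Illustrative patterns for the three link sources mentioned above.
HREF_RE = re.compile(r'href=["\']([^"\']+)["\']', re.IGNORECASE)
SRC_RE = re.compile(r'src=["\']([^"\']+)["\']', re.IGNORECASE)
URL_RE = re.compile(r'https?://[^\s"\'<>]+')  # bare URLs in plain text

def extract_links(html: str) -> set[str]:
    """Collect candidate links from hrefs, src attributes, and raw text."""
    links: set[str] = set()
    links.update(HREF_RE.findall(html))
    links.update(SRC_RE.findall(html))
    links.update(URL_RE.findall(html))
    return links

html = '<a href="/about">About</a><img src="logo.png">See https://example.org/page'
print(sorted(extract_links(html)))
# → ['/about', 'https://example.org/page', 'logo.png']
```

In the real Actor, Playwright first renders the page so that links injected by JavaScript are present in the HTML before extraction.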

Installation

  1. Clone the repository:

    git clone <repository_url>
    cd WebLinkScraper
  2. Install dependencies:

    pip install -r requirements.txt
    playwright install

Usage

Local Testing

You can run the crawler locally using the provided test script:

$ python tests/test_local_crawl.py

Or run the main module (requires mocking Apify input or setting environment variables):

$ python -m src.main

Apify Deployment

  1. Push to Apify: Use the Apify CLI:

    $ apify push
  2. Configuration (Input): The Actor accepts the following input:

    {
      "startUrls": ["https://example.com"],
      "maxDepth": 3,
      "maxPagesPerDomain": 1000,
      "includeExtensions": ".pdf,.doc",
      "excludeExtensions": ".png,.jpg,.css",
      "outputFormat": ["CSV", "JSON"]
    }
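To show how `maxDepth` and `maxPagesPerDomain` could bound the crawl, here is a minimal breadth-first traversal sketch. The real Actor fetches pages with Playwright; here a dictionary stands in for the web so the traversal logic is easy to follow, and all function names are assumptions for illustration:

```python
from collections import deque
from urllib.parse import urlparse

def crawl(start_url, get_links, max_depth=3, max_pages_per_domain=1000):
    """BFS crawl bounded by depth and a per-domain page budget."""
    seen = {start_url}
    queue = deque([(start_url, 0)])
    pages_per_domain: dict[str, int] = {}
    visited = []
    while queue:
        url, depth = queue.popleft()
        domain = urlparse(url).netloc
        # Skip domains that have exhausted their page budget.
        if pages_per_domain.get(domain, 0) >= max_pages_per_domain:
            continue
        pages_per_domain[domain] = pages_per_domain.get(domain, 0) + 1
        visited.append(url)
        if depth >= max_depth:
            continue  # do not expand links beyond maxDepth
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited

# Tiny in-memory "site" for demonstration.
site = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/c"],
}
print(crawl("https://example.com/", lambda u: site.get(u, []), max_depth=1))
```

With `max_depth=1`, pages at depth 1 are visited but their links are not followed, so `/c` is never reached.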

Input Parameters

  • startUrls: List of URLs to start crawling.
  • maxDepth: Maximum recursion depth (default: 3).
  • maxPagesPerDomain: Limit pages per domain to avoid getting stuck.
  • includeExtensions: Comma-separated list of extensions to include (whitelist).
  • excludeExtensions: Comma-separated list of extensions to exclude (blacklist).
  • csvFile: Upload a CSV file containing URLs to crawl.
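One plausible way the comma-separated `includeExtensions` / `excludeExtensions` strings could be applied is sketched below. The parameter names follow the input schema above, but the parsing and matching logic is an assumption, not the Actor's actual implementation:

```python
from urllib.parse import urlparse

def parse_ext_list(raw: str) -> set[str]:
    """Turn a comma-separated string like '.pdf,.doc' into a set of extensions."""
    return {e.strip().lower() for e in raw.split(",") if e.strip()}

def keep_url(url: str, include: set[str], exclude: set[str]) -> bool:
    """Apply whitelist (include) first, then blacklist (exclude)."""
    path = urlparse(url).path.lower()
    last = path.rsplit("/", 1)[-1]
    ext = "." + last.rsplit(".", 1)[1] if "." in last else ""
    if include and ext not in include:
        return False  # whitelist is active and extension not listed
    if ext in exclude:
        return False  # blacklist match
    return True

include = parse_ext_list(".pdf,.doc")
exclude = parse_ext_list(".png,.jpg,.css")
print(keep_url("https://example.com/report.pdf", include, exclude))  # True
print(keep_url("https://example.com/logo.png", include, exclude))    # False
```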

Output

Results are saved in the results/ directory (locally) or the default Key-Value Store (Apify).

Structure:

results/
├── example.com/
│   ├── links.txt
│   ├── links.csv
│   └── links.json
└── ...
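The per-domain layout above can be sketched as follows; the file names match the tree, but the grouping and writing logic is an illustrative assumption, not the Actor's source:

```python
import csv
import json
from pathlib import Path
from urllib.parse import urlparse

def save_links(links: list[str], out_dir: str = "results") -> None:
    """Group links by domain and write TXT, CSV, and JSON per domain."""
    by_domain: dict[str, list[str]] = {}
    for url in links:
        by_domain.setdefault(urlparse(url).netloc, []).append(url)
    for domain, urls in by_domain.items():
        folder = Path(out_dir) / domain
        folder.mkdir(parents=True, exist_ok=True)
        (folder / "links.txt").write_text("\n".join(urls))
        (folder / "links.json").write_text(json.dumps(urls, indent=2))
        with open(folder / "links.csv", "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["url"])
            writer.writerows([u] for u in urls)

save_links(["https://example.com/a", "https://example.com/b.pdf"])
```

On the Apify platform, the same data would instead be pushed to the default Key-Value Store rather than written to a local directory.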

License

Author: Ruturaj Sharbidre