Webpage Link Extractor
Pricing
from $10.00 / 1,000 results
What does this actor do?
Webpage Link Extractor is an Apify actor that extracts all links from webpages, with optional depth crawling. It runs on the Apify platform and delivers structured data in JSON, CSV, or Excel formats that you can integrate into your workflows. For each link found, the actor records the source URL, the target URL, the anchor text, and whether the link is external or marked nofollow. All results are stored in an Apify dataset that you can download or access through the Apify API.
Why use this actor?
Collecting links by hand from many pages is slow and error-prone. Webpage Link Extractor automates the process. It is aimed at data analysts, researchers, marketers, and developers who need reliable, structured link data. You can schedule regular runs to keep your data fresh, integrate results directly into spreadsheets or databases, and scale your data collection without writing code. The actor handles pagination, rate limiting, and data normalization automatically.
How does it work?
This actor uses the Cheerio library to parse HTML pages from the target website. It sends lightweight HTTP requests without rendering JavaScript, which makes it fast and resource-efficient. The trade-off is that links added to a page only by client-side scripts are not present in the fetched HTML and will not be found. Starting from the given URL, the actor follows links up to the configured maxDepth and extracts structured link data from each page using CSS selectors.
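The per-page extraction step can be sketched roughly as follows. This is not the actor's own code: it uses Python's standard-library `html.parser` in place of Cheerio, and the names `LinkParser` and `extract_links` are hypothetical. It also assumes that `isExternal` means "the target host differs from the source host" and that `isNofollow` reflects a `nofollow` token in the `rel` attribute, neither of which the description above spells out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects <a href> elements with their anchor text and rel attribute."""

    def __init__(self):
        super().__init__()
        self.links = []       # finished links, one dict per <a href>
        self._current = None  # link currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attr = dict(attrs)
            if attr.get("href"):
                self._current = {
                    "href": attr["href"],
                    "rel": attr.get("rel") or "",
                    "text": [],
                }

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"].append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.links.append(self._current)
            self._current = None


def extract_links(source_url, html):
    """Return one record per link, using the output field names of this actor."""
    parser = LinkParser()
    parser.feed(html)
    source_host = urlparse(source_url).netloc
    rows = []
    for link in parser.links:
        # Resolve relative hrefs against the page they were found on.
        target = urljoin(source_url, link["href"])
        rows.append({
            "sourceUrl": source_url,
            "targetUrl": target,
            # Collapse runs of whitespace inside the anchor text.
            "anchorText": " ".join("".join(link["text"]).split()),
            # Assumption: "external" means a different host than the source page.
            "isExternal": urlparse(target).netloc != source_host,
            # Assumption: "nofollow" means rel contains the nofollow token.
            "isNofollow": "nofollow" in link["rel"].lower().split(),
        })
    return rows


SAMPLE_HTML = (
    '<p><a href="/about">About  us</a> '
    '<a rel="nofollow noopener" href="https://other.example/x">Other</a></p>'
)
rows = extract_links("https://example.com/", SAMPLE_HTML)
for row in rows:
    print(row)
```

Here the booleans are a choice made for the sketch; the output table in this README lists every field's format as text.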
Input parameters
| Parameter | Type | Description | Default |
|---|---|---|---|
| url | string | Starting URL to extract links from | None |
| maxDepth | integer | Maximum crawl depth (1 = only the starting page) | 1 |
| maxLinks | integer | Maximum number of links to extract | 1000 |
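As an illustration, an input object for these parameters might be assembled like this. The helper `build_input` is hypothetical, and its validation rules (an absolute http(s) URL, both limits at least 1) are assumptions drawn from the table above rather than documented constraints.

```python
import json


def build_input(url, max_depth=1, max_links=1000):
    """Build the actor input, using the defaults listed in the table above."""
    # Assumed checks; the actor's real validation may differ.
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must be an absolute http(s) URL")
    if max_depth < 1:
        raise ValueError("maxDepth must be at least 1 (1 = only the starting page)")
    if max_links < 1:
        raise ValueError("maxLinks must be at least 1")
    return {"url": url, "maxDepth": max_depth, "maxLinks": max_links}


run_input = build_input("https://example.com/", max_depth=2, max_links=200)
print(json.dumps(run_input, indent=2))

# With the apify-client package installed, the input could then be passed to a run,
# for example: ApifyClient("<token>").actor("<actor-id>").call(run_input=run_input)
# The token and actor ID are placeholders; this README does not state the actor ID.
```

Starting with a small `maxLinks` value, as in this example, keeps a first test run cheap.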
Output fields
Each item in the output dataset contains the following fields:
| Field | Description | Format |
|---|---|---|
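| sourceUrl | URL of the page where the link was found | text |
| targetUrl | URL the link points to | text |
| anchorText | Visible text of the link | text |
| isExternal | Whether the link is external | text |
| isNofollow | Whether the link is marked nofollow | text |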
Example output:
{"sourceUrl": "Sample Source URL", "targetUrl": "Sample Target URL", "anchorText": "Sample Anchor Text", "isExternal": "Sample External", "isNofollow": "Sample Nofollow"}
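If you download the dataset as JSON and want a CSV with a fixed column order, a conversion along these lines may help. It is a minimal sketch using only the Python standard library. The helper name `items_to_csv` is hypothetical, and the input is the placeholder item from the example above, not real output.

```python
import csv
import io
import json

# Column order follows the output fields table.
FIELDS = ["sourceUrl", "targetUrl", "anchorText", "isExternal", "isNofollow"]


def items_to_csv(items):
    """Serialize dataset items to CSV text, leaving missing fields empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for item in items:
        writer.writerow({field: item.get(field, "") for field in FIELDS})
    return buffer.getvalue()


# Placeholder data copied from the example output above.
items = json.loads(
    '[{"sourceUrl": "Sample Source URL", "targetUrl": "Sample Target URL", '
    '"anchorText": "Sample Anchor Text", "isExternal": "Sample External", '
    '"isNofollow": "Sample Nofollow"}]'
)
csv_text = items_to_csv(items)
print(csv_text)
```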
Cost and performance
This actor runs with a default memory allocation of 1024 MB. Because it uses lightweight HTTP requests, each run typically costs around $0.10-0.25 in Apify platform credits per 1,000 results. A typical run processing 100 results completes in 1-3 minutes. You can reduce costs by limiting the number of results with the maxLinks parameter, by keeping maxDepth low, and by scheduling runs during off-peak hours.
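For rough budgeting, the platform-credit range quoted above can be scaled to a planned number of results. The sketch below assumes the cost grows linearly with the result count, which the figures above suggest but do not guarantee. The helper `estimate_credit_cost` is hypothetical.

```python
from decimal import Decimal

# Platform-credit range per 1,000 results, as quoted in this section.
LOW_PER_1000 = Decimal("0.10")
HIGH_PER_1000 = Decimal("0.25")


def estimate_credit_cost(results):
    """Return (low, high) estimated platform credits in USD, assuming linear scaling."""
    if results < 0:
        raise ValueError("results must be non-negative")
    scale = Decimal(results) / Decimal(1000)
    return LOW_PER_1000 * scale, HIGH_PER_1000 * scale


low, high = estimate_credit_cost(5000)
print(f"Estimated platform credits for 5,000 results: ${low:.2f} to ${high:.2f}")
```

This covers only the platform-credit figure from this section, not the per-result price listed under Pricing.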
Tips and best practices
- Start with a small number of results to test your configuration before scaling up.
- Use the Apify scheduling feature to automate regular data collection runs.
- Export results in the format that best fits your workflow: JSON for APIs, CSV for spreadsheets, or Excel for reports.
- Connect this actor with other actors on the Apify platform for more comprehensive data pipelines.