
Get Urls Pro PPR
Pricing
$12.00 / 1,000 results

Website Crawler
This Apify actor crawls a website, extracts its links, and arranges them into a hierarchy so you can visualize the site's structure. The crawler can fetch pages either with standard HTTP requests parsed by BeautifulSoup (fast HTML parsing) or with Selenium (for JavaScript-heavy pages).
Features
- Crawl any website starting from a specified URL
- Control crawl depth and the number of links followed from each page
- Filter out specific file extensions
- Option to use Selenium for JavaScript-heavy websites
- Prevent duplicate URLs in the output
- Proxy support (via Apify Proxy)
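The same-domain, extension, and duplicate filters listed above can be sketched in a few lines. This is a minimal illustration, not the actor's code: `should_follow` and `dedupe` are hypothetical helpers, "same domain" is assumed to mean an exact hostname match (so `www.example.com` and `example.com` count as different), and duplicates are assumed to be detected after stripping `#fragment` parts.

```python
from urllib.parse import urldefrag, urlparse


def should_follow(url, start_url, same_domain_only=True, ignored_extensions=()):
    """Return True if `url` passes the scheme, domain, and extension filters."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False  # skip mailto:, tel:, javascript:, etc.
    # Assumption: "same domain" means the hostnames match exactly.
    if same_domain_only and parsed.netloc.lower() != urlparse(start_url).netloc.lower():
        return False
    # Compare the extension of the last path segment, case-insensitively.
    last_segment = parsed.path.rsplit("/", 1)[-1]
    if "." in last_segment:
        ext = last_segment.rsplit(".", 1)[-1].lower()
        if ext in {e.lower().lstrip(".") for e in ignored_extensions}:
            return False
    return True


def dedupe(urls):
    """Drop repeated URLs (ignoring #fragments), keeping first-seen order."""
    seen = set()
    unique = []
    for url in urls:
        key, _ = urldefrag(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique
```

Filtering before a URL is queued, rather than after it is fetched, avoids spending requests on images or off-site pages.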
Input Parameters
Parameter | Type | Description |
---|---|---|
startUrl | String | The starting URL to crawl (e.g., https://jamesclear.com/five-step-creative-process) |
useSelenium | Boolean | Use Selenium for JavaScript-heavy pages |
allowDuplicates | Boolean | Allow duplicate URLs in the output |
maxDepth | Integer | Maximum depth of link recursion (1-30) |
maxChildrenPerLink | Integer | Maximum number of children per parent link (1-100) |
sameDomainOnly | Boolean | Only crawl URLs on the same domain as the start URL (default: true) |
ignoredExtensions | Array | File extensions to skip when crawling, written without the leading dot (e.g. pdf, jpg) |
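A client-side check of the documented ranges can catch bad input before a run is started. This is a minimal sketch, assuming the input is a plain dict using the keys in the table above; the `CrawlInput` class is hypothetical, and every default except `sameDomainOnly` is an assumption because the other defaults are not documented here.

```python
from dataclasses import dataclass, field


@dataclass
class CrawlInput:
    """Mirror of the documented input fields (defaults marked 'assumed' are guesses)."""
    startUrl: str
    useSelenium: bool = False        # assumed default
    allowDuplicates: bool = False    # assumed default
    maxDepth: int = 2                # assumed default
    maxChildrenPerLink: int = 5      # assumed default
    sameDomainOnly: bool = True      # documented default
    ignoredExtensions: list[str] = field(default_factory=list)

    def __post_init__(self):
        if not self.startUrl.startswith(("http://", "https://")):
            raise ValueError("startUrl must be an absolute http(s) URL")
        if not 1 <= self.maxDepth <= 30:
            raise ValueError("maxDepth must be between 1 and 30")
        if not 1 <= self.maxChildrenPerLink <= 100:
            raise ValueError("maxChildrenPerLink must be between 1 and 100")
        # Normalize so "PDF", ".pdf" and "pdf" are treated the same way.
        self.ignoredExtensions = [e.lower().lstrip(".") for e in self.ignoredExtensions]
```

Usage: `CrawlInput(**{"startUrl": "https://google.com", "maxDepth": 2, "maxChildrenPerLink": 5})` raises `ValueError` for out-of-range values and `TypeError` for misspelled keys.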
Output
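The actor outputs a list of JSON records, one per discovered URL. `depth` counts link hops from `startUrl`, `parentUrl` is the page on which the link was found (`null` for the start URL), `query` holds the URL's query string, and `name` holds the link label when one is available (otherwise `null`):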
```json
[
  {"url": "https://jamesclear.com/five-step-creative-process", "name": null, "query": "", "depth": 0, "parentUrl": null},
  {"url": "https://jamesclear.com/", "name": null, "query": "", "depth": 1, "parentUrl": "https://jamesclear.com/five-step-creative-process"},
  {"url": "https://jamesclear.com/books", "name": "Books", "query": "", "depth": 1, "parentUrl": "https://jamesclear.com/five-step-creative-process"},
  {"url": "https://jamesclear.com/articles", "name": "Articles", "query": "", "depth": 1, "parentUrl": "https://jamesclear.com/five-step-creative-process"},
  {"url": "https://jamesclear.com/3-2-1", "name": "Newsletter", "query": "", "depth": 2, "parentUrl": "https://jamesclear.com/"},
  {"url": "https://jamesclear.com/events?g=4", "name": "Speaking", "query": "g=4", "depth": 2, "parentUrl": "https://jamesclear.com/"}
]
```
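Because each record carries its `parentUrl`, the flat list can be folded back into a tree for display. This is a minimal sketch, assuming the records have already been downloaded from the run's dataset as plain dicts; `build_outline` is a hypothetical helper, not part of the actor. A URL that appears more than once (possible with `allowDuplicates` enabled) is printed again but not expanded a second time, so cycles cannot recurse forever.

```python
from collections import defaultdict


def build_outline(records):
    """Turn flat output records into indented lines, one per record."""
    children = defaultdict(list)
    roots = []
    for rec in records:
        if rec["parentUrl"] is None:
            roots.append(rec)
        else:
            children[rec["parentUrl"]].append(rec)

    lines = []
    expanded = set()

    def walk(rec, level):
        label = f" ({rec['name']})" if rec["name"] else ""
        lines.append("  " * level + rec["url"] + label)
        if rec["url"] in expanded:
            return  # already expanded elsewhere: print it, but do not recurse again
        expanded.add(rec["url"])
        for child in children[rec["url"]]:
            walk(child, level + 1)

    for root in roots:
        walk(root, 0)
    return lines
```

Applied to the sample output above, `"\n".join(build_outline(records))` prints the start page, then `https://jamesclear.com/` indented one level with Newsletter and Speaking beneath it, followed by Books and Articles.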
Example Usage
Basic Crawl
To create a basic, shallow map of a website using plain HTTP requests:
```json
{
  "startUrl": "https://google.com",
  "useSelenium": false,
  "maxDepth": 2,
  "maxChildrenPerLink": 5
}
```
Crawl with Selenium
For a JavaScript-heavy website, enable Selenium and skip images, stylesheets, and documents:
```json
{
  "startUrl": "https://jamesclear.com/five-step-creative-process",
  "useSelenium": true,
  "maxDepth": 2,
  "maxChildrenPerLink": 5,
  "allowDuplicates": false,
  "ignoredExtensions": ["gif", "jpg", "png", "css", "jpeg", "pdf", "doc", "docx"]
}
```
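The same inputs can be sent programmatically. This sketch assumes the `apify-client` Python package is installed and an API token is available in the `APIFY_TOKEN` environment variable; the actor ID `YOUR_USERNAME/get-urls-pro-ppr` is a hypothetical placeholder (copy the real ID from the Store page), and `build_run_input` / `run_crawler` are illustrative helpers. Building the input as a dict and letting `json.dumps` serialize it also rules out hand-typed JSON mistakes such as trailing commas.

```python
import json
import os


def build_run_input(start_url, use_selenium=False, max_depth=2,
                    max_children_per_link=5, ignored_extensions=None):
    """Build the actor input as a dict using the documented parameter names."""
    run_input = {
        "startUrl": start_url,
        "useSelenium": use_selenium,
        "maxDepth": max_depth,
        "maxChildrenPerLink": max_children_per_link,
    }
    if ignored_extensions:
        run_input["ignoredExtensions"] = list(ignored_extensions)
    return run_input


def run_crawler(run_input, actor_id="YOUR_USERNAME/get-urls-pro-ppr"):
    """Start the actor, wait for it to finish, and return its dataset items.

    actor_id is a hypothetical placeholder; replace it with the real actor ID.
    """
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


if __name__ == "__main__":
    # Print the basic-crawl input as valid JSON (no network access needed).
    print(json.dumps(build_run_input("https://google.com"), indent=2))
```

The returned items have the record shape shown under Output.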
Implementation Details
This actor is built with:
- Apify Python SDK
- BeautifulSoup for standard HTML parsing
- Selenium with Chrome WebDriver for JavaScript-heavy pages
- Asynchronous processing for better performance
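The crawl itself can be pictured as a breadth-first walk that fetches each level concurrently. This is a sketch under stated assumptions, not the actor's implementation: it swaps in the standard library's `html.parser` for BeautifulSoup so it runs without dependencies, takes a hypothetical async `fetch(url)` callable (which in a real run would wrap an HTTP client or a Selenium driver), and omits `allowDuplicates` and `ignoredExtensions` for brevity. Using the link text as `name` is also an assumption.

```python
import asyncio
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect (href, text) pairs for every <a href> on a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href, self._text = href, []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip() or None))
            self._href = None


async def crawl(start_url, fetch, max_depth=2, max_children=5, same_domain_only=True):
    """Breadth-first crawl producing records shaped like the actor's output."""
    domain = urlparse(start_url).netloc
    records = [{"url": start_url, "name": None, "query": urlparse(start_url).query,
                "depth": 0, "parentUrl": None}]
    seen = {start_url}
    frontier = [start_url]
    for depth in range(1, max_depth + 1):
        # Fetch every page on the current level concurrently.
        pages = await asyncio.gather(*(fetch(url) for url in frontier))
        next_frontier = []
        for parent, html in zip(frontier, pages):
            parser = LinkParser()
            parser.feed(html)
            parser.close()
            kept = 0
            for href, text in parser.links:
                url, _ = urldefrag(urljoin(parent, href))  # absolute URL, no #fragment
                parsed = urlparse(url)
                if parsed.scheme not in ("http", "https") or url in seen:
                    continue
                if same_domain_only and parsed.netloc != domain:
                    continue
                seen.add(url)
                records.append({"url": url, "name": text, "query": parsed.query,
                                "depth": depth, "parentUrl": parent})
                next_frontier.append(url)
                kept += 1
                if kept >= max_children:
                    break  # cap the number of new children taken from this page
        frontier = next_frontier
    return records
```

Injecting `fetch` keeps the traversal logic independent of how pages are retrieved, which mirrors the actor's choice between plain HTTP requests and Selenium.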
Notes
- JavaScript-heavy pages may require the `useSelenium` option to be enabled.
- For very large websites, use lower `maxDepth` and `maxChildrenPerLink` values to avoid hitting memory limits or very long run times.
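The size of a crawl grows geometrically with these two settings. Assuming each page contributes at most `maxChildrenPerLink` = k new links and the crawl stops at `maxDepth` = D, the output is bounded by 1 + k + k^2 + ... + k^D records. Real crawls are usually smaller because of duplicate and same-domain filtering; `max_urls` is a hypothetical helper for estimating this bound.

```python
def max_urls(max_depth, max_children_per_link):
    """Upper bound on records: the start URL plus up to k new children per page per level."""
    return sum(max_children_per_link ** d for d in range(max_depth + 1))
```

For the Basic Crawl example (`maxDepth` 2, `maxChildrenPerLink` 5) the bound is 1 + 5 + 25 = 31 URLs, while the maximum settings (30 and 100) give a bound above 10^60, which is why large sites need small values.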