Get URLs from link
Pricing
$2.95/month + usage
Extracts URLs from a sitemap or webpage with intuitive path matching. Use comma-separated patterns to include or exclude URL paths: '/tags/' matches the exact path, '/product' matches paths starting with /product, and plain text such as 'product' matches anywhere in the URL.
5.0 (1)
Total users: 77
Monthly users: 16
Runs succeeded: >99%
Last modified: 6 months ago
This actor extracts URLs from a sitemap or any webpage containing links. It provides intuitive URL path matching and flexible filtering options to get exactly the URLs you need.
Features
- Extract URLs from XML sitemaps or webpages
- Smart URL path matching:
  - Use '/tags/' to match the exact path
  - Use '/product' to match paths starting with /product
  - Use 'product' to match URLs containing this text anywhere
- Exclude specific file extensions (e.g., images)
- Exclude URLs using the same smart path matching
- Limit the number of processed URLs
- Simple comma-separated syntax for filters
Input Configuration
Field | Type | Description |
---|---|---|
link | String | URL to process (required) |
urlPattern | String | List of URL parts to include (comma separated). Use '*' to include all URLs. When using slashes: '/tags/' matches exact path, '/tags' matches path starting with /tags, 'tags/' matches path ending with tags/. Without slashes (e.g., 'product') matches anywhere in URL |
maxUrls | Integer | Maximum number of URLs to process (0 for no limit). Good for testing purposes |
excludeExtensions | String | List of file extensions to exclude (comma separated). Example: jpg,jpeg,png,gif |
customExcludePattern | String | List of URL parts to exclude (comma separated). Uses same pattern matching as urlPattern. Examples: '/tags/,category' or '/blog/,author' |
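The matching rules in the table can be sketched in a few lines of Python. This is not the actor's actual implementation, only an illustration of how the rules might be read. The function names `matches_pattern` and `matches_any` are hypothetical, and the interpretation of "exact path" as "the pattern appears as a whole path segment" is an assumption.

```python
from urllib.parse import urlsplit


def matches_pattern(url: str, pattern: str) -> bool:
    """Check one pattern against a URL under the assumed rules."""
    pattern = pattern.strip()
    if not pattern:
        return False
    if pattern == "*":
        return True  # '*' includes every URL
    path = urlsplit(url).path or "/"
    leading = pattern.startswith("/")
    trailing = pattern.endswith("/")
    if leading and trailing:
        # Assumption: "exact path" means the pattern occurs as a whole segment.
        return pattern in path
    if leading:
        return path.startswith(pattern)  # '/tags' -> path starts with /tags
    if trailing:
        return path.endswith(pattern)  # 'tags/' -> path ends with tags/
    return pattern in url  # plain text -> anywhere in the URL


def matches_any(url: str, patterns: str) -> bool:
    """Apply a comma-separated pattern list; any single match is enough."""
    return any(matches_pattern(url, p) for p in patterns.split(","))
```

Under this reading, `customExcludePattern` would reuse the same functions with the result negated, since the table states that it uses the same pattern matching as `urlPattern`.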
Output
The actor outputs a dataset containing URLs that match your specified criteria. Each record has the following field:
{"url": "https://example.com/page"}
Usage Examples
Basic Usage
Extract all URLs from a sitemap:
{"link": "https://example.com/sitemap.xml"}
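For intuition about what extraction from an XML sitemap involves, here is a minimal sketch using only the Python standard library. It assumes the standard sitemaps.org namespace and does not follow sitemap index files. The helper name `urls_from_sitemap` is hypothetical, and the actor's own parsing may differ.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def urls_from_sitemap(xml_text: str) -> list[str]:
    """Collect the text of every <loc> element in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]


SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/products/1</loc></url>
</urlset>"""

print(urls_from_sitemap(SAMPLE))
# ['https://example.com/', 'https://example.com/products/1']
```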
Smart Path Matching
Get only product URLs with different matching options:
{"link": "https://example.com/sitemap.xml", "urlPattern": "/products/,productId,deals/"}
This matches:
- URLs containing the exact '/products/' path
- URLs containing 'productId' anywhere
- URLs whose path ends with 'deals/'
Exclude File Types and Sections
Get URLs excluding images and specific sections:
{"link": "https://example.com/sitemap.xml", "excludeExtensions": "jpg,jpeg,png,gif", "customExcludePattern": "/tags/,/category/,author"}
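One plausible way to apply an `excludeExtensions` value is to compare the extension of the URL path, ignoring the query string and letter case. The sketch below makes that assumption. The helper name `has_excluded_extension` is hypothetical and the actor's real behavior is not documented at this level of detail.

```python
import posixpath
from urllib.parse import urlsplit


def has_excluded_extension(url: str, exclude_extensions: str) -> bool:
    """Return True if the URL path ends in one of the listed extensions."""
    # Normalize "jpg, .PNG" style input to {"jpg", "png"}.
    excluded = {e.strip().lower().lstrip(".") for e in exclude_extensions.split(",")}
    excluded.discard("")
    # splitext looks only at the last path component; the query string is dropped.
    ext = posixpath.splitext(urlsplit(url).path)[1].lower().lstrip(".")
    return bool(ext) and ext in excluded
```

Comparing only the path extension means a page such as `/png-guide` is kept even when `png` is excluded.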
Limit Results
Get first 100 URLs for testing:
{"link": "https://example.com/sitemap.xml","maxUrls": 100}
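The `maxUrls` semantics (0 for no limit) can be illustrated with a short sketch. The function name `limit_urls` is hypothetical. Whether the actor applies the limit before or after filtering is not stated in the description, so this example simply caps whatever sequence it is given.

```python
from itertools import islice
from typing import Iterable, Iterator


def limit_urls(urls: Iterable[str], max_urls: int = 0) -> Iterator[str]:
    """Yield at most max_urls items; 0 means no limit."""
    if max_urls < 0:
        raise ValueError("max_urls must be zero or positive")
    # 0 mirrors the documented "no limit" setting.
    return iter(urls) if max_urls == 0 else islice(urls, max_urls)
```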