Get URLs from link
7-day trial, then $2.95/month. No credit card required now.
Extracts URLs from a sitemap or webpage with intuitive path matching. Use comma-separated patterns to include or exclude URL paths: '/tags/' matches an exact path, '/product' matches paths starting with /product, and plain text such as 'product' matches as a substring anywhere in the URL.
This actor extracts URLs from a sitemap or any webpage containing links. It provides intuitive URL path matching and flexible filtering options to get exactly the URLs you need.
Features
- Extract URLs from XML sitemaps or webpages
- Smart URL path matching:
  - Use '/tags/' to match exact path
  - Use '/product' to match paths starting with /product
  - Use 'product' to match URLs containing this text anywhere
- Exclude specific file extensions (e.g., images)
- Exclude URLs using the same smart path matching
- Limit the number of processed URLs
- Simple comma-separated syntax for filters
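To make the first feature concrete, here is a minimal sketch of pulling URLs out of an XML sitemap with Python's standard library. It is not the actor's implementation: `urls_from_sitemap` and `SAMPLE_SITEMAP` are hypothetical names, and the sketch assumes a sitemap that follows the standard sitemaps.org schema, where each URL sits in a `<loc>` element.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text: str) -> list[str]:
    """Collect the text of every <loc> element in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical sample document, inlined so the sketch needs no network access.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/products/widget</loc></url>
</urlset>"""

sitemap_urls = urls_from_sitemap(SAMPLE_SITEMAP)
```

Extracting links from an ordinary webpage would need an HTML parser instead; that case is left out here.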
Input Configuration
| Field | Type | Description |
|---|---|---|
| link | String | URL of the sitemap or webpage to process (required) |
| urlPattern | String | Comma-separated list of URL parts to include. Use '*' to include all URLs. With slashes: '/tags/' matches the exact path, '/tags' matches paths starting with /tags, and 'tags/' matches paths ending with tags/. Without slashes (e.g., 'product'), the pattern matches anywhere in the URL |
| maxUrls | Integer | Maximum number of URLs to process (0 for no limit). Useful for testing |
| excludeExtensions | String | Comma-separated list of file extensions to exclude. Example: jpg,jpeg,png,gif |
| customExcludePattern | String | Comma-separated list of URL parts to exclude. Uses the same pattern matching as urlPattern. Examples: '/tags/,category' or '/blog/,author' |
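As an illustration of how these fields fit together, the sketch below assembles an input object from Python values. `build_input` and its parameter names are hypothetical helpers, not part of the actor; only the JSON field names come from the table. It assumes that optional fields can simply be omitted, which the basic usage example suggests but the table does not state.

```python
import json


def build_input(link, url_pattern=None, max_urls=None,
                exclude_extensions=None, custom_exclude_pattern=None):
    """Assemble an input object for the actor; only `link` is required."""
    if not link:
        raise ValueError("link is required")
    if max_urls is not None and max_urls < 0:
        raise ValueError("maxUrls must be 0 (no limit) or a positive integer")

    run_input = {"link": link}
    # List-valued options are joined into comma-separated strings.
    if url_pattern:
        run_input["urlPattern"] = ",".join(url_pattern)
    if max_urls is not None:
        run_input["maxUrls"] = max_urls
    if exclude_extensions:
        run_input["excludeExtensions"] = ",".join(exclude_extensions)
    if custom_exclude_pattern:
        run_input["customExcludePattern"] = ",".join(custom_exclude_pattern)
    return run_input


example_input = build_input(
    "https://example.com/sitemap.xml",
    url_pattern=["/products/", "productId"],
    exclude_extensions=["jpg", "png"],
    max_urls=100,
)
print(json.dumps(example_input, indent=2))
```

Keeping the patterns as Python lists until the last step avoids stray spaces or trailing commas in the final strings.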
Output
The actor outputs a dataset containing URLs that match your specified criteria. Each record has the following field:
    {
      "url": "https://example.com/page"
    }
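A short sketch of consuming such records once they have been exported as JSON. The helper name `urls_from_records` and the sample data are hypothetical, and the de-duplication step is a defensive assumption; the description does not say whether the dataset can contain repeated URLs.

```python
import json


def urls_from_records(records):
    """Return the unique `url` values from dataset records, in first-seen order."""
    seen = set()
    urls = []
    for record in records:
        url = record.get("url")
        if isinstance(url, str) and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Hypothetical export containing one repeated record.
exported = json.loads(
    '[{"url": "https://example.com/page"},'
    ' {"url": "https://example.com/page"},'
    ' {"url": "https://example.com/other"}]'
)
unique_urls = urls_from_records(exported)
```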
Usage Examples
Basic Usage
Extract all URLs from a sitemap:
    {
      "link": "https://example.com/sitemap.xml"
    }
Smart Path Matching
Get only product URLs with different matching options:
    {
      "link": "https://example.com/sitemap.xml",
      "urlPattern": "/products/,productId,deals/"
    }
This will match:
- URLs containing the exact path '/products/'
- URLs containing 'productId' anywhere
- URLs whose path ends with 'deals/'
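The matching rules can be sketched in a few lines of Python. This is an interpretation of the documented behavior, not the actor's code: `matches_pattern` and `matches_any` are hypothetical names, and the sketch assumes that "exact" means the pattern appears as whole path segments (so '/products/' does not match '/production'), that prefix and suffix rules apply to the URL path only, and that slash-free patterns are tested against the full URL.

```python
from urllib.parse import urlparse


def matches_pattern(url: str, pattern: str) -> bool:
    """Apply one pattern to a URL using the assumed slash-based rules."""
    pattern = pattern.strip()
    if not pattern:
        return False
    if pattern == "*":
        return True
    path = urlparse(url).path or "/"
    starts, ends = pattern.startswith("/"), pattern.endswith("/")
    if starts and ends:
        # Assumption: "exact" means the pattern occurs as whole path segments.
        normalized = path if path.endswith("/") else path + "/"
        return pattern in normalized
    if starts:
        return path.startswith(pattern)  # '/tags' -> path starts with /tags
    if ends:
        return path.endswith(pattern)    # 'tags/' -> path ends with tags/
    return pattern in url                # 'product' -> anywhere in the URL


def matches_any(url: str, raw_patterns: str) -> bool:
    """True if the URL matches at least one comma-separated pattern."""
    return any(matches_pattern(url, p) for p in raw_patterns.split(","))


PATTERNS = "/products/,productId,deals/"
SAMPLE_URLS = [
    "https://example.com/products/widget",
    "https://example.com/item?productId=42",
    "https://example.com/summer/deals/",
    "https://example.com/about",
]
kept = [u for u in SAMPLE_URLS if matches_any(u, PATTERNS)]
```

Under these assumptions the first three sample URLs are kept and the last is dropped.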
Exclude File Types and Sections
Get URLs excluding images and specific sections:
    {
      "link": "https://example.com/sitemap.xml",
      "excludeExtensions": "jpg,jpeg,png,gif",
      "customExcludePattern": "/tags/,/category/,author"
    }
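For the extension filter, a minimal sketch of one plausible check follows. The function names are hypothetical, and the sketch assumes the comparison is case-insensitive and looks only at the URL path, ignoring any query string; the actor's actual behavior may differ.

```python
from posixpath import splitext
from urllib.parse import urlparse


def parse_extensions(raw: str) -> set[str]:
    """Turn 'jpg, .PNG' into {'jpg', 'png'}."""
    cleaned = (ext.strip().lower().lstrip(".") for ext in raw.split(","))
    return {ext for ext in cleaned if ext}


def has_excluded_extension(url: str, raw_extensions: str) -> bool:
    """True if the URL path ends in one of the listed file extensions."""
    extension = splitext(urlparse(url).path)[1].lower().lstrip(".")
    return extension in parse_extensions(raw_extensions)


EXCLUDED = "jpg,jpeg,png,gif"
candidates = [
    "https://example.com/img/logo.PNG?v=2",
    "https://example.com/blog/post",
]
remaining = [u for u in candidates if not has_excluded_extension(u, EXCLUDED)]
```

The `customExcludePattern` entries would go through the same path matching as `urlPattern`, with matching URLs removed instead of kept.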
Limit Results
Get first 100 URLs for testing:
    {
      "link": "https://example.com/sitemap.xml",
      "maxUrls": 100
    }
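The "0 means no limit" convention can be sketched as below. `limit_urls` is a hypothetical helper; whether the actor applies the limit before or after filtering is not stated, so this shows only the cap itself.

```python
from itertools import islice


def limit_urls(urls, max_urls=0):
    """Return at most `max_urls` items; 0 means no limit."""
    if max_urls < 0:
        raise ValueError("max_urls must be 0 or a positive integer")
    if max_urls == 0:
        return list(urls)
    # islice stops early, so a lazy source is not consumed past the cap.
    return list(islice(urls, max_urls))


# Hypothetical lazy source of 250 URLs.
candidates = (f"https://example.com/page/{i}" for i in range(250))
first_batch = limit_urls(candidates, 100)
```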
Actor Metrics
8 monthly users
4 stars
>99% runs succeeded
Created in Oct 2024
Modified 3 months ago