Website Media Link Scraper
Pricing: from $2.00 / 1,000 results
Quickly find video, audio, document, PDF, image, and other media links on websites using this fast, lightweight web crawler. Most sites need no browser: it starts with a plain HTTP session and escalates to a stealth browser only when blocked.
Rating: 4.3 (5)
Developer: The Netaji
Actor stats: 2 bookmarked, 250 total users, 6 monthly active users
Last modified: 3 days ago
🔍 Media Link Crawler
TL;DR: Extract videos, images, documents, and other media from websites. Adaptive stealth escalation, powered by Scrapling, gets past many anti-bot protections automatically.
✅ Features
- Extracts 18 media types: videos, audio, images, PDFs, documents, archives, eBooks, fonts, text, APKs, contacts, subtitles, 3D models, datasets, design assets, source code, disk images
- Adaptive stealth: starts with a fast HTTP session, automatically escalates to a full stealth browser when blocked
- Cloudflare bypass: optional solver for Turnstile and Interstitial challenges
- XHR capture: intercepts background API calls to catch media that never appears in HTML
- Crawls multiple pages with depth and URL count limits
- Proxy support (Apify Proxy or custom proxy list)
- Contact extraction from visible text (emails, phones, addresses)
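The adaptive stealth idea (cheap HTTP first, stealth browser only when blocked) can be illustrated with a minimal Python sketch. It is not the actor's code and does not call Scrapling: `fetch_adaptive`, `looks_blocked`, the blocking status codes and the challenge marker are all hypothetical, and stub functions stand in for the real HTTP session and stealth browser.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Illustrative assumptions only, not the actor's real block-detection rules.
BLOCK_STATUSES = {403, 429, 503}
CHALLENGE_MARKER = "just a moment"  # hypothetical challenge-page text


@dataclass
class Response:
    status: int
    body: str


Fetcher = Callable[[str], Response]


def looks_blocked(resp: Response) -> bool:
    """Treat a blocking status code or a challenge-page marker as 'blocked'."""
    return resp.status in BLOCK_STATUSES or CHALLENGE_MARKER in resp.body.lower()


def fetch_adaptive(url: str, tiers: List[Tuple[str, Fetcher]]) -> Tuple[str, Optional[Response]]:
    """Try tiers from cheapest to most expensive; escalate only when blocked."""
    for name, fetch in tiers:
        resp = fetch(url)
        if not looks_blocked(resp):
            return name, resp
    return "blocked", None


# Stub fetchers standing in for a plain HTTP session and a stealth browser.
def http_session(url: str) -> Response:
    return Response(403, "Forbidden")


def stealth_browser(url: str) -> Response:
    return Response(200, "<html><video src='/clip.mp4'></video></html>")


tiers = [("http", http_session), ("stealth-browser", stealth_browser)]
tier, resp = fetch_adaptive("https://example.com", tiers)
print(tier)  # stealth-browser
```

Ordering the tiers from cheapest to most expensive keeps fast HTTP fetching for the common case and pays the browser cost only for pages that need it.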
🎯 Supported Media Types
| Media Type | Formats |
|---|---|
| Video | mp4, webm, mkv, mov, avi, flv, m3u8, ts, 3gp… |
| Audio | mp3, wav, aac, flac, m4a, opus, wma, ogg… |
| Image | jpg, png, gif, webp, svg, avif, heic, tiff, ico… |
| Document | doc, docx, ppt, pptx, xls, xlsx, odt, ods, odp, rtf… |
| Archive | zip, rar, tar, gz, 7z, bz2, xz, zst… |
| eBook | epub, mobi, azw3, fb2, djvu… |
| Font | ttf, otf, woff, woff2, eot |
| Text | txt, md, json, xml, ndjson, jsonl… |
| APK | apk, xapk, apks |
| Contact | emails, phones, social profiles, addresses |
| Subtitle | srt, vtt, sub, ass, ssa, sbv… |
| 3D Model | obj, stl, gltf, glb, fbx, blend, dae, ply… |
| Dataset / DB | sql, sqlite, parquet, geojson, csv, feather… |
| Design Asset | psd, ai, sketch, xd, indd, afdesign… |
| Source Code | py, js, ts, java, cpp, go, rs, rb, php, sh… |
| Disk Image | iso, dmg, vmdk, vhd, img, qcow2… |
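Classifying a link by its file extension, as the table suggests, could look like the Python sketch below. `EXTENSION_TYPES` is a hypothetical subset of the table and `classify` is an illustrative helper, not the actor's implementation. It looks only at the URL path, so query strings and letter case do not affect the result.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Hypothetical subset of the table above; the actor's real mapping is larger.
EXTENSION_TYPES = {
    "video": {"mp4", "webm", "mkv", "mov", "m3u8"},
    "audio": {"mp3", "wav", "flac", "m4a", "ogg"},
    "image": {"jpg", "png", "gif", "webp", "svg"},
    "document": {"doc", "docx", "pptx", "xlsx", "rtf"},
    "archive": {"zip", "rar", "7z", "gz"},
    "subtitle": {"srt", "vtt"},
}


def classify(url: str) -> Optional[str]:
    """Return a media type from the URL path's last extension, or None if unknown."""
    suffix = PurePosixPath(urlparse(url).path).suffix.lower().lstrip(".")
    for media_type, extensions in EXTENSION_TYPES.items():
        if suffix in extensions:
            return media_type
    return None


print(classify("https://example.com/media/clip.MP4?token=abc"))  # video
print(classify("https://example.com/about"))                     # None
```

Extension matching alone is ambiguous for some formats: `ts` appears under both Video (MPEG transport stream) and Source Code (TypeScript), so a robust classifier likely needs extra context such as the `Content-Type` header or the element the link came from.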
⚙️ Input Configuration
```json
{
  "startUrls": [{ "url": "https://example.com" }],
  "mediaType": "all",
  "maxCrawlDepth": 2,
  "maxUrlsToCrawl": 100,
  "concurrentRequests": 10,
  "maxRequestRetries": 3,
  "maxBlockedRetries": 3,
  "downloadDelay": 0,
  "stayOnDomain": true,
  "useStealth": true,
  "solveCloudflare": false,
  "captureXhr": false,
  "xhrPattern": ".*",
  "includeContactText": true,
  "useProxy": { "useApifyProxy": false }
}
```
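One plausible reading of `maxCrawlDepth`, `maxUrlsToCrawl` and `stayOnDomain` is a breadth-first crawl, sketched below in Python over an in-memory link graph. The function name `plan_crawl`, the breadth-first order and the exact counting rules are assumptions; this README only states that depth 0 scans the start pages without following links.

```python
from collections import deque
from typing import Dict, List
from urllib.parse import urlparse


def plan_crawl(start_urls: List[str], links: Dict[str, List[str]], max_crawl_depth: int = 2,
               max_urls_to_crawl: int = 100, stay_on_domain: bool = True) -> List[str]:
    """Breadth-first visit order under depth, page-count and domain limits (hypothetical semantics)."""
    allowed_hosts = {urlparse(u).netloc for u in start_urls}
    queue = deque((u, 0) for u in start_urls)
    seen = set(start_urls)
    visited: List[str] = []
    while queue and len(visited) < max_urls_to_crawl:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_crawl_depth:
            continue  # depth limit reached: scan this page but follow none of its links
        for link in links.get(url, []):
            if stay_on_domain and urlparse(link).netloc not in allowed_hosts:
                continue  # off-domain page, skipped when stay_on_domain is set
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


start = ["https://example.com/"]
links = {
    "https://example.com/": ["https://example.com/a", "https://other.net/page", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/a/deep"],
}
print(plan_crawl(start, links, max_crawl_depth=0))  # ['https://example.com/']
print(plan_crawl(start, links, max_crawl_depth=1))
# ['https://example.com/', 'https://example.com/a', 'https://example.com/b']
```

Under this reading, `maxUrlsToCrawl` caps pages fetched regardless of depth, so a low cap can end a crawl before it reaches `maxCrawlDepth`.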
📊 Output Schema
Each result is one media item:
```json
{
  "url": "https://example.com/video.mp4",
  "sourceUrl": "https://example.com/gallery",
  "type": "video",
  "subType": null,
  "title": "Gallery Page",
  "foundAt": "2026-03-30T12:00:00+00:00",
  "foundBy": "dom"
}
```

`foundBy` values: `dom`, `link-scan`, `inline`, `text-scan`, `xhr-capture`
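Results can be post-processed in any language once a run finishes. The Python sketch below assumes the dataset has been exported as a JSON array of items shaped like the one above (trimmed to four fields for brevity); `summarize` and `unique_urls` are illustrative helpers, not part of the actor. Deduplication is useful if the same file is reported from more than one page.

```python
import json
from collections import Counter
from typing import Dict, List

# Sample export trimmed to the fields used here; real items carry every field shown above.
raw = """[
  {"url": "https://example.com/video.mp4", "sourceUrl": "https://example.com/gallery", "type": "video", "foundBy": "dom"},
  {"url": "https://example.com/video.mp4", "sourceUrl": "https://example.com/", "type": "video", "foundBy": "link-scan"},
  {"url": "https://example.com/cover.png", "sourceUrl": "https://example.com/gallery", "type": "image", "foundBy": "dom"}
]"""


def summarize(items: List[dict]) -> Dict[str, Counter]:
    """Count items per media type and per discovery method (foundBy)."""
    return {
        "type": Counter(item["type"] for item in items),
        "foundBy": Counter(item["foundBy"] for item in items),
    }


def unique_urls(items: List[dict], media_type: str) -> List[str]:
    """Distinct media URLs of one type, in first-seen order."""
    return list(dict.fromkeys(item["url"] for item in items if item["type"] == media_type))


items = json.loads(raw)
print(summarize(items)["type"])     # Counter({'video': 2, 'image': 1})
print(unique_urls(items, "video"))  # ['https://example.com/video.mp4']
```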
💡 Pro Tips
- Use `mediaType: "all"` first to discover what's available.
- Set `useStealth: true` (default) to handle sites with anti-bot protections.
- Enable `captureXhr: true` for streaming sites that serve media via API calls.
- Set `solveCloudflare: true` only for Cloudflare-protected sites (slower).
- Use `maxCrawlDepth: 0` to scan only the start pages (no link following).
- Set `crawlDir` to enable pause/resume on long crawls.
❓ FAQ
How deep should I crawl?
Start with depth 2 for most sites. Higher depths find more content but take longer.
When should I enable XHR capture?
Enable it for streaming sites, video platforms, or any site that loads media dynamically via JavaScript API calls.
Is Cloudflare bypass necessary?
Only for sites actively protected by Cloudflare challenges. Most sites work fine with `useStealth: true` alone.
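The `xhrPattern` input (default `".*"`) appears to be a regular expression that narrows which captured background requests are kept. Assuming it is matched anywhere in the request URL with Python-style `re.search` semantics (the actor's exact matching rule is an assumption here), a pattern targeting HLS playlists and segments would behave like this hypothetical `filter_xhr` helper:

```python
import re
from typing import Iterable, List


def filter_xhr(urls: Iterable[str], xhr_pattern: str = ".*") -> List[str]:
    """Keep captured request URLs that match xhr_pattern anywhere (re.search semantics assumed)."""
    pattern = re.compile(xhr_pattern)
    return [url for url in urls if pattern.search(url)]


captured = [
    "https://example.com/api/playlist.m3u8?id=7",
    "https://example.com/api/analytics",
    "https://cdn.example.com/hls/seg-001.ts",
]
print(filter_xhr(captured, r"\.(m3u8|ts)(\?|$)"))
# ['https://example.com/api/playlist.m3u8?id=7', 'https://cdn.example.com/hls/seg-001.ts']
```

With the default `".*"` every captured request passes, so a narrower pattern mainly reduces noise from analytics and tracking calls.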
⚠️ Limitations
- Very complex JavaScript-rendered apps may require `captureXhr: true`.
- Cloudflare enterprise plans may still block the solver.
- Large crawls with high depth may take substantial time and memory.
📮 Need Help?
Contact @thenetaji through the Apify platform for support, implementation questions, or feature requests.