1"""
2Site Researcher - Apify Actor
3Extract all images, videos, and data from any website.
4
5Features:
6- Crawl sitemap to discover all pages
7- Extract structured data (JSON-LD, meta tags)
8- Detect technology stack
9- Download all media files
10- RAG-efficient Markdown extraction for AI/LLM use
11"""

import json
import os
import re
import tempfile
from copy import deepcopy
from datetime import datetime
from typing import Optional, Tuple
from urllib.parse import urljoin, urlparse

import requests
from apify import Actor
from bs4 import BeautifulSoup
from markdownify import markdownify as md


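# Substring signatures searched (case-insensitively) in page HTML and response
# headers by detect_tech() below.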
TECH_SIGNATURES = {
    "shopify": ["cdn.shopify.com", "Shopify.theme", "myshopify.com"],
    "wordpress": ["wp-content", "wp-includes", "WordPress"],
    "wix": ["wix.com", "wixstatic.com"],
    "squarespace": ["squarespace.com", "sqsp.net"],
    "webflow": ["webflow.com", "website-files.com"],
    "react": ["__NEXT_DATA__", "reactroot", "_next/"],
    "vue": ["__VUE__", "vue.js"],
    "google_analytics": ["gtag(", "google-analytics.com", "googletagmanager.com"],
    "facebook_pixel": ["fbq(", "connect.facebook.net"],
    "stripe": ["stripe.com", "Stripe("],
    "cloudflare": ["cloudflare.com", "cf-ray"],
}


def extract_markdown(soup: BeautifulSoup, url: str) -> str:
    """
    Extract RAG-efficient Markdown from HTML.
    Strips boilerplate (nav, footer, scripts) and converts the main content to clean Markdown.
    """
    # Work on a copy so the caller's soup is left untouched.
    soup_copy = deepcopy(soup)

    # Remove structural boilerplate elements entirely.
    for tag in soup_copy.find_all(['nav', 'header', 'footer', 'aside', 'script',
                                   'style', 'noscript', 'iframe', 'form']):
        tag.decompose()

    # Remove elements whose class or id suggests navigation, ads, or overlays.
    boilerplate_patterns = ['menu', 'sidebar', 'footer', 'header', 'nav', 'cookie',
                            'popup', 'modal', 'advertisement', 'ad-', 'social', 'share']
    for pattern in boilerplate_patterns:
        for tag in soup_copy.find_all(class_=re.compile(pattern, re.I)):
            tag.decompose()
        for tag in soup_copy.find_all(id=re.compile(pattern, re.I)):
            tag.decompose()

    # Prefer semantic containers, falling back to the whole body.
    main_content = (
        soup_copy.find('main') or
        soup_copy.find('article') or
        soup_copy.find(id='content') or
        soup_copy.find(class_='content') or
        soup_copy.find('body')
    )

    if not main_content:
        return ""

    try:
        markdown_text = md(str(main_content), heading_style="ATX", strip=['img', 'a'])

        # Collapse runs of blank lines into a single blank line.
        lines = markdown_text.split('\n')
        cleaned_lines = []
        prev_empty = False
        for line in lines:
            line = line.rstrip()
            is_empty = len(line.strip()) == 0
            if is_empty and prev_empty:
                continue
            cleaned_lines.append(line)
            prev_empty = is_empty

        markdown_text = '\n'.join(cleaned_lines).strip()

        # Cap the output so a single page cannot blow up the dataset record.
        if len(markdown_text) > 50000:
            markdown_text = markdown_text[:50000] + "\n\n[Content truncated for size]"

        return markdown_text
    except Exception:
        return ""


async def main():
    """Main entry point for the Apify Actor."""
    async with Actor:
        actor_input = await Actor.get_input() or {}

        start_url = actor_input.get("startUrl", "")
        deep_crawl = actor_input.get("deepCrawl", True)
        download_media = actor_input.get("downloadMedia", True)
        media_limit = actor_input.get("mediaLimit", 100)

        if not start_url:
            Actor.log.error("No startUrl provided!")
            return

        Actor.log.info(f"🔍 Researching: {start_url}")

        # Plain requests session with a browser-like User-Agent.
        session = requests.Session()
        session.headers.update({
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36"
        })

        parsed = urlparse(start_url)
        domain = parsed.netloc
        base_url = f"{parsed.scheme}://{domain}"

        # Single result record pushed to the dataset at the end of the run.
        research = {
            "domain": domain,
            "crawled_at": datetime.now().isoformat(),
            "pages": [],
            "sitemap_urls": [],
            "tech_stack": {},
            "media": {"images": [], "videos": []}
        }

        # Step 1: discover URLs from the sitemap (store at most 500 of them).
        Actor.log.info("🗺️ Discovering sitemap...")
        all_sitemap_urls = get_sitemap_urls(session, base_url)
        research["sitemap_urls"] = all_sitemap_urls[:500]
        research["sitemap_total"] = len(all_sitemap_urls)
        Actor.log.info(f" Found {len(all_sitemap_urls)} URLs (storing {len(research['sitemap_urls'])})")

        # Step 2: detect the technology stack from the start page.
        Actor.log.info("🔧 Detecting tech stack...")
        research["tech_stack"] = detect_tech(session, start_url)

        # Step 3: crawl the start page itself.
        Actor.log.info("📥 Crawling main page...")
        main_page = crawl_page(session, start_url)
        if main_page:
            research["pages"].append(main_page)

        # Step 4: optionally crawl up to 50 additional pages from the sitemap.
        if deep_crawl and research["sitemap_urls"]:
            pages_to_crawl = [u for u in research["sitemap_urls"] if not u.endswith('.xml')][:50]
            Actor.log.info(f"🕷️ Deep crawling {len(pages_to_crawl)} pages...")

            for i, page_url in enumerate(pages_to_crawl, 1):
                if page_url != start_url:
                    Actor.log.info(f" [{i}/{len(pages_to_crawl)}] {page_url[:60]}...")
                    page_data = crawl_page(session, page_url)
                    if page_data:
                        research["pages"].append(page_data)

        # Step 5: collect unique absolute media URLs across all crawled pages.
        all_images = []
        all_videos = []
        seen_urls = set()

        for page in research["pages"]:
            for img in page.get("images", []):
                if img["src"] not in seen_urls and img["src"].startswith("http"):
                    seen_urls.add(img["src"])
                    all_images.append(img)
            for vid in page.get("videos", []):
                if vid["src"] not in seen_urls and vid["src"].startswith("http"):
                    seen_urls.add(vid["src"])
                    all_videos.append(vid)

        all_images = all_images[:media_limit]
        all_videos = all_videos[:media_limit]

        research["media"]["images"] = all_images
        research["media"]["videos"] = all_videos

        # Step 6: optionally download media into the key-value store. Files that
        # fail or exceed MAX_FILE_SIZE keep only their URL in the output.
        if download_media:
            Actor.log.info(f"📥 Processing {len(all_images)} images, {len(all_videos)} videos...")

            MAX_FILE_SIZE = 100 * 1024 * 1024  # 100 MB per file
            downloaded_count = 0

            for i, img in enumerate(all_images, 1):
                try:
                    filename = get_filename(img["src"]) or f"image_{i}.jpg"
                    img["filename"] = filename

                    content, size = download_file_streaming(session, img["src"], MAX_FILE_SIZE)
                    img["size_bytes"] = size

                    if content is not None:
                        # Stored under a fixed content type; the real format may differ.
                        await Actor.set_value(f"img_{filename}", content, content_type="image/jpeg")
                        img["downloaded"] = True
                        downloaded_count += 1
                        size_str = f"{size//1024}KB" if size < 1024*1024 else f"{size//1024//1024}MB"
                        Actor.log.info(f" 📷 [{i}] {filename[:40]} ({size_str})")
                    else:
                        img["downloaded"] = False
                        img["download_url"] = img["src"]
                        size_str = f"{size//1024//1024}MB" if size > 0 else "unknown"
                        Actor.log.info(f" 🔗 [{i}] {filename[:40]} - URL provided ({size_str})")
                except Exception as e:
                    img["downloaded"] = False
                    img["download_url"] = img["src"]
                    Actor.log.warning(f" ❌ [{i}] Image error: {type(e).__name__}: {str(e)[:80]}")

            for i, vid in enumerate(all_videos, 1):
                try:
                    filename = get_filename(vid["src"]) or f"video_{i}.mp4"
                    vid["filename"] = filename

                    content, size = download_file_streaming(session, vid["src"], MAX_FILE_SIZE)
                    vid["size_bytes"] = size

                    if content is not None:
                        # Stored under a fixed content type; the real format may differ.
                        await Actor.set_value(f"vid_{filename}", content, content_type="video/mp4")
                        vid["downloaded"] = True
                        downloaded_count += 1
                        size_str = f"{size//1024}KB" if size < 1024*1024 else f"{size//1024//1024}MB"
                        Actor.log.info(f" 🎬 [{i}] {filename[:40]} ({size_str})")
                    else:
                        vid["downloaded"] = False
                        vid["download_url"] = vid["src"]
                        size_str = f"{size//1024//1024}MB" if size > 0 else "unknown"
                        Actor.log.info(f" 🔗 [{i}] {filename[:40]} - URL provided ({size_str})")
                except Exception as e:
                    vid["downloaded"] = False
                    vid["download_url"] = vid["src"]
                    Actor.log.warning(f" ❌ [{i}] Video error: {type(e).__name__}: {str(e)[:80]}")

            Actor.log.info(f" ✅ Downloaded {downloaded_count} files, {len(all_images) + len(all_videos) - downloaded_count} URLs provided")

        # Step 7: trim per-page fields to keep the dataset record small.
        for page in research["pages"]:
            if len(page.get("headings", [])) > 50:
                page["headings"] = page["headings"][:50]
            if len(page.get("images", [])) > 20:
                page["images"] = page["images"][:20]
            if len(page.get("videos", [])) > 20:
                page["videos"] = page["videos"][:20]
            if len(page.get("description", "")) > 500:
                page["description"] = page["description"][:500] + "..."

        # Rough output size: length of the serialized JSON in bytes.
        estimated_size = len(json.dumps(research).encode("utf-8"))
        Actor.log.info(f"📊 Estimated output size: {estimated_size / 1024 / 1024:.2f} MB")

        # Step 8: push the single research record to the default dataset.
        await Actor.push_data(research)

        Actor.log.info(f"✅ Complete: {len(research['pages'])} pages, {len(all_images)} images, {len(all_videos)} videos")


def get_sitemap_urls(session: requests.Session, base_url: str) -> list:
    """Extract all page URLs from the site's sitemap, if one exists."""
    urls = []
    sitemap_locations = [
        f"{base_url}/sitemap.xml",
        f"{base_url}/sitemap_index.xml",
    ]

    for sitemap_url in sitemap_locations:
        try:
            resp = session.get(sitemap_url, timeout=30)
            if resp.status_code == 200:
                urls.extend(parse_sitemap(session, resp.text))
                break
        except Exception:
            continue

    return list(set(urls))


def parse_sitemap(session: requests.Session, xml_content: str) -> list:
    """Parse sitemap XML, recursing into nested sitemap index files."""
    urls = []
    loc_pattern = r'<loc>(.*?)</loc>'
    matches = re.findall(loc_pattern, xml_content, re.IGNORECASE)

    for match in matches:
        url = match.strip()
        if url.endswith('.xml'):
            # Nested sitemap: fetch it and parse recursively.
            try:
                resp = session.get(url, timeout=30)
                if resp.status_code == 200:
                    urls.extend(parse_sitemap(session, resp.text))
            except Exception:
                pass
        else:
            urls.append(url)

    return urls


def crawl_page(session: requests.Session, url: str) -> Optional[dict]:
    """Crawl a single page and extract data."""
    try:
        resp = session.get(url, timeout=30)
        if resp.status_code != 200:
            return None

        soup = BeautifulSoup(resp.text, 'html.parser')

        page = {
            "url": url,
            "title": "",
            "description": "",
            "markdown_content": "",
            "og_data": {},
            "json_ld": [],
            "headings": [],
            "images": [],
            "videos": []
        }

        # Page title
        title_tag = soup.find('title')
        page["title"] = title_tag.get_text(strip=True) if title_tag else ""

        # Meta description
        desc_tag = soup.find('meta', attrs={'name': 'description'})
        page["description"] = desc_tag.get('content', '') if desc_tag else ""

        # Open Graph tags
        for og_tag in soup.find_all('meta', property=re.compile(r'^og:')):
            prop = og_tag.get('property', '').replace('og:', '')
            page["og_data"][prop] = og_tag.get('content', '')

        # JSON-LD structured data
        for script in soup.find_all('script', type='application/ld+json'):
            try:
                data = json.loads(script.string)
                page["json_ld"].append(data)
            except Exception:
                pass

        # Headings h1-h3
        for level in range(1, 4):
            for h in soup.find_all(f'h{level}'):
                text = h.get_text(strip=True)
                if text:
                    page["headings"].append({"level": level, "text": text[:100]})

        # Images, including lazy-loaded data-src attributes
        for img in soup.find_all('img'):
            src = img.get('src') or img.get('data-src') or ''
            if src:
                page["images"].append({
                    "src": urljoin(url, src),
                    "alt": img.get('alt', '')
                })

        # <video> elements and their <source> children
        for video in soup.find_all('video'):
            src = video.get('src') or ''
            if src:
                page["videos"].append({"src": urljoin(url, src), "type": "video"})
            for source in video.find_all('source'):
                src = source.get('src', '')
                if src:
                    page["videos"].append({"src": urljoin(url, src), "type": source.get('type', 'video')})

        # .mp4 URLs embedded in inline scripts
        for script in soup.find_all('script'):
            if script.string:
                for match in re.findall(r'https?://[^\s"]+\.mp4[^\s"]*', script.string):
                    page["videos"].append({"src": match.rstrip('"\',;'), "type": "embedded"})

        # RAG-friendly Markdown of the main content
        page["markdown_content"] = extract_markdown(soup, url)

        return page

    except Exception:
        return None


def detect_tech(session: requests.Session, url: str) -> dict:
    """Detect the technology stack from page HTML and response headers."""
    tech = {}
    try:
        resp = session.get(url, timeout=30)
        html = resp.text
        headers = dict(resp.headers)

        for tech_name, signatures in TECH_SIGNATURES.items():
            for sig in signatures:
                if sig.lower() in html.lower() or sig.lower() in str(headers).lower():
                    tech[tech_name] = True
                    break

        if 'Server' in headers:
            tech['server'] = headers['Server']
    except Exception:
        pass

    return tech


def get_filename(url: str) -> str:
    """Extract a sanitized filename from a URL path."""
    parsed = urlparse(url)
    filename = parsed.path.split('/')[-1] or ""

    # Replace anything other than word characters, dashes, and dots; cap at 80 chars.
    filename = re.sub(r'[^\w\-.]', '_', filename)[:80]
    return filename


def download_file_streaming(session: requests.Session, url: str, max_size: int = 100*1024*1024) -> Tuple[Optional[bytes], int]:
    """
    Stream a file download in chunks to avoid holding large files in memory all at once.
    Chunks accumulate in a temporary file, which is then read back as the complete content.

    Args:
        session: Requests session
        url: URL to download
        max_size: Maximum file size to download (default 100 MB)

    Returns:
        Tuple of (file_content or None, total_size).
        file_content is None if the file exceeds max_size or the download fails.
    """
    tmp_path = None
    try:
        with session.get(url, stream=True, timeout=120) as resp:
            if resp.status_code != 200:
                return None, 0

            # An HTML response usually means an error page, not the media file.
            content_type = resp.headers.get('content-type', '')
            if 'text/html' in content_type.lower():
                return None, 0

            total_size = 0
            with tempfile.NamedTemporaryFile(delete=False) as tmp:
                tmp_path = tmp.name
                for chunk in resp.iter_content(chunk_size=1024*1024):
                    if chunk:
                        total_size += len(chunk)
                        if total_size > max_size:
                            return None, total_size
                        tmp.write(chunk)

        # Read the completed download back from the temp file.
        with open(tmp_path, 'rb') as f:
            content = f.read()

        return content, total_size

    except Exception:
        return None, 0
    finally:
        # Always clean up the temp file, even on early returns.
        if tmp_path and os.path.exists(tmp_path):
            try:
                os.unlink(tmp_path)
            except OSError:
                pass
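

# Minimal local-run sketch. This assumes the module is executed directly;
# packaged Apify projects typically start the Actor from a separate __main__.py.
if __name__ == "__main__":
    import asyncio

    asyncio.run(main())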