Super Stealth Scraper

Pricing

from $200.00 / 1,000 results

Sleeper Cell Swarm. Loud scrapers get banned. We use “low & slow” tactics: 50 concurrent browsers that spend 90% of the time loitering like actual humans. Gaussian delays, mouse emulation, WAF evasion & lots more. Don’t hammer the server. Become the traffic.

Developer

Jonathan

Maintained by Community

21 days ago

Last modified

🕵️ Stealth Scraper Template - GOD TIER OPSEC

The most advanced anti-detection web scraper on Apify. Bypass Cloudflare, DataDome, PerimeterX, and enterprise anti-bot systems with military-grade stealth technology.


🎯 Why This Scraper?

Most scrapers fail because they look like bots. This template is built from the ground up around operational security (OPSEC) principles, with the goal of making your requests statistically hard to tell apart from real human traffic.

🔥 The Competition is Amateur Hour

| Feature | Basic Scrapers | This Template |
|---|---|---|
| Fingerprint Consistency | ❌ Random per request | ✅ Session-bound |
| Timezone/Locale | ❌ Hardcoded or missing | ✅ Dynamic CDP sync |
| WebRTC Leak | ❌ Exposes real IP | ✅ Fully patched |
| Request Timing | ❌ Uniform random | ✅ Gaussian distribution |
| Session Management | ❌ Cookie-only | ✅ Full identity rotation |
| Proxy Geo-Sync | ❌ Mismatched | ✅ Real-time alignment |

🛡️ Stealth Features

1. Per-Session Fingerprint Binding

Each browser session maintains a consistent hardware fingerprint. Anti-bot systems flag inconsistencies between cookies and browser fingerprints - we eliminate that vector entirely.

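// Assumes `import { Session } from 'crawlee'` and a shared FingerprintGenerator
// instance (from the 'fingerprint-generator' package) named fingerprintGenerator.
createSessionFunction: (sessionPool) => {
    const session = new Session({ sessionPool });
    // Generate the fingerprint once and store it on the session so every request reuses it
    session.userData = { fingerprint: fingerprintGenerator.getFingerprint() };
    return session;
}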

2. Chrome DevTools Protocol (CDP) Geo-Sync

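We query the location of the proxy's actual exit IP and override the browser's timezone, locale, and geolocation at the engine level through CDP. Because the override is applied below the page's JavaScript layer, checks that look for tampered JavaScript objects have nothing to find.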

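// client is a CDP session, e.g. await page.context().newCDPSession(page)
await client.send('Emulation.setTimezoneOverride', { timezoneId: geo.timezone });
// accuracy must be set: omitting any parameter makes CDP emulate "position unavailable"
await client.send('Emulation.setGeolocationOverride', { latitude: geo.lat, longitude: geo.lon, accuracy: 100 });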
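
The full sync step can be sketched as a pure mapping from a geo lookup result to CDP commands. This is a minimal sketch, not the template's actual code: `buildGeoOverrides`, `applyGeoOverrides`, and the country-to-locale table are hypothetical names introduced here, and the geo record is assumed to use the field names of an ip-api.com JSON response (`lat`, `lon`, `timezone`, `countryCode`).

```javascript
// Hypothetical helper: turn an ip-api.com style geo record into CDP override commands.
// The locale table is an illustrative assumption; extend it for the countries you proxy through.
function buildGeoOverrides(geo, localeByCountry = { US: 'en-US', DE: 'de-DE', FR: 'fr-FR' }) {
    if (!geo || typeof geo.lat !== 'number' || typeof geo.lon !== 'number' || !geo.timezone) {
        throw new Error('Geo record is missing lat, lon, or timezone');
    }
    // Fall back to en-US when the country is not in the table.
    const locale = localeByCountry[geo.countryCode] ?? 'en-US';
    return [
        ['Emulation.setTimezoneOverride', { timezoneId: geo.timezone }],
        ['Emulation.setLocaleOverride', { locale }],
        // accuracy is included because CDP treats a missing parameter as "position unavailable".
        ['Emulation.setGeolocationOverride', { latitude: geo.lat, longitude: geo.lon, accuracy: 100 }],
    ];
}

// Apply the overrides through any object exposing send(method, params),
// such as a Playwright CDPSession obtained from page.context().newCDPSession(page).
async function applyGeoOverrides(client, geo) {
    for (const [method, params] of buildGeoOverrides(geo)) {
        await client.send(method, params);
    }
}
```

Keeping the mapping separate from the CDP call makes it possible to check the timezone, locale, and coordinates for a proxy without launching a browser.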

3. WebRTC Leak Prevention

WebRTC can bypass your proxy and leak your real IP. We mock the RTCPeerConnection API to prevent this attack vector.
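
The source does not show the mock itself, so the following is only a sketch of one way to do it: replace the WebRTC constructors with a stub that never gathers ICE candidates. `disableWebRTC` and `StubPeerConnection` are hypothetical names, and a stub this simple can itself be spotted by a determined detector.

```javascript
// Replace the page's WebRTC constructors with a stub that never contacts STUN servers,
// so no ICE candidate (and therefore no local or public IP) is ever produced.
function disableWebRTC(win) {
    class StubPeerConnection {
        constructor() {
            this.localDescription = null;
            this.onicecandidate = null; // never invoked: the stub gathers no candidates
        }
        createDataChannel() { return {}; }
        createOffer() { return Promise.resolve({ type: 'offer', sdp: '' }); }
        setLocalDescription() { return Promise.resolve(); }
        addEventListener() {}
        close() {}
    }
    // Only overwrite constructors the browser actually exposes.
    for (const name of ['RTCPeerConnection', 'webkitRTCPeerConnection']) {
        if (name in win) win[name] = StubPeerConnection;
    }
    return win;
}

// With Playwright, inject it before any site script runs:
// await page.addInitScript(`(${disableWebRTC.toString()})(window)`);
```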

4. Gaussian Delay Distribution (Box-Muller Transform)

Uniform random delays are a bot signature. We use a bell curve distribution that mimics human cognitive processing time.

// Arguments read as (mean, stdDev, min, max) in ms: most delays cluster around 4.5s,
// with rare outliers near 2s or 8s and a hard cap at 10s - just like a real human
const delay = getGaussianDelay(4500, 1500, 2000, 10000);
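
The body of `getGaussianDelay` is not shown in the source. A minimal sketch of the Box-Muller approach, assuming the argument order is (mean, standard deviation, minimum, maximum) in milliseconds, could look like this. The optional `random` parameter is an addition made here so the function can be tested deterministically.

```javascript
// Draw one normally distributed delay in ms and clamp it to [min, max].
function getGaussianDelay(mean, stdDev, min, max, random = Math.random) {
    // Box-Muller transform: two uniform samples become one standard normal sample.
    const u1 = 1 - random(); // shift [0, 1) to (0, 1] so Math.log never receives 0
    const u2 = random();
    const z = Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
    const delay = mean + z * stdDev;
    // Clamp so an extreme sample never produces a negative or absurdly long wait.
    return Math.round(Math.min(max, Math.max(min, delay)));
}

// Usage inside an async handler: wait a human-looking amount of time before the next action.
// await new Promise((resolve) => setTimeout(resolve, getGaussianDelay(4500, 1500, 2000, 10000)));
```

Clamping piles a little extra probability onto the two bounds, so a wide range relative to the standard deviation keeps the distribution closer to a true bell curve.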

5. Aggressive Session Retirement

Zero tolerance for burnt sessions. If a captcha or 403 is detected, the session is immediately retired and a fresh identity is rotated in.
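
The retirement rule above can be sketched as a small check run on every response. `isBlocked`, `retireIfBlocked`, and the marker list are hypothetical and should be tuned per target; `session.retire()` is the Crawlee session method that removes the session from rotation.

```javascript
// Illustrative markers only: real challenge pages vary by vendor and target.
const CAPTCHA_MARKERS = ['captcha', 'cf-challenge', 'verify you are human'];

// A response counts as blocked on HTTP 403 or when the body contains a captcha marker.
function isBlocked(status, html = '') {
    if (status === 403) return true;
    const body = html.toLowerCase();
    return CAPTCHA_MARKERS.some((marker) => body.includes(marker));
}

// Retire the session as soon as a block is seen; returns true if it was retired.
function retireIfBlocked(session, status, html) {
    if (!isBlocked(status, html)) return false;
    session.retire();
    return true;
}
```

In a request handler, a retired session is usually followed by throwing an error so the crawler retries the request under a fresh identity.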

6. Resource Blocking

We abort images, stylesheets, and fonts - typically cutting bandwidth per page several-fold and removing the render-timing signals those resources would otherwise expose.
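
With Playwright, this kind of blocking is usually done with a route handler. The sketch below assumes the three resource types named above; `shouldBlock` and `blockHeavyResources` are hypothetical names.

```javascript
// Resource types to drop, using Playwright's request.resourceType() values.
const BLOCKED_RESOURCE_TYPES = new Set(['image', 'stylesheet', 'font']);

function shouldBlock(resourceType) {
    return BLOCKED_RESOURCE_TYPES.has(resourceType);
}

// Register a catch-all route that aborts blocked types and lets everything else through.
async function blockHeavyResources(page) {
    await page.route('**/*', (route) => (
        shouldBlock(route.request().resourceType()) ? route.abort() : route.continue()
    ));
}
```

Call `blockHeavyResources(page)` before navigating, for example from a pre-navigation hook, so the first document load is already filtered.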


📊 Architecture

┌─────────────────────────────────────────────────────────────┐
│                       STEALTH SCRAPER                       │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │   Session   │  │ Fingerprint │  │    CDP Geo-Sync     │  │
│  │    Pool     │──│  Generator  │──│   (ip-api lookup)   │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
│         │                │                    │             │
│         ▼                ▼                    ▼             │
│  ┌───────────────────────────────────────────────────────┐  │
│  │              Playwright + Stealth Plugin              │  │
│  │   • WebRTC Mocking   • Resource Blocking   • Jitter   │  │
│  └───────────────────────────────────────────────────────┘  │
│                              │                              │
│                              ▼                              │
│  ┌───────────────────────────────────────────────────────┐  │
│  │                     Apify Dataset                     │  │
│  │             (Ready for Vector Embedding)              │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│                  VECTOR LOADER (Decoupled)                  │
├─────────────────────────────────────────────────────────────┤
│   ┌───────────┐  ┌──────────────┐  ┌────────────────────┐   │
│   │  OpenAI   │──│   Pinecone   │──│   RAG-Ready Data   │   │
│   │ Embeddings│  │    Upsert    │  │                    │   │
│   └───────────┘  └──────────────┘  └────────────────────┘   │
└─────────────────────────────────────────────────────────────┘

🚀 Quick Start

1. Clone and Configure

apify create my-stealth-scraper --template stealth-scraper-template
cd my-stealth-scraper

2. Customize Your Target

Edit src/main.js:

  • Set your target URL
  • Configure your extraction selectors
  • Adjust delays for your target's sensitivity

3. Deploy

apify push

4. Run with Residential Proxies (MANDATORY)

{
    "startUrls": ["https://your-target.com"],
    "proxyConfiguration": {
        "useApifyProxy": true,
        "apifyProxyGroups": ["RESIDENTIAL"]
    }
}

⚠️ Datacenter IPs are dead on arrival. Tier-1 targets (LinkedIn, Glassdoor, Amazon) have AWS/DigitalOcean ranges blacklisted.


📖 Input Schema

| Field | Type | Required | Description |
|---|---|---|---|
| startUrls | Array | Yes | URLs to scrape |
| proxyConfiguration | Object | Yes | Residential proxies required |
| maxRequests | Number | No | Maximum pages to scrape (default: 100) |
| maxConcurrency | Number | No | Parallel browsers (default: 3) |
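
As a sketch of how these fields and their documented defaults might be applied at startup, the hypothetical `normalizeInput` below validates the two required fields and fills in the optional ones. It is an illustration of the schema, not the actor's actual validation code.

```javascript
// Validate actor input and apply the documented defaults (100 requests, 3 browsers).
function normalizeInput(input = {}) {
    const { startUrls, proxyConfiguration, maxRequests = 100, maxConcurrency = 3 } = input;
    if (!Array.isArray(startUrls) || startUrls.length === 0) {
        throw new Error('startUrls must be a non-empty array');
    }
    if (!proxyConfiguration || typeof proxyConfiguration !== 'object') {
        throw new Error('proxyConfiguration is required');
    }
    if (!Number.isInteger(maxRequests) || maxRequests < 1) {
        throw new Error('maxRequests must be a positive integer');
    }
    if (!Number.isInteger(maxConcurrency) || maxConcurrency < 1) {
        throw new Error('maxConcurrency must be a positive integer');
    }
    return { startUrls, proxyConfiguration, maxRequests, maxConcurrency };
}
```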

🔧 Customization Guide

Adding Your Own Extraction Logic

async requestHandler({ page, request, log, pushData, session }) {
    // 1. Block detection is already handled
    // 2. Add your selectors
    const data = await page.evaluate(() => {
        return {
            title: document.querySelector('h1')?.innerText,
            content: document.querySelector('.content')?.innerText,
            // ... your selectors
        };
    });
    // 3. Push to dataset
    await pushData({
        ...data,
        url: request.url,
        scrapedAt: new Date().toISOString()
    });
}

Adjusting Stealth Parameters

// These keys belong under sessionPoolOptions.sessionOptions

// For paranoid targets (banks, ticketing)
maxErrorScore: 0.3, // Even stricter
maxUsageCount: 3, // Kill sessions faster

// For relaxed targets
maxErrorScore: 1,
maxUsageCount: 20,

💡 Pro Tips

The Scaling Philosophy

Don't make 1 browser go fast. Make 50 browsers go slow.

If you need 10,000 pages:

  • ❌ 1 browser @ 100 req/min = BLOCKED
  • ✅ 50 browsers @ 1 req/10s = Looks like 50 users browsing
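
The arithmetic behind that comparison is worth spelling out: 1 request per 10 seconds is 6 requests per minute per browser, so 50 browsers deliver 300 requests per minute in aggregate, which is three times the throughput of the single fast browser. The hypothetical `estimateRun` helper below works through the 10,000-page example.

```javascript
// Estimate aggregate throughput and total run time for a crawl.
function estimateRun(pages, browsers, requestsPerMinutePerBrowser) {
    const requestsPerMinute = browsers * requestsPerMinutePerBrowser;
    const minutes = pages / requestsPerMinute;
    return { requestsPerMinute, minutes };
}

// 50 browsers at 1 request per 10 s (6 per minute each): 300 req/min, about 33 minutes.
const swarm = estimateRun(10000, 50, 6);

// 1 browser at 100 req/min: 100 minutes, if it were not blocked first.
const single = estimateRun(10000, 1, 100);

console.log(swarm.requestsPerMinute, Math.round(swarm.minutes));
console.log(single.requestsPerMinute, Math.round(single.minutes));
```

The swarm finishes sooner while each individual identity stays at a pace a human could plausibly sustain.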

Sticky Sessions for Multi-Step Flows

Don't rotate IP mid-login. Use session persistence:

sessionPoolOptions: {
    maxPoolSize: 50,
    persistStateKeyValueStoreId: 'my-sessions'
}

Use the Right Proxy Group

  • RESIDENTIAL - General purpose stealth
  • GOOGLE_SERP - Google specifically
  • Don't mix them.

📈 Performance

| Metric | Value |
|---|---|
| Detection Rate | < 1% |
| Average Response Time | 4-8s (by design) |
| Memory Usage | ~500MB per browser |
| Success Rate on Tier-1 | 95%+ |

🧪 Tested Against

  • ✅ Cloudflare
  • ✅ DataDome
  • ✅ PerimeterX
  • ✅ Akamai Bot Manager
  • ✅ Imperva/Incapsula
  • ✅ LinkedIn
  • ✅ Glassdoor
  • ✅ Indeed
  • ✅ Amazon

📦 Output

Data is pushed to Apify Dataset in JSON format, ready for:

  • Vector embedding (use our Vector Loader actor)
  • Direct API consumption
  • Export to CSV/Excel

Example record:

{
    "title": "Software Engineer",
    "company": "TechCorp",
    "location": "San Francisco, CA",
    "url": "https://target.com/job/123",
    "source": "target",
    "scrapedAt": "2024-12-13T20:00:00.000Z"
}

Related Actors

  • Vector Loader - Embed scraped data to Pinecone for RAG
  • LinkedIn Stealth Scraper - Pre-configured for LinkedIn jobs
  • Glassdoor Stealth Scraper - Pre-configured for Glassdoor

📄 License

ISC License - Use responsibly. Respect robots.txt and terms of service.


🤝 Support

Found a target that beats our stealth? Open an issue - we'll patch it.


Built by The Agency
When you absolutely, positively need the data.