Scrapie - Web Scraping API

Scrapie is a production-ready microservice that scrapes public web pages (including JS-heavy sites) and returns structured JSON with:

  • title, description
  • headings
  • features
  • pricing
  • testimonials
  • links, images
  • contact info (emails, phones)
  • raw text

Runs on Node.js with Playwright for full JS rendering. Ships with OpenAPI docs, rate limiting, caching, and API key auth.

Contents

  • Features
  • Endpoints & Auth
  • Request/Response schema
  • Error codes
  • Environment variables
  • Rate limiting & caching
  • Usage examples
    • curl
    • Node (fetch/axios)
    • Next.js (App Router & Pages Router)
    • React (client) via backend proxy
    • Simple Node/Express proxy
  • Minimal SDK-style client
  • Swagger/OpenAPI
  • Deployment (Render, Docker)
  • Best practices & limitations

Features

  • Playwright-powered scraping for JS-heavy websites
  • Heuristic extraction for common landing page sections (pricing, testimonials, features)
  • API key auth via header x-api-key
  • In-memory caching (per-instance) with configurable TTL
  • Rate limiting (per IP)
  • Swagger UI at /docs and OpenAPI spec in openapi.yaml

Endpoints & Auth

  • Base URL (local): http://localhost:3000
  • Health: GET /health returns { status: "ok" }
  • Docs: GET /docs
  • Scrape: POST /scrape (requires API key)

Auth header:

  • x-api-key: <your-api-key>

Never include your API key in public client-side code. Use a server-side proxy (examples below).

Request schema

POST /scrape

{
  "url": "https://example.com",
  "waitUntil": "networkidle",
  "timeoutMs": 45000,
  "userAgent": "<optional-ua>"
}
  • waitUntil: one of load | domcontentloaded | networkidle | commit (default networkidle)
  • timeoutMs: 1000..120000 (default 45000)
  • userAgent: optional custom UA string
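
For TypeScript consumers, the request body maps to a small interface. This is a sketch derived from the schema above; the name ScrapeRequest is illustrative:

export interface ScrapeRequest {
  url: string; // required, absolute URL
  waitUntil?: 'load' | 'domcontentloaded' | 'networkidle' | 'commit'; // default 'networkidle'
  timeoutMs?: number; // 1000..120000, default 45000
  userAgent?: string; // optional custom UA string
}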

Response schema (200)

{
  "cached": false,
  "data": {
    "url": "https://example.com",
    "title": "...",
    "description": "...",
    "headings": [{ "tag": "h1", "text": "..." }],
    "features": ["..."],
    "pricing": ["..."],
    "testimonials": ["..."],
    "links": [{ "href": "/contact", "text": "Contact" }],
    "images": [{ "src": "/logo.png", "alt": "Logo" }],
    "contacts": { "emails": ["..."], "phones": ["..."] },
    "rawText": "..."
  }
}
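
The corresponding response shape as a TypeScript interface, a sketch inferred from the example above (treating all fields as present is an assumption):

export interface ScrapeResponse {
  cached: boolean;
  data: {
    url: string;
    title: string;
    description: string;
    headings: { tag: string; text: string }[];
    features: string[];
    pricing: string[];
    testimonials: string[];
    links: { href: string; text: string }[];
    images: { src: string; alt: string }[];
    contacts: { emails: string[]; phones: string[] };
    rawText: string;
  };
}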

Error codes

  • 400: invalid request body
  • 401: missing/invalid API key
  • 500: scrape failure (navigation timeout, blocked, etc.)
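
A sketch of how a caller might branch on these codes. The error body shape is not specified here, so the snippet only reads the raw response text:

const res = await fetch('https://your-service/scrape', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json', 'x-api-key': process.env.SCRAPIE_API_KEY! },
  body: JSON.stringify({ url: 'https://example.com' }),
});
if (!res.ok) {
  const detail = await res.text();
  if (res.status === 400) throw new Error(`Bad request: ${detail}`); // fix the body before retrying
  if (res.status === 401) throw new Error('Missing or invalid x-api-key header');
  if (res.status === 500) throw new Error(`Scrape failed: ${detail}`); // may be transient; consider retrying
  throw new Error(`Unexpected status ${res.status}`);
}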

Environment variables

Set these in .env (local) or provider dashboard (Render):

  • API_KEY (required): your secret key
  • API_KEYS (optional): comma-separated list of allowed keys
  • PORT (default 3000)
  • RATE_LIMIT_PER_MIN (default 30)
  • CACHE_TTL_SECONDS (default 300)
  • EXTRA_WAIT_MS (default 1000): extra wait after load for late content
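
A sample .env using the defaults listed above (replace the key with your own secret):

API_KEY=change-me
# API_KEYS=key-one,key-two
PORT=3000
RATE_LIMIT_PER_MIN=30
CACHE_TTL_SECONDS=300
EXTRA_WAIT_MS=1000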

Rate limiting & caching

  • Rate limit applies per IP per minute.
  • Cache: responses per URL are cached in-memory for CACHE_TTL_SECONDS on a single instance.
    • Horizontally scaled instances won't share the in-memory cache; use a distributed cache (e.g., Redis) if needed. A sketch follows below.
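
If you do scale horizontally, a shared cache can sit in front of the service. A minimal sketch using ioredis (an assumption; any Redis client works), keyed by URL and reusing the TTL idea; REDIS_URL is a hypothetical env var:

import Redis from 'ioredis';

const redis = new Redis(process.env.REDIS_URL!);
const TTL = Number(process.env.CACHE_TTL_SECONDS ?? 300);

export async function scrapeWithSharedCache(url: string) {
  // Serve from the shared cache when possible
  const hit = await redis.get(`scrapie:${url}`);
  if (hit) {
    const parsed = JSON.parse(hit);
    return { ...parsed, cached: true };
  }

  const res = await fetch(process.env.SCRAPIE_BASE_URL + '/scrape', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-api-key': process.env.SCRAPIE_API_KEY!,
    },
    body: JSON.stringify({ url }),
  });
  if (!res.ok) throw new Error(`Scrape failed: ${res.status}`);

  const json = await res.json();
  await redis.set(`scrapie:${url}`, JSON.stringify(json), 'EX', TTL); // expire after TTL seconds
  return json;
}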

Usage examples

curl

curl -X POST https://your-service/scrape \
  -H "Content-Type: application/json" \
  -H "x-api-key: $API_KEY" \
  -d '{"url":"https://example.com","waitUntil":"networkidle"}'

Node (native fetch)

const res = await fetch('https://your-service/scrape', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'x-api-key': process.env.SCRAPIE_API_KEY,
  },
  body: JSON.stringify({ url: 'https://example.com' }),
});
const json = await res.json();

Node (axios)

const axios = require('axios');

const { data } = await axios.post(
  'https://your-service/scrape',
  { url: 'https://example.com', waitUntil: 'networkidle' },
  { headers: { 'x-api-key': process.env.SCRAPIE_API_KEY } }
);

Next.js (App Router: route handler, server-side)

Never call the Scrapie service from a client component with your secret key. Create a route handler as a proxy.

app/api/scrape/route.ts

import { NextResponse } from 'next/server';

export async function POST(req: Request) {
  const body = await req.json(); // { url, waitUntil, timeoutMs, userAgent }
  const res = await fetch(process.env.SCRAPIE_BASE_URL + '/scrape', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-api-key': process.env.SCRAPIE_API_KEY!,
    },
    body: JSON.stringify(body),
    // Optionally set a timeout with AbortController
  });
  const data = await res.json();
  return NextResponse.json(data, { status: res.status });
}
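
The timeout hinted at in the comment can be wired with AbortSignal.timeout (available in Node 18+); a sketch of the relevant fetch options, using an illustrative 60-second budget:

const res = await fetch(process.env.SCRAPIE_BASE_URL + '/scrape', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json', 'x-api-key': process.env.SCRAPIE_API_KEY! },
  body: JSON.stringify(body),
  signal: AbortSignal.timeout(60_000), // abort the upstream call after 60s
});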

Usage in a Client Component:

// in a client component
const res = await fetch('/api/scrape', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ url: 'https://example.com' }),
});
const data = await res.json();

Next.js (Pages Router: API route, server-side)

pages/api/scrape.ts

import type { NextApiRequest, NextApiResponse } from 'next';

export default async function handler(req: NextApiRequest, res: NextApiResponse) {
  if (req.method !== 'POST') return res.status(405).end();
  const upstream = await fetch(process.env.SCRAPIE_BASE_URL + '/scrape', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-api-key': process.env.SCRAPIE_API_KEY as string,
    },
    body: JSON.stringify(req.body),
  });
  const json = await upstream.json();
  res.status(upstream.status).json(json);
}

React (client) via backend proxy

Do not embed your API key in the browser. Use your backend to proxy requests as shown above (Next.js route or Express server). Client calls your backend endpoint, not Scrapie directly.
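
For example, a small client-side hook that talks only to your proxy (the name useScrape is illustrative):

import { useState } from 'react';

export function useScrape() {
  const [data, setData] = useState<unknown>(null);
  const [loading, setLoading] = useState(false);
  const [error, setError] = useState<string | null>(null);

  async function scrape(url: string) {
    setLoading(true);
    setError(null);
    try {
      // Calls your own backend route, never the Scrapie service directly
      const res = await fetch('/api/scrape', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ url }),
      });
      if (!res.ok) throw new Error(`Proxy returned ${res.status}`);
      setData(await res.json());
    } catch (e) {
      setError(e instanceof Error ? e.message : String(e));
    } finally {
      setLoading(false);
    }
  }

  return { data, loading, error, scrape };
}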

Simple Node/Express proxy

const express = require('express');
const fetch = require('node-fetch'); // node-fetch@2 (CommonJS); on Node 18+ drop this line and use the global fetch
const app = express();

app.use(express.json());

app.post('/api/scrape', async (req, res) => {
  const r = await fetch(process.env.SCRAPIE_BASE_URL + '/scrape', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-api-key': process.env.SCRAPIE_API_KEY,
    },
    body: JSON.stringify(req.body),
  });
  const json = await r.json();
  res.status(r.status).json(json);
});

app.listen(4000, () => console.log('Proxy on :4000'));

Minimal SDK-style client

export class ScrapieClient {
  constructor(private baseUrl: string, private apiKey: string) {}

  async scrape(input: {
    url: string;
    waitUntil?: 'load' | 'domcontentloaded' | 'networkidle' | 'commit';
    timeoutMs?: number;
    userAgent?: string;
  }) {
    const res = await fetch(`${this.baseUrl}/scrape`, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'x-api-key': this.apiKey,
      },
      body: JSON.stringify(input),
    });
    if (!res.ok) {
      const err = await res.text();
      throw new Error(`Scrapie error ${res.status}: ${err}`);
    }
    return res.json();
  }
}
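
Usage:

const client = new ScrapieClient('https://your-service', process.env.SCRAPIE_API_KEY!);
const result = await client.scrape({ url: 'https://example.com', waitUntil: 'networkidle' });
console.log(result.data.title);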

Swagger/OpenAPI

  • Live docs: /docs
  • Spec: openapi.yaml
  • You can import the spec into Postman or generate clients with openapi-generator-cli.
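
For example, generating a TypeScript client from the spec (the typescript-fetch generator is one choice among many):

npx @openapitools/openapi-generator-cli generate -i openapi.yaml -g typescript-fetch -o ./generated-client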

Deployment

  • Render.com: use included render.yaml (Docker); set API_KEY in env vars
  • Docker:
    docker build -t scrapie .
    docker run -p 3000:3000 --env-file .env scrapie

Best practices & limitations

  • Respect robots.txt and website terms. Obtain consent where required.
  • Avoid scraping authenticated or private content.
  • Heuristic extraction may miss content on atypical layouts; review data.rawText as a fallback.
  • For large-scale use, add a queue, retries, rotating IPs/UAs, and distributed caching.