
# Scrapie - Web Scraping API
Scrapie is a production-ready microservice that scrapes public web pages (including JS-heavy sites) and returns structured JSON with:
- title, description
- headings
- features
- pricing
- testimonials
- links, images
- contact info (emails, phones)
- raw text
Runs on Node.js with Playwright for full JS rendering. Ships with OpenAPI docs, rate limiting, caching, and API key auth.
## Contents
- Features
- Endpoints & Auth
- Request/Response schema
- Error codes
- Environment variables
- Rate limiting & caching
- Usage examples
  - curl
  - Node (fetch/axios)
  - Next.js (App Router & Pages Router)
  - React (client) via backend proxy
  - Simple Node/Express proxy
  - SDK-style minimal client
- Swagger/OpenAPI usage
- Deployment notes (Render, Docker)
- RapidAPI / Apify listing
- Best practices & limitations
## Features
- Playwright-powered scraping for JS-heavy websites
- Heuristic extraction for common landing page sections (pricing, testimonials, features)
- API key auth via the `x-api-key` header
- In-memory caching (per-instance) with configurable TTL
- Rate limiting (per IP)
- Swagger UI at `/docs` and OpenAPI spec in `openapi.yaml`
## Endpoints & Auth
- Base URL (local): `http://localhost:3000`
- Health: `GET /health` → `{ status: "ok" }`
- Docs: `GET /docs`
- Scrape: `POST /scrape` (requires API key)

Auth header:

```
x-api-key: <your-api-key>
```

Never include your API key in public client-side code. Use a server-side proxy (examples below).
## Request schema

`POST /scrape`

```json
{
  "url": "https://example.com",
  "waitUntil": "networkidle",
  "timeoutMs": 45000,
  "userAgent": "<optional-ua>"
}
```

- `waitUntil`: one of `load | domcontentloaded | networkidle | commit` (default `networkidle`)
- `timeoutMs`: 1000..120000 (default 45000)
- `userAgent`: optional custom UA string
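For TypeScript callers, the request body can be modeled as a type; a sketch derived from the fields above (the interface name is illustrative):

```ts
// Body for POST /scrape, per the schema above.
interface ScrapeRequest {
  url: string;                                                        // required
  waitUntil?: 'load' | 'domcontentloaded' | 'networkidle' | 'commit'; // default: 'networkidle'
  timeoutMs?: number;                                                 // 1000..120000, default 45000
  userAgent?: string;                                                 // optional custom UA string
}
```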
## Response schema (200)

```json
{
  "cached": false,
  "data": {
    "url": "https://example.com",
    "title": "...",
    "description": "...",
    "headings": [{ "tag": "h1", "text": "..." }],
    "features": ["..."],
    "pricing": ["..."],
    "testimonials": ["..."],
    "links": [{ "href": "/contact", "text": "Contact" }],
    "images": [{ "src": "/logo.png", "alt": "Logo" }],
    "contacts": { "emails": ["..."], "phones": ["..."] },
    "rawText": "..."
  }
}
```
## Error codes
- 400: invalid request body
- 401: missing/invalid API key
- 500: scrape failure (navigation timeout, blocked, etc.)
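A sketch of mapping these codes to caller-side errors (the messages are illustrative):

```ts
const res = await fetch('https://your-service/scrape', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'x-api-key': process.env.SCRAPIE_API_KEY!,
  },
  body: JSON.stringify({ url: 'https://example.com' }),
});

// Map Scrapie's documented status codes to explicit errors.
if (res.status === 400) throw new Error('Invalid request body: check url, waitUntil, timeoutMs');
if (res.status === 401) throw new Error('Missing or invalid x-api-key header');
if (res.status === 500) throw new Error('Scrape failed upstream (timeout, blocked, etc.); consider a retry');
const json = await res.json();
```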
## Environment variables

Set these in `.env` (local) or your provider dashboard (Render):

- `API_KEY` (required): your secret key
- `API_KEYS` (optional): comma-separated list of allowed keys
- `PORT` (default 3000)
- `RATE_LIMIT_PER_MIN` (default 30)
- `CACHE_TTL_SECONDS` (default 300)
- `EXTRA_WAIT_MS` (default 1000): extra wait after page load for late-rendering content
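One way these variables might be read with their documented defaults; a sketch (the `config.ts` file name and object shape are illustrative, not Scrapie's actual internals):

```ts
// config.ts (illustrative): read the variables above, applying the documented defaults.
export const config = {
  apiKey: process.env.API_KEY,
  apiKeys: (process.env.API_KEYS ?? '').split(',').filter(Boolean),
  port: Number(process.env.PORT ?? 3000),
  rateLimitPerMin: Number(process.env.RATE_LIMIT_PER_MIN ?? 30),
  cacheTtlSeconds: Number(process.env.CACHE_TTL_SECONDS ?? 300),
  extraWaitMs: Number(process.env.EXTRA_WAIT_MS ?? 1000),
};

// API_KEY is required (or at least one entry in API_KEYS).
if (!config.apiKey && config.apiKeys.length === 0) {
  throw new Error('Set API_KEY (or API_KEYS) before starting the service');
}
```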
## Rate limiting & caching

- Rate limiting applies per IP, per minute.
- Responses are cached in-memory per URL for `CACHE_TTL_SECONDS` on a single instance.
- Horizontally scaled instances do not share the cache; use a distributed cache (e.g., Redis) if needed, as in the sketch below.
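For that last point, a minimal sketch of a shared Redis cache placed in front of `/scrape`. The `ioredis` package and the `REDIS_URL` variable are assumptions for this example, not part of Scrapie:

```ts
import Redis from 'ioredis';

const redis = new Redis(process.env.REDIS_URL!);
const TTL = Number(process.env.CACHE_TTL_SECONDS ?? 300);

// Check the shared cache before calling Scrapie; store fresh results with a TTL.
async function cachedScrape(url: string) {
  const key = `scrape:${url}`;
  const hit = await redis.get(key);
  if (hit) return JSON.parse(hit);

  const res = await fetch(process.env.SCRAPIE_BASE_URL + '/scrape', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-api-key': process.env.SCRAPIE_API_KEY!,
    },
    body: JSON.stringify({ url }),
  });
  const json = await res.json();
  if (res.ok) await redis.set(key, JSON.stringify(json), 'EX', TTL); // expire after TTL seconds
  return json;
}
```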
## Usage examples
### curl

```bash
curl -X POST https://your-service/scrape \
  -H "Content-Type: application/json" \
  -H "x-api-key: $API_KEY" \
  -d '{"url":"https://example.com","waitUntil":"networkidle"}'
```
### Node (native fetch)

```js
const res = await fetch('https://your-service/scrape', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'x-api-key': process.env.SCRAPIE_API_KEY,
  },
  body: JSON.stringify({ url: 'https://example.com' }),
});
const json = await res.json();
```
### Node (axios)

```js
const axios = require('axios');
const { data } = await axios.post(
  'https://your-service/scrape',
  { url: 'https://example.com', waitUntil: 'networkidle' },
  { headers: { 'x-api-key': process.env.SCRAPIE_API_KEY } }
);
```
### Next.js (App Router: route handler, server-side)

Never call the Scrapie service from a client component with your secret key. Create a route handler as a proxy.

`app/api/scrape/route.ts`:

```ts
import { NextResponse } from 'next/server';

export async function POST(req: Request) {
  const body = await req.json(); // { url, waitUntil, timeoutMs, userAgent }
  const res = await fetch(process.env.SCRAPIE_BASE_URL + '/scrape', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-api-key': process.env.SCRAPIE_API_KEY!,
    },
    body: JSON.stringify(body),
    // Optionally set a timeout with AbortController (see the sketch below)
  });
  const data = await res.json();
  return NextResponse.json(data, { status: res.status });
}
```
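The optional timeout mentioned in the comment could look like this; a sketch that slots into the handler above in place of its fetch call (the 60-second budget is illustrative):

```ts
// Abort the upstream request if Scrapie doesn't answer within 60s.
const controller = new AbortController();
const timer = setTimeout(() => controller.abort(), 60_000);
try {
  const res = await fetch(process.env.SCRAPIE_BASE_URL + '/scrape', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-api-key': process.env.SCRAPIE_API_KEY!,
    },
    body: JSON.stringify(body),
    signal: controller.signal, // fetch rejects with an AbortError when aborted
  });
  return NextResponse.json(await res.json(), { status: res.status });
} finally {
  clearTimeout(timer);
}
```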
Usage in a Client Component:

```js
// in a client component
const res = await fetch('/api/scrape', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ url: 'https://example.com' }),
});
const data = await res.json();
```
### Next.js (Pages Router: API route, server-side)

`pages/api/scrape.ts`:

```ts
import type { NextApiRequest, NextApiResponse } from 'next';

export default async function handler(req: NextApiRequest, res: NextApiResponse) {
  if (req.method !== 'POST') return res.status(405).end();
  const upstream = await fetch(process.env.SCRAPIE_BASE_URL + '/scrape', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-api-key': process.env.SCRAPIE_API_KEY as string,
    },
    body: JSON.stringify(req.body),
  });
  const json = await upstream.json();
  res.status(upstream.status).json(json);
}
```
### React (client) via backend proxy

Do not embed your API key in the browser. Use your backend to proxy requests as shown above (Next.js route or Express server). The client calls your backend endpoint, not Scrapie directly.
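A minimal client component against such a proxy might look like this; a sketch (the component name is illustrative, and the `/api/scrape` endpoint matches the proxies above):

```tsx
import { useState } from 'react';

// Calls the backend proxy, never Scrapie directly, so no key is exposed.
export function ScrapeButton({ url }: { url: string }) {
  const [data, setData] = useState<unknown>(null);
  const [loading, setLoading] = useState(false);

  async function run() {
    setLoading(true);
    try {
      const res = await fetch('/api/scrape', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ url }),
      });
      setData(await res.json());
    } finally {
      setLoading(false);
    }
  }

  return (
    <div>
      <button onClick={run} disabled={loading}>
        {loading ? 'Scraping…' : 'Scrape'}
      </button>
      {data ? <pre>{JSON.stringify(data, null, 2)}</pre> : null}
    </div>
  );
}
```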
### Simple Node/Express proxy

```js
const express = require('express');
const fetch = require('node-fetch');

const app = express();
app.use(express.json());

app.post('/api/scrape', async (req, res) => {
  const r = await fetch(process.env.SCRAPIE_BASE_URL + '/scrape', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-api-key': process.env.SCRAPIE_API_KEY,
    },
    body: JSON.stringify(req.body),
  });
  const json = await r.json();
  res.status(r.status).json(json);
});

app.listen(4000, () => console.log('Proxy on :4000'));
```
### Minimal SDK-style client

```ts
export class ScrapieClient {
  constructor(private baseUrl: string, private apiKey: string) {}

  async scrape(input: {
    url: string;
    waitUntil?: 'load' | 'domcontentloaded' | 'networkidle' | 'commit';
    timeoutMs?: number;
    userAgent?: string;
  }) {
    const res = await fetch(`${this.baseUrl}/scrape`, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'x-api-key': this.apiKey,
      },
      body: JSON.stringify(input),
    });
    if (!res.ok) {
      const err = await res.text();
      throw new Error(`Scrapie error ${res.status}: ${err}`);
    }
    return res.json();
  }
}
```
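Usage (server-side only, so the key stays out of the browser):

```ts
const client = new ScrapieClient('https://your-service', process.env.SCRAPIE_API_KEY!);
const result = await client.scrape({ url: 'https://example.com', waitUntil: 'networkidle' });
console.log(result.data.title); // fields per the response schema above
```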
## Swagger/OpenAPI

- Live docs: `/docs`
- Spec: `openapi.yaml`
- You can import the spec into Postman or generate clients with `openapi-generator-cli`.
## Deployment

- Render.com: use the included `render.yaml` (Docker); set `API_KEY` in the environment variables.
- Docker:

```bash
docker build -t scrapie .
docker run -p 3000:3000 --env-file .env scrapie
```
## Best practices & limitations

- Respect robots.txt and website terms. Obtain consent where required.
- Avoid scraping authenticated or private content.
- Heuristic extraction may miss content on atypical layouts; review `data.rawText` as a fallback.
- For large-scale use, add a queue, retries (see the sketch below), rotating IPs/UAs, and distributed caching.
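To illustrate the retry point, a minimal backoff wrapper; a sketch, not part of Scrapie (env var names follow the earlier examples):

```ts
// Retries network failures and 5xx responses with exponential backoff;
// 4xx errors are not retried, since the request itself is at fault.
async function scrapeWithRetry(url: string, attempts = 3): Promise<unknown> {
  let lastErr: unknown;
  for (let i = 0; i < attempts; i++) {
    if (i > 0) await new Promise((r) => setTimeout(r, 2 ** i * 500)); // waits 1s, then 2s, ...
    let res: Response;
    try {
      res = await fetch(process.env.SCRAPIE_BASE_URL + '/scrape', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'x-api-key': process.env.SCRAPIE_API_KEY!,
        },
        body: JSON.stringify({ url }),
      });
    } catch (err) {
      lastErr = err; // network error: retry
      continue;
    }
    if (res.ok) return res.json();
    if (res.status < 500) throw new Error(`Scrapie ${res.status}: ${await res.text()}`); // 4xx: fail fast
    lastErr = new Error(`Scrapie ${res.status}`); // 5xx: retry
  }
  throw lastErr;
}
```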