JS Content Crawler Lite for RAG
Pricing
Pay per usage
Extract clean Markdown, text, metadata, links, and diagnostics from web pages. Static-first for low cost, Browserless rendering only when needed.
Developer
George Kioko
Maintained by Community
Extract clean Markdown, text, metadata, links, headings, images, and quality diagnostics from URL lists. The actor is static-first for margin control and only uses Browserless when JavaScript rendering is required.
Why this actor
Many RAG crawlers run a full browser for every page. That is expensive and slow. This actor tries normal HTTP extraction first, checks content quality, then escalates to Browserless only when renderMode requires it or static output is blocked or too thin.
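The escalation rule described above can be sketched roughly as follows. This is only an illustration, not the actor's actual code: the function name needs_render and its parameters are hypothetical, and it assumes "too thin" means fewer visible words than minWords.

```python
def needs_render(render_mode: str, static_words: int, blocked: bool,
                 min_words: int = 80) -> bool:
    """Return True if a page should be escalated to Browserless.

    Hypothetical helper: the real actor may weigh more signals than this.
    """
    if render_mode == "never":
        return False          # static lane only, never escalate
    if render_mode == "always":
        return True           # renderMode itself requires rendering
    # renderMode == "auto": escalate only when static output is unusable.
    # Assumption: "too thin" is judged against minWords.
    return blocked or static_words < min_words
```

With renderMode set to auto, a page that yields enough words from a plain HTTP fetch stays on the cheaper static lane under this sketch.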
Inputs
- urls: Required list of HTTP or HTTPS URLs. Duplicates are processed once.
- renderMode: auto, never, or always. Default is auto.
- minWords: Minimum visible words for content_ready. Default is 80.
- timeoutMs: Per-request timeout. Default is 30000.
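As a rough illustration of how a caller might assemble an input object with the documented defaults, here is a minimal sketch. The helper build_input is hypothetical, and deduplicating by exact string match is an assumption; the actor may normalize URLs differently before comparing them.

```python
# Defaults as listed in the Inputs section.
DEFAULTS = {"renderMode": "auto", "minWords": 80, "timeoutMs": 30000}
RENDER_MODES = ("auto", "never", "always")


def build_input(urls, **overrides):
    """Build an actor input dict (hypothetical helper, not part of the actor)."""
    if not urls:
        raise ValueError("urls is required and must not be empty")
    merged = {**DEFAULTS, **overrides}
    if merged["renderMode"] not in RENDER_MODES:
        raise ValueError(f"renderMode must be one of {RENDER_MODES}")
    # Assumption: duplicates are detected by exact string comparison.
    # dict.fromkeys keeps the first occurrence and preserves order.
    merged["urls"] = list(dict.fromkeys(urls))
    return merged
```

For example, build_input(["https://example.com"], renderMode="never", minWords=10) yields an input equivalent to the one used in the local smoke run.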
Output fields
- url, finalUrl, title, description
- markdown, text, wordCount
- headings, links, images
- sourceLane: static, browserless, or none
- qualityState: content_ready, blocked, low_content, fetch_failed, render_failed, render_unavailable, or invalid_url
- qualityReasons: Diagnostic reason codes
- billingState, chargedEvent
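For a RAG pipeline, one plausible way to consume these rows is to keep only those marked content_ready and pass their Markdown on for chunking. The sketch below assumes the rows are plain dictionaries keyed by the field names above; the helper ready_documents and the shape of the returned documents are hypothetical.

```python
def ready_documents(rows):
    """Keep only rows that are safe to index (hypothetical helper)."""
    docs = []
    for row in rows:
        if row.get("qualityState") != "content_ready":
            continue  # skip blocked, low_content, failed, and invalid rows
        docs.append({
            # Prefer the post-redirect URL as the document identifier.
            "id": row.get("finalUrl") or row["url"],
            "title": row.get("title"),
            # Fall back to plain text if no Markdown is present.
            "body": row.get("markdown") or row.get("text", ""),
        })
    return docs
```

Rows in other quality states still carry qualityReasons, so they can be logged or retried separately rather than indexed.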
Pricing events
- page-extracted: $0.005 per successful static extraction
- js-page-rendered: $0.015 per successful Browserless-rendered extraction

Failed, blocked, invalid, and low-quality rows are pushed with diagnostics but are not billed.
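To see what a run cost, the charged events can be totaled from the output rows. This is a minimal sketch that assumes chargedEvent holds the event name for billed rows and is empty or absent for unbilled ones; the exact values the actor writes to that field are not documented here, and estimate_cost is a hypothetical helper.

```python
from collections import Counter
from decimal import Decimal

# Prices per event, as listed above. Decimal avoids float rounding drift.
PRICES = {
    "page-extracted": Decimal("0.005"),
    "js-page-rendered": Decimal("0.015"),
}


def estimate_cost(rows):
    """Sum the price of every billed row (hypothetical helper)."""
    counts = Counter(row.get("chargedEvent") for row in rows)
    total = Decimal("0")
    for event, count in counts.items():
        if event in PRICES:  # unbilled rows contribute nothing
            total += PRICES[event] * count
    return total
```

Under these prices, two static extractions plus one rendered extraction come to $0.025, and any failed or blocked rows in the same run add nothing.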
Browserless setup
Set these environment variables when rendering is needed:
BROWSERLESS_BASE_URL=http://127.0.0.1:3000
BROWSERLESS_TOKEN=your-token
If Browserless is not configured and a page needs rendering, the actor emits a diagnostic row and does not charge a rendered event.
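The configuration check might look something like the sketch below. It assumes both variables must be set for rendering to be available and that the diagnostic row uses the render_unavailable state; neither detail is confirmed by the text above, and both function names are hypothetical.

```python
import os


def browserless_config(env=None):
    """Return Browserless settings, or None when rendering is not configured."""
    env = os.environ if env is None else env
    base_url = env.get("BROWSERLESS_BASE_URL", "").strip()
    token = env.get("BROWSERLESS_TOKEN", "").strip()
    # Assumption: both values are required before a render is attempted.
    if not base_url or not token:
        return None
    return {"base_url": base_url.rstrip("/"), "token": token}


def unavailable_row(url):
    """Diagnostic row for a page that needed rendering but could not get it."""
    return {
        "url": url,
        "sourceLane": "none",
        "qualityState": "render_unavailable",  # assumed mapping
        "chargedEvent": None,                  # no rendered event is charged
    }
```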
Local smoke test
npm install
apify run -i '{"urls":["https://example.com"],"renderMode":"never","minWords":10}'
Browserless smoke test (PowerShell syntax):
$env:BROWSERLESS_BASE_URL="http://127.0.0.1:3000"
$env:BROWSERLESS_TOKEN="your-token"
apify run -i '{"urls":["https://example.com"],"renderMode":"always","minWords":10}'