SEO Content Extraction
Pricing: Pay per usage
Extract SEO-ready content from public web pages with robots.txt checks, strict limits, SSRF protection, and clean structured output.
Developer: ping (maintained by Community)
Actor stats: 0 bookmarked, 1 total user, 1 monthly active user
Last modified: 24 days ago
SEO Content Extraction reads public web pages and returns clean structured data: title, meta description, headings, body text, and normalized links.
It is built for lightweight SEO audits, content inventories, RAG inputs, and agent workflows that need page content without opening unsafe network access.
The Actor is intentionally conservative. It does not log in, bypass access controls, execute page JavaScript, solve CAPTCHAs, or scrape private networks.
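To illustrate what extraction without JavaScript execution can look like, here is a minimal sketch that pulls the title, meta description, and h1 to h3 headings out of static HTML using only the Python standard library. The class and function names (`SeoExtractor`, `extract_seo_fields`) are hypothetical and the Actor's actual parser is not documented here, so treat this as an approximation of the idea rather than its implementation.

```python
from html.parser import HTMLParser


class SeoExtractor(HTMLParser):
    """Hypothetical helper: collects title, meta description, and h1-h3 text."""

    HEADING_TAGS = {"h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.headings = {"h1": [], "h2": [], "h3": []}
        self._current = None  # tag whose text is currently being captured
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "description":
            self.description = (attr.get("content") or "").strip()
        elif tag == "title" or tag in self.HEADING_TAGS:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        # Text inside nested inline tags (e.g. <em>) is captured as well.
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._current:
            return
        text = " ".join("".join(self._buffer).split())  # collapse whitespace
        if tag == "title":
            self.title = text
        else:
            self.headings[tag].append(text)
        self._current = None


def extract_seo_fields(html):
    """Parse static HTML only; scripts are never executed."""
    parser = SeoExtractor()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title,
        "description": parser.description,
        "headings": parser.headings,
    }
```

Because the parser only reads the markup it is given, content that a page injects with client-side JavaScript never appears in the result, which matches the limitation described for this Actor.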
What It Returns
Each dataset item contains:
- url and finalUrl
- HTTP statusCode and contentType
- title
- meta description
- headings (h1, h2, h3)
- cleaned text
- normalized outbound links
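For consumers who want to type-check dataset items, the fields listed above could be modelled roughly as follows. The field names come from the list; the value types, and the assumption that headings is an object keyed by h1, h2, and h3, are guesses that should be checked against real output. `PageItem` and `missing_fields` are hypothetical names.

```python
from typing import List, TypedDict


class Headings(TypedDict):
    # Assumed shape: one list of heading strings per level.
    h1: List[str]
    h2: List[str]
    h3: List[str]


class PageItem(TypedDict):
    url: str
    finalUrl: str
    statusCode: int      # assumed to be an integer HTTP status
    contentType: str
    title: str
    description: str
    headings: Headings
    text: str
    links: List[str]     # assumed to be a list of absolute URLs


def missing_fields(item: dict) -> List[str]:
    """Return the documented field names that are absent from an item."""
    return sorted(set(PageItem.__annotations__) - set(item))
```

A check such as `missing_fields(item)` is useful when options like includeLinks may change which fields are present in a given run.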
Example Input
{
  "startUrls": ["https://example.com"],
  "maxPages": 3,
  "maxDepth": 1,
  "sameDomainOnly": true,
  "respectRobotsTxt": true,
  "includeLinks": true,
  "textMaxChars": 4000
}
Input Notes
- startUrls: 1 to 10 public HTTP/HTTPS URLs.
- maxPages: 1 to 25 pages per run.
- maxDepth: 0 to 3 link-following depth.
- sameDomainOnly: enabled by default.
- respectRobotsTxt: enabled by default.
- includeHtml: disabled by default.
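The limits above can be checked on the caller's side before starting a run. The sketch below is a hypothetical pre-flight validator (`validate_input` is not part of the Actor) that encodes only the ranges and boolean defaults stated in these notes. Defaults for maxPages and maxDepth are not documented here, so none are assumed.

```python
from urllib.parse import urlsplit

# Ranges and defaults taken from the input notes; nothing else is assumed.
INT_LIMITS = {"maxPages": (1, 25), "maxDepth": (0, 3)}
BOOL_DEFAULTS = {
    "sameDomainOnly": True,
    "respectRobotsTxt": True,
    "includeHtml": False,
}


def validate_input(raw):
    """Return (config, errors); config has the documented boolean defaults applied."""
    errors = []

    urls = raw.get("startUrls")
    if not isinstance(urls, list) or not 1 <= len(urls) <= 10:
        errors.append("startUrls must be a list of 1 to 10 URLs")
    else:
        for url in urls:
            scheme = urlsplit(url).scheme if isinstance(url, str) else ""
            if scheme not in ("http", "https"):
                errors.append(f"not an HTTP/HTTPS URL: {url!r}")

    for key, (low, high) in INT_LIMITS.items():
        if key in raw:
            value = raw[key]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                errors.append(f"{key} must be an integer")
            elif not low <= value <= high:
                errors.append(f"{key} must be between {low} and {high}")

    config = {**BOOL_DEFAULTS, **raw}
    return config, errors
```

Running this against the example input should produce no errors, while a value such as `"maxPages": 26` is reported before any request is made.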
Good Uses
- SEO page inventory
- Title, meta description, and heading extraction
- Lightweight content checks for public websites
- RAG and agent data collection from public pages
- Internal link discovery within a small site section
Security And Privacy
The Actor blocks:
- localhost and private network targets
- link-local and metadata IP targets
- special-use hostnames such as .local and .internal
- URLs with embedded credentials
- shell/process/proxy override fields in input JSON
- script-like input strings
The Actor does not accept custom proxy settings, shell commands, environment variables, worker URLs, or worker tokens from callers.
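As a rough illustration of the URL-level checks in this list, the sketch below rejects localhost, non-public IP literals (private, link-local, and metadata addresses such as 169.254.169.254), the .local and .internal suffixes, and URLs with embedded credentials. The function name `is_blocked_target` is hypothetical and this is not the Actor's actual code. It also covers only the URL string: a production guard would need to resolve hostnames and re-check the resulting IP addresses, including after redirects.

```python
import ipaddress
from urllib.parse import urlsplit

BLOCKED_SUFFIXES = (".local", ".internal", ".localhost")


def is_blocked_target(url):
    """Return True when the URL should not be fetched (string-level checks only)."""
    try:
        parts = urlsplit(url)
        host = (parts.hostname or "").lower().rstrip(".")
    except ValueError:
        return True  # malformed URL, e.g. a broken IPv6 literal

    if parts.scheme not in ("http", "https"):
        return True
    if parts.username is not None or parts.password is not None:
        return True  # embedded credentials such as user:pass@host
    if not host or host == "localhost" or host.endswith(BLOCKED_SUFFIXES):
        return True

    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal. DNS resolution is NOT checked in this sketch.
        return False
    # is_global is False for loopback, private, link-local, and reserved ranges.
    return not ip.is_global
```

Using `is_global` as an allowlist test is a deliberate choice here: it fails closed for special-purpose ranges instead of relying on an enumerated blocklist of addresses.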
Limitations
This is a public-page content extractor. It is not a browser automation Actor, does not render JavaScript-only content, and is not designed for login-only sites, CAPTCHA flows, anti-bot bypass, or high-volume harvesting.