Proxy Page to Markdown scraper
Pricing
from $5.00 / 1,000 markdown results
Fetches pages through Apify proxy in your chosen country (residential or datacenter). Returns clean markdown per URL; optional unique outbound domains for brand checks. Cheerio first, Playwright fallback. Social URLs → blocked_social.
Developer
Olek Coder
Maintained by Community
Fetch web pages through Apify Proxy in a country you choose (residential or datacenter) and get clean markdown per URL. Optional outbound domain extraction for brand or affiliate checks. Uses fast HTTP (Cheerio) first, then Playwright when the page needs rendering.
Actor ID: olek_automate~proxy-page-to-markdown
This Actor is not affiliated with any third-party website you scrape. You are responsible for complying with each site's terms of use and applicable laws.
Features
- Geo-targeted requests via Apify Proxy (`country` + `RESIDENTIAL`/`DATACENTER`)
- One dataset row per input URL (including failures and blocked social URLs)
- Readability-based main content → markdown with links preserved
- Optional external links: one sample URL per external domain (share buttons, analytics, and assets are filtered out)
- Social profile/post URLs → `blocked_social` (use dedicated social Actors instead)
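The "one sample URL per external domain" idea can be illustrated with a minimal Python sketch. The helper name `sample_external_links` is hypothetical, the domain key is a simplification (hostname with a leading `www.` stripped, not a true registrable domain), and the Actor's own filtering of share buttons, analytics, and assets is not reproduced here.

```python
from urllib.parse import urljoin, urlparse


def _host(url: str) -> str:
    """Lower-cased hostname without a leading 'www.' (simplified domain key)."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def sample_external_links(page_url: str, hrefs: list[str], max_links: int = 100) -> dict[str, str]:
    """Map each external domain to the first link seen for it (hypothetical helper)."""
    page_host = _host(page_url)
    samples: dict[str, str] = {}
    for href in hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links against the page
        if urlparse(absolute).scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, tel:, and similar schemes
        host = _host(absolute)
        if not host or host == page_host or host in samples:
            continue  # internal link, or this domain already has a sample
        samples[host] = absolute
        if len(samples) >= max_links:
            break  # mirrors the idea behind maxExternalLinks
    return samples
```

Keeping only the first link per domain keeps the output small on link-heavy pages, which is presumably why the Actor caps results with `maxExternalLinks`.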
Input
| Field | Default | Description |
|---|---|---|
| `country` | `US` | Two-letter ISO proxy country (required) |
| `urls` | `https://example.com` | Page URLs to scrape (max `maxUrls` per run) |
| `batchSize` | `50` | Internal batch size (10–100) |
| `maxUrls` | `1000` | Hard cap on URLs per run |
| `extractExternalLinks` | `false` | Unique external domains (brand/leak checks) |
| `maxExternalLinks` | `100` | Max domains per page when links enabled |
| `usePlaywrightFallback` | `true` | Retry thin Cheerio results with Chrome |
| `minContentLength` | `200` | Min markdown length before Playwright |
| `proxyType` | `RESIDENTIAL` | `RESIDENTIAL` or `DATACENTER` |
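Before starting a run, it can help to assemble the input on the client and check it against the limits in the table. The sketch below is an assumption about how a caller might do that; `build_input` is a hypothetical helper, upper-casing the country code is a convenience choice, and the Actor applies its own validation regardless.

```python
import re

# Defaults copied from the input table above.
DEFAULTS = {
    "batchSize": 50,
    "maxUrls": 1000,
    "extractExternalLinks": False,
    "maxExternalLinks": 100,
    "usePlaywrightFallback": True,
    "minContentLength": 200,
    "proxyType": "RESIDENTIAL",
}


def build_input(country: str, urls: list[str], **overrides) -> dict:
    """Merge overrides into the documented defaults and check documented limits."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    run_input = {"country": country.upper(), "urls": list(urls), **DEFAULTS, **overrides}
    if not re.fullmatch(r"[A-Z]{2}", run_input["country"]):
        raise ValueError("country must be a two-letter ISO code")
    if not 10 <= run_input["batchSize"] <= 100:
        raise ValueError("batchSize must be between 10 and 100")
    if run_input["proxyType"] not in ("RESIDENTIAL", "DATACENTER"):
        raise ValueError("proxyType must be RESIDENTIAL or DATACENTER")
    if len(run_input["urls"]) > run_input["maxUrls"]:
        raise ValueError("more URLs than maxUrls allows")
    return run_input
```

For example, `build_input("us", ["https://example.com"], proxyType="DATACENTER")` returns a dictionary ready to serialize as the run's JSON input.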
Output
Each input URL produces one dataset item.
| `status` | Meaning |
|---|---|
| `success_static` | Markdown via HTTP/Cheerio |
| `success_rendered` | Markdown via Playwright |
| `blocked_social` | Social URL, not scraped |
| `failed_fetch` | HTTP or network error |
| `failed_dynamic` | Empty content after Playwright |
| `failed_timeout` | Request timeout |
Example (success)
```json
{
  "url": "https://example.com",
  "country": "US",
  "status": "success_static",
  "title": "Example Domain",
  "markdown": "# Example Domain\n\nThis domain is for use in illustrative examples...",
  "externalLinks": null,
  "method": "cheerio",
  "httpStatus": 200,
  "errorMessage": null,
  "fetchedAt": "2026-05-23T12:00:00.000Z",
  "billable": true
}
```
Example (blocked_social)
```json
{
  "url": "https://www.instagram.com/someprofile/",
  "country": "US",
  "status": "blocked_social",
  "platform": "instagram",
  "blockedReason": "social_network_not_supported",
  "message": "Social network URLs are blocked. Use a dedicated social scraper Actor.",
  "markdown": null,
  "billable": false
}
```
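Because every input URL yields exactly one item, a caller can sort results by `status` after the run. A minimal Python sketch follows; `summarize` is a hypothetical helper, and treating every `failed_*` status as a retry candidate is an assumption rather than documented Actor behavior.

```python
from collections import Counter

SUCCESS_STATUSES = {"success_static", "success_rendered"}


def summarize(items: list[dict]) -> dict:
    """Count items per status, list failed URLs, and map successful URLs to markdown."""
    by_status = Counter(item.get("status", "unknown") for item in items)
    # Assumption: failed_* items may be worth retrying; blocked_social items are not.
    retry = [i["url"] for i in items if str(i.get("status", "")).startswith("failed_")]
    pages = {i["url"]: i["markdown"] for i in items if i.get("status") in SUCCESS_STATUSES}
    return {"by_status": dict(by_status), "retry": retry, "pages": pages}
```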
Pricing
When the Actor uses pay-per-event pricing on Apify Store:
- Successful web scrapes (`success_static`, `success_rendered`) charge the `scraped-url` event (once per URL).
- `blocked_social` and failed statuses are not charged for that event.
You also pay standard Apify platform usage (compute, proxy traffic) according to your plan. Residential proxy traffic typically costs more than datacenter traffic.
Check the Pricing tab on this Actor's Store page for current event prices.
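As a rough illustration of the event charge, the sketch below counts successful items and multiplies by a per-1,000 price. The default of $5.00 per 1,000 results is taken from the "from $5.00" figure on the listing and is an assumption; the actual event price may differ, and platform usage (compute, proxy traffic) is not included. `estimate_event_cost` is a hypothetical helper.

```python
def estimate_event_cost(items: list[dict], price_per_1000: float = 5.00) -> float:
    """Estimate scraped-url event charges in USD; excludes platform usage."""
    # Only the two success statuses trigger the event, per the Pricing section.
    billable = sum(
        1 for item in items
        if item.get("status") in ("success_static", "success_rendered")
    )
    return round(billable * price_per_1000 / 1000, 4)
```

With three successful items and one `blocked_social` item, the estimate is 3 × $5.00 / 1,000 = $0.015.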
Run from API
```
POST https://api.apify.com/v2/acts/olek_automate~proxy-page-to-markdown/runs?token=YOUR_TOKEN
Content-Type: application/json

{
  "country": "US",
  "urls": ["https://example.com"],
  "extractExternalLinks": false
}
```
Read results from the run's `defaultDatasetId` (one item per URL). Use webhooks on `ACTOR.RUN.SUCCEEDED` for automation.
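The same request can be sketched in Python with only the standard library. The function names are hypothetical, error handling is omitted, and the sketch assumes the standard Apify API shapes: the run object arrives under the response's `data` key, and dataset items are served from `/v2/datasets/{id}/items`. The POST only starts the run, so wait for it to finish (for example via the webhook) before fetching items.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_ID = "olek_automate~proxy-page-to-markdown"


def build_run_request(token: str, run_input: dict) -> urllib.request.Request:
    """Build the POST request that starts a run (no network access here)."""
    return urllib.request.Request(
        f"{API_BASE}/acts/{ACTOR_ID}/runs?token={token}",
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def start_run(token: str, run_input: dict) -> dict:
    """Send the request and return the run object, which includes defaultDatasetId."""
    with urllib.request.urlopen(build_run_request(token, run_input)) as response:
        return json.load(response)["data"]


def fetch_items(token: str, dataset_id: str) -> list[dict]:
    """Download dataset items as JSON once the run has finished."""
    url = f"{API_BASE}/datasets/{dataset_id}/items?format=json&token={token}"
    with urllib.request.urlopen(url) as response:
        return json.load(response)
```

A typical flow would be `run = start_run(token, run_input)`, then, after the run succeeds, `fetch_items(token, run["defaultDatasetId"])`.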
Limits and tips
- Up to 1000 URLs per run; the Actor batches internally (`batchSize`, default 50).
- Use `extractExternalLinks: true` mainly for landing / brand pages, not large blogs or news sites.
- Some sites block proxies or return 502; the item will be `failed_fetch`.
- For Instagram, Facebook, LinkedIn, TikTok, X, YouTube, Reddit URLs expect `blocked_social`, not markdown.
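To stay within these limits, a caller might drop social URLs and split long lists into several runs before submitting. The sketch below is one way to do that; `is_social` and `plan_runs` are hypothetical helpers, and the hostname list is an assumption derived from the platform names above, so the Actor's own detection may match more or fewer URLs.

```python
from urllib.parse import urlparse

# Assumed hostnames for the listed platforms; the Actor's own matching may differ.
SOCIAL_HOSTS = {
    "instagram.com", "facebook.com", "linkedin.com", "tiktok.com",
    "x.com", "youtube.com", "reddit.com",
}


def is_social(url: str) -> bool:
    """True if the URL's host is one of the assumed social hosts or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == s or host.endswith("." + s) for s in SOCIAL_HOSTS)


def plan_runs(urls: list[str], max_urls: int = 1000) -> list[list[str]]:
    """Drop social URLs and duplicates, then split the rest into per-run chunks."""
    unique = list(dict.fromkeys(u for u in urls if not is_social(u)))  # keeps order
    return [unique[i:i + max_urls] for i in range(0, len(unique), max_urls)]
```

Each inner list can then be passed as the `urls` field of a separate run.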
Support
Open an issue from the Actor page or contact the publisher. For development and deployment notes, see DEVELOPMENT.md in the repository.
