Check Page Sizes

Pricing

$4.99/month + usage

Page size checker that crawls any website and flags HTML pages over 2 MB or PDFs over 64 MB, the thresholds at which Google stops indexing, so SEO teams can fix oversized files before they drop from search.

Rating

0.0

(0)

Developer

ZeroBreak

Maintained by Community

Actor stats

0

Bookmarked

2

Total users

1

Monthly active users

15 hours ago

Last modified

Page Size Checker: Audit HTML and PDF sizes for Google indexing

Google won't index HTML pages over 2 MB or PDF files over 64 MB. Most sites are fine. But content-heavy sites, documentation hubs, and large PDF libraries can get caught off guard, and you won't know until pages stop showing up in search. This actor crawls every internal page on your site, measures the actual content size, and flags anything that exceeds Google's limits.

Use cases

  • SEO audits: rule out page size as a reason Google stopped indexing certain pages
  • Documentation sites: check whether long-form content pages are pushing past the 2 MB limit
  • PDF libraries: find oversized PDF files before they fall outside Google's 64 MB indexing range
  • Pre-launch checks: run a size audit before deploying a new site or major content update
  • Ongoing monitoring: schedule regular runs to catch newly added pages that grow too large

Input

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| startUrl | string | (required) | The website URL to start crawling from |
| maxUrls | integer | 100 | Maximum number of pages to check |
| checkPdfs | boolean | true | Also check linked PDF files |
| htmlSizeLimitMb | number | 2 | Flag HTML pages above this size in MB |
| pdfSizeLimitMb | number | 64 | Flag PDF files above this size in MB |
| requestTimeoutSecs | integer | 30 | Per-request timeout in seconds |

Example input

{
    "startUrl": "https://apify.com",
    "maxUrls": 500,
    "checkPdfs": true,
    "htmlSizeLimitMb": 2,
    "pdfSizeLimitMb": 64
}
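
The parameter table above can be turned into a small pre-flight check before a run is started. The following is a minimal sketch, not the actor's own validation code: `normalize_input`, `DEFAULTS`, and `MAX_URLS_CAP` are hypothetical names, the defaults are copied from the table, and the cap of 1000 is taken from the FAQ.

```python
from typing import Any

# Defaults copied from the input table; the cap of 1000 comes from the FAQ.
DEFAULTS: dict[str, Any] = {
    "maxUrls": 100,
    "checkPdfs": True,
    "htmlSizeLimitMb": 2,
    "pdfSizeLimitMb": 64,
    "requestTimeoutSecs": 30,
}
MAX_URLS_CAP = 1000


def normalize_input(raw: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: fill in defaults and reject obviously bad values."""
    start_url = raw.get("startUrl")
    if not isinstance(start_url, str) or not start_url.startswith(("http://", "https://")):
        raise ValueError("startUrl is required and must be an http(s) URL")

    unknown = set(raw) - set(DEFAULTS) - {"startUrl"}
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    merged = {**DEFAULTS, **raw}  # values supplied by the caller win over defaults
    if not 1 <= merged["maxUrls"] <= MAX_URLS_CAP:
        raise ValueError(f"maxUrls must be between 1 and {MAX_URLS_CAP}")
    for key in ("htmlSizeLimitMb", "pdfSizeLimitMb", "requestTimeoutSecs"):
        if merged[key] <= 0:
            raise ValueError(f"{key} must be positive")
    return merged
```

Calling `normalize_input` on the example input would return the same values plus `requestTimeoutSecs` set to its default of 30.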

Output

The actor stores one record per page in a dataset. Each entry includes:

{
    "url": "https://apify.com/blog/web-scraping-guide",
    "pageType": "html",
    "sizeBytes": 2458624,
    "sizeMb": 2.345,
    "limitMb": 2,
    "exceedsLimit": true,
    "statusCode": 200,
    "scrapedAt": "2025-06-01T12:34:56.789Z"
}

| Field | Type | Description |
| --- | --- | --- |
| url | string | Final URL after any redirects |
| pageType | string | html or pdf |
| sizeBytes | integer | Decompressed page size in bytes |
| sizeMb | number | Page size in megabytes, rounded to 3 decimal places |
| limitMb | number | Applicable Google indexing limit in MB |
| exceedsLimit | boolean | True if the page exceeds the limit |
| statusCode | integer | HTTP response status code |
| error | string | Error message if the page could not be fetched |
| scrapedAt | string | ISO 8601 timestamp |
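
To show how the derived fields relate to `sizeBytes`, here is a rough sketch of assembling one record. It assumes a megabyte means 1024 × 1024 bytes, which is the reading that reproduces the sample record (2458624 bytes gives 2.345), and it assumes the limit is compared against the unrounded byte count. `build_record` is a hypothetical helper, not part of the actor.

```python
from datetime import datetime, timezone

BYTES_PER_MB = 1024 * 1024  # assumption: binary megabytes, consistent with the sample record


def build_record(url: str, page_type: str, size_bytes: int,
                 limit_mb: float, status_code: int = 200) -> dict:
    """Hypothetical helper: derive sizeMb and exceedsLimit from the raw byte count."""
    timestamp = datetime.now(timezone.utc).isoformat(timespec="milliseconds")
    return {
        "url": url,
        "pageType": page_type,
        "sizeBytes": size_bytes,
        "sizeMb": round(size_bytes / BYTES_PER_MB, 3),
        "limitMb": limit_mb,
        # Assumption: compare exact bytes, so rounding never hides a borderline page.
        "exceedsLimit": size_bytes > limit_mb * BYTES_PER_MB,
        "statusCode": status_code,
        "scrapedAt": timestamp.replace("+00:00", "Z"),
    }
```

Passing the sample values (`2458624` bytes against a limit of `2`) yields `sizeMb` of 2.345 and `exceedsLimit` set to true, matching the example entry.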

How it works

  1. The actor starts at the URL you provide and fetches the page
  2. It measures the full decompressed content size using the response body
  3. For HTML pages, it extracts all internal links and adds them to the crawl queue
  4. If checkPdfs is enabled, PDF files linked from those pages are checked against pdfSizeLimitMb (64 MB by default)
  5. Results are pushed to the dataset as each page is checked, with exceedsLimit: true for any pages over the limit
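
The steps above can be sketched as a breadth-first crawl. This is an illustration under stated assumptions, not the actor's source: `crawl` and `LinkParser` are hypothetical names, `fetch` is a caller-supplied function that returns the decompressed body, PDFs are recognised by a `.pdf` path suffix (a real crawler might inspect Content-Type instead), and a megabyte is taken to be 1024 × 1024 bytes.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Iterator
from urllib.parse import urldefrag, urljoin, urlparse

MB = 1024 * 1024  # assumption: binary megabytes


class LinkParser(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(v for k, v in attrs if k == "href" and v)


def crawl(start_url: str, fetch: Callable[[str], bytes], max_urls: int = 100,
          check_pdfs: bool = True, html_limit_mb: float = 2,
          pdf_limit_mb: float = 64) -> Iterator[dict]:
    """Breadth-first crawl of one host, yielding a size record per checked URL."""
    host = urlparse(start_url).netloc
    queue, seen, checked = deque([start_url]), {start_url}, 0
    while queue and checked < max_urls:
        url = queue.popleft()
        is_pdf = urlparse(url).path.lower().endswith(".pdf")  # assumption: suffix check
        if is_pdf and not check_pdfs:
            continue
        body = fetch(url)  # step 2: size is measured on the decompressed body
        checked += 1
        limit_mb = pdf_limit_mb if is_pdf else html_limit_mb
        yield {"url": url, "pageType": "pdf" if is_pdf else "html",
               "sizeBytes": len(body), "limitMb": limit_mb,
               "exceedsLimit": len(body) > limit_mb * MB}
        if is_pdf:
            continue  # PDFs are measured but not parsed for links
        parser = LinkParser()
        parser.feed(body.decode("utf-8", errors="replace"))
        for href in parser.hrefs:
            link = urldefrag(urljoin(url, href)).url  # resolve and drop #fragments
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)  # step 3: only internal links join the queue
                queue.append(link)
```

Because `crawl` is a generator, each record is available as soon as its page is checked, which mirrors step 5. Injecting `fetch` keeps the sketch free of network code and makes it easy to try against a dictionary of fake pages.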

FAQ

Does Google actually stop indexing large pages?

Yes. Google updated its indexing rules to skip HTML files over 2 MB and PDFs over 64 MB. Both limits are fairly high, and most sites won't hit them, but large CMS exports, documentation pages, or auto-generated reports occasionally push past 2 MB.

Does this actor check JavaScript-rendered content?

No. It measures the raw HTML size served by the server, which is what Google's crawler sees. JavaScript that expands the DOM after load is not counted.
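
The size in question is the response body after transfer compression is undone, since sizeBytes is described as the decompressed size. The sketch below illustrates that distinction for gzip and deflate bodies only. `decompressed_size` is a hypothetical helper, and how the actor itself handles encodings is not documented here.

```python
import gzip
import zlib


def decompressed_size(body: bytes, content_encoding: str = "") -> int:
    """Return the byte length of a response body after undoing gzip or deflate."""
    encoding = content_encoding.strip().lower()
    if encoding == "gzip":
        return len(gzip.decompress(body))
    if encoding == "deflate":
        return len(zlib.decompress(body))  # assumes a zlib-wrapped deflate stream
    return len(body)  # identity, or an encoding this sketch does not handle


# Repetitive markup compresses well, so the bytes on the wire understate the page size.
html = b"<p>hello</p>" * 200_000
wire = gzip.compress(html)
print(len(wire), decompressed_size(wire, "gzip"))
```

A page can therefore look small in network tooling that reports transferred bytes and still exceed the limit once decompressed.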

Can I adjust the size limits?

Yes. Use htmlSizeLimitMb and pdfSizeLimitMb to set custom thresholds. Setting a lower value, say 1.5 MB, lets you catch pages that are getting close before they actually hit the Google limit.
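
A similar early warning can be derived from an existing run without changing the thresholds, by post-processing the dataset records. This is a sketch only: `triage` and `warn_ratio` are hypothetical names, and the default ratio of 0.75 corresponds to the 1.5 MB example against a 2 MB limit.

```python
def triage(records: list[dict], warn_ratio: float = 0.75) -> dict[str, list[str]]:
    """Group record URLs into over, near, ok, and failed buckets."""
    groups: dict[str, list[str]] = {"over": [], "near": [], "ok": [], "failed": []}
    for rec in records:
        if rec.get("error") or "sizeMb" not in rec:
            groups["failed"].append(rec["url"])  # page could not be fetched
        elif rec["exceedsLimit"]:
            groups["over"].append(rec["url"])
        elif rec["sizeMb"] >= warn_ratio * rec["limitMb"]:
            groups["near"].append(rec["url"])  # close to the limit, worth watching
        else:
            groups["ok"].append(rec["url"])
    return groups
```

The records are expected to use the field names from the output table, for example after exporting the dataset as JSON.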

How many pages can it check?

Up to 1000 per run. The maxUrls input defaults to 100 and can be raised to that maximum. For larger sites, run the actor several times, starting each run from a different section of the site.

Integrations

Connect Page Size Checker with other apps and services using Apify integrations. You can integrate with Make, Zapier, Slack, Airbyte, GitHub, Google Sheets, Google Drive, and many more. You can also use webhooks to trigger actions whenever results are available.