Crawl Apify Store pages with cleaned HTML output

Created by

Hanna Nosova

Actor

Website Content Crawler Lite

Crawl public pages and include cleaned HTML alongside text/Markdown for extraction-quality checks.

Website Content Crawler Lite (fetch_cat/website-content-crawler-lite)

Input

🌐 Start URLs (required)

url: https://apify.com/store

📄 Maximum pages: 3

🔗 Maximum link depth: 1

Stay on the same domain: true

Include URL globs: https://apify.com/store**

Exclude URL globs: **/login** (+1 more, not shown)

Main content format: html

Respect robots.txt: true

Request timeout (seconds): 25
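As a rough illustration, the input above could be passed to the Actor from Python. This is a minimal sketch, not the Actor's documented schema: the key names (`startUrls`, `maxPages`, `maxDepth`, and so on) are assumptions inferred from the field labels, so check the Actor's input tab for the real ones. The `run_actor` helper is hypothetical and relies on the third-party `apify-client` package.

```python
import json

# Actor slug as shown on this page.
ACTOR_ID = "fetch_cat/website-content-crawler-lite"


def build_run_input() -> dict:
    """Mirror the input listed above.

    All key names are assumptions based on the field labels,
    not the Actor's confirmed input schema.
    """
    return {
        "startUrls": [{"url": "https://apify.com/store"}],
        "maxPages": 3,
        "maxDepth": 1,
        "sameDomain": True,
        "includeUrlGlobs": ["https://apify.com/store**"],
        # The page lists one more exclude glob that is not shown; it is left out here.
        "excludeUrlGlobs": ["**/login**"],
        "mainContentFormat": "html",
        "respectRobotsTxt": True,
        "requestTimeoutSecs": 25,
    }


def run_actor(token: str) -> list[dict]:
    """Start the Actor, wait for it to finish, and return the dataset items."""
    # Imported lazily so the rest of the sketch works without the package
    # (pip install apify-client).
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=build_run_input())
    if run is None:
        raise RuntimeError("Actor run did not complete")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


if __name__ == "__main__":
    # Print the input as JSON, for example to paste into the Apify Console.
    print(json.dumps(build_run_input(), indent=2))
```

Calling `run_actor("<YOUR_API_TOKEN>")` would start a real run on your account, so the example only prints the input by default.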

Output fields

Requested URL

Loaded URL

Title

Description

H1

Text

Markdown

HTML

Links

Status code

Content type

Depth

Parent URL

Fetched at

Error

Skipped reason
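One way to use these fields for an extraction-quality check is sketched below. It assumes each dataset item is a dictionary and that the fields are exposed under camelCase keys such as `error`, `skippedReason`, `html`, and `text`; those key names are guesses from the labels above, and both helper functions are hypothetical.

```python
def classify(record: dict) -> str:
    """Label a record as 'error', 'skipped', or 'ok' (key names are assumed)."""
    if record.get("error"):
        return "error"
    if record.get("skippedReason"):
        return "skipped"
    return "ok"


def text_to_html_ratio(record: dict) -> float:
    """Length of the extracted text relative to the cleaned HTML.

    A very low ratio can hint that most of the page was markup or that
    the text extraction missed content. Returns 0.0 when there is no HTML.
    """
    html = record.get("html") or ""
    text = record.get("text") or ""
    return len(text) / len(html) if html else 0.0


def summarize(records: list[dict]) -> dict:
    """Count records per class so failed or skipped pages stand out."""
    counts = {"ok": 0, "skipped": 0, "error": 0}
    for record in records:
        counts[classify(record)] += 1
    return counts
```

The ratio is only a crude heuristic; pages worth a closer look can then be compared by reading the HTML and Markdown fields side by side.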

How it works

01. Sign up on Apify

Create an Apify account to access Website Content Crawler Lite.

02. Start the run

The Actor starts automatically, using the input shown above.

03. Receive the output

Monitor progress in real time. You will be notified as soon as your dataset is complete and ready for review.

04. Integrate into your workflow

The final output is delivered in JSON, CSV, or Excel format, ready to be plugged into your workflow.
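For downloading the output in one of these formats, the Apify API exposes a dataset items endpoint with a `format` query parameter. The helper below is a small sketch that only builds the URL; the function name and the restricted format set are choices made for this example, and the dataset ID is a placeholder you would take from your own run.

```python
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"

# Formats named on this page; Excel corresponds to "xlsx" in the API.
SUPPORTED_FORMATS = {"json", "csv", "xlsx"}


def dataset_items_url(dataset_id: str, fmt: str = "json") -> str:
    """Build the download URL for a dataset's items in the given format."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported format: {fmt!r}")
    query = urlencode({"format": fmt})
    return f"{API_BASE}/datasets/{dataset_id}/items?{query}"


if __name__ == "__main__":
    # "YOUR_DATASET_ID" is a placeholder, not a real dataset.
    print(dataset_items_url("YOUR_DATASET_ID", "csv"))
```

Unless the dataset is shared publicly, the request also needs your API token, for example in an `Authorization: Bearer` header.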

Integrate the Actor directly into your workflow

Choose one of the 100+ integrations we provide, or integrate via the API.

Webhook

n8n

Make

Zapier

Airbyte

Keboola

IFTTT

HubSpot

GDrive

Gmail

Apify MCP

GitHub

Slack

LangChain

LlamaIndex

Flowise

Pinecone

OpenAI

Mastra

Clay