AI Training Data Curator
Pricing
from $10.00 / 1,000 pages curated
Crawl websites to extract quality-scored, deduplicated text for LLM fine-tuning and RAG. Built-in PII detection, content fingerprinting, and JSONL/Markdown/plain output formats.
Developer
ryan clinton
Crawl any website and extract clean, structured text data ready for LLM fine-tuning, RAG pipelines, and AI model training. This actor handles the entire curation pipeline in a single run -- crawling pages, stripping boilerplate HTML, scoring content quality across six weighted factors, deduplicating near-identical pages via trigram fingerprinting, scanning for personally identifiable information, and exporting results in your choice of JSONL, Markdown, or plain text. Whether you are building a domain-specific corpus for GPT fine-tuning or populating a vector database for retrieval-augmented generation, AI Training Data Curator turns raw websites into production-ready training data with no manual cleanup required.
Why use AI Training Data Curator?
Building high-quality training datasets from web content is tedious and error-prone. You have to strip navigation chrome and ad blocks, filter out thin pages with no real substance, detect and handle duplicate content that inflates dataset size without adding value, and scan for PII that could create compliance issues downstream. Most teams cobble this together with fragile Python scripts, custom regex, and manual spot-checks -- a process that breaks every time a site changes its layout and scales poorly beyond a few hundred pages.
AI Training Data Curator solves all of these problems in a single configurable actor. It uses priority-ordered CSS selectors to find the main content area on any page layout, applies a six-factor quality scoring model to filter out low-value pages automatically, runs trigram-based fingerprinting to catch near-duplicate content even when URLs differ, and detects five categories of PII with optional automatic redaction. You get a clean, scored, deduplicated dataset with rich metadata -- ready to feed directly into your fine-tuning job, embedding pipeline, or vector store -- without writing a single line of preprocessing code.
Key features
- Intelligent noise removal -- strips 21 categories of boilerplate HTML including navigation bars, headers, footers, sidebars, ads, cookie banners, modals, comments, and widget containers before extracting content
- Priority-ordered content selection -- tries 12 CSS selectors in priority order (from `main article` down to `[role="main"]`) to isolate the actual content area, with a `body` fallback for unconventional layouts
- Six-factor quality scoring -- every page receives a 0-to-1 quality score based on content length, text-to-HTML ratio, paragraph structure, sentence quality, vocabulary diversity, and metadata completeness
- Trigram-based deduplication -- generates content fingerprints from the first 500 characters using sorted trigram hashes, flagging pages with 80%+ similarity as duplicates per domain
- PII detection and redaction -- scans for email addresses, US phone numbers, Social Security Numbers, credit card numbers, and IP addresses with optional automatic redaction to placeholder tokens like `[EMAIL]` and `[PHONE]`
- HTML-to-Markdown conversion -- converts headers (h1-h6), code blocks, inline code, bold, italic, links, and lists into clean Markdown formatting while collapsing excessive whitespace
- Three output formats -- export as JSONL with full metadata fields, Markdown with YAML frontmatter, or stripped plain text depending on your downstream pipeline
- Configurable crawl scope -- control maximum pages (up to 10,000), crawl depth (up to 20 levels), minimum content length, quality score threshold, and URL exclusion patterns
- Rich per-page metadata -- each output record includes title, description, author, published date, language, word count, content hash, crawl depth, and scrape timestamp
- Proxy support -- use Apify datacenter or residential proxies, or provide custom proxy configuration for geo-restricted or rate-limited sites
How to use AI Training Data Curator
Using Apify Console
- Navigate to the actor -- go to AI Training Data Curator on Apify and click "Try for free" or "Start".
- Enter your start URLs -- add one or more website URLs in the Start URLs field. The actor follows same-origin internal links automatically, so a single homepage URL often covers an entire site.
- Configure crawl limits and quality thresholds -- set the maximum pages, crawl depth, minimum content length, and minimum quality score. For a typical documentation site, 100-500 pages at depth 3 with a 0.3 quality threshold works well.
- Set PII and output options -- enable PII detection to flag pages containing personal data, optionally enable PII removal to redact with placeholder tokens, and choose your preferred output format (JSONL, Markdown, or plain text).
- Run and export -- click "Start" and wait for the run to complete. Download your curated dataset from the Dataset tab as JSON, CSV, JSONL, XML, or Excel. Feed the results directly into your fine-tuning script, vector database, or data pipeline.
Using the API
You can start the actor programmatically via the Apify API, Python SDK, or JavaScript SDK. See the API & Integration section below for complete code examples in Python, JavaScript, and cURL.
Input parameters
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `startUrls` | string[] | Yes | -- | Starting URLs to crawl and extract training data from |
| `maxPages` | integer | No | 100 | Maximum number of pages to crawl (1--10,000) |
| `maxCrawlDepth` | integer | No | 3 | Maximum link-following depth from start URLs (0--20) |
| `minContentLength` | integer | No | 200 | Minimum text length in characters to keep a page |
| `minQualityScore` | number | No | 0.3 | Minimum quality score (0--1) to include in output |
| `detectPII` | boolean | No | true | Detect and flag pages containing personally identifiable information |
| `removePII` | boolean | No | false | Redact detected PII with placeholder tokens like `[EMAIL]`, `[PHONE]` |
| `outputFormat` | string | No | "jsonl" | Output format: `jsonl`, `markdown`, or `plain` |
| `includeMetadata` | boolean | No | true | Include metadata (URL, title, timestamps) with extracted content |
| `deduplicateContent` | boolean | No | true | Skip near-duplicate pages based on trigram content similarity |
| `excludePatterns` | string[] | No | [] | URL patterns to exclude (e.g., `/login`, `/cart`, `/admin`) |
| `proxy` | object | No | -- | Proxy configuration for crawling |
Example input
```json
{
  "startUrls": ["https://docs.example.com", "https://blog.example.com"],
  "maxPages": 500,
  "maxCrawlDepth": 3,
  "minContentLength": 300,
  "minQualityScore": 0.5,
  "detectPII": true,
  "removePII": true,
  "outputFormat": "jsonl",
  "includeMetadata": true,
  "deduplicateContent": true,
  "excludePatterns": ["/login", "/signup", "/admin", "/tag/", "/page/"]
}
```
Tips for input
- Start small -- run with 20-50 pages first to verify content quality and tune thresholds before launching a full crawl
- Raise quality score for fine-tuning -- set `minQualityScore` to 0.5 or higher when building LLM training corpora to ensure only well-structured, substantive content passes the filter
- Use exclude patterns generously -- add paths like `/login`, `/signup`, `/cart`, `/admin`, `/tag/`, `/page/` to filter out authentication pages, shopping cart pages, and paginated archive listings
- Depth 0 for curated lists -- set `maxCrawlDepth` to 0 if you provide an explicit list of URLs and do not want the actor to follow any links
- Combine PII detection with removal -- enable both `detectPII` and `removePII` for production datasets to reduce compliance risk while still tracking which PII types were found
Output
Each crawled page that passes the quality and length filters produces one output record. Below is a realistic example of a single output item.
```json
{
  "url": "https://docs.example.com/guides/getting-started",
  "title": "Getting Started Guide - Example Docs",
  "description": "Learn how to set up and configure Example in under 5 minutes.",
  "author": "Jane Smith",
  "publishedDate": "2024-11-15T10:30:00Z",
  "language": "en",
  "content": "# Getting Started Guide\n\nThis guide walks you through setting up Example from scratch. You will install the CLI, configure your project, and deploy your first application in under five minutes.\n\n## Prerequisites\n\nBefore you begin, make sure you have the following installed:\n\n- Node.js 18 or later\n- npm or yarn package manager\n- A free Example account\n\n## Installation\n\nInstall the Example CLI globally using npm:\n\n```\nnpm install -g @example/cli\n```\n\nVerify the installation by running:\n\n```\nexample --version\n```\n\n## Creating Your First Project\n\nRun the init command to scaffold a new project:\n\n```\nexample init my-project\ncd my-project\n```\n\nThis creates a project directory with the default configuration files and a sample application. Open `example.config.js` to customize your settings.\n\n## Deploying\n\nWhen you are ready, deploy with a single command:\n\n```\nexample deploy\n```\n\nYour application will be live at `https://my-project.example.com` within seconds.",
  "contentLength": 847,
  "wordCount": 138,
  "qualityScore": 0.792,
  "qualityFactors": {
    "contentLength": 0.15,
    "textToHtmlRatio": 0.213,
    "paragraphCount": 0.12,
    "sentenceQuality": 0.13,
    "vocabularyDiversity": 0.079,
    "metadataPresent": 0.1
  },
  "piiDetected": false,
  "piiTypes": [],
  "isDuplicate": false,
  "duplicateOf": null,
  "metadata": {
    "crawlDepth": 1,
    "scrapedAt": "2025-01-20T14:32:17.445Z",
    "contentHash": "a3f2c1b8"
  }
}
```
Output fields
| Field | Type | Description |
|---|---|---|
| `url` | string | The final loaded URL of the crawled page |
| `title` | string | Page title extracted from `<title>`, Open Graph tags, or first `<h1>` |
| `description` | string or null | Meta description from `<meta name="description">` or Open Graph |
| `author` | string or null | Author from `<meta name="author">`, `[rel="author"]`, or author CSS class |
| `publishedDate` | string or null | ISO 8601 publish date from article meta tags or `<time>` elements |
| `language` | string or null | Language code from the `<html lang>` attribute |
| `content` | string | Cleaned, formatted text content (format depends on the `outputFormat` setting) |
| `contentLength` | integer | Character count of the cleaned content |
| `wordCount` | integer | Word count of the cleaned content |
| `qualityScore` | number | Composite quality score from 0 to 1 |
| `qualityFactors` | object | Breakdown of the six individual quality factor scores |
| `piiDetected` | boolean | Whether any PII patterns were found in the content |
| `piiTypes` | string[] | List of PII types detected (e.g., `["email", "phone"]`) |
| `isDuplicate` | boolean | Whether the page was flagged as a near-duplicate (always false in output, since duplicates are skipped) |
| `duplicateOf` | string or null | URL of the original page if a duplicate was detected |
| `metadata` | object | Crawl metadata including `crawlDepth`, `scrapedAt` timestamp, and `contentHash` |
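The output-field schema above maps directly onto simple post-processing. A minimal sketch for filtering a downloaded JSONL export locally (the filename and the 0.5 threshold here are illustrative choices, not actor defaults):

```python
import json

def load_curated(path, min_quality=0.5):
    """Load a JSONL export, keeping high-quality records without PII.

    Assumes one JSON object per line with the qualityScore and
    piiDetected fields documented above.
    """
    kept = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            if record["qualityScore"] >= min_quality and not record["piiDetected"]:
                kept.append(record)
    return kept
```

Because filtering happens after download, you can rerun with a different threshold without re-crawling.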
Use cases
- LLM fine-tuning datasets -- crawl documentation sites, technical blogs, or niche knowledge bases to build domain-specific corpora for fine-tuning GPT, LLaMA, Mistral, Claude, or other large language models
- RAG pipeline ingestion -- extract and clean website content to populate vector databases like Pinecone, Weaviate, ChromaDB, or Qdrant for retrieval-augmented generation workflows
- Knowledge base construction -- convert sprawling company wikis, help centers, or support documentation into structured, deduplicated text for internal AI assistants
- Academic NLP research -- collect structured text corpora from institutional websites, open-access journals, or government portals for computational linguistics and natural language processing experiments
- Content quality auditing -- use the six-factor quality scoring breakdown to benchmark content depth, vocabulary richness, and structural quality across competitor sites or your own properties
- PII compliance screening -- audit web-scraped datasets for personally identifiable information before using them in AI training, or automatically redact PII during extraction to meet privacy requirements
- Dataset deduplication -- clean up existing web crawl outputs by running them through the trigram fingerprinting pipeline to identify and remove near-duplicate pages that inflate dataset size
- Competitive intelligence corpus -- build structured datasets from competitor documentation, product pages, and blog content for market analysis and strategic planning
- Open-source training data -- crawl publicly available government websites, Wikipedia sections, or Creative Commons content to assemble openly licensed training datasets
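For the fine-tuning use cases above, curated records still need reshaping into your trainer's example format. A hedged sketch converting records into chat-style examples -- the system prompt, the user-message template, and the OpenAI-style `messages` layout are illustrative assumptions, not part of the actor's output:

```python
import json

def to_finetune_examples(records, system_prompt="You are a documentation assistant."):
    """Yield chat-style training examples built from curated records.

    The prompt wording and message layout are placeholders; adapt them
    to whatever your training framework expects.
    """
    for record in records:
        yield {
            "messages": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": f"Explain: {record['title']}"},
                {"role": "assistant", "content": record["content"]},
            ]
        }

def write_finetune_jsonl(records, path):
    """Write one training example per line, ready for upload."""
    with open(path, "w", encoding="utf-8") as f:
        for example in to_finetune_examples(records):
            f.write(json.dumps(example) + "\n")
```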
API & Integration
Python
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run_input = {
    "startUrls": ["https://docs.example.com"],
    "maxPages": 500,
    "maxCrawlDepth": 3,
    "minQualityScore": 0.5,
    "detectPII": True,
    "removePII": True,
    "outputFormat": "jsonl",
    "deduplicateContent": True,
}

run = client.actor("1cYb1W8Ik1Vk4hTcW").call(run_input=run_input)

dataset_items = client.dataset(run["defaultDatasetId"]).list_items().items
for item in dataset_items:
    print(f"{item['title']} -- quality: {item['qualityScore']}, words: {item['wordCount']}")
```
JavaScript
```javascript
import { ApifyClient } from "apify-client";

const client = new ApifyClient({ token: "YOUR_API_TOKEN" });

const run = await client.actor("1cYb1W8Ik1Vk4hTcW").call({
  startUrls: ["https://docs.example.com"],
  maxPages: 500,
  maxCrawlDepth: 3,
  minQualityScore: 0.5,
  detectPII: true,
  removePII: true,
  outputFormat: "jsonl",
  deduplicateContent: true,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
  console.log(`${item.title} -- quality: ${item.qualityScore}, words: ${item.wordCount}`);
});
```
cURL
```bash
# Start the actor run
curl -X POST "https://api.apify.com/v2/acts/1cYb1W8Ik1Vk4hTcW/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"startUrls": ["https://docs.example.com"], "maxPages": 500, "maxCrawlDepth": 3, "minQualityScore": 0.5, "detectPII": true, "removePII": true, "outputFormat": "jsonl"}'

# Retrieve results (replace DATASET_ID with the actual dataset ID from the run response)
curl "https://api.apify.com/v2/datasets/DATASET_ID/items?token=YOUR_API_TOKEN&format=json"
```
Integrations
- Apify API -- trigger runs and retrieve datasets programmatically via REST endpoints
- Python SDK -- call from training scripts, Jupyter notebooks, or data pipelines using `apify-client`
- JavaScript SDK -- integrate with Node.js ETL pipelines using `apify-client`
- Zapier -- trigger crawls from events and route curated data to Google Sheets, Airtable, or Slack
- Make (Integromat) -- build automated workflows piping curated data to downstream systems
- Google Sheets -- export datasets for manual review, labeling, or annotation
- Webhooks -- receive POST notifications at your endpoint when a run completes
How it works
AI Training Data Curator processes web content through a six-stage pipeline.
1. Crawl -- the CheerioCrawler visits each start URL and follows same-origin internal links up to the configured `maxCrawlDepth`. It runs with 10 concurrent requests, a 60-second handler timeout, and a 30-second navigation timeout. URLs matching `excludePatterns` are skipped.
2. Extract -- for each page, 21 noise selectors remove navigation bars, headers, footers, sidebars, ads, cookie banners, modals, comments, and widgets. The actor then tries 12 content selectors in priority order to isolate the main content area, falling back to `<body>` if none match.
3. Convert -- the extracted HTML is converted to clean Markdown-formatted text. Headers (h1-h6), code blocks, inline code, bold, italic, links, and lists are preserved as Markdown syntax. Excessive whitespace and blank lines are collapsed.
4. Score -- each page receives a quality score from 0 to 1 based on six weighted factors: content length (0--0.25), text-to-HTML ratio (0--0.25), paragraph count (0--0.15), sentence quality (0--0.15), vocabulary diversity (0--0.10), and metadata completeness (0--0.10). Pages below `minQualityScore` are discarded.
5. Deduplicate -- trigram fingerprints are generated from the first 500 characters of each page. The top 20 sorted trigram hashes form each page's fingerprint. Pages with 80%+ fingerprint overlap against already-processed pages from the same domain are flagged as duplicates and skipped.
6. PII scan and output -- if enabled, five regex patterns scan for emails, phone numbers, SSNs, credit card numbers, and IP addresses. Detected PII is either flagged or redacted with placeholder tokens. The final content is formatted according to the chosen output format and pushed to the dataset with full metadata.
AI Training Data Curator Pipeline

```
+----------+   +-----------+   +-----------+   +-----------+
|  CRAWL   |-->|  EXTRACT  |-->|  CONVERT  |-->|   SCORE   |
| Start    |   | Remove 21 |   | HTML to   |   | 6-factor  |
| URLs +   |   | noise     |   | Markdown  |   | quality   |
| follow   |   | selectors |   | text      |   | 0-to-1    |
| links    |   | + find    |   |           |   | filter    |
+----------+   | main      |   +-----------+   +-----------+
               | content   |                         |
               +-----------+                         v
+-----------+   +----------+   +-----------+   +-----------+
| Dataset   |<--|  OUTPUT  |<--| PII SCAN  |<--|   DEDUP   |
| with full |   | JSONL /  |   | Detect or |   | Trigram   |
| metadata  |   | Markdown |   | redact 5  |   | finger-   |
+-----------+   | / Plain  |   | PII types |   | printing  |
                +----------+   +-----------+   +-----------+
```
Performance & cost
| Scenario | Pages | Estimated time | Estimated cost |
|---|---|---|---|
| Small documentation site | 50 | ~1 minute | Free tier |
| Medium blog or knowledge base | 500 | ~5 minutes | ~$0.05 |
| Large documentation portal | 2,000 | ~15 minutes | ~$0.15 |
| Enterprise multi-site crawl | 10,000 | ~60 minutes | ~$0.75 |
The actor uses 512 MB memory by default. The Apify Free plan includes $5/month of platform credits, which is enough for thousands of pages per month. CheerioCrawler (server-side HTML parsing) is significantly faster and cheaper than browser-based crawling since it does not render JavaScript or load images, stylesheets, or fonts. Actual costs depend on page size, proxy usage, and the number of pages that pass quality filters.
Limitations
- No JavaScript rendering -- the actor uses CheerioCrawler, which parses raw HTML without executing JavaScript. Single-page applications built with React, Angular, Vue, or similar frameworks may yield little or no content. For JS-heavy sites, pre-render the pages with a browser-based scraper first.
- US-format phone detection only -- the phone number PII pattern is tuned for US phone formats (e.g., `(555) 123-4567`, `+1-555-123-4567`). International phone formats with different digit groupings may not be detected.
- English-centric sentence scoring -- the sentence quality factor assumes English-style punctuation (periods, exclamation marks, question marks) for sentence boundary detection. Content in languages with different sentence structures may receive inaccurate sentence quality scores.
- First-500-character fingerprinting -- deduplication fingerprints are generated from only the first 500 characters of content. Pages that share an identical introduction but diverge significantly afterward may be incorrectly flagged as duplicates.
- No image or table extraction -- the actor extracts text content only. Images, charts, diagrams, and complex HTML tables are not included in the output.
- Same-origin link following -- the crawler only follows links within the same origin as each start URL. Cross-domain links are not followed, even if they point to related content.
- 10,000 page maximum -- the `maxPages` parameter caps at 10,000 pages per run. For larger crawls, split across multiple runs with different start URLs.
Responsible use
- Respect robots.txt and terms of service -- always verify that the websites you crawl permit automated access. The actor follows standard HTTP conventions, but compliance with a site's terms of use is your responsibility.
- Avoid overloading target servers -- the actor runs with 10 concurrent requests by default. For small or fragile servers, reduce the crawl scope or add a proxy to distribute load across IP addresses.
- Handle PII responsibly -- if your training dataset may contain personal information, enable both `detectPII` and `removePII` to redact sensitive data before using the dataset in model training. Review flagged PII types and consider manual inspection for high-sensitivity use cases.
- Attribute content sources -- the output includes the source URL and metadata for every page. When using extracted content for AI training or publication, respect the original content's copyright and licensing terms.
- Review quality before training -- automated quality scoring filters out low-value pages, but it is not a substitute for human review. Spot-check your curated dataset to verify that content quality, accuracy, and relevance meet your requirements before using it to train models.
FAQ
What types of websites work best?
Documentation sites, technical blogs, knowledge bases, news archives, government portals, and content-heavy websites with well-structured HTML produce the best results. The actor excels at sites where content is delivered as server-rendered HTML rather than loaded dynamically via JavaScript.
How does the quality scoring system work?
Each page is scored from 0 to 1 based on six weighted factors: content length (0--0.25, full score at 2,000+ characters), text-to-HTML ratio (0--0.25), paragraph count (0--0.15, full score at 10+ substantial paragraphs), sentence quality (0--0.15, ideal range of 10--25 words per sentence), vocabulary diversity (0--0.10, ratio of unique to total words), and metadata completeness (0--0.10, based on presence of title, description, and author).
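The weighting can be approximated in code. The factor caps in this sketch match the documented weights, but the scaling inside each factor is an educated guess -- the actor's exact internal formulas are not published:

```python
def quality_score(text, html_length, title=None, description=None, author=None):
    """Approximate the documented six-factor quality score.

    Factor caps (0.25/0.25/0.15/0.15/0.10/0.10) follow the docs;
    the per-factor scaling below is illustrative, not the actor's code.
    """
    words = text.split()
    paragraphs = [p for p in text.split("\n\n") if len(p) > 100]
    sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".") if s.strip()]

    length_score = min(len(text) / 2000, 1.0) * 0.25            # full at 2,000+ chars
    ratio_score = min(len(text) / max(html_length, 1), 1.0) * 0.25
    para_score = min(len(paragraphs) / 10, 1.0) * 0.15          # full at 10+ paragraphs
    avg_words = len(words) / max(len(sentences), 1)
    sent_score = 0.15 if 10 <= avg_words <= 25 else 0.075       # ideal 10-25 words/sentence
    vocab_score = (len({w.lower() for w in words}) / max(len(words), 1)) * 0.10
    meta_score = sum(0.10 / 3 for field in (title, description, author) if field)

    return round(length_score + ratio_score + para_score
                 + sent_score + vocab_score + meta_score, 3)
```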
What PII types are detected?
The actor scans for five categories: email addresses, US-format phone numbers, Social Security Numbers, credit card numbers (16-digit patterns with optional separators), and IPv4 addresses. When removePII is enabled, each match is replaced with a placeholder token such as [EMAIL], [PHONE], [SSN], [CREDIT_CARD], or [IP_ADDRESS].
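The five categories can be approximated with regular expressions. The patterns below are illustrative stand-ins -- the actor's actual regexes are not published and are likely stricter:

```python
import re

# Illustrative PII patterns for the five documented categories;
# treat these as approximations, not the actor's real regexes.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"(?:\+1[-.\s]?)?\(?\d{3}\)?[-.\s]\d{3}[-.\s]\d{4}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CREDIT_CARD": re.compile(r"\b(?:\d{4}[-\s]?){3}\d{4}\b"),
    "IP_ADDRESS": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}

def redact_pii(text):
    """Replace each match with its placeholder token (e.g. [EMAIL])
    and return the redacted text plus the list of types found."""
    found = []
    for label, pattern in PII_PATTERNS.items():
        if pattern.search(text):
            found.append(label.lower())
            text = pattern.sub(f"[{label}]", text)
    return text, found
```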
How does deduplication work?
The actor generates a fingerprint for each page by extracting character trigrams from the first 500 characters of cleaned content, hashing each trigram, sorting the hashes, and keeping the top 20 as the fingerprint. When a new page's fingerprint overlaps 80% or more with an existing fingerprint from the same domain, the page is skipped as a near-duplicate.
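The described scheme can be sketched as follows. The hash function and the Jaccard-style overlap metric are assumptions for illustration; the docs specify only the 500-character window, top-20 hashes, and 80% threshold:

```python
import hashlib

def fingerprint(text, window=500, keep=20):
    """Build a trigram fingerprint from the first `window` characters."""
    snippet = text[:window].lower()
    trigrams = {snippet[i:i + 3] for i in range(len(snippet) - 2)}
    hashes = sorted(hashlib.md5(t.encode()).hexdigest() for t in trigrams)
    return set(hashes[:keep])

def is_near_duplicate(fp_a, fp_b, threshold=0.8):
    """Flag pages whose fingerprints overlap at the documented 80% level.

    Jaccard similarity is a guess at the overlap metric the actor uses.
    """
    if not fp_a or not fp_b:
        return False
    return len(fp_a & fp_b) / len(fp_a | fp_b) >= threshold
```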
Can I crawl multiple websites in one run?
Yes. Add multiple URLs to startUrls. The crawler follows internal links within the same origin as each start URL independently, so you can combine a documentation site and a blog in a single run without cross-contamination.
How do I feed the output into a vector database?
Export results as JSONL. Each record's content field contains the cleaned text suitable for embedding, while title, url, qualityScore, and metadata provide context for chunking and retrieval. Load the data into Pinecone, Weaviate, ChromaDB, or Qdrant using LangChain, LlamaIndex, or direct API calls.
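Embedding pipelines typically want smaller chunks than a full page. A minimal character-based chunker over the exported records (the `chunk_size` and `overlap` values are arbitrary starting points, not recommendations from the actor):

```python
def chunk_records(records, chunk_size=800, overlap=100):
    """Split each record's content into overlapping chunks for embedding.

    Sizes are character counts; tune them for your embedding model's
    context window. Source metadata is carried along for retrieval.
    """
    chunks = []
    for record in records:
        text = record["content"]
        start = 0
        while start < len(text):
            chunks.append({
                "text": text[start:start + chunk_size],
                "source_url": record["url"],
                "title": record["title"],
                "quality": record["qualityScore"],
            })
            start += chunk_size - overlap
    return chunks
```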
Does this handle JavaScript-rendered pages?
No. The actor uses CheerioCrawler, which parses the raw HTML response without executing JavaScript. For React SPAs, Next.js apps with client-side rendering, or Angular applications, you would need to pre-render the pages using a browser-based tool first and then pass the resulting URLs to this actor.
What is the difference between JSONL, Markdown, and plain text output formats?
JSONL (the default) preserves Markdown formatting in the content field alongside all metadata fields -- best for structured data pipelines. Markdown output adds YAML frontmatter with the title and URL above the content -- useful for documentation systems. Plain text strips all Markdown formatting (headers, bold, italic, code fences, links) for simple text-only workflows.
How do I increase output quality for fine-tuning?
Set minQualityScore to 0.5 or higher, increase minContentLength to 500 or more, and enable deduplicateContent. This combination filters out thin pages, low-quality content, and duplicates, leaving only substantive, well-structured text suitable for model training.
Can I exclude specific sections of a website?
Yes. Use excludePatterns to skip URLs containing specific path segments. For example, adding /api/, /admin/, /login/, and /tag/ prevents the crawler from wasting requests on API documentation, admin panels, authentication pages, and tag archive pages.
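Whether these patterns are plain substrings or globs is not specified in the docs, so this sketch assumes simple substring matching:

```python
def is_excluded(url, exclude_patterns):
    """Skip a URL when any exclude pattern appears in it (substring match,
    an assumption about how excludePatterns is applied)."""
    return any(pattern in url for pattern in exclude_patterns)
```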
How much does a typical run cost?
A 500-page crawl of a documentation site takes roughly 5 minutes and costs approximately $0.05 in Apify platform credits. The Apify Free plan includes $5/month, enough for approximately 50,000 pages per month. Actual costs vary based on page size and proxy usage.
Is the data suitable for commercial model training?
The actor extracts and cleans web content, but it does not assess or modify the copyright status of that content. Whether the data is suitable for commercial training depends on the source material's licensing terms. Always verify that you have the right to use the content for your intended purpose.
Related actors
| Actor | Description |
|---|---|
| Website Content to Markdown | Simple website content extraction and Markdown conversion without quality scoring or deduplication |
| Website Contact Scraper | Extract emails, phone numbers, and social media links from websites alongside page content |
| Website Change Monitor | Monitor websites for content changes to keep training datasets up to date as sources evolve |
| Website Tech Stack Detector | Identify which sites use server-rendered HTML (ideal for this actor) versus JavaScript frameworks |
| Semantic Scholar Paper Search | Search academic papers to find URLs for crawling research corpora and scientific training data |
| Wikipedia Article Search | Search and extract Wikipedia articles for general-knowledge training datasets |