Lovable Sites Scraper — Find & Enrich .lovable.app Apps
Discover every public site built with Lovable.dev (the AI app builder by GPT Engineer). This actor enumerates live `*.lovable.app` subdomains from multiple public sources, then enriches each URL with HTTP metadata — title, description, Open Graph tags, favicon, canonical URL and custom-domain detection — so you can turn the raw list into a searchable, filterable dataset of AI-built apps.
Perfect for lead generation, competitive intelligence, market research on AI-built products, design inspiration and agency prospecting. If you're selling to founders who ship with AI, this is your radar.
Why this actor exists
Lovable.dev is one of the hottest AI app builders on the market — thousands of founders, indie hackers and agencies use it daily to ship full-stack apps in minutes. Every published Lovable project gets a forced subdomain on `*.lovable.app`, and optionally a custom domain on top. That forced subdomain is the reason we can enumerate the entire public surface of Lovable: if someone shipped it and hit Publish, it's discoverable.
But there's no official directory. No search engine. No public API. If you want to know:
- Which agencies are shipping client work on Lovable?
- What SaaS niches are being built with AI right now?
- Who just bought a custom domain (signal: they're serious, have budget)?
- What landing pages are converting in your market?
- Which Lovable sites are dead vs. live vs. placeholder?
…you had to scrape it yourself. Until now.
This actor does the heavy lifting: multi-source subdomain enumeration, concurrent HTTP enrichment, dead-site filtering, custom-domain detection, and keyword search across the whole result set. You get a clean, structured dataset ready to import into your CRM, your BI tool, or your cold-outreach workflow.
What you get — output fields
Each row in the output dataset contains:
| Field | Type | Description |
|---|---|---|
| `subdomain` | string | The full `xxx.lovable.app` hostname |
| `url` | string | Canonical `https://` URL |
| `status` | integer | HTTP status code returned (200, 404, 500…) |
| `isLive` | boolean | `true` if the site responds 200 and is NOT the Lovable placeholder page |
| `isDefault` | boolean | `true` if the response is Lovable's "project not found" / "not deployed yet" page |
| `title` | string | `<title>` tag, unescaped and trimmed to 300 chars |
| `description` | string | `<meta name="description">`, trimmed to 600 chars |
| `ogTitle` | string | `<meta property="og:title">` |
| `ogDescription` | string | `<meta property="og:description">` |
| `ogImage` | string | `<meta property="og:image">` URL — great for thumbnails |
| `ogUrl` | string | `<meta property="og:url">` |
| `canonical` | string | `<link rel="canonical">` href |
| `favicon` | string | `<link rel="icon">` URL (absolute) |
| `customDomain` | string | Detected custom domain (from canonical / og:url hostname) — empty if none |
| `hasCustomDomain` | boolean | `true` if `customDomain` is set — key lead-gen filter |
| `contentLength` | integer | Response body size in bytes |
| `scrapedAt` | string | ISO-8601 UTC timestamp when the row was enriched |
The `hasCustomDomain` field is the money field for lead-gen: a Lovable user who went through the effort of wiring up DNS is far more likely to be a paying, serious customer than someone with a placeholder.
How it works
1) Discovery — multi-source subdomain enumeration
The actor queries up to three public sources in parallel:
- crt.sh — Certificate Transparency logs. Every SSL certificate issued for `*.lovable.app` is logged here. The Lovable platform mostly uses a wildcard cert, so CT coverage is partial, but many projects get their own cert issued.
- hackertarget.com — Free passive DNS / host search. Covers a large chunk of the surface with recent data.
- rapiddns.io — Aggregated subdomain database. Pulls from passive DNS, CT and DNS brute-force.
By default all three are enabled (`sources: ["crtsh", "hackertarget", "rapiddns"]`). You can disable any of them via the input. Combining sources gives the broadest coverage — each individual source has blind spots.
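To make the CT-log leg concrete, here's a minimal standalone sketch of the kind of query crt.sh supports (illustrative only, not this actor's internal code — crt.sh's public JSON output exposes a `name_value` field per certificate):

```python
import requests

# Query crt.sh's public JSON endpoint for certs matching *.lovable.app
# (%25 is the URL-encoded % wildcard).
resp = requests.get("https://crt.sh/?q=%25.lovable.app&output=json", timeout=60)
resp.raise_for_status()

subdomains = set()
for cert in resp.json():
    # name_value may hold several newline-separated hostnames per cert
    for name in cert["name_value"].splitlines():
        name = name.strip().lower()
        if name.endswith(".lovable.app") and not name.startswith("*"):
            subdomains.add(name)

print(f"{len(subdomains)} unique subdomains found in CT logs")
```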
2) Cleaning
Raw results are deduplicated, wildcard entries (`*.lovable.app`) are dropped, and `www.<project>.lovable.app` duplicates of the bare form are collapsed by default (toggle with `includeWww: true`).
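As a sketch, the assumed logic of this cleaning pass looks something like the following (the function name and signature are hypothetical, not the actor's code):

```python
def clean(candidates: set[str], include_www: bool = False) -> set[str]:
    """Dedupe (via set), drop wildcards, optionally collapse www.* duplicates."""
    cleaned = {c for c in candidates if not c.startswith("*")}  # drop *.lovable.app
    if not include_www:
        # drop www.<project>.lovable.app when the bare form is also present
        cleaned = {c for c in cleaned
                   if not (c.startswith("www.") and c[len("www."):] in cleaned)}
    return cleaned
```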
3) Enrichment (optional, on by default)
For each surviving candidate, the actor fires a concurrent HTTPS GET with:
- 10-second timeout
- Redirect following (so we catch sites that `301` to a custom domain)
- Desktop-class User-Agent
- Configurable concurrency (`concurrency: 15` by default, up to 50)
The HTML is parsed with lightweight regex (no Playwright / no heavy browser) — fast and cheap. We extract the title, meta tags, OG tags, canonical URL and favicon. We also detect the Lovable placeholder / "not deployed" page and set `isLive=false` for those, so your dataset isn't polluted with dead projects.
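For illustration, regex-based metadata extraction in this style looks roughly like the sketch below (a simplification, not the actor's actual parser — it assumes the `name`/`property` attribute comes before `content`, which covers most real pages):

```python
import re

TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.I | re.S)
# Assumes name/property precedes content; a production parser handles more variants.
META_RE = re.compile(
    r'<meta[^>]+(?:name|property)=["\'](?P<key>[^"\']+)["\'][^>]*'
    r'content=["\'](?P<val>[^"\']*)["\']',
    re.I,
)

def extract_meta(html: str) -> dict:
    meta = {m.group("key").lower(): m.group("val") for m in META_RE.finditer(html)}
    title = TITLE_RE.search(html)
    return {
        "title": (title.group(1).strip()[:300] if title else ""),
        "description": meta.get("description", "")[:600],
        "ogTitle": meta.get("og:title", ""),
        "ogImage": meta.get("og:image", ""),
    }
```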
4) Filtering
Two filters run at the end:
- `onlyLive: true` (default) — drop dead / placeholder sites
- `searchQuery` — case-insensitive match across subdomain + title + description + OG tags + customDomain. Great for niching down: `"crypto"`, `"saas"`, `"real estate"`, `"restaurant"`, `"ai"`.
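Assuming a plain case-insensitive substring match (the exact semantics aren't documented), the filter's shape is roughly:

```python
def matches(row: dict, query: str) -> bool:
    # Case-insensitive substring match across the text-bearing output fields.
    q = query.lower()
    fields = ("subdomain", "title", "description",
              "ogTitle", "ogDescription", "customDomain")
    return any(q in (row.get(f) or "").lower() for f in fields)
```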
5) Charging
The actor bills per result pushed ($0.002 per site), not per candidate discovered. If `onlyLive=true` and 80% of candidates are dead, you only pay for the 20% that landed in your dataset.
Input parameters
| Field | Type | Default | Description |
|---|---|---|---|
| `maxSites` | integer | `100` | Maximum sites to return. Range: 1–5000 |
| `enrichHtml` | boolean | `true` | Fetch each site and extract metadata. Turn off for pure subdomain enumeration (cheapest) |
| `onlyLive` | boolean | `true` | Skip 4xx/5xx/timeouts/placeholder pages |
| `includeWww` | boolean | `false` | Keep `www.*` variants even when the bare form is also found |
| `searchQuery` | string | `""` | Keyword filter (case-insensitive) across subdomain + title + description + OG + customDomain |
| `concurrency` | integer | `15` | Parallel HTTP requests during enrichment (1–50) |
| `sources` | array | all 3 | Any combination of `crtsh`, `hackertarget`, `rapiddns` |
Sample inputs
Lead-gen: find Lovable sites with custom domains (filter client-side):
{"maxSites": 2000,"enrichHtml": true,"onlyLive": true,"concurrency": 20}
Run → filter dataset where hasCustomDomain = true → that's your qualified lead list.
Niche research: find crypto apps built on Lovable:
{"maxSites": 500,"searchQuery": "crypto","enrichHtml": true,"onlyLive": true}
Pure enumeration (fastest / cheapest):
{"maxSites": 5000,"enrichHtml": false}
No HTTP enrichment — you get a clean list of subdomains in seconds. Good for feeding into your own downstream pipeline.
Competitive intel on a specific vertical:
{"maxSites": 1000,"searchQuery": "saas","sources": ["crtsh", "hackertarget", "rapiddns"],"onlyLive": true,"concurrency": 25}
Use cases
1. Lead generation for AI dev agencies
You build custom apps for clients. Your ICP is founders who've already validated with a no-code AI tool but hit the ceiling. Scrape Lovable → filter `hasCustomDomain=true` → enrich with the custom-domain owner (Clearbit / Apollo) → send a personalized cold email: "Saw you shipped acme.com on Lovable — once you need real auth + Stripe + multi-tenant, here's what we do."
Conversion rates on this kind of cold outreach are typically 3–8× higher than generic lists because you're qualifying on intent + budget + tech stack all at once.
2. Competitive intelligence for SaaS founders
You're building a SaaS. You want to know what's being shipped in your space right now — not six months ago when Crunchbase got around to indexing it. Filter `searchQuery` by your vertical keyword, inspect titles + descriptions, collect patterns. Which pain points are recurring? Who's pricing what? Which are gaining traction (check the custom domain → check DNS age → check backlinks)?
3. Market research on AI-built products
Investors, analysts, journalists: Lovable is one of the primary funnels where AI-generated software becomes production software. Enumerating this surface gives you a weekly pulse on what's being built with AI, what verticals are hot, what's dying in the graveyard of placeholder pages.
4. Design and UX inspiration
Need examples of how AI builders design their landing pages? Filter by vertical, pull the `ogImage` field, build a mood board of 500 AI-generated homepages in an afternoon.
5. Agency prospecting for Lovable itself
If you work at Lovable or at a tool in its ecosystem (Vercel, Supabase, Clerk, Stripe), this is your list of current users. Segment by custom domain vs. placeholder, prioritize the serious ones, reach out with case studies and integration guides.
6. Historical tracking / weekly deltas
Schedule the actor to run weekly. Diff the datasets. What got published this week? What died? What graduated from placeholder to custom domain? That's a trend chart nobody else has.
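A hedged sketch of that weekly diff, assuming two dataset exports saved as JSON files (the file names are hypothetical):

```python
import json

# Load two weekly exports of this actor's dataset, keyed by subdomain.
last_week = {r["subdomain"]: r for r in json.load(open("run-week-1.json"))}
this_week = {r["subdomain"]: r for r in json.load(open("run-week-2.json"))}

new = this_week.keys() - last_week.keys()   # published since the last run
gone = last_week.keys() - this_week.keys()  # died / unpublished
graduated = [s for s in this_week.keys() & last_week.keys()
             if this_week[s]["hasCustomDomain"]
             and not last_week[s]["hasCustomDomain"]]  # placeholder -> custom domain

print(f"new: {len(new)}, died: {len(gone)}, graduated: {len(graduated)}")
```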
Code examples
Python — Apify client
```python
from apify_client import ApifyClient

client = ApifyClient("<YOUR_APIFY_TOKEN>")

run = client.actor("makework36/lovable-sites-scraper").call(run_input={
    "maxSites": 500,
    "enrichHtml": True,
    "onlyLive": True,
    "searchQuery": "saas",
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["subdomain"], "→", item["title"], "| custom:", item.get("customDomain") or "-")
```
Node.js
```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: '<YOUR_APIFY_TOKEN>' });

const run = await client.actor('makework36/lovable-sites-scraper').call({
    maxSites: 1000,
    enrichHtml: true,
    onlyLive: true,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
const withCustomDomain = items.filter(i => i.hasCustomDomain);
console.log(`${withCustomDomain.length} qualified leads out of ${items.length} live sites`);
```
cURL (sync run-and-wait)
```bash
curl -X POST "https://api.apify.com/v2/acts/makework36~lovable-sites-scraper/run-sync-get-dataset-items?token=$APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"maxSites": 200, "searchQuery": "crypto", "enrichHtml": true}'
```
PHP
$url = "https://api.apify.com/v2/acts/makework36~lovable-sites-scraper/run-sync-get-dataset-items?token=$token";$payload = json_encode(["maxSites" => 500, "onlyLive" => true]);$ch = curl_init($url);curl_setopt_array($ch, [CURLOPT_POST => true,CURLOPT_POSTFIELDS => $payload,CURLOPT_HTTPHEADER => ["Content-Type: application/json"],CURLOPT_RETURNTRANSFER => true,]);$data = json_decode(curl_exec($ch), true);
Zapier / Make / n8n
Apify has native connectors for all three. Drop this actor in as a scheduled trigger (weekly / daily), pipe the dataset to Google Sheets, Airtable, Notion, HubSpot or your warehouse of choice. Typical setup: weekly run → filter `hasCustomDomain=true` → push new rows to your CRM as leads → enrich with Clearbit → trigger outreach sequence.
Pricing
$0.002 per site returned.
Billing model: pay-per-result. You only pay for rows that end up in your dataset — not for candidates that were dead, placeholders, or filtered out by your `searchQuery`.
Indicative totals:
| Run size | Cost |
|---|---|
| 100 sites | $0.20 |
| 500 sites | $1.00 |
| 1,000 sites | $2.00 |
| 5,000 sites | $10.00 |
The actor has no monthly subscription, no minimum spend, and no proxy / compute overhead charges — the $0.002/site is fully loaded.
Compare to building this yourself:
- Hiring a scraping dev: 2–5 days at $500/day = $1,000–$2,500
- Scraping stack (proxy + compute + maintenance): $50–$200/month ongoing
- Keeping it alive as sources change: ongoing engineering time
A one-off $10 run replaces weeks of work.
FAQ
**How fresh is the data?** Every run hits the sources live — no stale cache. crt.sh updates within minutes of a new cert. hackertarget and rapiddns refresh passive DNS daily to weekly. If a site was published this morning and has its own cert, it'll show up.
**Can this find sites that haven't been published / are private?** No. By design, this actor only sees what's publicly reachable. If a Lovable project was never shared publicly, it's invisible to all three sources.
**What about sites with only custom domains and no lovable.app hostname?**
Lovable currently forces a `*.lovable.app` subdomain on every published project — the custom domain is added on top, not instead. So every live Lovable site has both. We discover via the forced `*.lovable.app` subdomain and surface the custom domain in the `customDomain` field.
**Why are some results missing a title / description?** Some Lovable projects are single-page apps rendered client-side with minimal SSR. The HTML we fetch is the shell; the actual content loads from JS. In those cases we extract what's in the shell and leave the rest empty. If you need rendered content, pipe the URLs into a Playwright-based enrichment actor as a second pass.
**How do I filter for very recent sites?** crt.sh returns certificate issuance dates — a future version of this actor may surface them. For now, run the actor weekly and diff against your previous dataset to find what's new.
**Can I get phone / email of site owners?**
Not directly — this actor surfaces what's public on the site. For contact enrichment, combine the output with Clearbit, Apollo, Hunter.io or a WHOIS lookup on the `customDomain` field.
**Does this work for other AI app builders (Bolt, v0, Replit)?**
This one is Lovable-specific. If you need bolt.new, v0.app or Replit public-surface scraping, check our other actors or request one.
**What's the difference between `isLive` and `status = 200`?**
`status = 200` just means the server responded. `isLive = true` also means the response wasn't Lovable's default "project not found / not deployed yet" page. Many dead Lovable URLs return 200 plus a placeholder, which is worse than a 404 because you'd waste outreach on them. `isLive` cleans that up.
**Why is `enrichHtml=true` so much slower?**
Because it actually fetches each URL. 1,000 sites at `concurrency: 15` takes roughly 2–4 minutes. Turn it off if you only need the subdomain list.
**Can I re-run this on just a single subdomain to re-check it?**
Not as a focused use case of this actor — it's designed for bulk discovery. For point queries, just curl the URL yourself.
Troubleshooting
**Run finishes with very few results.**
Check: (1) `maxSites` isn't too low, (2) at least two sources are enabled, (3) `onlyLive=true` may be too aggressive if Lovable had an outage — try `onlyLive=false` to see raw reachability, (4) if `searchQuery` is set, loosen it.
**HTTP 429 / rate-limit warnings.**
Drop `concurrency` from 15 to 5–8. Sources occasionally tighten their free tiers.
**crt.sh returns errors.** crt.sh is sometimes flaky under load (502 Bad Gateway, timeouts). The actor logs a warning and continues with the other sources — you just get narrower coverage on that run. Re-run later.
**`customDomain` is empty on a site I know has one.**
Custom-domain detection relies on `<link rel="canonical">` or `<meta property="og:url">`. If the Lovable project doesn't set either to the custom domain, we can't detect it from the HTML alone. A v2 of this actor may add DNS / HTTP redirect-chain detection.
**Some subdomains return `contentLength: 0`.**
This means the server returned an empty body (rare — usually a 3xx → 2xx redirect chain ending at an empty page, or a HEAD-like response). Treat those as dead.
**I want `og:image` URLs absolutized.**
They're returned as-is from the HTML. If a Lovable site uses a relative `og:image`, you'll get the relative path. Resolve it against the `url` field, e.g. with Python's `urljoin` — see the sketch below.
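A one-liner using the Python standard library (the example row is made up for illustration):

```python
from urllib.parse import urljoin

row = {"url": "https://myapp.lovable.app", "ogImage": "/og.png"}  # illustrative row
print(urljoin(row["url"], row["ogImage"]))  # -> https://myapp.lovable.app/og.png
```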
Related actors from the same author
- Airbnb Scraper — Listings, Prices, Photos & Hosts API — full Airbnb enumeration for travel / lead-gen
- Airbnb Market Analytics — ADR, RevPAR & Occupancy — short-term rental market metrics
- Airbnb MCP Server — Claude, Cursor & AI Agents — conversational Airbnb search for LLM agents
- VRBO Scraper — vacation-rental competitive data
- Skyscanner scrapers — flight + hotel discovery
Check my Apify profile (makework36) for the full catalog of 70+ production scrapers.
Changelog
1.0 — 2026-04-21 — Initial release. Multi-source discovery (crt.sh + hackertarget + rapiddns), HTTP metadata enrichment, custom-domain detection, `onlyLive` + `searchQuery` filters, pay-per-result billing at $0.002/site.
Legal / ethics note
This actor only reads publicly available information: Certificate Transparency logs (a legal requirement for every issued SSL cert, mandated by browsers), passive DNS databases (publicly queried), and live HTTP GETs to publicly-published URLs. No authentication bypass, no private data, no rate-limit evasion on Lovable's own infrastructure. If you use the output for cold outreach, follow CAN-SPAM / GDPR / whatever jurisdiction applies to your recipient list.
Built by makework36. Questions, feature requests, or bug reports → open an issue on the actor page or DM on Apify.