Real Estate Listing Extractor
Pricing
Pay per usage
Extract structured data from a SINGLE public real-estate listing page: address, price, beds, baths, area, property type, sale/rent, year built, agent, images, geo. schema.org JSON-LD -> OpenGraph -> heuristics. Pure code, SSRF-guarded, cost-safe (no proxy/headless/AI). Single-page, not bulk.
Rating
0.0
(0)
Developer
Ahmed Moussa
Maintained by Community
Actor stats
Bookmarked: 0
Total users: 2
Monthly active users: 1
Last modified: 3 days ago
Real Estate Listing Extractor (single page)
Turn a single public real-estate listing page URL into a clean, structured JSON record — deterministically, with no AI, no proxy, and no headless browser.
What it does
Given one listing-page URL (or a small bounded batch), the actor fetches each page once and extracts structured listing data from its embedded schema.org markup, falling back to OpenGraph meta tags and conservative heuristics when that markup is missing. It is built on the same SSRF-guarded fetch core as the other OMEGA single-page extractors, with a deterministic real-estate parser on top. It is pure code: every field is computed, never guessed by a language model.
Input
| Field | Type | Description |
|---|---|---|
| url | string | A single public real-estate listing page URL (include https://). |
| urls | array | Optional bounded list of extra listing URLs (max 50 per run). |
Example input:
{"url": "https://www.example-realty.com/listing/123-maple-st"}
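Before starting a run it can help to check the input locally. The following is a minimal sketch of such a check, not the actor's own validation code: the helper name `collect_urls` is hypothetical, and it assumes that the 50-entry cap applies to the `urls` list and that an over-long list should be rejected rather than truncated.

```python
from urllib.parse import urlparse

MAX_URLS = 50  # cap on the optional `urls` list, taken from the input table


def collect_urls(actor_input: dict) -> list:
    """Merge `url` and `urls` into one validated, de-duplicated list.

    Hypothetical helper: rejecting (rather than truncating) an over-long
    list is an assumption, not documented actor behaviour.
    """
    extra = actor_input.get("urls") or []
    if len(extra) > MAX_URLS:
        raise ValueError(f"urls accepts at most {MAX_URLS} entries, got {len(extra)}")

    seen = set()
    result = []
    for candidate in [actor_input.get("url"), *extra]:
        if not candidate:
            continue
        url = candidate.strip()
        parsed = urlparse(url)
        # The input table asks for a full URL including the scheme.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not an absolute http(s) URL: {candidate!r}")
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result
```

Calling `collect_urls({"url": "https://www.example-realty.com/listing/123-maple-st"})` returns a one-element list, while a URL without a scheme raises `ValueError`.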
Output
One dataset item per URL:
{
  "url": "https://www.example-realty.com/listing/123-maple-st",
  "status": "completed",
  "address": "123 Maple St, Austin, TX, 78701, US",
  "price": "675000",
  "currency": "USD",
  "beds": "4",
  "baths": "2.5",
  "area_sqft": "2400",
  "property_type": "singlefamilyresidence",
  "listing_type": "sale",
  "year_built": "1998",
  "lot_size": "6997",
  "agent": "Acme Realty",
  "images": ["https://.../a.jpg", "https://.../b.jpg"],
  "description": "Charming family home with garden.",
  "geo": { "lat": "30.2672", "lng": "-97.7431" },
  "raw_prices": ["675000"],
  "method": "jsonld_realestate",
  "parse_confidence": "high",
  "extracted_at": "2026-06-24T10:00:00+00:00",
  "error": null
}
status is one of completed, failed, blocked, or empty. Any field that
the page does not declare is returned as null (or []) — never invented.
Use cases
- Normalise a listing URL into a row for a CRM, spreadsheet, or database.
- Pull price / beds / baths / area for a comparables (comps) sheet.
- Monitor a single listing's price and status over time.
- Enrich an internal dataset of listing URLs with structured fields.
How it works
Extraction precedence (most reliable first); the layer used is reported in method:
- schema.org JSON-LD: RealEstateListing, Residence (House, Apartment, SingleFamilyResidence, …), Accommodation / Place, and Product / Offer (price, currency, address, bedrooms, bathrooms, floor/lot size, year built, geo, images, broker/seller, sale-vs-rent via businessFunction).
- OpenGraph / product meta: og:title, og:image, product:price:amount, product:price:currency, location meta.
- Meta / heuristics: <title> / <h1> plus a conservative, currency-marked price detector (never infers a price from a bare number).
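The precedence above can be illustrated for a single field. The following sketch extracts only the price, using the Python standard library; it is not the actor's parser. The function and class names, the method labels `jsonld`, `opengraph`, and `heuristic`, and the set of currency markers in the regular expression are all assumptions made for the example.

```python
import json
import re
from html.parser import HTMLParser


class _MarkupCollector(HTMLParser):
    """Collect JSON-LD script bodies, <meta property> values, and title/h1 text."""

    def __init__(self) -> None:
        super().__init__()
        self.jsonld_blocks = []
        self.meta = {}
        self.heading_text = []
        self._in_jsonld = False
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "script" and (attr.get("type") or "").lower() == "application/ld+json":
            self._in_jsonld = True
            self.jsonld_blocks.append("")
        elif tag == "meta" and attr.get("property") and attr.get("content"):
            self.meta.setdefault(attr["property"], attr["content"])
        elif tag in ("title", "h1"):
            self._in_heading = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False
        elif tag in ("title", "h1"):
            self._in_heading = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.jsonld_blocks[-1] += data
        elif self._in_heading:
            self.heading_text.append(data)


def _iter_nodes(data):
    """Yield every dict in a decoded JSON-LD document, depth-first."""
    if isinstance(data, dict):
        yield data
        for value in data.values():
            yield from _iter_nodes(value)
    elif isinstance(data, list):
        for value in data:
            yield from _iter_nodes(value)


# Layer 3 accepts a number only when a currency marker precedes it.
_PRICE_RE = re.compile(r"(?:\$|€|£|USD|EUR|GBP)\s?(\d{1,3}(?:,\d{3})+|\d+)")


def extract_price(html: str) -> dict:
    collector = _MarkupCollector()
    collector.feed(html)

    # Layer 1: schema.org JSON-LD (any node that declares a price).
    for block in collector.jsonld_blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # malformed block: skip it rather than guess
        for node in _iter_nodes(data):
            if "price" in node:
                return {"price": str(node["price"]),
                        "currency": node.get("priceCurrency"),
                        "method": "jsonld"}

    # Layer 2: OpenGraph / product meta tags.
    amount = collector.meta.get("product:price:amount")
    if amount:
        return {"price": amount,
                "currency": collector.meta.get("product:price:currency"),
                "method": "opengraph"}

    # Layer 3: currency-marked number in <title> / <h1> text only.
    match = _PRICE_RE.search(" ".join(collector.heading_text))
    if match:
        return {"price": match.group(1).replace(",", ""),
                "currency": None, "method": "heuristic"}

    return {"price": None, "currency": None, "method": None}
```

A title such as "Home 675000" yields no price in this sketch, because a bare number carries no currency marker; that matches the conservative behaviour described above.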
Areas declared in square metres (unitCode MTK) are converted to square feet.
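As a worked example of that conversion, the sketch below applies the exact factor derived from the definition of the foot (1 ft = 0.3048 m, so 1 m² ≈ 10.7639 sq ft). The function name `area_to_sqft` is hypothetical, and how the actor rounds the result is not stated in this README, so the sketch leaves the value unrounded.

```python
from typing import Optional

# 1 ft = 0.3048 m exactly, so 1 m² = (1 / 0.3048)² sq ft ≈ 10.7639 sq ft.
SQFT_PER_SQM = (1 / 0.3048) ** 2


def area_to_sqft(value: float, unit_code: Optional[str]) -> float:
    """Convert an area to square feet when it is declared in square metres.

    "MTK" is the UN/CEFACT unit code for square metre. Any other code is
    passed through unchanged, which is an assumption of this sketch.
    """
    if unit_code == "MTK":
        return value * SQFT_PER_SQM
    return value
```

For instance, a 100 m² flat comes out at roughly 1076.39 sq ft.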
A code-owned parse_confidence (high/medium/low/none) reflects which layer
matched and how many core fields were found.
Cost-safety
- No proxy, no headless browser, no LLM, no paid API. One bounded HTTP GET per URL (hard caps: 5s connect / 10s read / 2 MB / 3 redirects).
- $0 idle and $0 uncovered cost beyond Apify compute — nothing to subsidise.
- SSRF-guarded and fail-closed: private/loopback/reserved IPs are blocked, with per-redirect re-validation, and a domain blocklist for bot-walled portals.
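The guard described in the last bullet can be sketched with the standard `ipaddress` module. This is an illustration of the general technique, not the actor's code: the function names are hypothetical and `BLOCKED_DOMAINS` holds a placeholder entry rather than the real blocklist. In a real fetch path the IP check would run on every address the hostname resolves to, and again after each redirect.

```python
import ipaddress

# Placeholder entry; the actor's actual blocklist is not reproduced here.
BLOCKED_DOMAINS = {"blocked-portal.example"}


def is_public_ip(address: str) -> bool:
    """Return True only for addresses that are safe to fetch from.

    Fail-closed: anything that cannot be parsed is treated as unsafe.
    """
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        return False
    return not (
        ip.is_private
        or ip.is_loopback
        or ip.is_reserved
        or ip.is_link_local
        or ip.is_multicast
        or ip.is_unspecified
    )


def is_blocked_domain(hostname: str) -> bool:
    """Match the hostname or any of its parent domains against the blocklist."""
    host = hostname.lower().rstrip(".")
    return any(host == domain or host.endswith("." + domain)
               for domain in BLOCKED_DOMAINS)
```

Matching on a leading dot keeps `www.blocked-portal.example` blocked without also catching an unrelated host such as `notblocked-portal.example`.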
Limitations (honest)
- This is single-page extraction, not bulk portal/MLS scraping. It fetches the page you give it and never follows links or paginates.
- It only reads server-rendered markup. Pages that render entirely in the browser (heavy client-side JS) will expose little to a plain GET and return a low-confidence result.
- Many large portals (Zillow, Realtor.com, Redfin, Rightmove, Zoopla, …) block bots and/or forbid scraping in their ToS. These are on a blocklist and return status: "blocked". Point the actor at a brokerage's or publisher's own listing page that exposes schema.org markup for best results.
- Fields are only as good as the page's structured data. Missing data is returned as null; the actor never fabricates a value.