Real Estate Listing Extractor

Pricing

Pay per usage

Extract structured data from a SINGLE public real-estate listing page: address, price, beds, baths, area, property type, sale/rent, year built, agent, images, geo. schema.org JSON-LD -> OpenGraph -> heuristics. Pure code, SSRF-guarded, cost-safe (no proxy/headless/AI). Single-page, not bulk.

Developer

Ahmed Moussa

Maintained by Community

Real Estate Listing Extractor (single page)

Turn a single public real-estate listing page URL into a clean, structured JSON record — deterministically, with no AI, no proxy, and no headless browser.

What it does

Given one listing-page URL (or a small bounded batch), the actor fetches each page once and extracts structured listing data from its embedded schema.org markup, falling back to OpenGraph meta tags and conservative heuristics when that markup is missing. It is built on the same SSRF-guarded fetch core as the other OMEGA single-page extractors, with a deterministic real-estate parser on top. It is pure code: every field is computed, never guessed by a language model.

Input

| Field | Type | Description |
|-------|------|-------------|
| url | string | A single public real-estate listing page URL (include https://). |
| urls | array | Optional bounded list of extra listing URLs (max 50 per run). |

Example input:

{
  "url": "https://www.example-realty.com/listing/123-maple-st"
}
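
The input rules in the table above can be checked before a run is started. The following is a minimal sketch, not the actor's own validation code: the function name `collect_urls` is hypothetical, and it assumes that the 50-URL cap applies to the `urls` list and that both http and https schemes are acceptable.

```python
from urllib.parse import urlsplit

MAX_EXTRA_URLS = 50  # cap from the input table; assumed to apply to `urls`


def collect_urls(run_input: dict) -> list[str]:
    """Merge `url` and `urls` into one validated, de-duplicated list."""
    extra = run_input.get("urls") or []
    if len(extra) > MAX_EXTRA_URLS:
        raise ValueError(f"at most {MAX_EXTRA_URLS} extra URLs per run")

    candidates = []
    if run_input.get("url"):
        candidates.append(run_input["url"])
    candidates.extend(extra)
    if not candidates:
        raise ValueError("provide `url` or a non-empty `urls` list")

    seen, result = set(), []
    for raw in candidates:
        url = raw.strip()
        parts = urlsplit(url)
        # Require an explicit scheme and a hostname ("include https://").
        if parts.scheme not in ("http", "https") or not parts.hostname:
            raise ValueError(f"not an absolute http(s) URL: {raw!r}")
        if url not in seen:  # keep first occurrence, preserve order
            seen.add(url)
            result.append(url)
    return result
```

Rejecting a scheme-less value such as `example-realty.com/listing/1` up front avoids spending a run on an input that cannot be fetched.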

Output

One dataset item per URL:

{
  "url": "https://www.example-realty.com/listing/123-maple-st",
  "status": "completed",
  "address": "123 Maple St, Austin, TX, 78701, US",
  "price": "675000",
  "currency": "USD",
  "beds": "4",
  "baths": "2.5",
  "area_sqft": "2400",
  "property_type": "singlefamilyresidence",
  "listing_type": "sale",
  "year_built": "1998",
  "lot_size": "6997",
  "agent": "Acme Realty",
  "images": ["https://.../a.jpg", "https://.../b.jpg"],
  "description": "Charming family home with garden.",
  "geo": { "lat": "30.2672", "lng": "-97.7431" },
  "raw_prices": ["675000"],
  "method": "jsonld_realestate",
  "parse_confidence": "high",
  "extracted_at": "2026-06-24T10:00:00+00:00",
  "error": null
}

status is one of completed, failed, blocked, or empty. Any field that the page does not declare is returned as null (or []) — never invented.
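
Because numeric fields arrive as strings and undeclared fields as null, a consumer usually wants a small typing step before writing a row to a spreadsheet or database. The sketch below is one possible way to do that on the consumer side; `to_decimal` and `to_row` are hypothetical helper names, and the selection of columns is only an illustration.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def to_decimal(value) -> Optional[Decimal]:
    """Parse a numeric string; None or unparseable input stays None."""
    if value is None:
        return None
    try:
        return Decimal(str(value))
    except InvalidOperation:
        return None


def to_row(item: dict) -> dict:
    """Flatten one dataset item into a typed row; nulls are preserved."""
    geo = item.get("geo") or {}
    return {
        "url": item.get("url"),
        "ok": item.get("status") == "completed",
        "address": item.get("address"),
        "price": to_decimal(item.get("price")),
        "currency": item.get("currency"),
        "beds": to_decimal(item.get("beds")),
        "baths": to_decimal(item.get("baths")),
        "area_sqft": to_decimal(item.get("area_sqft")),
        "lat": to_decimal(geo.get("lat")),
        "lng": to_decimal(geo.get("lng")),
        "error": item.get("error"),
    }
```

`Decimal` is used rather than `float` so that a value such as "2.5" baths or a price is stored exactly as the page declared it.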

Use cases

  • Normalise a listing URL into a row for a CRM, spreadsheet, or database.
  • Pull price / beds / baths / area for a comparables (comps) sheet.
  • Monitor a single listing's price and status over time.
  • Enrich an internal dataset of listing URLs with structured fields.

How it works

Extraction precedence (most reliable first); the layer used is reported in method:

  1. schema.org JSON-LD: RealEstateListing, Residence (House, Apartment, SingleFamilyResidence, …), Accommodation/Place, and Product/Offer (price, currency, address, bedrooms, bathrooms, floor/lot size, year built, geo, images, broker/seller, sale-vs-rent via businessFunction).
  2. OpenGraph / product meta: og:title, og:image, product:price:amount, product:price:currency, location meta.
  3. Meta / heuristics: <title>/<h1> plus a conservative, currency-marked price detector (never infers a price from a bare number).
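
The three layers above can be illustrated for a single field, the price. This is a simplified sketch built on the Python standard library, not the actor's parser: it ignores `@graph` wrappers and list-valued `offers`, the names `extract_price`, `LISTING_TYPES` and `PRICE_RE` are hypothetical, and the method labels other than `jsonld_realestate` are invented for the example.

```python
import json
import re
from html.parser import HTMLParser

LISTING_TYPES = {"RealEstateListing", "Residence", "House", "Apartment",
                 "SingleFamilyResidence", "Accommodation", "Place", "Product"}
# A price must carry a currency mark; a bare number never matches.
PRICE_RE = re.compile(r"(?:[$£€]|\b(?:USD|GBP|EUR)\s?)(\d{1,3}(?:,\d{3})+|\d+)")


class MarkupCollector(HTMLParser):
    """Collects parsed JSON-LD blocks and <meta property=...> values."""

    def __init__(self):
        super().__init__()
        self.jsonld, self.meta = [], {}
        self._in_jsonld, self._buf = False, []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "script" and (a.get("type") or "").lower() == "application/ld+json":
            self._in_jsonld, self._buf = True, []
        elif tag == "meta" and a.get("property") and a.get("content"):
            self.meta.setdefault(a["property"], a["content"])

    def handle_data(self, data):
        if self._in_jsonld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.jsonld.append(json.loads("".join(self._buf)))
            except json.JSONDecodeError:
                pass  # malformed block: skip it, do not guess


def extract_price(html: str) -> dict:
    collector = MarkupCollector()
    collector.feed(html)

    # Layer 1: schema.org JSON-LD.
    for block in collector.jsonld:
        nodes = block if isinstance(block, list) else [block]
        for node in nodes:
            if not isinstance(node, dict):
                continue
            types = node.get("@type", [])
            types = set(types if isinstance(types, list) else [types])
            offers = node.get("offers")
            if types & LISTING_TYPES and isinstance(offers, dict) \
                    and offers.get("price") is not None:
                return {"price": str(offers["price"]),
                        "currency": offers.get("priceCurrency"),
                        "method": "jsonld_realestate"}

    # Layer 2: OpenGraph / product meta tags.
    amount = collector.meta.get("product:price:amount")
    if amount:
        return {"price": amount,
                "currency": collector.meta.get("product:price:currency"),
                "method": "opengraph"}  # hypothetical label

    # Layer 3: currency-marked price in <title>.
    title = re.search(r"<title[^>]*>(.*?)</title>", html, re.I | re.S)
    match = PRICE_RE.search(title.group(1)) if title else None
    if match:
        return {"price": match.group(1).replace(",", ""),
                "currency": None, "method": "heuristic"}  # hypothetical label
    return {"price": None, "currency": None, "method": "none"}
```

Each layer returns as soon as it finds a value, so a lower layer is consulted only when every higher one came up empty, and a page with no currency-marked price yields null rather than a guess.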

Areas declared in square metres (unitCode MTK) are converted to square feet. A code-owned parse_confidence (high/medium/low/none) reflects which layer matched and how many core fields were found.
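
As a worked example of the unit handling, the sketch below converts an area by its UN/CEFACT unit code (MTK for square metres, FTK for square feet). The function name `area_to_sqft`, the rounding to a whole number, and the choice to return None for an unrecognised or missing unit are assumptions made for illustration; the actor's own rounding rule is not documented here.

```python
from typing import Optional

# 1 ft = 0.3048 m exactly, so 1 m² = 1 / 0.3048² ≈ 10.7639 ft².
SQFT_PER_SQM = 1 / (0.3048 ** 2)


def area_to_sqft(value: str, unit_code: Optional[str]) -> Optional[str]:
    """Return the area in square feet as a string, or None if unknown."""
    try:
        amount = float(value)
    except (TypeError, ValueError):
        return None
    code = (unit_code or "").upper()
    if code == "MTK":        # square metres: convert
        amount *= SQFT_PER_SQM
    elif code != "FTK":      # anything else is not guessed (assumption)
        return None
    return str(round(amount))
```

For instance, a listing that declares 223 with unitCode MTK works out to 223 × 10.7639 ≈ 2400.35, which this sketch reports as "2400".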

Cost-safety

  • No proxy, no headless browser, no LLM, no paid API. One bounded HTTP GET per URL (hard caps: 5s connect / 10s read / 2 MB / 3 redirects).
  • No idle cost and no third-party cost beyond Apify compute, so there is nothing to subsidise.
  • SSRF-guarded and fail-closed: private/loopback/reserved IPs are blocked, with per-redirect re-validation, and a domain blocklist for bot-walled portals.
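
The fail-closed address check described above might look roughly like the following. It is a sketch under stated assumptions rather than the actor's fetch core: `is_public_ip` and `is_safe_target` are hypothetical names, the constants simply restate the caps from the list, and the `resolve` parameter exists only so the check can be exercised without real DNS.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

# Caps restated from the list above (seconds, bytes, hops).
CONNECT_TIMEOUT_S, READ_TIMEOUT_S = 5, 10
MAX_BYTES = 2 * 1024 * 1024
MAX_REDIRECTS = 3


def is_public_ip(addr: str) -> bool:
    """True only for addresses that are safe to fetch from."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        return False  # unparseable address: fail closed
    return not (ip.is_private or ip.is_loopback or ip.is_reserved
                or ip.is_link_local or ip.is_multicast or ip.is_unspecified)


def is_safe_target(url: str, resolve=socket.getaddrinfo) -> bool:
    """Resolve the host and require every resolved address to be public."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        infos = resolve(parts.hostname, port, proto=socket.IPPROTO_TCP)
    except OSError:
        return False  # DNS failure: fail closed
    addrs = {info[4][0] for info in infos}
    return bool(addrs) and all(is_public_ip(a) for a in addrs)
```

Per-redirect re-validation would mean calling `is_safe_target` again on every `Location` target, up to `MAX_REDIRECTS` hops, instead of letting the HTTP client follow redirects on its own. This sketch does not address DNS rebinding between the check and the actual connection.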

Limitations (honest)

  • This is single-page extraction, not bulk portal/MLS scraping. It fetches the page you give it and never follows links or paginates.
  • It only reads server-rendered markup. Pages that render entirely in the browser (heavy client-side JS) will expose little to a plain GET and return a low-confidence result.
  • Many large portals (Zillow, Realtor.com, Redfin, Rightmove, Zoopla, …) block bots and/or forbid scraping in their ToS — these are on a blocklist and return status: "blocked". Point the actor at a brokerage's or publisher's own listing page that exposes schema.org markup for best results.
  • Fields are only as good as the page's structured data. Missing data is returned as null; the actor never fabricates a value.
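
To illustrate the blocklist behaviour mentioned in the limitations, here is a minimal sketch of a hostname check that short-circuits to a blocked result before any request is made. The domain set is an assumed mapping of the portals named above to their public domains, not the actor's real blocklist, and `is_blocked` and `blocked_item` are hypothetical names.

```python
from urllib.parse import urlsplit

# Assumed domains for the portals named above; the real list may differ.
BLOCKED_DOMAINS = {"zillow.com", "realtor.com", "redfin.com",
                   "rightmove.co.uk", "zoopla.co.uk"}


def is_blocked(url: str) -> bool:
    """Match the host itself or any subdomain of a blocked domain."""
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS)


def blocked_item(url: str) -> dict:
    """Dataset item for a URL that is refused without being fetched."""
    return {"url": url, "status": "blocked",
            "error": "domain is on the blocklist"}
```

Matching on a leading dot keeps an unrelated host such as `notzillow.com` from being caught by the `zillow.com` entry.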