# Real Estate Listing Extractor (`timely_quarterstaff/real-estate-extractor`) Actor

Extract structured data from a SINGLE public real-estate listing page: address, price, beds, baths, area, property type, sale/rent, year built, agent, images, geo. schema.org JSON-LD -> OpenGraph -> heuristics. Pure code, SSRF-guarded, cost-safe (no proxy/headless/AI). Single-page, not bulk.

- **URL**: https://apify.com/timely\_quarterstaff/real-estate-extractor.md
- **Developed by:** [Ahmed Moussa](https://apify.com/timely_quarterstaff) (community)
- **Categories:** Real estate
- **Stats:** 2 total users, 1 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is paid per platform usage. The Actor itself is free to use; you pay only for Apify platform usage, which gets cheaper on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Real Estate Listing Extractor (single page)

Turn a single public real-estate **listing page URL** into a clean, structured
JSON record — deterministically, with no AI, no proxy, and no headless browser.

### What it does

Given one listing-page URL (or a small bounded batch), the Actor fetches the
page **once** and extracts structured listing data from its embedded
[schema.org](https://schema.org) markup. It is built on the same proven,
SSRF-guarded fetch core as the other OMEGA single-page extractors, with a
deterministic real-estate parser on top. Pure code — every field is computed,
never guessed by a language model.

### Input

| Field | Type | Description |
|-------|------|-------------|
| `url` | string | A single public real-estate listing page URL (include `https://`). |
| `urls` | array | Optional bounded list of extra listing URLs (max 50 per run). |

Example input:

```json
{
  "url": "https://www.example-realty.com/listing/123-maple-st"
}
````
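The input rules above (a full URL including the scheme, and at most 50 extra URLs) can be checked on the client before starting a run. The following is a minimal sketch, not part of the Actor: `build_input` and `MAX_EXTRA_URLS` are hypothetical names, and the de-duplication step is an assumption made here for convenience rather than documented Actor behaviour.

```python
from urllib.parse import urlparse

MAX_EXTRA_URLS = 50  # cap stated in the input table above


def build_input(urls):
    """Build an Actor input dict from an iterable of listing URLs.

    The first valid URL becomes `url`; the remaining ones go to `urls`.
    """
    seen, valid = set(), []
    for raw in urls:
        candidate = raw.strip()
        parts = urlparse(candidate)
        # Require an explicit http(s) scheme and a host.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            continue
        if candidate not in seen:  # client-side de-duplication (assumption)
            seen.add(candidate)
            valid.append(candidate)
    if not valid:
        raise ValueError("no valid listing URL supplied")
    run_input = {"url": valid[0]}
    extra = valid[1 : 1 + MAX_EXTRA_URLS]  # anything beyond the cap is dropped
    if extra:
        run_input["urls"] = extra
    return run_input
```

URLs beyond the cap are silently dropped in this sketch; a production integration would more likely split them across several runs.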

### Output

One dataset item per URL:

```json
{
  "url": "https://www.example-realty.com/listing/123-maple-st",
  "status": "completed",
  "address": "123 Maple St, Austin, TX, 78701, US",
  "price": "675000",
  "currency": "USD",
  "beds": "4",
  "baths": "2.5",
  "area_sqft": "2400",
  "property_type": "singlefamilyresidence",
  "listing_type": "sale",
  "year_built": "1998",
  "lot_size": "6997",
  "agent": "Acme Realty",
  "images": ["https://.../a.jpg", "https://.../b.jpg"],
  "description": "Charming family home with garden.",
  "geo": { "lat": "30.2672", "lng": "-97.7431" },
  "raw_prices": ["675000"],
  "method": "jsonld_realestate",
  "parse_confidence": "high",
  "extracted_at": "2026-06-24T10:00:00+00:00",
  "error": null
}
```

`status` is one of `completed`, `failed`, `blocked`, or `empty`. Any field that
the page does not declare is returned as `null` (or `[]`) — never invented.
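Because numeric fields arrive as strings and undeclared fields arrive as `null`, downstream code usually wants a small normalisation step. The sketch below is one possible approach, assuming the field names shown in the example output; `normalise_item`, `to_decimal`, and `NUMERIC_FIELDS` are hypothetical helpers, not part of the Actor.

```python
from decimal import Decimal, InvalidOperation

# Fields that the example output above shows as numeric strings.
NUMERIC_FIELDS = ("price", "beds", "baths", "area_sqft", "year_built", "lot_size")


def to_decimal(value):
    """Convert a string such as "2.5" to Decimal; None and bad input give None."""
    if value is None:
        return None
    try:
        return Decimal(str(value))
    except InvalidOperation:
        return None


def normalise_item(item):
    """Return a copy of a dataset item with numeric strings parsed.

    Items whose status is not "completed" yield None so callers can skip
    failed, blocked, or empty results.
    """
    if item.get("status") != "completed":
        return None
    row = dict(item)
    for field in NUMERIC_FIELDS:
        row[field] = to_decimal(item.get(field))
    return row
```

`Decimal` is used instead of `float` so that prices keep their exact declared value.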

### Use cases

- Normalise a listing URL into a row for a CRM, spreadsheet, or database.
- Pull price / beds / baths / area for a comparables (comps) sheet.
- Monitor a single listing's price and status over time.
- Enrich an internal dataset of listing URLs with structured fields.
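
For the spreadsheet and comps use cases, dataset items can be flattened to CSV with the standard library. This is a sketch under the assumption that items follow the output shape shown earlier; `items_to_csv` and the `COLUMNS` selection are illustrative choices, not something the Actor provides.

```python
import csv
import io

COLUMNS = ["url", "address", "price", "currency", "beds", "baths", "area_sqft"]


def items_to_csv(items):
    """Render completed dataset items as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for item in items:
        if item.get("status") == "completed":
            # Missing keys and None values are written as empty cells.
            writer.writerow({col: item.get(col) for col in COLUMNS})
    return buffer.getvalue()
```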

### How it works

Extraction precedence (most reliable first); the layer used is reported in
`method`:

1. **schema.org JSON-LD** — `RealEstateListing`, `Residence` (House, Apartment,
   SingleFamilyResidence, …), `Accommodation`/`Place`, and `Product`/`Offer`
   (price, currency, address, bedrooms, bathrooms, floor/lot size, year built,
   geo, images, broker/seller, sale-vs-rent via `businessFunction`).
2. **OpenGraph / product meta** — `og:title`, `og:image`, `product:price:amount`,
   `product:price:currency`, location meta.
3. **Meta / heuristics** — `<title>`/`<h1>` plus a conservative,
   currency-marked price detector (never infers a price from a bare number).

Areas declared in square metres (`unitCode` `MTK`) are converted to square feet.
A code-owned `parse_confidence` (high/medium/low/none) reflects which layer
matched and how many core fields were found.
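
The unit conversion mentioned above can be illustrated as follows. This is a sketch, assuming the area is given as a schema.org `QuantitativeValue`-style object with `value` and `unitCode`; `area_to_sqft` is a hypothetical helper, and returning `None` for unknown or missing units is an assumption rather than documented Actor behaviour.

```python
SQFT_PER_SQM = 10.7639104167  # 1 m² = (1 / 0.3048)² ft²


def area_to_sqft(quantity):
    """Convert a QuantitativeValue-like dict to square feet.

    Returns None when the value is missing, not numeric, or the unit
    code is not recognised.
    """
    value = quantity.get("value")
    if value is None:
        return None
    try:
        amount = float(value)
    except (TypeError, ValueError):
        return None
    unit = (quantity.get("unitCode") or "").upper()
    if unit == "MTK":  # UN/CEFACT code for square metres
        return amount * SQFT_PER_SQM
    if unit == "FTK":  # UN/CEFACT code for square feet
        return amount
    return None
```

For example, a listing that declares 223 m² comes out at roughly 2,400 sq ft.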

### Cost-safety

- **No proxy, no headless browser, no LLM, no paid API.** One bounded HTTP GET
  per URL (hard caps: 5s connect / 10s read / 2 MB / 3 redirects).
- **$0 idle and $0 uncovered cost** beyond Apify compute — nothing to subsidise.
- SSRF-guarded and fail-closed: private/loopback/reserved IPs are blocked, with
  per-redirect re-validation, and a domain blocklist for bot-walled portals.
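
As an illustration of the fail-closed IP check described above, the sketch below rejects any address that is not clearly public. It is not the Actor's implementation: `is_public_address` and `host_is_public` are hypothetical names, and the exact set of blocked ranges is an assumption based on the "private/loopback/reserved" wording.

```python
import ipaddress
import socket


def is_public_address(address):
    """Return True only for IP literals that look safe to fetch from.

    Fail closed: anything unparsable or non-public is rejected.
    """
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        return False
    return not (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
    )


def host_is_public(hostname):
    """Resolve a hostname and require every resolved address to be public."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return False  # unresolvable hosts are rejected, not retried
    return bool(infos) and all(is_public_address(info[4][0]) for info in infos)
```

To match the per-redirect re-validation described above, a check like `host_is_public` would need to run again on the target of every redirect hop before it is followed.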

### Limitations (honest)

- This is **single-page extraction, not bulk portal/MLS scraping.** It fetches
  the page you give it and never follows links or paginates.
- It only reads **server-rendered** markup. Pages that render entirely in the
  browser (heavy client-side JS) will expose little to a plain GET and return a
  low-confidence result.
- Many large portals (Zillow, Realtor.com, Redfin, Rightmove, Zoopla, …) block
  bots and/or forbid scraping in their ToS — these are on a blocklist and return
  `status: "blocked"`. Point the Actor at a brokerage's or publisher's own
  listing page that exposes schema.org markup for best results.
- Fields are only as good as the page's structured data. Missing data is
  returned as `null`; the Actor never fabricates a value.

# Actor input Schema

## `url` (type: `string`):

A single public real-estate LISTING page URL. The actor fetches this one page and parses structured listing data (address, price, beds, baths, area, property type, sale/rent, year built, lot size, agent, images, geo, ...) out of its schema.org RealEstateListing / Residence / Product / Offer JSON-LD, OpenGraph and meta tags. Provide a full URL including https://. Best results on pages that server-render schema.org RealEstateListing/Residence JSON-LD; pages that render entirely client-side or block bots will return a low-confidence or 'blocked' result (reported honestly, never fabricated).

## `urls` (type: `array`):

Optional bounded list of additional listing-page URLs, each processed exactly like 'url' and added to the same dataset. Processed in addition to 'url' and capped at 50 per run for cost-safety. This is single-page extraction, one record per URL, NOT bulk portal crawling (the actor never follows links or paginates).

## Actor input object example

```json
{
  "url": "https://www.coldwellbanker.com/",
  "urls": [
    "https://www.coldwellbanker.com/"
  ]
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "url": "https://www.coldwellbanker.com/",
    "urls": [
        "https://www.coldwellbanker.com/"
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("timely_quarterstaff/real-estate-extractor").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "url": "https://www.coldwellbanker.com/",
    "urls": ["https://www.coldwellbanker.com/"],
}

# Run the Actor and wait for it to finish
run = client.actor("timely_quarterstaff/real-estate-extractor").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "url": "https://www.coldwellbanker.com/",
  "urls": [
    "https://www.coldwellbanker.com/"
  ]
}' |
apify call timely_quarterstaff/real-estate-extractor --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=timely_quarterstaff/real-estate-extractor",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Real Estate Listing Extractor",
        "description": "Extract structured data from a SINGLE public real-estate listing page: address, price, beds, baths, area, property type, sale/rent, year built, agent, images, geo. schema.org JSON-LD -> OpenGraph -> heuristics. Pure code, SSRF-guarded, cost-safe (no proxy/headless/AI). Single-page, not bulk.",
        "version": "0.1",
        "x-build-id": "Dofu06CHqcp7a4abx"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/timely_quarterstaff~real-estate-extractor/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-timely_quarterstaff-real-estate-extractor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/timely_quarterstaff~real-estate-extractor/runs": {
            "post": {
                "operationId": "runs-sync-timely_quarterstaff-real-estate-extractor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/timely_quarterstaff~real-estate-extractor/run-sync": {
            "post": {
                "operationId": "run-sync-timely_quarterstaff-real-estate-extractor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "url": {
                        "title": "Listing page URL",
                        "type": "string",
                        "description": "A single public real-estate LISTING page URL. The actor fetches this one page and parses structured listing data (address, price, beds, baths, area, property type, sale/rent, year built, lot size, agent, images, geo, ...) out of its schema.org RealEstateListing / Residence / Product / Offer JSON-LD, OpenGraph and meta tags. Provide a full URL including https://. Best results on pages that server-render schema.org RealEstateListing/Residence JSON-LD; pages that render entirely client-side or block bots will return a low-confidence or 'blocked' result (reported honestly, never fabricated)."
                    },
                    "urls": {
                        "title": "Listing page URLs (batch)",
                        "type": "array",
                        "description": "Optional bounded list of additional listing-page URLs, each processed exactly like 'url' and added to the same dataset. Processed in addition to 'url' and capped at 50 per run for cost-safety. This is single-page extraction, one record per URL, NOT bulk portal crawling (the actor never follows links or paginates).",
                        "items": {
                            "type": "string"
                        }
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
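
For projects that cannot use the client libraries, the `run-sync-get-dataset-items` endpoint from the specification above can be called directly. The following is a minimal standard-library sketch; `build_run_request` and `run_actor` are hypothetical helper names, the 120-second timeout is an arbitrary assumption, and the response is assumed to be a JSON array of dataset items, as the endpoint summary suggests.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/timely_quarterstaff~real-estate-extractor/run-sync-get-dataset-items"


def build_run_request(token, run_input):
    """Build a POST request for the run-sync-get-dataset-items endpoint."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        f"{API_BASE}{ACTOR_PATH}?{query}",
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_actor(token, run_input, timeout=120):
    """Send the request and return the parsed response body."""
    request = build_run_request(token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage would look like `items = run_actor("<YOUR_API_TOKEN>", {"url": "https://www.example-realty.com/listing/123-maple-st"})`, followed by a check of each item's `status` before its fields are used.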
