Listing Sleuth
onyedikachi-david/listing-sleuth
An agentic real estate listing monitor that helps users find properties that match their specific criteria. This agent scrapes data from popular real estate platforms such as Zillow, Realtor.com, and Apartments.com to provide up-to-date information on available properties.
Developer
Maintained by Community
Created in Mar 2025
Modified 14 hours ago
.dockerignore
.git
.mise.toml
.nvim.lua
storage

# The rest is copied from https://github.com/github/gitignore/blob/main/Python.gitignore

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
.python-version

# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/latest/usage/project/#working-with-version-control
.pdm.toml
.pdm-python
.pdm-build/

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
.idea/
.gitignore
.mise.toml
.nvim.lua
storage

# The rest is copied from https://github.com/github/gitignore/blob/main/Python.gitignore

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
.python-version

# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/latest/usage/project/#working-with-version-control
.pdm.toml
.pdm-python
.pdm-build/

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
.idea/

# Added by Apify CLI
node_modules
INPUT.json
{
  "location": "San Francisco, CA",
  "propertyType": "apartment",
  "minBedrooms": 2,
  "maxBedrooms": 3,
  "minPrice": 1500,
  "maxPrice": 3000,
  "amenities": ["parking", "gym"],
  "searchType": "rent",
  "sources": ["zillow", "apartments"]
}
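This sample input exercises the local-testing fallback in `src/main.py`, which reads `INPUT.json` when no Actor input is supplied. A minimal sketch of that load-and-validate step (the `load_input` helper is ours, not part of the source):

```python
import json

# Sample input mirroring INPUT.json; in a real run the Actor reads this
# from the platform's key-value store or from the local file.
RAW_INPUT = """
{
  "location": "San Francisco, CA",
  "propertyType": "apartment",
  "minBedrooms": 2,
  "maxBedrooms": 3,
  "minPrice": 1500,
  "maxPrice": 3000,
  "amenities": ["parking", "gym"],
  "searchType": "rent",
  "sources": ["zillow", "apartments"]
}
"""

def load_input(raw: str) -> dict:
    """Parse the raw JSON and check the one required field, 'location'."""
    data = json.loads(raw)
    if not data.get("location"):
        raise ValueError("'location' is required")
    return data

criteria = load_input(RAW_INPUT)
```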
LICENSE
MIT License

Copyright (c) 2024 Listing Sleuth

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
requirements.txt
apify < 3.0
langchain-openai==0.3.6
langgraph==0.2.73
aiohttp>=3.8.0
langchain>=0.1.0
pydantic>=2.0.0
langchain-core>=0.1.0
langchain_community==0.3.19
# Imported by the source (src/search_agent.py, src/main.py) but missing above
crewai
python-dotenv
.actor/Dockerfile
# First, specify the base Docker image.
# You can see the Docker images from Apify at https://hub.docker.com/r/apify/.
# You can also use any other image from Docker Hub.
FROM apify/actor-python-playwright:3.13

# Install build dependencies first
RUN apt-get update && apt-get install -y build-essential gcc g++ python3-dev

# Second, copy just requirements.txt into the Actor image,
# since it should be the only file that affects the dependency install in the next step,
# in order to speed up the build
COPY requirements.txt ./

# Install the packages specified in requirements.txt,
# Print the installed Python version, pip version
# and all installed packages with their versions for debugging
RUN echo "Python version:" \
 && python --version \
 && echo "Pip version:" \
 && pip --version \
 && echo "Installing dependencies:" \
 && pip install --only-binary=:all: -r requirements.txt \
 && echo "All installed Python packages:" \
 && pip freeze

# Next, copy the remaining files and directories with the source code.
# Since we do this after installing the dependencies, quick build will be really fast
# for most source file changes.
COPY . ./

# Use compileall to ensure the runnability of the Actor Python code.
RUN python3 -m compileall -q .

# Specify how to launch the source code of your Actor.
# By default, the "python3 -m src" command is run
CMD ["python3", "-m", "src"]
.actor/actor.json
{
  "actorSpecification": 1,
  "name": "listing-sleuth",
  "title": "Listing Sleuth - Real Estate Monitor",
  "description": "Monitors real estate listings across multiple platforms based on user-specified criteria",
  "version": "0.1",
  "buildTag": "latest",
  "restart": {
    "horizontalScaling": true
  },
  "dockerfile": "./Dockerfile",
  "input": "./input_schema.json",
  "storages": {
    "dataset": "./dataset_schema.json"
  },
  "license": "MIT",
  "monetization": {
    "type": "pay-per-event",
    "enabled": true,
    "priceSchemaPath": "./pay_per_event.json"
  }
}
.actor/dataset_schema.json
{
  "actorSpecification": 1,
  "fields": {
    "type": "object",
    "properties": {
      "id": {
        "type": "string",
        "description": "Unique identifier for the property listing"
      },
      "title": {
        "type": "string",
        "description": "Property title or name"
      },
      "description": {
        "type": "string",
        "description": "Detailed description of the property"
      },
      "price": {
        "type": "number",
        "description": "Price of the property (in USD)"
      },
      "bedrooms": {
        "type": "number",
        "description": "Number of bedrooms"
      },
      "bathrooms": {
        "type": "number",
        "description": "Number of bathrooms"
      },
      "address": {
        "type": "string",
        "description": "Property address"
      },
      "property_type": {
        "type": "string",
        "description": "Type of property (apartment, house, condo, etc.)"
      },
      "source": {
        "type": "string",
        "description": "Source of the listing (zillow, realtor, apartments, etc.)"
      },
      "url": {
        "type": "string",
        "description": "Link to the original listing"
      },
      "amenities": {
        "type": "array",
        "description": "List of amenities available at the property",
        "items": {
          "type": "string"
        }
      },
      "listed_date": {
        "type": "string",
        "description": "Date when the property was listed"
      },
      "is_new": {
        "type": "boolean",
        "description": "Whether this is a new listing since last search"
      }
    }
  },
  "views": {
    "overview": {
      "title": "Property Listings",
      "description": "Real estate property listings matching the search criteria",
      "transformation": {
        "fields": [
          "id",
          "title",
          "price",
          "bedrooms",
          "bathrooms",
          "address",
          "property_type",
          "source",
          "url",
          "listed_date",
          "is_new"
        ]
      },
      "display": {
        "component": "table",
        "properties": {
          "id": {
            "label": "ID",
            "format": "text"
          },
          "title": {
            "label": "Title",
            "format": "text"
          },
          "price": {
            "label": "Price",
            "format": "number"
          },
          "bedrooms": {
            "label": "Bedrooms",
            "format": "number"
          },
          "bathrooms": {
            "label": "Bathrooms",
            "format": "number"
          },
          "address": {
            "label": "Address",
            "format": "text"
          },
          "property_type": {
            "label": "Property Type",
            "format": "text"
          },
          "source": {
            "label": "Source",
            "format": "text"
          },
          "url": {
            "label": "URL",
            "format": "link"
          },
          "listed_date": {
            "label": "Listed Date",
            "format": "date"
          },
          "is_new": {
            "label": "New Listing",
            "format": "boolean"
          }
        }
      }
    },
    "details": {
      "title": "Detailed View",
      "description": "Detailed information about property listings",
      "transformation": {
        "fields": [
          "id",
          "title",
          "description",
          "price",
          "bedrooms",
          "bathrooms",
          "address",
          "property_type",
          "source",
          "url",
          "amenities",
          "listed_date",
          "is_new"
        ]
      },
      "display": {
        "component": "table",
        "properties": {
          "description": {
            "label": "Description",
            "format": "text"
          },
          "amenities": {
            "label": "Amenities",
            "format": "array"
          }
        }
      }
    }
  }
}
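Each dataset view projects a full record down to the fields listed in its `transformation`. A small sketch of that projection for the `overview` view, using a hypothetical listing record (all values invented for illustration):

```python
# Hypothetical record carrying every field declared in dataset_schema.json.
record = {
    "id": "zillow-12345",
    "title": "Sunny 2BR near the park",
    "description": "Top-floor unit with updated kitchen.",
    "price": 2800,
    "bedrooms": 2,
    "bathrooms": 1,
    "address": "123 Example St, San Francisco, CA",
    "property_type": "apartment",
    "source": "zillow",
    "url": "https://example.com/listing/12345",
    "amenities": ["parking", "gym"],
    "listed_date": "2025-03-01",
    "is_new": True,
}

# Field list copied from the "overview" view's transformation above.
OVERVIEW_FIELDS = [
    "id", "title", "price", "bedrooms", "bathrooms", "address",
    "property_type", "source", "url", "listed_date", "is_new",
]

def to_overview(item: dict) -> dict:
    """Project a full record down to the overview-view columns."""
    return {field: item.get(field) for field in OVERVIEW_FIELDS}

overview_row = to_overview(record)
```

Note that `description` and `amenities` only surface in the `details` view; the overview table stays compact.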
.actor/input_schema.json
{
  "title": "Listing Sleuth - Real Estate Monitor",
  "type": "object",
  "schemaVersion": 1,
  "properties": {
    "location": {
      "title": "Location",
      "type": "string",
      "description": "City or neighborhood to search in (e.g., 'San Francisco, CA')",
      "editor": "textfield"
    },
    "propertyType": {
      "title": "Property Type",
      "type": "string",
      "description": "Type of property to look for",
      "enum": ["apartment", "house", "condo", "townhouse", "any"],
      "enumTitles": ["Apartment", "House", "Condo", "Townhouse", "Any"],
      "default": "any",
      "editor": "select"
    },
    "minBedrooms": {
      "title": "Minimum Bedrooms",
      "type": "integer",
      "description": "Minimum number of bedrooms",
      "default": 1,
      "minimum": 0,
      "editor": "number"
    },
    "maxBedrooms": {
      "title": "Maximum Bedrooms",
      "type": "integer",
      "description": "Maximum number of bedrooms (leave blank for no maximum)",
      "minimum": 0,
      "nullable": true,
      "editor": "number"
    },
    "minPrice": {
      "title": "Minimum Price",
      "type": "integer",
      "description": "Minimum price (in USD)",
      "default": 0,
      "minimum": 0,
      "editor": "number"
    },
    "maxPrice": {
      "title": "Maximum Price",
      "type": "integer",
      "description": "Maximum price (in USD)",
      "minimum": 0,
      "nullable": true,
      "editor": "number"
    },
    "amenities": {
      "title": "Amenities",
      "type": "array",
      "description": "Desired amenities for the property",
      "editor": "stringList",
      "default": []
    },
    "searchType": {
      "title": "Search Type",
      "type": "string",
      "description": "Type of search to perform",
      "enum": ["rent", "buy"],
      "enumTitles": ["Rent", "Buy"],
      "default": "rent",
      "editor": "select"
    },
    "sources": {
      "title": "Data Sources",
      "type": "array",
      "description": "Sources to search for listings",
      "editor": "stringList",
      "default": ["zillow", "realtor", "apartments"]
    },
    "llmApiToken": {
      "title": "LLM API Token",
      "type": "string",
      "description": "OpenAI API token for processing results (optional)",
      "editor": "textfield",
      "nullable": true
    }
  },
  "required": ["location"]
}
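The defaults and enums above determine how a sparse input resolves. A sketch of that resolution (the `apply_schema` helper is illustrative only; on the platform, Apify performs this validation from the schema itself):

```python
# Defaults and allowed values transcribed from input_schema.json.
DEFAULTS = {
    "propertyType": "any",
    "minBedrooms": 1,
    "minPrice": 0,
    "amenities": [],
    "searchType": "rent",
    "sources": ["zillow", "realtor", "apartments"],
}
PROPERTY_TYPES = {"apartment", "house", "condo", "townhouse", "any"}
SEARCH_TYPES = {"rent", "buy"}

def apply_schema(raw: dict) -> dict:
    """Fill in defaults and reject values outside the schema's enums."""
    if "location" not in raw:
        raise ValueError("'location' is required")
    data = {**DEFAULTS, **raw}
    if data["propertyType"] not in PROPERTY_TYPES:
        raise ValueError(f"invalid propertyType: {data['propertyType']}")
    if data["searchType"] not in SEARCH_TYPES:
        raise ValueError(f"invalid searchType: {data['searchType']}")
    return data

resolved = apply_schema({"location": "Austin, TX", "searchType": "buy"})
```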
.actor/pay_per_event.json
{
  "actor-start": {
    "eventTitle": "Search Initiated",
    "eventDescription": "Flat fee for starting a real estate search.",
    "eventPriceUsd": 0.1
  },
  "property-found": {
    "eventTitle": "Property Found",
    "eventDescription": "Fee for each property matching your criteria.",
    "eventPriceUsd": 0.05
  },
  "search-completed": {
    "eventTitle": "Search Completed",
    "eventDescription": "Fee for completing a full property search across all selected platforms.",
    "eventPriceUsd": 0.3
  }
}
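Under this schema, the cost of one run is the start fee, plus a per-property fee, plus the completion fee. A quick estimator, assuming every event is charged exactly as `src/main.py` fires them:

```python
# Prices transcribed from pay_per_event.json.
PRICES_USD = {
    "actor-start": 0.10,
    "property-found": 0.05,
    "search-completed": 0.30,
}

def estimate_run_cost(properties_found: int) -> float:
    """Total charge for one run: start fee + per-property fees + completion fee."""
    return round(
        PRICES_USD["actor-start"]
        + properties_found * PRICES_USD["property-found"]
        + PRICES_USD["search-completed"],
        2,
    )
```

So a run that matches ten properties charges $0.10 + 10 × $0.05 + $0.30 = $0.90.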
src/__init__.py
src/__main__.py
import asyncio

from .main import main

# Execute the Actor entry point.
asyncio.run(main())
src/main.py
"""Main entry point for the Listing Sleuth Apify Actor.

This module contains the main entry point for the Actor, which searches for real estate
listings based on user-specified criteria.
"""

import os
import sys
import json

from apify import Actor
from dotenv import load_dotenv

from .models.property import SearchCriteria
from .search_agent import SearchAgentCrew

# Load environment variables from .env file if present
load_dotenv()


async def main() -> None:
    """Main entry point for the Apify Actor.

    This function initializes the Actor, processes input data, runs the search agent,
    and saves the results to the Actor's dataset.
    """
    # Enter the context of the Actor.
    async with Actor:
        Actor.log.info("Listing Sleuth is starting...")

        # Charge for actor start
        await Actor.charge('actor-start')

        # Retrieve the Actor input, and use default values if not provided.
        actor_input = await Actor.get_input() or {}

        # For local testing, fall back to INPUT.json if actor_input is empty
        if not actor_input or 'location' not in actor_input:
            try:
                if os.path.exists('INPUT.json'):
                    with open('INPUT.json', 'r') as f:
                        actor_input = json.load(f)
                    Actor.log.info(f"Loaded input from INPUT.json: {actor_input}")
            except Exception as e:
                Actor.log.error(f"Error loading from INPUT.json: {e}")

        Actor.log.info(f"Using input: {actor_input}")

        # Parse location (required)
        location = actor_input.get("location")
        if not location:
            Actor.log.error("No location specified in Actor input, exiting...")
            sys.exit(1)

        # Parse other inputs with defaults
        property_type = actor_input.get("propertyType", "any")
        min_bedrooms = int(actor_input.get("minBedrooms", 1))
        max_bedrooms = actor_input.get("maxBedrooms")
        if max_bedrooms is not None:
            max_bedrooms = int(max_bedrooms)

        min_price = float(actor_input.get("minPrice", 0))
        max_price = actor_input.get("maxPrice")
        if max_price is not None:
            max_price = float(max_price)

        # Amenities as a list
        amenities = actor_input.get("amenities", [])

        # Search type (rent/buy)
        search_type = actor_input.get("searchType", "rent")

        # Data sources to search
        sources = actor_input.get("sources", ["zillow", "realtor", "apartments"])

        # LLM API token (optional)
        llm_api_token = actor_input.get("llmApiToken") or os.environ.get("OPENAI_API_KEY")

        # Create search criteria
        search_criteria = SearchCriteria(
            location=location,
            property_type=property_type,
            min_bedrooms=min_bedrooms,
            max_bedrooms=max_bedrooms,
            min_price=min_price,
            max_price=max_price,
            amenities=amenities,
            search_type=search_type,
            sources=sources,
            llm_api_token=llm_api_token
        )

        Actor.log.info(f"Search criteria: {search_criteria}")

        # Create and run the search agent
        search_agent = SearchAgentCrew(search_criteria)
        results = search_agent.run()

        # Charge for each property found
        if results.total_results > 0:
            await Actor.charge('property-found', count=results.total_results)

        # Log results
        Actor.log.info(f"Search complete. Found {results.total_results} properties.")
        Actor.log.info(f"New listings: {results.new_results}")

        # Charge for search completion
        await Actor.charge('search-completed')

        # The results have already been saved to the dataset by the search agent
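The numeric coercion above (ints and floats with defaults, `None` preserved for open-ended maxima) can be factored into a pure helper. A sketch with an invented name, `parse_bounds`, which is not part of the source:

```python
def parse_bounds(actor_input: dict) -> dict:
    """Coerce the optional numeric fields the same way main() does:
    ints/floats with schema defaults, and None kept for open-ended maxima."""
    max_bedrooms = actor_input.get("maxBedrooms")
    max_price = actor_input.get("maxPrice")
    return {
        "min_bedrooms": int(actor_input.get("minBedrooms", 1)),
        "max_bedrooms": int(max_bedrooms) if max_bedrooms is not None else None,
        "min_price": float(actor_input.get("minPrice", 0)),
        "max_price": float(max_price) if max_price is not None else None,
    }
```

A pure function like this is trivially unit-testable, unlike the inline parsing inside `async with Actor:`.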
src/py.typed
src/search_agent.py
"""Search agent for real estate properties."""

import os
import json
from typing import List, Dict, Any, Optional, Tuple
from datetime import datetime

from crewai import Agent, Task, Crew
from langchain.tools import BaseTool
from langchain_openai import ChatOpenAI
from apify import Actor

from .models.property import PropertyListing, SearchCriteria, SearchResults
from .scrapers.zillow import ZillowScraper
from .scrapers.realtor import RealtorScraper
from .scrapers.apartments import ApartmentsScraper
from .utils.llm import filter_properties_with_llm, summarize_property
from .utils.storage import (
    load_previous_results,
    mark_new_listings,
    save_search_results,
    push_results_to_dataset
)

class SearchTool(BaseTool):
    """Tool for searching real estate listings."""

    # Annotated as pydantic fields so BaseTool's model validation accepts them.
    name: str = "search_real_estate"
    description: str = "Search for real estate listings based on search criteria"
    search_criteria: Optional[SearchCriteria] = None

    def __init__(self, search_criteria: SearchCriteria):
        """Initialize the search tool.

        Args:
            search_criteria: Search criteria
        """
        super().__init__()
        self.search_criteria = search_criteria

    def _run(self, query: str) -> Dict[str, Any]:
        """Run the search tool.

        Args:
            query: Search query (not used, but required by BaseTool)

        Returns:
            Search results
        """
        # Initialize scrapers
        scrapers = []
        if "zillow" in self.search_criteria.sources:
            scrapers.append(ZillowScraper(self.search_criteria))
        if "realtor" in self.search_criteria.sources:
            scrapers.append(RealtorScraper(self.search_criteria))
        if "apartments" in self.search_criteria.sources:
            scrapers.append(ApartmentsScraper(self.search_criteria))

        # Run scrapers
        all_listings = []
        sources_searched = []

        for scraper in scrapers:
            try:
                listings = scraper.scrape()
                all_listings.extend(listings)
                sources_searched.append(scraper.source_name)
            except Exception as e:
                Actor.log.exception(f"Error scraping {scraper.source_name}: {e}")

        # Load previous results
        previous_results = load_previous_results(self.search_criteria)

        # Mark new listings
        marked_listings = mark_new_listings(all_listings, previous_results)

        # Create search results
        results = SearchResults(
            search_criteria=self.search_criteria,
            results=marked_listings,
            total_results=len(marked_listings),
            new_results=sum(1 for listing in marked_listings if listing.is_new),
            sources_searched=sources_searched
        )

        # Save results
        save_search_results(results)
        push_results_to_dataset(results)

        # Return results
        return {
            "total_results": results.total_results,
            "new_results": results.new_results,
            "sources_searched": results.sources_searched,
            "search_date": results.search_date.isoformat()
        }

    async def _arun(self, query: str) -> Dict[str, Any]:
        """Async version of _run.

        Args:
            query: Search query

        Returns:
            Search results
        """
        return self._run(query)

class FilterTool(BaseTool):
    """Tool for filtering property listings with LLM."""

    # Annotated as pydantic fields so BaseTool's model validation accepts them.
    name: str = "filter_properties"
    description: str = "Filter property listings based on search criteria using LLM"
    search_criteria: Optional[SearchCriteria] = None

    def __init__(self, search_criteria: SearchCriteria):
        """Initialize the filter tool.

        Args:
            search_criteria: Search criteria
        """
        super().__init__()
        self.search_criteria = search_criteria

    def _run(self, query: str) -> Dict[str, Any]:
        """Run the filter tool.

        Args:
            query: Filter query (not used, but required by BaseTool)

        Returns:
            Filtered search results
        """
        # Try to load saved results
        try:
            results_dict = None

            # Try to load from Apify KV store if available
            if hasattr(Actor, 'main_kv_store'):
                results_dict = Actor.main_kv_store.get_value("search_results")
            # Otherwise try to load from local file
            elif os.path.exists("storage/key_value_stores/search_results.json"):
                with open("storage/key_value_stores/search_results.json", "r") as f:
                    results_dict = json.load(f)

            if not results_dict:
                return {"error": "No search results found"}

            # Convert to SearchResults
            search_results = SearchResults(**results_dict)

            if not search_results.results:
                return {"error": "No results to filter"}

            # Filter results with LLM if token is available
            if self.search_criteria.llm_api_token:
                filtered_listings = filter_properties_with_llm(
                    search_results.results,
                    self.search_criteria,
                    self.search_criteria.llm_api_token
                )

                # Update results
                search_results.results = filtered_listings
                search_results.total_results = len(filtered_listings)

                # Save filtered results
                save_search_results(search_results)

                return {
                    "total_results_after_filtering": len(filtered_listings),
                    "filter_date": datetime.now().isoformat()
                }
            else:
                return {"error": "No LLM API token provided for filtering"}

        except Exception as e:
            Actor.log.exception(f"Error filtering properties: {e}")
            return {"error": str(e)}

    async def _arun(self, query: str) -> Dict[str, Any]:
        """Async version of _run.

        Args:
            query: Filter query

        Returns:
            Filtered search results
        """
        return self._run(query)

class SummarizeTool(BaseTool):
    """Tool for summarizing property listings."""

    # Annotated as pydantic fields so BaseTool's model validation accepts them.
    name: str = "summarize_properties"
    description: str = "Generate summaries of property listings"
    search_criteria: Optional[SearchCriteria] = None

    def __init__(self, search_criteria: SearchCriteria):
        """Initialize the summarize tool.

        Args:
            search_criteria: Search criteria
        """
        super().__init__()
        self.search_criteria = search_criteria

    def _run(self, query: str) -> Dict[str, Any]:
        """Run the summarize tool.

        Args:
            query: Summarize query (not used, but required by BaseTool)

        Returns:
            Summarized search results
        """
        # Try to load saved results
        try:
            results_dict = None

            # Try to load from Apify KV store if available
            if hasattr(Actor, 'main_kv_store'):
                results_dict = Actor.main_kv_store.get_value("search_results")
            # Otherwise try to load from local file
            elif os.path.exists("storage/key_value_stores/search_results.json"):
                with open("storage/key_value_stores/search_results.json", "r") as f:
                    results_dict = json.load(f)

            if not results_dict:
                return {"error": "No search results found"}

            # Convert to SearchResults
            search_results = SearchResults(**results_dict)

            if not search_results.results:
                return {"error": "No results to summarize"}

            # Generate summaries if LLM API token is available
            if self.search_criteria.llm_api_token:
                summaries = []

                for listing in search_results.results:
                    summary = summarize_property(listing, self.search_criteria.llm_api_token)
                    summaries.append({
                        "id": listing.id,
                        "summary": summary,
                        "is_new": listing.is_new
                    })

                return {
                    "summaries": summaries,
                    "total_summaries": len(summaries),
                    "summarize_date": datetime.now().isoformat()
                }
            else:
                # Generate basic summaries without LLM
                summaries = []

                for listing in search_results.results:
                    basic_summary = (
                        f"{listing.title}: {listing.bedrooms} bed, "
                        f"{listing.bathrooms or 'unknown'} bath {listing.property_type} "
                        f"for ${listing.price:,.2f} in {listing.address.city}, "
                        f"{listing.address.state}."
                    )

                    summaries.append({
                        "id": listing.id,
                        "summary": basic_summary,
                        "is_new": listing.is_new
                    })

                return {
                    "summaries": summaries,
                    "total_summaries": len(summaries),
                    "summarize_date": datetime.now().isoformat()
                }

        except Exception as e:
            Actor.log.exception(f"Error summarizing properties: {e}")
            return {"error": str(e)}

    async def _arun(self, query: str) -> Dict[str, Any]:
        """Async version of _run.

        Args:
            query: Summarize query

        Returns:
            Summarized search results
        """
        return self._run(query)

297class SearchAgentCrew:
298 """Crew of agents for property search."""
299
300 def __init__(self, search_criteria: SearchCriteria):
301 """Initialize the search agent crew.
302
303 Args:
304 search_criteria: Search criteria
305 """
306 self.search_criteria = search_criteria
307 self.llm = None
308
309 # Initialize LLM if token is provided
310 if search_criteria.llm_api_token:
311 self.llm = ChatOpenAI(
312 api_key=search_criteria.llm_api_token,
313 temperature=0,
314 model="gpt-3.5-turbo"
315 )
316
317 def run(self) -> SearchResults:
318 """Run the search agent crew.
319
320 Returns:
321 Search results
322 """
323 # If no LLM, just run the search directly
324 if not self.llm:
325 Actor.log.info("No LLM API token provided, running basic search without agents")
326 search_tool = SearchTool(self.search_criteria)
327 search_tool._run("")
328
329 # Load and return results
330 try:
331 # Try loading from Apify KV store if available
332 if hasattr(Actor, 'main_kv_store'):
333 results_dict = Actor.main_kv_store.get_value("search_results")
334 # Otherwise try to load from local file
335 elif os.path.exists("storage/key_value_stores/search_results.json"):
336 with open("storage/key_value_stores/search_results.json", "r") as f:
337 results_dict = json.load(f)
338 else:
339 results_dict = None
340
341 if results_dict:
342 return SearchResults(**results_dict)
343 except Exception as e:
344 Actor.log.error(f"Error loading search results: {e}")
345
346 # Create empty results if loading failed
347 return SearchResults(
348 search_criteria=self.search_criteria,
349 results=[],
350 total_results=0,
351 new_results=0,
352 sources_searched=[]
353 )
354
355 # Create tools
356 search_tool = SearchTool(self.search_criteria)
357 filter_tool = FilterTool(self.search_criteria)
358 summarize_tool = SummarizeTool(self.search_criteria)
359
360 # Create agents
361 search_agent = Agent(
362 role="Real Estate Search Specialist",
363 goal="Find properties that match the search criteria",
364 backstory="You are an expert in finding real estate listings across multiple platforms.",
365 verbose=True,
            allow_delegation=True,
            tools=[search_tool],
            llm=self.llm
        )

        filter_agent = Agent(
            role="Property Filter Specialist",
            goal="Filter properties to find the best matches for the user",
            backstory="You are an expert in analyzing property details and matching them with user preferences.",
            verbose=True,
            allow_delegation=True,
            tools=[filter_tool],
            llm=self.llm
        )

        summarize_agent = Agent(
            role="Property Summarizer",
            goal="Create concise, informative summaries of properties",
            backstory="You are skilled at creating appealing property descriptions that highlight key features.",
            verbose=True,
            allow_delegation=True,
            tools=[summarize_tool],
            llm=self.llm
        )

        # Create tasks
        search_task = Task(
            description=(
                f"Search for properties in {self.search_criteria.location} "
                f"with {self.search_criteria.min_bedrooms}+ bedrooms, "
                f"maximum price of ${self.search_criteria.max_price or 'any'}, "
                f"property type: {self.search_criteria.property_type}. "
                f"Search sources: {', '.join(self.search_criteria.sources)}."
            ),
            agent=search_agent,
            expected_output="A report of the total number of properties found"
        )

        filter_task = Task(
            description=(
                "Filter the search results to find properties that best match "
                f"the user's criteria, especially regarding amenities: {', '.join(self.search_criteria.amenities)}"
            ),
            agent=filter_agent,
            expected_output="A report of how many properties passed the filtering"
        )

        summarize_task = Task(
            description=(
                "Create summaries for each property highlighting key features. "
                "Mark new listings that weren't found in previous searches."
            ),
            agent=summarize_agent,
            expected_output="Summaries of each property"
        )

        # Create crew
        crew = Crew(
            agents=[search_agent, filter_agent, summarize_agent],
            tasks=[search_task, filter_task, summarize_task],
            verbose=True
        )

        # Run the crew
        try:
            result = crew.kickoff()

            # Load and return results
            try:
                # Try loading from the Apify key-value store if available
                if hasattr(Actor, 'main_kv_store'):
                    results_dict = Actor.main_kv_store.get_value("search_results")
                # Otherwise try to load from a local file
                elif os.path.exists("storage/key_value_stores/search_results.json"):
                    with open("storage/key_value_stores/search_results.json", "r") as f:
                        results_dict = json.load(f)
                else:
                    results_dict = None

                if results_dict:
                    return SearchResults(**results_dict)
            except Exception as e:
                Actor.log.error(f"Error loading search results: {e}")
        except Exception as e:
            Actor.log.error(f"Error running crew: {e}")

        # If we get here, either there was an error or no results were found,
        # so return empty results
        return SearchResults(
            search_criteria=self.search_criteria,
            results=[],
            total_results=0,
            new_results=0,
            sources_searched=[]
        )
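The result-loading fallback chain above (key-value store, then local JSON file, then empty results) can be exercised in isolation. This is a minimal stdlib-only sketch, using plain dicts in place of the `SearchResults` model and a hypothetical `kv_store` stub:

```python
import json
import os
import tempfile

def load_results(kv_store=None, local_path="storage/key_value_stores/search_results.json"):
    """Return results from a KV store if present, else a local JSON file, else None."""
    if kv_store is not None:
        return kv_store.get_value("search_results")
    if os.path.exists(local_path):
        with open(local_path, "r") as f:
            return json.load(f)
    return None

# Usage: with no KV store available, fall back to a local file
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "search_results.json")
    with open(path, "w") as f:
        json.dump({"total_results": 2}, f)
    print(load_results(local_path=path))  # {'total_results': 2}
```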
src/agents/__init__.py
"""Agent classes for Listing Sleuth."""
src/scrapers/__init__.py
"""Scrapers for real estate platforms."""
src/scrapers/apartments.py
"""Apartments.com scraper."""

import re
import json
import uuid
from typing import Dict, Any, List, Optional
from datetime import datetime
from pydantic import HttpUrl

from apify import Actor
from apify_client import ApifyClient

from .base import BaseScraper
from ..models.property import PropertyListing, Address, SearchCriteria


class ApartmentsScraper(BaseScraper):
    """Apartments.com scraper."""

    @property
    def actor_id(self) -> str:
        """Get the Apify actor ID for Apartments.com.

        Returns:
            Actor ID
        """
        return "epctex/apartments-scraper"

    @property
    def source_name(self) -> str:
        """Get the source name.

        Returns:
            Source name
        """
        return "apartments"

    def prepare_input(self) -> Dict[str, Any]:
        """Prepare input for the Apartments.com scraper.

        Returns:
            Actor input
        """
        # Parse location into city and state
        location_parts = self.search_criteria.location.split(",")
        city = location_parts[0].strip().replace(" ", "-").lower()
        state = ""
        if len(location_parts) > 1:
            state = location_parts[1].strip().lower()

        # Construct the location slug for the URL
        if state:
            location_url = f"{city}-{state}"
        else:
            location_url = city

        # Base URL
        base_url = f"https://www.apartments.com/{location_url}"

        # Start building search parameters
        search_params = {}

        # Bedrooms filter
        if self.search_criteria.min_bedrooms > 0 and self.search_criteria.max_bedrooms:
            if self.search_criteria.min_bedrooms == self.search_criteria.max_bedrooms:
                search_params["br"] = str(self.search_criteria.min_bedrooms)
            else:
                search_params["br-min"] = str(self.search_criteria.min_bedrooms)
                search_params["br-max"] = str(self.search_criteria.max_bedrooms)
        elif self.search_criteria.min_bedrooms > 0:
            search_params["br-min"] = str(self.search_criteria.min_bedrooms)
        elif self.search_criteria.max_bedrooms:
            search_params["br-max"] = str(self.search_criteria.max_bedrooms)

        # Price filter
        if self.search_criteria.min_price > 0:
            search_params["price-min"] = str(int(self.search_criteria.min_price))
        if self.search_criteria.max_price:
            search_params["price-max"] = str(int(self.search_criteria.max_price))

        # Property type - Apartments.com primarily lists apartments, but can filter other types
        if self.search_criteria.property_type not in ("any", "apartment"):
            search_params["type"] = self.search_criteria.property_type

        return {
            "startUrls": [{"url": base_url}],
            "searchParams": search_params,
            "maxItems": self.max_items,
            "extendOutputFunction": """async ({ data, item, customData, Apify }) => {
                return { ...item };
            }""",
            "proxy": {
                "useApifyProxy": True,
                "apifyProxyGroups": ["RESIDENTIAL"]
            }
        }

    def transform_item(self, item: Dict[str, Any]) -> PropertyListing:
        """Transform an Apartments.com listing into a PropertyListing.

        Args:
            item: Apartments.com listing

        Returns:
            PropertyListing
        """
        # Parse price
        price_str = item.get("rent", "0")
        if isinstance(price_str, str):
            # Extract digits from the price string
            price_match = re.search(r'(\d{1,3}(?:,\d{3})*(?:\.\d+)?)', price_str)
            if price_match:
                price_clean = price_match.group(1).replace(",", "")
                price = float(price_clean)
            else:
                price = 0
        else:
            price = float(price_str) if price_str else 0

        # Parse address
        property_address = item.get("propertyAddress", {})
        address_line = property_address.get("addressLine", "")
        neighborhood = property_address.get("neighborhood", "")
        city = property_address.get("city", "")
        state = property_address.get("state", "")
        postal_code = property_address.get("postalCode", None)

        address = Address(
            street=address_line,
            city=city or neighborhood,  # Use neighborhood if city is missing
            state=state,
            zip_code=postal_code
        )

        # Parse bedrooms
        bedrooms = 0
        beds = item.get("beds", 0)
        if isinstance(beds, str):
            bed_match = re.search(r'(\d+\.?\d*)', beds)
            bedrooms = float(bed_match.group(1)) if bed_match else 0
        else:
            bedrooms = float(beds) if beds else 0

        # Parse bathrooms
        bathrooms = None
        baths = item.get("baths", None)
        if baths:
            if isinstance(baths, str):
                bath_match = re.search(r'(\d+\.?\d*)', baths)
                bathrooms = float(bath_match.group(1)) if bath_match else None
            else:
                bathrooms = float(baths)

        # Parse square feet
        sqft = None
        sqft_str = item.get("sqft", None)
        if sqft_str:
            if isinstance(sqft_str, str):
                sqft_match = re.search(r'(\d+)', sqft_str.replace(',', ''))
                sqft = int(sqft_match.group(1)) if sqft_match else None
            else:
                sqft = int(sqft_str)

        # Determine property type
        property_type = "apartment"  # Default for Apartments.com
        title = item.get("title", "").lower()
        description = item.get("description", "").lower()
        if "condo" in title or "condo" in description:
            property_type = "condo"
        elif "townhouse" in title or "townhouse" in description:
            property_type = "townhouse"
        elif "house" in title and "townhouse" not in title:
            property_type = "house"

        # Get URL
        url = item.get("url", "")

        # Get images
        images = []
        photos = item.get("photos", [])
        if isinstance(photos, list):
            for photo in photos:
                if isinstance(photo, dict) and "url" in photo:
                    images.append(photo["url"])
                elif isinstance(photo, str) and photo.startswith("http"):
                    images.append(photo)

        # Extract amenities
        amenities = []

        # Add apartment amenities
        apartment_amenities = item.get("apartmentAmenities", [])
        if isinstance(apartment_amenities, list):
            amenities.extend(apartment_amenities)

        # Add community amenities
        community_amenities = item.get("communityAmenities", [])
        if isinstance(community_amenities, list):
            amenities.extend(community_amenities)

        # Also use the base extract_amenities method to catch any missed ones
        amenities.extend(self.extract_amenities(item))

        # Remove duplicates while preserving order
        amenities = list(dict.fromkeys(amenities))

        # Generate a unique ID
        property_id = str(item.get("id", uuid.uuid4()))

        # Collect remaining fields as additional features
        additional_features = {}
        for key, value in item.items():
            if key not in [
                "rent", "propertyAddress", "beds", "baths", "sqft", "url", "photos",
                "apartmentAmenities", "communityAmenities", "id", "title", "description",
            ]:
                additional_features[key] = value

        # Parse the listing date if available
        listed_date = None
        date_str = item.get("dateAvailable", item.get("datePosted", None))
        if date_str and isinstance(date_str, str):
            try:
                # Try common date formats
                for fmt in ["%Y-%m-%d", "%m/%d/%Y", "%b %d, %Y"]:
                    try:
                        listed_date = datetime.strptime(date_str, fmt)
                        break
                    except ValueError:
                        continue
            except Exception:
                pass

        return PropertyListing(
            id=property_id,
            title=item.get("title", "Property Listing"),
            description=item.get("description", None),
            price=price,
            address=address,
            bedrooms=bedrooms,
            bathrooms=bathrooms,
            square_feet=sqft,
            property_type=property_type,
            url=url,
            source="apartments",
            amenities=amenities,
            images=images,
            listed_date=listed_date,
            features=additional_features
        )
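The regex-based field parsing that `transform_item` applies to rent, beds, and baths can be checked independently. This is a standalone sketch of the same patterns (free helper functions, not the class methods themselves):

```python
import re

def parse_price(value):
    """Extract a float price from strings like '$1,450/mo'; returns 0.0 on failure."""
    if not isinstance(value, str):
        return float(value) if value else 0.0
    match = re.search(r'(\d{1,3}(?:,\d{3})*(?:\.\d+)?)', value)
    return float(match.group(1).replace(",", "")) if match else 0.0

def parse_count(value):
    """Extract a numeric count from strings like '2 beds' or '2.5 baths'."""
    if not isinstance(value, str):
        return float(value) if value else 0.0
    match = re.search(r'(\d+\.?\d*)', value)
    return float(match.group(1)) if match else 0.0

print(parse_price("$1,450/mo"))   # 1450.0
print(parse_count("2.5 baths"))   # 2.5
```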
src/scrapers/base.py
"""Base scraper class for all real estate platform scrapers."""

import re
import json
import uuid
import os
from abc import ABC, abstractmethod
from typing import List, Dict, Any, Optional
from datetime import datetime
from apify import Actor
from apify_client import ApifyClient
from pydantic import HttpUrl

from ..models.property import PropertyListing, Address, SearchCriteria


class BaseScraper(ABC):
    """Base scraper class that all platform-specific scrapers should inherit from."""

    def __init__(
        self,
        search_criteria: SearchCriteria,
        apify_client: Optional[ApifyClient] = None,
        max_items: int = 100
    ):
        """Initialize the scraper.

        Args:
            search_criteria: Search criteria
            apify_client: Apify client. If None, creates a new client
            max_items: Maximum number of items to scrape
        """
        self.search_criteria = search_criteria
        self.apify_client = apify_client or ApifyClient()
        self.max_items = max_items

    @property
    @abstractmethod
    def actor_id(self) -> str:
        """Apify actor ID for the scraper.

        Returns:
            Actor ID
        """
        pass

    @property
    @abstractmethod
    def source_name(self) -> str:
        """Name of the source.

        Returns:
            Source name
        """
        pass

    @abstractmethod
    def prepare_input(self) -> Dict[str, Any]:
        """Prepare input for the Apify actor.

        Returns:
            Actor input
        """
        pass

    @abstractmethod
    def transform_item(self, item: Dict[str, Any]) -> PropertyListing:
        """Transform a scraped item into a PropertyListing.

        Args:
            item: Scraped item

        Returns:
            PropertyListing
        """
        pass

    def parse_address(self, address_str: str) -> Address:
        """Parse an address string into an Address model.

        Args:
            address_str: Address string

        Returns:
            Address
        """
        # Default implementation with simple comma-based parsing.
        # Subclasses can override for platform-specific parsing.
        address_parts = address_str.split(",")

        if len(address_parts) >= 3:
            street = address_parts[0].strip()
            city = address_parts[1].strip()
            state_zip = address_parts[2].strip().split()
            state = state_zip[0].strip() if state_zip else ""
            zip_code = state_zip[1].strip() if len(state_zip) > 1 else None
        elif len(address_parts) == 2:
            street = None
            city = address_parts[0].strip()
            state_zip = address_parts[1].strip().split()
            state = state_zip[0].strip() if state_zip else ""
            zip_code = state_zip[1].strip() if len(state_zip) > 1 else None
        else:
            # If we can't parse the address properly, use a minimal approach
            street = None
            # Try to extract a known state abbreviation
            state_match = re.search(r'\b([A-Z]{2})\b', address_str)
            if state_match:
                state = state_match.group(1)
                # Assume the city comes before the state
                city_match = re.search(r'([^,]+),\s*' + state, address_str)
                city = city_match.group(1) if city_match else address_str
            else:
                # If we can't extract a state, use the whole string as the city
                city = address_str
                state = ""
            zip_code = None

        return Address(
            street=street,
            city=city,
            state=state,
            zip_code=zip_code
        )

    def extract_amenities(self, item: Dict[str, Any]) -> List[str]:
        """Extract amenities from a scraped item.

        Args:
            item: Scraped item

        Returns:
            List of amenities
        """
        # Default implementation that subclasses can override
        amenities = []

        # Look for amenities in the amenities or features field
        if "amenities" in item and isinstance(item["amenities"], list):
            amenities.extend(item["amenities"])

        if "features" in item and isinstance(item["features"], list):
            amenities.extend(item["features"])

        # Look for amenities mentioned in the description
        if "description" in item and isinstance(item["description"], str):
            # Common amenities to look for in descriptions
            common_amenities = [
                "parking", "garage", "gym", "fitness", "pool", "washer", "dryer",
                "dishwasher", "air conditioning", "ac", "balcony", "patio",
                "hardwood", "fireplace", "wheelchair", "elevator", "pet friendly"
            ]

            description = item["description"].lower()
            for amenity in common_amenities:
                if amenity in description and amenity not in amenities:
                    amenities.append(amenity)

        return amenities

    def scrape(self) -> List[PropertyListing]:
        """Scrape properties based on the search criteria.

        Returns:
            List of property listings
        """
        Actor.log.info(f"Starting {self.source_name} scraper")

        # Prepare input for the Apify actor
        input_data = self.prepare_input()

        # Check if we're running in local mode for testing
        if os.environ.get("ACTOR_TEST_PAY_PER_EVENT") == "true" and not os.environ.get("APIFY_TOKEN"):
            Actor.log.info(f"Running in local test mode, using mock data for {self.source_name}")
            return self.get_mock_listings()

        Actor.log.info(f"Running Apify actor {self.actor_id} with input: {input_data}")

        try:
            # Run the actor
            run = self.apify_client.actor(self.actor_id).call(
                run_input=input_data,
                build="latest"
            )

            # Get the dataset
            dataset_id = run["defaultDatasetId"]
            items = self.apify_client.dataset(dataset_id).list_items(limit=self.max_items).items

            Actor.log.info(f"Scraped {len(items)} items from {self.source_name}")

            # Transform items into PropertyListings
            listings = []
            for item in items:
                try:
                    listing = self.transform_item(item)
                    listings.append(listing)
                except Exception as e:
                    Actor.log.exception(f"Error transforming item: {e}")
                    continue

            Actor.log.info(f"Transformed {len(listings)} listings from {self.source_name}")

            return listings
        except Exception as e:
            Actor.log.error(f"Error scraping {self.source_name}: {e}")
            return self.get_mock_listings()

    def get_mock_listings(self) -> List[PropertyListing]:
        """Get mock listings for local testing.

        Returns:
            List of mock property listings
        """
        Actor.log.info(f"Generating mock data for {self.source_name}")

        # Create 5 mock listings
        mock_listings = []

        for i in range(1, 6):
            mock_listings.append(
                PropertyListing(
                    id=f"{self.source_name}_mock_{i}",
                    title=f"Mock {self.source_name} Listing {i}",
                    description=(
                        f"This is a mock listing for testing purposes. In "
                        f"{self.search_criteria.location} with {self.search_criteria.min_bedrooms} bedrooms."
                    ),
                    url=f"https://example.com/{self.source_name}/mock-listing-{i}",
                    price=float(self.search_criteria.min_price or 1000) + (i * 200),
                    bedrooms=self.search_criteria.min_bedrooms + (i % 2),
                    bathrooms=self.search_criteria.min_bedrooms / 2 + (i % 2),
                    address=Address(
                        street=f"{100 + i} Main St",
                        city=self.search_criteria.location.split(",")[0].strip(),
                        state=self.search_criteria.location.split(",")[-1].strip(),
                        zip_code="12345"
                    ),
                    property_type=self.search_criteria.property_type,
                    source=self.source_name,
                    amenities=self.search_criteria.amenities + ["parking", "air conditioning"],
                    listed_date=datetime.now(),
                    is_new=True
                )
            )

        Actor.log.info(f"Generated {len(mock_listings)} mock listings for {self.source_name}")
        return mock_listings
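The comma-splitting heuristic in `parse_address` above is easy to verify on its own. This sketch mirrors the branching logic with plain tuples instead of the `Address` model (a simplified standalone version, not the method itself):

```python
import re

def parse_address(address_str):
    """Split 'street, city, ST 12345' into (street, city, state, zip)."""
    parts = [p.strip() for p in address_str.split(",")]
    if len(parts) >= 3:
        state_zip = parts[2].split()
        return (parts[0], parts[1],
                state_zip[0] if state_zip else "",
                state_zip[1] if len(state_zip) > 1 else None)
    if len(parts) == 2:
        state_zip = parts[1].split()
        return (None, parts[0],
                state_zip[0] if state_zip else "",
                state_zip[1] if len(state_zip) > 1 else None)
    # Fall back to searching for a bare two-letter state abbreviation
    state_match = re.search(r'\b([A-Z]{2})\b', address_str)
    state = state_match.group(1) if state_match else ""
    return (None, address_str, state, None)

print(parse_address("123 Main St, Austin, TX 78701"))
# ('123 Main St', 'Austin', 'TX', '78701')
```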
src/scrapers/realtor.py
"""Realtor.com scraper."""

import re
import json
import uuid
from typing import Dict, Any, List, Optional
from datetime import datetime
from pydantic import HttpUrl

from apify import Actor
from apify_client import ApifyClient

from .base import BaseScraper
from ..models.property import PropertyListing, Address, SearchCriteria


class RealtorScraper(BaseScraper):
    """Realtor.com scraper."""

    @property
    def actor_id(self) -> str:
        """Get the Apify actor ID for Realtor.com.

        Returns:
            Actor ID
        """
        return "epctex/realtor-scraper"

    @property
    def source_name(self) -> str:
        """Get the source name.

        Returns:
            Source name
        """
        return "realtor"

    def prepare_input(self) -> Dict[str, Any]:
        """Prepare input for the Realtor.com scraper.

        Returns:
            Actor input
        """
        # Parse location into city and state
        location_parts = self.search_criteria.location.split(",")
        city = location_parts[0].strip().replace(" ", "-").lower()
        state = ""
        if len(location_parts) > 1:
            state = location_parts[1].strip().lower()

        # Property type mapping
        property_type_map = {
            "apartment": "apartments",
            "house": "single-family-home",
            "condo": "condos",
            "townhouse": "townhomes",
            "any": "any"
        }

        property_type = property_type_map.get(
            self.search_criteria.property_type, "any"
        )

        # Base search URL
        if self.search_criteria.search_type == "rent":
            base_url = "https://www.realtor.com/apartments"
        else:
            base_url = "https://www.realtor.com/realestateandhomes-search"

        # Construct the location part of the URL
        if state:
            location_url = f"{city}_{state}"
        else:
            location_url = city

        # Build the search URL
        input_url = f"{base_url}/{location_url}"

        # Start building search parameters
        search_params = {}

        # Add property type
        if property_type != "any":
            search_params["prop"] = property_type

        # Add bedroom filter
        if self.search_criteria.min_bedrooms > 0:
            search_params["beds-lower"] = str(self.search_criteria.min_bedrooms)
        if self.search_criteria.max_bedrooms:
            search_params["beds-upper"] = str(self.search_criteria.max_bedrooms)

        # Add price filter
        if self.search_criteria.min_price > 0:
            search_params["price-lower"] = str(int(self.search_criteria.min_price))
        if self.search_criteria.max_price:
            search_params["price-upper"] = str(int(self.search_criteria.max_price))

        return {
            "startUrls": [{"url": input_url}],
            "searchParams": search_params,
            "maxItems": self.max_items,
            "extendOutputFunction": """async ({ data, item, customData, Apify }) => {
                return { ...item };
            }""",
            "proxy": {
                "useApifyProxy": True,
                "apifyProxyGroups": ["RESIDENTIAL"]
            }
        }

    def transform_item(self, item: Dict[str, Any]) -> PropertyListing:
        """Transform a Realtor.com listing into a PropertyListing.

        Args:
            item: Realtor.com listing

        Returns:
            PropertyListing
        """
        # Parse price
        price_str = item.get("price", "0")
        if isinstance(price_str, str):
            # Remove currency symbols and commas
            price_str = re.sub(r'[^\d.]', '', price_str)
            price = float(price_str) if price_str else 0
        else:
            price = float(price_str) if price_str else 0

        # Get address components
        full_address = item.get("address", "")
        address_components = item.get("addressComponents", {})

        # Construct the address
        street = address_components.get("streetName", "")
        if "streetNumber" in address_components:
            street = f"{address_components['streetNumber']} {street}"

        address = Address(
            street=street,
            city=address_components.get("city", ""),
            state=address_components.get("state", ""),
            zip_code=address_components.get("zipcode", None)
        )

        # Parse bedrooms
        bedrooms = 0
        beds = item.get("beds", 0)
        if isinstance(beds, str):
            bed_match = re.search(r'(\d+\.?\d*)', beds)
            bedrooms = float(bed_match.group(1)) if bed_match else 0
        else:
            bedrooms = float(beds) if beds else 0

        # Parse bathrooms
        bathrooms = None
        baths = item.get("baths", None)
        if baths:
            if isinstance(baths, str):
                bath_match = re.search(r'(\d+\.?\d*)', baths)
                bathrooms = float(bath_match.group(1)) if bath_match else None
            else:
                bathrooms = float(baths)

        # Parse square feet
        sqft = None
        sqft_str = item.get("sqft", None)
        if sqft_str:
            if isinstance(sqft_str, str):
                sqft_match = re.search(r'(\d+)', sqft_str.replace(',', ''))
                sqft = int(sqft_match.group(1)) if sqft_match else None
            else:
                sqft = int(sqft_str)

        # Determine property type
        property_type = item.get("propertyType", "").lower()
        if not property_type:
            property_subtype = item.get("propertySubType", "").lower()
            if property_subtype:
                property_type = property_subtype
            else:
                property_type = "unknown"

        # Get URL
        url = item.get("detailUrl", "")
        if not url.startswith("http"):
            url = f"https://www.realtor.com{url}"

        # Get images
        images = []
        photos = item.get("photos", [])
        if isinstance(photos, list):
            for photo in photos:
                if isinstance(photo, dict) and "url" in photo:
                    images.append(photo["url"])
                elif isinstance(photo, str) and photo.startswith("http"):
                    images.append(photo)

        # Extract amenities
        amenities = self.extract_amenities(item)

        # Check for specific features in the item data
        features = item.get("features", {})
        if features:
            for category, feature_list in features.items():
                if isinstance(feature_list, list):
                    amenities.extend(feature_list)

        # Generate a unique ID
        property_id = str(item.get("listingId", uuid.uuid4()))

        # Collect remaining fields as additional features
        additional_features = {}
        for key, value in item.items():
            if key not in [
                "price", "address", "addressComponents", "beds", "baths", "sqft",
                "propertyType", "propertySubType", "detailUrl", "photos", "features",
                "listingId", "description", "amenities"
            ]:
                additional_features[key] = value

        return PropertyListing(
            id=property_id,
            title=item.get("title", "Property Listing"),
            description=item.get("description", None),
            price=price,
            address=address,
            bedrooms=bedrooms,
            bathrooms=bathrooms,
            square_feet=sqft,
            property_type=property_type,
            url=url,
            source="realtor",
            amenities=amenities,
            images=images,
            features=additional_features
        )
src/scrapers/zillow.py
"""Zillow scraper."""

import re
import json
import uuid
from typing import Dict, Any, List, Optional
from datetime import datetime
from pydantic import HttpUrl

from apify import Actor
from apify_client import ApifyClient

from .base import BaseScraper
from ..models.property import PropertyListing, Address, SearchCriteria


class ZillowScraper(BaseScraper):
    """Zillow scraper."""

    @property
    def actor_id(self) -> str:
        """Get the Apify actor ID for Zillow.

        Returns:
            Actor ID
        """
        return "maxcopell/zillow-detail-scraper"

    @property
    def source_name(self) -> str:
        """Get the source name.

        Returns:
            Source name
        """
        return "zillow"

    def prepare_input(self) -> Dict[str, Any]:
        """Prepare input for the Zillow scraper.

        Returns:
            Actor input
        """
        location = self.search_criteria.location.replace(", ", ",").replace(" ", "-").lower()

        # Property type mapping
        property_type_map = {
            "apartment": "apartment",
            "house": "house",
            "condo": "condo",
            "townhouse": "townhome",
            "any": ""
        }

        property_type = property_type_map.get(
            self.search_criteria.property_type, ""
        )

        # Build the URL
        if self.search_criteria.search_type == "rent":
            base_url = f"https://www.zillow.com/homes/for_rent/{location}"
        else:
            base_url = f"https://www.zillow.com/homes/{location}"

        # Add filters based on the search criteria
        filters = []

        # Price filter
        if self.search_criteria.min_price > 0 or self.search_criteria.max_price:
            price_filter = "price"
            if self.search_criteria.min_price > 0:
                price_filter += f"_gte-{int(self.search_criteria.min_price)}"
            if self.search_criteria.max_price:
                price_filter += f"_lte-{int(self.search_criteria.max_price)}"
            filters.append(price_filter)

        # Bedroom filter
        if self.search_criteria.min_bedrooms > 0 or self.search_criteria.max_bedrooms:
            if self.search_criteria.min_bedrooms == self.search_criteria.max_bedrooms:
                filters.append(f"{self.search_criteria.min_bedrooms}-_beds")
            else:
                bedroom_filter = "beds"
                if self.search_criteria.min_bedrooms > 0:
                    bedroom_filter += f"_gte-{self.search_criteria.min_bedrooms}"
                if self.search_criteria.max_bedrooms:
                    bedroom_filter += f"_lte-{self.search_criteria.max_bedrooms}"
                filters.append(bedroom_filter)

        # Property type filter
        if property_type:
            filters.append(f"type-{property_type}")

        # Assemble the URL with filters
        if filters:
            filter_string = "/".join(filters)
            url = f"{base_url}/{filter_string}"
        else:
            url = base_url

        return {
            "startUrls": [{"url": url}],
            "maxPages": 10,
            "includeRental": self.search_criteria.search_type == "rent",
            "includeSale": self.search_criteria.search_type == "buy",
            "includeAuction": False,
            "proxy": {
                "useApifyProxy": True,
                "apifyProxyGroups": ["RESIDENTIAL"]
            }
        }

    def transform_item(self, item: Dict[str, Any]) -> PropertyListing:
        """Transform a Zillow listing into a PropertyListing.

        Args:
            item: Zillow listing

        Returns:
            PropertyListing
        """
        # Parse price
        price_str = item.get("price", "0")
        if isinstance(price_str, str):
            # Remove currency symbols and commas
            price_str = re.sub(r'[^\d.]', '', price_str)
            price = float(price_str) if price_str else 0
        else:
            # Guard against a missing or null price
            price = float(price_str) if price_str else 0

        # Parse address
        address_str = item.get("address", "")
        address = self.parse_address(address_str)

        # Parse bedrooms
        bedrooms_str = item.get("bedrooms", "0")
        if isinstance(bedrooms_str, str):
            bedroom_match = re.search(r'(\d+\.?\d*)', bedrooms_str)
            bedrooms = float(bedroom_match.group(1)) if bedroom_match else 0
        else:
            bedrooms = float(bedrooms_str) if bedrooms_str else 0

        # Parse bathrooms
        bathrooms_str = item.get("bathrooms", None)
        if bathrooms_str:
            if isinstance(bathrooms_str, str):
                bathroom_match = re.search(r'(\d+\.?\d*)', bathrooms_str)
                bathrooms = float(bathroom_match.group(1)) if bathroom_match else None
            else:
                bathrooms = float(bathrooms_str)
        else:
            bathrooms = None

        # Parse square feet
        sqft_str = item.get("livingArea", None)
        if sqft_str:
            if isinstance(sqft_str, str):
                # Remove non-digit characters
                sqft_match = re.search(r'(\d+)', sqft_str.replace(',', ''))
                sqft = int(sqft_match.group(1)) if sqft_match else None
            else:
                sqft = int(sqft_str)
        else:
            sqft = None

        # Extract amenities
        amenities = self.extract_amenities(item)

        # Get the property type
        property_type = item.get("homeType", "").lower()
        if not property_type:
            # Try to infer it from the description. Check "townhouse" before
            # "house", since "house" is a substring of "townhouse".
            description = item.get("description", "").lower()
            if "apartment" in description:
                property_type = "apartment"
            elif "condo" in description:
                property_type = "condo"
            elif "townhouse" in description or "town house" in description:
                property_type = "townhouse"
            elif "house" in description:
                property_type = "house"
            else:
                property_type = "unknown"

        # Get the listing URL
        url = item.get("url", "")
        if not url.startswith("http"):
            url = f"https://www.zillow.com{url}"

        # Get images
        images = []
        if "images" in item and isinstance(item["images"], list):
            for img in item["images"]:
                if isinstance(img, str) and img.startswith("http"):
                    images.append(img)

        # Generate a unique ID
        property_id = str(item.get("zpid", uuid.uuid4()))

        # Extract any additional features
        features = {}
        for key, value in item.items():
            if key not in [
                "price", "address", "bedrooms", "bathrooms", "livingArea",
                "homeType", "description", "url", "images", "zpid", "amenities"
            ]:
                features[key] = value

        return PropertyListing(
            id=property_id,
            title=item.get("streetAddress", "Property Listing"),
            description=item.get("description", None),
            price=price,
            address=address,
            bedrooms=bedrooms,
            bathrooms=bathrooms,
            square_feet=sqft,
            property_type=property_type,
            url=url,
            source="zillow",
            amenities=amenities,
            images=images,
            features=features
        )
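The path-segment filter scheme that `prepare_input` assembles for Zillow URLs can be sketched on its own. The `gte`/`lte` segment format below mirrors the code above; whether Zillow currently accepts these URL shapes is not verified here:

```python
def build_filters(min_price=0, max_price=None, min_beds=0, max_beds=None):
    """Assemble Zillow-style path filter segments such as 'price_gte-1000_lte-2500'."""
    filters = []
    if min_price > 0 or max_price:
        seg = "price"
        if min_price > 0:
            seg += f"_gte-{int(min_price)}"
        if max_price:
            seg += f"_lte-{int(max_price)}"
        filters.append(seg)
    if min_beds > 0 or max_beds:
        if min_beds == max_beds:
            # Exact bedroom count uses a different segment shape
            filters.append(f"{min_beds}-_beds")
        else:
            seg = "beds"
            if min_beds > 0:
                seg += f"_gte-{min_beds}"
            if max_beds:
                seg += f"_lte-{max_beds}"
            filters.append(seg)
    return "/".join(filters)

print(build_filters(min_price=1000, max_price=2500, min_beds=2))
# price_gte-1000_lte-2500/beds_gte-2
```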
src/utils/__init__.py
"""Utility functions for Listing Sleuth."""
src/utils/llm.py
1"""LLM utility functions for Listing Sleuth."""
2
3import os
4from typing import List, Dict, Any, Optional
5from langchain_openai import ChatOpenAI
6from langchain.prompts import ChatPromptTemplate
7from langchain.output_parsers import PydanticOutputParser
8from langchain.schema import Document
9
10from ..models.property import PropertyListing, SearchCriteria
11
12
13def get_llm(api_token: Optional[str] = None) -> ChatOpenAI:
14 """Get LLM client.
15
16 Args:
17 api_token: OpenAI API token. If None, tries to get from environment.
18
19 Returns:
20 ChatOpenAI instance
21
22 Raises:
23 ValueError: If API token is not provided and not found in environment.
24 """
25 token = api_token or os.environ.get("OPENAI_API_KEY")
26 if not token:
27 raise ValueError(
28 "OpenAI API token not provided. Please provide a token in the input "
29 "or set the OPENAI_API_KEY environment variable."
30 )
31
32 return ChatOpenAI(
33 api_key=token,
34 model="gpt-3.5-turbo",
35 temperature=0
36 )
37
38
39def filter_properties_with_llm(
40 properties: List[PropertyListing],
41 search_criteria: SearchCriteria,
42 api_token: Optional[str] = None
43) -> List[PropertyListing]:
44 """Filter properties with LLM based on search criteria.
45
46 Args:
47 properties: List of property listings
48 search_criteria: Search criteria
49 api_token: OpenAI API token
50
51 Returns:
52 Filtered list of property listings
53 """
54 if not properties:
55 return []
56
57 if not api_token and not search_criteria.llm_api_token:
58 # Without a token, just do basic filtering
59 return properties
60
61 llm = get_llm(api_token or search_criteria.llm_api_token)
62 parser = PydanticOutputParser(pydantic_object=PropertyListing)
63
64 template = """
65 You are an AI assistant helping to filter real estate listings based on specific criteria.
66
67 The user is looking for the following:
68 - Location: {location}
69 - Property type: {property_type}
70 - Price range: ${min_price} - ${max_price} (0 means no minimum, None means no maximum)
71 - Bedrooms: {min_bedrooms} - {max_bedrooms} (None means no maximum)
72 - Desired amenities: {amenities}
73
74 For each property, evaluate how well it fits the criteria, with special attention to amenities
75 and any specific requirements. Return the property object unmodified if it's a good match,
76 filtering out properties that don't meet the criteria.
77
78 Here are the properties to evaluate:
79 {properties}
80
81 If the user mentioned any amenities, prioritize properties with those amenities.
82 """

    # Process in smaller batches to avoid token limits
    batch_size = 5
    filtered_properties = []

    # Build the prompt chain once, outside the batch loop
    prompt = ChatPromptTemplate.from_template(template)
    chain = prompt | llm

    for i in range(0, len(properties), batch_size):
        batch = properties[i:i + batch_size]

        # Simplify property objects for LLM consumption
        simplified_batch = [
            {
                "id": p.id,
                "title": p.title,
                "price": p.price,
                "bedrooms": p.bedrooms,
                "bathrooms": p.bathrooms,
                "property_type": p.property_type,
                "address": str(p.address),
                "amenities": p.amenities,
                "description": p.description,
                "url": str(p.url),
            }
            for p in batch
        ]

        result = chain.invoke({
            "location": search_criteria.location,
            "property_type": search_criteria.property_type,
            "min_price": search_criteria.min_price,
            "max_price": search_criteria.max_price,
            "min_bedrooms": search_criteria.min_bedrooms,
            "max_bedrooms": search_criteria.max_bedrooms,
            "amenities": search_criteria.amenities,
            "properties": simplified_batch,
        })

        # Extract property IDs that the LLM determined to be good matches
        response_text = result.content
        passing_ids = []

        # Simple line-based parsing of the response; a production version
        # would use structured output instead
        for line in response_text.split("\n"):
            if "id:" in line and "good match" in line.lower():
                try:
                    id_part = line.split("id:")[1].strip()
                    property_id = id_part.split()[0].strip(",")
                    passing_ids.append(property_id)
                except IndexError:
                    continue

        # Add matching properties to the filtered list
        for p in batch:
            if p.id in passing_ids:
                filtered_properties.append(p)

    return filtered_properties
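The line-based ID extraction in the loop above can be exercised in isolation. A minimal sketch in plain Python (no LLM call; the sample response text below is made up):

```python
def parse_passing_ids(response_text: str) -> list:
    """Collect property IDs from lines like 'id: <id> good match'."""
    passing_ids = []
    for line in response_text.split("\n"):
        if "id:" in line and "good match" in line.lower():
            try:
                id_part = line.split("id:")[1].strip()
                passing_ids.append(id_part.split()[0].strip(","))
            except IndexError:
                continue
    return passing_ids


sample = (
    "id: zillow-123 good match\n"
    "id: realtor-456 does not fit the criteria\n"
    "id: apt-789, good match"
)
print(parse_passing_ids(sample))  # ['zillow-123', 'apt-789']
```

A trailing comma after the ID is stripped, which makes the parser tolerant of slight formatting drift in the model's output.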


def summarize_property(
    property_listing: PropertyListing,
    api_token: Optional[str] = None
) -> str:
    """Generate a natural language summary of a property.

    Args:
        property_listing: Property listing to summarize
        api_token: OpenAI API token

    Returns:
        Summary of the property; falls back to a basic template summary
        when no API token is available
    """
    try:
        llm = get_llm(api_token)
    except ValueError:
        # Fall back to a basic summary if no API token is available
        return (
            f"{property_listing.title}: {property_listing.bedrooms} bed, "
            f"{property_listing.bathrooms or 'unknown'} bath {property_listing.property_type} "
            f"for ${property_listing.price:,.2f} in {property_listing.address.city}, "
            f"{property_listing.address.state}."
        )

    template = """
    Create a concise, appealing summary of this property listing in one paragraph:

    Title: {title}
    Price: ${price}
    Address: {address}
    Property type: {property_type}
    Bedrooms: {bedrooms}
    Bathrooms: {bathrooms}
    Square feet: {square_feet}
    Amenities: {amenities}
    Description: {description}

    Keep the summary brief but informative, highlighting key selling points.
    """

    prompt = ChatPromptTemplate.from_template(template)
    chain = prompt | llm

    result = chain.invoke({
        "title": property_listing.title,
        "price": f"{property_listing.price:,.2f}",
        "address": str(property_listing.address),
        "property_type": property_listing.property_type,
        "bedrooms": property_listing.bedrooms,
        "bathrooms": property_listing.bathrooms or "unknown",
        "square_feet": property_listing.square_feet or "unknown",
        "amenities": ", ".join(property_listing.amenities) or "none specified",
        "description": property_listing.description or "No description provided",
    })

    return result.content.strip()
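The token-free fallback branch builds its summary with plain f-string formatting; a self-contained sketch of that formatting (the field values below are made up):

```python
def basic_summary(title, bedrooms, bathrooms, property_type, price, city, state):
    """Mirror the token-free fallback summary, with thousands grouping on the price."""
    return (
        f"{title}: {bedrooms} bed, {bathrooms or 'unknown'} bath "
        f"{property_type} for ${price:,.2f} in {city}, {state}."
    )


print(basic_summary("Sunny Loft", 2, None, "apartment", 2450.0, "Austin", "TX"))
# Sunny Loft: 2 bed, unknown bath apartment for $2,450.00 in Austin, TX.
```

The `:,.2f` format spec gives both two decimal places and comma grouping, so large sale prices render as, e.g., `$1,250,000.00`.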
src/utils/storage.py
"""Storage utility functions for Listing Sleuth."""

import json
import os
from datetime import datetime
from typing import List, Optional

from apify import Actor

from ..models.property import PropertyListing, SearchResults, SearchCriteria


def save_search_results(results: SearchResults) -> None:
    """Save search results to the Apify key-value store.

    Args:
        results: Search results to save
    """
    # Convert results to a dict for storage
    results_dict = results.model_dump()

    # Convert datetime objects to ISO format strings
    results_dict["search_date"] = results_dict["search_date"].isoformat()
    for i, result in enumerate(results_dict["results"]):
        if result.get("listed_date"):
            results_dict["results"][i]["listed_date"] = result["listed_date"].isoformat()

    try:
        # Save to the Apify key-value store if in production
        if hasattr(Actor, "main_kv_store"):
            Actor.main_kv_store.set_value("search_results", results_dict)

            # Also save the individual listings separately for easier access
            for listing in results.results:
                Actor.main_kv_store.set_value(f"listing_{listing.id}", listing.model_dump())
        else:
            # Local testing - save to a local file
            Actor.log.info("Running in local mode, saving to local file")
            os.makedirs("storage/key_value_stores", exist_ok=True)
            with open("storage/key_value_stores/search_results.json", "w") as f:
                json.dump(results_dict, f)
    except Exception as e:
        Actor.log.error(f"Error saving search results: {e}")


def load_previous_results(search_criteria: SearchCriteria) -> Optional[SearchResults]:
    """Load previous search results from the Apify key-value store.

    Args:
        search_criteria: Current search criteria, to compare with the previous search

    Returns:
        Previous search results, or None if there are no previous results
        or the criteria changed
    """
    # Try to get previous results
    try:
        results_dict = None

        # Try to load from the Apify KV store first
        if hasattr(Actor, "main_kv_store"):
            results_dict = Actor.main_kv_store.get_value("search_results")

        # If not found or in local mode, try loading from a local file
        if not results_dict and os.path.exists("storage/key_value_stores/search_results.json"):
            Actor.log.info("Loading from local file")
            with open("storage/key_value_stores/search_results.json", "r") as f:
                results_dict = json.load(f)

        if not results_dict:
            return None

        # Parse dates
        results_dict["search_date"] = datetime.fromisoformat(results_dict["search_date"])
        for i, result in enumerate(results_dict["results"]):
            if result.get("listed_date"):
                results_dict["results"][i]["listed_date"] = datetime.fromisoformat(
                    result["listed_date"]
                )

        # Convert back to the model
        previous_results = SearchResults(**results_dict)

        # Check whether the search criteria have changed
        prev_criteria = previous_results.search_criteria
        if (
            prev_criteria.location != search_criteria.location
            or prev_criteria.property_type != search_criteria.property_type
            or prev_criteria.min_bedrooms != search_criteria.min_bedrooms
            or prev_criteria.max_bedrooms != search_criteria.max_bedrooms
            or prev_criteria.min_price != search_criteria.min_price
            or prev_criteria.max_price != search_criteria.max_price
            or prev_criteria.search_type != search_criteria.search_type
            or set(prev_criteria.sources) != set(search_criteria.sources)
            # Amenities may be in a different order but have the same content
            or set(prev_criteria.amenities) != set(search_criteria.amenities)
        ):
            # Criteria changed, don't use previous results
            return None

        return previous_results

    except Exception as e:
        Actor.log.error(f"Error loading previous results: {e}")
        return None
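Both `save_search_results` and `load_previous_results` round-trip datetime fields through ISO-8601 strings so they survive JSON storage; the pattern in isolation:

```python
import json
from datetime import datetime

record = {"search_date": datetime(2025, 3, 1, 12, 30)}

# Serialize: datetime -> ISO string, since json.dumps rejects datetime objects
payload = json.dumps({"search_date": record["search_date"].isoformat()})

# Deserialize: ISO string -> datetime
restored = datetime.fromisoformat(json.loads(payload)["search_date"])
print(restored == record["search_date"])  # True
```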


def mark_new_listings(
    current_results: List[PropertyListing],
    previous_results: Optional[SearchResults]
) -> List[PropertyListing]:
    """Mark new listings in current results compared to previous results.

    Args:
        current_results: Current property listings
        previous_results: Previous search results, or None if no previous results

    Returns:
        Updated current property listings with the is_new flag set
    """
    if not previous_results:
        # If there are no previous results, all listings are new
        for listing in current_results:
            listing.is_new = True
        return current_results

    # Get IDs of previous listings
    previous_ids = {listing.id for listing in previous_results.results}

    # Mark new listings
    for listing in current_results:
        if listing.id not in previous_ids:
            listing.is_new = True

    return current_results


def push_results_to_dataset(results: SearchResults) -> None:
    """Push search results to the Apify dataset.

    Args:
        results: Search results to push
    """
    # Convert to simple dicts for the dataset
    listings_data = []
    for listing in results.results:
        listing_dict = listing.model_dump()
        # Convert complex types to strings for better compatibility
        listing_dict["address"] = str(listing.address)
        if listing.listed_date:
            listing_dict["listed_date"] = listing.listed_date.isoformat()
        listings_data.append(listing_dict)

    try:
        # Push each listing as a separate item
        Actor.push_data(listings_data)
    except Exception as e:
        Actor.log.error(f"Error pushing data to dataset: {e}")
        # In local mode, save to a local file instead
        try:
            os.makedirs("storage/datasets/default", exist_ok=True)
            with open("storage/datasets/default/results.json", "w") as f:
                json.dump(listings_data, f)
            Actor.log.info("Saved results to local file")
        except Exception as e2:
            Actor.log.error(f"Error saving to local file: {e2}")
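The new-listing detection in `mark_new_listings` boils down to a set-membership check on listing IDs; a minimal sketch with plain dicts standing in for `PropertyListing` objects:

```python
def mark_new(current: list, previous_ids: set) -> list:
    """Flag entries whose id was not present in the previous run."""
    for listing in current:
        listing["is_new"] = listing["id"] not in previous_ids
    return current


current = [{"id": "a"}, {"id": "b"}, {"id": "c"}]
marked = mark_new(current, previous_ids={"a", "c"})
print([l["id"] for l in marked if l["is_new"]])  # ['b']
```

Using a set for the previous IDs makes each membership test O(1), so the whole pass stays linear in the number of current listings.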
src/models/__init__.py
"""Models for Listing Sleuth."""
src/models/property.py
"""Property data models for Listing Sleuth."""

from datetime import datetime
from typing import Any, Dict, List, Optional

from pydantic import BaseModel, Field, HttpUrl


class Address(BaseModel):
    """Model for a property address."""

    street: Optional[str] = None
    city: str
    state: str
    zip_code: Optional[str] = None
    country: str = "United States"

    def __str__(self) -> str:
        """Return a string representation of the address."""
        parts = []
        if self.street:
            parts.append(self.street)
        parts.append(f"{self.city}, {self.state}")
        if self.zip_code:
            parts.append(self.zip_code)
        return ", ".join(parts)


class PropertyListing(BaseModel):
    """Model for property listing data."""

    id: str
    title: str
    description: Optional[str] = None
    price: float
    address: Address
    bedrooms: float
    bathrooms: Optional[float] = None
    square_feet: Optional[int] = None
    property_type: str
    url: HttpUrl
    source: str
    amenities: List[str] = Field(default_factory=list)
    images: List[HttpUrl] = Field(default_factory=list)
    listed_date: Optional[datetime] = None
    is_new: bool = False  # Flag for new listings since the last search
    features: Dict[str, Any] = Field(default_factory=dict)  # Additional property features

    class Config:
        """Pydantic config."""

        extra = "ignore"


class SearchCriteria(BaseModel):
    """Model for search criteria."""

    location: str
    property_type: str = "any"
    min_bedrooms: int = 0
    max_bedrooms: Optional[int] = None
    min_price: float = 0
    max_price: Optional[float] = None
    amenities: List[str] = Field(default_factory=list)
    search_type: str = "rent"
    sources: List[str] = Field(default_factory=lambda: ["zillow", "realtor", "apartments"])
    llm_api_token: Optional[str] = None


class SearchResults(BaseModel):
    """Model for search results."""

    search_criteria: SearchCriteria
    results: List[PropertyListing] = Field(default_factory=list)
    total_results: int = 0
    new_results: int = 0
    search_date: datetime = Field(default_factory=datetime.now)
    sources_searched: List[str] = Field(default_factory=list)
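`Address.__str__` joins only the parts that are present; the same formatting logic as a stdlib `dataclass` sketch (no pydantic required):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AddressSketch:
    city: str
    state: str
    street: Optional[str] = None
    zip_code: Optional[str] = None

    def __str__(self) -> str:
        # Street and zip code are optional; city and state always appear together
        parts = []
        if self.street:
            parts.append(self.street)
        parts.append(f"{self.city}, {self.state}")
        if self.zip_code:
            parts.append(self.zip_code)
        return ", ".join(parts)


print(str(AddressSketch(city="Denver", state="CO", street="123 Main St")))
# 123 Main St, Denver, CO
```

Because missing parts are skipped rather than rendered as `None`, a city/state-only address still produces a clean string like `Austin, TX`.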