Listing Sleuth
Deprecated
An agentic real estate listing monitor that helps users find properties that match their specific criteria. This agent scrapes data from popular real estate platforms such as Zillow, Realtor.com, and Apartments.com to provide up-to-date information on available properties.
Rating: 0.0 (0 reviews)
Pricing: Pay per usage
Total users: 0
Monthly users: 1
Runs succeeded: 0%
Last modified: 2 months ago
.dockerignore
.git
.mise.toml
.nvim.lua
storage

# The rest is copied from https://github.com/github/gitignore/blob/main/Python.gitignore

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
.python-version

# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/latest/usage/project/#working-with-version-control
.pdm.toml
.pdm-python
.pdm-build/

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
.idea/
.gitignore
.mise.toml
.nvim.lua
storage

# The rest is copied from https://github.com/github/gitignore/blob/main/Python.gitignore

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
.python-version

# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/latest/usage/project/#working-with-version-control
.pdm.toml
.pdm-python
.pdm-build/

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
.idea/

# Added by Apify CLI
node_modules
INPUT.json
{ "location": "San Francisco, CA", "propertyType": "apartment", "minBedrooms": 2, "maxBedrooms": 3, "minPrice": 1500, "maxPrice": 3000, "amenities": ["parking", "gym"], "searchType": "rent", "sources": ["zillow", "apartments"]}
LICENSE
MIT License
Copyright (c) 2024 Listing Sleuth
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
requirements.txt
apify < 3.0
langchain-openai==0.3.6
langgraph==0.2.73
aiohttp>=3.8.0
langchain>=0.1.0
pydantic>=2.0.0
langchain-core>=0.1.0
langchain_community==0.3.19
.actor/Dockerfile
# First, specify the base Docker image.
# You can see the Docker images from Apify at https://hub.docker.com/r/apify/.
# You can also use any other image from Docker Hub.
FROM apify/actor-python-playwright:3.13

# Install build dependencies first
RUN apt-get update && apt-get install -y build-essential gcc g++ python3-dev

# Second, copy just requirements.txt into the Actor image,
# since it should be the only file that affects the dependency install in the next step,
# in order to speed up the build
COPY requirements.txt ./

# Install the packages specified in requirements.txt,
# print the installed Python version, pip version
# and all installed packages with their versions for debugging
RUN echo "Python version:" \
 && python --version \
 && echo "Pip version:" \
 && pip --version \
 && echo "Installing dependencies:" \
 && pip install --only-binary=:all: -r requirements.txt \
 && echo "All installed Python packages:" \
 && pip freeze

# Next, copy the remaining files and directories with the source code.
# Since we do this after installing the dependencies, quick build will be really fast
# for most source file changes.
COPY . ./

# Use compileall to ensure the runnability of the Actor Python code.
RUN python3 -m compileall -q .

# Specify how to launch the source code of your Actor.
# By default, the "python3 -m src" command is run
CMD ["python3", "-m", "src"]
.actor/actor.json
{ "actorSpecification": 1, "name": "listing-sleuth", "title": "Listing Sleuth - Real Estate Monitor", "description": "Monitors real estate listings across multiple platforms based on user-specified criteria", "version": "0.1", "buildTag": "latest", "restart": { "horizontalScaling": true }, "dockerfile": "./Dockerfile", "input": "./input_schema.json", "storages": { "dataset": "./dataset_schema.json" }, "license": "MIT", "monetization": { "type": "pay-per-event", "enabled": true, "priceSchemaPath": "./pay_per_event.json" }}
.actor/dataset_schema.json
{ "actorSpecification": 1, "fields": { "type": "object", "properties": { "id": { "type": "string", "description": "Unique identifier for the property listing" }, "title": { "type": "string", "description": "Property title or name" }, "description": { "type": "string", "description": "Detailed description of the property" }, "price": { "type": "number", "description": "Price of the property (in USD)" }, "bedrooms": { "type": "number", "description": "Number of bedrooms" }, "bathrooms": { "type": "number", "description": "Number of bathrooms" }, "address": { "type": "string", "description": "Property address" }, "property_type": { "type": "string", "description": "Type of property (apartment, house, condo, etc.)" }, "source": { "type": "string", "description": "Source of the listing (zillow, realtor, apartments, etc.)" }, "url": { "type": "string", "description": "Link to the original listing" }, "amenities": { "type": "array", "description": "List of amenities available at the property", "items": { "type": "string" } }, "listed_date": { "type": "string", "description": "Date when the property was listed" }, "is_new": { "type": "boolean", "description": "Whether this is a new listing since last search" } } }, "views": { "overview": { "title": "Property Listings", "description": "Real estate property listings matching the search criteria", "transformation": { "fields": [ "id", "title", "price", "bedrooms", "bathrooms", "address", "property_type", "source", "url", "listed_date", "is_new" ] }, "display": { "component": "table", "properties": { "id": { "label": "ID", "format": "text" }, "title": { "label": "Title", "format": "text" }, "price": { "label": "Price", "format": "number" }, "bedrooms": { "label": "Bedrooms", "format": "number" }, "bathrooms": { "label": "Bathrooms", "format": "number" }, "address": { "label": "Address", "format": "text" }, "property_type": { "label": "Property Type", "format": "text" }, "source": { "label": "Source", "format": "text" }, "url": 
{ "label": "URL", "format": "link" }, "listed_date": { "label": "Listed Date", "format": "date" }, "is_new": { "label": "New Listing", "format": "boolean" } } } }, "details": { "title": "Detailed View", "description": "Detailed information about property listings", "transformation": { "fields": [ "id", "title", "description", "price", "bedrooms", "bathrooms", "address", "property_type", "source", "url", "amenities", "listed_date", "is_new" ] }, "display": { "component": "table", "properties": { "description": { "label": "Description", "format": "text" }, "amenities": { "label": "Amenities", "format": "array" } } } } }}
.actor/input_schema.json
{ "title": "Listing Sleuth - Real Estate Monitor", "type": "object", "schemaVersion": 1, "properties": { "location": { "title": "Location", "type": "string", "description": "City or neighborhood to search in (e.g., 'San Francisco, CA')", "editor": "textfield" }, "propertyType": { "title": "Property Type", "type": "string", "description": "Type of property to look for", "enum": ["apartment", "house", "condo", "townhouse", "any"], "enumTitles": ["Apartment", "House", "Condo", "Townhouse", "Any"], "default": "any", "editor": "select" }, "minBedrooms": { "title": "Minimum Bedrooms", "type": "integer", "description": "Minimum number of bedrooms", "default": 1, "minimum": 0, "editor": "number" }, "maxBedrooms": { "title": "Maximum Bedrooms", "type": "integer", "description": "Maximum number of bedrooms (leave blank for no maximum)", "minimum": 0, "nullable": true, "editor": "number" }, "minPrice": { "title": "Minimum Price", "type": "integer", "description": "Minimum price (in USD)", "default": 0, "minimum": 0, "editor": "number" }, "maxPrice": { "title": "Maximum Price", "type": "integer", "description": "Maximum price (in USD)", "minimum": 0, "nullable": true, "editor": "number" }, "amenities": { "title": "Amenities", "type": "array", "description": "Desired amenities for the property", "editor": "stringList", "default": [] }, "searchType": { "title": "Search Type", "type": "string", "description": "Type of search to perform", "enum": ["rent", "buy"], "enumTitles": ["Rent", "Buy"], "default": "rent", "editor": "select" }, "sources": { "title": "Data Sources", "type": "array", "description": "Sources to search for listings", "editor": "stringList", "default": ["zillow", "realtor", "apartments"] }, "llmApiToken": { "title": "LLM API Token", "type": "string", "description": "OpenAI API token for processing results (optional)", "editor": "textfield", "nullable": true } }, "required": ["location"]}
.actor/pay_per_event.json
{ "actor-start": { "eventTitle": "Search Initiated", "eventDescription": "Flat fee for starting a real estate search.", "eventPriceUsd": 0.1 }, "property-found": { "eventTitle": "Property Found", "eventDescription": "Fee for each property matching your criteria.", "eventPriceUsd": 0.05 }, "search-completed": { "eventTitle": "Search Completed", "eventDescription": "Fee for completing a full property search across all selected platforms.", "eventPriceUsd": 0.3 }}
src/__init__.py
src/__main__.py
import asyncio

from .main import main

# Execute the Actor entry point.
asyncio.run(main())
src/main.py
1"""Main entry point for the Listing Sleuth Apify Actor.2
3This module contains the main entry point for the Actor, which searches for real estate4listings based on user-specified criteria.5"""6
7import os8import sys9import json10from apify import Actor11from dotenv import load_dotenv12
13from .models.property import SearchCriteria14from .search_agent import SearchAgentCrew15
16# Load environment variables from .env file if present17load_dotenv()18
19
20async def main() -> None:21 """Main entry point for the Apify Actor.22 23 This function initializes the Actor, processes input data, runs the search agent,24 and saves the results to the Actor's dataset.25 """26 # Enter the context of the Actor.27 async with Actor:28 # Log the Actor's version29 Actor.log.info(f"Listing Sleuth is starting...")30 31 # Charge for actor start32 await Actor.charge('actor-start')33 34 # Retrieve the Actor input, and use default values if not provided.35 actor_input = await Actor.get_input() or {}36 37 # For local testing, try to load from INPUT.json if actor_input is empty38 if not actor_input or 'location' not in actor_input:39 try:40 if os.path.exists('INPUT.json'):41 with open('INPUT.json', 'r') as f:42 actor_input = json.load(f)43 Actor.log.info(f"Loaded input from INPUT.json: {actor_input}")44 except Exception as e:45 Actor.log.error(f"Error loading from INPUT.json: {str(e)}")46 47 Actor.log.info(f"Using input: {actor_input}")48 49 # Parse location (required)50 location = actor_input.get("location")51 if not location:52 Actor.log.error("No location specified in Actor input, exiting...")53 # Just exit with an error code54 sys.exit(1)55 56 # Parse other inputs with defaults57 property_type = actor_input.get("propertyType", "any")58 min_bedrooms = int(actor_input.get("minBedrooms", 1))59 max_bedrooms = actor_input.get("maxBedrooms")60 if max_bedrooms is not None:61 max_bedrooms = int(max_bedrooms)62 63 min_price = float(actor_input.get("minPrice", 0))64 max_price = actor_input.get("maxPrice")65 if max_price is not None:66 max_price = float(max_price)67 68 # Amenities as a list69 amenities = actor_input.get("amenities", [])70 71 # Search type (rent/buy)72 search_type = actor_input.get("searchType", "rent")73 74 # Data sources to search75 sources = actor_input.get("sources", ["zillow", "realtor", "apartments"])76 77 # LLM API token (optional)78 llm_api_token = actor_input.get("llmApiToken") or os.environ.get("OPENAI_API_KEY")79 80 # 
Create search criteria81 search_criteria = SearchCriteria(82 location=location,83 property_type=property_type,84 min_bedrooms=min_bedrooms,85 max_bedrooms=max_bedrooms,86 min_price=min_price,87 max_price=max_price,88 amenities=amenities,89 search_type=search_type,90 sources=sources,91 llm_api_token=llm_api_token92 )93 94 Actor.log.info(f"Search criteria: {search_criteria}")95 96 # Create and run the search agent97 search_agent = SearchAgentCrew(search_criteria)98 results = search_agent.run()99 100 # Charge for each property found101 if results.total_results > 0:102 await Actor.charge('property-found', count=results.total_results)103 104 # Log results105 Actor.log.info(f"Search complete. Found {results.total_results} properties.")106 Actor.log.info(f"New listings: {results.new_results}")107 108 # Charge for search completion109 await Actor.charge('search-completed')110 111 # The results have already been saved to the dataset by the search agent
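`src/models/property.py` is not included in this listing, so the exact `SearchCriteria` definition is unknown. Based on how `main()` constructs it, it presumably looks roughly like the dataclass sketch below — field names come from the call site, types are inferred from the parsing code, and the real project may well use pydantic instead:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Sketch of the SearchCriteria model consumed by main(). The actual class in
# src/models/property.py is not shown above; everything here is inferred from usage.
@dataclass
class SearchCriteria:
    location: str
    property_type: str = "any"
    min_bedrooms: int = 1
    max_bedrooms: Optional[int] = None
    min_price: float = 0.0
    max_price: Optional[float] = None
    amenities: List[str] = field(default_factory=list)
    search_type: str = "rent"
    sources: List[str] = field(default_factory=lambda: ["zillow", "realtor", "apartments"])
    llm_api_token: Optional[str] = None

# Construct criteria the same way main() does, with a couple of overrides.
criteria = SearchCriteria(location="San Francisco, CA", min_bedrooms=2, max_price=3000.0)
```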
src/py.typed
src/search_agent.py
1"""Search agent for real estate properties."""2
3import os4import json5from typing import List, Dict, Any, Optional, Tuple6from datetime import datetime7from crewai import Agent, Task, Crew8from langchain.tools import BaseTool9from langchain_openai import ChatOpenAI10from apify import Actor11
12from .models.property import PropertyListing, SearchCriteria, SearchResults13from .scrapers.zillow import ZillowScraper14from .scrapers.realtor import RealtorScraper15from .scrapers.apartments import ApartmentsScraper16from .utils.llm import filter_properties_with_llm, summarize_property17from .utils.storage import (18 load_previous_results,19 mark_new_listings,20 save_search_results,21 push_results_to_dataset22)23
24
25class SearchTool(BaseTool):26 """Tool for searching real estate listings."""27 28 name = "search_real_estate"29 description = "Search for real estate listings based on search criteria"30 search_criteria: SearchCriteria = None31 32 def __init__(self, search_criteria: SearchCriteria):33 """Initialize the search tool.34 35 Args:36 search_criteria: Search criteria37 """38 super().__init__()39 self.search_criteria = search_criteria40 41 def _run(self, query: str) -> Dict[str, Any]:42 """Run the search tool.43 44 Args:45 query: Search query (not used, but required by BaseTool)46 47 Returns:48 Search results49 """50 # Initialize scrapers51 scrapers = []52 if "zillow" in self.search_criteria.sources:53 scrapers.append(ZillowScraper(self.search_criteria))54 if "realtor" in self.search_criteria.sources:55 scrapers.append(RealtorScraper(self.search_criteria))56 if "apartments" in self.search_criteria.sources:57 scrapers.append(ApartmentsScraper(self.search_criteria))58 59 # Run scrapers60 all_listings = []61 sources_searched = []62 63 for scraper in scrapers:64 try:65 listings = scraper.scrape()66 all_listings.extend(listings)67 sources_searched.append(scraper.source_name)68 except Exception as e:69 Actor.log.exception(f"Error scraping {scraper.source_name}: {e}")70 71 # Load previous results72 previous_results = load_previous_results(self.search_criteria)73 74 # Mark new listings75 marked_listings = mark_new_listings(all_listings, previous_results)76 77 # Create search results78 results = SearchResults(79 search_criteria=self.search_criteria,80 results=marked_listings,81 total_results=len(marked_listings),82 new_results=sum(1 for listing in marked_listings if listing.is_new),83 sources_searched=sources_searched84 )85 86 # Save results87 save_search_results(results)88 push_results_to_dataset(results)89 90 # Return results91 return {92 "total_results": results.total_results,93 "new_results": results.new_results,94 "sources_searched": results.sources_searched,95 
"search_date": results.search_date.isoformat()96 }97 98 async def _arun(self, query: str) -> Dict[str, Any]:99 """Async version of _run.100 101 Args:102 query: Search query103 104 Returns:105 Search results106 """107 return self._run(query)108
109
110class FilterTool(BaseTool):111 """Tool for filtering property listings with LLM."""112 113 name = "filter_properties"114 description = "Filter property listings based on search criteria using LLM"115 search_criteria: SearchCriteria = None116 117 def __init__(self, search_criteria: SearchCriteria):118 """Initialize the filter tool.119 120 Args:121 search_criteria: Search criteria122 """123 super().__init__()124 self.search_criteria = search_criteria125 126 def _run(self, query: str) -> Dict[str, Any]:127 """Run the filter tool.128 129 Args:130 query: Filter query (not used, but required by BaseTool)131 132 Returns:133 Filtered search results134 """135 # Try to load saved results136 try:137 results_dict = None138 139 # Try to load from Apify KV store if available140 if hasattr(Actor, 'main_kv_store'):141 results_dict = Actor.main_kv_store.get_value("search_results")142 # Otherwise try to load from local file143 elif os.path.exists("storage/key_value_stores/search_results.json"):144 with open("storage/key_value_stores/search_results.json", "r") as f:145 results_dict = json.load(f)146 147 if not results_dict:148 return {"error": "No search results found"}149 150 # Convert to SearchResults151 search_results = SearchResults(**results_dict)152 153 if not search_results.results:154 return {"error": "No results to filter"}155 156 # Filter results with LLM if token is available157 if self.search_criteria.llm_api_token:158 filtered_listings = filter_properties_with_llm(159 search_results.results,160 self.search_criteria,161 self.search_criteria.llm_api_token162 )163 164 # Update results165 search_results.results = filtered_listings166 search_results.total_results = len(filtered_listings)167 168 # Save filtered results169 save_search_results(search_results)170 171 return {172 "total_results_after_filtering": len(filtered_listings),173 "filter_date": datetime.now().isoformat()174 }175 else:176 return {"error": "No LLM API token provided for filtering"}177 178 except 
Exception as e:179 Actor.log.exception(f"Error filtering properties: {e}")180 return {"error": str(e)}181 182 async def _arun(self, query: str) -> Dict[str, Any]:183 """Async version of _run.184 185 Args:186 query: Filter query187 188 Returns:189 Filtered search results190 """191 return self._run(query)192
193
194class SummarizeTool(BaseTool):195 """Tool for summarizing property listings."""196 197 name = "summarize_properties"198 description = "Generate summaries of property listings"199 search_criteria: SearchCriteria = None200 201 def __init__(self, search_criteria: SearchCriteria):202 """Initialize the summarize tool.203 204 Args:205 search_criteria: Search criteria206 """207 super().__init__()208 self.search_criteria = search_criteria209 210 def _run(self, query: str) -> Dict[str, Any]:211 """Run the summarize tool.212 213 Args:214 query: Summarize query (not used, but required by BaseTool)215 216 Returns:217 Summarized search results218 """219 # Try to load saved results220 try:221 results_dict = None222 223 # Try to load from Apify KV store if available224 if hasattr(Actor, 'main_kv_store'):225 results_dict = Actor.main_kv_store.get_value("search_results")226 # Otherwise try to load from local file227 elif os.path.exists("storage/key_value_stores/search_results.json"):228 with open("storage/key_value_stores/search_results.json", "r") as f:229 results_dict = json.load(f)230 231 if not results_dict:232 return {"error": "No search results found"}233 234 # Convert to SearchResults235 search_results = SearchResults(**results_dict)236 237 if not search_results.results:238 return {"error": "No results to summarize"}239 240 # Generate summaries if LLM API token is available241 if self.search_criteria.llm_api_token:242 summaries = []243 244 for listing in search_results.results:245 summary = summarize_property(listing, self.search_criteria.llm_api_token)246 summaries.append({247 "id": listing.id,248 "summary": summary,249 "is_new": listing.is_new250 })251 252 return {253 "summaries": summaries,254 "total_summaries": len(summaries),255 "summarize_date": datetime.now().isoformat()256 }257 else:258 # Generate basic summaries without LLM259 summaries = []260 261 for listing in search_results.results:262 basic_summary = (263 f"{listing.title}: {listing.bedrooms} bed, "264 
f"{listing.bathrooms or 'unknown'} bath {listing.property_type} "265 f"for ${listing.price:,.2f} in {listing.address.city}, "266 f"{listing.address.state}."267 )268 269 summaries.append({270 "id": listing.id,271 "summary": basic_summary,272 "is_new": listing.is_new273 })274 275 return {276 "summaries": summaries,277 "total_summaries": len(summaries),278 "summarize_date": datetime.now().isoformat()279 }280 281 except Exception as e:282 Actor.log.exception(f"Error summarizing properties: {e}")283 return {"error": str(e)}284 285 async def _arun(self, query: str) -> Dict[str, Any]:286 """Async version of _run.287 288 Args:289 query: Summarize query290 291 Returns:292 Summarized search results293 """294 return self._run(query)295
296
297class SearchAgentCrew:298 """Crew of agents for property search."""299 300 def __init__(self, search_criteria: SearchCriteria):301 """Initialize the search agent crew.302 303 Args:304 search_criteria: Search criteria305 """306 self.search_criteria = search_criteria307 self.llm = None308 309 # Initialize LLM if token is provided310 if search_criteria.llm_api_token:311 self.llm = ChatOpenAI(312 api_key=search_criteria.llm_api_token,313 temperature=0,314 model="gpt-3.5-turbo"315 )316 317 def run(self) -> SearchResults:318 """Run the search agent crew.319 320 Returns:321 Search results322 """323 # If no LLM, just run the search directly324 if not self.llm:325 Actor.log.info("No LLM API token provided, running basic search without agents")326 search_tool = SearchTool(self.search_criteria)327 search_tool._run("")328 329 # Load and return results330 try:331 # Try loading from Apify KV store if available332 if hasattr(Actor, 'main_kv_store'):333 results_dict = Actor.main_kv_store.get_value("search_results")334 # Otherwise try to load from local file335 elif os.path.exists("storage/key_value_stores/search_results.json"):336 with open("storage/key_value_stores/search_results.json", "r") as f:337 results_dict = json.load(f)338 else:339 results_dict = None340 341 if results_dict:342 return SearchResults(**results_dict)343 except Exception as e:344 Actor.log.error(f"Error loading search results: {e}")345 346 # Create empty results if loading failed347 return SearchResults(348 search_criteria=self.search_criteria,349 results=[],350 total_results=0,351 new_results=0,352 sources_searched=[]353 )354 355 # Create tools356 search_tool = SearchTool(self.search_criteria)357 filter_tool = FilterTool(self.search_criteria)358 summarize_tool = SummarizeTool(self.search_criteria)359 360 # Create agents361 search_agent = Agent(362 role="Real Estate Search Specialist",363 goal="Find properties that match the search criteria",364 backstory="You are an expert in finding real estate listings 
across multiple platforms.",365 verbose=True,366 allow_delegation=True,367 tools=[search_tool],368 llm=self.llm369 )370 371 filter_agent = Agent(372 role="Property Filter Specialist",373 goal="Filter properties to find the best matches for the user",374 backstory="You are an expert in analyzing property details and matching them with user preferences.",375 verbose=True,376 allow_delegation=True,377 tools=[filter_tool],378 llm=self.llm379 )380 381 summarize_agent = Agent(382 role="Property Summarizer",383 goal="Create concise, informative summaries of properties",384 backstory="You are skilled at creating appealing property descriptions that highlight key features.",385 verbose=True,386 allow_delegation=True,387 tools=[summarize_tool],388 llm=self.llm389 )390 391 # Create tasks392 search_task = Task(393 description=(394 f"Search for properties in {self.search_criteria.location} "395 f"with {self.search_criteria.min_bedrooms}+ bedrooms, "396 f"maximum price of ${self.search_criteria.max_price or 'any'}, "397 f"property type: {self.search_criteria.property_type}. "398 f"Search sources: {', '.join(self.search_criteria.sources)}."399 ),400 agent=search_agent,401 expected_output="A report of the total number of properties found"402 )403 404 filter_task = Task(405 description=(406 "Filter the search results to find properties that best match "407 f"the user's criteria, especially regarding amenities: {', '.join(self.search_criteria.amenities)}"408 ),409 agent=filter_agent,410 expected_output="A report of how many properties passed the filtering"411 )412 413 summarize_task = Task(414 description=(415 "Create summaries for each property highlighting key features. 
"416 "Mark new listings that weren't found in previous searches."417 ),418 agent=summarize_agent,419 expected_output="Summaries of each property"420 )421 422 # Create crew423 crew = Crew(424 agents=[search_agent, filter_agent, summarize_agent],425 tasks=[search_task, filter_task, summarize_task],426 verbose=True427 )428 429 # Run the crew430 try:431 result = crew.kickoff()432 433 # Load and return results434 try:435 # Try loading from Apify KV store if available436 if hasattr(Actor, 'main_kv_store'):437 results_dict = Actor.main_kv_store.get_value("search_results")438 # Otherwise try to load from local file439 elif os.path.exists("storage/key_value_stores/search_results.json"):440 with open("storage/key_value_stores/search_results.json", "r") as f:441 results_dict = json.load(f)442 else:443 results_dict = None444 445 if results_dict:446 return SearchResults(**results_dict)447 except Exception as e:448 Actor.log.error(f"Error loading search results: {e}")449 except Exception as e:450 Actor.log.error(f"Error running crew: {e}")451 452 # If we got here, either there was an error or no results were found453 # Create empty results454 return SearchResults(455 search_criteria=self.search_criteria,456 results=[],457 total_results=0,458 new_results=0,459 sources_searched=[]460 )
src/agents/__init__.py
1"""Agent classes for Listing Sleuth."""
src/scrapers/__init__.py
1"""Scrapers for real estate platforms."""
src/scrapers/apartments.py
1"""Apartments.com scraper."""2
3import re4import json5import uuid6from typing import Dict, Any, List, Optional7from datetime import datetime8from pydantic import HttpUrl9
10from apify import Actor11from apify_client import ApifyClient12
from .base import BaseScraper
from ..models.property import PropertyListing, Address, SearchCriteria


class ApartmentsScraper(BaseScraper):
    """Apartments.com scraper."""

    @property
    def actor_id(self) -> str:
        """Get Apify actor ID for Apartments.com.

        Returns:
            Actor ID
        """
        return "epctex/apartments-scraper"

    @property
    def source_name(self) -> str:
        """Get source name.

        Returns:
            Source name
        """
        return "apartments"

    def prepare_input(self) -> Dict[str, Any]:
        """Prepare input for the Apartments.com scraper.

        Returns:
            Actor input
        """
        # Parse location into city and state
        location_parts = self.search_criteria.location.split(",")
        city = location_parts[0].strip().replace(" ", "-").lower()
        state = ""
        if len(location_parts) > 1:
            state = location_parts[1].strip().lower()

        # Construct location for URL
        if state:
            location_url = f"{city}-{state}"
        else:
            location_url = city

        # Base URL
        base_url = f"https://www.apartments.com/{location_url}"

        # Start building search parameters
        search_params = {}

        # Bedrooms filter
        if self.search_criteria.min_bedrooms > 0 and self.search_criteria.max_bedrooms:
            if self.search_criteria.min_bedrooms == self.search_criteria.max_bedrooms:
                search_params["br"] = str(self.search_criteria.min_bedrooms)
            else:
                search_params["br-min"] = str(self.search_criteria.min_bedrooms)
                search_params["br-max"] = str(self.search_criteria.max_bedrooms)
        elif self.search_criteria.min_bedrooms > 0:
            search_params["br-min"] = str(self.search_criteria.min_bedrooms)
        elif self.search_criteria.max_bedrooms:
            search_params["br-max"] = str(self.search_criteria.max_bedrooms)

        # Price filter
        if self.search_criteria.min_price > 0:
            search_params["price-min"] = str(int(self.search_criteria.min_price))
        if self.search_criteria.max_price:
            search_params["price-max"] = str(int(self.search_criteria.max_price))

        # Property type - apartments.com primarily focuses on apartments, but can filter for types
        if self.search_criteria.property_type != "any" and self.search_criteria.property_type != "apartment":
            search_params["type"] = self.search_criteria.property_type

        return {
            "startUrls": [{"url": base_url}],
            "searchParams": search_params,
            "maxItems": self.max_items,
            "extendOutputFunction": """async ({ data, item, customData, Apify }) => {
                return { ...item };
            }""",
            "proxy": {
                "useApifyProxy": True,
                "apifyProxyGroups": ["RESIDENTIAL"]
            }
        }

    def transform_item(self, item: Dict[str, Any]) -> PropertyListing:
        """Transform an Apartments.com listing to a PropertyListing.

        Args:
            item: Apartments.com listing

        Returns:
            PropertyListing
        """
        # Parse price
        price_str = item.get("rent", "0")
        if isinstance(price_str, str):
            # Extract digits from price string
            price_match = re.search(r'(\d{1,3}(?:,\d{3})*(?:\.\d+)?)', price_str)
            if price_match:
                price_clean = price_match.group(1).replace(",", "")
                price = float(price_clean)
            else:
                price = 0
        else:
            price = float(price_str) if price_str else 0

        # Parse address
        property_address = item.get("propertyAddress", {})
        address_line = property_address.get("addressLine", "")
        neighborhood = property_address.get("neighborhood", "")
        city = property_address.get("city", "")
        state = property_address.get("state", "")
        postal_code = property_address.get("postalCode", None)

        address = Address(
            street=address_line,
            city=city or neighborhood,  # Use neighborhood if city is missing
            state=state,
            zip_code=postal_code
        )

        # Parse bedrooms
        bedrooms = 0
        beds = item.get("beds", 0)
        if isinstance(beds, str):
            bed_match = re.search(r'(\d+\.?\d*)', beds)
            bedrooms = float(bed_match.group(1)) if bed_match else 0
        else:
            bedrooms = float(beds) if beds else 0

        # Parse bathrooms
        bathrooms = None
        baths = item.get("baths", None)
        if baths:
            if isinstance(baths, str):
                bath_match = re.search(r'(\d+\.?\d*)', baths)
                bathrooms = float(bath_match.group(1)) if bath_match else None
            else:
                bathrooms = float(baths)

        # Parse square feet
        sqft = None
        sqft_str = item.get("sqft", None)
        if sqft_str:
            if isinstance(sqft_str, str):
                sqft_match = re.search(r'(\d+)', sqft_str.replace(',', ''))
                sqft = int(sqft_match.group(1)) if sqft_match else None
            else:
                sqft = int(sqft_str)

        # Determine property type
        property_type = "apartment"  # Default for apartments.com
        if "condo" in item.get("title", "").lower() or "condo" in item.get("description", "").lower():
            property_type = "condo"
        elif "townhouse" in item.get("title", "").lower() or "townhouse" in item.get("description", "").lower():
            property_type = "townhouse"
        elif "house" in item.get("title", "").lower() and "townhouse" not in item.get("title", "").lower():
            property_type = "house"

        # Get URL
        url = item.get("url", "")

        # Get images
        images = []
        photos = item.get("photos", [])
        if isinstance(photos, list):
            for photo in photos:
                if isinstance(photo, dict) and "url" in photo:
                    images.append(photo["url"])
                elif isinstance(photo, str) and photo.startswith("http"):
                    images.append(photo)

        # Extract amenities
        amenities = []

        # Add apartment amenities
        apartment_amenities = item.get("apartmentAmenities", [])
        if isinstance(apartment_amenities, list):
            amenities.extend(apartment_amenities)

        # Add community amenities
        community_amenities = item.get("communityAmenities", [])
        if isinstance(community_amenities, list):
            amenities.extend(community_amenities)

        # Also use the base extract_amenities method to catch any missed ones
        amenities.extend(self.extract_amenities(item))

        # Remove duplicates while preserving order
        amenities = list(dict.fromkeys(amenities))

        # Generate a unique ID
        property_id = str(item.get("id", uuid.uuid4()))

        # Create features dictionary for additional data
        additional_features = {}
        for key, value in item.items():
            if key not in [
                "rent", "propertyAddress", "beds", "baths", "sqft", "url", "photos",
                "apartmentAmenities", "communityAmenities", "id", "title", "description",
            ]:
                additional_features[key] = value

        # Parse listing date if available
        listed_date = None
        date_str = item.get("dateAvailable", item.get("datePosted", None))
        if date_str and isinstance(date_str, str):
            try:
                # Try common date formats
                for fmt in ["%Y-%m-%d", "%m/%d/%Y", "%b %d, %Y"]:
                    try:
                        listed_date = datetime.strptime(date_str, fmt)
                        break
                    except ValueError:
                        continue
            except Exception:
                pass

        return PropertyListing(
            id=property_id,
            title=item.get("title", "Property Listing"),
            description=item.get("description", None),
            price=price,
            address=address,
            bedrooms=bedrooms,
            bathrooms=bathrooms,
            square_feet=sqft,
            property_type=property_type,
            url=url,
            source="apartments",
            amenities=amenities,
            images=images,
            listed_date=listed_date,
            features=additional_features
        )
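The rent parsing in `transform_item` relies on a regex that tolerates currency symbols, comma grouping, and trailing text. A minimal standalone sketch of that same pattern (`parse_rent` is a hypothetical helper name, not part of the repo):

```python
import re


def parse_rent(price_str: str) -> float:
    """Pull a dollar amount out of a rent string, using the same
    comma-grouped regex as ApartmentsScraper.transform_item."""
    match = re.search(r'(\d{1,3}(?:,\d{3})*(?:\.\d+)?)', price_str)
    # Strip the thousands separators before converting; 0.0 when no number found
    return float(match.group(1).replace(",", "")) if match else 0.0
```

Strings like `"$1,850/mo"` reduce to `1850.0`, while non-numeric values such as `"Call for pricing"` fall back to `0.0`.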
src/scrapers/base.py
1"""Base scraper class for all real estate platform scrapers."""2
3import re4import json5import uuid6import os7from abc import ABC, abstractmethod8from typing import List, Dict, Any, Optional9from datetime import datetime10from apify import Actor11from apify_client import ApifyClient12from pydantic import HttpUrl13
14from ..models.property import PropertyListing, Address, SearchCriteria15
16
17class BaseScraper(ABC):18 """Base scraper class that all platform-specific scrapers should inherit from."""19 20 def __init__(21 self,22 search_criteria: SearchCriteria,23 apify_client: Optional[ApifyClient] = None,24 max_items: int = 10025 ):26 """Initialize the scraper.27 28 Args:29 search_criteria: Search criteria30 apify_client: Apify client. If None, creates a new client31 max_items: Maximum number of items to scrape32 """33 self.search_criteria = search_criteria34 self.apify_client = apify_client or ApifyClient()35 self.max_items = max_items36 37 @property38 @abstractmethod39 def actor_id(self) -> str:40 """Apify actor ID for the scraper.41 42 Returns:43 Actor ID44 """45 pass46 47 @property48 @abstractmethod49 def source_name(self) -> str:50 """Name of the source.51 52 Returns:53 Source name54 """55 pass56 57 @abstractmethod58 def prepare_input(self) -> Dict[str, Any]:59 """Prepare input for the Apify actor.60 61 Returns:62 Actor input63 """64 pass65 66 @abstractmethod67 def transform_item(self, item: Dict[str, Any]) -> PropertyListing:68 """Transform a scraped item into a PropertyListing.69 70 Args:71 item: Scraped item72 73 Returns:74 PropertyListing75 """76 pass77 78 def parse_address(self, address_str: str) -> Address:79 """Parse address string into Address model.80 81 Args:82 address_str: Address string83 84 Returns:85 Address86 """87 # Default implementation with simple parsing88 # Subclasses can override for platform-specific parsing89 address_parts = address_str.split(",")90 91 if len(address_parts) >= 3:92 street = address_parts[0].strip()93 city = address_parts[1].strip()94 state_zip = address_parts[2].strip().split()95 state = state_zip[0].strip() if state_zip else ""96 zip_code = state_zip[1].strip() if len(state_zip) > 1 else None97 elif len(address_parts) == 2:98 street = None99 city = address_parts[0].strip()100 state_zip = address_parts[1].strip().split()101 state = state_zip[0].strip() if state_zip else ""102 zip_code = state_zip[1].strip() 
if len(state_zip) > 1 else None103 else:104 # If we can't parse the address properly, use a minimal approach105 street = None106 # Try to extract a known state abbreviation107 state_match = re.search(r'\b([A-Z]{2})\b', address_str)108 if state_match:109 state = state_match.group(1)110 # Assume the city is before the state111 city_match = re.search(r'([^,]+),\s*' + state, address_str)112 city = city_match.group(1) if city_match else address_str113 else:114 # If we can't extract a state, use the whole string as city115 city = address_str116 state = ""117 zip_code = None118 119 return Address(120 street=street,121 city=city,122 state=state,123 zip_code=zip_code124 )125 126 def extract_amenities(self, item: Dict[str, Any]) -> List[str]:127 """Extract amenities from a scraped item.128 129 Args:130 item: Scraped item131 132 Returns:133 List of amenities134 """135 # Default implementation that subclasses can override136 amenities = []137 138 # Look for amenities in features or amenities field139 if "amenities" in item and isinstance(item["amenities"], list):140 amenities.extend(item["amenities"])141 142 if "features" in item and isinstance(item["features"], list):143 amenities.extend(item["features"])144 145 # Look for amenities in description146 if "description" in item and isinstance(item["description"], str):147 # Common amenities to look for in descriptions148 common_amenities = [149 "parking", "garage", "gym", "fitness", "pool", "washer", "dryer", 150 "dishwasher", "air conditioning", "ac", "balcony", "patio", 151 "hardwood", "fireplace", "wheelchair", "elevator", "pet friendly"152 ]153 154 description = item["description"].lower()155 for amenity in common_amenities:156 if amenity in description and amenity not in amenities:157 amenities.append(amenity)158 159 return amenities160 161 def scrape(self) -> List[PropertyListing]:162 """Scrape properties based on search criteria.163 164 Returns:165 List of property listings166 """167 Actor.log.info(f"Starting 
{self.source_name} scraper")168 169 # Prepare input for the Apify actor170 input_data = self.prepare_input()171 172 # Check if we're running in local mode for testing173 if os.environ.get("ACTOR_TEST_PAY_PER_EVENT") == "true" and not os.environ.get("APIFY_TOKEN"):174 Actor.log.info(f"Running in local test mode, using mock data for {self.source_name}")175 return self.get_mock_listings()176 177 Actor.log.info(f"Running Apify actor {self.actor_id} with input: {input_data}")178 179 try:180 # Run the actor181 run = self.apify_client.actor(self.actor_id).call(182 run_input=input_data,183 build="latest"184 )185 186 # Get the dataset187 dataset_id = run["defaultDatasetId"]188 items = self.apify_client.dataset(dataset_id).list_items(limit=self.max_items).items189 190 Actor.log.info(f"Scraped {len(items)} items from {self.source_name}")191 192 # Transform items to PropertyListings193 listings = []194 for item in items:195 try:196 listing = self.transform_item(item)197 listings.append(listing)198 except Exception as e:199 Actor.log.exception(f"Error transforming item: {e}")200 continue201 202 Actor.log.info(f"Transformed {len(listings)} listings from {self.source_name}")203 204 return listings205 except Exception as e:206 Actor.log.error(f"Error scraping {self.source_name}: {e}")207 return self.get_mock_listings()208 209 def get_mock_listings(self) -> List[PropertyListing]:210 """Get mock listings for local testing.211 212 Returns:213 List of mock property listings214 """215 Actor.log.info(f"Generating mock data for {self.source_name}")216 217 # Create 5 mock listings218 mock_listings = []219 220 for i in range(1, 6):221 mock_listings.append(222 PropertyListing(223 id=f"{self.source_name}_mock_{i}",224 title=f"Mock {self.source_name} Listing {i}",225 description=f"This is a mock listing for testing purposes. 
In {self.search_criteria.location} with {self.search_criteria.min_bedrooms} bedrooms.",226 url=f"https://example.com/{self.source_name}/mock-listing-{i}",227 price=float(self.search_criteria.min_price or 1000) + (i * 200),228 bedrooms=self.search_criteria.min_bedrooms + (i % 2),229 bathrooms=self.search_criteria.min_bedrooms / 2 + (i % 2),230 address=Address(231 street=f"{100 + i} Main St",232 city=self.search_criteria.location.split(",")[0].strip(),233 state=self.search_criteria.location.split(",")[-1].strip(),234 zip_code="12345"235 ),236 property_type=self.search_criteria.property_type,237 source=self.source_name,238 amenities=self.search_criteria.amenities + ["parking", "air conditioning"],239 listed_date=datetime.now(),240 is_new=True241 )242 )243 244 Actor.log.info(f"Generated {len(mock_listings)} mock listings for {self.source_name}")245 return mock_listings
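The comma-splitting strategy in `BaseScraper.parse_address` can be seen in isolation below. This is a hypothetical standalone helper (`split_address` is not a name from the repo), with the no-comma fallback simplified to just the state-abbreviation search:

```python
import re
from typing import Optional, Tuple


def split_address(address_str: str) -> Tuple[Optional[str], str, str, Optional[str]]:
    """Return (street, city, state, zip) using the same comma-splitting
    strategy as BaseScraper.parse_address."""
    parts = address_str.split(",")
    if len(parts) >= 3:
        # "street, city, ST zip"
        street, city = parts[0].strip(), parts[1].strip()
        state_zip = parts[2].strip().split()
    elif len(parts) == 2:
        # "city, ST zip" - no street component
        street, city = None, parts[0].strip()
        state_zip = parts[1].strip().split()
    else:
        # Fallback: look for a two-letter state abbreviation anywhere in the string
        match = re.search(r'\b([A-Z]{2})\b', address_str)
        return None, address_str, match.group(1) if match else "", None
    state = state_zip[0] if state_zip else ""
    zip_code = state_zip[1] if len(state_zip) > 1 else None
    return street, city, state, zip_code
```

For example, `split_address("123 Main St, Springfield, IL 62704")` yields `("123 Main St", "Springfield", "IL", "62704")`, while a bare `"Springfield IL"` only recovers the state.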
src/scrapers/realtor.py
1"""Realtor.com scraper."""2
3import re4import json5import uuid6from typing import Dict, Any, List, Optional7from datetime import datetime8from pydantic import HttpUrl9
10from apify import Actor11from apify_client import ApifyClient12
13from .base import BaseScraper14from ..models.property import PropertyListing, Address, SearchCriteria15
16
17class RealtorScraper(BaseScraper):18 """Realtor.com scraper."""19 20 @property21 def actor_id(self) -> str:22 """Get Apify actor ID for Realtor.com.23 24 Returns:25 Actor ID26 """27 return "epctex/realtor-scraper"28 29 @property30 def source_name(self) -> str:31 """Get source name.32 33 Returns:34 Source name35 """36 return "realtor"37 38 def prepare_input(self) -> Dict[str, Any]:39 """Prepare input for the Realtor.com scraper.40 41 Returns:42 Actor input43 """44 # Parse location into city and state45 location_parts = self.search_criteria.location.split(",")46 city = location_parts[0].strip().replace(" ", "-").lower()47 state = ""48 if len(location_parts) > 1:49 state = location_parts[1].strip().lower()50 51 # Property type mapping52 property_type_map = {53 "apartment": "apartments",54 "house": "single-family-home",55 "condo": "condos",56 "townhouse": "townhomes",57 "any": "any"58 }59 60 property_type = property_type_map.get(61 self.search_criteria.property_type, "any"62 )63 64 # Base search URL65 if self.search_criteria.search_type == "rent":66 base_url = "https://www.realtor.com/apartments"67 else:68 base_url = "https://www.realtor.com/realestateandhomes-search"69 70 # Construct location part of URL71 if state:72 location_url = f"{city}_{state}"73 else:74 location_url = city75 76 # Build search URL77 input_url = f"{base_url}/{location_url}"78 79 # Start building search parameters80 search_params = {}81 82 # Add property type83 if property_type != "any":84 search_params["prop"] = property_type85 86 # Add bedroom filter87 if self.search_criteria.min_bedrooms > 0:88 search_params["beds-lower"] = str(self.search_criteria.min_bedrooms)89 if self.search_criteria.max_bedrooms:90 search_params["beds-upper"] = str(self.search_criteria.max_bedrooms)91 92 # Add price filter93 if self.search_criteria.min_price > 0:94 search_params["price-lower"] = str(int(self.search_criteria.min_price))95 if self.search_criteria.max_price:96 search_params["price-upper"] = 
str(int(self.search_criteria.max_price))97 98 return {99 "startUrls": [{"url": input_url}],100 "searchParams": search_params,101 "maxItems": self.max_items,102 "extendOutputFunction": """async ({ data, item, customData, Apify }) => {103 return { ...item };104 }""",105 "proxy": {106 "useApifyProxy": True,107 "apifyProxyGroups": ["RESIDENTIAL"]108 }109 }110 111 def transform_item(self, item: Dict[str, Any]) -> PropertyListing:112 """Transform a Realtor.com listing to a PropertyListing.113 114 Args:115 item: Realtor.com listing116 117 Returns:118 PropertyListing119 """120 # Parse price121 price_str = item.get("price", "0")122 if isinstance(price_str, str):123 # Remove currency symbols and commas124 price_str = re.sub(r'[^\d.]', '', price_str)125 price = float(price_str) if price_str else 0126 else:127 price = float(price_str) if price_str else 0128 129 # Get address components130 full_address = item.get("address", "")131 address_components = item.get("addressComponents", {})132 133 # Construct address134 street = address_components.get("streetName", "")135 if "streetNumber" in address_components:136 street = f"{address_components['streetNumber']} {street}"137 138 address = Address(139 street=street,140 city=address_components.get("city", ""),141 state=address_components.get("state", ""),142 zip_code=address_components.get("zipcode", None)143 )144 145 # Parse bedrooms146 bedrooms = 0147 beds = item.get("beds", 0)148 if isinstance(beds, str):149 bed_match = re.search(r'(\d+\.?\d*)', beds)150 bedrooms = float(bed_match.group(1)) if bed_match else 0151 else:152 bedrooms = float(beds) if beds else 0153 154 # Parse bathrooms155 bathrooms = None156 baths = item.get("baths", None)157 if baths:158 if isinstance(baths, str):159 bath_match = re.search(r'(\d+\.?\d*)', baths)160 bathrooms = float(bath_match.group(1)) if bath_match else None161 else:162 bathrooms = float(baths)163 164 # Parse square feet165 sqft = None166 sqft_str = item.get("sqft", None)167 if sqft_str:168 if 
isinstance(sqft_str, str):169 sqft_match = re.search(r'(\d+)', sqft_str.replace(',', ''))170 sqft = int(sqft_match.group(1)) if sqft_match else None171 else:172 sqft = int(sqft_str)173 174 # Determine property type175 property_type = item.get("propertyType", "").lower()176 if not property_type:177 property_subtype = item.get("propertySubType", "").lower()178 if property_subtype:179 property_type = property_subtype180 else:181 property_type = "unknown"182 183 # Get URL184 url = item.get("detailUrl", "")185 if not url.startswith("http"):186 url = f"https://www.realtor.com{url}"187 188 # Get images189 images = []190 photos = item.get("photos", [])191 if isinstance(photos, list):192 for photo in photos:193 if isinstance(photo, dict) and "url" in photo:194 images.append(photo["url"])195 elif isinstance(photo, str) and photo.startswith("http"):196 images.append(photo)197 198 # Extract amenities199 amenities = self.extract_amenities(item)200 201 # Check for specific features in the item data202 features = item.get("features", {})203 if features:204 for category, feature_list in features.items():205 if isinstance(feature_list, list):206 amenities.extend(feature_list)207 208 # Generate a unique ID209 property_id = str(item.get("listingId", uuid.uuid4()))210 211 # Create features dictionary for additional data212 additional_features = {}213 for key, value in item.items():214 if key not in [215 "price", "address", "addressComponents", "beds", "baths", "sqft",216 "propertyType", "propertySubType", "detailUrl", "photos", "features",217 "listingId", "description", "amenities"218 ]:219 additional_features[key] = value220 221 return PropertyListing(222 id=property_id,223 title=item.get("title", "Property Listing"),224 description=item.get("description", None),225 price=price,226 address=address,227 bedrooms=bedrooms,228 bathrooms=bathrooms,229 square_feet=sqft,230 property_type=property_type,231 url=url,232 source="realtor",233 amenities=amenities,234 images=images,235 
features=additional_features236 )
src/scrapers/zillow.py
1"""Zillow scraper."""2
3import re4import json5import uuid6from typing import Dict, Any, List, Optional7from datetime import datetime8from pydantic import HttpUrl9
10from apify import Actor11from apify_client import ApifyClient12
13from .base import BaseScraper14from ..models.property import PropertyListing, Address, SearchCriteria15
16
17class ZillowScraper(BaseScraper):18 """Zillow scraper."""19 20 @property21 def actor_id(self) -> str:22 """Get Apify actor ID for Zillow.23 24 Returns:25 Actor ID26 """27 return "maxcopell/zillow-detail-scraper"28 29 @property30 def source_name(self) -> str:31 """Get source name.32 33 Returns:34 Source name35 """36 return "zillow"37 38 def prepare_input(self) -> Dict[str, Any]:39 """Prepare input for the Zillow scraper.40 41 Returns:42 Actor input43 """44 location = self.search_criteria.location.replace(", ", ",").replace(" ", "-").lower()45 46 # Property type mapping47 property_type_map = {48 "apartment": "apartment",49 "house": "house",50 "condo": "condo",51 "townhouse": "townhome",52 "any": ""53 }54 55 property_type = property_type_map.get(56 self.search_criteria.property_type, ""57 )58 59 # Build the URL60 if self.search_criteria.search_type == "rent":61 base_url = f"https://www.zillow.com/homes/for_rent/{location}"62 else:63 base_url = f"https://www.zillow.com/homes/{location}"64 65 # Add filters based on search criteria66 filters = []67 68 # Price filter69 if self.search_criteria.min_price > 0 or self.search_criteria.max_price:70 price_filter = "price"71 if self.search_criteria.min_price > 0:72 price_filter += f"_gte-{int(self.search_criteria.min_price)}"73 if self.search_criteria.max_price:74 price_filter += f"_lte-{int(self.search_criteria.max_price)}"75 filters.append(price_filter)76 77 # Bedroom filter78 if self.search_criteria.min_bedrooms > 0 or self.search_criteria.max_bedrooms:79 if self.search_criteria.min_bedrooms == self.search_criteria.max_bedrooms:80 filters.append(f"{self.search_criteria.min_bedrooms}-_beds")81 else:82 bedroom_filter = "beds"83 if self.search_criteria.min_bedrooms > 0:84 bedroom_filter += f"_gte-{self.search_criteria.min_bedrooms}"85 if self.search_criteria.max_bedrooms:86 bedroom_filter += f"_lte-{self.search_criteria.max_bedrooms}"87 filters.append(bedroom_filter)88 89 # Property type filter90 if property_type:91 
filters.append(f"type-{property_type}")92 93 # Assemble the URL with filters94 if filters:95 filter_string = "/".join(filters)96 url = f"{base_url}/{filter_string}"97 else:98 url = base_url99 100 return {101 "startUrls": [{"url": url}],102 "maxPages": 10,103 "includeRental": self.search_criteria.search_type == "rent",104 "includeSale": self.search_criteria.search_type == "buy",105 "includeAuction": False,106 "proxy": {107 "useApifyProxy": True,108 "apifyProxyGroups": ["RESIDENTIAL"]109 }110 }111 112 def transform_item(self, item: Dict[str, Any]) -> PropertyListing:113 """Transform a Zillow listing to a PropertyListing.114 115 Args:116 item: Zillow listing117 118 Returns:119 PropertyListing120 """121 # Parse price122 price_str = item.get("price", "0")123 if isinstance(price_str, str):124 # Remove currency symbols and commas125 price_str = re.sub(r'[^\d.]', '', price_str)126 price = float(price_str) if price_str else 0127 else:128 price = float(price_str)129 130 # Parse address131 address_str = item.get("address", "")132 address = self.parse_address(address_str)133 134 # Parse bedrooms135 bedrooms_str = item.get("bedrooms", "0")136 if isinstance(bedrooms_str, str):137 bedroom_match = re.search(r'(\d+\.?\d*)', bedrooms_str)138 bedrooms = float(bedroom_match.group(1)) if bedroom_match else 0139 else:140 bedrooms = float(bedrooms_str) if bedrooms_str else 0141 142 # Parse bathrooms143 bathrooms_str = item.get("bathrooms", None)144 if bathrooms_str:145 if isinstance(bathrooms_str, str):146 bathroom_match = re.search(r'(\d+\.?\d*)', bathrooms_str)147 bathrooms = float(bathroom_match.group(1)) if bathroom_match else None148 else:149 bathrooms = float(bathrooms_str)150 else:151 bathrooms = None152 153 # Parse square feet154 sqft_str = item.get("livingArea", None)155 if sqft_str:156 if isinstance(sqft_str, str):157 # Remove non-digit characters158 sqft_match = re.search(r'(\d+)', sqft_str.replace(',', ''))159 sqft = int(sqft_match.group(1)) if sqft_match else None160 
else:161 sqft = int(sqft_str)162 else:163 sqft = None164 165 # Extract amenities166 amenities = self.extract_amenities(item)167 168 # Get property type169 property_type = item.get("homeType", "").lower()170 if not property_type:171 # Try to infer from description or facts172 if "apartment" in item.get("description", "").lower():173 property_type = "apartment"174 elif "condo" in item.get("description", "").lower():175 property_type = "condo"176 elif "house" in item.get("description", "").lower():177 property_type = "house"178 elif "townhouse" in item.get("description", "").lower() or "town house" in item.get("description", "").lower():179 property_type = "townhouse"180 else:181 property_type = "unknown"182 183 # Get listing URL184 url = item.get("url", "")185 if not url.startswith("http"):186 url = f"https://www.zillow.com{url}"187 188 # Get images189 images = []190 if "images" in item and isinstance(item["images"], list):191 for img in item["images"]:192 if isinstance(img, str) and img.startswith("http"):193 images.append(img)194 195 # Generate a unique ID196 property_id = str(item.get("zpid", uuid.uuid4()))197 198 # Extract any additional features199 features = {}200 for key, value in item.items():201 if key not in [202 "price", "address", "bedrooms", "bathrooms", "livingArea", 203 "homeType", "description", "url", "images", "zpid", "amenities"204 ]:205 features[key] = value206 207 return PropertyListing(208 id=property_id,209 title=item.get("streetAddress", "Property Listing"),210 description=item.get("description", None),211 price=price,212 address=address,213 bedrooms=bedrooms,214 bathrooms=bathrooms,215 square_feet=sqft,216 property_type=property_type,217 url=url,218 source="zillow",219 amenities=amenities,220 images=images,221 features=features222 )
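The URL assembly in `ZillowScraper.prepare_input` concatenates filter segments with `/`. A minimal sketch of just that assembly step (`build_filter_url` is a hypothetical helper; the segment syntax is taken from the code above, not verified against Zillow's live URL scheme):

```python
from typing import Optional


def build_filter_url(base_url: str, min_price: float = 0,
                     max_price: Optional[float] = None,
                     property_type: str = "") -> str:
    """Assemble a Zillow-style search URL from filter segments,
    mirroring the price and type handling in prepare_input."""
    filters = []
    if min_price > 0 or max_price:
        # Price bounds are folded into a single "price_gte-X_lte-Y" segment
        price_filter = "price"
        if min_price > 0:
            price_filter += f"_gte-{int(min_price)}"
        if max_price:
            price_filter += f"_lte-{int(max_price)}"
        filters.append(price_filter)
    if property_type:
        filters.append(f"type-{property_type}")
    # Join the segments onto the base URL; no filters means the bare URL
    return f"{base_url}/{'/'.join(filters)}" if filters else base_url
```

For example, a $1,000-$2,500 condo search on a base of `https://www.zillow.com/homes/austin-tx` produces `.../price_gte-1000_lte-2500/type-condo`.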
src/utils/__init__.py
1"""Utility functions for Listing Sleuth."""
src/utils/llm.py
1"""LLM utility functions for Listing Sleuth."""2
3import os4from typing import List, Dict, Any, Optional5from langchain_openai import ChatOpenAI6from langchain.prompts import ChatPromptTemplate7from langchain.output_parsers import PydanticOutputParser8from langchain.schema import Document9
10from ..models.property import PropertyListing, SearchCriteria11
12
13def get_llm(api_token: Optional[str] = None) -> ChatOpenAI:14 """Get LLM client.15 16 Args:17 api_token: OpenAI API token. If None, tries to get from environment.18 19 Returns:20 ChatOpenAI instance21 22 Raises:23 ValueError: If API token is not provided and not found in environment.24 """25 token = api_token or os.environ.get("OPENAI_API_KEY")26 if not token:27 raise ValueError(28 "OpenAI API token not provided. Please provide a token in the input "29 "or set the OPENAI_API_KEY environment variable."30 )31 32 return ChatOpenAI(33 api_key=token,34 model="gpt-3.5-turbo",35 temperature=036 )37
38
39def filter_properties_with_llm(40 properties: List[PropertyListing],41 search_criteria: SearchCriteria,42 api_token: Optional[str] = None43) -> List[PropertyListing]:44 """Filter properties with LLM based on search criteria.45 46 Args:47 properties: List of property listings48 search_criteria: Search criteria49 api_token: OpenAI API token50 51 Returns:52 Filtered list of property listings53 """54 if not properties:55 return []56 57 if not api_token and not search_criteria.llm_api_token:58 # Without a token, just do basic filtering59 return properties60 61 llm = get_llm(api_token or search_criteria.llm_api_token)62 parser = PydanticOutputParser(pydantic_object=PropertyListing)63 64 template = """65 You are an AI assistant helping to filter real estate listings based on specific criteria.66 67 The user is looking for the following:68 - Location: {location}69 - Property type: {property_type}70 - Price range: ${min_price} - ${max_price} (0 means no minimum, None means no maximum)71 - Bedrooms: {min_bedrooms} - {max_bedrooms} (None means no maximum)72 - Desired amenities: {amenities}73 74 For each property, evaluate how well it fits the criteria, with special attention to amenities75 and any specific requirements. 
Return the property object unmodified if it's a good match,76 filtering out properties that don't meet the criteria.77 78 Here are the properties to evaluate:79 {properties}80 81 If the user mentioned any amenities, prioritize properties with those amenities.82 """83 84 # Process in smaller batches to avoid token limits85 batch_size = 586 filtered_properties = []87 88 for i in range(0, len(properties), batch_size):89 batch = properties[i:i+batch_size]90 91 prompt = ChatPromptTemplate.from_template(template)92 chain = prompt | llm93 94 # Simplify property objects for LLM consumption95 simplified_batch = [96 {97 "id": p.id,98 "title": p.title,99 "price": p.price,100 "bedrooms": p.bedrooms,101 "bathrooms": p.bathrooms,102 "property_type": p.property_type,103 "address": str(p.address),104 "amenities": p.amenities,105 "description": p.description,106 "url": str(p.url)107 }108 for p in batch109 ]110 111 result = chain.invoke({112 "location": search_criteria.location,113 "property_type": search_criteria.property_type,114 "min_price": search_criteria.min_price,115 "max_price": search_criteria.max_price,116 "min_bedrooms": search_criteria.min_bedrooms,117 "max_bedrooms": search_criteria.max_bedrooms,118 "amenities": search_criteria.amenities,119 "properties": simplified_batch120 })121 122 # Extract property IDs that the LLM determined to be good matches123 response_text = result.content124 passing_ids = []125 126 # Simple parsing of response - in production, this would be more robust127 for line in response_text.split("\n"):128 if "id:" in line and "good match" in line.lower():129 try:130 id_part = line.split("id:")[1].strip()131 property_id = id_part.split()[0].strip(",")132 passing_ids.append(property_id)133 except IndexError:134 continue135 136 # Add matching properties to filtered list137 for p in batch:138 if p.id in passing_ids:139 filtered_properties.append(p)140 141 return filtered_properties142
143
144def summarize_property(145 property_listing: PropertyListing,146 api_token: Optional[str] = None147) -> str:148 """Generate a natural language summary of a property.149 150 Args:151 property_listing: Property listing to summarize152 api_token: OpenAI API token153 154 Returns:155 Summary of property156 """157 try:158 llm = get_llm(api_token)159 except ValueError:160 # Fall back to basic summary if no API token161 return (162 f"{property_listing.title}: {property_listing.bedrooms} bed, "163 f"{property_listing.bathrooms or 'unknown'} bath {property_listing.property_type} "164 f"for ${property_listing.price:,.2f} in {property_listing.address.city}, "165 f"{property_listing.address.state}."166 )167 168 template = """169 Create a concise, appealing summary of this property listing in one paragraph:170 171 Title: {title}172 Price: ${price}173 Address: {address}174 Property type: {property_type}175 Bedrooms: {bedrooms}176 Bathrooms: {bathrooms}177 Square feet: {square_feet}178 Amenities: {amenities}179 Description: {description}180 181 Keep the summary brief but informative, highlighting key selling points.182 """183 184 prompt = ChatPromptTemplate.from_template(template)185 chain = prompt | llm186 187 result = chain.invoke({188 "title": property_listing.title,189 "price": f"{property_listing.price:,.2f}",190 "address": str(property_listing.address),191 "property_type": property_listing.property_type,192 "bedrooms": property_listing.bedrooms,193 "bathrooms": property_listing.bathrooms or "unknown",194 "square_feet": property_listing.square_feet or "unknown",195 "amenities": ", ".join(property_listing.amenities) or "none specified",196 "description": property_listing.description or "No description provided"197 })198 199 return result.content.strip()
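`filter_properties_with_llm` walks the listings in batches of five to stay under token limits. The slicing pattern on its own, as a hypothetical generator helper (not a name from the repo):

```python
from typing import Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: List[T], batch_size: int = 5) -> Iterator[List[T]]:
    """Yield consecutive slices of at most batch_size items,
    the same range-step pattern used in filter_properties_with_llm."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]
```

Twelve items with the default batch size come out as slices of 5, 5, and 2.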
src/utils/storage.py
"""Storage utility functions for Listing Sleuth."""

import json
import os
from typing import Dict, List, Optional, Any, Union
from datetime import datetime
from pydantic import BaseModel
from apify import Actor

from ..models.property import PropertyListing, SearchResults, SearchCriteria


def save_search_results(results: SearchResults) -> None:
    """Save search results to Apify key-value store.

    Args:
        results: Search results to save
    """
    # Convert results to dict for storage
    results_dict = results.model_dump()

    # Convert datetime objects to ISO format strings
    results_dict["search_date"] = results_dict["search_date"].isoformat()
    for i, result in enumerate(results_dict["results"]):
        if result.get("listed_date"):
            results_dict["results"][i]["listed_date"] = result["listed_date"].isoformat()

    try:
        # Save to Apify key-value store if in production
        if hasattr(Actor, 'main_kv_store'):
            Actor.main_kv_store.set_value("search_results", results_dict)

            # Also save the individual listings separately for easier access
            for listing in results.results:
                Actor.main_kv_store.set_value(f"listing_{listing.id}", listing.model_dump())
        else:
            # Local testing - save to a local file
            Actor.log.info("Running in local mode, saving to local file")
            os.makedirs("storage/key_value_stores", exist_ok=True)
            with open("storage/key_value_stores/search_results.json", "w") as f:
                json.dump(results_dict, f)
    except Exception as e:
        Actor.log.error(f"Error saving search results: {e}")


def load_previous_results(search_criteria: SearchCriteria) -> Optional[SearchResults]:
    """Load previous search results from Apify key-value store.

    Args:
        search_criteria: Current search criteria, to compare with previous search

    Returns:
        Previous search results, or None if no previous results or criteria changed
    """
    # Try to get previous results
    try:
        results_dict = None

        # Try to load from Apify KV store first
        if hasattr(Actor, 'main_kv_store'):
            results_dict = Actor.main_kv_store.get_value("search_results")

        # If not found or in local mode, try loading from local file
        if not results_dict and os.path.exists("storage/key_value_stores/search_results.json"):
            Actor.log.info("Loading from local file")
            with open("storage/key_value_stores/search_results.json", "r") as f:
                results_dict = json.load(f)

        if not results_dict:
            return None

        # Parse dates
        results_dict["search_date"] = datetime.fromisoformat(results_dict["search_date"])
        for i, result in enumerate(results_dict["results"]):
            if result.get("listed_date"):
                results_dict["results"][i]["listed_date"] = datetime.fromisoformat(
                    result["listed_date"]
                )

        # Convert back to model
        previous_results = SearchResults(**results_dict)

        # Check if search criteria has changed
        prev_criteria = previous_results.search_criteria
        if (
            prev_criteria.location != search_criteria.location
            or prev_criteria.property_type != search_criteria.property_type
            or prev_criteria.min_bedrooms != search_criteria.min_bedrooms
            or prev_criteria.max_bedrooms != search_criteria.max_bedrooms
            or prev_criteria.min_price != search_criteria.min_price
            or prev_criteria.max_price != search_criteria.max_price
            or prev_criteria.search_type != search_criteria.search_type
            or set(prev_criteria.sources) != set(search_criteria.sources)
            # Amenities might be in different order but same content
            or set(prev_criteria.amenities) != set(search_criteria.amenities)
        ):
            # Criteria changed, don't use previous results
            return None

        return previous_results

    except Exception as e:
        Actor.log.error(f"Error loading previous results: {e}")
        return None
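The cache-invalidation check above compares scalar criteria directly but compares `sources` and `amenities` as sets, so reordering a list does not discard previous results. A dependency-free sketch of that comparison logic, using a hypothetical `criteria_changed` helper over plain dicts instead of the `SearchCriteria` model:

```python
def criteria_changed(prev: dict, curr: dict) -> bool:
    """Return True if the search criteria differ (hypothetical helper
    mirroring the comparison in load_previous_results above)."""
    scalar_fields = (
        "location", "property_type", "min_bedrooms", "max_bedrooms",
        "min_price", "max_price", "search_type",
    )
    if any(prev[f] != curr[f] for f in scalar_fields):
        return True
    # List-valued fields are compared as sets, so order is irrelevant
    return (
        set(prev["sources"]) != set(curr["sources"])
        or set(prev["amenities"]) != set(curr["amenities"])
    )

prev = {
    "location": "Austin, TX", "property_type": "any",
    "min_bedrooms": 1, "max_bedrooms": None,
    "min_price": 0, "max_price": 2000, "search_type": "rent",
    "sources": ["zillow", "realtor"], "amenities": ["parking", "pool"],
}
shuffled = dict(prev, sources=["realtor", "zillow"], amenities=["pool", "parking"])
print(criteria_changed(prev, shuffled))               # → False (reorder only)
print(criteria_changed(prev, dict(prev, max_price=2500)))  # → True
```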

def mark_new_listings(
    current_results: List[PropertyListing],
    previous_results: Optional[SearchResults]
) -> List[PropertyListing]:
    """Mark new listings in current results compared to previous results.

    Args:
        current_results: Current property listings
        previous_results: Previous search results, or None if no previous results

    Returns:
        Updated current property listings with is_new flag set
    """
    if not previous_results:
        # If no previous results, all are new
        for listing in current_results:
            listing.is_new = True
        return current_results

    # Get IDs of previous listings
    previous_ids = {listing.id for listing in previous_results.results}

    # Mark new listings
    for listing in current_results:
        if listing.id not in previous_ids:
            listing.is_new = True

    return current_results


def push_results_to_dataset(results: SearchResults) -> None:
    """Push search results to Apify dataset.

    Args:
        results: Search results to push
    """
    # Convert to simple dicts for the dataset
    listings_data = []
    for listing in results.results:
        listing_dict = listing.model_dump()
        # Convert complex types to strings for better compatibility
        listing_dict["address"] = str(listing.address)
        if listing.listed_date:
            listing_dict["listed_date"] = listing.listed_date.isoformat()
        listings_data.append(listing_dict)

    try:
        # Push each listing as a separate item
        Actor.push_data(listings_data)
    except Exception as e:
        Actor.log.error(f"Error pushing data to dataset: {e}")
        # In local mode, save to local file
        try:
            os.makedirs("storage/datasets/default", exist_ok=True)
            with open("storage/datasets/default/results.json", "w") as f:
                json.dump(listings_data, f)
            Actor.log.info("Saved results to local file")
        except Exception as e2:
            Actor.log.error(f"Error saving to local file: {e2}")
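The core of `mark_new_listings` is a set-membership test against the previous run's IDs. A minimal sketch of that idea using plain dicts in place of `PropertyListing` objects (names here are hypothetical):

```python
def mark_new(current: list, previous_ids: set) -> list:
    """Flag listings whose id was not present in the previous run.

    Simplified stand-in for mark_new_listings: listings are plain dicts
    with "id" and "is_new" keys rather than Pydantic models.
    """
    for listing in current:
        if listing["id"] not in previous_ids:
            listing["is_new"] = True
    return current

seen_last_run = {"z-101", "r-202"}
batch = [
    {"id": "z-101", "is_new": False},  # already seen -> stays False
    {"id": "a-303", "is_new": False},  # unseen -> flagged new
]
marked = mark_new(batch, seen_last_run)
print([l["id"] for l in marked if l["is_new"]])
# → ['a-303']
```

Building `previous_ids` as a set keeps each membership check O(1), so marking stays linear in the number of current listings.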
src/models/__init__.py
"""Models for Listing Sleuth."""
src/models/property.py
"""Property data models for Listing Sleuth."""

from typing import List, Optional, Dict, Any
from pydantic import BaseModel, Field, HttpUrl
from datetime import datetime


class Address(BaseModel):
    """Model for property address."""

    street: Optional[str] = None
    city: str
    state: str
    zip_code: Optional[str] = None
    country: str = "United States"

    def __str__(self) -> str:
        """Return string representation of address."""
        parts = []
        if self.street:
            parts.append(self.street)
        parts.append(f"{self.city}, {self.state}")
        if self.zip_code:
            parts.append(self.zip_code)
        return ", ".join(parts)


class PropertyListing(BaseModel):
    """Model for property listing data."""

    id: str
    title: str
    description: Optional[str] = None
    price: float
    address: Address
    bedrooms: float
    bathrooms: Optional[float] = None
    square_feet: Optional[int] = None
    property_type: str
    url: HttpUrl
    source: str
    amenities: List[str] = Field(default_factory=list)
    images: List[HttpUrl] = Field(default_factory=list)
    listed_date: Optional[datetime] = None
    is_new: bool = False  # Flag for new listings since last search
    features: Dict[str, Any] = Field(default_factory=dict)  # Additional property features

    class Config:
        """Pydantic config."""

        extra = "ignore"


class SearchCriteria(BaseModel):
    """Model for search criteria."""

    location: str
    property_type: str = "any"
    min_bedrooms: int = 0
    max_bedrooms: Optional[int] = None
    min_price: float = 0
    max_price: Optional[float] = None
    amenities: List[str] = Field(default_factory=list)
    search_type: str = "rent"
    sources: List[str] = Field(default=["zillow", "realtor", "apartments"])
    llm_api_token: Optional[str] = None


class SearchResults(BaseModel):
    """Model for search results."""

    search_criteria: SearchCriteria
    results: List[PropertyListing] = Field(default_factory=list)
    total_results: int = 0
    new_results: int = 0
    search_date: datetime = Field(default_factory=datetime.now)
    sources_searched: List[str] = Field(default_factory=list)
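`Address.__str__` assembles "street, City, ST, zip" while skipping the optional pieces. The joining logic can be sketched without Pydantic as a plain function (a hypothetical `format_address` mirroring the method above):

```python
def format_address(street=None, city="", state="", zip_code=None):
    """Mirror of Address.__str__: optional street and ZIP around
    the mandatory "City, ST" core (hypothetical standalone helper)."""
    parts = []
    if street:
        parts.append(street)
    parts.append(f"{city}, {state}")
    if zip_code:
        parts.append(zip_code)
    return ", ".join(parts)

print(format_address("12 Oak St", "Denver", "CO", "80202"))
# → 12 Oak St, Denver, CO, 80202
print(format_address(city="Denver", state="CO"))
# → Denver, CO
```

Because `city` and `state` are joined into a single element before the final `", ".join`, the output reads naturally whether or not the optional fields are present.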