Listing Sleuth

onyedikachi-david/listing-sleuth
An agentic real estate listing monitor that helps users find properties that match their specific criteria. This agent scrapes data from popular real estate platforms such as Zillow, Realtor.com, and Apartments.com to provide up-to-date information on available properties.
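Since it runs as an Apify Actor, the monitor can also be started programmatically. A minimal sketch using the Apify Python client (`pip install apify-client`); the actor ID is taken from the repo slug above, and the `APIFY_TOKEN` environment variable is an assumption about your setup:

```python
# Sketch: starting Listing Sleuth via the Apify Python client.
# The run_input mirrors the INPUT.json example shipped with the repo.
import os

run_input = {
    "location": "San Francisco, CA",
    "propertyType": "apartment",
    "minBedrooms": 2,
    "maxBedrooms": 3,
    "minPrice": 1500,
    "maxPrice": 3000,
    "amenities": ["parking", "gym"],
    "searchType": "rent",
    "sources": ["zillow", "apartments"],
}

def run_search(run_input: dict) -> list:
    """Start the actor and return its dataset items (makes network calls)."""
    # Imported lazily so the rest of this sketch works without the package.
    from apify_client import ApifyClient

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    run = client.actor("onyedikachi-david/listing-sleuth").call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Each returned item follows the dataset schema in `.actor/dataset_schema.json` below.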

Developer
Maintained by Community

Actor Metrics

  • 0 monthly users

  • No reviews yet

  • No bookmarks yet

  • Created in Mar 2025

  • Modified 15 hours ago

.dockerignore

.git
.mise.toml
.nvim.lua
storage

# The rest is copied from https://github.com/github/gitignore/blob/main/Python.gitignore

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
#   For a library or package, you might want to ignore these files since the code is
#   intended to run in multiple environments; otherwise, check them in:
.python-version

# pdm
#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
#   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
#   in version control.
#   https://pdm.fming.dev/latest/usage/project/#working-with-version-control
.pdm.toml
.pdm-python
.pdm-build/

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
#  and can be added to the global gitignore or merged into this file.  For a more nuclear
#  option (not recommended) you can uncomment the following to ignore the entire idea folder.
.idea/

.gitignore

.mise.toml
.nvim.lua
storage

# The rest is copied from https://github.com/github/gitignore/blob/main/Python.gitignore

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
#   For a library or package, you might want to ignore these files since the code is
#   intended to run in multiple environments; otherwise, check them in:
.python-version

# pdm
#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
#   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
#   in version control.
#   https://pdm.fming.dev/latest/usage/project/#working-with-version-control
.pdm.toml
.pdm-python
.pdm-build/

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
#  and can be added to the global gitignore or merged into this file.  For a more nuclear
#  option (not recommended) you can uncomment the following to ignore the entire idea folder.
.idea/

# Added by Apify CLI
node_modules

INPUT.json

{
  "location": "San Francisco, CA",
  "propertyType": "apartment",
  "minBedrooms": 2,
  "maxBedrooms": 3,
  "minPrice": 1500,
  "maxPrice": 3000,
  "amenities": ["parking", "gym"],
  "searchType": "rent",
  "sources": ["zillow", "apartments"]
}

LICENSE

MIT License

Copyright (c) 2024 Listing Sleuth

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

requirements.txt

apify<3.0
langchain-openai==0.3.6
langgraph==0.2.73
aiohttp>=3.8.0
langchain>=0.1.0
pydantic>=2.0.0
langchain-core>=0.1.0
langchain_community==0.3.19
# Imported by src/main.py and src/search_agent.py but previously missing here
python-dotenv
crewai

.actor/Dockerfile

# First, specify the base Docker image.
# You can see the Docker images from Apify at https://hub.docker.com/r/apify/.
# You can also use any other image from Docker Hub.
FROM apify/actor-python-playwright:3.13

# Install build dependencies first
RUN apt-get update && apt-get install -y build-essential gcc g++ python3-dev

# Second, copy just requirements.txt into the Actor image,
# since it should be the only file that affects the dependency install in the next step,
# in order to speed up the build
COPY requirements.txt ./

# Install the packages specified in requirements.txt,
# Print the installed Python version, pip version
# and all installed packages with their versions for debugging
RUN echo "Python version:" \
 && python --version \
 && echo "Pip version:" \
 && pip --version \
 && echo "Installing dependencies:" \
 && pip install --only-binary=:all: -r requirements.txt \
 && echo "All installed Python packages:" \
 && pip freeze

# Next, copy the remaining files and directories with the source code.
# Since we do this after installing the dependencies, quick build will be really fast
# for most source file changes.
COPY . ./

# Use compileall to ensure the runnability of the Actor Python code.
RUN python3 -m compileall -q .

# Specify how to launch the source code of your Actor.
# By default, the "python3 -m src" command is run
CMD ["python3", "-m", "src"]

.actor/actor.json

{
	"actorSpecification": 1,
	"name": "listing-sleuth",
	"title": "Listing Sleuth - Real Estate Monitor",
	"description": "Monitors real estate listings across multiple platforms based on user-specified criteria",
	"version": "0.1",
	"buildTag": "latest",
	"restart": {
		"horizontalScaling": true
	},
	"dockerfile": "./Dockerfile",
	"input": "./input_schema.json",
	"storages": {
		"dataset": "./dataset_schema.json"
	},
	"license": "MIT",
	"monetization": {
		"type": "pay-per-event",
		"enabled": true,
		"priceSchemaPath": "./pay_per_event.json"
	}
}

.actor/dataset_schema.json

{
    "actorSpecification": 1,
    "fields": {
        "type": "object",
        "properties": {
            "id": {
                "type": "string",
                "description": "Unique identifier for the property listing"
            },
            "title": {
                "type": "string",
                "description": "Property title or name"
            },
            "description": {
                "type": "string",
                "description": "Detailed description of the property"
            },
            "price": {
                "type": "number",
                "description": "Price of the property (in USD)"
            },
            "bedrooms": {
                "type": "number",
                "description": "Number of bedrooms"
            },
            "bathrooms": {
                "type": "number",
                "description": "Number of bathrooms"
            },
            "address": {
                "type": "string",
                "description": "Property address"
            },
            "property_type": {
                "type": "string",
                "description": "Type of property (apartment, house, condo, etc.)"
            },
            "source": {
                "type": "string",
                "description": "Source of the listing (zillow, realtor, apartments, etc.)"
            },
            "url": {
                "type": "string",
                "description": "Link to the original listing"
            },
            "amenities": {
                "type": "array",
                "description": "List of amenities available at the property",
                "items": {
                    "type": "string"
                }
            },
            "listed_date": {
                "type": "string",
                "description": "Date when the property was listed"
            },
            "is_new": {
                "type": "boolean",
                "description": "Whether this is a new listing since last search"
            }
        }
    },
    "views": {
        "overview": {
            "title": "Property Listings",
            "description": "Real estate property listings matching the search criteria",
            "transformation": {
                "fields": [
                    "id",
                    "title",
                    "price",
                    "bedrooms",
                    "bathrooms",
                    "address",
                    "property_type",
                    "source",
                    "url",
                    "listed_date",
                    "is_new"
                ]
            },
            "display": {
                "component": "table",
                "properties": {
                    "id": {
                        "label": "ID",
                        "format": "text"
                    },
                    "title": {
                        "label": "Title",
                        "format": "text"
                    },
                    "price": {
                        "label": "Price",
                        "format": "number"
                    },
                    "bedrooms": {
                        "label": "Bedrooms",
                        "format": "number"
                    },
                    "bathrooms": {
                        "label": "Bathrooms",
                        "format": "number"
                    },
                    "address": {
                        "label": "Address",
                        "format": "text"
                    },
                    "property_type": {
                        "label": "Property Type",
                        "format": "text"
                    },
                    "source": {
                        "label": "Source",
                        "format": "text"
                    },
                    "url": {
                        "label": "URL",
                        "format": "link"
                    },
                    "listed_date": {
                        "label": "Listed Date",
                        "format": "date"
                    },
                    "is_new": {
                        "label": "New Listing",
                        "format": "boolean"
                    }
                }
            }
        },
        "details": {
            "title": "Detailed View",
            "description": "Detailed information about property listings",
            "transformation": {
                "fields": [
                    "id",
                    "title",
                    "description",
                    "price",
                    "bedrooms",
                    "bathrooms",
                    "address",
                    "property_type",
                    "source",
                    "url",
                    "amenities",
                    "listed_date",
                    "is_new"
                ]
            },
            "display": {
                "component": "table",
                "properties": {
                    "description": {
                        "label": "Description",
                        "format": "text"
                    },
                    "amenities": {
                        "label": "Amenities",
                        "format": "array"
                    }
                }
            }
        }
    }
}

.actor/input_schema.json

{
    "title": "Listing Sleuth - Real Estate Monitor",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "location": {
            "title": "Location",
            "type": "string",
            "description": "City or neighborhood to search in (e.g., 'San Francisco, CA')",
            "editor": "textfield"
        },
        "propertyType": {
            "title": "Property Type",
            "type": "string",
            "description": "Type of property to look for",
            "enum": ["apartment", "house", "condo", "townhouse", "any"],
            "enumTitles": ["Apartment", "House", "Condo", "Townhouse", "Any"],
            "default": "any",
            "editor": "select"
        },
        "minBedrooms": {
            "title": "Minimum Bedrooms",
            "type": "integer",
            "description": "Minimum number of bedrooms",
            "default": 1,
            "minimum": 0,
            "editor": "number"
        },
        "maxBedrooms": {
            "title": "Maximum Bedrooms",
            "type": "integer",
            "description": "Maximum number of bedrooms (leave blank for no maximum)",
            "minimum": 0,
            "nullable": true,
            "editor": "number"
        },
        "minPrice": {
            "title": "Minimum Price",
            "type": "integer",
            "description": "Minimum price (in USD)",
            "default": 0,
            "minimum": 0,
            "editor": "number"
        },
        "maxPrice": {
            "title": "Maximum Price",
            "type": "integer",
            "description": "Maximum price (in USD)",
            "minimum": 0,
            "nullable": true,
            "editor": "number"
        },
        "amenities": {
            "title": "Amenities",
            "type": "array",
            "description": "Desired amenities for the property",
            "editor": "stringList",
            "default": []
        },
        "searchType": {
            "title": "Search Type",
            "type": "string",
            "description": "Type of search to perform",
            "enum": ["rent", "buy"],
            "enumTitles": ["Rent", "Buy"],
            "default": "rent",
            "editor": "select"
        },
        "sources": {
            "title": "Data Sources",
            "type": "array",
            "description": "Sources to search for listings",
            "editor": "stringList",
            "default": ["zillow", "realtor", "apartments"]
        },
        "llmApiToken": {
            "title": "LLM API Token",
            "type": "string",
            "description": "OpenAI API token for processing results (optional)",
            "editor": "textfield",
            "nullable": true
        }
    },
    "required": ["location"]
}

.actor/pay_per_event.json

{
    "actor-start": {
        "eventTitle": "Search Initiated",
        "eventDescription": "Flat fee for starting a real estate search.",
        "eventPriceUsd": 0.1
    },
    "property-found": {
        "eventTitle": "Property Found",
        "eventDescription": "Fee for each property matching your criteria.",
        "eventPriceUsd": 0.05
    },
    "search-completed": {
        "eventTitle": "Search Completed",
        "eventDescription": "Fee for completing a full property search across all selected platforms.",
        "eventPriceUsd": 0.3
    }
}

src/__init__.py

src/__main__.py

import asyncio

from .main import main

# Execute the Actor entry point.
asyncio.run(main())

src/main.py

"""Main entry point for the Listing Sleuth Apify Actor.

This module contains the main entry point for the Actor, which searches for real estate
listings based on user-specified criteria.
"""

import os
import sys
import json
from apify import Actor
from dotenv import load_dotenv

from .models.property import SearchCriteria
from .search_agent import SearchAgentCrew

# Load environment variables from .env file if present
load_dotenv()


async def main() -> None:
    """Main entry point for the Apify Actor.

    This function initializes the Actor, processes input data, runs the search agent,
    and saves the results to the Actor's dataset.
    """
    # Enter the context of the Actor.
    async with Actor:
        Actor.log.info("Listing Sleuth is starting...")

        # Charge for actor start
        await Actor.charge('actor-start')

        # Retrieve the Actor input, and use default values if not provided.
        actor_input = await Actor.get_input() or {}

        # For local testing, try to load from INPUT.json if actor_input is empty
        if not actor_input or 'location' not in actor_input:
            try:
                if os.path.exists('INPUT.json'):
                    with open('INPUT.json', 'r') as f:
                        actor_input = json.load(f)
                    Actor.log.info(f"Loaded input from INPUT.json: {actor_input}")
            except Exception as e:
                Actor.log.error(f"Error loading from INPUT.json: {e}")

        Actor.log.info(f"Using input: {actor_input}")

        # Parse location (required)
        location = actor_input.get("location")
        if not location:
            Actor.log.error("No location specified in Actor input, exiting...")
            sys.exit(1)

        # Parse other inputs with defaults
        property_type = actor_input.get("propertyType", "any")
        min_bedrooms = int(actor_input.get("minBedrooms", 1))
        max_bedrooms = actor_input.get("maxBedrooms")
        if max_bedrooms is not None:
            max_bedrooms = int(max_bedrooms)

        min_price = float(actor_input.get("minPrice", 0))
        max_price = actor_input.get("maxPrice")
        if max_price is not None:
            max_price = float(max_price)

        # Amenities as a list
        amenities = actor_input.get("amenities", [])

        # Search type (rent/buy)
        search_type = actor_input.get("searchType", "rent")

        # Data sources to search
        sources = actor_input.get("sources", ["zillow", "realtor", "apartments"])

        # LLM API token (optional)
        llm_api_token = actor_input.get("llmApiToken") or os.environ.get("OPENAI_API_KEY")

        # Create search criteria
        search_criteria = SearchCriteria(
            location=location,
            property_type=property_type,
            min_bedrooms=min_bedrooms,
            max_bedrooms=max_bedrooms,
            min_price=min_price,
            max_price=max_price,
            amenities=amenities,
            search_type=search_type,
            sources=sources,
            llm_api_token=llm_api_token
        )

        Actor.log.info(f"Search criteria: {search_criteria}")

        # Create and run the search agent
        search_agent = SearchAgentCrew(search_criteria)
        results = search_agent.run()

        # Charge for each property found
        if results.total_results > 0:
            await Actor.charge('property-found', count=results.total_results)

        # Log results
        Actor.log.info(f"Search complete. Found {results.total_results} properties.")
        Actor.log.info(f"New listings: {results.new_results}")

        # Charge for search completion
        await Actor.charge('search-completed')

        # The results have already been saved to the dataset by the search agent

src/py.typed


src/search_agent.py

1"""Search agent for real estate properties."""
2
3import os
4import json
5from typing import List, Dict, Any, Optional, Tuple
6from datetime import datetime
7from crewai import Agent, Task, Crew
8from langchain.tools import BaseTool
9from langchain_openai import ChatOpenAI
10from apify import Actor
11
12from .models.property import PropertyListing, SearchCriteria, SearchResults
13from .scrapers.zillow import ZillowScraper
14from .scrapers.realtor import RealtorScraper
15from .scrapers.apartments import ApartmentsScraper
16from .utils.llm import filter_properties_with_llm, summarize_property
17from .utils.storage import (
18    load_previous_results,
19    mark_new_listings,
20    save_search_results,
21    push_results_to_dataset
22)
23
24
25class SearchTool(BaseTool):
26    """Tool for searching real estate listings."""
27    
28    name = "search_real_estate"
29    description = "Search for real estate listings based on search criteria"
30    search_criteria: SearchCriteria = None
31    
32    def __init__(self, search_criteria: SearchCriteria):
33        """Initialize the search tool.
34        
35        Args:
36            search_criteria: Search criteria
37        """
38        super().__init__()
39        self.search_criteria = search_criteria
40    
41    def _run(self, query: str) -> Dict[str, Any]:
42        """Run the search tool.
43        
44        Args:
45            query: Search query (not used, but required by BaseTool)
46            
47        Returns:
48            Search results
49        """
50        # Initialize scrapers
51        scrapers = []
52        if "zillow" in self.search_criteria.sources:
53            scrapers.append(ZillowScraper(self.search_criteria))
54        if "realtor" in self.search_criteria.sources:
55            scrapers.append(RealtorScraper(self.search_criteria))
56        if "apartments" in self.search_criteria.sources:
57            scrapers.append(ApartmentsScraper(self.search_criteria))
58        
59        # Run scrapers
60        all_listings = []
61        sources_searched = []
62        
63        for scraper in scrapers:
64            try:
65                listings = scraper.scrape()
66                all_listings.extend(listings)
67                sources_searched.append(scraper.source_name)
68            except Exception as e:
69                Actor.log.exception(f"Error scraping {scraper.source_name}: {e}")
70        
71        # Load previous results
72        previous_results = load_previous_results(self.search_criteria)
73        
74        # Mark new listings
75        marked_listings = mark_new_listings(all_listings, previous_results)
76        
77        # Create search results
78        results = SearchResults(
79            search_criteria=self.search_criteria,
80            results=marked_listings,
81            total_results=len(marked_listings),
82            new_results=sum(1 for listing in marked_listings if listing.is_new),
83            sources_searched=sources_searched
84        )
85        
86        # Save results
87        save_search_results(results)
88        push_results_to_dataset(results)
89        
90        # Return results
91        return {
92            "total_results": results.total_results,
93            "new_results": results.new_results,
94            "sources_searched": results.sources_searched,
95            "search_date": results.search_date.isoformat()
96        }
97    
98    async def _arun(self, query: str) -> Dict[str, Any]:
99        """Async version of _run.
100        
101        Args:
102            query: Search query
103            
104        Returns:
105            Search results
106        """
107        return self._run(query)
108
109
110class FilterTool(BaseTool):
111    """Tool for filtering property listings with LLM."""
112    
113    name = "filter_properties"
114    description = "Filter property listings based on search criteria using LLM"
115    search_criteria: SearchCriteria = None
116    
117    def __init__(self, search_criteria: SearchCriteria):
118        """Initialize the filter tool.
119        
120        Args:
121            search_criteria: Search criteria
122        """
123        super().__init__()
124        self.search_criteria = search_criteria
125    
126    def _run(self, query: str) -> Dict[str, Any]:
127        """Run the filter tool.
128        
129        Args:
130            query: Filter query (not used, but required by BaseTool)
131            
132        Returns:
133            Filtered search results
134        """
135        # Try to load saved results
136        try:
137            results_dict = None
138            
139            # Try to load from Apify KV store if available
140            if hasattr(Actor, 'main_kv_store'):
141                results_dict = Actor.main_kv_store.get_value("search_results")
142            # Otherwise try to load from local file
143            elif os.path.exists("storage/key_value_stores/search_results.json"):
144                with open("storage/key_value_stores/search_results.json", "r") as f:
145                    results_dict = json.load(f)
146            
147            if not results_dict:
148                return {"error": "No search results found"}
149            
150            # Convert to SearchResults
151            search_results = SearchResults(**results_dict)
152            
153            if not search_results.results:
154                return {"error": "No results to filter"}
155            
156            # Filter results with LLM if token is available
157            if self.search_criteria.llm_api_token:
158                filtered_listings = filter_properties_with_llm(
159                    search_results.results,
160                    self.search_criteria,
161                    self.search_criteria.llm_api_token
162                )
163                
164                # Update results
165                search_results.results = filtered_listings
166                search_results.total_results = len(filtered_listings)
167                
168                # Save filtered results
169                save_search_results(search_results)
170                
171                return {
172                    "total_results_after_filtering": len(filtered_listings),
173                    "filter_date": datetime.now().isoformat()
174                }
175            else:
176                return {"error": "No LLM API token provided for filtering"}
177        
178        except Exception as e:
179            Actor.log.exception(f"Error filtering properties: {e}")
180            return {"error": str(e)}
181    
182    async def _arun(self, query: str) -> Dict[str, Any]:
183        """Async version of _run.
184        
185        Args:
186            query: Filter query
187            
188        Returns:
189            Filtered search results
190        """
191        return self._run(query)
192
193
194class SummarizeTool(BaseTool):
195    """Tool for summarizing property listings."""
196    
197    name = "summarize_properties"
198    description = "Generate summaries of property listings"
199    search_criteria: SearchCriteria = None
200    
201    def __init__(self, search_criteria: SearchCriteria):
202        """Initialize the summarize tool.
203        
204        Args:
205            search_criteria: Search criteria
206        """
207        super().__init__()
208        self.search_criteria = search_criteria
209    
210    def _run(self, query: str) -> Dict[str, Any]:
211        """Run the summarize tool.
212        
213        Args:
214            query: Summarize query (not used, but required by BaseTool)
215            
216        Returns:
217            Summarized search results
218        """
219        # Try to load saved results
220        try:
221            results_dict = None
222            
223            # Try to load from Apify KV store if available
224            if hasattr(Actor, 'main_kv_store'):
225                results_dict = Actor.main_kv_store.get_value("search_results")
226            # Otherwise try to load from local file
227            elif os.path.exists("storage/key_value_stores/search_results.json"):
228                with open("storage/key_value_stores/search_results.json", "r") as f:
229                    results_dict = json.load(f)
230            
231            if not results_dict:
232                return {"error": "No search results found"}
233            
234            # Convert to SearchResults
235            search_results = SearchResults(**results_dict)
236            
237            if not search_results.results:
238                return {"error": "No results to summarize"}
239            
240            # Generate summaries if LLM API token is available
241            if self.search_criteria.llm_api_token:
242                summaries = []
243                
244                for listing in search_results.results:
245                    summary = summarize_property(listing, self.search_criteria.llm_api_token)
246                    summaries.append({
247                        "id": listing.id,
248                        "summary": summary,
249                        "is_new": listing.is_new
250                    })
251                
252                return {
253                    "summaries": summaries,
254                    "total_summaries": len(summaries),
255                    "summarize_date": datetime.now().isoformat()
256                }
257            else:
258                # Generate basic summaries without LLM
259                summaries = []
260                
261                for listing in search_results.results:
262                    basic_summary = (
263                        f"{listing.title}: {listing.bedrooms:g} bed, "
264                        f"{listing.bathrooms or 'unknown'} bath {listing.property_type} "
265                        f"for ${listing.price:,.2f} in {listing.address.city}, "
266                        f"{listing.address.state}."
267                    )
268                    
269                    summaries.append({
270                        "id": listing.id,
271                        "summary": basic_summary,
272                        "is_new": listing.is_new
273                    })
274                
275                return {
276                    "summaries": summaries,
277                    "total_summaries": len(summaries),
278                    "summarize_date": datetime.now().isoformat()
279                }
280        
281        except Exception as e:
282            Actor.log.exception(f"Error summarizing properties: {e}")
283            return {"error": str(e)}
284    
285    async def _arun(self, query: str) -> Dict[str, Any]:
286        """Async version of _run.
287        
288        Args:
289            query: Summarize query
290            
291        Returns:
292            Summarized search results
293        """
294        return self._run(query)
295
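The LLM-free branch of `SummarizeTool._run` builds a one-line summary from the listing fields alone. A minimal sketch of that formatting, using a hypothetical dict in place of the actual `PropertyListing` model:

```python
# Hypothetical listing data standing in for a PropertyListing instance
listing = {
    "title": "Sunny 2BR",
    "bedrooms": 2,
    "bathrooms": None,
    "property_type": "apartment",
    "price": 1850.0,
    "city": "Austin",
    "state": "TX",
}

# Same shape as the basic summary in SummarizeTool._run;
# a falsy bathrooms value falls back to the string "unknown"
summary = (
    f"{listing['title']}: {listing['bedrooms']} bed, "
    f"{listing['bathrooms'] or 'unknown'} bath {listing['property_type']} "
    f"for ${listing['price']:,.2f} in {listing['city']}, {listing['state']}."
)
print(summary)
```

The `or 'unknown'` idiom keeps the summary readable when a scraper could not extract a bathroom count.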
296
297class SearchAgentCrew:
298    """Crew of agents for property search."""
299    
300    def __init__(self, search_criteria: SearchCriteria):
301        """Initialize the search agent crew.
302        
303        Args:
304            search_criteria: Search criteria
305        """
306        self.search_criteria = search_criteria
307        self.llm = None
308        
309        # Initialize LLM if token is provided
310        if search_criteria.llm_api_token:
311            self.llm = ChatOpenAI(
312                api_key=search_criteria.llm_api_token,
313                temperature=0,
314                model="gpt-3.5-turbo"
315            )
316    
317    def run(self) -> SearchResults:
318        """Run the search agent crew.
319        
320        Returns:
321            Search results
322        """
323        # If no LLM, just run the search directly
324        if not self.llm:
325            Actor.log.info("No LLM API token provided, running basic search without agents")
326            search_tool = SearchTool(self.search_criteria)
327            search_tool._run("")
328            
329            # Load and return results
330            try:
331                # Try loading from Apify KV store if available
332                if hasattr(Actor, 'main_kv_store'):
333                    results_dict = Actor.main_kv_store.get_value("search_results")
334                # Otherwise try to load from local file
335                elif os.path.exists("storage/key_value_stores/search_results.json"):
336                    with open("storage/key_value_stores/search_results.json", "r") as f:
337                        results_dict = json.load(f)
338                else:
339                    results_dict = None
340                
341                if results_dict:
342                    return SearchResults(**results_dict)
343            except Exception as e:
344                Actor.log.error(f"Error loading search results: {e}")
345            
346            # Create empty results if loading failed
347            return SearchResults(
348                search_criteria=self.search_criteria,
349                results=[],
350                total_results=0,
351                new_results=0,
352                sources_searched=[]
353            )
354        
355        # Create tools
356        search_tool = SearchTool(self.search_criteria)
357        filter_tool = FilterTool(self.search_criteria)
358        summarize_tool = SummarizeTool(self.search_criteria)
359        
360        # Create agents
361        search_agent = Agent(
362            role="Real Estate Search Specialist",
363            goal="Find properties that match the search criteria",
364            backstory="You are an expert in finding real estate listings across multiple platforms.",
365            verbose=True,
366            allow_delegation=True,
367            tools=[search_tool],
368            llm=self.llm
369        )
370        
371        filter_agent = Agent(
372            role="Property Filter Specialist",
373            goal="Filter properties to find the best matches for the user",
374            backstory="You are an expert in analyzing property details and matching them with user preferences.",
375            verbose=True,
376            allow_delegation=True,
377            tools=[filter_tool],
378            llm=self.llm
379        )
380        
381        summarize_agent = Agent(
382            role="Property Summarizer",
383            goal="Create concise, informative summaries of properties",
384            backstory="You are skilled at creating appealing property descriptions that highlight key features.",
385            verbose=True,
386            allow_delegation=True,
387            tools=[summarize_tool],
388            llm=self.llm
389        )
390        
391        # Create tasks
392        search_task = Task(
393            description=(
394                f"Search for properties in {self.search_criteria.location} "
395                f"with {self.search_criteria.min_bedrooms}+ bedrooms, "
396                f"maximum price: {f'${self.search_criteria.max_price}' if self.search_criteria.max_price else 'any'}, "
397                f"property type: {self.search_criteria.property_type}. "
398                f"Search sources: {', '.join(self.search_criteria.sources)}."
399            ),
400            agent=search_agent,
401            expected_output="A report of the total number of properties found"
402        )
403        
404        filter_task = Task(
405            description=(
406                "Filter the search results to find properties that best match "
407                f"the user's criteria, especially regarding amenities: {', '.join(self.search_criteria.amenities)}"
408            ),
409            agent=filter_agent,
410            expected_output="A report of how many properties passed the filtering"
411        )
412        
413        summarize_task = Task(
414            description=(
415                "Create summaries for each property highlighting key features. "
416                "Mark new listings that weren't found in previous searches."
417            ),
418            agent=summarize_agent,
419            expected_output="Summaries of each property"
420        )
421        
422        # Create crew
423        crew = Crew(
424            agents=[search_agent, filter_agent, summarize_agent],
425            tasks=[search_task, filter_task, summarize_task],
426            verbose=True
427        )
428        
429        # Run the crew
430        try:
431            crew.kickoff()  # crew output is logged; listings are read back from storage below
432            
433            # Load and return results
434            try:
435                # Try loading from Apify KV store if available
436                if hasattr(Actor, 'main_kv_store'):
437                    results_dict = Actor.main_kv_store.get_value("search_results")
438                # Otherwise try to load from local file
439                elif os.path.exists("storage/key_value_stores/search_results.json"):
440                    with open("storage/key_value_stores/search_results.json", "r") as f:
441                        results_dict = json.load(f)
442                else:
443                    results_dict = None
444                
445                if results_dict:
446                    return SearchResults(**results_dict)
447            except Exception as e:
448                Actor.log.error(f"Error loading search results: {e}")
449        except Exception as e:
450            Actor.log.error(f"Error running crew: {e}")
451            
452        # If we got here, either there was an error or no results were found
453        # Create empty results
454        return SearchResults(
455            search_criteria=self.search_criteria,
456            results=[],
457            total_results=0,
458            new_results=0,
459            sources_searched=[]
460        )
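Both branches of `run` end with the same load-or-empty pattern: read the saved results from storage if present, otherwise fall back to an empty shell. A standalone sketch of that pattern (the file path and dict shape here are illustrative, not the real `SearchResults` model):

```python
import json
import os
import tempfile

def load_results_or_empty(path):
    # Mirrors the fallback at the end of SearchAgentCrew.run:
    # use saved results when the file exists, else an empty shell.
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"results": [], "total_results": 0, "new_results": 0}

# Missing file -> empty results
empty = load_results_or_empty("does-not-exist.json")

# Existing file -> parsed results
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump({"results": [{"id": "1"}], "total_results": 1, "new_results": 1}, f)
    saved_path = f.name
loaded = load_results_or_empty(saved_path)
os.unlink(saved_path)
```

Returning an empty shell instead of raising keeps the actor's output schema stable even when the crew or the scrape fails.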

src/agents/__init__.py

1"""Agent classes for Listing Sleuth."""

src/scrapers/__init__.py

1"""Scrapers for real estate platforms."""

src/scrapers/apartments.py

1"""Apartments.com scraper."""
2
3import re
4import json
5import uuid
6from typing import Dict, Any, List, Optional
7from datetime import datetime
8from pydantic import HttpUrl
9
10from apify import Actor
11from apify_client import ApifyClient
12
13from .base import BaseScraper
14from ..models.property import PropertyListing, Address, SearchCriteria
15
16
17class ApartmentsScraper(BaseScraper):
18    """Apartments.com scraper."""
19    
20    @property
21    def actor_id(self) -> str:
22        """Get Apify actor ID for Apartments.com.
23        
24        Returns:
25            Actor ID
26        """
27        return "epctex/apartments-scraper"
28    
29    @property
30    def source_name(self) -> str:
31        """Get source name.
32        
33        Returns:
34            Source name
35        """
36        return "apartments"
37    
38    def prepare_input(self) -> Dict[str, Any]:
39        """Prepare input for the Apartments.com scraper.
40        
41        Returns:
42            Actor input
43        """
44        # Parse location into city and state
45        location_parts = self.search_criteria.location.split(",")
46        city = location_parts[0].strip().replace(" ", "-").lower()
47        state = ""
48        if len(location_parts) > 1:
49            state = location_parts[1].strip().lower()
50        
51        # Construct location for URL
52        if state:
53            location_url = f"{city}-{state}"
54        else:
55            location_url = city
56        
57        # Base URL
58        base_url = f"https://www.apartments.com/{location_url}"
59        
60        # Start building search parameters
61        search_params = {}
62        
63        # Bedrooms filter
64        if self.search_criteria.min_bedrooms > 0 and self.search_criteria.max_bedrooms:
65            if self.search_criteria.min_bedrooms == self.search_criteria.max_bedrooms:
66                search_params["br"] = str(self.search_criteria.min_bedrooms)
67            else:
68                search_params["br-min"] = str(self.search_criteria.min_bedrooms)
69                search_params["br-max"] = str(self.search_criteria.max_bedrooms)
70        elif self.search_criteria.min_bedrooms > 0:
71            search_params["br-min"] = str(self.search_criteria.min_bedrooms)
72        elif self.search_criteria.max_bedrooms:
73            search_params["br-max"] = str(self.search_criteria.max_bedrooms)
74        
75        # Price filter
76        if self.search_criteria.min_price > 0:
77            search_params["price-min"] = str(int(self.search_criteria.min_price))
78        if self.search_criteria.max_price:
79            search_params["price-max"] = str(int(self.search_criteria.max_price))
80        
81        # Property type - apartments.com primarily focuses on apartments, but can filter for types
82        if self.search_criteria.property_type != "any" and self.search_criteria.property_type != "apartment":
83            search_params["type"] = self.search_criteria.property_type
84        
85        return {
86            "startUrls": [{"url": base_url}],
87            "searchParams": search_params,
88            "maxItems": self.max_items,
89            "extendOutputFunction": """async ({ data, item, customData, Apify }) => {
90                return { ...item };
91            }""",
92            "proxy": {
93                "useApifyProxy": True,
94                "apifyProxyGroups": ["RESIDENTIAL"]
95            }
96        }
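`prepare_input` turns the free-text location into the URL slug apartments.com expects. The slug-building step in isolation, assuming the same "City, ST" input convention:

```python
def location_slug(location: str) -> str:
    # Same transformation as in ApartmentsScraper.prepare_input:
    # "San Francisco, CA" -> "san-francisco-ca"
    parts = location.split(",")
    city = parts[0].strip().replace(" ", "-").lower()
    state = parts[1].strip().lower() if len(parts) > 1 else ""
    return f"{city}-{state}" if state else city

print(location_slug("San Francisco, CA"))  # san-francisco-ca
print(location_slug("Boston"))             # boston
```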
97    
98    def transform_item(self, item: Dict[str, Any]) -> PropertyListing:
99        """Transform an Apartments.com listing to a PropertyListing.
100        
101        Args:
102            item: Apartments.com listing
103            
104        Returns:
105            PropertyListing
106        """
107        # Parse price
108        price_str = item.get("rent", "0")
109        if isinstance(price_str, str):
110            # Extract digits from price string
111            price_match = re.search(r'(\d{1,3}(?:,\d{3})*(?:\.\d+)?)', price_str)
112            if price_match:
113                price_clean = price_match.group(1).replace(",", "")
114                price = float(price_clean)
115            else:
116                price = 0
117        else:
118            price = float(price_str) if price_str else 0
119        
120        # Parse address
121        property_address = item.get("propertyAddress", {})
122        address_line = property_address.get("addressLine", "")
123        neighborhood = property_address.get("neighborhood", "")
124        city = property_address.get("city", "")
125        state = property_address.get("state", "")
126        postal_code = property_address.get("postalCode", None)
127        
128        address = Address(
129            street=address_line,
130            city=city or neighborhood,  # Use neighborhood if city is missing
131            state=state,
132            zip_code=postal_code
133        )
134        
135        # Parse bedrooms
136        bedrooms = 0
137        beds = item.get("beds", 0)
138        if isinstance(beds, str):
139            bed_match = re.search(r'(\d+\.?\d*)', beds)
140            bedrooms = float(bed_match.group(1)) if bed_match else 0
141        else:
142            bedrooms = float(beds) if beds else 0
143        
144        # Parse bathrooms
145        bathrooms = None
146        baths = item.get("baths", None)
147        if baths:
148            if isinstance(baths, str):
149                bath_match = re.search(r'(\d+\.?\d*)', baths)
150                bathrooms = float(bath_match.group(1)) if bath_match else None
151            else:
152                bathrooms = float(baths)
153        
154        # Parse square feet
155        sqft = None
156        sqft_str = item.get("sqft", None)
157        if sqft_str:
158            if isinstance(sqft_str, str):
159                sqft_match = re.search(r'(\d+)', sqft_str.replace(',', ''))
160                sqft = int(sqft_match.group(1)) if sqft_match else None
161            else:
162                sqft = int(sqft_str)
163        
164        # Determine property type (guard against None title/description)
165        property_type = "apartment"  # Default for apartments.com
166        if "condo" in (item.get("title") or "").lower() or "condo" in (item.get("description") or "").lower():
167            property_type = "condo"
168        elif "townhouse" in (item.get("title") or "").lower() or "townhouse" in (item.get("description") or "").lower():
169            property_type = "townhouse"
170        elif "house" in (item.get("title") or "").lower() and "townhouse" not in (item.get("title") or "").lower():
171            property_type = "house"
172        
173        # Get URL
174        url = item.get("url", "")
175        
176        # Get images
177        images = []
178        photos = item.get("photos", [])
179        if isinstance(photos, list):
180            for photo in photos:
181                if isinstance(photo, dict) and "url" in photo:
182                    images.append(photo["url"])
183                elif isinstance(photo, str) and photo.startswith("http"):
184                    images.append(photo)
185        
186        # Extract amenities
187        amenities = []
188        
189        # Add apartment amenities
190        apartment_amenities = item.get("apartmentAmenities", [])
191        if isinstance(apartment_amenities, list):
192            amenities.extend(apartment_amenities)
193        
194        # Add community amenities
195        community_amenities = item.get("communityAmenities", [])
196        if isinstance(community_amenities, list):
197            amenities.extend(community_amenities)
198        
199        # Also use the base extract_amenities method to catch any missed ones
200        amenities.extend(self.extract_amenities(item))
201        
202        # Remove duplicates while preserving order
203        amenities = list(dict.fromkeys(amenities))
204        
205        # Generate a unique ID
206        property_id = str(item.get("id") or uuid.uuid4())
207        
208        # Create features dictionary for additional data
209        additional_features = {}
210        for key, value in item.items():
211            if key not in [
212                "rent", "propertyAddress", "beds", "baths", "sqft", "url", "photos",
213                "apartmentAmenities", "communityAmenities", "id", "title", "description",
214            ]:
215                additional_features[key] = value
216        
217        # Parse listing date if available
218        listed_date = None
219        date_str = item.get("dateAvailable", item.get("datePosted", None))
220        if date_str and isinstance(date_str, str):
221            try:
222                # Try common date formats
223                for fmt in ["%Y-%m-%d", "%m/%d/%Y", "%b %d, %Y"]:
224                    try:
225                        listed_date = datetime.strptime(date_str, fmt)
226                        break
227                    except ValueError:
228                        continue
229            except Exception:
230                pass
231        
232        return PropertyListing(
233            id=property_id,
234            title=item.get("title", "Property Listing"),
235            description=item.get("description", None),
236            price=price,
237            address=address,
238            bedrooms=bedrooms,
239            bathrooms=bathrooms,
240            square_feet=sqft,
241            property_type=property_type,
242            url=url,
243            source="apartments",
244            amenities=amenities,
245            images=images,
246            listed_date=listed_date,
247            features=additional_features
248        )
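The rent parser in `transform_item` relies on a comma-grouped number regex. Its behavior on some typical apartments.com price strings (the sample strings are illustrative):

```python
import re

def parse_rent(price_str: str) -> float:
    # Same regex as in ApartmentsScraper.transform_item: take the first
    # comma-grouped number, strip commas, then convert to float
    m = re.search(r'(\d{1,3}(?:,\d{3})*(?:\.\d+)?)', price_str)
    return float(m.group(1).replace(",", "")) if m else 0.0

print(parse_rent("$2,350/mo"))        # 2350.0
print(parse_rent("$1,250 - $1,800"))  # 1250.0 (first number wins)
print(parse_rent("Call for rent"))    # 0.0
```

Note that for a rent range only the lower bound survives, which matches how the scraper filters on price.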

src/scrapers/base.py

1"""Base scraper class for all real estate platform scrapers."""
2
3import re
4import json
5import uuid
6import os
7from abc import ABC, abstractmethod
8from typing import List, Dict, Any, Optional
9from datetime import datetime
10from apify import Actor
11from apify_client import ApifyClient
12from pydantic import HttpUrl
13
14from ..models.property import PropertyListing, Address, SearchCriteria
15
16
17class BaseScraper(ABC):
18    """Base scraper class that all platform-specific scrapers should inherit from."""
19    
20    def __init__(
21        self,
22        search_criteria: SearchCriteria,
23        apify_client: Optional[ApifyClient] = None,
24        max_items: int = 100
25    ):
26        """Initialize the scraper.
27        
28        Args:
29            search_criteria: Search criteria
30            apify_client: Apify client. If None, creates a new client
31            max_items: Maximum number of items to scrape
32        """
33        self.search_criteria = search_criteria
34        self.apify_client = apify_client or ApifyClient()
35        self.max_items = max_items
36        
37    @property
38    @abstractmethod
39    def actor_id(self) -> str:
40        """Apify actor ID for the scraper.
41        
42        Returns:
43            Actor ID
44        """
45        pass
46    
47    @property
48    @abstractmethod
49    def source_name(self) -> str:
50        """Name of the source.
51        
52        Returns:
53            Source name
54        """
55        pass
56    
57    @abstractmethod
58    def prepare_input(self) -> Dict[str, Any]:
59        """Prepare input for the Apify actor.
60        
61        Returns:
62            Actor input
63        """
64        pass
65    
66    @abstractmethod
67    def transform_item(self, item: Dict[str, Any]) -> PropertyListing:
68        """Transform a scraped item into a PropertyListing.
69        
70        Args:
71            item: Scraped item
72            
73        Returns:
74            PropertyListing
75        """
76        pass
77    
78    def parse_address(self, address_str: str) -> Address:
79        """Parse address string into Address model.
80        
81        Args:
82            address_str: Address string
83            
84        Returns:
85            Address
86        """
87        # Default implementation with simple parsing
88        # Subclasses can override for platform-specific parsing
89        address_parts = address_str.split(",")
90        
91        if len(address_parts) >= 3:
92            street = address_parts[0].strip()
93            city = address_parts[1].strip()
94            state_zip = address_parts[2].strip().split()
95            state = state_zip[0].strip() if state_zip else ""
96            zip_code = state_zip[1].strip() if len(state_zip) > 1 else None
97        elif len(address_parts) == 2:
98            street = None
99            city = address_parts[0].strip()
100            state_zip = address_parts[1].strip().split()
101            state = state_zip[0].strip() if state_zip else ""
102            zip_code = state_zip[1].strip() if len(state_zip) > 1 else None
103        else:
104            # If we can't parse the address properly, use a minimal approach
105            street = None
106            # Try to extract a known state abbreviation
107            state_match = re.search(r'\b([A-Z]{2})\b', address_str)
108            if state_match:
109                state = state_match.group(1)
110                # Assume the city is before the state
111                city_match = re.search(r'([^,]+),\s*' + state, address_str)
112                city = city_match.group(1) if city_match else address_str
113            else:
114                # If we can't extract a state, use the whole string as city
115                city = address_str
116                state = ""
117            zip_code = None
118        
119        return Address(
120            street=street,
121            city=city,
122            state=state,
123            zip_code=zip_code
124        )
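The default `parse_address` treats a full address as comma-separated street, city, and `STATE ZIP` segments. The happy-path split, sketched standalone on a made-up address:

```python
address_str = "123 Main St, Springfield, IL 62704"
parts = address_str.split(",")

# Three segments: street, city, "STATE ZIP" (as in BaseScraper.parse_address)
street = parts[0].strip()
city = parts[1].strip()
state_zip = parts[2].strip().split()
state = state_zip[0] if state_zip else ""
zip_code = state_zip[1] if len(state_zip) > 1 else None

print(street, "|", city, "|", state, "|", zip_code)
```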
125    
126    def extract_amenities(self, item: Dict[str, Any]) -> List[str]:
127        """Extract amenities from a scraped item.
128        
129        Args:
130            item: Scraped item
131            
132        Returns:
133            List of amenities
134        """
135        # Default implementation that subclasses can override
136        amenities = []
137        
138        # Look for amenities in features or amenities field
139        if "amenities" in item and isinstance(item["amenities"], list):
140            amenities.extend(item["amenities"])
141        
142        if "features" in item and isinstance(item["features"], list):
143            amenities.extend(item["features"])
144        
145        # Look for amenities in description
146        if "description" in item and isinstance(item["description"], str):
147            # Common amenities to look for in descriptions
148            common_amenities = [
149                "parking", "garage", "gym", "fitness", "pool", "washer", "dryer", 
150                "dishwasher", "air conditioning", "ac", "balcony", "patio", 
151                "hardwood", "fireplace", "wheelchair", "elevator", "pet friendly"
152            ]
153            
154            description = item["description"].lower()
155            for amenity in common_amenities:
156                if amenity in description and amenity not in amenities:
157                    amenities.append(amenity)
158        
159        return amenities
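The description scan in `extract_amenities` is a plain substring match against a keyword list, so it catches amenities mentioned only in free text. In isolation (keyword list truncated for brevity):

```python
COMMON_AMENITIES = ["parking", "gym", "pool", "dishwasher", "balcony"]

def amenities_from_description(description, existing=None):
    # Substring scan as in BaseScraper.extract_amenities: append a
    # keyword once if it appears in the lowercased description
    found = list(existing or [])
    text = description.lower()
    for amenity in COMMON_AMENITIES:
        if amenity in text and amenity not in found:
            found.append(amenity)
    return found

result = amenities_from_description(
    "Includes covered Parking, a rooftop pool, and a dishwasher.",
    existing=["pool"],
)
print(result)  # ['pool', 'parking', 'dishwasher']
```

Substring matching is deliberately loose; it can produce false positives (e.g. "carpool" contains "pool"), which the downstream LLM filter can clean up.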
160    
161    def scrape(self) -> List[PropertyListing]:
162        """Scrape properties based on search criteria.
163        
164        Returns:
165            List of property listings
166        """
167        Actor.log.info(f"Starting {self.source_name} scraper")
168        
169        # Prepare input for the Apify actor
170        input_data = self.prepare_input()
171        
172        # Check if we're running in local mode for testing
173        if os.environ.get("ACTOR_TEST_PAY_PER_EVENT") == "true" and not os.environ.get("APIFY_TOKEN"):
174            Actor.log.info(f"Running in local test mode, using mock data for {self.source_name}")
175            return self.get_mock_listings()
176        
177        Actor.log.info(f"Running Apify actor {self.actor_id} with input: {input_data}")
178        
179        try:
180            # Run the actor
181            run = self.apify_client.actor(self.actor_id).call(
182                run_input=input_data,
183                build="latest"
184            )
185            
186            # Get the dataset
187            dataset_id = run["defaultDatasetId"]
188            items = self.apify_client.dataset(dataset_id).list_items(limit=self.max_items).items
189            
190            Actor.log.info(f"Scraped {len(items)} items from {self.source_name}")
191            
192            # Transform items to PropertyListings
193            listings = []
194            for item in items:
195                try:
196                    listing = self.transform_item(item)
197                    listings.append(listing)
198                except Exception as e:
199                    Actor.log.exception(f"Error transforming item: {e}")
200                    continue
201            
202            Actor.log.info(f"Transformed {len(listings)} listings from {self.source_name}")
203            
204            return listings
205        except Exception as e:
206            Actor.log.error(f"Error scraping {self.source_name}: {e}")
207            return self.get_mock_listings()
208    
209    def get_mock_listings(self) -> List[PropertyListing]:
210        """Get mock listings for local testing.
211        
212        Returns:
213            List of mock property listings
214        """
215        Actor.log.info(f"Generating mock data for {self.source_name}")
216        
217        # Create 5 mock listings
218        mock_listings = []
219        
220        for i in range(1, 6):
221            mock_listings.append(
222                PropertyListing(
223                    id=f"{self.source_name}_mock_{i}",
224                    title=f"Mock {self.source_name} Listing {i}",
225                    description=f"This is a mock listing for testing purposes. In {self.search_criteria.location} with {self.search_criteria.min_bedrooms} bedrooms.",
226                    url=f"https://example.com/{self.source_name}/mock-listing-{i}",
227                    price=float(self.search_criteria.min_price or 1000) + (i * 200),
228                    bedrooms=self.search_criteria.min_bedrooms + (i % 2),
229                    bathrooms=self.search_criteria.min_bedrooms / 2 + (i % 2),
230                    address=Address(
231                        street=f"{100 + i} Main St",
232                        city=self.search_criteria.location.split(",")[0].strip(),
233                        state=self.search_criteria.location.split(",")[-1].strip(),
234                        zip_code="12345"
235                    ),
236                    property_type=self.search_criteria.property_type,
237                    source=self.source_name,
238                    amenities=self.search_criteria.amenities + ["parking", "air conditioning"],
239                    listed_date=datetime.now(),
240                    is_new=True
241                )
242            )
243        
244        Actor.log.info(f"Generated {len(mock_listings)} mock listings for {self.source_name}")
245        return mock_listings
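New platforms plug in by subclassing `BaseScraper` and filling in the abstract members. A self-contained miniature of the same template-method pattern (the class names and URL here are illustrative, not the real scrapers):

```python
from abc import ABC, abstractmethod

class MiniScraper(ABC):
    # Scaled-down analogue of BaseScraper: the shared scrape()
    # flow delegates platform specifics to abstract members.
    @property
    @abstractmethod
    def source_name(self) -> str: ...

    @abstractmethod
    def prepare_input(self) -> dict: ...

    def scrape(self) -> list:
        query = self.prepare_input()
        return [f"{self.source_name} result for {query['url']}"]

class MiniZillow(MiniScraper):
    @property
    def source_name(self) -> str:
        return "zillow"

    def prepare_input(self) -> dict:
        return {"url": "https://www.zillow.com/austin-tx"}

print(MiniZillow().scrape())
```

As in the real `BaseScraper`, only the input preparation and naming are platform-specific; the orchestration lives once in the base class.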

src/scrapers/realtor.py

1"""Realtor.com scraper."""
2
3import re
4import json
5import uuid
6from typing import Dict, Any, List, Optional
7from datetime import datetime
8from pydantic import HttpUrl
9
10from apify import Actor
11from apify_client import ApifyClient
12
13from .base import BaseScraper
14from ..models.property import PropertyListing, Address, SearchCriteria
15
16
17class RealtorScraper(BaseScraper):
18    """Realtor.com scraper."""
19    
20    @property
21    def actor_id(self) -> str:
22        """Get Apify actor ID for Realtor.com.
23        
24        Returns:
25            Actor ID
26        """
27        return "epctex/realtor-scraper"
28    
29    @property
30    def source_name(self) -> str:
31        """Get source name.
32        
33        Returns:
34            Source name
35        """
36        return "realtor"
37    
38    def prepare_input(self) -> Dict[str, Any]:
39        """Prepare input for the Realtor.com scraper.
40        
41        Returns:
42            Actor input
43        """
44        # Parse location into city and state
45        location_parts = self.search_criteria.location.split(",")
46        city = location_parts[0].strip().replace(" ", "-").lower()
47        state = ""
48        if len(location_parts) > 1:
49            state = location_parts[1].strip().lower()
50        
51        # Property type mapping
52        property_type_map = {
53            "apartment": "apartments",
54            "house": "single-family-home",
55            "condo": "condos",
56            "townhouse": "townhomes",
57            "any": "any"
58        }
59        
60        property_type = property_type_map.get(
61            self.search_criteria.property_type, "any"
62        )
63        
64        # Base search URL
65        if self.search_criteria.search_type == "rent":
66            base_url = "https://www.realtor.com/apartments"
67        else:
68            base_url = "https://www.realtor.com/realestateandhomes-search"
69        
70        # Construct location part of URL
71        if state:
72            location_url = f"{city}_{state}"
73        else:
74            location_url = city
75        
76        # Build search URL
77        input_url = f"{base_url}/{location_url}"
78        
79        # Start building search parameters
80        search_params = {}
81        
82        # Add property type
83        if property_type != "any":
84            search_params["prop"] = property_type
85        
86        # Add bedroom filter
87        if self.search_criteria.min_bedrooms > 0:
88            search_params["beds-lower"] = str(self.search_criteria.min_bedrooms)
89        if self.search_criteria.max_bedrooms:
90            search_params["beds-upper"] = str(self.search_criteria.max_bedrooms)
91        
92        # Add price filter
93        if self.search_criteria.min_price > 0:
94            search_params["price-lower"] = str(int(self.search_criteria.min_price))
95        if self.search_criteria.max_price:
96            search_params["price-upper"] = str(int(self.search_criteria.max_price))
97        
98        return {
99            "startUrls": [{"url": input_url}],
100            "searchParams": search_params,
101            "maxItems": self.max_items,
102            "extendOutputFunction": """async ({ data, item, customData, Apify }) => {
103                return { ...item };
104            }""",
105            "proxy": {
106                "useApifyProxy": True,
107                "apifyProxyGroups": ["RESIDENTIAL"]
108            }
109        }
110    
111    def transform_item(self, item: Dict[str, Any]) -> PropertyListing:
112        """Transform a Realtor.com listing to a PropertyListing.
113        
114        Args:
115            item: Realtor.com listing
116            
117        Returns:
118            PropertyListing
119        """
120        # Parse price
121        price_str = item.get("price", "0")
122        if isinstance(price_str, str):
123            # Remove currency symbols and commas
124            price_str = re.sub(r'[^\d.]', '', price_str)
125            price = float(price_str) if price_str else 0
126        else:
127            price = float(price_str) if price_str else 0
128        
129        # Get address components
130        full_address = item.get("address", "")
131        address_components = item.get("addressComponents", {})
132        
133        # Construct address
134        street = address_components.get("streetName", "")
135        if "streetNumber" in address_components:
136            street = f"{address_components['streetNumber']} {street}"
137            
138        address = Address(
139            street=street,
140            city=address_components.get("city", ""),
141            state=address_components.get("state", ""),
142            zip_code=address_components.get("zipcode", None)
143        )
144        
145        # Parse bedrooms
146        bedrooms = 0
147        beds = item.get("beds", 0)
148        if isinstance(beds, str):
149            bed_match = re.search(r'(\d+\.?\d*)', beds)
150            bedrooms = float(bed_match.group(1)) if bed_match else 0
151        else:
152            bedrooms = float(beds) if beds else 0
153        
154        # Parse bathrooms
155        bathrooms = None
156        baths = item.get("baths", None)
157        if baths:
158            if isinstance(baths, str):
159                bath_match = re.search(r'(\d+\.?\d*)', baths)
160                bathrooms = float(bath_match.group(1)) if bath_match else None
161            else:
162                bathrooms = float(baths)
163        
164        # Parse square feet
165        sqft = None
166        sqft_str = item.get("sqft", None)
167        if sqft_str:
168            if isinstance(sqft_str, str):
169                sqft_match = re.search(r'(\d+)', sqft_str.replace(',', ''))
170                sqft = int(sqft_match.group(1)) if sqft_match else None
171            else:
172                sqft = int(sqft_str)
173        
174        # Determine property type
175        property_type = item.get("propertyType", "").lower()
176        if not property_type:
177            property_subtype = item.get("propertySubType", "").lower()
178            if property_subtype:
179                property_type = property_subtype
180            else:
181                property_type = "unknown"
182        
183        # Get URL
184        url = item.get("detailUrl", "")
185        if not url.startswith("http"):
186            url = f"https://www.realtor.com{url}"
187        
188        # Get images
189        images = []
190        photos = item.get("photos", [])
191        if isinstance(photos, list):
192            for photo in photos:
193                if isinstance(photo, dict) and "url" in photo:
194                    images.append(photo["url"])
195                elif isinstance(photo, str) and photo.startswith("http"):
196                    images.append(photo)
197        
198        # Extract amenities
199        amenities = self.extract_amenities(item)
200        
201        # Check for specific features in the item data
202        features = item.get("features", {})
203        if features:
204                for feature_list in features.values():
205                    if isinstance(feature_list, list):
206                        amenities.extend(feature_list)
207        
208        # Generate a unique ID
209        property_id = str(item.get("listingId", uuid.uuid4()))
210        
211        # Create features dictionary for additional data
212        additional_features = {}
213        for key, value in item.items():
214            if key not in [
215                "price", "address", "addressComponents", "beds", "baths", "sqft",
216                "propertyType", "propertySubType", "detailUrl", "photos", "features",
217                "listingId", "description", "amenities"
218            ]:
219                additional_features[key] = value
220        
221        return PropertyListing(
222            id=property_id,
223            title=item.get("title", "Property Listing"),
224            description=item.get("description", None),
225            price=price,
226            address=address,
227            bedrooms=bedrooms,
228            bathrooms=bathrooms,
229            square_feet=sqft,
230            property_type=property_type,
231            url=url,
232            source="realtor",
233            amenities=amenities,
234            images=images,
235            features=additional_features
236        )
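The string-to-number coercions in `transform_item` (price, beds, baths) all follow the same strip-and-parse pattern. It can be exercised in isolation; `parse_price` below is an illustrative helper mirroring that logic, not a function in the codebase:

```python
import re

def parse_price(raw):
    """Coerce "$2,350/mo", "2350", or 2350 to a float.

    Mirrors transform_item: drop every character except digits
    and dots, then fall back to 0 for empty or missing values.
    """
    if isinstance(raw, str):
        digits = re.sub(r"[^\d.]", "", raw)
        return float(digits) if digits else 0.0
    return float(raw) if raw else 0.0
```

Note the regex keeps dots, so decimal prices like "$1,250.50" survive intact.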

src/scrapers/zillow.py

1"""Zillow scraper."""
2
3import re
4import json
5import uuid
6from typing import Dict, Any, List, Optional
7from datetime import datetime
8from pydantic import HttpUrl
9
10from apify import Actor
11from apify_client import ApifyClient
12
13from .base import BaseScraper
14from ..models.property import PropertyListing, Address, SearchCriteria
15
16
17class ZillowScraper(BaseScraper):
18    """Zillow scraper."""
19    
20    @property
21    def actor_id(self) -> str:
22        """Get Apify actor ID for Zillow.
23        
24        Returns:
25            Actor ID
26        """
27        return "maxcopell/zillow-detail-scraper"
28    
29    @property
30    def source_name(self) -> str:
31        """Get source name.
32        
33        Returns:
34            Source name
35        """
36        return "zillow"
37    
38    def prepare_input(self) -> Dict[str, Any]:
39        """Prepare input for the Zillow scraper.
40        
41        Returns:
42            Actor input
43        """
44        location = self.search_criteria.location.replace(", ", ",").replace(" ", "-").lower()
45        
46        # Property type mapping
47        property_type_map = {
48            "apartment": "apartment",
49            "house": "house",
50            "condo": "condo",
51            "townhouse": "townhome",
52            "any": ""
53        }
54        
55        property_type = property_type_map.get(
56            self.search_criteria.property_type, ""
57        )
58        
59        # Build the URL
60        if self.search_criteria.search_type == "rent":
61            base_url = f"https://www.zillow.com/homes/for_rent/{location}"
62        else:
63            base_url = f"https://www.zillow.com/homes/{location}"
64        
65        # Add filters based on search criteria
66        filters = []
67        
68        # Price filter
69        if self.search_criteria.min_price > 0 or self.search_criteria.max_price:
70            price_filter = "price"
71            if self.search_criteria.min_price > 0:
72                price_filter += f"_gte-{int(self.search_criteria.min_price)}"
73            if self.search_criteria.max_price:
74                price_filter += f"_lte-{int(self.search_criteria.max_price)}"
75            filters.append(price_filter)
76        
77        # Bedroom filter
78        if self.search_criteria.min_bedrooms > 0 or self.search_criteria.max_bedrooms:
79            if self.search_criteria.min_bedrooms == self.search_criteria.max_bedrooms:
80                filters.append(f"{self.search_criteria.min_bedrooms}-_beds")  # note: "{n}-_beds" reads as "{n}+ beds" in Zillow URLs, not an exact count
81            else:
82                bedroom_filter = "beds"
83                if self.search_criteria.min_bedrooms > 0:
84                    bedroom_filter += f"_gte-{self.search_criteria.min_bedrooms}"
85                if self.search_criteria.max_bedrooms:
86                    bedroom_filter += f"_lte-{self.search_criteria.max_bedrooms}"
87                filters.append(bedroom_filter)
88        
89        # Property type filter
90        if property_type:
91            filters.append(f"type-{property_type}")
92        
93        # Assemble the URL with filters
94        if filters:
95            filter_string = "/".join(filters)
96            url = f"{base_url}/{filter_string}"
97        else:
98            url = base_url
99        
100        return {
101            "startUrls": [{"url": url}],
102            "maxPages": 10,
103            "includeRental": self.search_criteria.search_type == "rent",
104            "includeSale": self.search_criteria.search_type == "buy",
105            "includeAuction": False,
106            "proxy": {
107                "useApifyProxy": True,
108                "apifyProxyGroups": ["RESIDENTIAL"]
109            }
110        }
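The filter-segment assembly above can be sketched standalone. `build_filter_path` is a hypothetical helper reproducing the same segment formats (the exact-match bedroom branch is omitted for brevity, and whether Zillow's URL scheme accepts these tokens is an assumption inherited from the code above, not verified here):

```python
def build_filter_path(min_price=0, max_price=None, min_beds=0, max_beds=None, property_type=""):
    """Assemble the "/"-joined filter segments the way prepare_input does."""
    filters = []
    if min_price > 0 or max_price:
        part = "price"
        if min_price > 0:
            part += f"_gte-{int(min_price)}"
        if max_price:
            part += f"_lte-{int(max_price)}"
        filters.append(part)
    if min_beds > 0 or max_beds:
        part = "beds"
        if min_beds > 0:
            part += f"_gte-{min_beds}"
        if max_beds:
            part += f"_lte-{max_beds}"
        filters.append(part)
    if property_type:
        filters.append(f"type-{property_type}")
    return "/".join(filters)
```

With no criteria set, the function returns an empty string, which matches the branch that falls back to the bare base URL.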
111    
112    def transform_item(self, item: Dict[str, Any]) -> PropertyListing:
113        """Transform a Zillow listing to a PropertyListing.
114        
115        Args:
116            item: Zillow listing
117            
118        Returns:
119            PropertyListing
120        """
121        # Parse price
122        price_str = item.get("price", "0")
123        if isinstance(price_str, str):
124            # Remove currency symbols and commas
125            price_str = re.sub(r'[^\d.]', '', price_str)
126            price = float(price_str) if price_str else 0
127        else:
128            price = float(price_str)
129        
130        # Parse address
131        address_str = item.get("address", "")
132        address = self.parse_address(address_str)
133        
134        # Parse bedrooms
135        bedrooms_str = item.get("bedrooms", "0")
136        if isinstance(bedrooms_str, str):
137            bedroom_match = re.search(r'(\d+\.?\d*)', bedrooms_str)
138            bedrooms = float(bedroom_match.group(1)) if bedroom_match else 0
139        else:
140            bedrooms = float(bedrooms_str) if bedrooms_str else 0
141        
142        # Parse bathrooms
143        bathrooms_str = item.get("bathrooms", None)
144        if bathrooms_str:
145            if isinstance(bathrooms_str, str):
146                bathroom_match = re.search(r'(\d+\.?\d*)', bathrooms_str)
147                bathrooms = float(bathroom_match.group(1)) if bathroom_match else None
148            else:
149                bathrooms = float(bathrooms_str)
150        else:
151            bathrooms = None
152        
153        # Parse square feet
154        sqft_str = item.get("livingArea", None)
155        if sqft_str:
156            if isinstance(sqft_str, str):
157                # Remove non-digit characters
158                sqft_match = re.search(r'(\d+)', sqft_str.replace(',', ''))
159                sqft = int(sqft_match.group(1)) if sqft_match else None
160            else:
161                sqft = int(sqft_str)
162        else:
163            sqft = None
164        
165        # Extract amenities
166        amenities = self.extract_amenities(item)
167        
168        # Get property type
169        property_type = item.get("homeType", "").lower()
170        if not property_type:
171            description = item.get("description", "").lower()  # infer type from description text
172            if "apartment" in description:
173                property_type = "apartment"
174            elif "condo" in description:
175                property_type = "condo"
176            elif "house" in description:
177                property_type = "house"
178            elif "townhouse" in description or "town house" in description:
179                property_type = "townhouse"
180            else:
181                property_type = "unknown"
182        
183        # Get listing URL
184        url = item.get("url", "")
185        if not url.startswith("http"):
186            url = f"https://www.zillow.com{url}"
187        
188        # Get images
189        images = []
190        if "images" in item and isinstance(item["images"], list):
191            for img in item["images"]:
192                if isinstance(img, str) and img.startswith("http"):
193                    images.append(img)
194        
195        # Generate a unique ID
196        property_id = str(item.get("zpid", uuid.uuid4()))
197        
198        # Extract any additional features
199        features = {}
200        for key, value in item.items():
201            if key not in [
202                "price", "address", "bedrooms", "bathrooms", "livingArea", 
203                "homeType", "description", "url", "images", "zpid", "amenities"
204            ]:
205                features[key] = value
206        
207        return PropertyListing(
208            id=property_id,
209            title=item.get("streetAddress", "Property Listing"),
210            description=item.get("description", None),
211            price=price,
212            address=address,
213            bedrooms=bedrooms,
214            bathrooms=bathrooms,
215            square_feet=sqft,
216            property_type=property_type,
217            url=url,
218            source="zillow",
219            amenities=amenities,
220            images=images,
221            features=features
222        )
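The catch-all `features` extraction at the end of `transform_item` is a keep-everything-else filter over the raw item. Sketched standalone (the names `MAPPED_KEYS` and `extra_features` are illustrative):

```python
# Raw fields already mapped to first-class PropertyListing attributes
MAPPED_KEYS = {
    "price", "address", "bedrooms", "bathrooms", "livingArea",
    "homeType", "description", "url", "images", "zpid", "amenities",
}

def extra_features(item):
    """Return every raw field not consumed by the main mapping."""
    return {k: v for k, v in item.items() if k not in MAPPED_KEYS}
```

A dict comprehension like this is equivalent to the loop-and-exclude version above but makes the intent (set difference on keys) explicit.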

src/utils/__init__.py

1"""Utility functions for Listing Sleuth."""

src/utils/llm.py

1"""LLM utility functions for Listing Sleuth."""
2
3import os
4from typing import List, Dict, Any, Optional
5from langchain_openai import ChatOpenAI
6from langchain.prompts import ChatPromptTemplate
7from langchain.output_parsers import PydanticOutputParser
8from langchain.schema import Document
9
10from ..models.property import PropertyListing, SearchCriteria
11
12
13def get_llm(api_token: Optional[str] = None) -> ChatOpenAI:
14    """Get LLM client.
15    
16    Args:
17        api_token: OpenAI API token. If None, tries to get from environment.
18        
19    Returns:
20        ChatOpenAI instance
21    
22    Raises:
23        ValueError: If API token is not provided and not found in environment.
24    """
25    token = api_token or os.environ.get("OPENAI_API_KEY")
26    if not token:
27        raise ValueError(
28            "OpenAI API token not provided. Please provide a token in the input "
29            "or set the OPENAI_API_KEY environment variable."
30        )
31    
32    return ChatOpenAI(
33        api_key=token,
34        model="gpt-3.5-turbo",
35        temperature=0
36    )
37
38
39def filter_properties_with_llm(
40    properties: List[PropertyListing],
41    search_criteria: SearchCriteria,
42    api_token: Optional[str] = None
43) -> List[PropertyListing]:
44    """Filter properties with LLM based on search criteria.
45    
46    Args:
47        properties: List of property listings
48        search_criteria: Search criteria
49        api_token: OpenAI API token
50        
51    Returns:
52        Filtered list of property listings
53    """
54    if not properties:
55        return []
56    
57    if not api_token and not search_criteria.llm_api_token:
58        # No token available, so skip LLM filtering and return the listings unchanged
59        return properties
60    
61    llm = get_llm(api_token or search_criteria.llm_api_token)
62    # Responses are parsed manually below, so no PydanticOutputParser is needed here.
63    
64    template = """
65    You are an AI assistant helping to filter real estate listings based on specific criteria.
66    
67    The user is looking for the following:
68    - Location: {location}
69    - Property type: {property_type}
70    - Price range: ${min_price} - ${max_price} (0 means no minimum, None means no maximum)
71    - Bedrooms: {min_bedrooms} - {max_bedrooms} (None means no maximum)
72    - Desired amenities: {amenities}
73    
74    For each property, evaluate how well it fits the criteria, with special attention to
75    amenities and any specific requirements. For each good match, output one line in the
76    exact form "id: <property id> - good match"; omit properties that fail the criteria.
77    
78    Here are the properties to evaluate:
79    {properties}
80    
81    If the user mentioned any amenities, prioritize properties with those amenities.
82    """
83    
84    # Process in smaller batches to avoid token limits
85    batch_size = 5
86    filtered_properties = []
87    
88    for i in range(0, len(properties), batch_size):
89        batch = properties[i:i+batch_size]
90        
91        prompt = ChatPromptTemplate.from_template(template)
92        chain = prompt | llm
93        
94        # Simplify property objects for LLM consumption
95        simplified_batch = [
96            {
97                "id": p.id,
98                "title": p.title,
99                "price": p.price,
100                "bedrooms": p.bedrooms,
101                "bathrooms": p.bathrooms,
102                "property_type": p.property_type,
103                "address": str(p.address),
104                "amenities": p.amenities,
105                "description": p.description,
106                "url": str(p.url)
107            }
108            for p in batch
109        ]
110        
111        result = chain.invoke({
112            "location": search_criteria.location,
113            "property_type": search_criteria.property_type,
114            "min_price": search_criteria.min_price,
115            "max_price": search_criteria.max_price,
116            "min_bedrooms": search_criteria.min_bedrooms,
117            "max_bedrooms": search_criteria.max_bedrooms,
118            "amenities": search_criteria.amenities,
119            "properties": simplified_batch
120        })
121        
122        # Extract property IDs that the LLM determined to be good matches
123        response_text = result.content
124        passing_ids = []
125        
126        # Simple parsing of response - in production, this would be more robust
127        for line in response_text.split("\n"):
128            if "id:" in line and "good match" in line.lower():
129                try:
130                    id_part = line.split("id:")[1].strip()
131                    property_id = id_part.split()[0].strip(",")
132                    passing_ids.append(property_id)
133                except IndexError:
134                    continue
135        
136        # Add matching properties to filtered list
137        for p in batch:
138            if p.id in passing_ids:
139                filtered_properties.append(p)
140    
141    return filtered_properties
142
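The batch loop in `filter_properties_with_llm` slices the list in fixed windows to stay under token limits. The same pattern as a reusable generator (a sketch; `batched` is not part of the codebase):

```python
def batched(seq, size):
    """Yield consecutive slices of at most `size` items each."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]
```

The final slice may be shorter than `size`, which Python slicing handles without an explicit bounds check.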
143
144def summarize_property(
145    property_listing: PropertyListing,
146    api_token: Optional[str] = None
147) -> str:
148    """Generate a natural language summary of a property.
149    
150    Args:
151        property_listing: Property listing to summarize
152        api_token: OpenAI API token
153        
154    Returns:
155        Summary of property
156    """
157    try:
158        llm = get_llm(api_token)
159    except ValueError:
160        # Fall back to basic summary if no API token
161        return (
162            f"{property_listing.title}: {property_listing.bedrooms} bed, "
163            f"{property_listing.bathrooms or 'unknown'} bath {property_listing.property_type} "
164            f"for ${property_listing.price:,.2f} in {property_listing.address.city}, "
165            f"{property_listing.address.state}."
166        )
167    
168    template = """
169    Create a concise, appealing summary of this property listing in one paragraph:
170    
171    Title: {title}
172    Price: ${price}
173    Address: {address}
174    Property type: {property_type}
175    Bedrooms: {bedrooms}
176    Bathrooms: {bathrooms}
177    Square feet: {square_feet}
178    Amenities: {amenities}
179    Description: {description}
180    
181    Keep the summary brief but informative, highlighting key selling points.
182    """
183    
184    prompt = ChatPromptTemplate.from_template(template)
185    chain = prompt | llm
186    
187    result = chain.invoke({
188        "title": property_listing.title,
189        "price": f"{property_listing.price:,.2f}",
190        "address": str(property_listing.address),
191        "property_type": property_listing.property_type,
192        "bedrooms": property_listing.bedrooms,
193        "bathrooms": property_listing.bathrooms or "unknown",
194        "square_feet": property_listing.square_feet or "unknown",
195        "amenities": ", ".join(property_listing.amenities) or "none specified",
196        "description": property_listing.description or "No description provided"
197    })
198    
199    return result.content.strip()
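The no-token fallback in `summarize_property` leans on Python's format-spec mini-language for the price (`:,.2f` adds thousands separators and two decimals). A minimal recreation of that branch, with an illustrative helper name:

```python
def basic_summary(title, bedrooms, bathrooms, property_type, price, city, state):
    """Recreate the fallback summary used when no OpenAI token is available."""
    baths = bathrooms if bathrooms is not None else "unknown"
    return (
        f"{title}: {bedrooms} bed, {baths} bath {property_type} "
        f"for ${price:,.2f} in {city}, {state}."
    )
```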

src/utils/storage.py

1"""Storage utility functions for Listing Sleuth."""
2
3import json
4import os
5from typing import Dict, List, Optional, Any, Union
6from datetime import datetime
7from pydantic import BaseModel
8from apify import Actor
9
10from ..models.property import PropertyListing, SearchResults, SearchCriteria
11
12
13def save_search_results(results: SearchResults) -> None:
14    """Save search results to Apify key-value store.
15    
16    Args:
17        results: Search results to save
18    """
19    # Convert results to dict for storage
20    results_dict = results.model_dump()
21    
22    # Convert datetime objects to ISO format strings
23    results_dict["search_date"] = results_dict["search_date"].isoformat()
24    for i, result in enumerate(results_dict["results"]):
25        if result.get("listed_date"):
26            results_dict["results"][i]["listed_date"] = result["listed_date"].isoformat()
27    
28    try:
29        # Save to Apify key-value store if in production
30        if hasattr(Actor, 'main_kv_store'):
31            Actor.main_kv_store.set_value("search_results", results_dict)
32            
33            # Also save the individual listings separately for easier access
34            for listing in results.results:
35                Actor.main_kv_store.set_value(f"listing_{listing.id}", listing.model_dump())
36        else:
37            # Local testing - save to a local file
38            Actor.log.info("Running in local mode, saving to local file")
39            os.makedirs("storage/key_value_stores", exist_ok=True)
40            with open("storage/key_value_stores/search_results.json", "w") as f:
41                json.dump(results_dict, f)
42    except Exception as e:
43        Actor.log.error(f"Error saving search results: {e}")
44
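`save_search_results` and `load_previous_results` round-trip datetimes through ISO-8601 strings because the key-value store holds JSON. For naive datetimes the round trip is lossless:

```python
from datetime import datetime

stamp = datetime(2025, 3, 1, 12, 30)
encoded = stamp.isoformat()                # JSON-safe string
decoded = datetime.fromisoformat(encoded)  # back to an equal datetime
```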
45
46def load_previous_results(search_criteria: SearchCriteria) -> Optional[SearchResults]:
47    """Load previous search results from Apify key-value store.
48    
49    Args:
50        search_criteria: Current search criteria, to compare with previous search
51        
52    Returns:
53        Previous search results, or None if no previous results or criteria changed
54    """
55    # Try to get previous results
56    try:
57        results_dict = None
58        
59        # Try to load from Apify KV store first
60        if hasattr(Actor, 'main_kv_store'):
61            results_dict = Actor.main_kv_store.get_value("search_results")
62        
63        # If not found or in local mode, try loading from local file
64        if not results_dict and os.path.exists("storage/key_value_stores/search_results.json"):
65            Actor.log.info("Loading from local file")
66            with open("storage/key_value_stores/search_results.json", "r") as f:
67                results_dict = json.load(f)
68        
69        if not results_dict:
70            return None
71        
72        # Parse dates
73        results_dict["search_date"] = datetime.fromisoformat(results_dict["search_date"])
74        for i, result in enumerate(results_dict["results"]):
75            if result.get("listed_date"):
76                results_dict["results"][i]["listed_date"] = datetime.fromisoformat(
77                    result["listed_date"]
78                )
79        
80        # Convert back to model
81        previous_results = SearchResults(**results_dict)
82        
83        # Check if search criteria has changed
84        prev_criteria = previous_results.search_criteria
85        if (
86            prev_criteria.location != search_criteria.location
87            or prev_criteria.property_type != search_criteria.property_type
88            or prev_criteria.min_bedrooms != search_criteria.min_bedrooms
89            or prev_criteria.max_bedrooms != search_criteria.max_bedrooms
90            or prev_criteria.min_price != search_criteria.min_price
91            or prev_criteria.max_price != search_criteria.max_price
92            or prev_criteria.search_type != search_criteria.search_type
93            or set(prev_criteria.sources) != set(search_criteria.sources)
94            # Amenities might be in different order but same content
95            or set(prev_criteria.amenities) != set(search_criteria.amenities)
96        ):
97            # Criteria changed, don't use previous results
98            return None
99        
100        return previous_results
101    
102    except Exception as e:
103        Actor.log.error(f"Error loading previous results: {e}")
104        return None
105
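The criteria-change check above compares `sources` and `amenities` as sets, so ordering differences between runs do not invalidate the cached results. A compact sketch of that idea (the helper name is hypothetical):

```python
def amenities_changed(prev, curr):
    """Order-insensitive comparison, as load_previous_results performs."""
    return set(prev) != set(curr)
```

Note that set comparison also ignores duplicates, which is acceptable here since amenity lists are treated as unordered tags.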
106
107def mark_new_listings(
108    current_results: List[PropertyListing],
109    previous_results: Optional[SearchResults]
110) -> List[PropertyListing]:
111    """Mark new listings in current results compared to previous results.
112    
113    Args:
114        current_results: Current property listings
115        previous_results: Previous search results, or None if no previous results
116        
117    Returns:
118        Updated current property listings with is_new flag set
119    """
120    if not previous_results:
121        # If no previous results, all are new
122        for listing in current_results:
123            listing.is_new = True
124        return current_results
125    
126    # Get IDs of previous listings
127    previous_ids = {listing.id for listing in previous_results.results}
128    
129    # Mark new listings
130    for listing in current_results:
131        if listing.id not in previous_ids:
132            listing.is_new = True
133    
134    return current_results
135
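`mark_new_listings` reduces to a set-membership test on the previous run's IDs. Stripped of the Pydantic models, the core logic is (names illustrative):

```python
def new_ids(current_ids, previous_ids):
    """IDs in the current run that were absent from the previous one.

    previous_ids=None means there was no earlier run, so everything is new.
    """
    if previous_ids is None:
        return list(current_ids)
    prev = set(previous_ids)  # O(1) membership checks
    return [i for i in current_ids if i not in prev]
```

Building the set once keeps the pass over current listings linear instead of quadratic.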
136
137def push_results_to_dataset(results: SearchResults) -> None:
138    """Push search results to Apify dataset.
139    
140    Args:
141        results: Search results to push
142    """
143    # Convert to simple dicts for the dataset
144    listings_data = []
145    for listing in results.results:
146        listing_dict = listing.model_dump()
147        # Convert complex types to strings for better compatibility
148        listing_dict["address"] = str(listing.address)
149        if listing.listed_date:
150            listing_dict["listed_date"] = listing.listed_date.isoformat()
151        listings_data.append(listing_dict)
152    
153    try:
154        # Push each listing as a separate item
155        Actor.push_data(listings_data)
156    except Exception as e:
157        Actor.log.error(f"Error pushing data to dataset: {e}")
158        # In local mode, save to local file
159        try:
160            os.makedirs("storage/datasets/default", exist_ok=True)
161            with open("storage/datasets/default/results.json", "w") as f:
162                json.dump(listings_data, f)
163            Actor.log.info("Saved results to local file")
164        except Exception as e2:
165            Actor.log.error(f"Error saving to local file: {e2}")

src/models/__init__.py

1"""Models for Listing Sleuth."""

src/models/property.py

1"""Property data models for Listing Sleuth."""
2
3from typing import List, Optional, Dict, Any
4from pydantic import BaseModel, Field, HttpUrl
5from datetime import datetime
6
7
8class Address(BaseModel):
9    """Model for property address."""
10    
11    street: Optional[str] = None
12    city: str
13    state: str
14    zip_code: Optional[str] = None
15    country: str = "United States"
16    
17    def __str__(self) -> str:
18        """Return string representation of address."""
19        parts = []
20        if self.street:
21            parts.append(self.street)
22        parts.append(f"{self.city}, {self.state}")
23        if self.zip_code:
24            parts.append(self.zip_code)
25        return ", ".join(parts)
26
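`Address.__str__` builds the display string by joining only the parts that are present. The same joining logic without the Pydantic model, as a sketch:

```python
def format_address(city, state, street=None, zip_code=None):
    """Join present address parts, mirroring Address.__str__."""
    parts = []
    if street:
        parts.append(street)
    parts.append(f"{city}, {state}")
    if zip_code:
        parts.append(zip_code)
    return ", ".join(parts)
```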
27
28class PropertyListing(BaseModel):
29    """Model for property listing data."""
30    
31    id: str
32    title: str
33    description: Optional[str] = None
34    price: float
35    address: Address
36    bedrooms: float
37    bathrooms: Optional[float] = None
38    square_feet: Optional[int] = None
39    property_type: str
40    url: HttpUrl
41    source: str
42    amenities: List[str] = Field(default_factory=list)
43    images: List[HttpUrl] = Field(default_factory=list)
44    listed_date: Optional[datetime] = None
45    is_new: bool = False  # Flag for new listings since last search
46    features: Dict[str, Any] = Field(default_factory=dict)  # Additional property features
47    
48    class Config:
49        """Pydantic config."""
50        
51        extra = "ignore"
52
53
54class SearchCriteria(BaseModel):
55    """Model for search criteria."""
56    
57    location: str
58    property_type: str = "any"
59    min_bedrooms: int = 0
60    max_bedrooms: Optional[int] = None
61    min_price: float = 0
62    max_price: Optional[float] = None
63    amenities: List[str] = Field(default_factory=list)
64    search_type: str = "rent"
65    sources: List[str] = Field(default_factory=lambda: ["zillow", "realtor", "apartments"])
66    llm_api_token: Optional[str] = None
67
68
69class SearchResults(BaseModel):
70    """Model for search results."""
71    
72    search_criteria: SearchCriteria
73    results: List[PropertyListing] = Field(default_factory=list)
74    total_results: int = 0
75    new_results: int = 0
76    search_date: datetime = Field(default_factory=datetime.now)
77    sources_searched: List[str] = Field(default_factory=list)