# AI Chatbot Integration Guide for Sahibinden.com Scraper

This guide explains how to integrate the scraped car listing data stored in Baserow with an AI chatbot that estimates used car prices.

## Overview

The AI chatbot will:
1. Accept user descriptions of cars (e.g., "2017 Passat 3 parça boya 150bin km", meaning a 2017 Passat with three repainted panels and 150,000 km)
2. Query Baserow for comparable listings
3. Calculate an estimated price range
4. Return the estimate to the user with supporting data

## Data Access

### Direct Baserow API Access

The AI chatbot can read the data directly from Baserow through its REST API:

```python
import requests
import json  # used by the later examples in this guide

def query_baserow(api_token, table_id, filters=None):
    """
    Query a Baserow table with optional filters

    Args:
        api_token (str): Baserow API token
        table_id (str): Baserow table ID
        filters (dict): Optional filters, keyed as '<field>__<filter_type>'

    Returns:
        list: Matching car listings (only the page returned by this request)
    """
    headers = {
        'Authorization': f'Token {api_token}',
        'Content-Type': 'application/json'
    }

    url = f'https://api.baserow.io/api/database/rows/table/{table_id}/'

    # Ask for human-readable field names instead of field_<id> keys
    params = {
        'user_field_names': 'true'
    }

    if filters:
        for key, value in filters.items():
            params[f'filter__{key}'] = value

    response = requests.get(url, headers=headers, params=params, timeout=30)

    if response.status_code == 200:
        return response.json()['results']
    else:
        raise RuntimeError(
            f"Error querying Baserow ({response.status_code}): {response.text}"
        )
```
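
Baserow returns rows in pages, so a single request may not contain every matching listing. The sketch below shows one way to collect all pages. It assumes the response carries `results` and `next` keys and that the `next` URL already includes the original query string. `query_all_rows`, `fetch_json`, and `max_pages` are hypothetical names, and the helper uses only the standard library so it can be tried without extra dependencies.

```python
import json
import urllib.parse
import urllib.request


def fetch_json(url, headers, params=None):
    """Hypothetical helper: GET a URL and return the decoded JSON body."""
    if params:
        url = f"{url}?{urllib.parse.urlencode(params)}"
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def query_all_rows(api_token, table_id, filters=None, fetch=fetch_json, max_pages=50):
    """Collect rows from every page, stopping after max_pages as a safety cap."""
    headers = {'Authorization': f'Token {api_token}'}
    url = f'https://api.baserow.io/api/database/rows/table/{table_id}/'
    params = {'user_field_names': 'true', 'size': 200}
    for key, value in (filters or {}).items():
        params[f'filter__{key}'] = value

    rows = []
    for _ in range(max_pages):
        page = fetch(url, headers, params)
        rows.extend(page['results'])
        url = page.get('next')
        if not url:
            break
        params = None  # assumption: the `next` URL already carries the query string
    return rows
```

Passing `fetch` as a parameter keeps the paging loop testable with a stub; in production the default `fetch_json` (or a `requests`-based equivalent) would be used.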

### Example Usage

```python
# Find similar cars based on make, model, and year
similar_cars = query_baserow(
    api_token='YOUR_API_TOKEN',
    table_id='YOUR_TABLE_ID',
    filters={
        'make__contains': 'Volkswagen',
        'model__contains': 'Passat',
        'year__contains': '2017'
    }
)

# Process results
for car in similar_cars:
    print(f"Title: {car['title']}")
    print(f"Price: {car['price']} {car['price_currency']}")
    print(f"Mileage: {car['mileage']}")
    print(f"Location: {car['location']}")
    print("---")
```

## Price Estimation Algorithm

The following simple algorithm estimates a price by averaging comparable listings and then adjusting for mileage and damage:

```python
import re

def estimate_price(user_description, api_token, table_id):
    """
    Estimate car price based on user description

    Args:
        user_description (str): User's description of the car
        api_token (str): Baserow API token
        table_id (str): Baserow table ID

    Returns:
        dict: Price estimation and supporting data
    """
    # Extract key information from user description
    extracted_info = extract_car_info(user_description)

    # Query similar cars
    filters = {}
    if 'make' in extracted_info:
        filters['make__contains'] = extracted_info['make']
    if 'model' in extracted_info:
        filters['model__contains'] = extracted_info['model']
    if 'year' in extracted_info:
        filters['year__contains'] = extracted_info['year']

    similar_cars = query_baserow(api_token, table_id, filters)

    if not similar_cars:
        return {
            'status': 'no_matches',
            'message': 'No similar cars found in our database.'
        }

    # Calculate price statistics. Prices may arrive as strings, so normalize
    # each one to an int first (assumes whole-currency prices such as
    # "1.250.000" or 1250000).
    prices = [extract_numeric(car.get('price')) for car in similar_cars]
    prices = [p for p in prices if p]

    if not prices:
        return {
            'status': 'no_prices',
            'message': 'Found similar cars but no valid price data.'
        }

    avg_price = sum(prices) / len(prices)
    min_price = min(prices)
    max_price = max(prices)

    # Adjust based on mileage if available
    if 'mileage' in extracted_info:
        user_mileage = extract_numeric(extracted_info['mileage'])
        if user_mileage:
            # Calculate average mileage
            mileages = [extract_numeric(car.get('mileage')) for car in similar_cars]
            mileages = [m for m in mileages if m]  # Filter out None values

            if mileages:
                avg_mileage = sum(mileages) / len(mileages)
                # Each 1% of mileage above (below) the average lowers
                # (raises) the price by 0.1%
                if avg_mileage > 0:
                    mileage_factor = 1 - ((user_mileage - avg_mileage) / avg_mileage) * 0.1
                    avg_price = avg_price * mileage_factor

    # Adjust based on damage status if available
    if extracted_info.get('damage'):
        # Reduce price for damaged cars
        avg_price = avg_price * 0.85

    return {
        'status': 'success',
        'estimated_price': round(avg_price),
        'price_range': {
            'min': round(min_price),
            'max': round(max_price)
        },
        'similar_cars_count': len(similar_cars),
        'currency': similar_cars[0].get('price_currency') or 'TL',
        'similar_cars': similar_cars[:5]  # Return the first 5 similar cars
    }

def extract_car_info(description):
    """
    Extract car information from user description
    This is a simplified example - in a real implementation,
    you would use NLP techniques or a language model
    """
    info = {}
    text = description.lower()

    # Known models per make; add more makes and models as needed
    models_by_make = {
        'Volkswagen': ['Passat', 'Golf', 'Polo', 'Tiguan', 'Jetta'],
        'BMW': ['320', '520', 'X5', 'X3', 'M3', 'M5'],
    }

    # Extract make and model (the make must appear in the description)
    common_makes = ['Volkswagen', 'BMW', 'Mercedes', 'Audi', 'Toyota', 'Honda', 'Ford', 'Renault']
    for make in common_makes:
        if make.lower() in text:
            info['make'] = make
            # Look for common models for this make (none if the make has no list yet)
            for model in models_by_make.get(make, []):
                if model.lower() in text:
                    info['model'] = model
                    break
            break

    # Extract year (4-digit number from 1990 to 2029)
    year_matches = re.findall(r'\b(19[9][0-9]|20[0-2][0-9])\b', description)
    if year_matches:
        info['year'] = year_matches[0]

    # Extract mileage (number followed by km, bin km, or similar);
    # thousands separators such as "150.000 km" are accepted
    mileage_matches = re.findall(r'(\d+(?:[.,]\d{3})*)(?:\s*(?:bin|k|b|000))?(?:\s*km)', text)
    if mileage_matches:
        mileage = int(re.sub(r'[.,]', '', mileage_matches[0]))
        # If the number is small, assume it's in thousands
        if mileage < 1000:
            mileage *= 1000
        info['mileage'] = str(mileage)

    # Extract damage information (Turkish keywords: damage, paint,
    # replaced part, damage record)
    damage_keywords = ['hasar', 'boya', 'değişen', 'tramer']
    for keyword in damage_keywords:
        if keyword in text:
            info['damage'] = True
            break

    return info

def extract_numeric(text):
    """Extract numeric value from text (all digit groups are concatenated)"""
    if not text:
        return None

    numbers = re.findall(r'\d+', str(text))
    if numbers:
        return int(''.join(numbers))
    return None
```
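
To make the mileage adjustment concrete, the sketch below isolates the formula used in `estimate_price` and works through one case. The function name `mileage_factor` and the `sensitivity` and `floor` parameters are illustrative assumptions. The floor is not part of the algorithm above; it is added here because a very high user mileage would otherwise push the factor to zero or below.

```python
def mileage_factor(user_mileage, avg_mileage, sensitivity=0.1, floor=0.5):
    """Price multiplier derived from the relative mileage difference."""
    if avg_mileage <= 0:
        return 1.0  # no usable average, so leave the price unchanged
    relative_diff = (user_mileage - avg_mileage) / avg_mileage
    # The floor (an assumption of this sketch) keeps the multiplier positive
    return max(floor, 1 - relative_diff * sensitivity)


# A car with 150,000 km against a 120,000 km average is 25% above average,
# so the multiplier is 1 - 0.25 * 0.1 = 0.975.
adjusted = round(1_000_000 * mileage_factor(150_000, 120_000))
print(adjusted)  # 975000
```

With an average listing price of 1,000,000 TL, the adjusted estimate is 975,000 TL. A car with below-average mileage yields a factor above 1 and raises the estimate.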

## Integration with Popular AI Frameworks

### OpenAI Integration

```python
import json
from openai import OpenAI

# Requires openai>=1.0; the key is read from the OPENAI_API_KEY environment variable
client = OpenAI()

def get_price_estimate_openai(user_query, api_token, table_id):
    # First extract structured data from the user query
    extraction_prompt = f"""
    Extract car information from the following user query:
    "{user_query}"

    Return only a JSON object with the following fields if present:
    - make: Car manufacturer
    - model: Car model
    - year: Manufacturing year
    - mileage: Mileage in km
    - damage: Boolean indicating if damage is mentioned
    """

    extraction_response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a car information extraction assistant."},
            {"role": "user", "content": extraction_prompt}
        ]
    )

    # The model is not guaranteed to return valid JSON
    try:
        extracted_info = json.loads(extraction_response.choices[0].message.content)
    except json.JSONDecodeError:
        return "I couldn't understand the car description. Please rephrase it."

    # Query Baserow for similar cars (skip fields the model left empty)
    filters = {}
    if extracted_info.get('make'):
        filters['make__contains'] = extracted_info['make']
    if extracted_info.get('model'):
        filters['model__contains'] = extracted_info['model']
    if extracted_info.get('year'):
        filters['year__contains'] = extracted_info['year']

    similar_cars = query_baserow(api_token, table_id, filters)

    # Generate price estimate and response
    if not similar_cars:
        return "I couldn't find any similar cars in our database. Please provide more details about the car."

    # Calculate price statistics and prepare data for the AI.
    # Prices may arrive as strings, so normalize them with extract_numeric.
    prices = [extract_numeric(str(car.get('price') or '')) for car in similar_cars]
    prices = [p for p in prices if p]
    avg_price = sum(prices) / len(prices) if prices else 0
    min_price = min(prices) if prices else 0
    max_price = max(prices) if prices else 0

    # Prepare data for the AI to generate a response
    car_data = {
        "query": user_query,
        "extracted_info": extracted_info,
        "similar_cars_count": len(similar_cars),
        "price_stats": {
            "average": round(avg_price),
            "minimum": round(min_price),
            "maximum": round(max_price),
            "currency": similar_cars[0].get('price_currency') or "TL"
        },
        "example_listings": similar_cars[:3]  # First 3 similar cars
    }

    # Generate a natural language response
    response_prompt = f"""
    Based on the following car data, provide a price estimate and explanation:
    {json.dumps(car_data, indent=2, ensure_ascii=False)}

    Your response should include:
    1. The estimated price range
    2. Factors that influence the price
    3. A brief explanation of how you arrived at this estimate
    4. Any caveats or additional information the user should know
    """

    final_response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a car price estimation assistant."},
            {"role": "user", "content": response_prompt}
        ]
    )

    return final_response.choices[0].message.content
```
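
Chat models sometimes surround the requested JSON with extra prose or formatting, in which case calling `json.loads` on the raw reply fails. A minimal sketch of a more forgiving parser follows; `parse_json_reply` is a hypothetical helper, not part of the OpenAI library, and it simply takes the text between the first `{` and the last `}`.

```python
import json
import re


def parse_json_reply(reply_text):
    """Return the JSON object spanning the first '{' to the last '}', or {} on failure."""
    match = re.search(r'\{.*\}', reply_text, re.DOTALL)
    if not match:
        return {}
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return {}


reply = 'Sure! {"make": "Volkswagen", "model": "Passat", "year": 2017} Hope that helps.'
print(parse_json_reply(reply)["model"])  # Passat
```

Returning an empty dict on failure lets the caller treat an unparseable reply the same way as a reply with no recognized fields.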

## Handling Edge Cases

### Insufficient Data

When there aren't enough similar listings, fall back to progressively broader searches:

```python
def get_fallback_estimate(extracted_info, api_token, table_id):
    """Get a fallback estimate when exact matches aren't available"""
    # NOTE: calculate_price_stats is not defined in this guide; supply a
    # helper that summarizes the prices of a list of rows.

    # Try with just make
    if 'make' in extracted_info:
        similar_make = query_baserow(
            api_token,
            table_id,
            {'make__contains': extracted_info['make']}
        )

        if similar_make:
            return {
                'status': 'partial_match',
                'message': f"Found {len(similar_make)} cars of the same make, but not the exact model or year.",
                'data': calculate_price_stats(similar_make)
            }

    # Try with just year range (year - 2 to year + 2, inclusive)
    if 'year' in extracted_info:
        year = int(extracted_info['year'])
        year_range = query_baserow(
            api_token,
            table_id,
            {
                # Baserow's higher_than / lower_than filters are strict, so the
                # bounds are widened by one (assumes 'year' is a whole-number field)
                'year__higher_than': str(year - 3),
                'year__lower_than': str(year + 3)
            }
        )

        if year_range:
            return {
                'status': 'year_range_match',
                'message': f"Found {len(year_range)} cars from similar years ({year-2}-{year+2}).",
                'data': calculate_price_stats(year_range)
            }

    return {
        'status': 'no_data',
        'message': "Insufficient data to provide an estimate."
    }
```
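
The fallback function calls `calculate_price_stats`, which this guide does not define. One possible shape for it is sketched below. It assumes each row has a `price` field holding either a number or a string of whole-currency digits such as "1.250.000 TL"; the returned keys are illustrative, not a fixed contract.

```python
import re
import statistics


def calculate_price_stats(cars):
    """Summarize listing prices; returns None when no usable price is found."""
    prices = []
    for car in cars:
        # Keep digits only, so "1.250.000 TL" and 1250000 both become 1250000
        digits = re.sub(r'\D', '', str(car.get('price') or ''))
        if digits:
            prices.append(int(digits))

    if not prices:
        return None

    return {
        'count': len(prices),
        'average': round(statistics.mean(prices)),
        'median': round(statistics.median(prices)),
        'min': min(prices),
        'max': max(prices),
    }
```

Including the median alongside the average gives callers a figure that is less sensitive to a few unusually cheap or expensive listings.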

### Handling Ambiguous Queries

For ambiguous queries, prompt the user for clarification:

```python
def handle_ambiguous_query(user_query, possible_matches):
    """Handle ambiguous car queries by asking for clarification"""
    if len(possible_matches) > 1:
        # Sort the distinct values so the message is stable between calls
        makes = sorted({car['make'] for car in possible_matches if car.get('make')})
        models = sorted({car['model'] for car in possible_matches if car.get('model')})

        clarification_message = "I found multiple possible matches. Could you specify which one you mean?\n\n"

        if len(makes) > 1:
            clarification_message += f"Makes: {', '.join(makes)}\n"

        if len(models) > 1:
            clarification_message += f"Models: {', '.join(models)}\n"

        return {
            'status': 'needs_clarification',
            'message': clarification_message,
            'options': [
                f"{car.get('make', '')} {car.get('model', '')} {car.get('year', '')}".strip()
                for car in possible_matches[:5]
            ]
        }

    # Zero or one match: nothing to clarify
    return None
```

## Performance Optimization

For better performance with large datasets:

1. Create indexes on frequently queried fields in Baserow
2. Cache common queries
3. Implement pagination for large result sets
4. Pre-process and aggregate data for common queries

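As an illustration of caching common queries, the sketch below keeps results in memory for a fixed time. `QueryCache`, `get_or_fetch`, and the five-minute default are assumptions made for this example; the cache takes the fetch function as an argument so it can wrap `query_baserow` or any other query helper. Filter values are assumed to be hashable (strings or numbers).

```python
import time


class QueryCache:
    """Small in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._entries = {}

    def get_or_fetch(self, table_id, filters, fetch):
        # Sort the filter items so the key does not depend on dict ordering
        key = (table_id, tuple(sorted((filters or {}).items())))
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]  # still fresh: reuse the stored rows
        rows = fetch(table_id, filters)
        self._entries[key] = (now, rows)
        return rows
```

A call might look like `cache.get_or_fetch(table_id, filters, lambda t, f: query_baserow(api_token, t, f))`. The time-to-live should be short enough that newly scraped listings show up in estimates reasonably soon.
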
## Conclusion

This integration guide provides the foundation for connecting your AI chatbot to the scraped Sahibinden.com data in Baserow. The actual implementation may vary depending on your specific AI platform and requirements.

For best results:
1. Regularly update the scraped data
2. Fine-tune the price estimation algorithm based on user feedback
3. Expand the car information extraction to handle more complex queries
4. Consider implementing a feedback mechanism to improve estimates over time