Zillow Explorer
Try for free

7 days trial then $30.00/month - No credit card required now

jupri/zillow-scraper

Search property data from Zillow.com

Scrapes have different headers with same search parameters

Open

NoUturns opened this issue
4 months ago

I have been scraping different cities and MSAs. I change nothing but the location and run the same scrape. The exports sometimes come back with additional column headers, making the data difficult to parse further.

What can we do about this? Is there a way to create a template for the headers I want that doesn't involve me going in and selecting the columns (headers) every time? Why would this be adding (or subtracting) columns when I don't change anything in the scrape parameters?

Please help!

Thank you!

NoUturns

4 months ago

Please see the attached exported scrapes, run within 10 minutes of each other. Notice that one has the "groupType" header and the other doesn't. Why is this?

NoUturns

4 months ago

It appears that "groupType" is a "for rent" thing, as the property is an apartment complex and has a for-rent listing. Why is this? Can we get rid of this option? I am running "sold" scrapes right now and don't want for-rent listings. Why did that slip in there?

cmpusa

2 months ago

I am experiencing the same issue. It prevents standard import mapping to a database, because the headers can change at random.

cat (Jupri)

2 months ago

Hello, sorry for the inconvenience. Please add dev_no_strip=1 to your input. Example: { "location": "New York", "limit": 10, "dev_no_strip": 1 }

Explanation: By default, this actor removes empty values (NULL, FALSE, empty arrays/objects, and empty strings) from the results to save space and memory. As a side effect, the number of columns can be inconsistent from one run to another. dev_no_strip disables this behavior and keeps empty values in the dataset, so the number of columns stays consistent from one run to the next. The dev_no_strip flag is a hidden parameter and there is no UI for it (yet). I will update the actor UI to include it soon.

The "stripping" process is done before the results are sent to Dataset Storage.

I hope this makes sense. :)
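
To illustrate the explanation above, here is a minimal Python sketch of how stripping empty values can change the set of columns between runs. It is not the actor's actual code: the helper names and the sample records (including the "zpid" and "price" fields) are made up for illustration; only "groupType" comes from this thread.

```python
def is_empty(value):
    # Mirrors the list in the explanation: NULL, FALSE,
    # empty array/object, and empty string. Note that 0 is kept.
    return value is None or value is False or value in ("", [], {})


def strip_empty(record):
    # Drop every key whose value counts as empty.
    return {key: value for key, value in record.items() if not is_empty(value)}


def columns(records):
    # Collect column names in first-seen order, as a CSV export might.
    seen = []
    for record in records:
        for key in record:
            if key not in seen:
                seen.append(key)
    return seen


# Hypothetical results from two runs with identical parameters.
run_a = [{"zpid": 1, "price": 500000, "groupType": None}]
run_b = [{"zpid": 2, "price": 1200, "groupType": "APARTMENT_COMPLEX"}]

stripped_a = columns([strip_empty(r) for r in run_a])
stripped_b = columns([strip_empty(r) for r in run_b])

print(stripped_a)  # "groupType" is missing because its value was empty
print(stripped_b)  # "groupType" is present because it had a value
print(columns(run_a) == columns(run_b))  # unstripped runs share one header set
```

With stripping, a column only shows up when at least one record in that run has a non-empty value for it, which matches the "groupType" behavior reported earlier in the thread.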

cat (Jupri)

2 months ago

Another way is to "re-shape" the dataset using dev_transform_enable and dev_transform_fields.
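
The input format for dev_transform_fields is not shown in this thread, so the sketch below does not use it. Instead it shows the same idea done client-side after export: force every record into a fixed header template before importing. The field list and the to_csv helper are hypothetical; replace them with the columns your import expects.

```python
import csv
import io

# Hypothetical header template; use the columns your import mapping needs.
FIELDS = ["zpid", "price", "address"]


def to_csv(records, fields=FIELDS):
    # Write records with exactly the given columns, in the given order.
    # Missing keys become empty cells; keys not in the template are dropped.
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fields,
        restval="",
        extrasaction="ignore",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# One record has an extra column and lacks "address".
sample = [{"zpid": 1, "price": 5, "groupType": "x"}]
print(to_csv(sample))
```

Because the header comes from the template rather than from the data, the output has the same columns on every run, regardless of which fields the scrape happened to return.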

cmpusa

2 months ago

@cat(Jupri) thanks for the suggestions. I'm definitely going to try the custom fields. It would save a little time shaping the output file.

cmpusa

2 months ago

Update: the Custom Field option worked perfectly for me. I now have a standardized output file to import into our internal systems. Thank you.

Developer
Maintained by Community
Actor metrics
  • 26 monthly users
  • 97.5% runs succeeded
  • 14.3 days response time
  • Created in Jul 2022
  • Modified 3 months ago