Scrape SOLD Poshmark listings into resale price comps: sold price, days-to-sell, condition, and size, plus a summary per search (median, p25, and p75 sold price, comp confidence, and a suggested list price). Built to cope with Poshmark's 403 blocking.
All notable changes to this Actor are documented here. The format follows
Keep a Changelog; versions follow semver.
[0.4.0] — 2026-06-17
Changed
Search-only extraction. Poshmark's search page embeds a grid ($_search.gridData.data) with
full per-listing data (price, inventory.status + status_changed_at, first_published_at,
condition, size, brand), so the Actor now builds every comp from one search fetch and no longer
requests per-listing detail pages. Measured on the Apify platform: $0.25 per 1,000 results (down
from $1.00 in v0.3 and $35.72 with the old browser crawler), 48 comps from a single request,
0 blocks, ~97% core completeness.
parse_detail is retained and refactored to share one record assembler (_assemble_record)
with the new search path.
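As a rough illustration of how one grid entry could become a comp, the sketch below derives days-to-sell from first_published_at and status_changed_at. It is not the Actor's _assemble_record: the function names, the "id" key, the output shape, and the placement of status_changed_at under inventory are assumptions made for the example.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def _parse_ts(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 timestamp; return None when missing or malformed."""
    if not value:
        return None
    try:
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return None
    # Assumption: timestamps without an offset are treated as UTC.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def comp_from_grid_item(item: dict[str, Any]) -> dict[str, Any]:
    """Build a comp-like record from one (hypothetical) grid entry."""
    inventory = item.get("inventory") or {}
    listed = _parse_ts(item.get("first_published_at"))
    # Assumption: status_changed_at sits next to inventory.status.
    sold = _parse_ts(inventory.get("status_changed_at"))

    days_to_sell = None
    if listed and sold and sold >= listed:
        days_to_sell = (sold - listed).days  # whole days, rounded down

    return {
        "itemId": item.get("id"),  # hypothetical key name
        "soldPrice": item.get("price"),
        "status": inventory.get("status"),
        "daysToSell": days_to_sell,
        "condition": item.get("condition"),
        "size": item.get("size"),
        "brand": item.get("brand"),
    }
```

Missing or unparseable timestamps leave daysToSell as None rather than raising, so a single incomplete entry cannot fail the whole page.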
Added
Search pagination. Follows more.next_max_id to page through results, so one query returns
up to maxItems comps (tested to 150 across 4 requests at $0.21/1,000, 0 blocks). Cross-page
itemId de-dup; page count is bounded by maxItems. Bigger samples lift compConfidence (150
comps reaches "high").
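The pagination loop described above can be sketched as follows. This is a minimal illustration, not the Actor's code: fetch_page is a hypothetical callable that returns one parsed search payload, and the "data" and "id" keys are assumptions. Only more.next_max_id and the maxItems bound come from the notes above.

```python
from typing import Any, Callable, Optional

Page = dict[str, Any]


def collect_comps(
    fetch_page: Callable[[Optional[str]], Page],
    max_items: int,
) -> list[dict[str, Any]]:
    """Follow next_max_id cursors until max_items unique listings are collected."""
    seen: set[str] = set()
    comps: list[dict[str, Any]] = []
    cursor: Optional[str] = None  # None requests the first page

    while len(comps) < max_items:
        page = fetch_page(cursor)
        added = 0
        for item in page.get("data", []):
            item_id = item.get("id")
            if item_id is None or item_id in seen:
                continue  # cross-page de-dup on the listing id
            seen.add(item_id)
            comps.append(item)
            added += 1
            if len(comps) >= max_items:
                break

        cursor = (page.get("more") or {}).get("next_max_id")
        # Stop when there is no next page, or when a page added nothing new
        # (guards against a cursor that keeps returning duplicates).
        if not cursor or added == 0:
            break

    return comps
```

Because the loop condition checks max_items before each fetch, the number of requests stays bounded by the requested sample size.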
[0.3.0] — 2026-06-17
Changed
HTTP-first rewrite. Replaced the Playwright browser crawler with Crawlee's
BeautifulSoupCrawler (HTTP plus the impit browser-impersonation client). Every field is in
Poshmark's server-rendered HTML (window.__INITIAL_STATE__), so the browser was pure cost.
Measured on the platform: cost per 1,000 results fell from $35.72 to $1.00-$1.85
(datacenter / residential), run time for 20 items from ~391s to ~84s, still 0 blocks and
100% core completeness.
Default proxy is now Apify datacenter (cheapest, works at modest volume). Residential is the
fallback for heavy or blocked runs.
HTTP-mode block detection: HTTP 403/429/503 plus challenge-page markers, keyed off the
presence of __INITIAL_STATE__ so the word "captcha" in normal script bundles is not a false
positive.
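A minimal sketch of that block check, assuming a hypothetical is_blocked helper and an illustrative marker list (the Actor's real markers live in its config and are not reproduced here). The key idea from the note above is that challenge markers only count when __INITIAL_STATE__ is absent.

```python
BLOCK_STATUS_CODES = frozenset({403, 429, 503})
STATE_MARKER = "__INITIAL_STATE__"
# Illustrative markers only; an assumption, not the Actor's actual list.
CHALLENGE_MARKERS = ("captcha", "access denied", "verify you are human")


def is_blocked(status_code: int, html: str) -> bool:
    """Return True when a response looks like a block or challenge page."""
    if status_code in BLOCK_STATUS_CODES:
        return True
    # A page carrying the server-rendered state is a real page, even if a
    # script bundle happens to contain the word "captcha".
    if STATE_MARKER in html:
        return False
    lowered = html.lower()
    return any(marker in lowered for marker in CHALLENGE_MARKERS)
```

In this sketch a page with neither the state object nor a marker is not counted as a block; a stricter variant could flag it for retry instead.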
Note
Seller stats (includeSellerStats) are not fetched in the HTTP path (the option was experimental and returned null anyway).
[0.2.0] — 2026-06-17
Added
Per-run DATA_QUALITY record + log line (src/quality.py): block rate, items returned vs
requested, fill rate, and per-field core completeness. Reliability becomes measurable per run.
Comps intelligence layer (src/aggregate.py): recency window + pctSoldWithin30/60/90Days,
IQR outlier trimming (outliersRemoved), a transparent compConfidence score, priceByBrand,
medianDaysToSell, and a pricingRecommendation (listAt / fastSaleListAt / topDollarListAt
/ expectedDaysToSell / confidence).
Reproducible /benchmarks: fetch_live.py (re-pull live HTML) + measure_live.py (parse and
score real data). Phase 0 teardown documented in benchmarks/README.md.
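For readers unfamiliar with the IQR outlier trimming mentioned in the comps layer above, here is a minimal sketch using only the standard library. The 1.5 multiplier is the conventional Tukey fence and the function name is hypothetical; the thresholds in src/aggregate.py may differ.

```python
from statistics import median, quantiles


def trim_iqr_outliers(prices: list[float], k: float = 1.5) -> tuple[list[float], int]:
    """Drop prices outside [Q1 - k*IQR, Q3 + k*IQR]; return (kept, removed_count)."""
    if len(prices) < 4:
        # Too few points for stable quartiles: keep everything.
        return list(prices), 0
    q1, _, q3 = quantiles(prices, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [p for p in prices if low <= p <= high]
    return kept, len(prices) - len(kept)


sold_prices = [20, 22, 24, 25, 26, 28, 30, 200]
kept, outliers_removed = trim_iqr_outliers(sold_prices)
print(outliers_removed, median(kept))  # 1 25
```

Trimming before computing the median and quartiles keeps a single mispriced or bundled listing from skewing the suggested list price.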
Changed
Honest sellThroughRate: now null (with a note) in the default sold-only mode instead of a
misleading 1.0. A real rate requires soldOnly: false (a sample of sold and active listings).
Hardened block detection: added PerimeterX / DataDome / Imperva challenge-page markers and
HTTP 403/429/503 awareness on navigations (src/config.py, src/poshmark.py, src/scraper.py).
Store listing rewritten with measured, reproducible benchmark numbers and a 403 FAQ; Store name,
title, description, and categories tuned for search. Version bumped to 0.2.
Fixed (surfaced by the first live platform run)
ConcurrencySettings: Crawlee 1.7.2 rejects desired_concurrency (default 10) > max_concurrency.
Now clamped to min(DESIRED_CONCURRENCY, max_concurrency).
Detail/search navigations hung on the load event (~57 s timeout) on Poshmark. Now navigate with
wait_until="domcontentloaded" (all data is in the SSR HTML) and block_requests() to drop
images/css/fonts (faster + much less residential proxy bandwidth).
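The concurrency fix above amounts to a one-line clamp. The helper below is a hypothetical sketch of it; the real code passes the result on to Crawlee's ConcurrencySettings, which is not imported here so the example stays dependency-free.

```python
DESIRED_CONCURRENCY = 10  # default named in the note above


def clamp_desired_concurrency(
    max_concurrency: int,
    desired: int = DESIRED_CONCURRENCY,
) -> int:
    """Return a desired concurrency that never exceeds max_concurrency."""
    if max_concurrency < 1:
        raise ValueError("max_concurrency must be at least 1")
    return min(desired, max_concurrency)


# With a user-supplied max of 4, the default desired value of 10 is lowered to 4.
print(clamp_desired_concurrency(4))   # 4
print(clamp_desired_concurrency(50))  # 10
```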
Verified
pytest 27 passed; parser scores 100% core completeness on 17 live sold listings across 3
queries; 0 blocks in a 24-request paced datacenter-IP baseline. See /benchmarks.
Deployed and ran live on Apify (RESIDENTIAL proxy): run SUCCEEDED, 0 blocks, 100% core
completeness, DATA_QUALITY + AGGREGATE records written as designed.
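The core completeness figures quoted above are per-field fill ratios. A minimal sketch of that calculation follows; the choice of core fields and the function name are assumptions for illustration, not the contents of src/quality.py.

```python
from typing import Any, Iterable

# Hypothetical set of core fields; the Actor may use a different list.
CORE_FIELDS = ("soldPrice", "daysToSell", "condition", "size")


def core_completeness(
    records: list[dict[str, Any]],
    fields: Iterable[str] = CORE_FIELDS,
) -> dict[str, float]:
    """Return, per field, the share of records with a non-empty value."""
    fields = tuple(fields)
    if not records:
        return {field: 0.0 for field in fields}
    total = len(records)
    return {
        field: sum(1 for r in records if r.get(field) not in (None, "")) / total
        for field in fields
    }


sample = [
    {"soldPrice": 25, "daysToSell": 10, "condition": "nwt", "size": "M"},
    {"soldPrice": 30, "daysToSell": None, "condition": "", "size": "S"},
]
print(core_completeness(sample))
# {'soldPrice': 1.0, 'daysToSell': 0.5, 'condition': 0.5, 'size': 1.0}
```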
[0.1.0] — 2026-06-17
Added
Initial release: Poshmark SOLD listing scraper producing resale price comps.
Fixed, validated dataset record (Pydantic) — one object per listing with soldPrice,
daysToSell, originalRetailPrice, condition/size/color, likes/comments, seller fields,
and image URLs.
Per-query aggregate written to the Key-Value Store (AGGREGATE-<slug>): count, soldCount,
median/mean/p25/p75 sold price, avg days-to-sell, sell-through rate, price by condition/size.