Albertsons Product Scraper
Pricing
from $9.00 / 1,000 results
Developer: GetDataForMe
Albertsons Crawler for Foodgraph
A professional-grade web scraper built with Crawlee and Playwright for extracting product data from Albertsons.com. This scraper is designed to meet Foodgraph's specific requirements for grocery product data collection.
Features
- Full Category Coverage: Scrapes all specified product categories with inclusion/exclusion rules
- Browser-based Scraping: Uses real browser automation for reliable data extraction
- API Interception: Captures and reuses Albertsons API calls for efficient data collection
- Session Management: Automatic session refresh and token management
- GTIN/UPC Validation: Ensures all products have valid GTIN/UPC codes
- Structured Output: Produces data in Foodgraph's required format with rid, sourcePdpUrl, and product fields
- Proxy Support: Compatible with Bright Data, Apify Proxy, and custom proxy solutions
- Health Monitoring: Built-in health checker for daily validation
- Error Handling: Robust retry logic and exponential backoff
Quick Start
- Installation
  $ npm install
- Basic Usage
  $ npm start
- Development Mode
  $ npm run start:dev
Configuration
Default Categories (Foodgraph Test Project)
The scraper is pre-configured with the exact categories specified in the Foodgraph RFP:
Include All Categories:
- Beverages
- Breakfast & Cereal
- Canned Goods & Soups
- Condiments, Spice & Bake
- Cookies, Snacks & Candy
- Dairy, Eggs & Cheese
- Frozen Foods
- Fruits & Vegetables
- Grains, Pasta & Sides
- International Cuisine
- Meat & Seafood
Include Specific Subcategories Only:
- Baby Care → Formula & Baby Food only
- Wine, Beer & Spirits → Non-Alcoholic Beer and Cocktail Mixes only
Exclude Specific Subcategories:
- Bread & Bakery → Exclude Bakery Beverages & Snacks, Bakery Catering Trays
- Deli → Exclude Deli Bar & Food Service, Deli Sandwiches and Wraps, Sushi
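The inclusion and exclusion rules above can be expressed as a small lookup table plus a predicate. The following is a minimal sketch under stated assumptions, not the contents of src/categories.ts: the CategoryRule type, the RULES table, and the shouldScrape function are hypothetical names, only a few of the listed categories are filled in, and names are matched by exact string.

```typescript
// Hypothetical rule shape; the real src/categories.ts may be organized differently.
type CategoryRule =
  | { mode: "includeAll" }
  | { mode: "includeOnly"; subcategories: string[] }
  | { mode: "excludeOnly"; subcategories: string[] };

// A subset of the rules listed above, keyed by top-level category name.
const RULES: Record<string, CategoryRule> = {
  "Beverages": { mode: "includeAll" },
  "Frozen Foods": { mode: "includeAll" },
  "Baby Care": { mode: "includeOnly", subcategories: ["Formula & Baby Food"] },
  "Wine, Beer & Spirits": {
    mode: "includeOnly",
    subcategories: ["Non-Alcoholic Beer", "Cocktail Mixes"],
  },
  "Bread & Bakery": {
    mode: "excludeOnly",
    subcategories: ["Bakery Beverages & Snacks", "Bakery Catering Trays"],
  },
  "Deli": {
    mode: "excludeOnly",
    subcategories: ["Deli Bar & Food Service", "Deli Sandwiches and Wraps", "Sushi"],
  },
};

// Returns true when a (category, subcategory) pair should be scraped.
// Categories without a rule are skipped (an assumption of this sketch).
function shouldScrape(category: string, subcategory: string): boolean {
  const rule = RULES[category];
  if (rule === undefined) return false;
  switch (rule.mode) {
    case "includeAll":
      return true;
    case "includeOnly":
      return rule.subcategories.includes(subcategory);
    case "excludeOnly":
      return !rule.subcategories.includes(subcategory);
  }
}
```

With this table, shouldScrape("Deli", "Sushi") is false, while any other Deli subcategory passes. Keeping the three modes in one discriminated union lets the compiler confirm that every mode is handled.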
Input Parameters
{
  "startUrls": ["https://www.albertsons.com/shop/aisles/beverages.html"],
  "storeIds": [177, 154, 1680],
  "maxRequestsPerCrawl": 1000,
  "headless": true
}
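For callers who build this input programmatically, a typed shape and a few sanity checks can catch mistakes before a run starts. The sketch below takes its field names from the example input; the ScraperInput interface, the validateInput function, and every validation rule in it are assumptions made for illustration, not the actor's actual input schema.

```typescript
// Field names come from the example input above; the rules below are assumed.
interface ScraperInput {
  startUrls: string[];
  storeIds: number[];
  maxRequestsPerCrawl: number;
  headless: boolean;
}

// Returns a list of human-readable problems; an empty list means the input looks sane.
function validateInput(input: ScraperInput): string[] {
  const problems: string[] = [];
  if (input.startUrls.length === 0) {
    problems.push("startUrls must not be empty");
  }
  for (const url of input.startUrls) {
    // Assumption: only albertsons.com pages are valid start points.
    if (!url.startsWith("https://www.albertsons.com/")) {
      problems.push(`unexpected start URL: ${url}`);
    }
  }
  for (const id of input.storeIds) {
    if (!Number.isInteger(id) || id <= 0) {
      problems.push(`invalid store ID: ${id}`);
    }
  }
  if (!Number.isInteger(input.maxRequestsPerCrawl) || input.maxRequestsPerCrawl <= 0) {
    problems.push("maxRequestsPerCrawl must be a positive integer");
  }
  return problems;
}
```

Returning all problems at once, rather than throwing on the first, makes it easier to fix a bad input in a single pass.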
Proxy Configuration
Bright Data (Recommended):
{
  "proxyConfiguration": {
    "proxyUrls": [
      "wss://brd-customer-<customer_id>-zone-<zone>-country-us:<password>@brd.superproxy.io:9222"
    ]
  }
}
Apify Proxy:
{"proxyConfiguration": {"useApifyProxy": true}}
Output Format
The scraper produces data in the exact format required by Foodgraph:
{
  "rid": "550e8400-e29b-41d4-a716-446655440000",
  "sourcePdpUrl": "https://www.albertsons.com/product-detail/...",
  "product": {
    "fullCategoryTaxonomy": ["Beverages", "Water & Sparkling Water"],
    "id": "123456",
    "name": "Product Name",
    "upc": "123456789012",
    "brand": "Brand Name",
    "ingredients": "...",
    "nutrition": {...},
    "images": ["https://..."]
  }
}
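Consumers of the dataset may want to check each item against this shape before processing it. The following is a minimal sketch of a TypeScript type guard; the OutputRecord and Product types and the isOutputRecord function are hypothetical names, and the guard checks only rid, sourcePdpUrl, product, and fullCategoryTaxonomy, leaving the remaining product fields untyped because their raw structure is not specified here.

```typescript
// Only fullCategoryTaxonomy is typed; other product fields are left as raw values.
interface Product {
  fullCategoryTaxonomy: string[];
  [key: string]: unknown;
}

interface OutputRecord {
  rid: string;
  sourcePdpUrl: string;
  product: Product;
}

// Type guard: true when the value carries the fields shown in the example record.
function isOutputRecord(value: unknown): value is OutputRecord {
  if (typeof value !== "object" || value === null) return false;
  const rec = value as Record<string, unknown>;
  if (typeof rec["rid"] !== "string") return false;
  if (typeof rec["sourcePdpUrl"] !== "string") return false;
  const product = rec["product"];
  if (typeof product !== "object" || product === null) return false;
  const taxonomy = (product as Record<string, unknown>)["fullCategoryTaxonomy"];
  return Array.isArray(taxonomy) && taxonomy.every((t) => typeof t === "string");
}
```

A typical use is to filter parsed dataset items with items.filter(isOutputRecord), which also narrows the array's element type for the code that follows.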
Key Requirements Compliance
✅ Technology Stack
- JavaScript: ✓ Built with Node.js and TypeScript
- Playwright: ✓ Browser automation with Firefox support
- Crawlee: ✓ Latest version 3.x framework
✅ Scraping Approach
- API First: ✓ Intercepts and uses Albertsons internal APIs
- Browser Fallback: ✓ Uses browser automation when needed
- Session Management: ✓ Handles token refresh and session expiry
✅ Data Requirements
- Raw Data: ✓ No transformations, preserves original structure
- Required Fields: ✓ Includes rid, sourcePdpUrl, product, fullCategoryTaxonomy
- GTIN/UPC: ✓ Validates presence of product identifiers
- No Deduplication: ✓ Captures all product instances
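Beyond checking that an identifier is present, a stricter check is possible with the standard GS1 mod-10 check digit, which applies to GTIN-8, GTIN-12 (UPC-A), GTIN-13, and GTIN-14 codes. The sketch below is an illustration only: it is not known whether the scraper verifies check digits, isValidGtin is a hypothetical name, and the code assumes the identifier arrives as a full-length digit string including its check digit.

```typescript
// Validates a GTIN-8/12/13/14 string using the GS1 mod-10 check digit.
function isValidGtin(code: string): boolean {
  // Only the four standard GTIN lengths are accepted.
  if (!/^(\d{8}|\d{12}|\d{13}|\d{14})$/.test(code)) return false;

  const digits = code.split("").map(Number);
  const checkDigit = digits[digits.length - 1];

  // Walk the payload right to left; the digit next to the check digit
  // has weight 3, and weights then alternate 1, 3, 1, ...
  let sum = 0;
  let weight = 3;
  for (let i = digits.length - 2; i >= 0; i--) {
    sum += digits[i] * weight;
    weight = 4 - weight;
  }

  // The check digit brings the weighted sum up to a multiple of 10.
  return (10 - (sum % 10)) % 10 === checkDigit;
}
```

For example, isValidGtin("036000291452") returns true, and changing the last digit to 3 makes it return false. Because the raw data is preserved without transformations, a check like this would be better used to flag records than to alter them.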
✅ Exclusions Implemented
- Reviews and ratings
- Pickup/delivery options
- Price and promotions (captured but not required)
- Related/similar products
- Marketplace sellers
✅ Category Management
- Full inclusion/exclusion rule support
- Configurable category targeting
- Automatic subcategory discovery
Health Monitoring
Run health check manually:
$ npm run healthcheck
The health checker validates:
- Category page navigation
- API connection functionality
- Product data extraction
- GTIN validation
- Proxy connectivity
Development
Project Structure
src/
├── main.ts         # Main entry point
├── routes.ts       # Request routing logic
├── categories.ts   # Category configuration
├── types.ts        # TypeScript definitions
├── utils.ts        # Utility functions
└── healthcheck.ts  # Health monitoring
Adding New Categories
Update src/categories.ts:
export const DEFAULT_CATEGORY_CONFIG = {
  includeAll: ['https://www.albertsons.com/shop/aisles/new-category.html'],
};
Debugging
Enable debug mode:
{
  "debugMode": true,
  "headless": false
}
Production Deployment
Apify Platform
- Upload project to Apify
- Configure input schema
- Set up scheduling (every 4-6 weeks)
- Monitor via health checker
Environment Variables
BRIGHT_DATA_ENDPOINT=wss://brd-customer-...
APIFY_PROXY_PASSWORD=your-password
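One way to turn these variables into the proxyConfiguration input shown earlier is a small helper that picks whichever provider is configured. This is a sketch under stated assumptions: proxyConfigFromEnv and the ProxyInput type are hypothetical names, and preferring Bright Data when both variables are set is a choice made here (following the "Recommended" label), not documented behavior.

```typescript
// Mirrors the two proxyConfiguration shapes shown in the Proxy Configuration section.
type ProxyInput = { proxyUrls: string[] } | { useApifyProxy: true };

// Builds a proxy configuration from environment-style key/value pairs.
// Assumption: Bright Data wins when both variables are set.
function proxyConfigFromEnv(env: Record<string, string | undefined>): ProxyInput {
  const endpoint = env["BRIGHT_DATA_ENDPOINT"];
  if (endpoint !== undefined && endpoint !== "") {
    return { proxyUrls: [endpoint] };
  }
  const apifyPassword = env["APIFY_PROXY_PASSWORD"];
  if (apifyPassword !== undefined && apifyPassword !== "") {
    return { useApifyProxy: true };
  }
  throw new Error("Set BRIGHT_DATA_ENDPOINT or APIFY_PROXY_PASSWORD");
}
```

In a Node.js process this would be called as proxyConfigFromEnv(process.env). Taking the environment as a parameter keeps the helper easy to test and keeps credentials out of source files.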
Performance
- Concurrency: Default 1 (recommended for stability)
- Request Rate: ~2-3 seconds between requests
- Session Lifetime: ~100 requests per session
- Error Recovery: 3 retries with exponential backoff
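The retry behavior listed above can be illustrated with a short helper. This is a minimal sketch, not the scraper's implementation: the retry count of 3 comes from the list, while the 2000 ms base delay, the 30000 ms cap, and the names backoffDelayMs and withRetries are assumptions.

```typescript
// Delay before retry number `attempt` (1-based): base, 2x base, 4x base, ...
// capped at maxMs. The default values are assumed, not taken from the scraper.
function backoffDelayMs(attempt: number, baseMs = 2000, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * 2 ** (attempt - 1));
}

// Runs `task`, retrying up to `maxRetries` times with exponential backoff.
// `sleep` is injectable so the waiting can be replaced in tests.
async function withRetries<T>(
  task: () => Promise<T>,
  maxRetries = 3,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      // Wait only if another attempt will follow.
      if (attempt < maxRetries) {
        await sleep(backoffDelayMs(attempt + 1));
      }
    }
  }
  throw lastError;
}
```

With these defaults a task that keeps failing is attempted four times in total, with waits of 2000, 4000, and 8000 ms between attempts, before the last error is rethrown.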
Troubleshooting
Common Issues
No products found:
- Check store ID validity (try 177, 154, 1680)
- Verify category URLs are accessible
- Check if session tokens are being captured
Session expired errors:
- Automatic session refresh is implemented
- Monitor for rate limiting (429 errors)
- Consider reducing concurrency
Proxy issues:
- Verify Bright Data credentials
- Test connection with health checker
- Check proxy endpoint accessibility
Support
For technical issues:
- Check health checker output
- Review error logs in Actor platform
- Verify category URLs are current
- Test with single category first
License
This scraper is designed for legitimate business use in compliance with website terms of service and applicable laws.