Gumtree Business Contact Scraper
Under maintenance

Pricing

$8.00 / 1,000 leads


Developed by

Țugui Dragoș

Maintained by Community

Scrape business contact details from Gumtree classified ads across all categories. Extract phone numbers, email addresses, company websites, and physical addresses from UK, Australia, and international listings. Perfect for B2B lead generation, sales prospecting, and outreach campaigns.

5.0 (1)

Last modified

4 days ago

Gumtree Multi-Country Company Contact Scraper v10.9

📋 TL;DR

An Apify Actor that scrapes company contact information and listings from Gumtree in four countries: UK, Ireland, South Africa, and Australia. It extracts phone numbers, emails, prices, descriptions, images, and more, using country-specific selectors and proxy support.

Quick Start:

npm install
npm start

🌍 Supported Countries

  • 🇬🇧 United Kingdom - gumtree.com
  • 🇮🇪 Ireland - gumtree.ie
  • 🇿🇦 South Africa - gumtree.co.za
  • 🇦🇺 Australia - gumtree.com.au

✨ Features

  • ✅ Multi-Country Support - Automatically adapts to each country's website structure
  • ✅ Contact Extraction - Reveals and extracts phone numbers and email addresses
  • ✅ Smart Selectors - Uses JSON-LD schema, data-q attributes, and CSS fallbacks
  • ✅ Proxy Support - Built-in Apify residential proxy configuration
  • ✅ Comprehensive Data - Title, price, location, category, images, attributes, and more
  • ✅ Robust Error Handling - Multiple fallback strategies for each field
  • ✅ Structured Output - Consistent dataset schema across all countries


📊 Extracted Data

Each listing includes:

Field             Description
01_url            Full URL of the listing page
02_ad_id          Unique advertisement ID
03_country        Country code (UK, IE, ZA, AU)
04_title          Listing title/headline
05_price          Price or salary information
06_category       Category breadcrumb path
07_location       Geographic location (city, region)
08_date_posted    Date when listing was posted
09_seller_name    Seller or company name
10_attributes     Additional attributes (year, make, model, etc.)
11_image_urls     Array of image URLs
12_description    Full text description
13_phone_number   Contact phone number (if available)
14_email          Contact email address (if found)

🚀 Usage

Running Locally

  1. Install dependencies:

    $ npm install
  2. Create input file .actor/INPUT.json:

    {
      "country": "uk",
      "searchQuery": "laptop",
      "maxItems": 10
    }
  3. Run the scraper:

    $ npm start

Running on Apify Platform

  1. Upload to Apify Console or use Apify CLI:

    $ apify push
  2. Configure input:

    • Country: Select UK, Australia, South Africa, or Ireland
    • Search Query: Enter your search term (e.g., "car", "apartment", "jobs")
    • Max Items: Number of listings to scrape (1-200)
  3. Run the Actor and download results as JSON, CSV, Excel, or HTML


🔧 Configuration

Input Parameters

{
  "country": "uk",         // Options: "uk", "ie", "za", "au"
  "searchQuery": "laptop", // Your search term
  "maxItems": 10           // Maximum listings to scrape (1-200)
}
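
The constraints listed above (four country codes, a search term, and 1-200 items) could be checked before the crawl starts. The following is a minimal sketch, not the Actor's actual validation: `validateInput`, `SUPPORTED_COUNTRIES`, and the default of 10 items are hypothetical names and values chosen for illustration.

```javascript
// Hypothetical helper: checks the input fields documented in this README.
const SUPPORTED_COUNTRIES = ['uk', 'ie', 'za', 'au'];

function validateInput(input) {
    // Assumption: maxItems falls back to 10 when omitted.
    const { country, searchQuery, maxItems = 10 } = input ?? {};

    if (!SUPPORTED_COUNTRIES.includes(country)) {
        throw new Error(`country must be one of: ${SUPPORTED_COUNTRIES.join(', ')}`);
    }
    if (typeof searchQuery !== 'string' || searchQuery.trim() === '') {
        throw new Error('searchQuery must be a non-empty string');
    }
    if (!Number.isInteger(maxItems) || maxItems < 1 || maxItems > 200) {
        throw new Error('maxItems must be an integer between 1 and 200');
    }

    // Return a cleaned copy so later code never sees untrimmed values.
    return { country, searchQuery: searchQuery.trim(), maxItems };
}
```

Failing early with a clear message is usually cheaper than discovering a bad input after browser pages have already been opened.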

Proxy Configuration

The actor uses Apify residential proxies by default. To configure your own:

import { Actor } from 'apify';

// Request residential proxies located in the target country.
const proxyConfiguration = await Actor.createProxyConfiguration({
    groups: ['RESIDENTIAL'],
    countryCode: 'GB', // or 'IE', 'ZA', 'AU'
});

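Or authenticate with your own Apify API token to use residential proxies. Keep the token out of source control, for example by setting it in the APIFY_TOKEN environment variable:

APIFY_TOKEN=apify_api_<YOUR_TOKEN>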

🏗️ Architecture

Country-Specific Handlers

The scraper uses intelligent routing to handle different website structures:

  • UK (detail_uk): Uses CSS class selectors (e.g., css-1utqs9u-header-block)
  • Ireland (detail_ie): Primary JSON-LD schema.org extraction with CSS fallbacks
  • South Africa & Australia (detail): Uses data-q attribute selectors
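
One way to picture this routing is a per-country lookup table. The sketch below is illustrative only: the handler labels, domains, URL patterns, and proxy country codes are taken from this README, while the property names (`domain`, `detailGlob`, `label`, `proxyCountryCode`) and the `getCountryConfig` helper are assumptions, not the contents of main.js.

```javascript
// Illustrative per-country configuration. Values come from this README;
// the property names and the helper are hypothetical.
const COUNTRY_CONFIG = {
    uk: { domain: 'gumtree.com', detailGlob: '/p/**/*', label: 'detail_uk', proxyCountryCode: 'GB' },
    ie: { domain: 'gumtree.ie', detailGlob: '/**/*.html', label: 'detail_ie', proxyCountryCode: 'IE' },
    za: { domain: 'gumtree.co.za', detailGlob: '/a-/**/*', label: 'detail', proxyCountryCode: 'ZA' },
    au: { domain: 'gumtree.com.au', detailGlob: '/s-ad/**/*', label: 'detail', proxyCountryCode: 'AU' },
};

function getCountryConfig(country) {
    const key = String(country).toLowerCase();
    // Use an own-property check so keys such as "toString" are rejected.
    if (!Object.prototype.hasOwnProperty.call(COUNTRY_CONFIG, key)) {
        throw new Error(`Unsupported country: ${country}`);
    }
    return COUNTRY_CONFIG[key];
}
```

With a table like this, the `label` value could be attached to each enqueued detail-page request so the router dispatches it to the matching handler.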

Extraction Strategy

  1. JSON-LD First (Ireland) - Most reliable structured data
  2. data-q Attributes (ZA, AU) - Semantic attribute selectors
  3. CSS Classes (UK) - Specific class-based selectors
  4. Fallback Selectors - Multiple alternatives for each field
  5. Text Pattern Matching - Email regex extraction from descriptions
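
Steps 4 and 5 can be sketched without a browser. The helpers below are assumptions for illustration (`firstNonEmpty`, `extractEmails`, and the regular expression are not taken from the Actor's source): the first returns the first usable value from a list of fallbacks, and the second pulls email addresses out of free text such as a listing description.

```javascript
// Hypothetical fallback helper: accepts plain values or zero-argument
// functions and returns the first non-blank string, trimmed.
function firstNonEmpty(...candidates) {
    for (const candidate of candidates) {
        const value = typeof candidate === 'function' ? candidate() : candidate;
        if (typeof value === 'string' && value.trim() !== '') {
            return value.trim();
        }
    }
    return null;
}

// A deliberately simple email pattern; it will not cover every valid address.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}/g;

function extractEmails(text) {
    const matches = String(text ?? '').match(EMAIL_PATTERN) ?? [];
    // Lower-case and de-duplicate while keeping first-seen order.
    return [...new Set(matches.map((match) => match.toLowerCase()))];
}
```

Passing functions to `firstNonEmpty` lets a more expensive fallback run only when the earlier sources came back empty.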

📁 Project Structure

gumtree-company-contact-scraper/
├── .actor/
│ ├── actor.json # Actor configuration (v10.9)
│ ├── input_schema.json # Input validation schema
│ └── dataset_schema.json # Output dataset schema
├── src/
│ ├── main.js # Entry point with country configs
│ └── routes.js # Request handlers (UK, IE, ZA/AU)
├── package.json # Dependencies and scripts
├── Dockerfile # Container configuration
└── README.md # This file

🔍 How It Works

  1. Start Handler - Processes search results page
  2. Enqueue Links - Finds all listing detail page URLs
  3. Country-Specific Handler - Routes to appropriate extraction logic
  4. Data Extraction:
    • Text fields (title, price, location, etc.)
    • Interactive elements (phone number reveal)
    • Image galleries
    • Structured attributes
    • Contact information
  5. Data Validation - Ensures consistent output format
  6. Dataset Push - Saves to Apify dataset
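
The validation step (5) might look roughly like the sketch below, which maps loosely structured scrape results onto the numbered field names from the Extracted Data section. The `toDatasetItem` function, the `FIELD_ORDER` list, and the choice of `null` and `[]` as defaults for missing values are assumptions, not the Actor's confirmed behaviour.

```javascript
// Field names in output order; the numeric prefix is derived from the index.
const FIELD_ORDER = [
    'url', 'ad_id', 'country', 'title', 'price', 'category', 'location',
    'date_posted', 'seller_name', 'attributes', 'image_urls', 'description',
    'phone_number', 'email',
];

// Hypothetical normaliser: every record gets all 14 keys, in the same order.
function toDatasetItem(raw) {
    const item = {};
    FIELD_ORDER.forEach((name, index) => {
        const key = `${String(index + 1).padStart(2, '0')}_${name}`;
        const value = raw?.[name];
        if (name === 'image_urls') {
            // Assumption: a missing gallery is stored as an empty array.
            item[key] = Array.isArray(value) ? value : [];
        } else {
            // Assumption: missing or blank values are stored as null.
            item[key] = value === undefined || value === '' ? null : value;
        }
    });
    return item;
}
```

Emitting every key for every listing keeps CSV and Excel exports aligned even when a country's pages lack some fields.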

🛠️ Development

Running Tests

Test different countries and search queries:

# Test UK
echo '{"country":"uk","searchQuery":"laptop","maxItems":5}' > .actor/INPUT.json
npm start
# Test Ireland
echo '{"country":"ie","searchQuery":"jobs","maxItems":5}' > .actor/INPUT.json
npm start
# Test South Africa
echo '{"country":"za","searchQuery":"furniture","maxItems":5}' > .actor/INPUT.json
npm start
# Test Australia
echo '{"country":"au","searchQuery":"bicycle","maxItems":5}' > .actor/INPUT.json
npm start

Debugging

Enable debug logs in main.js:

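import { PlaywrightCrawler, log } from 'crawlee';

// Print debug-level messages from Crawlee.
log.setLevel(log.LEVELS.DEBUG);

// Assumes proxyConfiguration, router, and maxItems are defined earlier in main.js.
const crawler = new PlaywrightCrawler({
    proxyConfiguration,
    requestHandler: router,
    maxRequestsPerCrawl: maxItems + 20,
    launchContext: {
        launchOptions: {
            args: ['--disable-gpu'],
            headless: false, // Show the browser window while debugging
        },
    },
    // Prefix this crawler's log lines so they are easy to filter:
    log: log.child({ prefix: 'PlaywrightCrawler' }),
});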

🌐 Country-Specific Notes

🇬🇧 United Kingdom

  • URL Pattern: /p/**/*
  • Selectors: CSS class-based (auto-generated classes)
  • Phone Reveal: Anchor tag with /reveal/number/ endpoint

🇮🇪 Ireland

  • URL Pattern: /**/*.html
  • Selectors: JSON-LD schema.org (most reliable)
  • Special Features: Rich JobPosting and BreadcrumbList schemas
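
As a rough illustration of JSON-LD extraction, the sketch below collects schema.org nodes from a page's HTML string and looks one up by `@type`. It is an assumption-laden stand-in: `parseJsonLd` and `findByType` are hypothetical names, and a regular expression over raw HTML is used here only to keep the example free of browser dependencies. Inside a Playwright handler the script tags would more likely be read from the DOM.

```javascript
// Hypothetical helper: returns every JSON-LD node found in an HTML string.
function parseJsonLd(html) {
    const pattern = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
    const nodes = [];
    for (const match of String(html).matchAll(pattern)) {
        try {
            const parsed = JSON.parse(match[1]);
            // A block may hold a single object or an array of objects.
            nodes.push(...(Array.isArray(parsed) ? parsed : [parsed]));
        } catch (error) {
            // Skip blocks that are not valid JSON instead of failing the page.
        }
    }
    return nodes;
}

// "@type" may be a string or an array of strings, so normalise before matching.
function findByType(nodes, type) {
    return nodes.find((node) => node && [].concat(node['@type'] ?? []).includes(type)) ?? null;
}
```

A handler could then try `findByType(nodes, 'JobPosting')` first and fall back to CSS selectors when it returns `null`.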

🇿🇦 South Africa

  • URL Pattern: /a-/**/*
  • Selectors: data-q attributes
  • Phone Reveal: Button with data-q="reveal-phone-number"

🇦🇺 Australia

  • URL Pattern: /s-ad/**/*
  • Selectors: data-q attributes (similar to ZA)
  • Phone Reveal: Button with data-q="reveal-phone-number"

⚠️ Anti-Scraping Considerations

Gumtree implements several anti-bot measures:

  • Cloudflare Protection - Requires browser automation (✅ handled by Playwright)
  • reCAPTCHA - May trigger on suspicious patterns (✅ mitigated by proxies)
  • Rate Limiting - IP-based throttling (✅ use proxy rotation)
  • JavaScript Rendering - Heavy client-side rendering (✅ Playwright handles)
  • Phone Number Protection - Requires click interaction (✅ implemented)

Best Practices:

  • Use residential proxies (already configured)
  • Respect rate limits (lower maxConcurrency or set maxRequestsPerMinute; maxRequestsPerCrawl only caps the total number of requests)
  • Add random delays if needed
  • Monitor for CAPTCHA challenges
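
If random delays turn out to be necessary, a small helper along these lines could be awaited inside a request handler. This is a sketch under assumptions: `randomDelayMs` and `sleep` are hypothetical names, and the 1-3 second range in the usage comment is an arbitrary example rather than a recommended setting.

```javascript
// Returns a whole number of milliseconds between minMs and maxMs (inclusive
// when both bounds are integers).
function randomDelayMs(minMs, maxMs) {
    if (!(minMs >= 0) || !(maxMs >= minMs)) {
        throw new RangeError('expected 0 <= minMs <= maxMs');
    }
    return Math.floor(minMs + Math.random() * (maxMs - minMs + 1));
}

// Promise-based pause that can be awaited between page actions.
function sleep(ms) {
    return new Promise((resolve) => setTimeout(resolve, ms));
}

// Usage inside an async handler (example range only):
//   await sleep(randomDelayMs(1000, 3000));
```

Varying the pause makes the request timing less regular than a fixed delay would.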

📦 Dependencies

  • apify ^3.4.2 - Apify SDK for actors
  • crawlee ^3.13.8 - Web scraping and crawling library
  • playwright 1.54.1 - Browser automation

📝 Version History

v10.9 (Current)

  • ✅ Multi-country support (UK, IE, ZA, AU)
  • ✅ Country-specific route handlers
  • ✅ JSON-LD extraction for Ireland
  • ✅ Enhanced fallback selectors
  • ✅ Improved contact extraction
  • ✅ Dataset schema validation
  • ✅ Proxy configuration

📄 License

ISC


👤 Author

It's not you it's me


🤝 Contributing

Contributions, issues, and feature requests are welcome!


⭐ Show Your Support

Give a ⭐️ if this project helped you!


Built with ❤️ using Apify + Crawlee + Playwright