Government Tender Scraper

Pricing

from $20.00 / 1,000 results

A powerful Apify Actor that scrapes government tender listings from multiple official portals across different countries into a single normalized dataset. Built with the Adapter/Plugin pattern for easy extensibility.

Developer

HappiTap

Maintained by Community

🌍 Supported Portals

| Region | Portal | Official URL | Notes |
|---|---|---|---|
| 🇺🇸 USA | SAM.gov | sam.gov | Federal procurement opportunities |
| 🇪🇺 EU | TED | ted.europa.eu | EU-wide tenders |
| 🇬🇧 UK | Find a Tender | gov.uk/find-tender | UK post-Brexit procurement |
| 🇮🇳 India | CPPP (GePNIC) | etenders.gov.in | High volume Indian tenders |
| 🇨🇦 Canada | CanadaBuys | canadabuys.canada.ca | Canadian government procurement |

✨ Features

  • Multi-portal support with unified output schema
  • Smart deduplication across runs
  • Keyword filtering with include/exclude support
  • Date range filtering for published and deadline dates
  • Detail page scraping with full tender information
  • Document/attachment tracking
  • Webhook notifications for new tenders
  • Proxy support with Apify proxy integration
  • Error handling with retry logic and classification
  • Metrics & observability with detailed statistics

🚀 Quick Start

Example 1: US Tenders (Last 7 Days)

{
  "country": "US",
  "keywords": ["IT", "software"],
  "publishedLastNDays": 7,
  "maxItems": 200,
  "includeDetails": true
}

Example 2: UK Construction Tenders

{
  "country": "UK",
  "keywords": ["construction", "road"],
  "deadlineTo": "2024-02-28",
  "maxItems": 100
}

Example 3: India IT Tenders with Value Filter

{
  "country": "IN",
  "keywords": ["software", "development"],
  "publishedLastNDays": 14,
  "minValue": 1000000,
  "maxItems": 500
}

Example 4: EU Cybersecurity Tenders

{
  "country": "EU",
  "keywords": ["cybersecurity", "security"],
  "category": "72000000",
  "maxItems": 300
}

Example 5: Canada Healthcare Tenders

{
  "country": "CA",
  "keywords": ["healthcare", "medical"],
  "region": "Ontario",
  "publishedLastNDays": 30,
  "maxItems": 150
}

📋 Input Parameters

Required

  • country (string) - Country/region to scrape: US, EU, UK, IN, CA

Search Filters

  • keywords (array) - Keywords to search for (OR logic)
  • excludeKeywords (array) - Keywords to exclude from results
  • dateFrom (string) - Published date from (YYYY-MM-DD)
  • dateTo (string) - Published date to (YYYY-MM-DD)
  • publishedLastNDays (number) - Only include tenders published in the last N days (overrides dateFrom/dateTo)
  • deadlineFrom (string) - Deadline date from (YYYY-MM-DD)
  • deadlineTo (string) - Deadline date to (YYYY-MM-DD)
  • buyerName (string) - Filter by buyer/agency name
  • region (string) - Filter by geographic location
  • category (string) - Filter by category (CPV/NAICS codes)
  • minValue (number) - Minimum tender value
  • maxValue (number) - Maximum tender value
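
The filter semantics listed above can be sketched roughly as follows. This is an illustration only, not the actor's code: the helper names resolvePublishedRange and matchesKeywords are hypothetical, and case-insensitive substring matching against title and summary is an assumption about how keywords are applied.

```javascript
// Hypothetical helpers illustrating the documented filter semantics.
const DAY_MS = 24 * 60 * 60 * 1000;

// publishedLastNDays, when set, takes precedence over dateFrom/dateTo.
function resolvePublishedRange(input, now = new Date()) {
  if (Number.isInteger(input.publishedLastNDays) && input.publishedLastNDays > 0) {
    return {
      from: new Date(now.getTime() - input.publishedLastNDays * DAY_MS),
      to: now,
    };
  }
  // Otherwise fall back to the explicit YYYY-MM-DD bounds (interpreted as UTC).
  return {
    from: input.dateFrom ? new Date(`${input.dateFrom}T00:00:00.000Z`) : null,
    to: input.dateTo ? new Date(`${input.dateTo}T23:59:59.999Z`) : null,
  };
}

// keywords use OR logic; a single excludeKeywords hit rejects the tender.
// Assumption: case-insensitive substring match on title and summary.
function matchesKeywords(tender, keywords = [], excludeKeywords = []) {
  const text = `${tender.title ?? ''} ${tender.summary ?? ''}`.toLowerCase();
  const hit = (word) => text.includes(word.toLowerCase());
  if (excludeKeywords.some(hit)) return false;
  return keywords.length === 0 || keywords.some(hit);
}
```

Substring matching is deliberately loose here (a short keyword such as "IT" also matches inside longer words), so combining short keywords with excludeKeywords or a category code is one way to narrow results.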

Crawl Controls

  • maxItems (number, default: 500) - Maximum tenders to scrape
  • maxPages (number, default: 50) - Maximum list pages to scrape
  • includeDetails (boolean, default: true) - Scrape full detail pages
  • downloadAttachments (boolean, default: false) - Download documents (experimental)
  • startUrls (array) - Override with custom start URLs

Runtime Settings

  • useProxy (boolean, default: true) - Use Apify proxy
  • proxyGroups (array, default: ["RESIDENTIAL"]) - Proxy groups to use
  • maxConcurrency (number, default: 10) - Concurrent requests
  • requestRetries (number, default: 5) - Retry attempts
  • minDelayMs (number, default: 1000) - Minimum delay between requests
  • maxDelayMs (number, default: 3000) - Maximum delay between requests
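
As a rough illustration of how minDelayMs and maxDelayMs could bound the pause between requests, the sketch below picks a uniformly random delay in that range. The helper names randomDelayMs and sleep are hypothetical, and uniform jitter is an assumption; the actor may space requests differently.

```javascript
// Hypothetical helper: pick a whole-millisecond delay in [minDelayMs, maxDelayMs].
// The `random` parameter exists only so the function can be tested deterministically.
function randomDelayMs(minDelayMs = 1000, maxDelayMs = 3000, random = Math.random) {
  if (minDelayMs > maxDelayMs) {
    throw new RangeError('minDelayMs must not exceed maxDelayMs');
  }
  return Math.floor(minDelayMs + random() * (maxDelayMs - minDelayMs + 1));
}

// Promise-based pause, usable as `await sleep(randomDelayMs(1000, 3000))`.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}
```

With the defaults, each request would wait between one and three seconds; raising both values is the simplest response to RATE_LIMITED errors.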

Integration

  • webhookUrl (string) - URL for webhook notifications
  • webhookSecret (string) - Secret for webhook authentication
  • notifyOnlyNew (boolean, default: true) - Only notify for new tenders

📊 Output Schema

Each tender is normalized into this unified format:

{
  "sourceCountry": "US",
  "sourcePortal": "sam_gov",
  "sourceId": "12345678",
  "title": "IT Services Contract",
  "tenderUrl": "https://sam.gov/opp/12345678/view",
  "buyerName": "Department of Defense",
  "publishedAt": "2024-01-15T00:00:00.000Z",
  "deadlineAt": "2024-02-15T23:59:59.000Z",
  "scrapedAt": "2024-01-20T10:30:00.000Z",
  "hash": "abc123def456...",
  "summary": "Procurement for IT services...",
  "status": "open",
  "procurementType": "services",
  "categoryCodes": ["541512"],
  "locations": [
    {
      "country": "US",
      "state": "VA",
      "city": "Arlington"
    }
  ],
  "estimatedValue": {
    "amount": 1000000,
    "currency": "USD"
  },
  "documents": [
    {
      "name": "RFP Document",
      "url": "https://...",
      "type": "attachment",
      "sizeBytes": 524288
    }
  ],
  "contact": {
    "name": "John Doe",
    "email": "john.doe@agency.gov",
    "phone": "+1-555-0100"
  },
  "awardedTo": null,
  "raw": { }
}
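
Because every portal is normalized into the same shape, downstream code can treat records uniformly. The sketch below counts tenders that are still open and totals their estimated value per currency. The function name summarizeOpenTenders is hypothetical, and it assumes that deadlineAt and estimatedValue may be missing or null on some records.

```javascript
// Sketch: summarize normalized tender records (shape as in the schema above).
function summarizeOpenTenders(tenders, now = new Date()) {
  const totals = {};
  let open = 0;
  for (const tender of tenders) {
    const deadline = tender.deadlineAt ? new Date(tender.deadlineAt) : null;
    // Skip anything not marked open or whose deadline has already passed.
    if (tender.status !== 'open' || (deadline && deadline < now)) continue;
    open += 1;
    const value = tender.estimatedValue;
    if (value && typeof value.amount === 'number' && value.currency) {
      totals[value.currency] = (totals[value.currency] ?? 0) + value.amount;
    }
  }
  return { open, totals };
}
```

Totals are kept per currency rather than converted, since the schema reports each value in its source currency.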

🏗️ Architecture

Adapter Pattern

The actor uses a plugin-based architecture where each portal has its own adapter:

src/
├── adapters/
│   ├── us/SamGovAdapter.js        # USA (SAM.gov)
│   ├── eu/TedAdapter.js           # EU (TED)
│   ├── uk/FindATenderAdapter.js   # UK
│   ├── in/CpppAdapter.js          # India (CPPP)
│   └── ca/CanadaBuysAdapter.js    # Canada
├── core/
│   ├── BaseAdapter.js             # Base adapter interface
│   ├── AdapterRegistry.js         # Adapter registry
│   ├── TenderCrawler.js           # Main crawler engine
│   ├── DedupeManager.js           # Deduplication logic
│   ├── MetricsCollector.js        # Statistics tracking
│   ├── ErrorHandler.js            # Error classification
│   └── WebhookNotifier.js         # Webhook integration
└── main.js                        # Entry point

Adding New Portals

To add a new portal, create a new adapter extending BaseAdapter:

import { BaseAdapter } from '../../core/BaseAdapter.js';

export class NewPortalAdapter extends BaseAdapter {
  getSourceCountry() {
    return 'XX';
  }

  getSourcePortal() {
    return 'new_portal';
  }

  async buildStartRequests(normalizedQuery) {
    // Build initial search URLs from the normalized query
    // (placeholder: returns no requests until implemented)
    return [];
  }

  async parseListPage(context) {
    // Parse list page and extract tender stubs
    return { items: [], nextRequests: [] };
  }

  async parseDetailPage(context, stub) {
    // Parse detail page and return normalized tender
    // (placeholder: start from the list-page stub, then add parsed detail fields)
    const tender = { ...stub };
    return this.normalize(tender);
  }
}

Then register it in AdapterRegistry.js:

this.register('XX', NewPortalAdapter);
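
For orientation, a registry of this kind can be as small as a map from country code to adapter class. The sketch below is a hypothetical stand-in, not the contents of AdapterRegistry.js: only the register call is taken from the text above, while the create method and the case-insensitive lookup are assumptions.

```javascript
// Hypothetical minimal registry; the real AdapterRegistry.js may differ.
class AdapterRegistry {
  constructor() {
    this.adapters = new Map();
  }

  // Associate a country code (e.g. 'US') with an adapter class.
  register(countryCode, AdapterClass) {
    this.adapters.set(String(countryCode).toUpperCase(), AdapterClass);
    return this;
  }

  // Instantiate the adapter for a country, or fail loudly if none is registered.
  create(countryCode, ...args) {
    const AdapterClass = this.adapters.get(String(countryCode).toUpperCase());
    if (!AdapterClass) {
      throw new Error(`No adapter registered for country "${countryCode}"`);
    }
    return new AdapterClass(...args);
  }
}
```

Keeping the lookup in one place means the crawler never needs to know which portals exist; supporting a new country only touches the adapter file and one register call.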

📈 Metrics & Monitoring

The actor provides detailed metrics:

  • tendersFound - Total tenders discovered
  • tendersSaved - Tenders saved to dataset
  • detailsFetched - Detail pages scraped
  • duplicatesSkipped - Duplicate tenders filtered
  • parsingErrors - Parsing failures
  • networkErrors - Network/timeout errors
  • blockedCount - Blocked requests (403/401)
  • captchaCount - Captcha encounters
  • retryCount - Total retry attempts

Access metrics in the Key-Value Store under the STATS key.
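
Once the STATS record has been read, a few ratios make a run easier to judge at a glance. The sketch below assumes STATS is a flat object of numeric counters using the names listed above; the function name summarizeStats and the derived fields are hypothetical.

```javascript
// Derive simple health indicators from a STATS-like object.
// Assumption: counters are plain numbers keyed by the metric names above.
function summarizeStats(stats) {
  const ratio = (part, whole) => (whole > 0 ? part / whole : 0);
  return {
    saveRate: ratio(stats.tendersSaved, stats.tendersFound),
    duplicateRate: ratio(stats.duplicatesSkipped, stats.tendersFound),
    blockedOrCaptcha: stats.blockedCount + stats.captchaCount,
    errors: stats.parsingErrors + stats.networkErrors,
  };
}
```

A high duplicateRate on a scheduled run is expected, while a rising blockedOrCaptcha count usually points at proxy or delay settings.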

🔧 Error Handling

Errors are classified into types:

  • BLOCKED - 403/401 responses (proxy/IP issues)
  • RATE_LIMITED - 429 responses (too many requests)
  • CAPTCHA - Captcha detection
  • PARSING_ERROR - HTML/JSON parsing failures
  • NETWORK_ERROR - Timeouts, connection errors
  • UNKNOWN - Other errors

Failed requests are logged to the Key-Value Store under ERROR_LOG.
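
The classification above can be approximated with a small function. This is a sketch under assumptions, not the logic in ErrorHandler.js: the function name classifyError, the input shape, the body-text captcha check, and the specific Node.js error codes treated as network errors are all illustrative choices.

```javascript
// Node.js system error codes treated as network problems (illustrative selection).
const NETWORK_ERROR_CODES = ['ETIMEDOUT', 'ECONNRESET', 'ECONNREFUSED', 'ENOTFOUND'];

// Hypothetical classifier mapping a failed request to one of the documented types.
function classifyError({ statusCode, body = '', error } = {}) {
  if (statusCode === 401 || statusCode === 403) return 'BLOCKED';
  if (statusCode === 429) return 'RATE_LIMITED';
  // Assumption: captcha pages can be spotted by a marker in the response body.
  if (/captcha/i.test(body)) return 'CAPTCHA';
  // Assumption: JSON.parse failures surface as SyntaxError.
  if (error instanceof SyntaxError) return 'PARSING_ERROR';
  if (error && NETWORK_ERROR_CODES.includes(error.code)) return 'NETWORK_ERROR';
  return 'UNKNOWN';
}
```

Checking status codes first keeps the cheap, unambiguous cases ahead of the heuristic ones.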

🔔 Webhook Integration

Configure webhooks to receive notifications for new tenders:

{
  "webhookUrl": "https://your-api.com/webhook",
  "webhookSecret": "your-secret-key",
  "notifyOnlyNew": true
}

Webhook payload:

{
  "event": "new_tenders",
  "timestamp": "2024-01-20T10:30:00.000Z",
  "count": 5,
  "tenders": [ ]
}

Each webhook request includes an X-Webhook-Signature header containing an HMAC-SHA256 signature, which the receiver can use to verify the request.
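
On the receiving side, verification might look like the sketch below. It rests on assumptions the text above does not confirm: that the signature is the hex-encoded HMAC-SHA256 of the raw request body, keyed with webhookSecret. The function name verifyWebhookSignature is hypothetical, and CommonJS require is used for brevity (an ES module project would import from 'node:crypto' instead).

```javascript
const { createHmac, timingSafeEqual } = require('node:crypto');

// Assumption: header value = hex(HMAC-SHA256(key = webhookSecret, message = raw body)).
function verifyWebhookSignature(rawBody, signatureHeader, webhookSecret) {
  if (typeof signatureHeader !== 'string') return false;
  const expected = createHmac('sha256', webhookSecret).update(rawBody).digest();
  const received = Buffer.from(signatureHeader, 'hex');
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

The signature should be checked against the raw body bytes before any JSON parsing, because re-serializing the payload can change whitespace or key order and invalidate the comparison.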

🔄 Incremental Crawling

The actor supports incremental crawling with state persistence:

  • Deduplication state is saved between runs
  • Previously seen tenders are automatically skipped
  • Perfect for scheduled runs (daily/weekly)
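
The idea behind cross-run deduplication can be sketched with a set of previously seen keys that is serialized at the end of one run and restored at the start of the next. The class SeenTenders is hypothetical, and keying on sourcePortal plus sourceId is an assumption; DedupeManager.js may instead rely on the hash field or another scheme.

```javascript
// Hypothetical sketch of dedupe state that survives between runs.
class SeenTenders {
  // savedKeys: array restored from persisted state (empty on the first run).
  constructor(savedKeys = []) {
    this.keys = new Set(savedKeys);
  }

  // Assumption: a tender is identified by its portal and portal-specific ID.
  static keyFor(tender) {
    return `${tender.sourcePortal}:${tender.sourceId}`;
  }

  // Returns true the first time a tender is seen, false on every later sighting.
  markIfNew(tender) {
    const key = SeenTenders.keyFor(tender);
    if (this.keys.has(key)) return false;
    this.keys.add(key);
    return true;
  }

  // Called by JSON.stringify, so the state can be stored as a plain array.
  toJSON() {
    return [...this.keys];
  }
}
```

In a scheduled setup, the serialized array would be written to persistent storage after each run and passed back into the constructor on the next one.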

🧪 Testing

Run with a small dataset first:

{
  "country": "US",
  "keywords": ["test"],
  "maxItems": 10,
  "maxPages": 2
}

📝 Best Practices

  1. Start small - Test with maxItems: 10 first
  2. Use proxies - Always enable useProxy: true for production
  3. Set delays - Use minDelayMs and maxDelayMs to avoid rate limits
  4. Filter early - Use keywords and date filters to reduce load
  5. Monitor metrics - Check STATS and ERROR_LOG after runs
  6. Schedule runs - Use Apify Scheduler for daily/weekly updates

🚨 Limitations

  • Some portals may require authentication (not currently supported)
  • Captcha protection may block automated access
  • Rate limits vary by portal
  • Document downloads are experimental
  • Some portals may change their HTML structure

📄 License

Apache-2.0

🤝 Contributing

To add support for new portals:

  1. Create a new adapter in src/adapters/
  2. Implement the required methods
  3. Register in AdapterRegistry.js
  4. Test thoroughly
  5. Submit a pull request

💡 Use Cases

  • Tender monitoring - Track opportunities in your industry
  • Market research - Analyze government spending patterns
  • Competitive intelligence - Monitor competitor wins
  • Lead generation - Find relevant procurement opportunities
  • Data analysis - Build datasets for research
  • Automated alerts - Get notified of matching tenders

🆘 Support

For issues or questions:

  • Check the error logs in Key-Value Store
  • Review metrics in STATS
  • Enable debug logging
  • Contact Apify support

Built with ❤️ using Apify SDK and Crawlee