Gvt Procurement Scraper

Scrape tender notices & contract awards from 11 government procurement databases across 9 countries. Covers US, EU, UK, Ukraine, France, Brazil, Australia & Canada. Filter by keyword, date, and contract value.

Pricing: from $1.00 / actor start
Developer: Zac (Maintained by Community)

Government & Public Procurement Data Scraper

A powerful Apify actor for scraping government tender databases and contract awards from multiple sources including SAM.gov, TED EU, and UK Find a Tender.

Features

  • Multi-Source Support: Simultaneously scrape from multiple government procurement databases
  • Flexible Data Extraction: Extract tender notices, contract awards, or both
  • Advanced Filtering: Filter by keywords, date range, contract value, and more
  • Multiple Output Formats: Export results as JSON, CSV, or JSONL
  • Proxy Support: Built-in support for Apify's residential proxies to avoid blocking
  • Error Handling: Robust error handling with detailed logging
  • Rate Limiting: Automatic rate limiting to respect source websites

Supported Databases

  • SAM.gov (USA Federal Procurement)
  • TED EU (European Union Tenders)
  • UK Find a Tender (United Kingdom Government Contracts)

Installation

  1. Clone or create this actor in Apify Console
  2. Install dependencies:
```shell
npm install
```

Usage

Input Parameters

```json
{
  "sources": ["sam.gov", "ted.eu", "uk.findatender"],
  "dataTypes": ["tender_notices", "contract_awards"],
  "searchQuery": "software development",
  "keywords": ["IT", "cloud"],
  "dateFrom": "2024-01-01",
  "dateTo": "2024-12-31",
  "maxResults": 5000,
  "minContractValue": 50000,
  "maxContractValue": "5000000",
  "outputFormat": "json",
  "useProxy": true,
  "browserTimeout": 60,
  "debugMode": false
}
```

Parameters Explained

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| `sources` | array | Data sources to scrape | `["sam.gov"]` |
| `dataTypes` | array | Types of data to extract | `["tender_notices", "contract_awards"]` |
| `searchQuery` | string | Search keywords (e.g., "software", "consulting") | `""` |
| `keywords` | array | Additional filter keywords | `[]` |
| `dateFrom` | string | Start date (YYYY-MM-DD format) | `""` |
| `dateTo` | string | End date (YYYY-MM-DD format) | `""` |
| `maxResults` | integer | Maximum results to collect (1-10000) | `1000` |
| `minContractValue` | number | Minimum contract value in USD | `0` |
| `maxContractValue` | string | Maximum contract value in USD | `""` |
| `outputFormat` | string | Output format: `json`, `csv`, `jsonl` | `"json"` |
| `useProxy` | boolean | Use Apify residential proxy | `false` |
| `browserTimeout` | integer | Timeout for browser operations (10-600 s) | `60` |
| `debugMode` | boolean | Enable verbose logging | `false` |
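Before launching a run, it can be useful to sanity-check the input against the constraints in the table above. The helper below is not part of the actor; it is a minimal sketch that mirrors the documented field names and bounds.

```javascript
// Sketch: pre-flight validation of an input object against the documented
// constraints (YYYY-MM-DD dates, maxResults 1-10000, browserTimeout 10-600).
const ISO_DATE = /^\d{4}-\d{2}-\d{2}$/;

function validateInput(input) {
  const errors = [];
  if (input.dateFrom && !ISO_DATE.test(input.dateFrom)) {
    errors.push('dateFrom must be YYYY-MM-DD');
  }
  if (input.dateTo && !ISO_DATE.test(input.dateTo)) {
    errors.push('dateTo must be YYYY-MM-DD');
  }
  if (input.maxResults !== undefined &&
      (input.maxResults < 1 || input.maxResults > 10000)) {
    errors.push('maxResults must be between 1 and 10000');
  }
  if (input.browserTimeout !== undefined &&
      (input.browserTimeout < 10 || input.browserTimeout > 600)) {
    errors.push('browserTimeout must be between 10 and 600 seconds');
  }
  return errors; // empty array means the input passes these checks
}
```

Running this locally before an actor start avoids paying for a run that the platform would reject or that would silently return nothing.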

Output

Results are stored in Apify's Key-Value Store with the following structure:

```json
{
  "id": "notice_id",
  "source": "sam.gov",
  "type": "tender_notice",
  "title": "Tender Title",
  "description": "Full tender description",
  "url": "https://...",
  "postedDate": "2024-01-15T10:30:00Z",
  "deadline": "2024-02-15",
  "organization": "Agency Name",
  "budget": "500000",
  "metadata": {
    "country": "US",
    "category": "Software",
    "reference": "SOL-2024-001"
  },
  "scrapedAt": "2024-01-20T12:00:00Z"
}
```
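Note that `budget` is stored as a string, so downstream filtering needs a numeric parse. A sketch of post-filtering downloaded records by contract value (this helper is illustrative, not part of the actor):

```javascript
// Sketch: filter scraped records by contract value. Records use the shape
// documented above; parseFloat handles the string "budget" field.
function filterByBudget(records, min = 0, max = Infinity) {
  return records.filter((r) => {
    const value = parseFloat(r.budget);
    if (Number.isNaN(value)) return false; // skip records with no usable budget
    return value >= min && value <= max;
  });
}
```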

Running Locally

```shell
npm start
```

Set environment variable DEBUG=true for verbose logging:

```shell
DEBUG=true npm start
```

API Documentation

SamGovScraper

Scrapes tender opportunities from SAM.gov using their REST API.

```javascript
const scraper = new SamGovScraper(input);
const results = await scraper.scrape();
```
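For reference, the kind of request this scraper issues might look like the sketch below. The endpoint and parameter names follow SAM.gov's public Get Opportunities v2 API as I understand it (dates in MM/dd/yyyy, an `api_key` query parameter); verify them against the current SAM.gov documentation before relying on this, as they are assumptions rather than a copy of the actor's code.

```javascript
// Sketch: build a SAM.gov Get Opportunities search URL from actor-style input.
// Endpoint and parameter names are assumptions based on the public v2 API.
function buildSamGovUrl({ apiKey, searchQuery, dateFrom, dateTo, limit = 100 }) {
  const toUs = (iso) => {                 // SAM.gov expects MM/dd/yyyy dates
    const [y, m, d] = iso.split('-');
    return `${m}/${d}/${y}`;
  };
  const params = new URLSearchParams({
    api_key: apiKey,
    limit: String(limit),
    postedFrom: toUs(dateFrom),
    postedTo: toUs(dateTo),
  });
  if (searchQuery) params.set('title', searchQuery);
  return `https://api.sam.gov/opportunities/v2/search?${params}`;
}
```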

TedEuScraper

Scrapes tender notices from TED EU using browser automation.

```javascript
const scraper = new TedEuScraper(input);
const results = await scraper.scrape();
```

UkFindTenderScraper

Scrapes government contracts from UK Find a Tender using browser automation.

```javascript
const scraper = new UkFindTenderScraper(input);
const results = await scraper.scrape();
```

DataProcessor

Utility class for formatting and filtering results.

```javascript
const processor = new DataProcessor('json');
const formatted = processor.format(data);

// Helper methods
DataProcessor.normalizeTender(tender);
DataProcessor.filterByKeywords(tenders, ['IT', 'cloud']);
DataProcessor.sortByDate(tenders, 'desc');
DataProcessor.removeDuplicates(tenders);
```
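To make the intended semantics of two of those helpers concrete, here is a hypothetical implementation; the actor's real `DataProcessor` may differ in detail (dedup key, matched fields), so treat this as a sketch.

```javascript
// Sketch of two DataProcessor helpers (hypothetical implementations).
class DataProcessorSketch {
  // Keep tenders whose title or description mentions any keyword (case-insensitive).
  static filterByKeywords(tenders, keywords) {
    if (!keywords || keywords.length === 0) return tenders;
    const needles = keywords.map((k) => k.toLowerCase());
    return tenders.filter((t) => {
      const haystack = `${t.title || ''} ${t.description || ''}`.toLowerCase();
      return needles.some((n) => haystack.includes(n));
    });
  }

  // Drop records sharing a source + id pair, keeping the first occurrence.
  static removeDuplicates(tenders) {
    const seen = new Set();
    return tenders.filter((t) => {
      const key = `${t.source}:${t.id}`;
      if (seen.has(key)) return false;
      seen.add(key);
      return true;
    });
  }
}
```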

Best Practices

  1. Rate Limiting: The scrapers include built-in rate limiting. Don't remove delays between requests.
  2. Proxy Usage: For high-volume scraping or repeated runs, enable proxy support to avoid IP blocking.
  3. Search Strategy: Be specific with search queries to get more relevant results and reduce processing time.
  4. Date Ranges: Use date filters to limit scope and reduce unnecessary processing.
  5. Keywords: Use keywords to post-filter results for higher relevance.
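The built-in rate limiting mentioned in point 1 boils down to a pattern like the one below: sequential requests with a fixed pause between them. The helper names are illustrative, not the actor's actual code.

```javascript
// Sketch: sequential fetching with a fixed delay between requests,
// the basic shape of polite rate limiting.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function fetchSequentially(urls, fetchFn, delayMs = 1000) {
  const results = [];
  for (const url of urls) {
    results.push(await fetchFn(url)); // one request at a time...
    await sleep(delayMs);             // ...with a pause before the next
  }
  return results;
}
```

Removing the `sleep` call turns this into a burst of back-to-back requests, which is exactly what gets an IP blocked.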

Error Handling

The actor captures and logs errors from each source separately. If one source fails, others continue processing. A summary of errors is provided in the output.
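One simple way to get this per-source isolation is `Promise.allSettled`, which collects successes and failures without letting one rejection abort the batch. The sketch below shows the pattern; the names are illustrative, not the actor's actual code.

```javascript
// Sketch: run every source scraper, collecting results from the ones that
// succeed and error summaries from the ones that fail.
async function runAllSources(scrapers) {
  const names = Object.keys(scrapers);
  const settled = await Promise.allSettled(names.map((n) => scrapers[n]()));
  const results = [];
  const errors = [];
  settled.forEach((s, i) => {
    if (s.status === 'fulfilled') results.push(...s.value);
    else errors.push({ source: names[i], message: s.reason.message });
  });
  return { results, errors };
}
```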

Performance Tips

  • Limit Results: Set maxResults to a reasonable number to reduce processing time
  • Narrow Date Range: Use dateFrom and dateTo to focus on recent tenders
  • Use Keywords: Pre-filter with keywords to reduce unnecessary data collection
  • Disable Debug Mode: Keep debugMode set to false in production runs; verbose logging slows processing

Troubleshooting

No Results Returned

  • Check if the source website is accessible
  • Verify search parameters are valid
  • Try with a broader search query
  • Check if proxy is needed due to IP blocking

Timeout Errors

  • Increase browserTimeout parameter
  • Try with fewer results or narrower date range
  • Enable proxy support for more reliable connections

Rate Limit Errors

  • Reduce maxResults or split into smaller date ranges
  • Increase delays between requests (modify scraper code)
  • Use Apify residential proxies
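Splitting a large date range into smaller windows, as suggested above, can be scripted; the helper below (illustrative, not part of the actor) produces `dateFrom`/`dateTo` pairs you can feed into separate runs.

```javascript
// Sketch: split an inclusive ISO date range into chunks of at most chunkDays days.
function splitDateRange(fromIso, toIso, chunkDays = 30) {
  const MS_PER_DAY = 24 * 60 * 60 * 1000;
  const chunks = [];
  let start = new Date(`${fromIso}T00:00:00Z`);
  const end = new Date(`${toIso}T00:00:00Z`);
  while (start <= end) {
    const next = new Date(
      Math.min(start.getTime() + (chunkDays - 1) * MS_PER_DAY, end.getTime())
    );
    chunks.push({
      dateFrom: start.toISOString().slice(0, 10),
      dateTo: next.toISOString().slice(0, 10),
    });
    start = new Date(next.getTime() + MS_PER_DAY); // day after this chunk ends
  }
  return chunks;
}
```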

License

Apache 2.0

Support

For issues or questions, please create an issue in the repository or contact the development team.

Contributing

Contributions are welcome! Please follow the existing code style and add tests for new features.

Adding a New Source

  1. Create a new scraper file in src/scrapers/ (e.g., newSourceScraper.js)
  2. Implement the scraper class extending the base pattern
  3. Add the scraper to src/main.js in the scrapers map
  4. Update input schema in .actor/input_schema.json
  5. Add documentation in this README
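A skeleton for step 2 might look like the following. The constructor and `scrape()` signature are assumptions based on the scraper examples earlier in this README; match them to the actual code in src/scrapers/ when adding a real source.

```javascript
// Sketch: skeleton for a new source scraper following the existing pattern.
class NewSourceScraper {
  constructor(input) {
    this.input = input;   // search query, date range, limits, proxy settings
    this.results = [];
  }

  async scrape() {
    // 1. Build the search request from this.input (searchQuery, dateFrom, ...)
    // 2. Fetch and parse result pages until maxResults is reached
    // 3. Normalize each record to the output structure documented above
    return this.results;
  }
}

module.exports = { NewSourceScraper };
```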

Roadmap

  • Add support for additional databases (France, Germany, Canada)
  • Implement contract award extraction for all sources
  • Add email notification on new matching tenders
  • Create web dashboard for tracking results
  • Add machine learning for opportunity relevance scoring
  • Implement incremental scraping with state management