Data Gov India Actor
Pricing
$20.00 / 1,000 results
Data Gov India Actor connects to the Data.gov.in platform to access, search, and analyze open government datasets across domains like agriculture, health, finance, and environment. Ideal for data-driven projects and automation.
Developer: Yash Kavaiya
DataGovIN Sentinel 🇮🇳
Production-Ready AI Actor for India's Open Government Data Platform
Advanced Discovery • Secure Retrieval • Ethical Analysis • Compliance-First
Overview
DataGovIN Sentinel is an Apify actor built specifically for data.gov.in, India's Open Government Data (OGD) Platform. It helps citizens, researchers, policymakers, and developers discover, retrieve, and analyze datasets while complying with:
- Open Government Data (OGD) Policy India 2025
- Digital Personal Data Protection Act (DPDP) 2023
- Creative Commons CC-BY 4.0 Licensing
- API v3 Rate Limits (2000 requests/hour with OAuth 2.0)
Key Features
Intelligent Discovery
- Full-Text Search: Natural-language queries handled by the platform's Solr-powered search
- Advanced Filtering: Filter by organization, sector, format, tags, date ranges
- Relevance Scoring: Automatic filtering of results by relevance threshold
- Faceted Navigation: Browse by organizations, sectors, tags, and formats
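Relevance thresholding and faceted navigation both amount to post-processing over a list of search results. The following is a minimal sketch, not the actor's code: the result fields (score, organization, sector, tags), the 0..1 score range, the 0.5 threshold and the helper names filterByRelevance and countFacets are all assumptions made for illustration.

```javascript
// Minimal sketch: relevance filtering and facet counts over search results.
// Assumed (hypothetical) result shape: { title, score, organization, sector, tags: [] },
// where `score` is a relevance value normalised to the 0..1 range.

/** Keep results whose score meets the threshold, highest score first. */
function filterByRelevance(results, threshold = 0.5) {
  return results
    .filter((r) => typeof r.score === "number" && r.score >= threshold)
    .sort((a, b) => b.score - a.score);
}

/** Count results per organization, sector and tag for faceted navigation. */
function countFacets(results) {
  const facets = { organization: {}, sector: {}, tags: {} };
  const bump = (bucket, key) => {
    if (key) bucket[key] = (bucket[key] || 0) + 1;
  };
  for (const r of results) {
    bump(facets.organization, r.organization);
    bump(facets.sector, r.sector);
    for (const tag of r.tags || []) bump(facets.tags, tag);
  }
  return facets;
}

// Usage with made-up sample data
const sample = [
  { title: "Crop Production 2023", score: 0.92, organization: "Ministry of Agriculture", sector: "Agriculture", tags: ["crops", "production"] },
  { title: "Rainfall Normals", score: 0.41, organization: "India Meteorological Department", sector: "Environment", tags: ["rainfall"] },
  { title: "Crop Prices", score: 0.77, organization: "Ministry of Agriculture", sector: "Agriculture", tags: ["crops", "prices"] },
];
const relevant = filterByRelevance(sample, 0.5);
console.log(relevant.map((r) => r.title)); // titles at or above the 0.5 threshold
console.log(countFacets(relevant));
```

Counting facets only over the filtered list keeps the facet numbers consistent with what the user actually sees.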
Secure Data Acquisition
- Multi-Format Support: CSV, JSON, XML, XLS, XLSX, TXT, TSV
- Smart Parsing: Automatic data parsing with preview generation
- Size Management: Configurable file size limits and streaming
- Rate-Limited Downloads: Respects API quotas with exponential backoff
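The supported-format list and the file size limit can be enforced before any download starts. This sketch assumes each resource carries a format string and a size in bytes (CKAN resource metadata usually has both, though size is often empty), and that maxFileSize is expressed in MB as in the retrieve-mode example; the helper name checkResource is hypothetical.

```javascript
// Formats listed under "Multi-Format Support" in this README
const SUPPORTED_FORMATS = new Set(["CSV", "JSON", "XML", "XLS", "XLSX", "TXT", "TSV"]);

/**
 * Hypothetical pre-download check.
 * Assumes `resource.format` is a string such as "csv" or ".CSV" and
 * `resource.size` is a byte count, or missing/null when unknown.
 */
function checkResource(resource, maxFileSizeMB = 50) {
  const format = String(resource.format || "").trim().replace(/^\./, "").toUpperCase();
  if (!SUPPORTED_FORMATS.has(format)) {
    return { ok: false, reason: `unsupported format: ${format || "unknown"}` };
  }
  const limitBytes = maxFileSizeMB * 1024 * 1024;
  if (typeof resource.size === "number" && resource.size > limitBytes) {
    return { ok: false, reason: `size ${resource.size} B exceeds ${maxFileSizeMB} MB limit` };
  }
  // Unknown sizes pass here; a streaming download would still need to
  // abort once the received byte count passes limitBytes.
  return { ok: true, format };
}

console.log(checkResource({ format: "csv", size: 2_000_000 }));
console.log(checkResource({ format: "PDF", size: 1000 }));
console.log(checkResource({ format: "XLSX", size: 80 * 1024 * 1024 }, 50));
```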
Analytics Engine
- Data Quality Assessment: Comprehensive quality scoring (0-100)
- Statistical Analysis: Descriptive statistics using simple-statistics
- Trend Detection: Linear regression for time-series analysis
- Resource Analysis: Format distribution, completeness metrics
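The descriptive statistics and the linear-regression trend above can be illustrated without any dependency. The actor itself uses simple-statistics; this hand-written sketch computes the same kinds of quantities, uses the population standard deviation (whether the actor uses population or sample SD is not stated), and the helper names describe and linearTrend are hypothetical.

```javascript
/** Mean, median and population standard deviation of the finite values in an array. */
function describe(values) {
  const xs = values.filter((v) => Number.isFinite(v));
  if (xs.length === 0) return null;
  const n = xs.length;
  const mean = xs.reduce((s, v) => s + v, 0) / n;
  const sorted = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(n / 2);
  const median = n % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  const variance = xs.reduce((s, v) => s + (v - mean) ** 2, 0) / n;
  return { count: n, mean, median, stdDev: Math.sqrt(variance) };
}

/** Ordinary least-squares fit y = slope * x + intercept over [x, y] pairs. */
function linearTrend(points) {
  const n = points.length;
  if (n < 2) return null;
  const meanX = points.reduce((s, [x]) => s + x, 0) / n;
  const meanY = points.reduce((s, [, y]) => s + y, 0) / n;
  let sxy = 0;
  let sxx = 0;
  for (const [x, y] of points) {
    sxy += (x - meanX) * (y - meanY);
    sxx += (x - meanX) ** 2;
  }
  if (sxx === 0) return null; // all x values identical: slope undefined
  const slope = sxy / sxx;
  const direction = slope > 0 ? "increasing" : slope < 0 ? "decreasing" : "flat";
  return { slope, intercept: meanY - slope * meanX, direction };
}

// Usage with made-up yearly values
console.log(describe([12.5, 14.1, 13.8, 15.2]));
console.log(linearTrend([[2019, 100], [2020, 110], [2021, 120], [2022, 130]]));
```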
Governance & Compliance
- License Enforcement: Automatic CC-BY 4.0 attribution generation
- Access Control: Blocks restricted/confidential datasets
- PII Detection: DPDP Act 2023 compliance with pattern matching
- Audit Logging: Complete activity trail for transparency
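The README does not say how the actor decides that a dataset is restricted. As a purely hypothetical sketch, access control with an audit trail could be a keyword check over the title, description and tags; the RESTRICTED_TERMS list, the checkAccess helper and the in-memory auditLog are illustrative stand-ins, not the actor's rules.

```javascript
// Hypothetical keyword rules; the actor's real detection logic is not documented.
const RESTRICTED_TERMS = ["restricted", "confidential", "classified", "internal use only"];

// In-memory stand-in for the actor's audit trail.
const auditLog = [];

/** Decide whether a dataset may be accessed and record the decision. */
function checkAccess(dataset, { blockRestrictedData = true } = {}) {
  const text = [dataset.title, dataset.description, ...(dataset.tags || [])]
    .filter(Boolean)
    .join(" ");
  // Whole-word, case-insensitive matching, so "unrestricted" does not trigger "restricted".
  const matched = RESTRICTED_TERMS.filter((term) => new RegExp(`\\b${term}\\b`, "i").test(text));
  const allowed = !blockRestrictedData || matched.length === 0;
  auditLog.push({
    time: new Date().toISOString(),
    datasetId: dataset.id,
    action: allowed ? "ALLOW" : "BLOCK",
    matched,
  });
  return { allowed, matched };
}

console.log(checkAccess({ id: "demo-1", title: "Confidential survey microdata" }));
```

Keyword rules like these are exactly where false positives come from, which is why the blockRestrictedData switch and the audit log entry (recording which term matched) matter.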
Performance & Resilience
- Rate Limiting: Bottleneck-based throttling (configurable 100-2000 req/hour)
- Retry Logic: Exponential backoff with jitter (up to 5 attempts)
- Concurrent Processing: Parallel requests (1-10 concurrent)
- Fault Tolerance: Graceful error handling and fallback mechanisms
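The retry policy (exponential backoff with jitter, up to 5 attempts) can be sketched in a few lines. The actor relies on axios-retry for this; the dependency-free version below uses "full jitter" (a random delay between zero and an exponentially growing ceiling), which is one common variant and an assumption here, as are the 1 s base delay and 30 s cap. maxAttempts counts every call, including the first.

```javascript
/**
 * Delay before the retry that follows failed attempt `attempt` (1-based),
 * using full jitter: random value in [0, min(cap, base * 2^(attempt - 1))).
 */
function backoffDelay(attempt, { baseMs = 1000, capMs = 30000, random = Math.random } = {}) {
  const ceiling = Math.min(capMs, baseMs * 2 ** (attempt - 1));
  return Math.floor(random() * ceiling);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

/** Call `fn` up to `maxAttempts` times, sleeping a jittered backoff between failures. */
async function withRetry(fn, { maxAttempts = 5, shouldRetry = () => true, ...delayOpts } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err;
      if (attempt === maxAttempts || !shouldRetry(err)) break;
      await sleep(backoffDelay(attempt, delayOpts));
    }
  }
  throw lastError;
}
```

In practice shouldRetry would typically retry only on network errors, HTTP 429 and 5xx responses, so that a 404 for a missing dataset fails immediately.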
Quick Start
Prerequisites
- Node.js 18+
- Apify account (for deployment)
- Optional: data.gov.in API key (for enhanced access)
Installation
Local Development
# Clone repository
git clone <repository-url>
cd data-gov-in-actor

# Install dependencies
npm install

# Configure environment (optional)
cp .env.example .env
# Edit .env with your API credentials

# Run locally
npm start
Deploy to Apify
# Install Apify CLI
npm install -g apify-cli

# Log in to Apify
apify login

# Deploy actor
apify push
Usage Guide
Operation Modes
The actor supports 4 operation modes:
1. Search Mode (Discovery)
Find datasets matching your query without downloading data.
Input Example:
{
  "mode": "search",
  "query": "agriculture production statistics 2024",
  "filters": {
    "organization": "Ministry of Agriculture",
    "sector": "Agriculture",
    "format": ["CSV", "JSON"],
    "dateFrom": "2024-01-01"
  },
  "maxResults": 50
}
Output:
- Dataset metadata (title, description, organization, tags)
- Resource information (formats, sizes, URLs)
- Quality scores and relevance rankings
- Governance validation results
2. Retrieve Mode (Data Acquisition)
Download and parse actual dataset files.
Input Example:
{
  "mode": "retrieve",
  "query": "GDP quarterly data",
  "includeResources": true,
  "resourceLimit": 3,
  "maxFileSize": 50,
  "analytics": {
    "dataQualityScore": true,
    "enableStatistics": true
  }
}
Output:
- Full dataset metadata
- Parsed resource data with previews (first 100 rows)
- Data quality assessment
- Basic statistics (mean, median, std dev)
- Attribution text for proper citation
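The README gives the quality score a 0-100 range but does not document its formula. The sketch below is therefore hypothetical: it scores parsed rows on completeness (share of non-missing cells) and uniqueness (share of distinct rows) with illustrative 80/20 weights, treats "", "NA", "N/A" and "NULL" as missing, and assumes rows are objects keyed by column name (the shape csv-parse produces with its columns option). It also returns the first 100 rows as the preview.

```javascript
// Hypothetical 0-100 quality score; the actor's real formula is not documented.
const MISSING = new Set(["", "NA", "N/A", "NULL"]); // assumed missing-value markers

function isFilled(value) {
  if (value === undefined || value === null) return false;
  return !MISSING.has(String(value).trim().toUpperCase());
}

/** Score rows (objects keyed by column name) and return a preview of the first rows. */
function assessQuality(rows, previewSize = 100) {
  const columns = [...new Set(rows.flatMap((r) => Object.keys(r)))];
  if (rows.length === 0 || columns.length === 0) {
    return { score: 0, completeness: 0, uniqueness: 0, preview: [] };
  }
  let filled = 0;
  for (const row of rows) {
    for (const col of columns) if (isFilled(row[col])) filled++;
  }
  const completeness = filled / (rows.length * columns.length);
  // Distinct rows, compared on the same column order
  const distinct = new Set(rows.map((r) => JSON.stringify(columns.map((c) => r[c] ?? null)))).size;
  const uniqueness = distinct / rows.length;
  // Illustrative weights: 80% completeness, 20% uniqueness
  const score = Math.round(100 * (0.8 * completeness + 0.2 * uniqueness));
  return { score, completeness, uniqueness, preview: rows.slice(0, previewSize) };
}

console.log(assessQuality([{ state: "A", value: "10" }, { state: "B", value: "" }]));
```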
3. Analyze Mode (Deep Analytics)
Comprehensive analysis with statistical insights.
Input Example:
{
  "mode": "analyze",
  "datasetIds": ["dataset-id-1", "dataset-id-2"],
  "analytics": {
    "enableStatistics": true,
    "enableTrends": true,
    "dataQualityScore": true
  }
}
Output:
- Quality assessment with recommendations
- Resource completeness analysis
- Trend detection for time-series data
- PII scan results (DPDP compliance)
- Actionable insights and warnings
4. Monitor Mode (Platform Insights)
Track platform activity and discover trending datasets.
Input Example:
{"mode": "monitor","maxResults": 50}
Output:
- Recently updated datasets (last 30 days)
- Trending datasets (by views)
- Active organizations and their dataset counts
- Popular tags and categories
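Two of the monitor outputs, recently updated datasets and popular tags, can be derived from package metadata alone. This sketch assumes CKAN-style package objects with a metadata_modified timestamp and tags given either as strings or as { name } objects; how the actor measures "trending by views" is not documented, so that part is left out. The helper names are hypothetical.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

/** Packages modified within the last `days` days, newest first. */
function recentlyUpdated(packages, { days = 30, now = new Date() } = {}) {
  const cutoff = now.getTime() - days * DAY_MS;
  return packages
    // Missing or unparseable timestamps give NaN, which fails the comparison
    .filter((p) => Date.parse(p.metadata_modified) >= cutoff)
    .sort((a, b) => Date.parse(b.metadata_modified) - Date.parse(a.metadata_modified));
}

/** Top-N tag names by the number of packages that carry them. */
function popularTags(packages, topN = 10) {
  const counts = new Map();
  for (const p of packages) {
    for (const tag of p.tags || []) {
      const name = typeof tag === "string" ? tag : tag.name;
      if (name) counts.set(name, (counts.get(name) || 0) + 1);
    }
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]).slice(0, topN);
}
```

Passing now explicitly keeps the 30-day window reproducible in tests and in scheduled runs.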
Configuration
Authentication (Optional)
For enhanced access and higher rate limits:
{
  "authentication": {
    "apiKey": "your-api-key-here",
    "enableOAuth": false,
    "oauthToken": "your-oauth-token"
  }
}
Note: Most public data is accessible without authentication.
Rate Limiting
Stay compliant with OGD Policy 2025:
{
  "rateLimit": {
    "requestsPerHour": 1800,
    "concurrentRequests": 5,
    "retryAttempts": 5
  }
}
Default: 1,800 requests/hour, a 10% buffer under the 2,000 requests/hour limit.
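A rateLimit object like the one above would normally be merged with defaults and clamped to the documented ranges (100-2000 requests/hour, 1-10 concurrent requests, up to 5 retry attempts). The helper normalizeRateLimit below is a hypothetical sketch of that step; the lower bound of 1 retry attempt and the derived minTimeMs value (spacing if the hourly budget were spread evenly) are assumptions.

```javascript
// Defaults and bounds taken from this README; the helper itself is hypothetical.
const RATE_LIMIT_DEFAULTS = { requestsPerHour: 1800, concurrentRequests: 5, retryAttempts: 5 };
const RATE_LIMIT_BOUNDS = {
  requestsPerHour: [100, 2000],
  concurrentRequests: [1, 10],
  retryAttempts: [1, 5], // lower bound of 1 is an assumption
};

function normalizeRateLimit(input = {}) {
  const out = {};
  for (const [key, fallback] of Object.entries(RATE_LIMIT_DEFAULTS)) {
    const [min, max] = RATE_LIMIT_BOUNDS[key];
    // Treat undefined/null as "not set" rather than as 0
    const raw = input[key] === undefined || input[key] === null ? NaN : Number(input[key]);
    const value = Number.isFinite(raw) ? Math.round(raw) : fallback;
    out[key] = Math.min(max, Math.max(min, value));
  }
  // Minimum spacing between request starts if the hourly budget were spread evenly
  out.minTimeMs = Math.ceil(3600000 / out.requestsPerHour);
  return out;
}

console.log(normalizeRateLimit({ requestsPerHour: 5000, concurrentRequests: 0 }));
```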
Governance & Compliance
Configure ethical data handling:
{
  "governance": {
    "respectLicenses": true,
    "blockRestrictedData": true,
    "enableAuditLog": true,
    "piiDetection": true
  }
}
Output Format
Standard Output Structure
{
  "success": true,
  "mode": "retrieve",
  "metadata": {
    "actor": "DataGovIN Sentinel",
    "version": "1.0.0",
    "executionTime": "2025-11-06T12:00:00.000Z",
    "dataSource": "Open Government Data Platform India",
    "compliance": {
      "ogdPolicy": "OGD Policy India 2025",
      "dataProtection": "DPDP Act 2023",
      "license": "CC-BY 4.0"
    }
  },
  "results": [ /* Array of datasets */ ],
  "statistics": {
    "resultsCount": 10,
    "apiRequests": 45,
    "successRate": "97.78%"
  },
  "compliance": {
    "summary": {
      "totalAccessAttempts": 10,
      "blockedDatasets": 2,
      "licenseWarnings": 1
    }
  },
  "attribution": "Data Source: ... Retrieved from ... License: CC-BY 4.0 ..."
}
Architecture
Module Overview
src/
├── main.js              # Main actor entry point
├── api-client.js        # API client with rate limiting
├── search-discovery.js  # Search and discovery module
├── data-acquisition.js  # Data retrieval and parsing
├── analytics.js         # Statistical analysis engine
├── governance.js        # Compliance and ethics layer
└── utils.js             # Utility functions
Technology Stack
- Runtime: Node.js 18+
- Framework: Apify SDK 3.x
- HTTP Client: Axios with axios-retry
- Rate Limiting: Bottleneck
- Statistics: simple-statistics
- Data Parsing: csv-parse, xlsx
- Queue Management: p-queue
Security & Compliance
Data Protection (DPDP Act 2023)
The actor includes PII detection using pattern matching for:
- Aadhaar numbers (12-digit patterns)
- PAN card numbers
- Email addresses
- Phone numbers
Automatic Actions:
- Detection warnings in output
- Optional redaction (if enabled)
- Audit log entries
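Pattern-based PII detection for the four categories listed above might look like the sketch below. The regular expressions are illustrative assumptions, not the actor's published patterns: Aadhaar as 12 digits optionally grouped 4-4-4, PAN as five letters, four digits and one letter, and Indian mobile numbers as ten digits starting with 6-9 with an optional +91 prefix. Patterns run in order and each match is replaced before the next pattern runs, so digits inside an email address are not counted again as a phone number.

```javascript
// Illustrative patterns only; the actor's actual regular expressions are not documented.
const PII_PATTERNS = [
  ["EMAIL", /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g],
  ["PAN", /\b[A-Z]{5}[0-9]{4}[A-Z]\b/g],
  ["AADHAAR", /\b\d{4}[ -]?\d{4}[ -]?\d{4}\b/g],
  ["PHONE", /(?:\+91[ -]?|\b)[6-9]\d{9}\b/g],
];

/** Count PII matches by type; return redacted text only when `redact` is true. */
function scanPII(text, { redact = false } = {}) {
  const findings = {};
  let working = String(text);
  for (const [type, pattern] of PII_PATTERNS) {
    working = working.replace(pattern, () => {
      findings[type] = (findings[type] || 0) + 1;
      return `[REDACTED:${type}]`;
    });
  }
  return {
    findings,
    hasPII: Object.keys(findings).length > 0,
    text: redact ? working : String(text),
  };
}

console.log(scanPII("Reach the officer at officer@example.com", { redact: true }));
```

Plain 12-digit matching will also flag unrelated numbers such as long IDs; a stricter check could validate the Verhoeff checksum that Aadhaar numbers carry.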
License Compliance (CC-BY 4.0)
Automatic Attribution Generation:
Data Source: "Dataset Title" by Organization Name.
Retrieved from Open Government Data Platform India (https://data.gov.in/...)
on 2025-11-06. License: CC-BY 4.0.
Attribution required under CC-BY 4.0 and OGD Policy India 2025.
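Generating that attribution is string templating over dataset metadata. In this sketch the input field names (title, organization, url) and the helper name buildAttribution are assumptions; the date is formatted as YYYY-MM-DD in UTC, and the parts are joined into a single line as in the attribution field of the output example.

```javascript
/**
 * Build an attribution string following the template above.
 * Field names (title, organization, url) are assumptions about the dataset object.
 */
function buildAttribution({ title, organization, url, retrievedAt = new Date(), license = "CC-BY 4.0" }) {
  const date = retrievedAt.toISOString().slice(0, 10); // YYYY-MM-DD in UTC
  return [
    `Data Source: "${title}" by ${organization}.`,
    `Retrieved from Open Government Data Platform India (${url})`,
    `on ${date}. License: ${license}.`,
    `Attribution required under ${license} and OGD Policy India 2025.`,
  ].join(" ");
}

console.log(buildAttribution({ title: "Dataset Title", organization: "Organization Name", url: "https://data.gov.in/" }));
```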
API Rate Limits
OGD Policy 2025 Limits:
- Public Access: 2000 requests/hour
- Default Setting: 1800 requests/hour (10% buffer)
- Implementation: Bottleneck with reservoir refresh
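A reservoir is a fixed budget of requests that is refilled on a schedule. In Bottleneck this is usually configured through its reservoir, reservoirRefreshAmount and reservoirRefreshInterval options; the dependency-free class below only sketches the idea (1,800 tokens refilled every hour by default), its names are hypothetical, and the clock is injectable so the behaviour can be tested without waiting an hour.

```javascript
/**
 * Fixed-window reservoir: `capacity` tokens, fully refilled every `intervalMs`.
 * Hypothetical sketch; `now` is injectable for testing.
 */
class Reservoir {
  constructor({ capacity = 1800, intervalMs = 60 * 60 * 1000, now = () => Date.now() } = {}) {
    this.capacity = capacity;
    this.intervalMs = intervalMs;
    this.now = now;
    this.remaining = capacity;
    this.windowStart = now();
  }

  refill() {
    const elapsed = this.now() - this.windowStart;
    if (elapsed >= this.intervalMs) {
      // Advance by whole intervals so refill times stay aligned
      this.windowStart += Math.floor(elapsed / this.intervalMs) * this.intervalMs;
      this.remaining = this.capacity;
    }
  }

  /** Take one token; returns 0 if allowed now, else milliseconds until the next refill. */
  tryAcquire() {
    this.refill();
    if (this.remaining > 0) {
      this.remaining -= 1;
      return 0;
    }
    return this.windowStart + this.intervalMs - this.now();
  }

  /** Wait until a token is available, then take it. */
  async acquire() {
    for (;;) {
      const waitMs = this.tryAcquire();
      if (waitMs === 0) return;
      await new Promise((resolve) => setTimeout(resolve, waitMs));
    }
  }
}
```

Each API call would await reservoir.acquire() first. Because the whole budget refills at once, an exhausted reservoir can mean waiting up to a full hour, which matches the "Rate Limit Exceeded" advice under Troubleshooting.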
Use Cases
1. Research & Academia
{
  "mode": "retrieve",
  "query": "education statistics literacy rates",
  "filters": {
    "organization": "Ministry of Education",
    "format": ["CSV"]
  },
  "includeResources": true,
  "analytics": {
    "enableStatistics": true,
    "enableTrends": true
  }
}
2. Policy Analysis
{
  "mode": "analyze",
  "query": "healthcare expenditure states",
  "filters": {
    "sector": "Health",
    "dateFrom": "2020-01-01"
  },
  "maxResults": 100
}
3. Journalism & Transparency
{
  "mode": "search",
  "query": "budget allocation 2024",
  "filters": {
    "organization": "Ministry of Finance"
  },
  "governance": {
    "enableAuditLog": true
  }
}
4. App Development
{
  "mode": "retrieve",
  "datasetIds": ["weather-api-dataset"],
  "includeResources": true,
  "output": {
    "format": "json"
  }
}
Troubleshooting
Common Issues
API Connection Failed
Problem: Failed to connect to data.gov.in API
Solutions:
- Check internet connectivity
- Verify data.gov.in is accessible (not blocked)
- Try with API key if available
- Check rate limit status
Rate Limit Exceeded
Problem: Rate limit exceeded. Implement backoff.
Solutions:
- Reduce requestsPerHour in the configuration
- Decrease concurrentRequests
- Wait for the reservoir to refresh (1 hour)
Dataset Blocked
Problem: Dataset contains restricted content
Solutions:
- Review dataset tags and description
- Disable blockRestrictedData if the block is a false positive
- Check the governance audit log for details
Resource Download Failed
Problem: Failed to acquire resource
Solutions:
- Increase the maxFileSize limit
- Check resource URL availability
- Verify format is supported
- Review network logs
API Reference
CKAN API v3 Endpoints
The actor uses data.gov.in's CKAN API:
- Base URL: https://data.gov.in/api/3/action
- Endpoints used:
  - package_search: search datasets
  - package_show: get dataset details
  - resource_show: get resource details
  - organization_list: list organizations
  - group_list: list sectors/groups
  - tag_list: list tags
Official Documentation: data.gov.in/apis
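As an illustration of how search-mode input could map onto package_search, the sketch below builds the request URL using standard CKAN parameters (q, fq, rows, facet.field) and Solr filter syntax. The mapping itself is an assumption: CKAN's organization and groups filters normally match machine names (slugs) rather than display names like "Ministry of Agriculture", so a real implementation would first resolve names via organization_list or group_list, which this sketch skips. Whether data.gov.in serves this CKAN endpoint publicly is also taken from this README, not verified.

```javascript
// Base URL as listed in this README (assumed reachable; not verified here).
const BASE_URL = "https://data.gov.in/api/3/action";

/** Quote a value for a Solr filter query, escaping quotes and backslashes. */
const solrQuote = (v) => `"${String(v).replace(/(["\\])/g, "\\$1")}"`;

/** Build a package_search URL from search-mode input (hypothetical mapping). */
function buildPackageSearchUrl({ query = "", filters = {}, maxResults = 50 } = {}) {
  const fq = [];
  // NOTE: real CKAN filters expect slugs; display names are used here for simplicity.
  if (filters.organization) fq.push(`organization:${solrQuote(filters.organization)}`);
  if (filters.sector) fq.push(`groups:${solrQuote(filters.sector)}`);
  if (filters.format && filters.format.length) {
    fq.push(`res_format:(${filters.format.map(solrQuote).join(" OR ")})`);
  }
  if (filters.dateFrom) {
    // Solr range query: modified on or after dateFrom (interpreted as UTC)
    fq.push(`metadata_modified:[${new Date(filters.dateFrom).toISOString()} TO *]`);
  }
  const params = new URLSearchParams({ q: query, rows: String(maxResults) });
  if (fq.length) params.set("fq", fq.join(" AND "));
  params.set("facet.field", JSON.stringify(["organization", "groups", "tags", "res_format"]));
  return `${BASE_URL}/package_search?${params}`;
}

console.log(buildPackageSearchUrl({
  query: "agriculture production statistics 2024",
  filters: { format: ["CSV", "JSON"], dateFrom: "2024-01-01" },
  maxResults: 50,
}));
```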
Contributing
We welcome contributions! To contribute:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Development Guidelines
- Follow existing code style
- Add JSDoc comments for new functions
- Test with real data.gov.in data
- Update documentation
- Maintain compliance checks
License
This project is licensed under the Apache License 2.0.
Data License
Data retrieved from data.gov.in is subject to:
- Creative Commons CC-BY 4.0 (most datasets)
- Open Government Data Policy India 2025
- Individual dataset licenses (always check)
Attribution Required: Yes, for all CC-BY licensed datasets.
Acknowledgments
- Open Government Data Platform India - For providing open data infrastructure
- CKAN Project - For the open-source data portal software
- Apify - For the actor platform and SDK
- Indian Government - For commitment to open data transparency
Support
Issues & Questions
- GitHub Issues: [Report bugs or request features]
- Email: support@example.com
- Documentation: [Full API docs]
Resources
- data.gov.in - Official platform
- OGD Policy India - Policy details
- DPDP Act 2023 - Data protection law
- Apify Docs - Actor development
Changelog
Version 1.0.0 (2025-11-06)
- Initial release
- Search, Retrieve, Analyze, Monitor modes
- Analytics engine with statistical analysis
- Governance layer with DPDP Act 2023 compliance
- Rate limiting and fault tolerance
- Comprehensive documentation
- Full OGD Policy 2025 compliance
Roadmap
Upcoming Features
- Visualization generation (charts, maps)
- Multi-language support (Hindi, regional languages)
- Real-time monitoring with webhooks
- Advanced ML-based data quality scoring
- Integration with Google Earth Engine for geospatial data
- Export to multiple formats (Excel, PDF reports)
- Collaborative features (sharing, annotations)
Built with ❤️ for India's Open Data Community
Empowering citizens through transparent access to government data