Data Gov India Actor
Under maintenance

Pricing: $20.00 / 1,000 results

Data Gov India Actor connects to the Data.gov.in platform to access, search, and analyze open government datasets across domains like agriculture, health, finance, and environment. Ideal for data-driven projects and automation.

Rating: 0.0 (0)

Developer: Yash Kavaiya (Maintained by Community)

Actor stats: 0 bookmarked, 3 total users, 2 monthly active users, last modified 6 days ago


DataGovIN Sentinel 🇮🇳

Production-Ready AI Actor for India's Open Government Data Platform

Advanced Discovery • Secure Retrieval • Ethical Analysis • Compliance-First


🎯 Overview

DataGovIN Sentinel is an Apify actor built specifically for data.gov.in, India's Open Government Data (OGD) Platform. It lets citizens, researchers, policymakers, and developers discover, retrieve, and analyze datasets while staying compliant with:

  • ✅ Open Government Data (OGD) Policy India 2025
  • ✅ Digital Personal Data Protection Act (DPDP) 2023
  • ✅ Creative Commons CC-BY 4.0 Licensing
  • ✅ API v3 Rate Limits (2000 requests/hour with OAuth 2.0)

🌟 Key Features

🔍 Intelligent Discovery

  • Semantic Search: Natural language queries with SOLR-powered search
  • Advanced Filtering: Filter by organization, sector, format, tags, date ranges
  • Relevance Scoring: Automatic filtering of results by relevance threshold
  • Faceted Navigation: Browse by organizations, sectors, tags, and formats

📥 Secure Data Acquisition

  • Multi-Format Support: CSV, JSON, XML, XLS, XLSX, TXT, TSV
  • Smart Parsing: Automatic data parsing with preview generation
  • Size Management: Configurable file size limits and streaming
  • Rate-Limited Downloads: Respects API quotas with exponential backoff

📊 Analytics Engine

  • Data Quality Assessment: Comprehensive quality scoring (0-100)
  • Statistical Analysis: Descriptive statistics using simple-statistics
  • Trend Detection: Linear regression for time-series analysis
  • Resource Analysis: Format distribution, completeness metrics
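
As a rough illustration of the statistics and trend detection listed above, the sketch below computes a mean, median, sample standard deviation, and a least-squares trend line in plain JavaScript. The actor is described as using simple-statistics for this work; the function names and the [x, y] point format here are assumptions made for the example, not the actor's own code.

```javascript
// Illustrative helpers only; the actor itself is said to rely on simple-statistics.
function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  // Odd length: middle element. Even length: average of the two middle elements.
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

function sampleStdDev(values) {
  const m = mean(values);
  const sumSquares = values.reduce((acc, v) => acc + (v - m) ** 2, 0);
  return Math.sqrt(sumSquares / (values.length - 1));
}

// Ordinary least-squares fit of y = slope * x + intercept over [x, y] pairs.
function linearTrend(points) {
  const meanX = mean(points.map(([x]) => x));
  const meanY = mean(points.map(([, y]) => y));
  let sxy = 0;
  let sxx = 0;
  for (const [x, y] of points) {
    sxy += (x - meanX) * (y - meanY);
    sxx += (x - meanX) ** 2;
  }
  const slope = sxy / sxx;
  return { slope, intercept: meanY - slope * meanX };
}

// Hypothetical yearly series: the value rises by 2 units per year.
const trend = linearTrend([[2020, 10], [2021, 12], [2022, 14]]);
console.log(trend.slope); // 2
```

A positive slope suggests an upward trend and a negative slope a downward one; a series with fewer than two distinct x values has no defined slope and would need a guard in real use.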

🔒 Governance & Compliance

  • License Enforcement: Automatic CC-BY 4.0 attribution generation
  • Access Control: Blocks restricted/confidential datasets
  • PII Detection: DPDP Act 2023 compliance with pattern matching
  • Audit Logging: Complete activity trail for transparency

⚡ Performance & Resilience

  • Rate Limiting: Bottleneck-based throttling (configurable 100-2000 req/hour)
  • Retry Logic: Exponential backoff with jitter (up to 5 attempts)
  • Concurrent Processing: Parallel requests (1-10 concurrent)
  • Fault Tolerance: Graceful error handling and fallback mechanisms
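
The retry behaviour described above can be sketched as follows. This is a minimal example of exponential backoff with "full jitter", assuming a 1-second base delay, a 60-second cap, and that "up to 5 attempts" means five tries in total; `backoffDelay` and `withRetry` are hypothetical names, and the actor's real retry code (reportedly built on axios-retry) may differ.

```javascript
// Delay in ms before the next try, after failed attempt number `attempt` (1-based).
// Full jitter: a random value in [0, min(capMs, baseMs * 2^(attempt - 1))).
function backoffDelay(attempt, baseMs = 1000, capMs = 60000, random = Math.random) {
  const ceiling = Math.min(capMs, baseMs * 2 ** (attempt - 1));
  return Math.floor(random() * ceiling);
}

// Run an async operation, retrying on failure up to `maxAttempts` times in total.
// `sleep` is injectable so the function can be tested without real waiting.
async function withRetry(
  operation,
  maxAttempts = 5,
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation(attempt);
    } catch (err) {
      lastError = err;
      // No point sleeping after the final failed attempt.
      if (attempt < maxAttempts) await sleep(backoffDelay(attempt));
    }
  }
  throw lastError;
}
```

Randomizing the delay spreads out retries from concurrent requests, so they do not all hit the API again at the same moment.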

🚀 Quick Start

Prerequisites

  • Node.js 18+
  • Apify account (for deployment)
  • Optional: data.gov.in API key (for enhanced access)

Installation

Local Development

# Clone repository
git clone <repository-url>
cd data-gov-in-actor
# Install dependencies
npm install
# Configure environment (optional)
cp .env.example .env
# Edit .env with your API credentials
# Run locally
npm start

Deploy to Apify

# Install Apify CLI
npm install -g apify-cli
# Login to Apify
apify login
# Deploy actor
apify push

📖 Usage Guide

Operation Modes

The actor supports four operation modes:

1. 🔍 Search Mode (Discovery)

Find datasets matching your query without downloading data.

Input Example:

{
  "mode": "search",
  "query": "agriculture production statistics 2024",
  "filters": {
    "organization": "Ministry of Agriculture",
    "sector": "Agriculture",
    "format": ["CSV", "JSON"],
    "dateFrom": "2024-01-01"
  },
  "maxResults": 50
}

Output:

  • Dataset metadata (title, description, organization, tags)
  • Resource information (formats, sizes, URLs)
  • Quality scores and relevance rankings
  • Governance validation results

2. 📥 Retrieve Mode (Data Acquisition)

Download and parse actual dataset files.

Input Example:

{
  "mode": "retrieve",
  "query": "GDP quarterly data",
  "includeResources": true,
  "resourceLimit": 3,
  "maxFileSize": 50,
  "analytics": {
    "dataQualityScore": true,
    "enableStatistics": true
  }
}

Output:

  • Full dataset metadata
  • Parsed resource data with previews (first 100 rows)
  • Data quality assessment
  • Basic statistics (mean, median, std dev)
  • Attribution text for proper citation

3. 📊 Analyze Mode (Deep Analytics)

Comprehensive analysis with statistical insights.

Input Example:

{
  "mode": "analyze",
  "datasetIds": ["dataset-id-1", "dataset-id-2"],
  "analytics": {
    "enableStatistics": true,
    "enableTrends": true,
    "dataQualityScore": true
  }
}

Output:

  • Quality assessment with recommendations
  • Resource completeness analysis
  • Trend detection for time-series data
  • PII scan results (DPDP compliance)
  • Actionable insights and warnings

4. 👁️ Monitor Mode (Platform Insights)

Track platform activity and discover trending datasets.

Input Example:

{
  "mode": "monitor",
  "maxResults": 50
}

Output:

  • Recently updated datasets (last 30 days)
  • Trending datasets (by views)
  • Active organizations and their dataset counts
  • Popular tags and categories
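
One way to picture the "recently updated (last 30 days)" output above is as a date-window filter. The sketch below assumes each dataset record carries a CKAN-style `metadata_modified` ISO timestamp; whether the actor's output uses that field name is not stated here, and `recentlyUpdated` is a hypothetical helper.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Keep only datasets whose (assumed) `metadata_modified` timestamp falls
// within the last `windowDays` days relative to `now`.
function recentlyUpdated(datasets, now = new Date(), windowDays = 30) {
  const cutoff = now.getTime() - windowDays * DAY_MS;
  return datasets.filter((d) => Date.parse(d.metadata_modified) >= cutoff);
}

// Hypothetical records for illustration.
const sample = [
  { id: "a", metadata_modified: "2025-11-01T00:00:00Z" },
  { id: "b", metadata_modified: "2025-09-01T00:00:00Z" },
];
const recent = recentlyUpdated(sample, new Date("2025-11-06T12:00:00Z"));
console.log(recent.map((d) => d.id)); // [ 'a' ]
```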

🔧 Configuration

Authentication (Optional)

For enhanced access and higher rate limits:

{
  "authentication": {
    "apiKey": "your-api-key-here",
    "enableOAuth": false,
    "oauthToken": "your-oauth-token"
  }
}

Note: Most public data is accessible without authentication.


Rate Limiting

Stay compliant with OGD Policy 2025:

{
  "rateLimit": {
    "requestsPerHour": 1800,
    "concurrentRequests": 5,
    "retryAttempts": 5
  }
}

Default: 1800 requests/hour (a 10% buffer under the 2000 requests/hour limit)
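
To make the numbers concrete, the sketch below maps the `rateLimit` input onto option names that Bottleneck documents (`maxConcurrent`, `minTime`, `reservoir`, `reservoirRefreshAmount`, `reservoirRefreshInterval`). The mapping itself, the even spacing via `minTime`, and the helper name `toLimiterOptions` are assumptions for illustration; the actor's actual limiter setup is not shown in this document.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// Translate the actor's rateLimit input into a Bottleneck-style options object.
// Bounds follow the ranges stated in the feature list (100-2000 req/hour, 1-10 concurrent).
function toLimiterOptions({ requestsPerHour = 1800, concurrentRequests = 5 } = {}) {
  if (requestsPerHour < 100 || requestsPerHour > 2000) {
    throw new RangeError("requestsPerHour must be between 100 and 2000");
  }
  if (concurrentRequests < 1 || concurrentRequests > 10) {
    throw new RangeError("concurrentRequests must be between 1 and 10");
  }
  return {
    maxConcurrent: concurrentRequests,
    // Spread requests evenly: 3,600,000 ms / 1800 requests = 2000 ms apart.
    minTime: Math.floor(HOUR_MS / requestsPerHour),
    // Hourly budget that refills once per hour.
    reservoir: requestsPerHour,
    reservoirRefreshAmount: requestsPerHour,
    reservoirRefreshInterval: HOUR_MS,
  };
}

console.log(toLimiterOptions({ requestsPerHour: 1800, concurrentRequests: 5 }).minTime); // 2000
```

With the default of 1800 requests/hour this works out to one request every 2 seconds, which keeps the run under the stated 2000 requests/hour ceiling even if a few retries are added on top.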


Governance & Compliance

Configure ethical data handling:

{
  "governance": {
    "respectLicenses": true,
    "blockRestrictedData": true,
    "enableAuditLog": true,
    "piiDetection": true
  }
}

📊 Output Format

Standard Output Structure

{
  "success": true,
  "mode": "retrieve",
  "metadata": {
    "actor": "DataGovIN Sentinel",
    "version": "1.0.0",
    "executionTime": "2025-11-06T12:00:00.000Z",
    "dataSource": "Open Government Data Platform India",
    "compliance": {
      "ogdPolicy": "OGD Policy India 2025",
      "dataProtection": "DPDP Act 2023",
      "license": "CC-BY 4.0"
    }
  },
  "results": [ /* Array of datasets */ ],
  "statistics": {
    "resultsCount": 10,
    "apiRequests": 45,
    "successRate": "97.78%"
  },
  "compliance": {
    "summary": {
      "totalAccessAttempts": 10,
      "blockedDatasets": 2,
      "licenseWarnings": 1
    }
  },
  "attribution": "Data Source: ... Retrieved from ... License: CC-BY 4.0 ..."
}
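
The `successRate` value in the sample is consistent with 44 successful calls out of the 45 `apiRequests`. The snippet below shows that arithmetic; the count of 44 and the helper name `formatSuccessRate` are assumptions, since the document does not say how the actor derives the field.

```javascript
// Format a success ratio as a percentage string with two decimals.
function formatSuccessRate(successful, total) {
  if (total === 0) return "0.00%"; // avoid dividing by zero on an empty run
  return `${((successful / total) * 100).toFixed(2)}%`;
}

console.log(formatSuccessRate(44, 45)); // "97.78%"
```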

🏗️ Architecture

Module Overview

src/
├── main.js              # Main actor entry point
├── api-client.js        # API client with rate limiting
├── search-discovery.js  # Search and discovery module
├── data-acquisition.js  # Data retrieval and parsing
├── analytics.js         # Statistical analysis engine
├── governance.js        # Compliance and ethics layer
└── utils.js             # Utility functions

Technology Stack

  • Runtime: Node.js 18+
  • Framework: Apify SDK 3.x
  • HTTP Client: Axios with axios-retry
  • Rate Limiting: Bottleneck
  • Statistics: simple-statistics
  • Data Parsing: csv-parse, xlsx
  • Queue Management: p-queue

🔐 Security & Compliance

Data Protection (DPDP Act 2023)

The actor includes PII detection using pattern matching for:

  • Aadhaar numbers (12-digit patterns)
  • PAN card numbers
  • Email addresses
  • Phone numbers

Automatic Actions:

  • Detection warnings in output
  • Optional redaction (if enabled)
  • Audit log entries
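
Pattern-based detection of the four identifier types above might look like the sketch below. The regular expressions are illustrative assumptions (for example, phone numbers are taken to be 10-digit Indian mobile numbers with an optional +91 prefix), not the actor's actual patterns, and `scanForPii` is a hypothetical helper. Heuristics like these produce false positives and negatives, so a hit is a warning to review rather than proof of personal data.

```javascript
// Illustrative patterns only; real-world validation is stricter.
const PII_PATTERNS = {
  // 12 digits, optionally grouped as 4-4-4 with spaces.
  aadhaar: /\b\d{4}\s?\d{4}\s?\d{4}\b/g,
  // PAN format: five letters, four digits, one letter.
  pan: /\b[A-Z]{5}\d{4}[A-Z]\b/g,
  email: /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g,
  // Exactly 10 digits starting with 6-9, optional +91 prefix.
  phone: /(?<!\d)(?:\+91[\s-]?)?[6-9]\d{9}(?!\d)/g,
};

// Return a map of PII kind -> number of matches found in `text`.
function scanForPii(text) {
  const findings = {};
  for (const [kind, pattern] of Object.entries(PII_PATTERNS)) {
    const matches = text.match(pattern) || [];
    if (matches.length > 0) findings[kind] = matches.length;
  }
  return findings;
}

console.log(scanForPii("Rainfall in mm: 1200")); // {}
```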

License Compliance (CC-BY 4.0)

Automatic Attribution Generation:

Data Source: "Dataset Title" by Organization Name.
Retrieved from Open Government Data Platform India (https://data.gov.in/...)
on 2025-11-06. License: CC-BY 4.0.
Attribution required under CC-BY 4.0 and OGD Policy India 2025.
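
The template above could be filled in by a small helper such as the following. The function name, parameter names, and sample values are hypothetical; only the sentence structure is taken from the attribution text shown above.

```javascript
// Build the attribution string from dataset metadata (field names are assumed).
function buildAttribution({ title, organization, url, retrievedOn }) {
  return [
    `Data Source: "${title}" by ${organization}.`,
    `Retrieved from Open Government Data Platform India (${url})`,
    `on ${retrievedOn}. License: CC-BY 4.0.`,
    "Attribution required under CC-BY 4.0 and OGD Policy India 2025.",
  ].join(" ");
}

// Placeholder values for illustration only.
console.log(
  buildAttribution({
    title: "Dataset Title",
    organization: "Organization Name",
    url: "https://data.gov.in/",
    retrievedOn: "2025-11-06",
  }),
);
```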

API Rate Limits

OGD Policy 2025 Limits:

  • Public Access: 2000 requests/hour
  • Default Setting: 1800 requests/hour (10% buffer)
  • Implementation: Bottleneck with reservoir refresh

🎓 Use Cases

1. Research & Academia

{
  "mode": "retrieve",
  "query": "education statistics literacy rates",
  "filters": {
    "organization": "Ministry of Education",
    "format": ["CSV"]
  },
  "includeResources": true,
  "analytics": {
    "enableStatistics": true,
    "enableTrends": true
  }
}

2. Policy Analysis

{
  "mode": "analyze",
  "query": "healthcare expenditure states",
  "filters": {
    "sector": "Health",
    "dateFrom": "2020-01-01"
  },
  "maxResults": 100
}

3. Journalism & Transparency

{
  "mode": "search",
  "query": "budget allocation 2024",
  "filters": {
    "organization": "Ministry of Finance"
  },
  "governance": {
    "enableAuditLog": true
  }
}

4. App Development

{
  "mode": "retrieve",
  "datasetIds": ["weather-api-dataset"],
  "includeResources": true,
  "output": {
    "format": "json"
  }
}

🐛 Troubleshooting

Common Issues

API Connection Failed

Problem: Failed to connect to data.gov.in API

Solutions:

  • Check internet connectivity
  • Verify data.gov.in is accessible (not blocked)
  • Try with API key if available
  • Check rate limit status

Rate Limit Exceeded

Problem: Rate limit exceeded. Implement backoff.

Solutions:

  • Reduce requestsPerHour in configuration
  • Decrease concurrentRequests
  • Wait for reservoir to refresh (1 hour)

Dataset Blocked

Problem: Dataset contains restricted content

Solutions:

  • Review dataset tags and description
  • Set blockRestrictedData to false if the block is a false positive
  • Check governance audit log for details

Resource Download Failed

Problem: Failed to acquire resource

Solutions:

  • Increase maxFileSize limit
  • Check resource URL availability
  • Verify format is supported
  • Review network logs

📚 API Reference

CKAN API v3 Endpoints

The actor uses data.gov.in's CKAN API:

  • Base URL: https://data.gov.in/api/3/action
  • Endpoints Used:
    • package_search - Search datasets
    • package_show - Get dataset details
    • resource_show - Get resource details
    • organization_list - List organizations
    • group_list - List sectors/groups
    • tag_list - List tags

Official Documentation: data.gov.in/apis
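
For orientation, a `package_search` request URL can be assembled from the base URL above using the standard CKAN parameters `q`, `rows`, and `start`. This is a sketch under the assumption that data.gov.in accepts these parameters at that base URL, which is not verified here; `buildPackageSearchUrl` is a hypothetical helper, and mapping the actor's `filters` onto CKAN filter queries is left out because that mapping is not documented.

```javascript
// Base URL as listed in this section.
const BASE_URL = "https://data.gov.in/api/3/action";

// Build a CKAN-style package_search URL (q = search text, rows = page size, start = offset).
function buildPackageSearchUrl({ query, maxResults = 50, offset = 0 }) {
  const url = new URL(`${BASE_URL}/package_search`);
  url.searchParams.set("q", query);
  url.searchParams.set("rows", String(maxResults));
  url.searchParams.set("start", String(offset));
  return url.toString();
}

console.log(buildPackageSearchUrl({ query: "agriculture production", maxResults: 50 }));
```

Using `URL` and `searchParams` rather than string concatenation ensures spaces and special characters in the query are encoded correctly.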


🤝 Contributing

We welcome contributions! To contribute:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Development Guidelines

  • Follow existing code style
  • Add JSDoc comments for new functions
  • Test with real data.gov.in data
  • Update documentation
  • Maintain compliance checks

📄 License

This project is licensed under the Apache License 2.0.

Data License

Data retrieved from data.gov.in is subject to:

  • Creative Commons CC-BY 4.0 (most datasets)
  • Open Government Data Policy India 2025
  • Individual dataset licenses (always check)

Attribution Required: Yes, for all CC-BY licensed datasets.


🙏 Acknowledgments

  • Open Government Data Platform India - For providing open data infrastructure
  • CKAN Project - For the open-source data portal software
  • Apify - For the actor platform and SDK
  • Indian Government - For commitment to open data transparency

📞 Support

Issues & Questions

  • GitHub Issues: [Report bugs or request features]
  • Email: support@example.com
  • Documentation: [Full API docs]


🗓️ Changelog

Version 1.0.0 (2025-11-06)

  • ✨ Initial release
  • 🔍 Search, Retrieve, Analyze, Monitor modes
  • 📊 Analytics engine with statistical analysis
  • 🔒 Governance layer with DPDP Act 2023 compliance
  • ⚡ Rate limiting and fault tolerance
  • 📝 Comprehensive documentation
  • ✅ Full OGD Policy 2025 compliance

🔮 Roadmap

Upcoming Features

  • Visualization generation (charts, maps)
  • Multi-language support (Hindi, regional languages)
  • Real-time monitoring with webhooks
  • Advanced ML-based data quality scoring
  • Integration with Google Earth Engine for geospatial data
  • Export to multiple formats (Excel, PDF reports)
  • Collaborative features (sharing, annotations)

Built with ❤️ for India's Open Data Community

Empowering citizens through transparent access to government data

