Colombia Open Data Scraper

Pricing

from $14.00 / 1,000 result items

Export records from datos.gov.co, Colombia's national open-data portal. Pull rows from any dataset resource: COVID cases, government contracts, education, transport, health, public salaries. Filter by field values, sort by column, paginate full datasets.

Developer

ParseForge

Maintained by Community

🇨🇴 Colombia Open Data Scraper

🚀 Export Colombia government datasets in seconds. Pull COVID epidemiology, public spending, healthcare, education, government contracts, public salaries, and thousands more datasets from the official datos.gov.co catalog. No login, no manual CSV stitching.

🕒 Last updated: 2026-05-22 · 📊 3 fields per record · 🇨🇴 Whole-of-government Colombia catalog · 🏛️ Hundreds of publishers

The Colombia Open Data Scraper taps the official datos.gov.co catalog and returns every row of any chosen dataset. The portal is the central transparency platform for the Colombian Government, hosting datasets from the Ministry of Health, DIAN tax authority, SECOP public-contracts system, DANE statistics agency, and dozens of other publishers.

Coverage spans health, public spending, education, security, transport, demographics, government contracts, public salaries, and environmental data. This Actor returns clean structured rows ready to download as CSV, Excel, JSON, or XML, with optional field filters and sort order applied at the source so you skip the data wrangling.

🎯 Target Audience: LATAM data analysts, health-policy researchers, anti-corruption journalists, NGOs, civic-tech builders, public-contract auditors

💡 Primary Use Cases: Health-policy dashboards, public-spending audits, COVID retrospective analyses, contract-fraud detection, education access tracking

📋 What the Colombia Open Data Scraper does

Three workflows in a single run:

  • 🏛️ Pull any dataset. Provide a 4x4 dataset identifier and the Actor streams every row.
  • 🎯 Exact-field filters. Combine column-level filters like department, sex, or status to slice the dataset.
  • 🔤 Custom sort order. Sort rows ascending or descending by any field before download.

Each row is returned with its original column structure preserved under a data object, plus the resource identifier and a timestamp.

💡 Why it matters: Colombia is one of the most transparent governments in LATAM, but the catalog is huge and the column schemas vary wildly. This Actor handles pagination, filter encoding, and refresh logic so your analysts and journalists can focus on the story.


🎬 Full Demo

🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded Colombia dataset.


⚙️ Input

| Input | Type | Default | Behavior |
| --- | --- | --- | --- |
| maxItems | integer | 10 | Rows to return. Free plan caps at 10, paid plan at 1,000,000. |
| resourceId | string | "gt2j-8ykr" | Dataset identifier (4x4 code) from datos.gov.co. |
| filters | object | {} | Exact-match column filters as a JSON object. |
| sortOrder | string | "" | Sort spec, e.g. "fecha_reporte_web DESC". |

Example: latest 50 reported COVID cases in Colombia.

{
  "maxItems": 50,
  "resourceId": "gt2j-8ykr",
  "sortOrder": "fecha_reporte_web DESC"
}

Example: female COVID cases in Bogota.

{
  "maxItems": 100,
  "resourceId": "gt2j-8ykr",
  "filters": { "departamento_nom": "BOGOTA", "sexo": "F" }
}
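
A minimal sketch of how an input object like the ones above could be assembled and sanity-checked before a run. The helper name `build_run_input` and its validation rules (a lowercase alphanumeric 4x4 pattern, a sort spec of the form `column` or `column ASC|DESC`) are assumptions made for illustration; the Actor itself may accept a wider range of values.

```python
import json
import re

# Assumed shape of a 4x4 identifier: four alphanumerics, a dash, four alphanumerics.
RESOURCE_ID_RE = re.compile(r"[a-z0-9]{4}-[a-z0-9]{4}")
# Assumed sort spec: a column name, optionally followed by ASC or DESC.
SORT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*( (ASC|DESC))?")


def build_run_input(resource_id, max_items=10, filters=None, sort_order=""):
    """Return an input dict with the four documented fields (hypothetical helper)."""
    if not RESOURCE_ID_RE.fullmatch(resource_id):
        raise ValueError(f"not a 4x4 identifier: {resource_id!r}")
    if max_items < 1:
        raise ValueError("maxItems must be at least 1")
    if sort_order and not SORT_RE.fullmatch(sort_order):
        raise ValueError(f"unexpected sort spec: {sort_order!r}")
    return {
        "maxItems": max_items,
        "resourceId": resource_id,
        "filters": dict(filters or {}),
        "sortOrder": sort_order,
    }


run_input = build_run_input(
    "gt2j-8ykr",
    max_items=100,
    filters={"departamento_nom": "BOGOTA", "sexo": "F"},
)
print(json.dumps(run_input, indent=2))
```

Checking the identifier locally catches a mistyped 4x4 code before any run is started.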

⚠️ Good to Know: dataset schemas vary by publisher and update cadence. Health datasets refresh daily, public-contract datasets can refresh hourly, and historical retrospective releases are static after publication. Always inspect the source dataset page for refresh notes before scheduling production pipelines.


📊 Output

Each row contains 3 top-level fields, with the full original column set nested in data. Download the dataset as CSV, Excel, JSON, or XML.

🧾 Schema

| Field | Type | Example |
| --- | --- | --- |
| 🏷️ resourceId | string | "gt2j-8ykr" |
| 📦 data | object | { "fecha_reporte_web": "2026-05-20T00:00:00.000", "departamento_nom": "BOGOTA", "edad": "37", "sexo": "F", ... } |
| 🕒 scrapedAt | ISO 8601 | "2026-05-22T00:00:00.000Z" |
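
Because the original columns sit inside `data`, a downstream script may want to lift them to the top level before loading rows into a spreadsheet or warehouse. The sketch below shows one way to do that; `flatten_record` and `to_csv` are hypothetical helpers, and the sample record only reuses the example values from the schema above (the remaining columns are omitted).

```python
import csv
import io


def flatten_record(record):
    """Merge the nested `data` columns with the two metadata fields."""
    row = dict(record.get("data", {}))
    row["resourceId"] = record["resourceId"]
    row["scrapedAt"] = record["scrapedAt"]
    return row


def to_csv(records):
    """Write flattened records to a CSV string; the header is the union of all columns."""
    rows = [flatten_record(r) for r in records]
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Record assembled from the example values in the schema (other columns omitted).
sample = {
    "resourceId": "gt2j-8ykr",
    "data": {
        "fecha_reporte_web": "2026-05-20T00:00:00.000",
        "departamento_nom": "BOGOTA",
        "edad": "37",
        "sexo": "F",
    },
    "scrapedAt": "2026-05-22T00:00:00.000Z",
}
print(to_csv([sample]))
```

Building the header from the union of keys keeps the export stable when some rows lack a column, which can happen when publisher schemas vary.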

📦 Sample record


✨ Why choose this Actor

  • 🇨🇴 Whole-of-government catalog. Thousands of Colombian datasets in one Actor.
  • 🎯 Server-side filters. Exact-column filters reduce the dataset before download.
  • 🔤 Custom sort order. Sort rows ascending or descending by any field.
  • 🔄 Always fresh. Every run streams the latest published rows.
  • Fast. Pages of 100 rows, automatic retries on transient errors.
  • 🔓 No login. The Colombia Open Data catalog is free and public.
  • 📦 Export anywhere. CSV, Excel, JSON, or XML straight from the Apify dataset.

📊 Whether you are auditing SECOP contracts, modelling COVID waves, or tracking education enrolment, the same Actor backs every workflow.
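
The "pages of 100 rows" behaviour can be pictured as a sequence of offset/limit windows capped by `maxItems`. The sketch below is only illustrative: `page_windows` is a hypothetical helper, and the Actor's real paging logic is not documented here.

```python
def page_windows(max_items, page_size=100):
    """Yield (offset, limit) pairs that cover max_items rows in pages of page_size."""
    offset = 0
    while offset < max_items:
        # The last page is shortened so the total never exceeds max_items.
        limit = min(page_size, max_items - offset)
        yield offset, limit
        offset += limit


print(list(page_windows(250)))  # [(0, 100), (100, 100), (200, 50)]
```

A real client would also stop early when a page comes back with fewer rows than requested, since that signals the end of the dataset.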


📈 How it compares to alternatives

| Approach | Cost | Coverage | Refresh | Filters | Setup |
| --- | --- | --- | --- | --- | --- |
| ⭐ Colombia Open Data Scraper (this Actor) | $5 free credit, then pay-per-use | Thousands of CO datasets | Live per run | filters, sort | ⚡ 2 min |
| Manual CSV downloads from datos.gov.co | Free | Whole catalog | Re-download manually | None | 🐢 Slow |
| Custom Socrata clients | Free + dev time | Catalog | Self-managed | Self-coded | 🐌 Days |
| Paid LATAM data vendors | $$$ subscription | Curated subset | Vendor cadence | Vendor schema | ⏳ Weeks |

Pick this Actor when you want server-side filtering, sort, and pagination with zero pipeline maintenance.
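
For comparison, this is roughly what the "custom Socrata client" route involves for a single page request. The sketch assumes the portal exposes the standard Socrata SODA endpoint at `https://www.datos.gov.co/resource/<id>.json` with `$limit`, `$offset`, and `$order` parameters and simple `column=value` equality filters; the host, the helper name `build_page_url`, and the mapping from this Actor's `filters` and `sortOrder` inputs are assumptions, not a description of the Actor's internals.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.datos.gov.co/resource"  # assumed SODA endpoint root


def build_page_url(resource_id, filters=None, sort_order="", limit=100, offset=0):
    """Build the URL for one page of rows (hypothetical helper)."""
    params = dict(filters or {})      # equality filters become column=value pairs
    params["$limit"] = limit
    params["$offset"] = offset
    if sort_order:
        params["$order"] = sort_order
    # Keep "$" unescaped so the SODA parameter names stay readable.
    return f"{BASE_URL}/{resource_id}.json?{urlencode(params, safe='$')}"


url = build_page_url(
    "gt2j-8ykr",
    filters={"departamento_nom": "BOGOTA", "sexo": "F"},
    sort_order="fecha_reporte_web DESC",
)
print(url)
```

A self-managed client still has to add paging loops, retries, and export on top of this.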


🚀 How to use

  1. 📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
  2. 🌐 Open the Actor. Go to the Colombia Open Data Scraper page on the Apify Store.
  3. 🎯 Set input. Paste a 4x4 dataset identifier from datos.gov.co, optionally add filters or sort order.
  4. 🚀 Run it. Click Start and let the Actor collect your rows.
  5. 📥 Download. Grab your results in the Dataset tab as CSV, Excel, JSON, or XML.

⏱️ Total time from signup to downloaded dataset: 3-5 minutes. No coding required.


💼 Business use cases

🦠 Health Policy & Epidemiology

  • COVID retrospective analyses by department
  • Vaccination uptake and coverage dashboards
  • Hospital capacity and ICU utilisation tracking
  • Disease-surveillance dashboards

💼 Public Spending & Anti-Corruption

  • SECOP contract audits and anomaly detection
  • Public-salary transparency dashboards
  • Ministry budget vs. execution tracking
  • Procurement-fraud red flags

🚍 Transport & Security

  • TransMilenio and SITP ridership trends
  • Traffic-accident analytics by city
  • Crime statistics and policing dashboards
  • Highway tolls and concession analysis

🎓 Education & Demographics

  • Enrolment and dropout tracking by region
  • ICFES test-score analyses
  • Teacher distribution and access maps
  • Census-adjacent demographic dashboards

🔌 Automating Colombia Open Data Scraper

Control the scraper programmatically for scheduled runs and pipeline integrations:

  • 🟢 Node.js. Install the apify-client NPM package.
  • 🐍 Python. Use the apify-client PyPI package.
  • 📚 See the Apify API documentation for full details.

The Apify Schedules feature lets you trigger this Actor on any cron interval. Hourly, daily, or weekly refreshes keep downstream warehouses in sync automatically.
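
A minimal Python sketch of a programmatic run, assuming the `apify-client` package is installed and that its `call()` method returns the run as a dict (as in the 1.x releases). The Actor ID is deliberately left to an environment variable because it is not stated in this document; `APIFY_ACTOR_ID`, `APIFY_TOKEN`, and `fetch_rows` are names chosen for this example.

```python
import os

RUN_INPUT = {
    "maxItems": 50,
    "resourceId": "gt2j-8ykr",
    "sortOrder": "fecha_reporte_web DESC",
}


def fetch_rows(token, actor_id, run_input):
    """Run the Actor once and return its dataset items as a list."""
    # Imported lazily so this module loads even where apify-client is missing.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


if __name__ == "__main__":
    token = os.environ.get("APIFY_TOKEN")
    actor_id = os.environ.get("APIFY_ACTOR_ID")  # copy the ID from the Actor's Store page
    if token and actor_id:
        rows = fetch_rows(token, actor_id, RUN_INPUT)
        print(f"fetched {len(rows)} rows")
    else:
        print("set APIFY_TOKEN and APIFY_ACTOR_ID to run this example")
```

The same function can be called from a scheduler or pipeline task when a platform-side schedule is not suitable.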


🌟 Beyond business use cases

Colombia open data fuels more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.

🎓 Research and academia

  • Public-health studies using ministry datasets
  • LATAM political-economy theses
  • Education-access research using ICFES microdata
  • Reproducible studies with cited dataset pulls

🎨 Personal and creative

  • Indie data-journalism projects
  • Side projects exploring Colombian history and demographics
  • Personal dashboards on public spending
  • Hobbyist epidemiology tracker tools

🤝 Non-profit and civic

  • Anti-corruption transparency dashboards on SECOP
  • NGO targeting using health and education data
  • Election-adjacent civic analyses
  • Investigative journalism on ministry budgets

🧪 Experimentation

  • Train Spanish-language NLP on government text
  • Prototype civic-tech apps with real public data
  • Validate health-tech product hypotheses with ministry data
  • Build geospatial demos for Colombian departments


❓ Frequently Asked Questions

🧩 How does it work?

Paste a 4x4 dataset identifier from datos.gov.co, optionally add column filters and a sort order, and the Actor streams every matching row to your Apify dataset with the original schema preserved.

🔎 Where do I find a dataset identifier?

Open the dataset page on datos.gov.co. The identifier is the short alphanumeric code shown in the dataset URL after /resource/, formatted as four characters, dash, four characters (for example gt2j-8ykr).
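
If you collect dataset links in bulk, the identifier can be pulled out of each URL with a small helper. This sketch assumes identifiers are lowercase alphanumerics and appear as their own path segment, optionally followed by an extension such as `.json`; `extract_resource_id` is a hypothetical name.

```python
import re
from urllib.parse import urlparse

FOUR_BY_FOUR = re.compile(r"[a-z0-9]{4}-[a-z0-9]{4}")


def extract_resource_id(url):
    """Return the first path segment that looks like a 4x4 code, or None."""
    for segment in urlparse(url).path.split("/"):
        candidate = segment.split(".", 1)[0]  # drop a trailing .json / .csv
        if FOUR_BY_FOUR.fullmatch(candidate):
            return candidate
    return None


print(extract_resource_id("https://www.datos.gov.co/resource/gt2j-8ykr.json"))  # gt2j-8ykr
```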

📏 Does it cover every Colombian dataset?

The Actor works with any dataset published to the central catalog by national ministries, SECOP, DIAN, DANE, and partner publishers.

🔁 How fresh is the data?

Each run pulls the latest rows published at run time. The publishing cadence depends on the source publisher, ranging from hourly contract feeds to daily health updates.

⏰ Can I schedule regular runs?

Yes. Use Apify Schedules to refresh your dataset on any cron interval and keep downstream pipelines in sync.

🎯 Can I filter the rows?

Yes. Use filters for exact-column matches and sortOrder for ascending or descending order by any field.

🆔 What does the 4x4 identifier mean?

It is the canonical short ID assigned by datos.gov.co to every published dataset. Two groups of four characters separated by a dash, used in dataset URLs and the data feed.

💼 Can I use this data commercially?

Yes, in most cases. The datos.gov.co catalog is published under permissive open licenses, and most datasets allow commercial reuse with attribution. Always review the specific dataset license for attribution requirements. You are responsible for downstream compliance in your own product.

💳 Do I need a paid Apify plan to use this Actor?

No. The free Apify plan is enough for testing and small runs (10 rows per run). A paid plan lifts the limit and unlocks scheduling, higher concurrency, and bigger datasets.

🔁 What happens if a run fails or gets interrupted?

Apify automatically retries transient errors. Partial datasets are preserved so you never lose progress.

🆘 What if I need help?

Our support team is here to help. Reach us through the Apify platform or via the Tally form linked below.


🔌 Integrate with any app

Colombia Open Data Scraper connects to any cloud service via Apify integrations:

  • Make - Automate multi-step workflows
  • Zapier - Connect with 5,000+ apps
  • Slack - Get run notifications in your channels
  • Airbyte - Pipe Colombia datasets into your warehouse
  • GitHub - Trigger runs from commits and releases
  • Google Drive - Export datasets straight to Sheets

You can also use webhooks to trigger downstream actions when a run finishes. Push fresh Colombia rows into your product backend, or alert your team in Slack.
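
On the receiving side, a webhook handler mostly needs to find out which dataset the finished run wrote to. The sketch below assumes Apify's default webhook payload, where `eventType` names the event and `resource` carries the run object with its `defaultDatasetId`; if you customise the payload template, adjust the keys accordingly. The function name and the dataset ID in the example are made up.

```python
def dataset_id_from_webhook(payload):
    """Return the default dataset ID for a succeeded run, or None otherwise."""
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    return (payload.get("resource") or {}).get("defaultDatasetId")


# Trimmed example payload; real payloads carry more fields.
example = {
    "eventType": "ACTOR.RUN.SUCCEEDED",
    "resource": {"defaultDatasetId": "example-dataset-id"},
}
print(dataset_id_from_webhook(example))  # example-dataset-id
```

Ignoring every event other than a succeeded run keeps failed or aborted runs from triggering a downstream load.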


💡 Pro Tip: browse the complete ParseForge collection for more reference-data scrapers.


🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.


⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by the Government of Colombia, the Colombian open-data portal, or any publishing agency. All trademarks mentioned are the property of their respective owners. Only publicly available open data is collected.