NASA Exoplanet Archive Scraper

Developer: Compute Edge
Pricing: Pay per usage
Last modified: 4 days ago

Extract confirmed exoplanet data from the NASA Exoplanet Archive TAP API. Filter by discovery method and year. Returns orbital period, mass, radius, equilibrium temperature, and stellar data.
Extract confirmed exoplanet data from the NASA Exoplanet Archive — Query the world's most comprehensive confirmed exoplanet dataset via the Table Access Protocol (TAP) API. This Actor retrieves exoplanet discovery data, orbital parameters, and stellar information for over 5,500 confirmed planets without any authentication required.
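The Actor's internal query is not published, but the archive's public TAP endpoint is documented, so an equivalent request can be sketched. The snippet below builds a synchronous TAP URL against the ps table (one row per published parameter set; the exact column list and WHERE shape are assumptions mirroring the fields this Actor returns):

```python
from urllib.parse import urlencode

TAP_SYNC = "https://exoplanetarchive.ipac.caltech.edu/TAP/sync"

def build_tap_url(method=None, start_year=1992, end_year=2026, max_results=500):
    """Build a sync TAP query URL for the confirmed-planets (ps) table."""
    where = [f"disc_year BETWEEN {start_year} AND {end_year}"]
    if method:
        where.append(f"discoverymethod = '{method}'")
    adql = (
        f"SELECT TOP {max_results} pl_name, hostname, disc_year, discoverymethod, "
        "pl_orbper, pl_bmasse, pl_rade, pl_eqt, sy_snum, sy_pnum "
        "FROM ps WHERE " + " AND ".join(where) + " ORDER BY disc_year DESC"
    )
    return TAP_SYNC + "?" + urlencode({"query": adql, "format": "json"})

print(build_tap_url(method="Transit", start_year=2020, end_year=2023, max_results=20))
```

Fetching that URL returns the same kind of JSON rows the Actor stores in its dataset.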
What data can you scrape from NASA Exoplanet Archive?
| Field | Description |
|---|---|
| pl_name | Exoplanet name (e.g., "Kepler-1167 b") |
| hostname | Host star name (e.g., "Kepler-1167") |
| disc_year | Year of discovery (integer) |
| discoverymethod | Discovery method (Transit, Radial Velocity, Direct Imaging, etc.) |
| pl_orbper | Orbital period in days (float) |
| pl_bmasse | Planet mass in Earth masses (float) |
| pl_rade | Planet radius in Earth radii (float) |
| pl_eqt | Equilibrium temperature in Kelvin (integer) |
| sy_snum | Number of stars in the system (integer) |
| sy_pnum | Number of planets in the system (integer) |
| url | NASA Exoplanet Archive overview page URL |
Why use NASA Exoplanet Archive Scraper?
- No authentication required — Direct access to NASA's public scientific data
- Comprehensive dataset — 5,500+ confirmed exoplanets with standardized fields
- Flexible filtering — Filter by discovery method, discovery year range, and result limit
- Research-ready data — All fields follow NASA's standardized variable naming conventions
- Fast API access — Direct fetch from TAP API with no browser overhead
- Scientific accuracy — Data curated and maintained by NASA's Exoplanet Archive team
- Perfect for analysis — Ideal input for exoplanet trend analysis, statistical research, or educational visualization
How to scrape NASA Exoplanet Archive data
- Go to the NASA Exoplanet Archive Scraper on Apify Store
- Click Try for free
- (Optional) Set Discovery Method (e.g., Transit, Radial Velocity, Direct Imaging); leave empty to include all methods
- Set Start Year to filter planets discovered in this year or later (default: 1992, the year of the first confirmed exoplanet discovery)
- Set End Year to filter planets discovered in this year or earlier (default: 2026)
- Set Max Results to limit the number of exoplanets returned (default: 500, max: 10,000)
- Click Start and wait for the API to return results
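Beyond the Store UI, Actors can also be started over the Apify API. This sketch uses the standard run-sync-get-dataset-items endpoint with only the standard library; the Actor ID shown is a placeholder, so substitute the real one from this page:

```python
import json
from urllib import request
from urllib.parse import urlencode

APIFY_BASE = "https://api.apify.com/v2"

def run_sync_url(actor_id: str, token: str) -> str:
    """Endpoint that starts a run and returns its dataset items in one call."""
    return (f"{APIFY_BASE}/acts/{actor_id}"
            f"/run-sync-get-dataset-items?{urlencode({'token': token})}")

def fetch_exoplanets(actor_id: str, token: str, run_input: dict) -> list:
    """POST the run input and parse the returned dataset items."""
    req = request.Request(
        run_sync_url(actor_id, token),
        data=json.dumps(run_input).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

# Hypothetical Actor ID; replace with the real one before running:
# items = fetch_exoplanets(
#     "computeedge~nasa-exoplanet-archive-scraper", "MY_APIFY_TOKEN",
#     {"discoveryMethod": "Transit", "startYear": 2020,
#      "endYear": 2023, "maxResults": 20})
```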
Input example
{
  "discoveryMethod": "Transit",
  "startYear": 2020,
  "endYear": 2023,
  "maxResults": 20
}
Expected: 20 exoplanets discovered via the Transit method between 2020 and 2023, sorted by discovery year (newest first).
Output example
Each exoplanet returns a JSON object like this:
{
  "pl_name": "TOI-1695 b",
  "hostname": "TOI-1695",
  "disc_year": 2023,
  "discoverymethod": "Transit",
  "pl_orbper": 5.26,
  "pl_bmasse": 4.87,
  "pl_rade": 1.98,
  "pl_eqt": 874,
  "sy_snum": 1,
  "sy_pnum": 3,
  "url": "https://exoplanetarchive.ipac.caltech.edu/overview/TOI-1695%20b"
}
You can download the dataset in various formats such as JSON, HTML, CSV, or Excel.
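The download formats map to the Apify dataset-items endpoint's format parameter. A small helper (illustrative; the dataset ID comes from a finished run) could build the export URL like this:

```python
from urllib.parse import urlencode

def dataset_export_url(dataset_id: str, fmt: str = "csv", token: str = "") -> str:
    """Build the Apify API URL that exports a dataset's items in one format."""
    if fmt not in {"json", "csv", "xlsx", "html"}:
        raise ValueError(f"unsupported format: {fmt}")
    params = {"format": fmt}
    if token:
        params["token"] = token
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{urlencode(params)}"

print(dataset_export_url("abc123", "csv"))
# https://api.apify.com/v2/datasets/abc123/items?format=csv
```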
Data field reference
Planet Properties
- pl_name: Exoplanet name (string)
- hostname: Host star name (string)
- disc_year: Discovery year (integer, 1992-present)
- discoverymethod: How the planet was discovered (string, enum)
- pl_orbper: Orbital period in days (float)
- pl_bmasse: Planet mass in Earth masses (float)
- pl_rade: Planet radius in Earth radii (float)
- pl_eqt: Equilibrium temperature in Kelvin (integer)
System Properties
- sy_snum: Number of stars in the system (integer)
- sy_pnum: Number of planets in the system (integer)
Discovery Methods Filter
Common discovery methods include:
- Transit — Exoplanet transit (planet passes in front of star)
- Radial Velocity — Doppler shift of host star
- Direct Imaging — Direct observation of planet's light
- Microlensing — Gravitational lensing effect
- Timing Variations — Pulsar/eclipse timing anomalies
- TTV — Transit Timing Variations
- Other — Less common detection methods
How much does it cost to scrape NASA Exoplanet Archive?
This Actor uses direct API fetch (no browser, no crawling), so compute costs are extremely low.
| Scenario | Results | Est. Compute | Est. Actor Fee |
|---|---|---|---|
| Quick sample (20 planets) | 20 | ~$0.001 | ~$0.02 |
| Medium query (500 planets) | 500 | ~$0.01 | ~$0.50 |
| Maximum extraction (10,000 planets) | 10,000 | ~$0.05 | ~$10.00 |
Actor pricing: $0.001 per result + minimal compute costs (usually under $0.01 per run). A typical extraction of 100 exoplanets costs under $0.15 total.
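The pricing above is simple enough to estimate up front. A rough calculator, using the per-result fee and the typical per-run compute figure quoted here as assumptions:

```python
def estimate_cost(n_results: int, fee_per_result: float = 0.001,
                  compute_per_run: float = 0.01) -> float:
    """Rough total run cost: per-result Actor fee plus a flat compute estimate."""
    return n_results * fee_per_result + compute_per_run

print(round(estimate_cost(100), 2))  # 0.11 -> under the $0.15 quoted above
```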
Tips for efficient scraping
- Use discovery method filter — Narrow your query to a specific detection method to reduce results and cost. For example, "Transit" returns ~3,500 planets; "Radial Velocity" returns ~900.
- Set realistic year ranges — Discovery rate varies by year. Between 2010-2015, ~1,000 planets were discovered. Recent years (2020+) show ~1,000+ per year.
- Batch multiple queries — If you need all exoplanets, run one Actor with discoveryMethod left empty instead of separate runs per method.
- Schedule recurring updates — Set up a weekly or monthly schedule to track newly discovered exoplanets.
FAQ & support
Q: How current is this data?
A: The NASA Exoplanet Archive is updated continuously as new discoveries are confirmed. This Actor queries the live API, so data is current within hours of confirmation.
Q: What if I need stellar properties (temperature, mass, radius)?
A: The current Actor focuses on planet and system properties. To get stellar data, you would need to query additional TAP tables (e.g., stars) and merge the results. Contact support if you need this feature.
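Until such a feature exists, the merge can be done client-side. The stellar column names below (st_teff, st_mass, st_rad) are real archive fields, but the helper and its sample values are illustrative; it simply left-joins a second query's rows onto the planet rows by hostname:

```python
def merge_stellar(planets, stars):
    """Left-join planet rows with stellar rows on hostname.

    `stars` rows are assumed to come from a second TAP query selecting
    e.g. hostname, st_teff, st_mass, st_rad.
    """
    by_host = {s["hostname"]: s for s in stars}
    merged = []
    for p in planets:
        row = dict(p)
        extra = by_host.get(p["hostname"], {})
        row.update({k: v for k, v in extra.items() if k != "hostname"})
        merged.append(row)
    return merged

# Illustrative sample values, not archive measurements:
planets = [{"pl_name": "TOI-1695 b", "hostname": "TOI-1695"}]
stars = [{"hostname": "TOI-1695", "st_teff": 3690}]
print(merge_stellar(planets, stars))
# [{'pl_name': 'TOI-1695 b', 'hostname': 'TOI-1695', 'st_teff': 3690}]
```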
Q: Can I filter by planet properties (e.g., mass > 10 Earth masses)?
A: The current Actor supports year and method filtering. Custom SQL WHERE clauses could be added in a future version. Please open an issue on GitHub if this is important for your use case.
Q: Is this legal to use?
A: Yes. NASA's Exoplanet Archive data is public domain and explicitly designed for research, education, and analysis. No terms-of-service restrictions apply.
Q: Can I use this data commercially?
A: Yes. NASA public data has no copyright restrictions. You may use, modify, and commercialize the data as needed.
Need a custom solution?
This Actor provides a flexible foundation for exoplanet research. If you need:
- Custom TAP query syntax
- Integration with other NASA datasets
- Real-time webhooks on new discoveries
- Scheduled extractions with versioning
Please contact support or open an issue on the Actor's GitHub repository.
Data attribution
Data sourced from the NASA Exoplanet Archive, a service of the NASA Exoplanet Science Institute at the California Institute of Technology, funded by NASA's Science Mission Directorate.
Citation: NASA Exoplanet Science Institute (2024). Exoplanet Archive. Retrieved from https://exoplanetarchive.ipac.caltech.edu
Resources
- Quick Start guide for building your first Actor
- Video tutorial on building a scraper using CheerioCrawler
- Written tutorial on building a scraper using CheerioCrawler
- Web scraping with Cheerio in 2023
- How to scrape a dynamic page using Cheerio
- Integration with Zapier, Make, Google Drive and others
- Video guide on getting data using Apify API

Creating Actors with templates

Getting started
For complete information see this article. To run the Actor use the following command:

$ apify run

Deploy to Apify

Connect Git repository to Apify
If you've created a Git repository for the project, you can easily connect it to Apify:
- Go to the Actor creation page
- Click on the Link Git Repository button

Push project on your local machine to Apify
You can also deploy the project on your local machine to Apify without the need for a Git repository.
1. Log in to Apify. You will need to provide your Apify API token to complete this action.

   $ apify login

2. Deploy your Actor. This command will deploy and build the Actor on the Apify platform. You can find your newly created Actor under Actors -> My Actors.

   $ apify push

Documentation reference
To learn more about Apify and Actors, take a look at the following resources:
- Apify SDK - toolkit for building Actors
- Crawlee - web scraping and browser automation library
- Input schema - define and easily validate a schema for your Actor's input
- Dataset - store structured data where each object stored has the same attributes
- Cheerio - a fast, flexible & elegant library for parsing and manipulating HTML and XML
- Proxy configuration - rotate IP addresses to prevent blocking