Nasa Reports Scraper
Pricing
from $0.00005 / actor start
Access NASA's Technical Reports Server (NTRS) with an automated scraper that collects scientific papers, conference proceedings, journal articles, and research reports. It provides structured metadata for researchers, scientists, and academics who need large-scale access to NASA's technical publications.
Developer: Akash Kumar Naik
NASA Technical Reports Scraper (NTRS)
Extract structured metadata from NASA Technical Reports Server (NTRS) search results, with optional citation API enrichment and optional PDF downloads.
What This Actor Does
This Actor searches ntrs.nasa.gov for a keyword query and stores report metadata to the default dataset.
It is designed for research workflows where you need:
- Repeatable NTRS collection for a query
- Cleaned metadata fields (title, authors, dates, categories)
- Optional PDF URL capture and optional PDF file download
- Optional API enrichment for better field completeness
Why Use This Actor
- Handles dynamic NTRS search pages with Playwright
- Moves through pagination reliably (in-page "Next page" flow)
- Deduplicates by citation ID during a run
- Backfills missing fields from `https://ntrs.nasa.gov/api/citations/{id}` when enabled
- Supports optional pay-per-event (PPE) charging attempts per scraped record
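The backfill step can be pictured as a merge that only fills fields the page scrape left empty. The sketch below is illustrative rather than the Actor's actual code: `citation_api_url`, `is_missing`, and `backfill` are hypothetical helper names, and it assumes the API response has already been fetched and mapped to the dataset field names. Only the URL pattern comes from this README.

```python
from typing import Any

# URL pattern quoted in this README.
API_TEMPLATE = "https://ntrs.nasa.gov/api/citations/{id}"


def citation_api_url(citation_id: str) -> str:
    """Build the citation API URL for one report (hypothetical helper)."""
    return API_TEMPLATE.format(id=citation_id)


def is_missing(value: Any) -> bool:
    """Treat None, empty strings, and empty containers as missing."""
    return value is None or value == "" or value == [] or value == {}


def backfill(scraped: dict[str, Any], api_fields: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `scraped` with missing fields filled from `api_fields`.

    Assumption: `api_fields` already uses the dataset field names; the real
    API response shape is not described in this README.
    """
    merged = dict(scraped)
    for key, value in api_fields.items():
        if is_missing(merged.get(key)) and not is_missing(value):
            merged[key] = value
    return merged


page_record = {"id": "20040015106", "title": "Searching for Alien Life Having Unearthly Biochemistry", "abstract": ""}
api_record = {"title": "Not used, the page already had a title", "abstract": "Text from the API."}
print(citation_api_url(page_record["id"]))
print(backfill(page_record, api_record))
```

Values scraped from the page win over API values here, so the API only ever fills gaps. Whether the Actor applies the same precedence is not stated.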
Typical Use Cases
- Literature review and bibliography building
- Space-tech trend analysis across years/topics
- Building downstream datasets for NLP or search indexing
- Monitoring new reports for a recurring query
Input
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `searchQuery` | string | Yes | `"alien"` | Search phrase sent to NTRS |
| `maxResults` | integer | No | `100` | Maximum items to save. `0` means unlimited |
| `startPage` | integer | No | `1` | Start scraping from this search page |
| `downloadPDFs` | boolean | No | `true` | Download PDF binaries to key-value store when available |
| `proxyConfiguration` | object | No | `{ "useApifyProxy": false }` | Apify proxy settings |
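To make the defaults and constraints in the table concrete, here is a minimal validation sketch. `normalize_input` is a hypothetical helper, not part of the Actor. It treats `searchQuery` as strictly required (so the listed `"alien"` default is not applied), which is a choice of this sketch rather than documented behavior.

```python
from typing import Any

# Defaults for the optional fields, copied from the input table.
OPTIONAL_DEFAULTS: dict[str, Any] = {
    "maxResults": 100,
    "startPage": 1,
    "downloadPDFs": True,
    "proxyConfiguration": {"useApifyProxy": False},
}


def _is_int(value: Any) -> bool:
    """True for integers, excluding booleans (bool is a subclass of int)."""
    return isinstance(value, int) and not isinstance(value, bool)


def normalize_input(raw: dict[str, Any]) -> dict[str, Any]:
    """Apply table defaults and basic type/range checks (hypothetical helper)."""
    cfg = {**OPTIONAL_DEFAULTS, **raw}
    query = cfg.get("searchQuery")
    if not isinstance(query, str) or not query.strip():
        raise ValueError("searchQuery must be a non-empty string")
    if not _is_int(cfg["maxResults"]) or cfg["maxResults"] < 0:
        raise ValueError("maxResults must be an integer >= 0 (0 means unlimited)")
    if not _is_int(cfg["startPage"]) or cfg["startPage"] < 1:
        raise ValueError("startPage must be an integer >= 1")
    if not isinstance(cfg["downloadPDFs"], bool):
        raise ValueError("downloadPDFs must be a boolean")
    return cfg


print(normalize_input({"searchQuery": "mars sample return", "maxResults": 50}))
```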
Example Input
Metadata-first run

    {
        "searchQuery": "mars sample return",
        "maxResults": 50,
        "startPage": 1,
        "downloadPDFs": false
    }
Include PDF downloads

    {
        "searchQuery": "lunar habitat",
        "maxResults": 25,
        "downloadPDFs": true
    }
Output
Dataset item
Each item includes fields such as:
- `title`
- `url`
- `id`
- `documentId`
- `abstract`
- `documentType`
- `authors`
- `publicationDate`
- `dateAcquired`
- `subjectCategory`
- `acquisitionSource`
- `reportNumber`
- `organization`
- `distributionLimits`
- `copyright`
- `meetingInformation` (when available)
- `pdfUrl` (only if a PDF URL exists)
- `downloadedPdfPath` (only if PDF was downloaded)
Example:

    {
        "title": "Searching for Alien Life Having Unearthly Biochemistry",
        "url": "https://ntrs.nasa.gov/citations/20040015106",
        "id": "20040015106",
        "documentType": "Preprint (Draft being sent to journal)",
        "authors": "Jones, Harry\n(NASA Ames Research Center Moffett Field, CA, United States)",
        "publicationDate": "January 1, 2003",
        "organization": "NASA Ames Research Center",
        "pdfUrl": "https://ntrs.nasa.gov/api/citations/20040015106/downloads/20040015106.pdf?attachment=true"
    }
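Downstream code often needs `authors` and `publicationDate` in a more structured form than the strings shown in the example. The following sketch parses both, based only on the single example item above. `parse_publication_date` and `split_authors` are hypothetical helpers, and other records may use formats this sketch does not cover.

```python
from datetime import date, datetime
from typing import Optional


def parse_publication_date(text: str) -> Optional[date]:
    """Parse dates like "January 1, 2003"; return None for other formats.

    Month names are matched using the current locale (English by default).
    """
    try:
        return datetime.strptime(text.strip(), "%B %d, %Y").date()
    except ValueError:
        return None


def split_authors(text: str) -> list[dict[str, str]]:
    """Split the `authors` string into name/affiliation pairs.

    Assumption: each author name sits on its own line and may be followed
    by a line holding the affiliation in parentheses, as in the example.
    """
    authors: list[dict[str, str]] = []
    for line in (part.strip() for part in text.split("\n")):
        if not line:
            continue
        if line.startswith("(") and line.endswith(")") and authors:
            # Attach the parenthesized line to the most recent author.
            authors[-1]["affiliation"] = line[1:-1]
        else:
            authors.append({"name": line, "affiliation": ""})
    return authors


item = {
    "authors": "Jones, Harry\n(NASA Ames Research Center Moffett Field, CA, United States)",
    "publicationDate": "January 1, 2003",
}
print(split_authors(item["authors"]))
print(parse_publication_date(item["publicationDate"]))
```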
Key-value store SUMMARY
The Actor also writes a SUMMARY record with:
- `searchQuery`
- `totalResults`
- `totalPagesProcessed`
- `lastPageVisited`
- `usePPE`
- `ppeEventName`
- `ppeChargeAttempts`
- `completed`
- `timestamp`
How To Run
On Apify platform
- Open the Actor input.
- Set `searchQuery` and any optional input fields.
- Run the Actor.
- Read results in the Dataset and `SUMMARY` in the key-value store.
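For scripted runs, the same input can be sent through the Apify REST API instead of the Console. The sketch below uses only the Python standard library and the `run-sync-get-dataset-items` endpoint. The Actor ID and token are placeholders you would need to supply, and `build_run_request` and `run_actor` are hypothetical helper names. A synchronous call like this likely suits only short runs, so a long or unlimited run is better started in the Console.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, actor_input: dict) -> urllib.request.Request:
    """Build a POST request that runs an Actor and returns its dataset items."""
    url = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def run_actor(actor_id: str, token: str, actor_input: dict) -> list:
    """Send the request and decode the JSON list of dataset items."""
    request = build_run_request(actor_id, token, actor_input)
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.load(response)


# Placeholder values: replace with the real Actor ID and your API token,
# then call run_actor(...) to execute the run over the network.
example = build_run_request(
    "your-username~your-actor-name",
    "YOUR_APIFY_TOKEN",
    {"searchQuery": "mars sample return", "maxResults": 50, "downloadPDFs": False},
)
print(example.get_method(), example.full_url)
```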
Pricing Notes
- End-user pricing is controlled in Apify Console publication settings.
- This Actor always attempts one PPE charge per scraped report (`ntrs-report-scraped`).
- Billable PPE still requires the Actor to be configured for pay-per-event in Apify Console.
FAQ
Why are some fields empty?
Some NTRS records do not provide complete metadata, so fields can remain empty even after backfilling from the citations API.
Why is pdfUrl empty?
Many records do not have a downloadable PDF. The Actor keeps pdfUrl empty in that case.
Are duplicates removed?
Yes, duplicates are skipped by citation id within a run.
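Because deduplication only applies within a run, merging several runs (for example a recurring query) can still produce repeats. A minimal sketch of the same idea applied downstream is shown here. `dedupe_by_id` is a hypothetical helper, and keeping records that lack an `id` is an assumption of this sketch, not documented Actor behavior.

```python
from typing import Any, Iterable, Iterator


def dedupe_by_id(records: Iterable[dict[str, Any]]) -> Iterator[dict[str, Any]]:
    """Yield each record once, keyed by its citation `id`, keeping the first seen.

    Records without an `id` cannot be compared, so they are passed through.
    """
    seen: set[str] = set()
    for record in records:
        citation_id = record.get("id")
        if not citation_id:
            yield record
            continue
        if citation_id in seen:
            continue
        seen.add(citation_id)
        yield record


run_a = [{"id": "20040015106", "title": "Searching for Alien Life Having Unearthly Biochemistry"}]
run_b = [{"id": "20040015106", "title": "Searching for Alien Life Having Unearthly Biochemistry"}]
print(list(dedupe_by_id(run_a + run_b)))
```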
Limitations
- Data quality depends on NASA NTRS source records.
- PDF availability is source-dependent.
- Very large unlimited runs (`maxResults = 0`) can take a long time.
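The cost of an unlimited run follows from how a result cap interacts with pagination: with a cap, paging can stop early, while `0` forces every page to be visited. The sketch below illustrates that stopping rule with a lazy page source. `take_results` is a hypothetical helper and does not reflect the Actor's internal implementation.

```python
from typing import Iterable, Iterator


def take_results(pages: Iterable[list[dict]], max_results: int) -> Iterator[dict]:
    """Yield items page by page, stopping once `max_results` items are yielded.

    A `max_results` of 0 means no cap, so every page is consumed.
    """
    saved = 0
    for page in pages:
        for item in page:
            yield item
            saved += 1
            # Stop before the next page is requested from the page source.
            if max_results and saved >= max_results:
                return


def fake_pages(page_count: int, page_size: int) -> Iterator[list[dict]]:
    """Stand-in for real search pages; prints when a page is 'fetched'."""
    for number in range(1, page_count + 1):
        print(f"fetching page {number}")
        yield [{"id": f"{number}-{index}"} for index in range(page_size)]


capped = list(take_results(fake_pages(5, 10), max_results=15))
print(len(capped))
```

With a cap of 15 and 10 items per page, only the first two pages are fetched. With `max_results=0`, all five would be.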
Legal
This project is not affiliated with or endorsed by NASA. Use this Actor in compliance with NASA NTRS terms and applicable laws.
Support
- Apify docs: https://docs.apify.com
- NTRS: https://ntrs.nasa.gov