Nasa Reports Scraper

Pricing

from $0.00005 / actor start

Access NASA's Technical Reports Server (NTRS) with an automated scraper that collects scientific papers, conference proceedings, journal articles, and research reports. It provides structured metadata for researchers, scientists, and academics who need large-scale access to NASA's technical publications.

Rating

0.0

(0)

Developer

Akash Kumar Naik

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 2 days ago

NASA Technical Reports Scraper (NTRS)

Extract structured metadata from NASA Technical Reports Server (NTRS) search results, with optional enrichment from the citation API and optional PDF downloads.

What This Actor Does

This Actor searches ntrs.nasa.gov for a keyword query and stores the report metadata in the default dataset.

It is designed for research workflows where you need:

  • Repeatable NTRS collection for a query
  • Cleaned metadata fields (title, authors, dates, categories)
  • Optional PDF URL capture and optional PDF file download
  • Optional API enrichment for better field completeness

Why Use This Actor

  • Handles dynamic NTRS search pages with Playwright
  • Moves through pagination reliably (in-page "Next page" flow)
  • Deduplicates by citation ID during a run
  • Backfills missing fields from https://ntrs.nasa.gov/api/citations/{id} when enabled
  • Supports optional pay-per-event (PPE) charging attempts per scraped record
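
The deduplication and backfill behavior listed above can be sketched in a few lines of Python. This is a minimal illustration, not the Actor's actual code: the function names (citation_api_url, backfill, dedupe_by_id) are hypothetical, and only the URL pattern is taken from the list above.

```python
from typing import Dict, Iterable, Iterator

# URL pattern quoted in the feature list; everything else here is illustrative.
API_TEMPLATE = "https://ntrs.nasa.gov/api/citations/{id}"

EMPTY_VALUES = (None, "", [])


def citation_api_url(citation_id: str) -> str:
    """Build the citation API URL for one record."""
    return API_TEMPLATE.format(id=citation_id)


def backfill(scraped: Dict, api_record: Dict) -> Dict:
    """Fill only the empty or missing fields of `scraped` from `api_record`."""
    merged = dict(scraped)
    for key, value in api_record.items():
        if merged.get(key) in EMPTY_VALUES and value not in EMPTY_VALUES:
            merged[key] = value
    return merged


def dedupe_by_id(records: Iterable[Dict]) -> Iterator[Dict]:
    """Yield each record once, keyed on its citation `id`, preserving order."""
    seen = set()
    for record in records:
        if record["id"] in seen:
            continue
        seen.add(record["id"])
        yield record
```

Values scraped from the page take precedence in this sketch; the API record is consulted only where the page left a gap.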

Typical Use Cases

  • Literature review and bibliography building
  • Space-tech trend analysis across years/topics
  • Building downstream datasets for NLP or search indexing
  • Monitoring new reports for a recurring query

Input

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| searchQuery | string | Yes | "alien" | Search phrase sent to NTRS |
| maxResults | integer | No | 100 | Maximum items to save. 0 means unlimited |
| startPage | integer | No | 1 | Start scraping from this search page |
| downloadPDFs | boolean | No | true | Download PDF binaries to key-value store when available |
| proxyConfiguration | object | No | { "useApifyProxy": false } | Apify proxy settings |
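
To show how the defaults in the table combine with a partial input, here is a small Python sketch. The helper normalize_input is hypothetical, and the range checks (no negative maxResults, startPage of at least 1) are assumptions rather than documented Actor behavior.

```python
import copy

# Defaults copied from the input table above.
DEFAULTS = {
    "maxResults": 100,
    "startPage": 1,
    "downloadPDFs": True,
    "proxyConfiguration": {"useApifyProxy": False},
}


def normalize_input(raw: dict) -> dict:
    """Apply the documented defaults and basic sanity checks to a raw input."""
    query = str(raw.get("searchQuery", "")).strip()
    if not query:
        raise ValueError("searchQuery is required")
    # Deep-copy the defaults so callers cannot mutate the shared proxy dict.
    merged = {**copy.deepcopy(DEFAULTS), **raw, "searchQuery": query}
    # Assumed checks: the table only states that 0 means unlimited.
    if merged["maxResults"] < 0:
        raise ValueError("maxResults must be 0 (unlimited) or a positive integer")
    if merged["startPage"] < 1:
        raise ValueError("startPage must be at least 1")
    return merged
```

Note that downloadPDFs defaults to true, so a metadata-only run has to set it to false explicitly.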

Example Input

Metadata-first run

{
  "searchQuery": "mars sample return",
  "maxResults": 50,
  "startPage": 1,
  "downloadPDFs": false
}

Include PDF downloads

{
  "searchQuery": "lunar habitat",
  "maxResults": 25,
  "downloadPDFs": true
}

Output

Dataset item

Each item includes fields such as:

  • title
  • url
  • id
  • documentId
  • abstract
  • documentType
  • authors
  • publicationDate
  • dateAcquired
  • subjectCategory
  • acquisitionSource
  • reportNumber
  • organization
  • distributionLimits
  • copyright
  • meetingInformation (when available)
  • pdfUrl (only if a PDF URL exists)
  • downloadedPdfPath (only if PDF was downloaded)

Example:

{
  "title": "Searching for Alien Life Having Unearthly Biochemistry",
  "url": "https://ntrs.nasa.gov/citations/20040015106",
  "id": "20040015106",
  "documentType": "Preprint (Draft being sent to journal)",
  "authors": "Jones, Harry\n(NASA Ames Research Center Moffett Field, CA, United States)",
  "publicationDate": "January 1, 2003",
  "organization": "NASA Ames Research Center",
  "pdfUrl": "https://ntrs.nasa.gov/api/citations/20040015106/downloads/20040015106.pdf?attachment=true"
}
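
In the example item, authors is a single string with the affiliation in parentheses on its own line, and publicationDate is a long-form English date. For downstream analysis it can help to normalize both. The sketch below assumes that every record follows the same layout as this one example (a name line followed by an optional parenthesized affiliation line); the helper names are hypothetical.

```python
from datetime import datetime
from typing import Dict, List


def parse_publication_date(text: str) -> str:
    """Convert a date such as 'January 1, 2003' to ISO format."""
    return datetime.strptime(text.strip(), "%B %d, %Y").date().isoformat()


def parse_authors(text: str) -> List[Dict[str, str]]:
    """Split the authors string into name/affiliation pairs.

    Assumes each author is a name line, optionally followed by a
    line holding the affiliation in parentheses.
    """
    authors: List[Dict[str, str]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("(") and line.endswith(")") and authors:
            # Attach the affiliation to the most recent author.
            authors[-1]["affiliation"] = line[1:-1]
        else:
            authors.append({"name": line, "affiliation": ""})
    return authors
```

Dates in other formats would raise ValueError here, so a production pipeline would probably want a fallback that keeps the raw string.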

Key-value store SUMMARY

The Actor also writes a SUMMARY record with:

  • searchQuery
  • totalResults
  • totalPagesProcessed
  • lastPageVisited
  • usePPE
  • ppeEventName
  • ppeChargeAttempts
  • completed
  • timestamp
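
One possible use of the SUMMARY record is to decide whether a follow-up run is needed. The sketch below is an illustration only: it assumes completed is a boolean and lastPageVisited is a 1-based page number compatible with the startPage input, neither of which is confirmed above. The function name follow_up_input is hypothetical.

```python
from typing import Optional


def follow_up_input(summary: dict) -> Optional[dict]:
    """Return input for a follow-up run, or None if the previous run completed."""
    if summary.get("completed"):
        return None
    return {
        "searchQuery": summary["searchQuery"],
        # Re-visit the last page in case it was only partly processed.
        "startPage": max(1, int(summary.get("lastPageVisited") or 1)),
    }
```

Because duplicates are only skipped within a single run, a follow-up run that re-visits the last page may store some items again; deduplicate on id when merging datasets.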

How To Run

On the Apify platform

  1. Open the Actor's input form.
  2. Set searchQuery and any optional fields.
  3. Run the Actor.
  4. Read the results in the dataset and the SUMMARY record in the key-value store.
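
The same steps can also be scripted. The following is a sketch using the apify-client Python package, not something this page documents: the actor_id argument is a placeholder you must look up in Apify Console, and the dictionary-style access to the run result assumes a client version that returns plain dicts.

```python
from typing import List, Optional, Tuple


def run_ntrs_scraper(
    token: str, actor_id: str, run_input: dict
) -> Tuple[List[dict], Optional[dict]]:
    """Run the Actor, wait for it to finish, and return (items, summary)."""
    # Imported lazily so this module loads without the package installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    # Steps 1-3: submit the input and wait for the run to finish.
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return a result")

    # Step 4: read the dataset items and the SUMMARY record.
    items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
    store = client.key_value_store(run["defaultKeyValueStoreId"])
    record = store.get_record("SUMMARY")
    summary = record["value"] if record else None
    return items, summary
```

A call would look like run_ntrs_scraper(token, actor_id, {"searchQuery": "lunar habitat", "maxResults": 25, "downloadPDFs": False}), with token and actor_id supplied from your own account.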

Pricing Notes

  • End-user pricing is controlled in Apify Console publication settings.
  • This Actor always attempts one PPE charge per scraped report, using the event name ntrs-report-scraped.
  • Billable PPE still requires the Actor to be configured for pay-per-event in Apify Console.

FAQ

Why are some fields empty?

Some NTRS records do not provide complete metadata. Even with API enrichment enabled, the source data can still be incomplete.

Why is pdfUrl empty?

Many records do not have a downloadable PDF. The Actor keeps pdfUrl empty in that case.

Are duplicates removed?

Yes, duplicates are skipped by citation id within a run.

Limitations

  • Data quality depends on NASA NTRS source records.
  • PDF availability is source-dependent.
  • Very large unlimited runs (maxResults = 0) can take a long time.

This project is not affiliated with or endorsed by NASA. Use this Actor in compliance with NASA NTRS terms and applicable laws.
