Archive.org advanced search

Pricing

$15.00/month + usage

Developed by

Maged

Maintained by Community

A powerful, fast, and advanced search API for archive.org that leverages its API for fast and accurate results, with all the filters supported in archive.org's advanced search.

5.0 (1)

Last modified

18 days ago

Overview

Welcome to the Archive.org Advanced Search, a powerful Apify Actor built to unlock the full potential of the Internet Archive's vast digital repository! This cutting-edge tool empowers you to perform highly customizable searches across millions of archived web pages, books, audio files, videos, and more, using a flexible and intuitive interface. Whether you're a researcher, historian, data analyst, or simply a curious explorer, this Actor delivers precise results tailored to your needs with support for advanced filters, pagination, and sorting.

Why Choose This Actor?

  • Unmatched Flexibility: Search by title, creator, description, collection, media type, and up to 5 custom fields with customizable operators (e.g., "contains" or "not").
  • Precision Control: Fine-tune your queries with exact dates or date ranges, ensuring you find exactly what you're looking for.
  • Efficient Pagination: Retrieve results in batches (up to 1000 items per page) and navigate multiple pages effortlessly.
  • Dynamic Sorting: Sort results by fields like publicdate or downloads in ascending or descending order.

This Actor leverages the Archive.org search API to deliver fast, reliable, and structured data, making it an indispensable tool for scraping, analyzing, or archiving digital content. Start exploring the depths of digital history today!

Features

  • Advanced Search Filters: Target specific fields with operators like "contains" and "not".
  • Custom Fields Support: Add up to 5 custom search criteria for niche queries.
  • Date Precision: Search by exact dates (YYYY-MM-DD) or date ranges.
  • Pagination Support: Control the number of results per page and page number.
  • Sorting Options: Customize result ordering based on your preferences.
  • Detailed Output: Receive structured JSON data with metadata like titles, dates, and URLs.
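
To make the filter model concrete, here is a minimal sketch of how field filters with "contains"/"not" operators and a date range could be combined into a Lucene-style query string of the kind Archive.org's advanced search accepts. The helper name `build_query` and the translation rules are assumptions for illustration only; the Actor's actual internals are not documented here.

```python
def build_query(filters, date_range=None):
    """Build a Lucene-style query from (field, operator, value) filters.

    Hypothetical helper: the operator names mirror this Actor's inputs,
    but the translation is an assumption, not the Actor's real code.
    """
    clauses = []
    for field, operator, value in filters:
        if not value:
            continue  # an empty value means "no filter on this field"
        if operator not in ("contains", "is", "not"):
            raise ValueError(f"unsupported operator: {operator}")
        clause = f"{field}:({value})"
        if operator == "not":
            clause = f"NOT {clause}"
        clauses.append(clause)
    if date_range:
        start, end = date_range
        clauses.append(f"date:[{start} TO {end}]")
    return " AND ".join(clauses)


query = build_query(
    [
        ("title", "contains", "learn spanish"),
        ("mediatype", "not", "texts"),
        ("creator", "contains", ""),  # skipped because the value is empty
    ],
    date_range=("1994-06-06", "2024-02-06"),
)
print(query)
```

Empty values are skipped so that unused filters do not contribute clauses, which matches the idea that every input field is optional.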

Usage

This Actor is designed to run on the Apify platform. No local installation is required! Simply:

  1. Sign up or log in to Apify.
  2. Search for "Archive.org Advanced Search" in the Apify Store.
  3. Configure the input parameters via the Apify UI or an INPUT.json file.
  4. Run the Actor and download the results from the dataset.

For local development or testing, clone the repository, install dependencies (e.g., apify-client, httpx), and use the apify run command with a valid INPUT.json.
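
The Actor can also be started programmatically. The sketch below uses the `apify-client` Python package and assumes that `call()` returns the run as a dictionary with a `defaultDatasetId` key. The Actor ID is a placeholder (the real "username/actor-name" is shown on the Actor's Store page), the `APIFY_TOKEN` environment variable is an assumed convention, and the helper name `run_search` is hypothetical.

```python
import os


def run_search(client, actor_id, run_input):
    """Start the Actor, wait for it to finish, and return its dataset items."""
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


if __name__ == "__main__" and os.environ.get("APIFY_TOKEN"):
    from apify_client import ApifyClient  # pip install apify-client

    # Placeholder ID: replace with the real "username/actor-name".
    ACTOR_ID = "<username>/<actor-name>"
    client = ApifyClient(os.environ["APIFY_TOKEN"])
    # Assumes omitted fields fall back to their documented defaults.
    items = run_search(
        client,
        ACTOR_ID,
        {"title_value": "learn spanish", "hits_per_page": 10},
    )
    for item in items:
        print(item.get("title"), item.get("url"))
```

Passing the client in as an argument keeps `run_search` easy to exercise with a stub object before spending paid usage on real runs.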

Output

The Actor stores results in the default Apify dataset, accessible as JSON. See the Output Example for a sample response.

Input Documentation

The Actor accepts a JSON object with the following attributes. All fields are optional unless specified.

| Attribute | Type | Description | Default Value | Editor | Constraints/Options |
|---|---|---|---|---|---|
| any_field_value | string | Search term to match across all fields. | "" | textfield | None |
| any_field_operator | string | Operator for the 'Any Field' search term. | "contains" | select | ["contains", "not"] |
| title_value | string | Search term to match in the title field. | "" | textfield | None |
| title_operator | string | Operator for the 'Title' search term. | "contains" | select | ["contains", "not"] |
| creator_value | string | Search term to match in the creator field. | "" | textfield | None |
| creator_operator | string | Operator for the 'Creator' search term. | "contains" | select | ["contains", "not"] |
| description_value | string | Search term to match in the description field. | "" | textfield | None |
| description_operator | string | Operator for the 'Description' search term. | "contains" | select | ["contains", "not"] |
| collection_value | string | Search term to match in the collection field. | "" | textfield | None |
| collection_operator | string | Operator for the 'Collection' search term. | "contains" | select | ["contains", "not"] |
| mediatype_value | string | Search term to match in the mediatype field. | "" | textfield | None |
| mediatype_operator | string | Operator for the 'Media Type' search term. | "is" | select | ["is", "not"] |
| custom_field_1_name | string | Name of the first custom field to search. | "" | textfield | None |
| custom_field_1_value | string | Value of the first custom field to search. | "" | textfield | None |
| custom_field_1_operator | string | Operator for the first custom field search term. | "contains" | select | ["contains", "not"] |
| custom_field_2_name | string | Name of the second custom field to search. | "" | textfield | None |
| custom_field_2_value | string | Value of the second custom field to search. | "" | textfield | None |
| custom_field_2_operator | string | Operator for the second custom field search term. | "contains" | select | ["contains", "not"] |
| custom_field_3_name | string | Name of the third custom field to search. | "" | textfield | None |
| custom_field_3_value | string | Value of the third custom field to search. | "" | textfield | None |
| custom_field_3_operator | string | Operator for the third custom field search term. | "contains" | select | ["contains", "not"] |
| custom_field_4_name | string | Name of the fourth custom field to search. | "" | textfield | None |
| custom_field_4_value | string | Value of the fourth custom field to search. | "" | textfield | None |
| custom_field_4_operator | string | Operator for the fourth custom field search term. | "contains" | select | ["contains", "not"] |
| custom_field_5_name | string | Name of the fifth custom field to search. | "" | textfield | None |
| custom_field_5_value | string | Value of the fifth custom field to search. | "" | textfield | None |
| custom_field_5_operator | string | Operator for the fifth custom field search term. | "contains" | select | ["contains", "not"] |
| date | string | Exact date to match (format: YYYY-MM-DD). | "" | textfield | Must match ^\d{4}-\d{2}-\d{2}$ |
| date_range_start | string | Start date of the range to match (format: YYYY-MM-DD). | "" | textfield | Must match ^\d{4}-\d{2}-\d{2}$ |
| date_range_end | string | End date of the range to match (format: YYYY-MM-DD). | "" | textfield | Must match ^\d{4}-\d{2}-\d{2}$ |
| hits_per_page | integer | Number of items to return per page. | 50 | number | Min: 1, Max: 1000 |
| page | integer | Page number to fetch (1-based). | 1 | number | Min: 1 |
| sort_name | string | Field to sort by (e.g., publicdate, downloads). | "" | textfieldNone | None |
| sort_value | string | Sort direction (ascending or descending). | "" | select | ["asc", "desc"] |
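
The constraints listed for the input attributes can be checked locally before a run is started. The following is a minimal sketch that encodes only the rules stated in this documentation (date pattern, `hits_per_page` and `page` bounds, allowed operators and sort directions). The function name `validate_input` is hypothetical and the check is not part of the Actor itself.

```python
import re

DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")
DATE_FIELDS = ("date", "date_range_start", "date_range_end")


def validate_input(run_input):
    """Return a list of problems found in an input dict (empty list = OK)."""
    problems = []
    for name in DATE_FIELDS:
        value = run_input.get(name, "")
        if value and not DATE_RE.match(value):
            problems.append(f"{name} must look like YYYY-MM-DD")
    if not 1 <= run_input.get("hits_per_page", 50) <= 1000:
        problems.append("hits_per_page must be between 1 and 1000")
    if run_input.get("page", 1) < 1:
        problems.append("page must be >= 1")
    if run_input.get("sort_value", "") not in ("", "asc", "desc"):
        problems.append("sort_value must be 'asc' or 'desc'")
    for key, value in run_input.items():
        if key.endswith("_operator"):
            # mediatype uses "is"/"not"; every other field uses "contains"/"not".
            allowed = ("is", "not") if key == "mediatype_operator" else ("contains", "not")
            if value not in allowed:
                problems.append(f"{key} must be one of {allowed}")
    return problems


bad_input = {"date": "2019-5-10", "hits_per_page": 5000, "title_operator": "is"}
for problem in validate_input(bad_input):
    print(problem)
```

Returning a list of messages rather than raising on the first error lets a caller report every problem in one pass.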

Input Example

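Below is an example INPUT.json file demonstrating a search for items whose title contains "learn spanish", with a date range from 1994-06-06 to 2024-02-06, sorted by download count in descending order. The media type filter is left empty, so results are not restricted to audio. The example also sets an exact date (2019-05-10) alongside the date range.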

{
"any_field_value": "",
"any_field_operator": "contains",
"title_value": "learn spanish",
"title_operator": "contains",
"creator_value": "",
"creator_operator": "not",
"description_value": "",
"description_operator": "not",
"collection_value": "",
"collection_operator": "contains",
"mediatype_value": "",
"mediatype_operator": "is",
"custom_field_1_name": "",
"custom_field_1_value": "",
"custom_field_1_operator": "contains",
"custom_field_2_name": "",
"custom_field_2_value": "",
"custom_field_2_operator": "contains",
"custom_field_3_name": "",
"custom_field_3_value": "",
"custom_field_3_operator": "contains",
"custom_field_4_name": "",
"custom_field_4_value": "",
"custom_field_4_operator": "contains",
"custom_field_5_name": "",
"custom_field_5_value": "",
"custom_field_5_operator": "contains",
"date": "2019-05-10",
"date_range_start": "1994-06-06",
"date_range_end": "2024-02-06",
"hits_per_page": 50,
"page": 1,
"sort_name": "downloads",
"sort_value": "desc"
}
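
Because `hits_per_page` and `page` are plain input fields, collecting more than one page means re-running with the same input and an incremented `page`. The sketch below shows that loop under two assumptions: `fetch_page` is a caller-supplied function (hypothetical) that runs the Actor with a given input and returns its items, and a page shorter than `hits_per_page` is treated as the last one. A stub stands in for real runs here.

```python
def iter_all_results(base_input, fetch_page, max_pages=10):
    """Yield items page by page until a short page or max_pages is reached."""
    hits_per_page = base_input.get("hits_per_page", 50)
    for page in range(1, max_pages + 1):
        page_input = {**base_input, "page": page}
        items = fetch_page(page_input)
        yield from items
        if len(items) < hits_per_page:
            break  # fewer items than requested: assume this was the last page


# Stand-in for real Actor runs: serves 5 fake items, 2 per page.
fake_items = [{"identifier": f"item-{i}"} for i in range(5)]


def fake_fetch(page_input):
    size = page_input["hits_per_page"]
    start = (page_input["page"] - 1) * size
    return fake_items[start:start + size]


results = list(
    iter_all_results({"title_value": "learn spanish", "hits_per_page": 2}, fake_fetch)
)
print(len(results))
```

The `max_pages` cap is a deliberate safety limit, since each page corresponds to a separate billed run.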

Output Example

The Actor stores each result as a JSON object in the Apify dataset, so the dataset exports as a JSON array. Below is a sample output for the above input, assuming a successful API response.

[
{
"index": "prod-o-001",
"service_backend": "metadata",
"hit_type": "item",
"identifier": "podcast_learn-spanish-with-daily-podca_1052684843",
"filename": "",
"file_basename": "",
"page_num": 0,
"file_creation_mtime": 0,
"updated_on": "",
"created_on": "",
"mediatype": "collection",
"title": "Learn Spanish with daily podcasts",
"publicdate": "2019-06-15T11:08:02Z",
"downloads": 3826,
"collection": [
"podcasts",
"audio"
],
"subject": [
"podcast",
"itunes",
"apple"
],
"addeddate": "2019-06-15T11:08:02Z",
"description": "L",
"result_in_subfile": false,
"__href__": "",
"highlight": [],
"_score": null,
"url": "https://archive.org/details/podcast_learn-spanish-with-daily-podca_1052684843"
},
{
"index": "prod-o-001",
"service_backend": "metadata",
"hit_type": "item",
"identifier": "lp_listen-learn-spanish_no-artist",
"filename": "",
"file_basename": "",
"page_num": 0,
"file_creation_mtime": 0,
"updated_on": "",
"created_on": "",
"mediatype": "audio",
"title": "Listen & Learn Spanish",
"publicdate": "2020-11-09T08:43:19Z",
"downloads": 2546,
"collection": [
"album_recordings",
"vinyl_bostonpubliclibrary",
"audio_music",
"unlockedrecordings"
],
"subject": [
"Non-Music",
"Speech",
"Education"
],
"addeddate": "2020-12-02T02:37:01Z",
"description": "T",
"result_in_subfile": false,
"__href__": "",
"highlight": [],
"_score": null,
"url": "https://archive.org/details/lp_listen-learn-spanish_no-artist"
}
]
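
Once the items are downloaded, they can be flattened for use in a spreadsheet. The following sketch converts a list of items shaped like the sample above into CSV text, keeping a handful of columns. The function name `items_to_csv` and the column selection are illustrative choices, and the sample record is abbreviated from the output shown above.

```python
import csv
import io


def items_to_csv(items, columns=("identifier", "title", "mediatype", "downloads", "url")):
    """Flatten dataset items into CSV text, keeping only the chosen columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for item in items:
        # Missing keys become empty cells; list fields such as "collection" are dropped.
        writer.writerow({col: item.get(col, "") for col in columns})
    return buffer.getvalue()


sample = [
    {
        "identifier": "lp_listen-learn-spanish_no-artist",
        "title": "Listen & Learn Spanish",
        "mediatype": "audio",
        "downloads": 2546,
        "collection": ["album_recordings", "audio_music"],
        "url": "https://archive.org/details/lp_listen-learn-spanish_no-artist",
    },
]
print(items_to_csv(sample))
```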