
Archive.org advanced search
1 day trial then $9.00/month - No credit card required now

A powerful, fast advanced search API for archive.org that leverages the archive.org API for accurate results, with all the filters supported in archive.org's advanced search.
Actor Metrics
2 monthly users
Created in Mar 2025
Modified a day ago
Overview
Welcome to the Archive.org Advanced Search, a powerful Apify Actor built to unlock the full potential of the Internet Archive's vast digital repository! This cutting-edge tool empowers you to perform highly customizable searches across millions of archived web pages, books, audio files, videos, and more, using a flexible and intuitive interface. Whether you're a researcher, historian, data analyst, or simply a curious explorer, this Actor delivers precise results tailored to your needs with support for advanced filters, pagination, and sorting.
Why Choose This Actor?
- Unmatched Flexibility: Search by title, creator, description, collection, media type, and up to 5 custom fields with customizable operators (e.g., "contains" or "not").
- Precision Control: Fine-tune your queries with exact dates or date ranges, ensuring you find exactly what you're looking for.
- Efficient Pagination: Retrieve results in batches (up to 1000 items per page) and navigate multiple pages effortlessly.
- Dynamic Sorting: Sort results by fields like `publicdate` or `downloads` in ascending or descending order.
This Actor leverages the Archive.org search API to deliver fast, reliable, and structured data, making it an indispensable tool for scraping, analyzing, or archiving digital content. Start exploring the depths of digital history today!
Features
- Advanced Search Filters: Target specific fields with operators like "contains" and "not".
- Custom Fields Support: Add up to 5 custom search criteria for niche queries.
- Date Precision: Search by exact dates (YYYY-MM-DD) or date ranges.
- Pagination Support: Control the number of results per page and page number.
- Sorting Options: Customize result ordering based on your preferences.
- Detailed Output: Receive structured JSON data with metadata like titles, dates, and URLs.
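To make the operator semantics concrete, here is a minimal sketch of how "contains", "is", and "not" filters could be combined into a Lucene-style query string of the kind Archive.org's advanced search accepts. The function name `build_query`, the clause syntax, and the rule that a date range takes precedence over an exact date are all assumptions for illustration; this page does not document how the Actor actually builds its query.

```python
# Standard fields listed in the input table (besides the "any field" term).
FIELDS = ["title", "creator", "description", "collection", "mediatype"]


def build_query(params: dict) -> str:
    """Combine non-empty filters into one AND-joined query string (illustrative only)."""
    clauses = []

    def add(field, value, operator):
        value = (value or "").strip()
        if not value:
            return  # filters with an empty value are skipped
        prefix = f"{field}:" if field else ""
        clause = f"{prefix}({value})"
        # "not" negates the clause; "contains" and "is" are treated the same here.
        clauses.append(f"NOT {clause}" if operator == "not" else clause)

    add(None, params.get("any_field_value"), params.get("any_field_operator", "contains"))
    for field in FIELDS:
        add(field, params.get(f"{field}_value"), params.get(f"{field}_operator", "contains"))
    for i in range(1, 6):  # up to 5 custom fields
        name = (params.get(f"custom_field_{i}_name") or "").strip()
        if name:
            add(name, params.get(f"custom_field_{i}_value"),
                params.get(f"custom_field_{i}_operator", "contains"))

    # Assumption: a complete date range wins over an exact date.
    start, end = params.get("date_range_start"), params.get("date_range_end")
    if start and end:
        clauses.append(f"date:[{start} TO {end}]")
    elif params.get("date"):
        clauses.append(f"date:{params['date']}")
    return " AND ".join(clauses)
```

For example, a title filter of "learn spanish" plus a `mediatype` of "audio" would yield two clauses joined by `AND`, while a creator filter with the "not" operator would yield a `NOT creator:(...)` clause.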
Usage
This Actor is designed to run on the Apify platform. No local installation is required! Simply:
- Sign up or log in to Apify.
- Search for "Archive.org Advanced Search" in the Apify Store.
- Configure the input parameters via the Apify UI or an `INPUT.json` file.
- Run the Actor and download the results from the dataset.
For local development or testing, clone the repository, install dependencies (e.g., `apify-client`, `httpx`), and use the `apify run` command with a valid `INPUT.json`.
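Runs can also be started from a script. The sketch below uses the `apify-client` Python package; the helper names `make_run_input` and `run_search` are hypothetical, and the Actor ID and API token must be supplied by you, since neither appears on this page.

```python
def make_run_input(title: str, hits_per_page: int = 50, page: int = 1) -> dict:
    """Build a small run input using attribute names from the input table."""
    return {
        "title_value": title,
        "title_operator": "contains",
        "hits_per_page": hits_per_page,
        "page": page,
        "sort_name": "downloads",
        "sort_value": "desc",
    }


def run_search(token: str, actor_id: str, run_input: dict) -> list:
    """Start the Actor, wait for the run to finish, and return the dataset items."""
    # Imported lazily so the helper above works without the package installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    # Results are read from the run's default dataset.
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

A call such as `run_search(token, actor_id, make_run_input("learn spanish"))` would then return a list of dictionaries, where `actor_id` is the ID shown on the Actor's page in the Apify Console.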
Output
The Actor stores results in the default Apify dataset, accessible as JSON. See the Output Example for a sample response.
Input Documentation
The Actor accepts a JSON object with the following attributes. All fields are optional unless specified.
Attribute | Type | Description | Default Value | Editor | Constraints/Options |
---|---|---|---|---|---|
any_field_value | string | Search term to match across all fields. | "" | textfield | None |
any_field_operator | string | Operator for the 'Any Field' search term. | "contains" | select | ["contains", "not"] |
title_value | string | Search term to match in the title field. | "" | textfield | None |
title_operator | string | Operator for the 'Title' search term. | "contains" | select | ["contains", "not"] |
creator_value | string | Search term to match in the creator field. | "" | textfield | None |
creator_operator | string | Operator for the 'Creator' search term. | "contains" | select | ["contains", "not"] |
description_value | string | Search term to match in the description field. | "" | textfield | None |
description_operator | string | Operator for the 'Description' search term. | "contains" | select | ["contains", "not"] |
collection_value | string | Search term to match in the collection field. | "" | textfield | None |
collection_operator | string | Operator for the 'Collection' search term. | "contains" | select | ["contains", "not"] |
mediatype_value | string | Search term to match in the mediatype field. | "" | textfield | None |
mediatype_operator | string | Operator for the 'Media Type' search term. | "is" | select | ["is", "not"] |
custom_field_1_name | string | Name of the first custom field to search. | "" | textfield | None |
custom_field_1_value | string | Value of the first custom field to search. | "" | textfield | None |
custom_field_1_operator | string | Operator for the first custom field search term. | "contains" | select | ["contains", "not"] |
custom_field_2_name | string | Name of the second custom field to search. | "" | textfield | None |
custom_field_2_value | string | Value of the second custom field to search. | "" | textfield | None |
custom_field_2_operator | string | Operator for the second custom field search term. | "contains" | select | ["contains", "not"] |
custom_field_3_name | string | Name of the third custom field to search. | "" | textfield | None |
custom_field_3_value | string | Value of the third custom field to search. | "" | textfield | None |
custom_field_3_operator | string | Operator for the third custom field search term. | "contains" | select | ["contains", "not"] |
custom_field_4_name | string | Name of the fourth custom field to search. | "" | textfield | None |
custom_field_4_value | string | Value of the fourth custom field to search. | "" | textfield | None |
custom_field_4_operator | string | Operator for the fourth custom field search term. | "contains" | select | ["contains", "not"] |
custom_field_5_name | string | Name of the fifth custom field to search. | "" | textfield | None |
custom_field_5_value | string | Value of the fifth custom field to search. | "" | textfield | None |
custom_field_5_operator | string | Operator for the fifth custom field search term. | "contains" | select | ["contains", "not"] |
date | string | Exact date to match (format: YYYY-MM-DD). | "" | textfield | Must match ^\d{4}-\d{2}-\d{2}$ |
date_range_start | string | Start date of the range to match (format: YYYY-MM-DD). | "" | textfield | Must match ^\d{4}-\d{2}-\d{2}$ |
date_range_end | string | End date of the range to match (format: YYYY-MM-DD). | "" | textfield | Must match ^\d{4}-\d{2}-\d{2}$ |
hits_per_page | integer | Number of items to return per page. | 50 | number | Min: 1, Max: 1000 |
page | integer | Page number to fetch (1-based). | 1 | number | Min: 1 |
sort_name | string | Field to sort by (e.g., publicdate, downloads). | "" | textfield | None |
sort_value | string | Sort direction (ascending or descending). | "" | select | ["asc", "desc"] |
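The constraints in the table can be checked on the client side before a run is started. This is a minimal sketch: `validate_input` is a hypothetical helper, not part of the Actor, and the Actor may apply its own validation in addition.

```python
import re

DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")
DATE_KEYS = ("date", "date_range_start", "date_range_end")


def validate_input(params: dict) -> list:
    """Return a list of human-readable problems; an empty list means the input passed."""
    errors = []

    # Date fields are optional, but must match YYYY-MM-DD when present.
    for key in DATE_KEYS:
        value = params.get(key, "")
        if value and not DATE_RE.match(value):
            errors.append(f"{key} must match YYYY-MM-DD")

    hits = params.get("hits_per_page", 50)
    if not isinstance(hits, int) or not 1 <= hits <= 1000:
        errors.append("hits_per_page must be an integer from 1 to 1000")

    page = params.get("page", 1)
    if not isinstance(page, int) or page < 1:
        errors.append("page must be an integer >= 1")

    # mediatype uses "is"/"not"; every other operator uses "contains"/"not".
    for key, value in params.items():
        if key.endswith("_operator"):
            allowed = ("is", "not") if key == "mediatype_operator" else ("contains", "not")
            if value not in allowed:
                errors.append(f"{key} must be one of {list(allowed)}")

    if params.get("sort_value", "") not in ("", "asc", "desc"):
        errors.append("sort_value must be 'asc' or 'desc'")
    return errors
```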
Input Example
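Below is an example `INPUT.json` file that searches for items whose title contains "learn spanish" and sorts the results by downloads in descending order. It sets both an exact date (2019-05-10) and a date range (1994-06-06 to 2024-02-06); the remaining filters are left empty.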
```json
{
  "any_field_value": "",
  "any_field_operator": "contains",
  "title_value": "learn spanish",
  "title_operator": "contains",
  "creator_value": "",
  "creator_operator": "not",
  "description_value": "",
  "description_operator": "not",
  "collection_value": "",
  "collection_operator": "contains",
  "mediatype_value": "",
  "mediatype_operator": "is",
  "custom_field_1_name": "",
  "custom_field_1_value": "",
  "custom_field_1_operator": "contains",
  "custom_field_2_name": "",
  "custom_field_2_value": "",
  "custom_field_2_operator": "contains",
  "custom_field_3_name": "",
  "custom_field_3_value": "",
  "custom_field_3_operator": "contains",
  "custom_field_4_name": "",
  "custom_field_4_value": "",
  "custom_field_4_operator": "contains",
  "custom_field_5_name": "",
  "custom_field_5_value": "",
  "custom_field_5_operator": "contains",
  "date": "2019-05-10",
  "date_range_start": "1994-06-06",
  "date_range_end": "2024-02-06",
  "hits_per_page": 50,
  "page": 1,
  "sort_name": "downloads",
  "sort_value": "desc"
}
```
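Since the input selects one page by number, collecting more results than fit on a page presumably means repeating the run with `page` incremented. The helper below only generates the per-page inputs; `page_inputs` is a hypothetical name, and how many pages actually exist depends on the search.

```python
import math


def page_inputs(base_input: dict, total_wanted: int, hits_per_page: int = 50):
    """Yield one input dict per page needed to cover total_wanted results."""
    if not 1 <= hits_per_page <= 1000:
        raise ValueError("hits_per_page must be between 1 and 1000")
    pages = math.ceil(total_wanted / hits_per_page)
    for page in range(1, pages + 1):  # page numbers are 1-based
        # Copy the base input so the caller's dict is never mutated.
        yield {**base_input, "hits_per_page": hits_per_page, "page": page}
```

Each yielded dictionary can be passed as the input of a separate run; iteration can stop early once a run returns fewer items than `hits_per_page`.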
Output Example
The Actor stores its results as JSON items in the Apify dataset. Below is a sample output for the above input, assuming a successful API response.
```json
[
  {
    "index": "prod-o-001",
    "service_backend": "metadata",
    "hit_type": "item",
    "identifier": "podcast_learn-spanish-with-daily-podca_1052684843",
    "filename": "",
    "file_basename": "",
    "page_num": 0,
    "file_creation_mtime": 0,
    "updated_on": "",
    "created_on": "",
    "mediatype": "collection",
    "title": "Learn Spanish with daily podcasts",
    "publicdate": "2019-06-15T11:08:02Z",
    "downloads": 3826,
    "collection": [
      "podcasts",
      "audio"
    ],
    "subject": [
      "podcast",
      "itunes",
      "apple"
    ],
    "addeddate": "2019-06-15T11:08:02Z",
    "description": "L",
    "result_in_subfile": false,
    "__href__": "",
    "highlight": [],
    "_score": null,
    "url": "https://archive.org/details/podcast_learn-spanish-with-daily-podca_1052684843"
  },
  {
    "index": "prod-o-001",
    "service_backend": "metadata",
    "hit_type": "item",
    "identifier": "lp_listen-learn-spanish_no-artist",
    "filename": "",
    "file_basename": "",
    "page_num": 0,
    "file_creation_mtime": 0,
    "updated_on": "",
    "created_on": "",
    "mediatype": "audio",
    "title": "Listen & Learn Spanish",
    "publicdate": "2020-11-09T08:43:19Z",
    "downloads": 2546,
    "collection": [
      "album_recordings",
      "vinyl_bostonpubliclibrary",
      "audio_music",
      "unlockedrecordings"
    ],
    "subject": [
      "Non-Music",
      "Speech",
      "Education"
    ],
    "addeddate": "2020-12-02T02:37:01Z",
    "description": "T",
    "result_in_subfile": false,
    "__href__": "",
    "highlight": [],
    "_score": null,
    "url": "https://archive.org/details/lp_listen-learn-spanish_no-artist"
  }
]
```
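Once the dataset has been downloaded as JSON, the items can be processed like any list of dictionaries. The sketch below uses two items trimmed from the sample output; `summarize` is a hypothetical helper, and it relies only on the `title`, `mediatype`, and `downloads` keys shown above.

```python
from collections import Counter


def summarize(items: list) -> dict:
    """Count items per mediatype and list titles ordered by downloads (highest first)."""
    by_type = Counter(item.get("mediatype", "unknown") for item in items)
    ranked = sorted(items, key=lambda item: item.get("downloads", 0), reverse=True)
    return {
        "count_by_mediatype": dict(by_type),
        "titles_by_downloads": [item.get("title", "") for item in ranked],
    }


# Two items trimmed from the sample output, deliberately out of order.
sample = [
    {"title": "Listen & Learn Spanish", "mediatype": "audio", "downloads": 2546},
    {"title": "Learn Spanish with daily podcasts", "mediatype": "collection", "downloads": 3826},
]
summary = summarize(sample)
```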