Streamline your web scraping, crawling, automation, or data processing workloads with Apify's specialized data storages. Easily maintain queues of URLs to crawl, store screenshots, save scraping results, and export them to formats such as JSON, CSV, or Excel.
Benefits
Designed especially for web scraping and crawling
Traditional database systems are not well suited for web scraping and crawling operations, and they can become prohibitively expensive for large workloads or fail to scale altogether. Apify provides low-cost storage carefully designed for these types of operations.
Enterprise-grade reliability, performance, and scalability
Store a few records or a few hundred million, with the same low latency and high reliability. Apify storages follow industry best practices and use Amazon Web Services for the underlying data storage, giving you high availability and peace of mind.
Dataset
Store results from your web scraping, crawling, or data processing jobs in Apify datasets and export them to various formats such as JSON, CSV, XML, RSS, Excel, or HTML.
Datasets are ideal for storing lists of items, such as products from an online store or contact details of prospective customers. The advanced formatting and filtering options let you easily integrate datasets into your data pipeline.
For example, a dataset item with a name of "Apple iPhone X" and a price of 649 can be exported as a table row (Excel, HTML, CSV), as an element (XML, RSS), or as an object (JSON).
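Conceptually, a dataset is an append-only list of JSON-like items, and each export format is just a different serialization of that list. The snippet below sketches this with Python's standard library, using the sample item shown above; it illustrates the idea only and is not the Apify API.

```python
import csv
import io
import json

# A dataset is conceptually an append-only list of JSON-like items.
items = [{"name": "Apple iPhone X", "price": 649}]

# JSON export: the whole dataset serializes as an array of objects.
as_json = json.dumps(items)

# CSV export: item fields become columns, one row per item.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["name", "price"])
writer.writeheader()
writer.writerows(items)
as_csv = buf.getvalue()
```

The same item list can be fed to any serializer, which is why one dataset can back JSON, CSV, XML, and Excel exports at once.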
Request queue
Maintain a queue of URLs so you can recursively crawl websites: start from initial URLs and add new links as they are discovered, while skipping duplicates.
The request queue lets you check whether specific URLs have already been found, push new URLs to the queue, and fetch the next ones to process. Request queues support both breadth-first and depth-first crawling orders, as well as custom data attributes.
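The queue semantics described above (deduplication, breadth-first vs. depth-first ordering, custom data) can be sketched in a few lines of plain Python. The class and method names below are illustrative, not Apify's actual API.

```python
from collections import deque

class RequestQueueSketch:
    """Toy model of a crawl queue: enqueues unseen URLs and hands
    them out in breadth-first (FIFO) or depth-first (LIFO) order."""

    def __init__(self, order="breadth"):
        self.order = order      # "breadth" or "depth"
        self.pending = deque()
        self.seen = set()       # dedup: each URL is enqueued at most once

    def add_request(self, url, user_data=None):
        if url in self.seen:    # duplicate URLs are silently skipped
            return False
        self.seen.add(url)
        self.pending.append({"url": url, "userData": user_data or {}})
        return True

    def fetch_next_request(self):
        if not self.pending:
            return None         # queue drained, crawl is finished
        if self.order == "breadth":
            return self.pending.popleft()   # FIFO: oldest first
        return self.pending.pop()           # LIFO: newest first

q = RequestQueueSketch(order="breadth")
q.add_request("https://example.com/")
q.add_request("https://example.com/about")
q.add_request("https://example.com/")   # duplicate, ignored
```

Breadth-first order explores a site level by level, while depth-first follows each branch of links to its end before backtracking; the only difference in the sketch is which end of the deque is popped.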
Key-value store
Store arbitrary data records along with their MIME content type. Each record is accessible under a unique key and can be written and read at a rapid rate. The key-value store is ideal for saving files, such as screenshots of web pages or PDFs, and for persisting the state of your actors and crawlers.
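The record model described above, a value paired with its MIME content type under a unique key, can be sketched as follows. The names and the JSON auto-serialization behavior are illustrative assumptions, not Apify's actual API.

```python
import json

class KeyValueStoreSketch:
    """Toy key-value store: each record is raw bytes plus a MIME type."""

    def __init__(self):
        self.records = {}

    def set_value(self, key, value, content_type="application/json"):
        # JSON records are serialized; anything else is stored as-is.
        if content_type == "application/json":
            value = json.dumps(value).encode("utf-8")
        self.records[key] = {"body": value, "contentType": content_type}

    def get_value(self, key):
        record = self.records.get(key)
        if record is None:
            return None
        if record["contentType"] == "application/json":
            return json.loads(record["body"])
        return record["body"]

store = KeyValueStoreSketch()
# Persist crawler state as JSON...
store.set_value("crawl-state", {"processed": 120, "failed": 3})
# ...and a binary file, such as a screenshot, under its own type.
store.set_value("screenshot.png", b"\x89PNG...", content_type="image/png")
```

Storing the content type alongside the body is what lets a single store hold both structured state and binary files like screenshots or PDFs.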