Pricing

Pay per usage

Olostep Web Scraper

Automate web search, scraping and crawling with Apify Actors using Olostep — the API to search, extract and structure web data for your workflows.

Developer: Olostep

Last modified: 4 months ago

Olostep Web Scraper - Apify Actor

Official Apify Actor for Olostep, an API to search, scrape, crawl, and structure web data. Extract content from any website in multiple formats (Markdown, HTML, JSON, or plain text) with support for single-page scraping, batch processing, website crawling, and URL mapping.

Overview

This Actor integrates Olostep's web search, scraping, and crawling capabilities into the Apify platform, allowing you to:

Scrape single websites - Extract content from any URL
Batch process URLs - Scrape up to 100,000 URLs in parallel
Crawl websites - Automatically discover and scrape linked pages
Map websites - Extract all URLs from a website for structure analysis
AI-powered Answers - Ask natural-language questions and get structured JSON answers with sources

Features

✅ Multiple output formats (Markdown, HTML, JSON, Text)
✅ JavaScript rendering support with configurable wait times
✅ Country-specific scraping
✅ Specialized parsers for popular websites (Amazon, Google, etc.)
✅ Batch processing for large-scale data extraction
✅ Website crawling with link following
✅ URL mapping and discovery
✅ Integration-ready design for Apify workflows

Input

The Actor accepts the following input parameters:

Required Fields

operation (string): Operation type - scrape, batch, crawl, map, or answers (for answers, see Example 5)
apiKey (string): Your Olostep API key from olostep.com/dashboard

Operation-Specific Fields

Scrape Operation

url_to_scrape (string, required): URL to scrape
formats (string): Output format - html, markdown, json, or text (default: markdown)
country (string): Country code for location-specific scraping (e.g., US, GB, CA)
wait_before_scraping (integer): Wait time in milliseconds for JavaScript rendering
parser (string): Parser ID for specialized extraction (e.g., @olostep/amazon-product)

Batch Operation

batch_array (string, required): JSON array of objects with url and optional custom_id
- Example: [{"url":"https://example.com","custom_id":"site1"}]
formats (string): Output format
country (string): Country code
wait_before_scraping (integer): Wait time in milliseconds
parser (string): Parser ID
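
Because batch_array is a string that holds JSON rather than a native array, the escaping is easy to get wrong by hand. The following is a minimal Python sketch, assuming only the url and custom_id fields described above; the helper name build_batch_array and the site1, site2, ... naming scheme are illustrative and not part of the Actor.

```python
import json


def build_batch_array(urls, id_prefix="site"):
    """Serialize URLs into the JSON string expected by batch_array.

    The custom_id values (site1, site2, ...) are an illustrative
    naming scheme, not something the Actor requires.
    """
    items = [
        {"url": url, "custom_id": f"{id_prefix}{i}"}
        for i, url in enumerate(urls, start=1)
    ]
    # Compact separators produce JSON without extra whitespace.
    return json.dumps(items, separators=(",", ":"))


run_input = {
    "operation": "batch",
    "apiKey": "your-api-key",
    "batch_array": build_batch_array(["https://example.com", "https://test.com"]),
    "formats": "markdown",
}
```

Letting json.dumps do the serialization avoids hand-written backslash escapes like those in the raw JSON input.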

Crawl Operation

start_url (string, required): Starting URL for the crawl
max_pages (integer): Maximum number of pages to crawl (default: 10)
follow_links (boolean): Whether to follow links (default: true)
formats (string): Output format
country (string): Country code
parser (string): Parser ID

Map Operation

website_url (string, required): Website URL to extract links from
search_query (string): Optional search query to filter URLs
top_n (integer): Limit the number of URLs returned
include_patterns (string): Glob patterns to include (e.g., /blog/**)
exclude_patterns (string): Glob patterns to exclude (e.g., /admin/**)
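
The required fields listed above can be checked locally before starting a run. This is a small sketch, not the Actor's own validation; the function name missing_fields is hypothetical, and the answers operation is left out because its required fields are not listed in this section.

```python
# Required fields per operation, taken from the parameter lists above.
# "answers" is omitted: its required fields are not documented here.
REQUIRED_FIELDS = {
    "scrape": ["url_to_scrape"],
    "batch": ["batch_array"],
    "crawl": ["start_url"],
    "map": ["website_url"],
}


def missing_fields(run_input):
    """Return the names of required fields that are absent or empty."""
    required = ["operation", "apiKey"]
    required += REQUIRED_FIELDS.get(run_input.get("operation"), [])
    return [name for name in required if not run_input.get(name)]
```

For instance, a crawl input that lacks start_url would be reported before any API credit is spent.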

Output

The Actor outputs data to the default dataset. Output format varies by operation:

Scrape Output

{
  "id": "scrape_abc123",
  "url": "https://example.com",
  "status": "completed",
  "formats": "markdown",
  "markdown_content": "# Example Content\n\n...",
  "html_content": "<h1>Example Content</h1>...",
  "json_content": "{...}",
  "text_content": "Example Content...",
  "markdown_hosted_url": "https://...",
  "page_metadata": "{...}"
}
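
A dataset item like the one above can be reduced to the content that was actually requested. The sketch below assumes the "<format>_content" naming shown in the sample and assumes that json_content, which the sample shows as a string, holds serialized JSON; extract_content is a hypothetical helper.

```python
import json


def extract_content(item):
    """Return the content field that matches the item's requested format."""
    fmt = item.get("formats", "markdown")
    content = item.get(f"{fmt}_content")
    if fmt == "json" and isinstance(content, str):
        # Assumption: json_content is a JSON document serialized as a string.
        return json.loads(content)
    return content
```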

Batch Output

{
  "batch_id": "batch_xyz789",
  "status": "processing",
  "total_urls": 100,
  "formats": "markdown",
  "urls": [
    {"custom_id": "site1", "url": "https://example.com"}
  ]
}

Crawl Output

{
  "crawl_id": "crawl_def456",
  "status": "in_progress",
  "start_url": "https://example.com",
  "max_pages": 10,
  "follow_links": true,
  "formats": "markdown"
}

Map Output

{
  "map_id": "map_ghi789",
  "website_url": "https://example.com",
  "total_urls": 150,
  "urls": ["https://example.com/page1", "https://example.com/page2", ...]
}

Usage Examples

Example 1: Scrape a Single Website

{
  "operation": "scrape",
  "apiKey": "your-api-key",
  "url_to_scrape": "https://example.com",
  "formats": "markdown",
  "country": "US"
}

Example 2: Batch Scrape Multiple URLs

{
  "operation": "batch",
  "apiKey": "your-api-key",
  "batch_array": "[{\"url\":\"https://example.com\",\"custom_id\":\"site1\"},{\"url\":\"https://test.com\",\"custom_id\":\"site2\"}]",
  "formats": "json",
  "parser": "@olostep/amazon-product"
}

Example 3: Crawl a Website

{
  "operation": "crawl",
  "apiKey": "your-api-key",
  "start_url": "https://example.com",
  "max_pages": 50,
  "follow_links": true,
  "formats": "markdown"
}

Example 4: Map a Website

{
  "operation": "map",
  "apiKey": "your-api-key",
  "website_url": "https://example.com",
  "include_patterns": "/blog/**",
  "top_n": 100
}
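
To preview which paths a pattern such as /blog/** might select, a rough local filter can be applied to a list of URLs. This sketch uses Python's fnmatch, whose rules (for example, * also matches "/") may differ from how Olostep evaluates include_patterns and exclude_patterns; treat it as an approximation, and filter_urls as a hypothetical helper.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def filter_urls(urls, include=None, exclude=None):
    """Keep URLs whose path matches `include` and does not match `exclude`.

    Uses fnmatch semantics; Olostep's own glob matching may differ.
    """
    kept = []
    for url in urls:
        path = urlparse(url).path or "/"
        if include and not fnmatchcase(path, include):
            continue
        if exclude and fnmatchcase(path, exclude):
            continue
        kept.append(url)
    return kept
```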

Example 5: AI-powered Answers

{
  "operation": "answers",
  "apiKey": "your-api-key",
  "task": "What is the latest funding round of Olostep? Provide company, round, date, amount.",
  "json": "{\"company\":\"\",\"round\":\"\",\"date\":\"\",\"amount\":\"\"}"
}
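
As in the batch example, the json field here is a string containing JSON. A short sketch for generating it from a list of field names, assuming (as in Example 5) that the template is an object with empty-string values; build_answers_input is a hypothetical helper.

```python
import json


def build_answers_input(api_key, task, fields):
    """Build an "answers" input whose json field is an empty-value template."""
    template = {name: "" for name in fields}
    return {
        "operation": "answers",
        "apiKey": api_key,
        "task": task,
        # Serialized to a string, matching the shape used in Example 5.
        "json": json.dumps(template, separators=(",", ":")),
    }
```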

Integration with Other Actors

This Actor is designed to work seamlessly with other Apify Actors:

Input from other Actors: Use the payload field to receive data from triggering Actors
Output to other Actors: Output data is stored in the default dataset, where other Actors can read it
Workflow integration: Chain multiple Actors together for complex data extraction workflows
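
Reading the default dataset from your own code could look roughly like the sketch below. It assumes the apify-client Python package; the Actor ID placeholder and the function name run_and_collect are not from this README and need to be replaced with real values.

```python
def run_and_collect(client, actor_id, run_input):
    """Run an Actor and return the items from its default dataset.

    `client` is expected to be an apify_client.ApifyClient instance.
    """
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return run details")
    dataset = client.dataset(run["defaultDatasetId"])
    return list(dataset.iterate_items())


# Usage (requires `pip install apify-client` and an Apify API token):
#   from apify_client import ApifyClient
#   client = ApifyClient("your-apify-token")
#   items = run_and_collect(
#       client,
#       "<username>/<actor-name>",  # placeholder: the Actor ID is not given here
#       {"operation": "scrape", "apiKey": "your-api-key",
#        "url_to_scrape": "https://example.com"},
#   )
```

Passing the client in as an argument keeps the function easy to exercise with a stub in tests.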

Specialized Parsers

Olostep provides pre-built parsers for popular websites:

@olostep/amazon-product - Amazon product pages
@olostep/google-search - Google search results
@olostep/google-maps - Google Maps listings

Explore additional parsers in the Olostep Store: https://www.olostep.com/store

Error Handling

The Actor handles common errors:

401 Unauthorized: Invalid API key
429 Too Many Requests: Rate limit exceeded
500 Internal Server Error: Olostep service error
Network Errors: Connection issues with detailed error messages
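
On the caller's side, the status codes above split naturally into retryable (429, 500) and non-retryable (401). The sketch below shows one common way to act on that split with exponential backoff. Whether the Actor retries internally is not stated here, and `send` is a hypothetical zero-argument callable returning a (status_code, body) pair.

```python
import time

# Statuses from the list above that are usually worth retrying.
RETRYABLE = {429, 500}


def should_retry(status_code):
    """401 (bad API key) is not retried; 429 and 500 are."""
    return status_code in RETRYABLE


def backoff_delay(attempt, base=1.0, cap=30.0):
    """Exponential backoff: base * 2**attempt seconds, capped at `cap`."""
    return min(cap, base * (2 ** attempt))


def call_with_retry(send, max_attempts=4, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out."""
    for attempt in range(max_attempts):
        status, body = send()
        if not should_retry(status) or attempt == max_attempts - 1:
            return status, body
        sleep(backoff_delay(attempt))
```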

Pricing

Olostep charges based on API usage, independent of Apify:

Scrapes: Pay per scrape
Batches: Pay per URL in batch
Crawls: Pay per page crawled
Maps: Pay per map operation

Check current pricing at olostep.com/pricing.

Support

Documentation: docs.olostep.com
Support: olostep.com/support
API Dashboard: olostep.com/dashboard

License

MIT License

Ready to scrape the web? Get your API key from olostep.com/dashboard and start extracting data today!
