Olostep Web Scraper

Pricing

Pay per usage

Automate web search, scraping and crawling with Apify Actors using Olostep — the API to search, extract and structure web data.

Developer: Olostep (maintained by Community)

Olostep Web Scraper - Apify Actor

Official Apify Actor for Olostep, an API to search, extract and structure web data. It extracts content from any website in Markdown, HTML, JSON, or plain text, and supports single-page scraping, batch processing, website crawling, and URL mapping.

Overview

This Actor integrates Olostep's web search, scraping and crawling capabilities into the Apify platform, allowing you to:

  • Scrape single websites - Extract content from any URL
  • Batch process URLs - Scrape up to 100,000 URLs in parallel
  • Crawl websites - Automatically discover and scrape linked pages
  • Map websites - Extract all URLs from a website for structure analysis
  • AI-powered Answers - Ask natural-language questions and get structured JSON answers with sources

Features

  • ✅ Multiple output formats (Markdown, HTML, JSON, Text)
  • ✅ JavaScript rendering support with configurable wait times
  • ✅ Country-specific scraping
  • ✅ Specialized parsers for popular websites (Amazon, LinkedIn, etc.)
  • ✅ Batch processing for large-scale data extraction
  • ✅ Website crawling with link following
  • ✅ URL mapping and discovery
  • ✅ Integration-ready design for Apify workflows

Input

The Actor accepts the following input parameters:

Required Fields

  • operation (string): Operation type - scrape, batch, crawl, map, or answers
  • apiKey (string): Your Olostep API key from olostep.com/dashboard

Operation-Specific Fields

Scrape Operation

  • url_to_scrape (string, required): URL to scrape
  • formats (string): Output format - html, markdown, json, or text (default: markdown)
  • country (string): Country code for location-specific scraping (e.g., US, GB, CA)
  • wait_before_scraping (integer): Wait time in milliseconds for JavaScript rendering
  • parser (string): Parser ID for specialized extraction (e.g., @olostep/amazon-product)
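
The scrape fields above can be assembled programmatically before starting a run. The following is a minimal sketch, not part of the Actor: `build_scrape_input` is a hypothetical helper, and the non-negative check on the wait time is an assumption made for illustration.

```python
from typing import Optional

# Format names taken from the field list above.
ALLOWED_FORMATS = {"html", "markdown", "json", "text"}


def build_scrape_input(
    api_key: str,
    url: str,
    formats: str = "markdown",
    country: Optional[str] = None,
    wait_before_scraping: Optional[int] = None,
    parser: Optional[str] = None,
) -> dict:
    """Assemble an input object for the scrape operation (hypothetical helper)."""
    if formats not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {formats!r}")
    # Assumption: a negative wait time is never meaningful.
    if wait_before_scraping is not None and wait_before_scraping < 0:
        raise ValueError("wait_before_scraping must be >= 0 milliseconds")

    run_input = {
        "operation": "scrape",
        "apiKey": api_key,
        "url_to_scrape": url,
        "formats": formats,
    }
    # Optional fields are included only when the caller sets them.
    optional = {
        "country": country,
        "wait_before_scraping": wait_before_scraping,
        "parser": parser,
    }
    run_input.update({k: v for k, v in optional.items() if v is not None})
    return run_input
```

Leaving unset optional fields out of the object keeps the input close to the documented examples, which list only the fields they use.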

Batch Operation

  • batch_array (string, required): JSON-encoded string containing an array of objects, each with url and an optional custom_id
    • Example: [{"url":"https://example.com","custom_id":"site1"}]
  • formats (string): Output format
  • country (string): Country code
  • wait_before_scraping (integer): Wait time in milliseconds
  • parser (string): Parser ID
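
Because batch_array is passed as a string rather than a native array, it is easy to get the quoting wrong by hand. A minimal sketch of producing it with `json.dumps` follows; `build_batch_array` and its `id_prefix` naming scheme are hypothetical, and the 100,000 ceiling is taken from the Overview.

```python
import json
from typing import Iterable

MAX_BATCH_URLS = 100_000  # limit stated in the Overview


def build_batch_array(urls: Iterable[str], id_prefix: str = "site") -> str:
    """Encode URLs as the JSON string expected in batch_array.

    custom_id values are generated as <id_prefix>1, <id_prefix>2, ...
    which is a naming choice for this sketch, mirroring "site1" above.
    """
    items = [
        {"url": url, "custom_id": f"{id_prefix}{index}"}
        for index, url in enumerate(urls, start=1)
    ]
    if not items:
        raise ValueError("batch_array needs at least one URL")
    if len(items) > MAX_BATCH_URLS:
        raise ValueError(f"batch_array is limited to {MAX_BATCH_URLS} URLs")
    # Compact separators reproduce the style of the documented example.
    return json.dumps(items, separators=(",", ":"))
```

The returned string can be placed directly in the batch_array field of the Actor input.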

Crawl Operation

  • start_url (string, required): Starting URL for the crawl
  • max_pages (integer): Maximum number of pages to crawl (default: 10)
  • follow_links (boolean): Whether to follow links (default: true)
  • formats (string): Output format
  • country (string): Country code
  • parser (string): Parser ID

Map Operation

  • website_url (string, required): Website URL to extract links from
  • search_query (string): Optional search query to filter URLs
  • top_n (integer): Limit the number of URLs returned
  • include_patterns (string): Glob patterns to include (e.g., /blog/**)
  • exclude_patterns (string): Glob patterns to exclude (e.g., /admin/**)
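
Each operation has exactly one required field of its own in addition to operation and apiKey, so an input can be checked locally before a run is started. This is a minimal sketch that covers only the four operations with field lists above; `missing_fields` is a hypothetical helper, and the Actor's own validation may differ.

```python
# Required operation-specific field per operation, as listed above.
REQUIRED_FIELD = {
    "scrape": "url_to_scrape",
    "batch": "batch_array",
    "crawl": "start_url",
    "map": "website_url",
}


def missing_fields(run_input: dict) -> list:
    """Return the names of required fields that are absent or empty."""
    missing = [name for name in ("operation", "apiKey") if not run_input.get(name)]
    # Operations without an entry in the table get only the general checks.
    required = REQUIRED_FIELD.get(run_input.get("operation"))
    if required is not None and not run_input.get(required):
        missing.append(required)
    return missing
```

An empty list means the input has every field this section marks as required.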

Output

The Actor outputs data to the default dataset. Output format varies by operation:

Scrape Output

{
  "id": "scrape_abc123",
  "url": "https://example.com",
  "status": "completed",
  "formats": "markdown",
  "markdown_content": "# Example Content\n\n...",
  "html_content": "<h1>Example Content</h1>...",
  "json_content": "{...}",
  "text_content": "Example Content...",
  "markdown_hosted_url": "https://...",
  "page_metadata": "{...}"
}
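
A consumer of this output usually wants the content field that matches the requested format. The sketch below picks it by name and decodes the fields that the sample shows as quoted strings. It assumes that json_content and page_metadata hold JSON-encoded text, which the sample suggests but does not state; both function names are hypothetical.

```python
import json


def extract_content(item: dict):
    """Return the content field matching the item's "formats" value."""
    fmt = item["formats"]
    content = item.get(f"{fmt}_content")
    # Assumption: json_content is a JSON-encoded string, as the sample suggests.
    if fmt == "json" and isinstance(content, str):
        return json.loads(content)
    return content


def page_metadata(item: dict) -> dict:
    """Decode page_metadata, tolerating a missing or already-decoded value."""
    raw = item.get("page_metadata") or "{}"
    return json.loads(raw) if isinstance(raw, str) else raw
```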

Batch Output

{
  "batch_id": "batch_xyz789",
  "status": "processing",
  "total_urls": 100,
  "formats": "markdown",
  "urls": [
    {"custom_id": "site1", "url": "https://example.com"}
  ]
}

Crawl Output

{
  "crawl_id": "crawl_def456",
  "status": "in_progress",
  "start_url": "https://example.com",
  "max_pages": 10,
  "follow_links": true,
  "formats": "markdown"
}
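
The batch and crawl samples describe a job that is still running ("processing", "in_progress"), not finished page content, so downstream code may want to tell the two apart. A minimal sketch follows, assuming the status values shown in the samples are the only pending ones; the full set of statuses is not documented here, and both helpers are hypothetical.

```python
# Status values seen in the batch and crawl samples above.
# Assumption: any other value means the job is no longer pending.
PENDING_STATUSES = {"processing", "in_progress"}


def is_pending(item: dict) -> bool:
    """True when a batch or crawl record reports unfinished work."""
    return item.get("status") in PENDING_STATUSES


def job_id(item: dict):
    """Return the batch or crawl identifier, whichever the record carries."""
    return item.get("batch_id") or item.get("crawl_id")
```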

Map Output

{
  "map_id": "map_ghi789",
  "website_url": "https://example.com",
  "total_urls": 150,
  "urls": ["https://example.com/page1", "https://example.com/page2", ...]
}
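
For structure analysis, the urls list from a map result can be summarized by top-level section. The sketch below is one possible approach using only the standard library; `count_by_section` is a hypothetical helper, not something the Actor provides.

```python
from collections import Counter
from urllib.parse import urlparse


def count_by_section(urls) -> Counter:
    """Count URLs by first path segment ("/" stands for the site root)."""
    counts = Counter()
    for url in urls:
        segments = [part for part in urlparse(url).path.split("/") if part]
        section = "/" + segments[0] if segments else "/"
        counts[section] += 1
    return counts
```

Calling `count_by_section(item["urls"]).most_common(5)` on a map record would list the five largest sections of the site.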

Usage Examples

Example 1: Scrape a Single Website

{
  "operation": "scrape",
  "apiKey": "your-api-key",
  "url_to_scrape": "https://example.com",
  "formats": "markdown",
  "country": "US"
}

Example 2: Batch Scrape Multiple URLs

{
  "operation": "batch",
  "apiKey": "your-api-key",
  "batch_array": "[{\"url\":\"https://example.com\",\"custom_id\":\"site1\"},{\"url\":\"https://test.com\",\"custom_id\":\"site2\"}]",
  "formats": "json",
  "parser": "@olostep/amazon-product"
}

Example 3: Crawl a Website

{
  "operation": "crawl",
  "apiKey": "your-api-key",
  "start_url": "https://example.com",
  "max_pages": 50,
  "follow_links": true,
  "formats": "markdown"
}

Example 4: Map a Website

{
  "operation": "map",
  "apiKey": "your-api-key",
  "website_url": "https://example.com",
  "include_patterns": "/blog/**",
  "top_n": 100
}

Example 5: AI-powered Answers

{
  "operation": "answers",
  "apiKey": "your-api-key",
  "task": "What is the latest funding round of Olostep? Provide company, round, date, amount.",
  "json": "{\"company\":\"\",\"round\":\"\",\"date\":\"\",\"amount\":\"\"}"
}
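
In this example the json field is itself a JSON-encoded string whose keys name the fields wanted in the answer. A minimal sketch of generating that string from a list of field names follows; `build_answers_input` is a hypothetical helper, and using empty strings as placeholder values simply copies the example.

```python
import json
from typing import Iterable


def build_answers_input(api_key: str, task: str, fields: Iterable[str]) -> dict:
    """Build an answers input whose json template has one empty slot per field."""
    template = {name: "" for name in fields}
    return {
        "operation": "answers",
        "apiKey": api_key,
        "task": task,
        # Encoded as a string, matching the escaped form in the example above.
        "json": json.dumps(template, separators=(",", ":")),
    }
```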

Integration with Other Actors

This Actor is designed to work seamlessly with other Apify Actors:

  • Input from other Actors: Use the payload field to receive data from triggering Actors
  • Output to other Actors: Output data is stored in the default dataset, where other Actors can read it
  • Workflow Integration: Chain multiple Actors together for complex data extraction workflows
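
Reading the default dataset from outside the Actor might look like the sketch below, which uses the `actor(...).call(...)` and `dataset(...).iterate_items()` methods of the `apify-client` Python package. The client is passed in as an argument so the function itself has no third-party import; `run_and_collect` is a hypothetical helper, and the Actor ID and token in the usage comment are placeholders, since neither is given in this document.

```python
def run_and_collect(client, actor_id: str, run_input: dict) -> list:
    """Run an Actor and return the items from its default dataset.

    `client` is expected to be an apify_client.ApifyClient instance.
    """
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return a run record")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Usage (placeholders, not real values):
#   from apify_client import ApifyClient
#   client = ApifyClient("<APIFY_TOKEN>")
#   items = run_and_collect(client, "<ACTOR_ID>", {
#       "operation": "scrape",
#       "apiKey": "your-api-key",
#       "url_to_scrape": "https://example.com",
#   })
```

The returned items can then be passed as input to the next Actor in a chain.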

Specialized Parsers

Olostep provides pre-built parsers for popular websites:

  • @olostep/amazon-product - Amazon product pages
  • @olostep/linkedin-profile - LinkedIn profiles
  • @olostep/linkedin-company - LinkedIn company pages
  • @olostep/google-search - Google search results
  • @olostep/google-maps - Google Maps listings
  • @olostep/instagram-profile - Instagram profiles

Error Handling

The Actor handles common errors:

  • 401 Unauthorized: Invalid API key
  • 429 Too Many Requests: Rate limit exceeded
  • 500 Internal Server Error: Olostep service error
  • Network Errors: Connection issues with detailed error messages
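
Of the errors listed, 429 and 500 are usually worth retrying, while 401 is not, because a bad API key will fail the same way every time. The sketch below shows one common retry pattern, exponential backoff. How the Actor reports status codes to callers is not specified here, so `ScrapeError` and `with_retries` are hypothetical, and this is a caller-side pattern, not a description of the Actor's internal behavior.

```python
import time

RETRYABLE = {429, 500}  # rate limit and server error from the list above


class ScrapeError(Exception):
    """Hypothetical error type carrying the HTTP status code."""

    def __init__(self, status_code: int, message: str = ""):
        super().__init__(message or f"HTTP {status_code}")
        self.status_code = status_code


def with_retries(call, max_attempts: int = 4, base_delay: float = 1.0, sleep=time.sleep):
    """Invoke call(), retrying retryable errors with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except ScrapeError as err:
            # Give up at once on non-retryable codes or after the last attempt.
            if err.status_code not in RETRYABLE or attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

The `sleep` parameter exists so the delay can be replaced in tests or swapped for a scheduler-friendly wait.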

Pricing

Olostep charges based on API usage, independent of Apify:

  • Scrapes: Pay per scrape
  • Batches: Pay per URL in batch
  • Crawls: Pay per page crawled
  • Maps: Pay per map operation
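
Since each operation is billed in a different unit, it can help to count units before starting a run. The sketch below counts billable units only and does not compute a price, because no prices are given here. For crawls it returns max_pages as an upper bound, since the number of pages actually crawled is known only afterwards. `billable_units` is a hypothetical helper, and billing for the answers operation is not described in this section, so it is not covered.

```python
import json


def billable_units(run_input: dict) -> int:
    """Estimate how many billable units a run input may consume."""
    operation = run_input["operation"]
    if operation in ("scrape", "map"):
        return 1  # one scrape or one map operation
    if operation == "batch":
        # One unit per URL in the JSON-encoded batch_array string.
        return len(json.loads(run_input["batch_array"]))
    if operation == "crawl":
        # Upper bound: pages crawled cannot exceed max_pages (default 10).
        return run_input.get("max_pages", 10)
    raise ValueError(f"no billing rule listed for operation {operation!r}")
```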

Check current pricing at olostep.com/pricing.

Support

License

MIT License


Ready to scrape the web? Get your API key from olostep.com/dashboard and start extracting data today!