Web Scraper

Pricing

$35.00 / 1,000 results
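
To budget a run, the listed rate can be turned into a rough estimate. The sketch below assumes billing scales linearly per result with no minimum charge or platform usage fees, which this page does not confirm; `estimate_cost` is a hypothetical helper name.

```python
from decimal import Decimal

# Listed rate on this page: $35.00 per 1,000 results.
PRICE_PER_1000_RESULTS = Decimal("35.00")


def estimate_cost(result_count: int) -> Decimal:
    """Estimate the charge in USD for a number of results.

    Assumption: billing is linear per result, with no minimum
    charge and no additional platform usage fees.
    """
    if result_count < 0:
        raise ValueError("result_count must be non-negative")
    cost = PRICE_PER_1000_RESULTS * result_count / 1000
    return cost.quantize(Decimal("0.01"))


print(estimate_cost(250))  # 8.75
```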

Developed by

Futurize Rush

Maintained by Community

A simple web scraper that extracts titles, paragraphs, links, images, tables, and more from websites. It supports custom CSS selectors and batch collection. For larger workloads, try Apify's Website Content Crawler.

Last modified

3 days ago

A simple and powerful web scraping tool with comprehensive data extraction capabilities.

⚠️ Usage Notice

This tool is for educational and research purposes only. Users must comply with website terms of service, robots.txt specifications, and relevant laws.

🚀 Quick Start

The simplest way to use the scraper is to provide a list of URLs:

{
  "startUrls": [
    {"url": "https://example.com"}
  ]
}
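
When the URL list comes from a script rather than being typed by hand, the same input can be generated programmatically. This is a minimal sketch using only the Python standard library; `build_input` is a hypothetical helper, and the URL check is an added convenience rather than documented Actor behavior.

```python
import json
from urllib.parse import urlparse


def build_input(urls):
    """Build an input object with the startUrls shape shown in the Quick Start."""
    start_urls = []
    for url in urls:
        parsed = urlparse(url)
        # Only accept absolute http(s) URLs (a check added in this sketch).
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"Not an absolute http(s) URL: {url!r}")
        start_urls.append({"url": url})
    if not start_urls:
        raise ValueError("startUrls is required; provide at least one URL")
    return {"startUrls": start_urls}


# Serialize to JSON so it can be used as the Actor input.
print(json.dumps(build_input(["https://example.com"]), indent=2))
```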

📊 Output Fields

All data is output with English field names:

  • Basic Info: url, title, scrapedAt, processingTimeMs
  • Page Content: headings, paragraphs, links, images
  • Structured Data: tables, forms, videos, buttons, navigation
  • SEO Data: metadata, openGraph, structuredData, pageLanguage

⚙️ Main Settings

| Setting | Description | Default |
| --- | --- | --- |
| startUrls | List of URLs to scrape | Required |
| maxRequestsPerCrawl | Maximum pages to scrape | 100 |
| maxConcurrency | Concurrent scraping | 2 |
| scrollToBottom | Auto-scroll to load content | false |
| blockResources | Block resource types for speed | [] |
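
To see which settings a partial input would effectively run with, the defaults listed above can be merged in locally. This is only an illustrative sketch: the Actor presumably applies its own defaults, and `with_defaults` is a hypothetical helper that mirrors the listed values.

```python
import copy

# Default values as listed in the Main Settings section.
DEFAULTS = {
    "maxRequestsPerCrawl": 100,
    "maxConcurrency": 2,
    "scrollToBottom": False,
    "blockResources": [],
}


def with_defaults(user_input):
    """Return user_input with the documented defaults filled in."""
    if not user_input.get("startUrls"):
        raise ValueError("startUrls is required")
    # Deep copy so the shared DEFAULTS list is never mutated by callers.
    merged = copy.deepcopy(DEFAULTS)
    merged.update(user_input)
    return merged


settings = with_defaults({
    "startUrls": [{"url": "https://example.com"}],
    "maxConcurrency": 1,
})
print(settings["maxRequestsPerCrawl"], settings["maxConcurrency"])  # 100 1
```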

🎯 Custom Extraction Rules

Use CSS selectors to extract specific content:

{
  "extractionRules": {
    "article_title": "h1.article-title",
    "author": ".author-name",
    "publish_date": "time.publish-date",
    "content": "article.content"
  }
}
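
Since each rule maps an output name to a CSS selector, a quick sanity check before submitting can catch empty names or selectors. The sketch below is a hypothetical pre-flight helper (`add_extraction_rules` is not part of the Actor); it checks only that both sides are non-empty strings and does not validate CSS syntax.

```python
def add_extraction_rules(actor_input, rules):
    """Return a copy of actor_input with validated extractionRules attached."""
    for name, selector in rules.items():
        if not isinstance(name, str) or not name.strip():
            raise ValueError(f"Rule name must be a non-empty string: {name!r}")
        if not isinstance(selector, str) or not selector.strip():
            raise ValueError(f"Selector for {name!r} must be a non-empty string")
    # Copy both mappings so the caller's objects are left untouched.
    return {**actor_input, "extractionRules": dict(rules)}


run_input = add_extraction_rules(
    {"startUrls": [{"url": "https://example.com"}]},
    {"article_title": "h1.article-title", "author": ".author-name"},
)
print(sorted(run_input["extractionRules"]))  # ['article_title', 'author']
```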

💡 Usage Examples

Scrape Multiple Pages

{
  "startUrls": [
    {"url": "https://example.com"},
    {"url": "https://example.org"}
  ],
  "maxRequestsPerCrawl": 10
}

Dynamic Content Websites

{
  "scrollToBottom": true,
  "waitForSelector": ".content-loaded",
  "pageLoadTimeoutSecs": 20
}

Speed Optimization

{
  "blockResources": ["image", "stylesheet", "font"],
  "smartMode": true,
  "maxConcurrency": 1
}

📝 Output Example

{
  "url": "https://example.com/article",
  "title": "Article Title",
  "headings": {
    "h1": ["Main Title"],
    "h2": ["Subtitle One", "Subtitle Two"]
  },
  "paragraphs": ["First paragraph content...", "Second paragraph content..."],
  "links": [
    {
      "text": "Link text",
      "href": "https://example.com/link"
    }
  ],
  "images": [
    {
      "src": "https://example.com/image.jpg",
      "alt": "Image description",
      "width": 800,
      "height": 600
    }
  ],
  "processingTimeMs": 2345,
  "scrapedAt": "2025-08-11T10:30:00.000Z"
}
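
A record of this shape is easy to post-process once downloaded. The following sketch assumes each result is a dict with the fields shown in the example above; `summarize` is a hypothetical helper, and fields missing from a record are treated as empty.

```python
from datetime import datetime


def summarize(record):
    """Reduce one scraped record to a few summary values."""
    # Replace the trailing "Z" so fromisoformat also works before Python 3.11.
    scraped_at = datetime.fromisoformat(record["scrapedAt"].replace("Z", "+00:00"))
    headings = record.get("headings", {})
    return {
        "url": record["url"],
        "scraped_at": scraped_at,
        # Total headings across all levels (h1, h2, ...).
        "heading_count": sum(len(items) for items in headings.values()),
        "paragraph_count": len(record.get("paragraphs", [])),
        "link_hrefs": [
            link["href"] for link in record.get("links", []) if link.get("href")
        ],
        "processing_seconds": record.get("processingTimeMs", 0) / 1000,
    }
```

Applied to the example record, this yields three headings, two paragraphs, and one link.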

❓ FAQ

Q: What if scraping fails?

  • Reduce maxConcurrency to 1
  • Increase pageLoadTimeoutSecs
  • Disable smartMode
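
The three adjustments above can be applied to an existing input before retrying. This is a sketch under assumptions: `relax_for_retry` is a hypothetical helper, and because the default for pageLoadTimeoutSecs is not documented here, the fallback of 20 seconds is borrowed from the dynamic-content example and may not match the Actor's real default.

```python
def relax_for_retry(actor_input, extra_timeout_secs=20, assumed_timeout_secs=20):
    """Return a more conservative copy of actor_input for a retry run."""
    retry_input = dict(actor_input)
    # Reduce maxConcurrency to 1.
    retry_input["maxConcurrency"] = 1
    # Increase pageLoadTimeoutSecs; the fallback value is an assumption.
    current = retry_input.get("pageLoadTimeoutSecs", assumed_timeout_secs)
    retry_input["pageLoadTimeoutSecs"] = current + extra_timeout_secs
    # Disable smartMode.
    retry_input["smartMode"] = False
    return retry_input


first_try = {
    "startUrls": [{"url": "https://example.com"}],
    "smartMode": True,
    "pageLoadTimeoutSecs": 20,
}
print(relax_for_retry(first_try)["pageLoadTimeoutSecs"])  # 40
```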

Q: How do I scrape dynamically loaded content?

  • Set scrollToBottom: true
  • Use waitForSelector to wait for elements

Q: How do I speed up scraping?

  • Use blockResources to block unnecessary resources
  • Enable smartMode for automatic optimization

📜 License

MIT License


Version: 0.1.0
Updated: August 11, 2025
Developer: FuturizeRush