BeautifulSoup Scraper

Crawls websites using raw HTTP requests. It parses the HTML with the BeautifulSoup library and extracts data from the pages using Python code. Supports both recursive crawling and lists of URLs. This Actor is a Python alternative to Cheerio Scraper.

Pricing: Pay per usage

Rating: 4.2 (5)

Developer: Apify

Maintained by Apify

Actor stats

Bookmarked

951

Total users

Monthly active users

4 months ago

Last modified

Categories: Developer tools, Open source

You can access BeautifulSoup Scraper programmatically from your own applications through the Apify API, using whichever of the languages and tools listed below you prefer. To use the Apify API, you'll need an Apify account and your API token, which you can find under Integrations settings in Apify Console.
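Because the API token is a secret, it is usually safer to read it from the environment than to paste it into source code. The following is a minimal sketch of that idea. The helper name `resolveApifyToken` is hypothetical, and the variable name `APIFY_TOKEN` is an assumption; use whatever name your deployment defines.

```javascript
// Hypothetical helper: reads the API token from an environment-like object.
// APIFY_TOKEN is an assumed variable name, not something this page specifies.
function resolveApifyToken(env = process.env) {
    const token = (env.APIFY_TOKEN || '').trim();
    // Reject a missing token and the unedited placeholder from the sample code.
    if (!token || token === '<YOUR_API_TOKEN>') {
        throw new Error('Set APIFY_TOKEN to the API token from Apify Console.');
    }
    return token;
}
```

The returned value can then be passed as the `token` option when constructing the client, in place of the `'<YOUR_API_TOKEN>'` placeholder.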

Python

JavaScript

CLI

OpenAPI

HTTP

MCP

import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "startUrls": [
        {
            "url": "https://crawlee.dev"
        }
    ],
    "maxCrawlingDepth": 1,
    "requestTimeout": 10,
    "linkSelector": "a[href]",
    "linkPatterns": [
        ".*crawlee\\.dev.*"
    ],
    // The page function is Python source passed as a string,
    // so its top-level lines must start at column 0.
    "pageFunction": `from typing import Any
from crawlee.crawlers import BeautifulSoupCrawlingContext

# See the context section in readme to find out what fields you can access
# https://apify.com/apify/beautifulsoup-scraper#context
def page_function(context: BeautifulSoupCrawlingContext) -> Any:
    url = context.request.url
    title = context.soup.title.string if context.soup.title else None
    return {'url': url, 'title': title}`,
    "soupFeatures": "html.parser",
    "proxyConfiguration": {
        "useApifyProxy": true
    }
};

// Run the Actor and wait for it to finish
const run = await client.actor("apify/beautifulsoup-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs
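The `linkPatterns` value in the input above appears to be a regular expression in which the dot of the host name is escaped. When the start URL changes, that pattern has to change with it. The sketch below derives both from one URL. The helper names `escapeRegex` and `buildInput` are hypothetical, and treating `linkPatterns` entries as regular expressions is an assumption based on the sample value.

```javascript
// Escape regex metacharacters so the host name is matched literally.
function escapeRegex(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: builds the crawl-related part of the Actor input,
// restricting followed links to URLs that contain the start URL's host name.
function buildInput(startUrl, maxCrawlingDepth = 1) {
    const { hostname } = new URL(startUrl);
    return {
        startUrls: [{ url: startUrl }],
        maxCrawlingDepth,
        linkSelector: 'a[href]',
        linkPatterns: [`.*${escapeRegex(hostname)}.*`],
    };
}
```

Calling `buildInput('https://crawlee.dev')` yields the same `startUrls`, `linkSelector`, and `linkPatterns` as the sample input. The remaining fields, such as `pageFunction` and `proxyConfiguration`, would still need to be merged in before running the Actor.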

BeautifulSoup Scraper API in JavaScript

The Apify API client for JavaScript is the official library for using the BeautifulSoup Scraper API from JavaScript or TypeScript. It provides convenience functions and automatically retries requests that fail with an error.
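This page does not describe how the automatic retries are scheduled. A common approach for HTTP clients is exponential backoff, where each retry waits twice as long as the previous one. The sketch below only illustrates that general idea. The function name `backoffDelays` and the example figures are assumptions for illustration, not the client's documented behavior.

```javascript
// Illustrative only: computes the wait before each retry under exponential backoff.
// The first retry waits minDelayMillis, and every later retry doubles the wait.
function backoffDelays(maxRetries, minDelayMillis) {
    return Array.from(
        { length: maxRetries },
        (_, attempt) => minDelayMillis * 2 ** attempt,
    );
}

console.log(backoffDelays(4, 500)); // [ 500, 1000, 2000, 4000 ]
```

Doubling the delay keeps early retries fast for brief glitches while backing off quickly when a service stays unavailable.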

Install the apify-client

$ npm install apify-client

Other API clients include:

BeautifulSoup Scraper API in Python

BeautifulSoup Scraper API through CLI

BeautifulSoup Scraper OpenAPI definition

BeautifulSoup Scraper API

Cheerio Scraper

apify/cheerio-scraper

Crawls websites using raw HTTP requests, parses the HTML with the Cheerio library, and extracts data from the pages using Node.js code. Supports both recursive crawling and lists of URLs. This Actor is a high-performance alternative to apify/web-scraper for websites that do not require JavaScript.

Apify

12K

5.0

Vanilla JS Scraper

mstephen190/vanilla-js-scraper

Scrape the web using familiar JavaScript methods! Crawls websites using raw HTTP requests, parses the HTML with the JSDOM package, and extracts data from the pages using Node.js code. Supports both recursive crawling and lists of URLs. This Actor is a non-jQuery alternative to Cheerio Scraper.

Matthias Stephens

503

Playwright Scraper

apify/playwright-scraper

Crawls websites with a headless Chromium, Chrome, or Firefox browser and the Playwright library, using provided server-side Node.js code. Supports both recursive crawling and lists of URLs. Supports logging in to websites.

Apify

4.2K

4.9

Web Scraper

apify/web-scraper

Crawls arbitrary websites using a web browser and extracts structured data from web pages using a provided JavaScript function. The Actor supports both recursive crawling and lists of URLs, and automatically manages concurrency for maximum performance.

Apify

100K

4.7

Puppeteer Scraper

apify/puppeteer-scraper

Crawls websites with headless Chrome and the Puppeteer library, using provided server-side Node.js code. This crawler is an alternative to apify/web-scraper that gives you finer control over the process. Supports both recursive crawling and lists of URLs. Supports logging in to websites.

Apify

11K

4.9

Example Code Runner (Python)

apify/example-code-runner-python

Python Actor to run code examples from the documentation via "Run on Apify" links.

Apify

842

Python BeautifulSoup template

ellustar/my-actor-5

Python BeautifulSoup Actor Template: Streamline web scraping with this ready-to-use Python template. Effortlessly extract, parse, and manage data from websites using BeautifulSoup, with clean code, reusable functions, and flexible structure for fast, efficient automation projects.

Ellustar

Universal Website Scraper (Python)

fortuitous_inch/my-actor

Scrape structured data from any website URL using Python and BeautifulSoup. Extract titles, links, and page content for research and automation.

Amol Pandgale

BizQuest Scraper

parseforge/bizquest-scraper

Supercharge your business acquisition research with our comprehensive BizQuest scraper! Get complete business information, pricing, financials, location details, and contact information from the leading business marketplace.

ParseForge

5.0

Full Site Downloader | $4.99/Site | 1-Time Crawl | All Assets

hailey_apify/Full-Website-Downloader

Full-Website-Downloader - Automatically crawls entire websites including HTML and all static assets (CSS, JS, images, etc.), preserves complete structure and exports as ZIP package. Supports depth control and same-domain resource filtering.