# University Course Catalog Scraper (`datapilot/university-course-catalog-scraper`) Actor

University Course Catalog Scraper extracts course information from university catalog websites using browser automation and Apify. It collects course codes, titles, credits, departments, descriptions, and prerequisites, supports pagination, and outputs structured JSON for academic research and catalog analysis. 🎓📚

- **URL**: https://apify.com/datapilot/university-course-catalog-scraper.md
- **Developed by:** [Data Pilot](https://apify.com/datapilot) (community)
- **Categories:** Other
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $3.50 / 1,000 scraped results

This Actor is paid per event and usage. You are charged both the fixed price for specific events and for Apify platform usage.
Since this Actor supports Apify Store discounts, the price decreases on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## 🎓 University Course Catalog Scraper

An Apify Actor that extracts structured **University Course** data from university and college catalog websites. Provide a catalog URL and the Actor returns clean, structured **University Course** records (course code, title, credits, department, description, and prerequisites) across paginated catalog pages.

With browser automation, multi-strategy extraction, and residential proxy support, this Actor is designed to scrape **University Course** listings from a wide range of academic institutions' websites.

---

### 📋 Table of Contents

- [Features](#-features)
- [How It Works](#-how-it-works)
- [Input](#-input)
- [Output](#-output)
- [Use Cases](#-use-cases)
- [Quick Start](#-quick-start)
- [Technical Stack](#-technical-stack)
- [Changelog](#-changelog)
- [Support](#-support)

---

### 🔥 Features

- ✅ **Multi-Strategy Extraction** — 4 fallback strategies to extract **University Course** data from any page layout
- ✅ **Browser Automation** — Real Chromium browser renders JavaScript-heavy catalog pages accurately
- ✅ **Automatic Pagination** — Follows "Next" links to collect **University Course** listings across multiple pages
- ✅ **Deduplication** — Skips duplicate **University Course** entries automatically
- ✅ **Anti-Detection** — Rotates user agents and disables automation fingerprints
- ✅ **Proxy Support** — Uses Apify residential proxies to bypass IP restrictions
- ✅ **Anti-Blocking Delays** — Random delays between page requests to mimic human browsing
- ✅ **Configurable Limit** — Set a maximum number of **University Course** records to collect
- ✅ **Error Handling** — Graceful error recovery with detailed logging
- ✅ **Dataset Integration** — Pushes all **University Course** data to Apify dataset in real time

---

### ⚙️ How It Works

The Actor uses **4 progressive extraction strategies** to handle different types of **University Course** catalog pages:

| Strategy | Method | Best For |
|----------|--------|----------|
| **1** | Course block `<div>` / `<li>` / `<article>` elements | Standard catalog pages with course cards |
| **2** | Single course page with `<h1>` title | Individual **University Course** detail pages |
| **3** | HTML `<table>` with course headers | Table-based **University Course** listings |
| **4** | Heading fallback (`h1`–`h4` with course code pattern) | Simple or legacy catalog pages |

**Step-by-step flow:**

1. **Input Parsing** — Read the catalog URL, limit, and proxy settings
2. **Browser Launch** — Start headless Chromium with anti-detection configuration
3. **Page Fetch** — Navigate using fallback strategies (`domcontentloaded` → `load` → `commit`)
4. **Course Extraction** — Apply the 4 strategies in order until courses are found
5. **Deduplication** — Skip any already-seen course code + title combinations
6. **Dataset Push** — Push each unique **University Course** record to Apify dataset
7. **Pagination** — Follow "Next" page links and repeat until limit is reached
8. **Completion** — Log total **University Course** records saved

---

### 📥 Input

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `url` | string | **Required** | URL of the **University Course** catalog page to scrape |
| `maxCourses` | integer | `100` | Maximum number of **University Course** records to collect |
| `waitSeconds` | integer | `5` | Seconds to wait after page load before extracting |
| `useApifyProxy` | boolean | `true` | Whether to use Apify proxy |
| `apifyProxyGroups` | array | `["RESIDENTIAL"]` | Proxy groups to use |

#### Example Input

```json
{
  "url": "https://catalog.university.edu/courses",
  "maxCourses": 200,
  "waitSeconds": 5,
  "useApifyProxy": true,
  "apifyProxyGroups": ["RESIDENTIAL"]
}
````
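
The OpenAPI specification for this Actor bounds `maxCourses` to 1–5000 and `waitSeconds` to 1–30, and restricts proxy groups to `RESIDENTIAL`, `DATACENTER`, and `GOOGLE`. Checking those constraints before starting a run can catch mistakes early. The following is a minimal sketch of such a check; `build_input` is a hypothetical helper, not part of the Actor or the Apify client, and the `http(s)` URL check is an added assumption.

```python
from typing import Any

# Bounds, enum values, and defaults copied from the Actor's published input schema.
ALLOWED_PROXY_GROUPS = {"RESIDENTIAL", "DATACENTER", "GOOGLE"}
DEFAULTS = {
    "maxCourses": 100,
    "waitSeconds": 5,
    "useApifyProxy": True,
    "apifyProxyGroups": ["RESIDENTIAL"],
}


def build_input(url: str, **overrides: Any) -> dict[str, Any]:
    """Hypothetical helper: merge overrides into the defaults and validate them."""
    # Assumption: the schema only requires a string, but a catalog URL should be absolute.
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must be an absolute http(s) URL")

    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown input fields: {sorted(unknown)}")

    actor_input = {"url": url, **DEFAULTS, **overrides}
    # Copy the list so callers never share the module-level default.
    actor_input["apifyProxyGroups"] = list(actor_input["apifyProxyGroups"])

    if not 1 <= actor_input["maxCourses"] <= 5000:
        raise ValueError("maxCourses must be between 1 and 5000")
    if not 1 <= actor_input["waitSeconds"] <= 30:
        raise ValueError("waitSeconds must be between 1 and 30")

    bad_groups = set(actor_input["apifyProxyGroups"]) - ALLOWED_PROXY_GROUPS
    if bad_groups:
        raise ValueError(f"unsupported proxy groups: {sorted(bad_groups)}")
    return actor_input
```

The returned dictionary has the same shape as the example input and could be passed as the run input to either client library.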

***

### 📤 Output

Each **University Course** record is pushed as a separate dataset item.

| Field | Type | Description |
|-------|------|-------------|
| `course_code` | string | Course identifier (e.g., `CS 101`, `ENG-202`) |
| `title` | string | **University Course** title/name |
| `credits` | string | Credit hours or units |
| `department` | string | Department or school offering the course |
| `description` | string | Course description (up to 400–500 characters) |
| `prerequisites` | string | Prerequisite courses or requirements |
| `source_url` | string | Page URL where the course was found |
| `scraped_at` | string | ISO 8601 UTC timestamp |
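
Because `course_code` and `prerequisites` are plain strings, downstream code often needs to recognize course codes inside them. The sketch below shows one way to do that with a regular expression. The pattern is an assumption chosen to match the documented examples (`CS 101`, `ENG-202`); it is not the pattern the Actor uses internally, and real catalogs may need a looser or stricter one.

```python
import re

# Hypothetical pattern: 2-5 uppercase letters, an optional space or hyphen,
# then 3-4 digits and an optional trailing letter (e.g. "CS 101", "ENG-202").
COURSE_CODE_RE = re.compile(r"\b[A-Z]{2,5}[ -]?\d{3,4}[A-Z]?\b")


def find_course_codes(text: str) -> list[str]:
    """Return every substring of `text` that is shaped like a course code."""
    return COURSE_CODE_RE.findall(text)


def looks_like_course_heading(heading: str) -> bool:
    """True when a heading starts with something shaped like a course code."""
    return COURSE_CODE_RE.match(heading.strip()) is not None
```

For example, `find_course_codes("CS 101, MATH 201")` splits a `prerequisites` value into individual codes, and `looks_like_course_heading` illustrates the kind of check a heading-based fallback could rely on.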

#### Example Output

```json
{
  "course_code": "CS 301",
  "title": "Data Structures and Algorithms",
  "credits": "3",
  "department": "Computer Science",
  "description": "Study of fundamental data structures including arrays, linked lists, trees, and graphs. Analysis of sorting and searching algorithms.",
  "prerequisites": "CS 101, MATH 201",
  "source_url": "https://catalog.university.edu/courses/cs",
  "scraped_at": "2025-03-22T12:34:56Z"
}
```
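
The Actor is described as skipping records whose course code and title combination has already been seen. When merging results from several runs, the same idea can be applied on the client side. This is a minimal sketch over records shaped like the example above; `dedupe_courses` is a hypothetical name, and the case and whitespace normalization is an assumption rather than documented Actor behavior.

```python
from typing import Iterable, Iterator


def dedupe_courses(records: Iterable[dict]) -> Iterator[dict]:
    """Yield each record whose (course_code, title) pair has not been seen yet."""
    seen: set[tuple[str, str]] = set()
    for record in records:
        # Assumption: compare codes and titles ignoring case and outer whitespace.
        key = (
            record.get("course_code", "").strip().upper(),
            record.get("title", "").strip().lower(),
        )
        if key in seen:
            continue
        seen.add(key)
        yield record
```

Because the function is a generator, wrapping it in `itertools.islice(dedupe_courses(records), limit)` would be one way to cap the merged result at a fixed number of courses.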

***

### 🎯 Use Cases

- 🎓 **Course Catalog Aggregation** — Build a searchable database of **University Course** offerings
- 📊 **Curriculum Research** — Compare **University Course** structures across institutions
- 🤖 **Academic Recommendation Systems** — Power course recommendation engines with structured data
- 📚 **EdTech Platforms** — Enrich platforms with real **University Course** metadata
- 🔬 **Higher Education Research** — Analyze trends in **University Course** offerings by department
- 🏫 **Institutional Benchmarking** — Compare credit hours, prerequisites, and departments
- 📝 **Accreditation Support** — Collect structured **University Course** data for reporting

***

### 🚀 Quick Start

1. **Open on Apify** — Visit the Actor page and click **Try for free**
2. **Set Input** — Paste your university catalog URL into the `url` field
3. **Configure Limit** — Set `maxCourses` to how many courses you need
4. **Enable Proxy** — Keep `useApifyProxy` enabled for reliable scraping
5. **Run the Actor** — Click Start and monitor progress in the logs
6. **Download Results** — Export the **University Course** dataset as JSON, CSV, or Excel

#### Sample Log Output

```
Starting scrape: https://catalog.university.edu/courses | limit=200
[Page 1]
  Strategy 1: 48 course block(s) found
  Total so far: 48 courses
[Page 2]
  Strategy 1: 45 course block(s) found
  Total so far: 93 courses
Done! Total courses saved: 200
```

***

### 🧰 Technical Stack

| Component | Technology |
|-----------|------------|
| Browser Automation | Headless Chromium |
| Anti-Detection | Random user agents, disabled webdriver fingerprint |
| Navigation | Multi-strategy (`domcontentloaded`, `load`, `commit`) |
| Async | `asyncio` |
| Proxy | Apify Proxy (Residential) |
| Platform | Apify Actor (serverless, scalable) |
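
The navigation row lists three wait conditions that are tried in order. The sketch below illustrates that fallback pattern with `asyncio`. The browser library is not named here, so `goto` is a hypothetical callable standing in for a single navigation attempt, and catching every `Exception` is a simplification.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Wait conditions in the order listed in the table above.
WAIT_STRATEGIES = ("domcontentloaded", "load", "commit")

# Hypothetical signature: goto(url, wait_until) performs one navigation attempt.
Goto = Callable[[str, str], Awaitable[None]]


async def navigate_with_fallback(goto: Goto, url: str) -> str:
    """Try each wait condition in turn and return the first one that succeeds."""
    last_error: Optional[Exception] = None
    for wait_until in WAIT_STRATEGIES:
        try:
            await goto(url, wait_until)
            return wait_until
        except Exception as error:  # a real scraper would catch narrower errors
            last_error = error
    raise RuntimeError(f"all navigation strategies failed for {url}") from last_error
```

Starting with the strictest condition and relaxing it only on failure means fast pages are fully parsed, while slow pages still return something to extract from.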

***

### 📦 Changelog

#### v1.0.0 — Initial Release

- Browser-based **University Course** catalog scraping
- 4-strategy extraction (blocks, single page, table, heading fallback)
- Automatic pagination with "Next" link detection
- Course code, title, credits, department, description, prerequisites extraction
- Deduplication by course code + title
- Configurable course limit (`maxCourses`)
- Configurable page wait time (`waitSeconds`)
- Residential proxy support
- Anti-detection user agent rotation
- Random anti-blocking delays (2–4 seconds)
- Real-time dataset push with ISO 8601 timestamp
- Graceful error handling and browser cleanup

***

### 🧑‍💻 Support & Feedback

- **Issues & Ideas** — Open a ticket on the Apify Actor issue tracker
- **Documentation** — Visit [Apify Docs](https://docs.apify.com) for platform guides
- **Scraping Notes** — Increase `waitSeconds` for slower university websites
- **Proxy Tips** — Always use residential proxies for university catalog scraping

***

> ⚠️ **Disclaimer:** This Actor scrapes publicly visible data from university course catalog pages. Please ensure your usage complies with the terms of service of the target institution. Intended for research and informational purposes only.

# Actor input Schema

## `url` (type: `string`):

University course catalog URL to scrape.

## `maxCourses` (type: `integer`):

Maximum number of courses to scrape.

## `waitSeconds` (type: `integer`):

Seconds to wait after page load for JS to render.

## `useApifyProxy` (type: `boolean`):

Enable proxy to avoid bot detection.

## `apifyProxyGroups` (type: `array`):

RESIDENTIAL recommended for university sites.

## Actor input object example

```json
{
  "url": "https://example.edu/course-catalog",
  "maxCourses": 100,
  "waitSeconds": 5,
  "useApifyProxy": true,
  "apifyProxyGroups": [
    "RESIDENTIAL"
  ]
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "url": "https://example.edu/course-catalog"
};

// Run the Actor and wait for it to finish
const run = await client.actor("datapilot/university-course-catalog-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "url": "https://example.edu/course-catalog" }

# Run the Actor and wait for it to finish
run = client.actor("datapilot/university-course-catalog-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "url": "https://example.edu/course-catalog"
}' |
apify call datapilot/university-course-catalog-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=datapilot/university-course-catalog-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "University Course Catalog Scraper",
        "description": "University Course Catalog Scraper extracts course information from university catalog websites using browser automation and Apify. It collects course codes, titles, credits, departments, descriptions, and prerequisites, supports pagination, and outputs structured JSON for academic research and catalog analysis. 🎓📚",
        "version": "0.0",
        "x-build-id": "N0fw1xNsUkxzZrGFJ"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/datapilot~university-course-catalog-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-datapilot-university-course-catalog-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/datapilot~university-course-catalog-scraper/runs": {
            "post": {
                "operationId": "runs-sync-datapilot-university-course-catalog-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/datapilot~university-course-catalog-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-datapilot-university-course-catalog-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "url"
                ],
                "properties": {
                    "url": {
                        "title": "Catalog URL",
                        "type": "string",
                        "description": "University course catalog URL to scrape."
                    },
                    "maxCourses": {
                        "title": "Max Courses",
                        "minimum": 1,
                        "maximum": 5000,
                        "type": "integer",
                        "description": "Maximum number of courses to scrape.",
                        "default": 100
                    },
                    "waitSeconds": {
                        "title": "Page Wait (seconds)",
                        "minimum": 1,
                        "maximum": 30,
                        "type": "integer",
                        "description": "Seconds to wait after page load for JS to render.",
                        "default": 5
                    },
                    "useApifyProxy": {
                        "title": "Use Apify Proxy",
                        "type": "boolean",
                        "description": "Enable proxy to avoid bot detection.",
                        "default": true
                    },
                    "apifyProxyGroups": {
                        "title": "Proxy Groups",
                        "uniqueItems": true,
                        "type": "array",
                        "description": "RESIDENTIAL recommended for university sites.",
                        "items": {
                            "type": "string",
                            "enum": [
                                "RESIDENTIAL",
                                "DATACENTER",
                                "GOOGLE"
                            ]
                        },
                        "default": [
                            "RESIDENTIAL"
                        ]
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
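
For projects that cannot use a client library, the `run-sync-get-dataset-items` path in the specification above can be called directly. The following sketch uses only the Python standard library. The helper names and the 300-second timeout are assumptions, and `run_actor` assumes the response body is a JSON array of dataset items.

```python
import json
import urllib.parse
import urllib.request

# Server URL and path taken from the OpenAPI specification above.
API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/datapilot~university-course-catalog-scraper/run-sync-get-dataset-items"


def build_run_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request described by the specification."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        url=f"{API_BASE}{ACTOR_PATH}?{query}",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_actor(token: str, actor_input: dict, timeout: float = 300.0) -> list:
    """Send the request and decode the dataset items from the JSON response."""
    request = build_run_request(token, actor_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Separating request construction from sending keeps the URL and payload easy to inspect or log before any platform usage is charged.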
