University Course Catalog Scraper
Pricing
from $3.50 / 1,000 scraped results
University Course Catalog Scraper extracts course information from university catalog websites using browser automation and Apify. It collects course codes, titles, credits, departments, descriptions, and prerequisites, supports pagination, and outputs structured JSON for academic research and catalog analysis.
Developer
Data Pilot
Maintained by Community
University Course Catalog Scraper
An Apify Actor that extracts structured University Course data from any university or college catalog website. Provide a catalog URL and the actor returns clean, structured University Course records (course code, title, credits, department, description, and prerequisites) across paginated catalog pages.
With browser automation, multi-strategy extraction, and residential proxy support, this actor reliably scrapes University Course listings from virtually any academic institution's website.
Features
- Multi-Strategy Extraction: 4 fallback strategies to extract University Course data from any page layout
- Browser Automation: Real Chromium browser renders JavaScript-heavy catalog pages accurately
- Automatic Pagination: Follows "Next" links to collect University Course listings across multiple pages
- Deduplication: Skips duplicate University Course entries automatically
- Anti-Detection: Rotates user agents and disables automation fingerprints
- Proxy Support: Uses Apify residential proxies to bypass IP restrictions
- Anti-Blocking Delays: Random delays between page requests to mimic human browsing
- Configurable Limit: Set a maximum number of University Course records to collect
- Error Handling: Graceful error recovery with detailed logging
- Dataset Integration: Pushes all University Course data to Apify dataset in real time
How It Works
The actor uses 4 progressive extraction strategies to handle different types of University Course catalog pages:
| Strategy | Method | Best For |
|---|---|---|
| 1 | Course block <div> / <li> / <article> elements | Standard catalog pages with course cards |
| 2 | Single course page with <h1> title | Individual University Course detail pages |
| 3 | HTML <table> with course headers | Table-based University Course listings |
| 4 | Heading fallback (`<h1>` to `<h4>` with course code pattern) | Simple or legacy catalog pages |
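The fallback order in the table can be pictured as a chain of extractor functions tried one after another. The sketch below is an illustration only, not the actor's actual code: the function names, the course-code regex, and the use of regular expressions on raw HTML (instead of DOM queries in a browser) are all assumptions made to keep the example self-contained. Only stand-ins for strategies 3 and 4 are shown.

```python
import re
from typing import Callable, Dict, List

# Hypothetical course-code pattern: 2-4 capital letters, an optional space
# or hyphen, then 3-4 digits (matches "CS 101" and "ENG-202").
COURSE_CODE = re.compile(r"\b([A-Z]{2,4})[ -]?(\d{3,4})\b")

Extractor = Callable[[str], List[Dict[str, str]]]


def extract_from_tables(html: str) -> List[Dict[str, str]]:
    """Strategy 3 stand-in: table rows whose first cell holds a course code."""
    rows = re.findall(r"<tr>\s*<td>(.*?)</td>\s*<td>(.*?)</td>", html, re.S)
    return [
        {"course_code": code.strip(), "title": title.strip()}
        for code, title in rows
        if COURSE_CODE.search(code)
    ]


def extract_from_headings(html: str) -> List[Dict[str, str]]:
    """Strategy 4 stand-in: h1-h4 headings that contain a course code."""
    courses = []
    for text in re.findall(r"<h[1-4][^>]*>(.*?)</h[1-4]>", html, re.S):
        match = COURSE_CODE.search(text)
        if match:
            # Treat whatever follows the code as the title.
            title = text[match.end():].strip(" :-")
            courses.append({"course_code": match.group(0), "title": title})
    return courses


def run_strategies(html: str, strategies: List[Extractor]) -> List[Dict[str, str]]:
    """Try each strategy in order and return the first non-empty result."""
    for strategy in strategies:
        courses = strategy(html)
        if courses:
            return courses
    return []
```

Calling `run_strategies(html, [extract_from_tables, extract_from_headings])` returns the table rows when a course table is present and falls back to headings otherwise. Ordering the list from most to least specific keeps the loose heading heuristic from pre-empting a more structured match.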
Step-by-step flow:
- Input Parsing: Read the catalog URL, limit, and proxy settings
- Browser Launch: Start headless Chromium with anti-detection configuration
- Page Fetch: Navigate using fallback strategies (`domcontentloaded`, then `load`, then `commit`)
- Course Extraction: Apply the 4 strategies in order until courses are found
- Deduplication: Skip any already-seen course code + title combinations
- Dataset Push: Push each unique University Course record to Apify dataset
- Pagination: Follow "Next" page links and repeat until limit is reached
- Completion: Log total University Course records saved
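The deduplication and limit steps above can be sketched as a small generator. This is a minimal illustration under stated assumptions: the name `unique_courses`, the case and whitespace normalization of the key, and the representation of each page as a list of dicts are hypothetical and not taken from the actor's source.

```python
from typing import Dict, Iterable, Iterator, List, Set, Tuple


def unique_courses(
    pages: Iterable[List[Dict[str, str]]], max_courses: int
) -> Iterator[Dict[str, str]]:
    """Yield each course once, keyed by code + title, up to max_courses."""
    if max_courses <= 0:
        return
    seen: Set[Tuple[str, str]] = set()
    saved = 0
    for courses in pages:
        for course in courses:
            # Assumption: normalize case and whitespace so that
            # "cs 101" / "INTRO " counts as a repeat of "CS 101" / "Intro".
            key = (
                course.get("course_code", "").strip().upper(),
                course.get("title", "").strip().lower(),
            )
            if key in seen:
                continue
            seen.add(key)
            yield course
            saved += 1
            if saved >= max_courses:
                # Stop before the next page is requested.
                return
```

Because `pages` can be a lazy iterable, returning as soon as the limit is hit means no further page needs to be fetched once enough records have been collected.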
Input
| Field | Type | Default | Description |
|---|---|---|---|
| `url` | string | Required | URL of the University Course catalog page to scrape |
| `maxCourses` | integer | 100 | Maximum number of University Course records to collect |
| `waitSeconds` | integer | 5 | Seconds to wait after page load before extracting |
| `useApifyProxy` | boolean | true | Whether to use Apify proxy |
| `apifyProxyGroups` | array | ["RESIDENTIAL"] | Proxy groups to use |
Example Input
    {
      "url": "https://catalog.university.edu/courses",
      "maxCourses": 200,
      "waitSeconds": 5,
      "useApifyProxy": true,
      "apifyProxyGroups": ["RESIDENTIAL"]
    }
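To show how the documented defaults interact with a partial input, here is a hedged sketch of input parsing. The `ScraperInput` dataclass and `parse_input` function are hypothetical names introduced for this example. The field names and default values come from the input table, but the validation behavior (raising on a missing `url`) is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ScraperInput:
    url: str
    max_courses: int = 100
    wait_seconds: int = 5
    use_apify_proxy: bool = True
    apify_proxy_groups: List[str] = field(default_factory=lambda: ["RESIDENTIAL"])


def parse_input(raw: Dict[str, Any]) -> ScraperInput:
    """Apply the documented defaults to a raw input dict."""
    url = (raw.get("url") or "").strip()
    if not url:
        # Assumption: a missing URL is treated as a hard error.
        raise ValueError("'url' is required")
    return ScraperInput(
        url=url,
        max_courses=int(raw.get("maxCourses", 100)),
        wait_seconds=int(raw.get("waitSeconds", 5)),
        use_apify_proxy=bool(raw.get("useApifyProxy", True)),
        apify_proxy_groups=list(raw.get("apifyProxyGroups", ["RESIDENTIAL"])),
    )
```

With this sketch, an input that sets only `url` and `maxCourses` still ends up with a 5-second wait and the `RESIDENTIAL` proxy group.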
Output
Each University Course record is pushed as a separate dataset item.
| Field | Type | Description |
|---|---|---|
| `course_code` | string | Course identifier (e.g., CS 101, ENG-202) |
| `title` | string | University Course title/name |
| `credits` | string | Credit hours or units |
| `department` | string | Department or school offering the course |
| `description` | string | Course description (up to 400-500 characters) |
| `prerequisites` | string | Prerequisite courses or requirements |
| `source_url` | string | Page URL where the course was found |
| `scraped_at` | string | ISO 8601 UTC timestamp |
Example Output
    {
      "course_code": "CS 301",
      "title": "Data Structures and Algorithms",
      "credits": "3",
      "department": "Computer Science",
      "description": "Study of fundamental data structures including arrays, linked lists, trees, and graphs. Analysis of sorting and searching algorithms.",
      "prerequisites": "CS 101, MATH 201",
      "source_url": "https://catalog.university.edu/courses/cs",
      "scraped_at": "2025-03-22T12:34:56Z"
    }
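A record with this shape could be assembled roughly as follows. This is a sketch, not the actor's implementation: `build_record` and `MAX_DESCRIPTION_CHARS` are hypothetical names, and the 500-character cap is an assumption that picks the upper end of the documented 400-500 range.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional

# Assumption: cap at the upper end of the documented 400-500 character range.
MAX_DESCRIPTION_CHARS = 500


def build_record(
    course: Dict[str, Any], source_url: str, now: Optional[datetime] = None
) -> Dict[str, str]:
    """Shape extracted fields into the documented output schema.

    `now` must be timezone-aware; it defaults to the current UTC time.
    """
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    # Collapse runs of whitespace before truncating the description.
    description = " ".join(str(course.get("description", "")).split())
    return {
        "course_code": str(course.get("course_code", "")),
        "title": str(course.get("title", "")),
        "credits": str(course.get("credits", "")),  # credits is a string field
        "department": str(course.get("department", "")),
        "description": description[:MAX_DESCRIPTION_CHARS],
        "prerequisites": str(course.get("prerequisites", "")),
        "source_url": source_url,
        "scraped_at": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```

Passing `now` explicitly makes the timestamp reproducible in tests, while normal runs can omit it.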
Use Cases
- Course Catalog Aggregation: Build a searchable database of University Course offerings
- Curriculum Research: Compare University Course structures across institutions
- Academic Recommendation Systems: Power course recommendation engines with structured data
- EdTech Platforms: Enrich platforms with real University Course metadata
- Higher Education Research: Analyze trends in University Course offerings by department
- Institutional Benchmarking: Compare credit hours, prerequisites, and departments
- Accreditation Support: Collect structured University Course data for reporting
Quick Start
- Open on Apify: Visit the actor page and click Try for free
- Set Input: Paste your university catalog URL into the `url` field
- Configure Limit: Set `maxCourses` to how many courses you need
- Enable Proxy: Keep `useApifyProxy` enabled for reliable scraping
- Run the Actor: Click Start and monitor progress in the logs
- Download Results: Export the University Course dataset as JSON, CSV, or Excel
Sample Log Output
    Starting scrape: https://catalog.university.edu/courses | limit=200
    [Page 1]
    Strategy 1: 48 course block(s) found
    Total so far: 48 courses
    [Page 2]
    Strategy 1: 45 course block(s) found
    Total so far: 93 courses
    Done! Total courses saved: 200
Technical Stack
| Component | Technology |
|---|---|
| Browser Automation | Headless Chromium |
| Anti-Detection | Random user agents, disabled webdriver fingerprint |
| Navigation | Multi-strategy (domcontentloaded, load, commit) |
| Async | asyncio |
| Proxy | Apify Proxy (Residential) |
| Platform | Apify Actor (serverless, scalable) |
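The multi-strategy navigation row can be illustrated with a small asyncio helper that retries a page load with progressively weaker wait conditions. This is a sketch under assumptions: `goto_with_fallback` is a hypothetical name, and `goto` stands for any async callable that accepts `url` and a `wait_until` keyword. The three state names match the `wait_until` values accepted by Playwright's `page.goto`, but this README does not name the browser library, so treat that pairing as an assumption.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Order taken from the navigation row: strictest wait condition first.
WAIT_STATES = ("domcontentloaded", "load", "commit")


async def goto_with_fallback(goto: Callable[..., Awaitable[object]], url: str) -> str:
    """Try each wait state in order and return the one that succeeded."""
    last_error: Optional[Exception] = None
    for state in WAIT_STATES:
        try:
            await goto(url, wait_until=state)
            return state
        except Exception as error:  # e.g. a navigation timeout
            last_error = error
    raise RuntimeError(f"All navigation strategies failed for {url}") from last_error
```

With a real browser page, the bound `goto` method could be passed in directly. In tests, a stub coroutine that raises for selected states is enough to exercise the fallback path.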
Changelog
v1.0.0: Initial Release
- Browser-based University Course catalog scraping
- 4-strategy extraction (blocks, single page, table, heading fallback)
- Automatic pagination with "Next" link detection
- Course code, title, credits, department, description, prerequisites extraction
- Deduplication by course code + title
- Configurable course limit (`maxCourses`)
- Configurable page wait time (`waitSeconds`)
- Residential proxy support
- Anti-detection user agent rotation
- Random anti-blocking delays (2 to 4 seconds)
- Real-time dataset push with ISO 8601 timestamp
- Graceful error handling and browser cleanup
Support & Feedback
- Issues & Ideas: Open a ticket on the Apify Actor issue tracker
- Documentation: Visit Apify Docs for platform guides
- Scraping Notes: Increase `waitSeconds` for slower university websites
- Proxy Tips: Always use residential proxies for university catalog scraping
Disclaimer: This actor scrapes publicly visible data from university course catalog pages. Please ensure your usage complies with the terms of service of the target institution. Intended for research and informational purposes only.