University Course Catalog Scraper

Pricing

from $3.50 / 1,000 scraped results

University Course Catalog Scraper extracts course information from university catalog websites using browser automation and Apify. It collects course codes, titles, credits, departments, descriptions, and prerequisites, supports pagination, and outputs structured JSON for academic research and catalog analysis. 🎓📚

Developer

Data Pilot

Maintained by Community

🎓 University Course Catalog Scraper

An Apify Actor that extracts structured course data from university and college catalog websites. Provide a catalog URL and the actor returns clean, structured course records (course code, title, credits, department, description, and prerequisites) collected across paginated catalog pages.

It combines browser automation, multi-strategy extraction, and residential proxy support so that it can handle course listings on a wide range of academic institutions' websites.


🔥 Features

  • ✅ Multi-Strategy Extraction: 4 fallback strategies to extract course data from different page layouts
  • ✅ Browser Automation: A real Chromium browser renders JavaScript-heavy catalog pages
  • ✅ Automatic Pagination: Follows "Next" links to collect course listings across multiple pages
  • ✅ Deduplication: Skips duplicate course entries automatically
  • ✅ Anti-Detection: Rotates user agents and disables automation fingerprints
  • ✅ Proxy Support: Uses Apify residential proxies to bypass IP restrictions
  • ✅ Anti-Blocking Delays: Random delays between page requests to mimic human browsing
  • ✅ Configurable Limit: Set a maximum number of course records to collect
  • ✅ Error Handling: Graceful error recovery with detailed logging
  • ✅ Dataset Integration: Pushes all course data to the Apify dataset in real time
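
The delay and user-agent features can be illustrated with a short Python sketch. This is not the actor's source code: the `USER_AGENTS` pool, the function names, and the seeded random generator are assumptions for illustration, and the 2 to 4 second range is taken from the changelog entry for random delays.

```python
import random

# Hypothetical pool of user-agent strings; the actor's real list is not documented.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.4 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
]


def pick_user_agent(rng: random.Random) -> str:
    """Pick one user agent at random for the next browser session."""
    return rng.choice(USER_AGENTS)


def next_delay(rng: random.Random, low: float = 2.0, high: float = 4.0) -> float:
    """Return a random pause in seconds between page requests."""
    return rng.uniform(low, high)


rng = random.Random()  # pass a seed here for reproducible runs
print(pick_user_agent(rng))
print(f"sleeping for {next_delay(rng):.2f}s before the next page")
```

In an asyncio-based scraper the pause would typically be applied with `await asyncio.sleep(next_delay(rng))` between page requests.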

⚙️ How It Works

The actor tries 4 extraction strategies in order, so that it can handle different types of course catalog pages:

| Strategy | Method | Best For |
| --- | --- | --- |
| 1 | Course block `<div>` / `<li>` / `<article>` elements | Standard catalog pages with course cards |
| 2 | Single course page with `<h1>` title | Individual course detail pages |
| 3 | HTML `<table>` with course headers | Table-based course listings |
| 4 | Heading fallback (h1–h4 with course code pattern) | Simple or legacy catalog pages |
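
The fallback idea behind the table can be sketched in a few lines of Python. The sketch below is an assumption-laden illustration, not the actor's implementation: the `COURSE_CODE` pattern, the regex-based heading parser, and the function names are all hypothetical, and only strategy 4 is shown in full.

```python
import re
from typing import Callable

Course = dict[str, str]
Strategy = Callable[[str], list[Course]]

# Assumed course-code pattern: 2-4 capital letters, optional space or hyphen, 3-4 digits.
COURSE_CODE = re.compile(r"\b([A-Z]{2,4})[ -]?(\d{3,4})\b")
HEADING = re.compile(r"<h[1-4][^>]*>(.*?)</h[1-4]>", re.IGNORECASE | re.DOTALL)


def heading_fallback(html: str) -> list[Course]:
    """Strategy 4 sketch: keep h1-h4 headings that contain a course code."""
    courses = []
    for match in HEADING.finditer(html):
        text = re.sub(r"<[^>]+>", "", match.group(1)).strip()  # drop nested tags
        code = COURSE_CODE.search(text)
        if code:
            title = text[code.end():].strip(" :-\u2013")
            courses.append({"course_code": code.group(0), "title": title})
    return courses


def extract_with_fallback(html: str, strategies: list[Strategy]) -> tuple[int, list[Course]]:
    """Run strategies in order; return the 1-based number of the first that finds courses."""
    for number, strategy in enumerate(strategies, start=1):
        courses = strategy(html)
        if courses:
            return number, courses
    return 0, []


page = "<h2>CS 101: Intro to Programming</h2><h3>About us</h3>"
print(extract_with_fallback(page, [lambda html: [], heading_fallback]))
```

Ordering the strategies from most to least specific means the heading fallback only runs when the richer layouts (course blocks, detail pages, tables) yield nothing.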

Step-by-step flow:

  1. Input Parsing: Read the catalog URL, limit, and proxy settings
  2. Browser Launch: Start headless Chromium with anti-detection configuration
  3. Page Fetch: Navigate using fallback wait conditions (domcontentloaded → load → commit)
  4. Course Extraction: Apply the 4 strategies in order until courses are found
  5. Deduplication: Skip any course code + title combination that has already been seen
  6. Dataset Push: Push each unique course record to the Apify dataset
  7. Pagination: Follow "Next" page links and repeat until the limit is reached
  8. Completion: Log the total number of course records saved
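
Steps 5 to 7 can be sketched as a small generator, assuming each page has already been turned into a list of course dictionaries. The function name `collect_unique` and the case-insensitive key normalization are assumptions; the source only states that duplicates are detected by course code plus title.

```python
from typing import Iterable, Iterator


def collect_unique(pages: Iterable[list[dict]], max_courses: int) -> Iterator[dict]:
    """Yield unseen courses page by page and stop once max_courses is reached."""
    if max_courses <= 0:
        return
    seen: set[tuple[str, str]] = set()
    for page in pages:
        for course in page:
            # Assumed normalization: ignore case and surrounding whitespace.
            key = (
                course.get("course_code", "").strip().upper(),
                course.get("title", "").strip().lower(),
            )
            if key in seen:
                continue  # duplicate of an earlier record
            seen.add(key)
            yield course
            if len(seen) >= max_courses:
                return  # limit reached, no further pages are consumed


pages = [
    [{"course_code": "CS 101", "title": "Intro"}, {"course_code": "CS 102", "title": "Data"}],
    [{"course_code": "cs 102", "title": "DATA"}, {"course_code": "CS 201", "title": "Algo"}],
]
for course in collect_unique(pages, max_courses=3):
    print(course["course_code"], course["title"])
```

Because `pages` can be a lazy iterable, a scraper built this way would not need to fetch further catalog pages once the limit is hit.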

📥 Input

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| url | string | Required | URL of the course catalog page to scrape |
| maxCourses | integer | 100 | Maximum number of course records to collect |
| waitSeconds | integer | 5 | Seconds to wait after page load before extracting |
| useApifyProxy | boolean | true | Whether to use Apify proxy |
| apifyProxyGroups | array | ["RESIDENTIAL"] | Proxy groups to use |

Example Input

{
  "url": "https://catalog.university.edu/courses",
  "maxCourses": 200,
  "waitSeconds": 5,
  "useApifyProxy": true,
  "apifyProxyGroups": ["RESIDENTIAL"]
}
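
As a rough illustration of how the documented defaults combine with user input, the following Python sketch fills in missing fields and rejects a missing URL. The `normalize_input` helper and its validation rules are assumptions; the actor's actual input handling is not shown in this document.

```python
import copy

# Defaults as listed in the input table above.
DEFAULTS = {
    "maxCourses": 100,
    "waitSeconds": 5,
    "useApifyProxy": True,
    "apifyProxyGroups": ["RESIDENTIAL"],
}


def normalize_input(raw: dict) -> dict:
    """Validate the required url and fill in defaults for omitted fields."""
    url = str(raw.get("url") or "").strip()
    if not url.startswith(("http://", "https://")):
        raise ValueError("'url' is required and must be an http(s) URL")
    config = {"url": url}
    for key, default in DEFAULTS.items():
        value = raw.get(key)
        # Copy list defaults so callers cannot mutate the shared DEFAULTS.
        config[key] = copy.deepcopy(default) if value is None else value
    if config["maxCourses"] < 1:
        raise ValueError("'maxCourses' must be at least 1")
    return config


print(normalize_input({"url": "https://catalog.university.edu/courses", "maxCourses": 200}))
```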

📤 Output

Each course record is pushed as a separate dataset item.

| Field | Type | Description |
| --- | --- | --- |
| course_code | string | Course identifier (e.g., CS 101, ENG-202) |
| title | string | Course title/name |
| credits | string | Credit hours or units |
| department | string | Department or school offering the course |
| description | string | Course description (up to 400–500 characters) |
| prerequisites | string | Prerequisite courses or requirements |
| source_url | string | Page URL where the course was found |
| scraped_at | string | ISO 8601 UTC timestamp |

Example Output

{
  "course_code": "CS 301",
  "title": "Data Structures and Algorithms",
  "credits": "3",
  "department": "Computer Science",
  "description": "Study of fundamental data structures including arrays, linked lists, trees, and graphs. Analysis of sorting and searching algorithms.",
  "prerequisites": "CS 101, MATH 201",
  "source_url": "https://catalog.university.edu/courses/cs",
  "scraped_at": "2025-03-22T12:34:56Z"
}
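
A record with this shape could be assembled as in the sketch below. The `build_record` helper, the whitespace collapsing, and the hard 500-character cut are assumptions; the source only says descriptions run up to 400–500 characters and that `scraped_at` is an ISO 8601 UTC timestamp.

```python
from datetime import datetime, timezone
from typing import Optional

MAX_DESCRIPTION_CHARS = 500  # assumption: upper end of the documented 400-500 range


def build_record(
    course_code: str,
    title: str,
    source_url: str,
    credits: str = "",
    department: str = "",
    description: str = "",
    prerequisites: str = "",
    now: Optional[datetime] = None,
) -> dict:
    """Build one dataset item with the documented fields, in the documented order."""
    now = now or datetime.now(timezone.utc)
    return {
        "course_code": course_code.strip(),
        "title": title.strip(),
        "credits": credits.strip(),
        "department": department.strip(),
        # Collapse runs of whitespace, then truncate.
        "description": " ".join(description.split())[:MAX_DESCRIPTION_CHARS],
        "prerequisites": prerequisites.strip(),
        "source_url": source_url,
        "scraped_at": now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


record = build_record(
    "CS 301",
    "Data Structures and Algorithms",
    "https://catalog.university.edu/courses/cs",
    credits="3",
    department="Computer Science",
    prerequisites="CS 101, MATH 201",
)
print(record["scraped_at"])
```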

🎯 Use Cases

  • 🎓 Course Catalog Aggregation: Build a searchable database of course offerings
  • 📊 Curriculum Research: Compare course structures across institutions
  • 🤖 Academic Recommendation Systems: Power course recommendation engines with structured data
  • 📚 EdTech Platforms: Enrich platforms with real course metadata
  • 🔬 Higher Education Research: Analyze trends in course offerings by department
  • 🏫 Institutional Benchmarking: Compare credit hours, prerequisites, and departments
  • 📝 Accreditation Support: Collect structured course data for reporting
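
As one example of downstream analysis, the sketch below counts courses per department and splits the `prerequisites` string into a list. Both helper names are hypothetical, and the comma-separated prerequisite format is an assumption based on the example output; real catalogs may phrase prerequisites as free text.

```python
from collections import Counter


def courses_per_department(records: list[dict]) -> list[tuple[str, int]]:
    """Count records per department, most common first."""
    counts = Counter(
        (record.get("department") or "").strip() or "Unknown" for record in records
    )
    return counts.most_common()


def split_prerequisites(text: str) -> list[str]:
    """Split a comma-separated prerequisites string into individual entries."""
    return [part.strip() for part in text.split(",") if part.strip()]


records = [
    {"course_code": "CS 301", "department": "Computer Science", "prerequisites": "CS 101, MATH 201"},
    {"course_code": "CS 101", "department": "Computer Science", "prerequisites": ""},
    {"course_code": "MATH 201", "department": "Mathematics", "prerequisites": "MATH 101"},
]
print(courses_per_department(records))
print(split_prerequisites(records[0]["prerequisites"]))
```

The `records` list here stands in for a dataset exported as JSON and loaded with `json.load`.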

🚀 Quick Start

  1. Open on Apify: Visit the actor page and click Try for free
  2. Set Input: Paste your university catalog URL into the url field
  3. Configure Limit: Set maxCourses to the number of courses you need
  4. Enable Proxy: Keep useApifyProxy enabled for reliable scraping
  5. Run the Actor: Click Start and monitor progress in the logs
  6. Download Results: Export the course dataset as JSON, CSV, or Excel

Sample Log Output

Starting scrape: https://catalog.university.edu/courses | limit=200
[Page 1]
Strategy 1: 48 course block(s) found
Total so far: 48 courses
[Page 2]
Strategy 1: 45 course block(s) found
Total so far: 93 courses
Done! Total courses saved: 200

🧰 Technical Stack

| Component | Technology |
| --- | --- |
| Browser Automation | Headless Chromium |
| Anti-Detection | Random user agents, disabled webdriver fingerprint |
| Navigation | Multi-strategy (domcontentloaded, load, commit) |
| Async | asyncio |
| Proxy | Apify Proxy (Residential) |
| Platform | Apify Actor (serverless, scalable) |
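
The multi-strategy navigation row can be read as "try progressively weaker wait conditions until one succeeds". A minimal asyncio sketch of that pattern follows. The `goto` callable is a hypothetical wrapper around whatever browser library performs the navigation, and `navigate_with_fallback` is an illustrative name, not part of the actor.

```python
import asyncio
from typing import Awaitable, Callable

# Wait conditions in the order listed in this document.
WAIT_CONDITIONS = ("domcontentloaded", "load", "commit")


async def navigate_with_fallback(
    goto: Callable[[str, str], Awaitable[None]], url: str
) -> str:
    """Try each wait condition in turn and return the first one that succeeds."""
    last_error = None
    for condition in WAIT_CONDITIONS:
        try:
            await goto(url, condition)
            return condition
        except Exception as error:  # e.g. a navigation timeout
            last_error = error
    raise RuntimeError(f"All navigation attempts failed for {url}") from last_error


async def fake_goto(url: str, condition: str) -> None:
    """Stand-in for a real browser call: fails on the first condition only."""
    if condition == "domcontentloaded":
        raise TimeoutError("page too slow")


print(asyncio.run(navigate_with_fallback(fake_goto, "https://catalog.university.edu/courses")))
```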

📦 Changelog

v1.0.0: Initial Release

  • Browser-based course catalog scraping
  • 4-strategy extraction (blocks, single page, table, heading fallback)
  • Automatic pagination with "Next" link detection
  • Course code, title, credits, department, description, prerequisites extraction
  • Deduplication by course code + title
  • Configurable course limit (maxCourses)
  • Configurable page wait time (waitSeconds)
  • Residential proxy support
  • Anti-detection user agent rotation
  • Random anti-blocking delays (2–4 seconds)
  • Real-time dataset push with ISO 8601 timestamp
  • Graceful error handling and browser cleanup

🧑‍💻 Support & Feedback

  • Issues & Ideas: Open a ticket on the Apify Actor issue tracker
  • Documentation: Visit Apify Docs for platform guides
  • Scraping Notes: Increase waitSeconds for slower university websites
  • Proxy Tips: Prefer residential proxies when scraping university catalogs

⚠️ Disclaimer: This actor scrapes publicly visible data from university course catalog pages. Please ensure your usage complies with the terms of service of the target institution. Intended for research and informational purposes only.