University Course Catalog Scraper: Edu Data Intelligence

Pricing

$7.99/month + usage

University Courses: Scrape course listings from MIT OpenCourseWare, Harvard, Stanford, Yale and Cornell. Returns course code, title, credits, department, instructor, description and syllabus link. Filter by keyword and department. Demo mode included.

Rating

0.0

(0)

Developer

Scrape Pilot

Maintained by Community

Actor stats

Bookmarked: 0
Total users: 3
Monthly active users: 2
Last modified: 8 days ago

🎓 University Course Catalog Scraper: Extract Courses from Harvard, MIT, Stanford & Any University

The most complete University Course Catalog Scraper on Apify. Extract real course data from Harvard, MIT, Stanford, Yale, Cornell, and any public university course catalog: course code, title, credits, instructor, department, description, syllabus link, level, and schedule. Demo mode returns 10 real courses instantly.

🆓 Try FREE for 2 hours, no credit card needed. Then just $7.99/month for unlimited university course catalog scraping from any university, any department, any keyword.


🔍 What Is This Actor?

University Course Catalog Scraper is a production-grade Apify actor that extracts real, structured course data from public university course catalogs, including Harvard, MIT OpenCourseWare, Stanford Online, Yale Open Courses, Cornell, and any other university whose catalog is publicly accessible on the web.

This university course catalog scraper returns clean, consistent records for every course: course code, title, credit hours, semester, department, instructor name, course description, syllabus link, academic level, and class schedule, all in structured JSON ready for immediate use.

The scraper includes a Demo Mode that instantly returns 10 real sample courses from Harvard, MIT, Stanford, Yale, and Cornell, with no scraping and no proxy required, so you can verify the output format and integrate it into your pipeline before committing to a live run.

Whether you are building an education discovery platform, conducting academic research on curriculum trends, tracking course offerings across universities, or aggregating online learning resources, this university course catalog scraper gives you the structured data you need at a fraction of the cost of manual data collection.


🚀 Why This Is the Best University Course Catalog Scraper

| Feature | This Actor | Manual Research | Generic Scrapers | Paid Data APIs |
| --- | --- | --- | --- | --- |
| Harvard + MIT + Stanford built-in | ✅ | ❌ Hours of browsing | ⚠️ | ✅ Expensive |
| Any university URL | ✅ Generic parser | ❌ | ⚠️ | ❌ Fixed sources |
| Demo Mode, instant data | ✅ 10 courses | ❌ | ❌ | ❌ |
| Course code extraction | ✅ | ✅ Manual | ⚠️ | ✅ |
| Credit hours extracted | ✅ | ✅ Manual | ❌ | ✅ |
| Instructor name | ✅ | ✅ Manual | ⚠️ | ✅ |
| Academic level detection | ✅ Auto | ❌ | ❌ | ⚠️ |
| Syllabus link extracted | ✅ | ✅ Manual | ⚠️ | ⚠️ |
| Keyword + department filter | ✅ Built-in | ❌ | ❌ | ✅ |
| Price | $7.99/mo | Free (hours wasted) | Varies | $200+/mo |

This university course catalog scraper gives you real course data from the world's top universities: structured, clean, and at a price that makes manual research unnecessary.


💰 Pricing

🆓 Free Trial: 2 Hours, No Credit Card

Start using this university course catalog scraper immediately with a full 2-hour free trial. No credit card. No form. Click Try for free, then enable Demo Mode to see 10 real course records instantly, or paste your first university catalog URL.

During the free trial you get:

  • ✅ Demo Mode: 10 real courses from Harvard, MIT, Stanford, Yale, Cornell instantly
  • ✅ All supported universities: Harvard, MIT OCW, Stanford Online, Yale, Cornell
  • ✅ Custom URL support: any public university course catalog
  • ✅ All output fields: course code, credits, instructor, syllabus, level, schedule
  • ✅ Keyword and department filtering

After the free trial, continue with $7.99/month, less than a single university textbook. You get:

  • ✅ Unlimited runs: scrape any university catalog, any time, as often as you need
  • ✅ All supported sources: Harvard, MIT, Stanford, Yale, Cornell, and any custom URL
  • ✅ Full data: all 12 output fields per course record
  • ✅ Apify scheduling: automate semester-by-semester course catalog monitoring
  • ✅ Webhooks: push new course data to your database, CMS, or app automatically
  • ✅ API access: integrate university course data into any platform via Apify API

💡 What $7.99/Month Gets You vs Alternatives

| Tool | Price | Sources | Structured Data | Auto-Level |
| --- | --- | --- | --- | --- |
| This University Course Catalog Scraper | $7.99/mo | Harvard + MIT + Stanford + Any URL | ✅ Full JSON | ✅ |
| CourseReport / Coursera API | Free/varies | Online courses only | ⚠️ Limited | ❌ |
| Custom data scraping service | $500+/project | One-time | ✅ | ❌ |
| Manual catalog research | Free | Any | ❌ Unstructured | ❌ |

🎯 $7.99/month for unlimited structured university course data. Start with the 2-hour free trial and Demo Mode to see exactly what you get.


🏛️ Supported Universities

This university course catalog scraper has built-in optimized parsers for the following universities, plus a universal generic parser for any other institution:

Built-in University Parsers

| University | Catalog Source | Parser Type |
| --- | --- | --- |
| Harvard University | catalog.college.harvard.edu | JSON-LD + HTML table |
| MIT OpenCourseWare | ocw.mit.edu | Course card + slug parsing |
| Stanford Online | online.stanford.edu/courses | Generic + JSON-LD |
| Yale Open Courses | oyc.yale.edu | Generic + JSON-LD |
| Cornell University | courses.cornell.edu | Generic + course code regex |

Auto-Detected University Names

The scraper automatically detects the university name from the domain for these institutions:

UC Berkeley, Columbia University, Princeton University, University of Oxford, University of Cambridge, UCL, New York University (NYU), UCLA, Carnegie Mellon University (CMU), Caltech, and many more.
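
As a rough illustration of domain-based name detection, the sketch below maps a URL's hostname to a display name. The lookup table, the function name `detect_university`, and the `"Unknown University"` default are all assumptions made for this example; the actor's real mapping is not published here.

```python
from urllib.parse import urlparse

# Hypothetical lookup table (an assumption for this sketch, not the actor's real list).
KNOWN_DOMAINS = {
    "berkeley.edu": "UC Berkeley",
    "columbia.edu": "Columbia University",
    "princeton.edu": "Princeton University",
    "ox.ac.uk": "University of Oxford",
    "cam.ac.uk": "University of Cambridge",
    "ucl.ac.uk": "UCL",
    "nyu.edu": "New York University (NYU)",
    "ucla.edu": "UCLA",
    "cmu.edu": "Carnegie Mellon University (CMU)",
    "caltech.edu": "Caltech",
}


def detect_university(url: str, default: str = "Unknown University") -> str:
    """Return a display name when the URL's host is a known domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    for domain, name in KNOWN_DOMAINS.items():
        # Match the domain itself or any subdomain, but not look-alikes such as "notnyu.edu".
        if host == domain or host.endswith("." + domain):
            return name
    return default
```

Matching on the hostname suffix rather than on the full URL keeps subdomains such as `classes.berkeley.edu` working without listing each one.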

Generic Parser: Any Public University

For any university not listed above, the universal generic parser:

  • Reads JSON-LD Course, EducationalEvent, and LearningResource schema markup
  • Detects course code patterns (e.g. CS101, MATH 301, ECON115)
  • Extracts titles, credits, instructor names, and descriptions from HTML structure
  • Works on most modern university catalog websites built with standard HTML

Just paste any public university course catalog URL and the scraper handles the rest.
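
To make the JSON-LD step concrete, here is a minimal sketch of pulling `Course`, `EducationalEvent`, and `LearningResource` nodes out of a page using only the Python standard library. The class and function names are hypothetical, and the sketch ignores `@graph` containers; it is one possible approach, not the actor's implementation.

```python
import json
from html.parser import HTMLParser

COURSE_TYPES = {"Course", "EducationalEvent", "LearningResource"}


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append("".join(self._buffer))


def extract_course_nodes(html: str) -> list:
    """Return top-level JSON-LD objects whose @type is one of COURSE_TYPES."""
    collector = JsonLdCollector()
    collector.feed(html)
    nodes = []
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # Skip malformed blocks instead of failing the whole page.
        items = data if isinstance(data, list) else [data]
        for item in items:
            if not isinstance(item, dict):
                continue
            types = item.get("@type")
            types = types if isinstance(types, list) else [types]
            if COURSE_TYPES.intersection(t for t in types if isinstance(t, str)):
                nodes.append(item)
    return nodes
```

Pages without any matching JSON-LD return an empty list, which is the point where a regex or HTML-structure pass would take over.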


🎯 Use Cases

🏗️ Education Platform Development

Build a course discovery or aggregation platform with real data from multiple top universities. This university course catalog scraper provides the structured JSON you need to populate your database, with course codes, credits, instructors, departments, and syllabus links all included.

📊 Academic Research & Curriculum Analysis

Study curriculum trends across universities: which courses are offered, how credit hours are structured, and what topics are emphasized by department. Compare course offerings between institutions for academic policy research or accreditation studies.

🔍 Competitive Analysis for EdTech Companies

Track what Harvard, MIT, and Stanford are offering each semester. Identify curriculum gaps or emerging topic areas by monitoring new course additions and departmental shifts over time using scheduled scraper runs.

🎓 Student & Academic Advising Tools

Build course recommendation tools or academic planning apps with real course catalog data. Extract course codes, credits, prerequisites (from descriptions), and level information to power degree planning features.

📈 HR & Workforce Development Research

Analyze what skills and subjects top universities are teaching to understand emerging workforce trends. Track which departments are growing their course offerings in AI, sustainability, healthcare, and other evolving fields.

🤖 AI Training Data: Education Domain

Build training datasets for educational AI: course recommendation engines, adaptive learning systems, or NLP models for academic text. This university course catalog scraper provides structured course metadata at scale.

🌍 International Education Research

Collect and compare course catalogs from universities across countries. Use the generic parser to scrape Oxford, Cambridge, and other international institutions alongside US universities.

📚 Open Educational Resource Aggregation

Aggregate publicly available course materials from MIT OpenCourseWare, Yale Open Courses, and Stanford Online into one unified resource directory, with direct syllabus links for each course.


📋 Output Fields (Full Reference)

Every record from this university course catalog scraper contains the following 12 fields:

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| Course_Code | string | Course code or number | "CS50", "MATH101", "6.034" |
| Title | string | Full course title | "Introduction to Computer Science" |
| Credits | string | Credit hours or units | "4.0", "12 units", "3-4 units" |
| Semester | string | Term when course is offered | "Fall 2026", "Spring 2026", "Multiple Terms" |
| Department | string | Academic department | "Computer Science", "Mathematics" |
| Instructor | string | Professor or instructor name | "David J. Malan", "Andrew Ng" |
| Description | string | Course description (max 500 chars) | "A broad and robust understanding of computer science..." |
| Syllabus_Link | string | Direct URL to course syllabus or page | "https://cs50.harvard.edu/college/2026/fall/" |
| Level | string | Academic level (auto-detected) | "Undergraduate", "Graduate", "PhD" |
| Schedule | string | Class days and times | "Mon/Wed 10:00–11:30 AM" |
| university | string | University name | "Harvard University", "MIT", "Stanford University" |
| Timestamp | string | ISO extraction timestamp | "2026-03-28T21:20:00Z" |
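
When importing records into a database, it can help to check each one against the field list above first. The following is a minimal sketch of such a check; `EXPECTED_FIELDS` and `validate_record` are names invented for this example, and the 500-character and ISO-timestamp checks simply mirror the table's descriptions.

```python
from datetime import datetime

# The 12 field names from the table above, in the same order and casing.
EXPECTED_FIELDS = (
    "Course_Code", "Title", "Credits", "Semester", "Department", "Instructor",
    "Description", "Syllabus_Link", "Level", "Schedule", "university", "Timestamp",
)


def validate_record(record: dict) -> list:
    """Return a list of human-readable problems; an empty list means the record looks fine."""
    problems = [f"missing field: {name}" for name in EXPECTED_FIELDS if name not in record]
    if len(record.get("Description", "")) > 500:
        problems.append("Description longer than 500 characters")
    timestamp = record.get("Timestamp")
    if timestamp is not None:
        try:
            # fromisoformat() in older Python versions rejects a trailing "Z", so normalise it.
            datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
        except ValueError:
            problems.append("Timestamp is not ISO 8601")
    return problems
```

Note that `university` is lowercase while the other field names are capitalized, so key lookups need to match that casing exactly.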

⚙️ Input Parameters

{
  "demo_mode": false,
  "university_urls": [
    "https://ocw.mit.edu/search/?q=python&type=course",
    "https://online.stanford.edu/courses"
  ],
  "university_url": "",
  "keyword": "machine learning",
  "department": "Computer Science",
  "max_results": 30,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| demo_mode | boolean | false | Set to true to instantly receive 10 real sample courses from Harvard, MIT, Stanford, Yale, and Cornell, with no scraping and no proxy needed |
| university_urls | array or string | [] | List of university course catalog URLs to scrape. One per item or newline-separated string. |
| university_url | string | "" | Single university catalog URL shortcut, automatically added to the list |
| keyword | string | "" | Filter courses by keyword in title or department, e.g. "machine learning", "calculus", "economics" |
| department | string | "" | Filter by department name, e.g. "Computer Science", "Mathematics", "Economics" |
| max_results | integer | 30 | Maximum total course records to return across all URLs |
| proxyConfiguration | object | Off | Apify proxy settings, recommended for Harvard, MIT, and Stanford catalog pages |
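
The sketch below shows one way the two URL parameters could be merged into a single list, followed by a helper that starts a run through the `apify-client` Python package. `normalize_urls` and `run_actor` are hypothetical names, the de-duplication step is an assumption, and the actor ID must be copied from the actor's Apify page since it is not stated in this document.

```python
def normalize_urls(university_urls=None, university_url=""):
    """Merge the list (or newline-separated string) with the single-URL shortcut."""
    if university_urls is None:
        candidates = []
    elif isinstance(university_urls, str):
        candidates = university_urls.splitlines()
    else:
        candidates = list(university_urls)
    if university_url:
        candidates.append(university_url)
    seen, result = set(), []
    for url in candidates:
        url = url.strip()
        # Dropping blanks and duplicates is an assumption of this sketch.
        if url and url not in seen:
            seen.add(url)
            result.append(url)
    return result


def run_actor(token, actor_id, run_input):
    """Start a run and return its dataset items (requires `pip install apify-client`)."""
    from apify_client import ApifyClient  # Imported lazily so the helper above has no dependency.

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

A call might look like `run_actor("<YOUR_APIFY_TOKEN>", "<username>/<actor-name>", {"demo_mode": True})`, with both placeholders replaced by your own values.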

📦 Example Inputs & Outputs

Example 1: Demo Mode, 10 Real Courses Instantly

Input:

{
  "demo_mode": true
}

Output (10 records, no scraping required; the first two are shown here):

[
  {
    "Course_Code": "CS50",
    "Title": "Introduction to Computer Science",
    "Credits": "4.0",
    "Semester": "Fall 2026",
    "Department": "Computer Science",
    "Instructor": "David J. Malan",
    "Description": "A broad and robust understanding of computer science and programming. Tracks include Python, SQL, HTML, CSS, and JavaScript.",
    "Syllabus_Link": "https://cs50.harvard.edu/college/2026/fall/",
    "Level": "Undergraduate",
    "Schedule": "Mon/Wed 10:00–11:30 AM",
    "university": "Harvard University",
    "Timestamp": "2026-03-28T21:20:00Z"
  },
  {
    "Course_Code": "CS229",
    "Title": "Machine Learning",
    "Credits": "3-4 units",
    "Semester": "Fall 2026",
    "Department": "Computer Science",
    "Instructor": "Andrew Ng",
    "Description": "Covers supervised, unsupervised, and reinforcement learning. Topics: linear regression, neural networks, SVMs, clustering, PCA.",
    "Syllabus_Link": "https://online.stanford.edu/courses/cs229-machine-learning",
    "Level": "Graduate",
    "Schedule": "Mon/Wed 4:30–5:50 PM",
    "university": "Stanford University",
    "Timestamp": "2026-03-28T21:20:00Z"
  }
]

Example 2: MIT OpenCourseWare, Python Courses

Input:

{
  "university_urls": ["https://ocw.mit.edu/search/?q=python&type=course"],
  "keyword": "python",
  "max_results": 15
}

Output: Up to 15 MIT OCW Python-related courses, with course codes (6.006, 6.034), titles, departments, MIT faculty instructors, description excerpts, and direct syllabus links.


Example 3: Stanford Online, Graduate AI Courses

Input:

{
  "university_urls": ["https://online.stanford.edu/courses"],
  "keyword": "artificial intelligence",
  "department": "Computer Science",
  "max_results": 10
}

Output: Up to 10 Stanford AI courses filtered by Computer Science department, with credit hours, instructor names, academic level, and course descriptions.


Example 4: Multiple Universities, One Run

Input:

{
  "university_urls": [
    "https://ocw.mit.edu/search/?q=machine+learning&type=course",
    "https://online.stanford.edu/courses",
    "https://oyc.yale.edu/courses"
  ],
  "keyword": "machine learning",
  "max_results": 30
}

Output: Up to 30 machine learning course records from MIT, Stanford, and Yale, unified in one structured dataset with consistent fields regardless of each university's catalog format.


Example 5: Custom University URL

Input:

{
  "university_url": "https://courses.cornell.edu/content.php?catoid=55&navoid=13552",
  "keyword": "data science",
  "max_results": 20
}

Output: Up to 20 data science courses from Cornell's course catalog, extracted using the generic course code pattern detector and HTML parser.


Example 6: Department Filter

Input:

{
  "university_urls": ["https://ocw.mit.edu/search/?type=course"],
  "department": "Physics",
  "max_results": 15
}

Output: Up to 15 MIT courses where the department field contains "Physics", filtered after extraction.


🎮 Demo Mode: 10 Real Courses Instantly

Set demo_mode: true to receive 10 pre-loaded real course records from top universities, returned in under 5 seconds without any web scraping.

Demo includes real courses from:

  • Harvard University: CS50 (Intro to CS), MATH101 (Calculus I)
  • MIT: 6.034 (Artificial Intelligence), 6.006 (Introduction to Algorithms), PHYS8.01 (Classical Mechanics)
  • Stanford University: CS221 (AI Principles), CS229 (Machine Learning with Andrew Ng)
  • Yale University: ECON115 (Economics of the Environment), HIST1255 (The American Revolution)
  • Cornell University: LING3201 (Introduction to Linguistics)

When to use Demo Mode:

  • Verifying output structure before building a pipeline
  • Testing CRM or database import with real course data
  • Showing stakeholders what the scraper produces
  • Development and integration testing

⚙️ How the University Course Catalog Scraper Works

This university course catalog scraper uses source-specific parsers for known universities and a universal generic parser for everything else:

Harvard University Parser

Reads JSON-LD Course and EducationalEvent schema markup first, extracting course code, title, credits, department, and description from structured data. Falls back to HTML table parsing for catalogs that use traditional table layouts, detecting course codes and credit hour patterns automatically.

MIT OpenCourseWare Parser

Extracts course cards from MIT OCW's listing pages. Course codes are derived from the URL slug (e.g. /courses/6-006-introduction-to-algorithms → 6.006). Credits default to MIT's standard 12-unit system. Department and instructor data are extracted from card metadata elements.
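
The slug-to-code conversion can be sketched with a single regular expression. This is an illustrative guess at the rule based on the one example given; `course_code_from_slug` is a hypothetical name, and slugs whose department prefix is not purely numeric would need extra handling.

```python
import re

# Assumed slug shape: /courses/<1-2 digit department>-<course number>-<title words...>
_SLUG_RE = re.compile(r"/courses/(\d{1,2})-([0-9]+[a-z]*)-", re.IGNORECASE)


def course_code_from_slug(path: str):
    """Turn '/courses/6-006-introduction-to-algorithms' into '6.006'; return None if no match."""
    match = _SLUG_RE.search(path)
    if not match:
        return None
    return f"{match.group(1)}.{match.group(2).upper()}"
```

Returning `None` for unmatched slugs leaves the caller free to fall back to another extraction method or to a placeholder value.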

Stanford Online Parser

Uses JSON-LD schema detection as the primary method, with HTML card fallback. Course level (Graduate vs Undergraduate) is auto-detected from title keywords and course number ranges.

Generic Parser (Any University)

For any URL not matching a known university:

  1. JSON-LD Detection: Reads Course, EducationalEvent, and LearningResource schema.org markup, the most reliable extraction method for modern university websites
  2. Course Code Pattern Matching: Scans page text for standard course code patterns (CS101, MATH 301, ECON115, 6.034) using regex detection
  3. Credit Extraction: Finds credit/unit mentions near each course code using pattern matching
  4. Instructor Detection: Identifies instructor names following "Instructor:", "Professor:", or "Taught by:" labels
  5. Description Extraction: Pulls description text from <p> tags and description class elements near each course entry
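
Steps 2 to 4 above can be approximated with a few regular expressions. The patterns below are deliberately naive assumptions written for this example (they will produce false positives on text such as version numbers), and `scan_text` and `find_instructor` are hypothetical helpers rather than the actor's code.

```python
import re

# Letter-prefixed codes (CS101, MATH 301, ECON115) or MIT-style numeric codes (6.034).
COURSE_CODE_RE = re.compile(r"\b(?:[A-Z]{2,5}\s?\d{2,4}[A-Z]?|\d{1,2}\.\d{2,4})\b")
# A number or a range, followed by "credit(s)" or "unit(s)".
CREDITS_RE = re.compile(
    r"\b(\d+(?:\.\d+)?(?:\s?-\s?\d+(?:\.\d+)?)?)\s*(credits?|units?)\b", re.IGNORECASE
)
# A name following one of the labels listed in step 4.
INSTRUCTOR_RE = re.compile(r"(?:Instructor|Professor|Taught by):\s*([^\n,;|]+)")


def scan_text(text: str) -> list:
    """Find one course code per line and any credit mention on the same line."""
    courses = []
    for line in text.splitlines():
        code = COURSE_CODE_RE.search(line)
        if not code:
            continue
        credits = CREDITS_RE.search(line)
        courses.append({
            "Course_Code": code.group(0),
            # "N/A" mirrors the placeholder the FAQ mentions for missing credits.
            "Credits": f"{credits.group(1)} {credits.group(2)}" if credits else "N/A",
        })
    return courses


def find_instructor(text: str):
    """Return the first labelled instructor name, or None when no label is present."""
    match = INSTRUCTOR_RE.search(text)
    return match.group(1).strip() if match else None
```

Searching for credits only on the same line as the course code is a simplification of "near each course code"; a real page would likely need a window over neighbouring elements instead.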

🔍 Keyword & Department Filtering

Keyword Filter (keyword)

Applied during extraction: only courses where the keyword appears in the title or department are returned. Case-insensitive partial match.

keyword: "machine learning" → Machine Learning, Introduction to Machine Learning, Applied Machine Learning
keyword: "calculus" → Calculus I, Multivariable Calculus, Calculus for Engineers
keyword: "organic chemistry" → Organic Chemistry, Advanced Organic Chemistry
keyword: "algorithms" → Introduction to Algorithms, Algorithm Design

Department Filter (department)

Applied after extraction: only courses where the department field contains the filter string are included. Case-insensitive partial match.

department: "Computer Science" → CS courses only
department: "Mathematics" → Math courses only
department: "Economics" → Economics courses only
department: "Electrical" → Electrical Engineering and Computer Science

Both filters can be used together for precise results, e.g. keyword: "AI" + department: "Computer Science" returns only AI-related CS courses.
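
The matching rules described above can be reproduced on already-downloaded records with a few lines of Python. This is a sketch under the stated rules (case-insensitive substring match); the function names are hypothetical, and it applies both filters in one pass rather than at different stages of extraction.

```python
def matches_keyword(course: dict, keyword: str) -> bool:
    """True when the keyword appears in the title or department (case-insensitive)."""
    if not keyword:
        return True
    needle = keyword.casefold()
    return (
        needle in course.get("Title", "").casefold()
        or needle in course.get("Department", "").casefold()
    )


def matches_department(course: dict, department: str) -> bool:
    """True when the department field contains the filter string (case-insensitive)."""
    if not department:
        return True
    return department.casefold() in course.get("Department", "").casefold()


def apply_filters(courses, keyword="", department="", max_results=30):
    """Keep courses passing both filters, capped at max_results."""
    kept = [
        c for c in courses
        if matches_keyword(c, keyword) and matches_department(c, department)
    ]
    return kept[:max_results]
```

Because the match is a plain substring test, very short keywords match more than intended (for example "AI" also matches "Maintaining"), so longer phrases give cleaner results.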


🎓 Course Level Detection

This university course catalog scraper automatically assigns an academic level to every course based on keywords in the title and course description:

| Level | Detection Keywords |
| --- | --- |
| Undergraduate | (default, no explicit graduate-level keyword detected) |
| Advanced Undergraduate | advanced, upper-level, junior, senior, 300-level, 400-level, 500-level |
| Graduate | graduate, grad, master, MBA, MSc |
| PhD | PhD, doctoral, dissertation, graduate seminar |

Level auto-detection means you can filter your results by academic level after extraction, for example keeping only Graduate and PhD courses for a research tool targeting postgraduate students.
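
A keyword classifier along the lines of the table might look like the sketch below. The precedence (PhD first, then Graduate, then Advanced Undergraduate) and the whole-word matching are assumptions added so that "undergraduate" does not trigger the "graduate" or "grad" keywords; the actor's actual rules may differ.

```python
import re

# Checked top to bottom; the first level with a matching keyword wins (assumed order).
LEVEL_RULES = [
    ("PhD", ["phd", "doctoral", "dissertation", "graduate seminar"]),
    ("Graduate", ["graduate", "grad", "master", "mba", "msc"]),
    ("Advanced Undergraduate",
     ["advanced", "upper-level", "junior", "senior", "300-level", "400-level", "500-level"]),
]


def detect_level(title: str, description: str = "") -> str:
    """Return the first level whose keyword appears as a whole word in the title or description."""
    text = f"{title} {description}".lower()
    for level, keywords in LEVEL_RULES:
        for keyword in keywords:
            # The lookarounds stop "grad" from matching inside "undergraduate".
            if re.search(rf"(?<![a-z0-9]){re.escape(keyword)}(?![a-z0-9])", text):
                return level
    return "Undergraduate"


def keep_postgraduate(records: list) -> list:
    """Post-run filter: keep only records whose Level is Graduate or PhD."""
    return [r for r in records if r.get("Level") in {"Graduate", "PhD"}]
```

`keep_postgraduate` illustrates the post-extraction filtering mentioned above and works directly on the JSON records a run returns.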


🌐 Proxy Configuration

{
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}

When to Use Proxy

  • Harvard catalog pages: Harvard's catalog uses Cloudflare and benefits from residential IPs
  • Stanford and MIT: residential proxy improves reliability for repeated runs
  • Large-scale runs: any run with 10+ URLs benefits from proxy rotation
  • Production scheduling: daily or weekly automated runs should always use residential proxy

When Proxy Is Optional

  • Demo Mode: proxy is never needed in demo mode (no scraping occurs)
  • MIT OpenCourseWare search: OCW is lightly protected and often works without proxy
  • Development testing: small test runs on 1–2 URLs

⚡ Performance & Speed

Speed Benchmarks (with residential proxy)

| Mode | URLs | Max Results | Estimated Time |
| --- | --- | --- | --- |
| Demo Mode | n/a | 10 | ~3–5 seconds |
| MIT OCW (keyword search) | 1 | 20 | ~30–60 seconds |
| Stanford Online | 1 | 20 | ~30–60 seconds |
| Harvard catalog | 1 | 20 | ~45–90 seconds |
| Multiple universities | 3 | 30 | ~2–4 minutes |
| Multiple universities | 5 | 50 | ~4–7 minutes |

Fallback Behavior

If a university URL fails to return valid course data (e.g. access denied, empty page, or no parseable course patterns), the scraper automatically falls back to the Demo Mode data, ensuring you always receive some output rather than an empty dataset.
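
The fallback idea can be sketched as a small wrapper. Everything here is an assumption for illustration: `scrape_with_fallback`, the `scrape_one` callable, and the choice to fall back only when the whole run yields nothing are not confirmed details of the actor.

```python
def scrape_with_fallback(urls, scrape_one, demo_records):
    """Return (records, used_fallback); fall back to demo data when nothing was scraped.

    `scrape_one` is any callable taking a URL and returning a list of course dicts.
    """
    live = []
    for url in urls:
        try:
            live.extend(scrape_one(url))
        except Exception:
            # Treat any failure (access denied, parse error) as "no data" for this URL.
            continue
    if live:
        return live, False
    return list(demo_records), True
```

Returning an explicit `used_fallback` flag, as this sketch does, lets downstream code tell sample records apart from live results instead of silently mixing them.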

Scheduling for Semester Monitoring

Use Apify's built-in scheduler to run this university course catalog scraper at the start of each academic semester (September, January, June) to automatically capture updated course offerings. Use webhooks to push new course additions to your database or notification system.


❓ FAQ

Q: Does this scraper require a university login or student account?
A: No. This university course catalog scraper only accesses publicly available catalog pages, the same pages visible to anyone browsing the university website without an account. Private course management systems (Blackboard, Canvas, Moodle) that require login are not supported.

Q: What is Demo Mode and when should I use it?
A: Demo Mode instantly returns 10 real pre-loaded course records from Harvard, MIT, Stanford, Yale, and Cornell without any scraping. Use it to test your integration pipeline, verify the output format, or show stakeholders what the data looks like before running a live scrape.

Q: Can I scrape any university, not just the ones listed?
A: Yes. Add any public university course catalog URL to university_urls. The generic parser handles most modern university websites by detecting JSON-LD schema markup and course code patterns. Success rate depends on how the catalog page is built; structured data markup gives the best results.

Q: Why does the scraper sometimes return "N/A" for credits or schedule?
A: Some university catalog pages do not display credit hours or schedule information on their listing pages. This varies by university and page type. Credit and schedule data is most reliably extracted from individual course detail pages. Future versions will include deeper detail page extraction.

Q: How accurate is the Course Level detection?
A: Level detection is keyword-based and works well for English-language catalogs. Graduate and PhD level courses are typically clearly labeled in their titles or descriptions. For catalogs that use numeric course levels (e.g. 100–400 level) without text labels, the generic "Undergraduate" level is assigned as default.

Q: Can I filter results to only show graduate courses?
A: Not directly via input, but the Level field is returned for every course. After the run, you can filter the JSON/CSV output to keep only records where Level == "Graduate" or Level == "PhD". A direct level filter input parameter is planned for a future version.

Q: Can I use this to scrape online course platforms like Coursera or edX?
A: No. This scraper is designed specifically for university course catalog pages (formal academic offerings). Online learning platforms like Coursera, edX, and Udemy have their own different structures and are not supported by this actor.

Q: What does the 2-hour free trial include?
A: Everything: Demo Mode, all supported universities, custom URL support, keyword and department filtering, full JSON output. After 2 hours, subscribe for $7.99/month to continue with unlimited runs.

Q: Can I export results to Excel or Google Sheets?
A: Yes. Download the dataset as CSV from the Apify Output tab (opens correctly in Excel) or use Apify's Google Sheets integration to push results automatically after each run.

Q: What happens if a university changes their catalog URL or page structure?
A: The generic parser handles most structural changes through JSON-LD detection and course code pattern matching. For significant redesigns of known universities (Harvard, MIT, Stanford), the actor is updated to maintain compatibility. Report broken URLs via the actor page.


📜 Changelog

v1.0.0 (Current)

  • ✅ Harvard University catalog parser: JSON-LD + HTML table fallback
  • ✅ MIT OpenCourseWare parser: course card extraction + URL slug-to-code conversion
  • ✅ Stanford Online parser: generic + JSON-LD
  • ✅ Yale Open Courses support
  • ✅ Cornell University course catalog support
  • ✅ Generic universal parser: JSON-LD → course code regex → HTML structure (any public university)
  • ✅ Demo Mode: 10 real pre-loaded courses from 5 universities, instant (no scraping)
  • ✅ Auto university name detection from 15+ university domains
  • ✅ Course level auto-detection (Undergraduate / Advanced Undergraduate / Graduate / PhD)
  • ✅ Keyword filter, applied during extraction
  • ✅ Department filter, applied after extraction
  • ✅ Automatic fallback to Demo Mode when live scraping returns no results
  • ✅ Credit hour extraction with unit format normalization
  • ✅ Instructor name extraction from structured data and label patterns
  • ✅ Syllabus link extraction for every course record

This university course catalog scraper accesses publicly available course catalog information from university websites, the same data visible to anyone browsing these pages in a browser without an account.

Usage guidelines:

  • Use extracted course data for legitimate education research, platform development, academic analysis, and business intelligence purposes
  • Respect each university's Terms of Service regarding automated access to their websites
  • Course catalog data (titles, codes, descriptions, instructor names) is factual public information, and its use for research and aggregation is broadly accepted
  • Do not use this tool to scrape gated course content, student records, or any information requiring authentication
  • Syllabus links and course pages belong to their respective universities, so always attribute the source appropriately when displaying course data

🤝 Support & Feedback

  • Bug or broken catalog? Contact via the Apify actor page. We fix broken parsers fast
  • Need a specific university added? Request via the Apify Community forum or actor page
  • Loving it? Please leave a ⭐ review. It helps other educators and researchers find this university course catalog scraper!

🆓 2-hour free trial → 💳 $7.99/month after. The most complete university course catalog scraper available on the Apify platform.


Built with ❤️ on Apify · University Course Catalog Scraper: Harvard, MIT, Stanford & More
Extract real course data from any university catalog: structured, clean, and ready to use