University Course Catalog Scraper: Edu Data Intelligence
Pricing
$7.99/month + usage
University Courses: Scrape course listings from MIT OpenCourseWare, Harvard, Stanford, Yale and Cornell. Returns course code, title, credits, department, instructor, description and syllabus link. Filter by keyword and department. Demo mode included.
Developer
Scrape Pilot
University Course Catalog Scraper: Extract Courses from Harvard, MIT, Stanford & Any University
The most complete University Course Catalog Scraper on Apify. Extract real course data from Harvard, MIT, Stanford, Yale, Cornell, and any university course catalog: course code, title, credits, instructor, department, description, syllabus link, level, and schedule. Demo mode returns 10 real courses instantly.
Try FREE for 2 hours, no credit card needed. Then just $7.99/month for unlimited university course catalog scraping from any university, any department, any keyword.
Table of Contents
- What Is This Actor?
- Why This Is the Best University Course Catalog Scraper
- Pricing
- Supported Universities
- Use Cases
- Output Fields (Full Reference)
- Input Parameters
- Example Inputs & Outputs
- Demo Mode: 10 Real Courses Instantly
- How the University Course Catalog Scraper Works
- Keyword & Department Filtering
- Course Level Detection
- Proxy Configuration
- Performance & Speed
- FAQ
- Changelog
- Legal & Terms of Use
What Is This Actor?
University Course Catalog Scraper is a production-grade Apify actor that extracts real, structured course data from public university course catalogs, including Harvard, MIT OpenCourseWare, Stanford Online, Yale Open Courses, Cornell, and any other university whose catalog is publicly accessible on the web.
This university course catalog scraper returns clean, consistent records for every course: course code, title, credit hours, semester, department, instructor name, course description, syllabus link, academic level, and class schedule, all in structured JSON ready for immediate use.
The scraper includes a Demo Mode that instantly returns 10 real sample courses from Harvard, MIT, Stanford, Yale, and Cornell (no scraping required, no proxy needed), so you can verify the output format and integrate it into your pipeline before committing to a live run.
Whether you are building an education discovery platform, conducting academic research on curriculum trends, tracking course offerings across universities, or aggregating online learning resources, this university course catalog scraper gives you the structured data you need at a fraction of the cost of manual data collection.
Why This Is the Best University Course Catalog Scraper
| Feature | This Actor | Manual Research | Generic Scrapers | Paid Data APIs |
|---|---|---|---|---|
| Harvard + MIT + Stanford built-in | โ | โ Hours of browsing | โ ๏ธ | โ Expensive |
| Any university URL | โ Generic parser | โ | โ ๏ธ | โ Fixed sources |
| Demo Mode โ instant data | โ 10 courses | โ | โ | โ |
| Course code extraction | โ | โ Manual | โ ๏ธ | โ |
| Credit hours extracted | โ | โ Manual | โ | โ |
| Instructor name | โ | โ Manual | โ ๏ธ | โ |
| Academic level detection | โ Auto | โ | โ | โ ๏ธ |
| Syllabus link extracted | โ | โ Manual | โ ๏ธ | โ ๏ธ |
| Keyword + department filter | โ Built-in | โ | โ | โ |
| Price | $7.99/mo | Free (hours wasted) | Varies | $200+/mo |
This university course catalog scraper gives you real course data from the world's top universities: structured, clean, and at a price that makes manual research unnecessary.
Pricing
Free Trial: 2 Hours, No Credit Card
Start using this university course catalog scraper immediately with a full 2-hour free trial. No credit card. No form. Click Try for free, then enable Demo Mode to see 10 real course records instantly, or paste your first university catalog URL.
During the free trial you get:
- Demo Mode: 10 real courses from Harvard, MIT, Stanford, Yale, Cornell instantly
- All supported universities: Harvard, MIT OCW, Stanford Online, Yale, Cornell
- Custom URL support: any public university course catalog
- All output fields: course code, credits, instructor, syllabus, level, schedule
- Keyword and department filtering
Paid Plan: $7.99/Month
After the free trial, continue with $7.99/month, less than a single university textbook. You get:
- Unlimited runs: scrape any university catalog, any time, as often as you need
- All supported sources: Harvard, MIT, Stanford, Yale, Cornell, and any custom URL
- Full data: all 12 output fields per course record
- Apify scheduling: automate semester-by-semester course catalog monitoring
- Webhooks: push new course data to your database, CMS, or app automatically
- API access: integrate university course data into any platform via Apify API
What $7.99/Month Gets You vs Alternatives
| Tool | Price | Sources | Structured Data | Auto-Level |
|---|---|---|---|---|
| This University Course Catalog Scraper | $7.99/mo | Harvard + MIT + Stanford + Any URL | โ Full JSON | โ |
| CourseReport / Coursera API | Free/varies | Online courses only | โ ๏ธ Limited | โ |
| Custom data scraping service | $500+/project | One-time | โ | โ |
| Manual catalog research | Free | Any | โ Unstructured | โ |
$7.99/month for unlimited structured university course data. Start with the 2-hour free trial and Demo Mode to see exactly what you get.
Supported Universities
This university course catalog scraper has built-in optimized parsers for the following universities, plus a universal generic parser for any other institution:
Built-in University Parsers
| University | Catalog Source | Parser Type |
|---|---|---|
| Harvard University | catalog.college.harvard.edu | JSON-LD + HTML table |
| MIT OpenCourseWare | ocw.mit.edu | Course card + slug parsing |
| Stanford Online | online.stanford.edu/courses | Generic + JSON-LD |
| Yale Open Courses | oyc.yale.edu | Generic + JSON-LD |
| Cornell University | courses.cornell.edu | Generic + course code regex |
Auto-Detected University Names
The scraper automatically detects the university name from the domain for these institutions:
UC Berkeley, Columbia University, Princeton University, University of Oxford, University of Cambridge, UCL, New York University (NYU), UCLA, Carnegie Mellon University (CMU), Caltech, and many more.
Generic Parser: Any Public University
For any university not listed above, the universal generic parser:
- Reads JSON-LD Course, EducationalEvent, and LearningResource schema markup
- Detects course code patterns (e.g. CS101, MATH 301, ECON115)
- Extracts titles, credits, instructor names, and descriptions from HTML structure
- Works on most modern university catalog websites built with standard HTML
Just paste any public university course catalog URL and the scraper handles the rest.
Use Cases
Education Platform Development
Build a course discovery or aggregation platform with real data from multiple top universities. This university course catalog scraper provides the structured JSON you need to populate your database: course codes, credits, instructors, departments, and syllabus links all included.
Academic Research & Curriculum Analysis
Study curriculum trends across universities: which courses are offered, how credit hours are structured, what topics are emphasized by department. Compare course offerings between institutions for academic policy research or accreditation studies.
Competitive Analysis for EdTech Companies
Track what Harvard, MIT, and Stanford are offering each semester. Identify curriculum gaps or emerging topic areas by monitoring new course additions and departmental shifts over time using scheduled scraper runs.
Student & Academic Advising Tools
Build course recommendation tools or academic planning apps with real course catalog data. Extract course codes, credits, prerequisites (from descriptions), and level information to power degree planning features.
HR & Workforce Development Research
Analyze what skills and subjects top universities are teaching to understand emerging workforce trends. Track which departments are growing their course offerings in AI, sustainability, healthcare, and other evolving fields.
AI Training Data: Education Domain
Build training datasets for educational AI: course recommendation engines, adaptive learning systems, or NLP models for academic text. This university course catalog scraper provides structured course metadata at scale.
International Education Research
Collect and compare course catalogs from universities across countries. Use the generic parser to scrape Oxford, Cambridge, and other international institutions alongside US universities.
Open Educational Resource Aggregation
Aggregate publicly available course materials from MIT OpenCourseWare, Yale Open Courses, and Stanford Online into one unified resource directory, with direct syllabus links for each course.
Output Fields (Full Reference)
Every record from this university course catalog scraper contains the following 12 fields:
| Field | Type | Description | Example |
|---|---|---|---|
| Course_Code | string | Course code or number | "CS50", "MATH101", "6.034" |
| Title | string | Full course title | "Introduction to Computer Science" |
| Credits | string | Credit hours or units | "4.0", "12 units", "3-4 units" |
| Semester | string | Term when course is offered | "Fall 2026", "Spring 2026", "Multiple Terms" |
| Department | string | Academic department | "Computer Science", "Mathematics" |
| Instructor | string | Professor or instructor name | "David J. Malan", "Andrew Ng" |
| Description | string | Course description (max 500 chars) | "A broad and robust understanding of computer science..." |
| Syllabus_Link | string | Direct URL to course syllabus or page | "https://cs50.harvard.edu/college/2026/fall/" |
| Level | string | Academic level (auto-detected) | "Undergraduate", "Graduate", "PhD" |
| Schedule | string | Class days and times | "Mon/Wed 10:00-11:30 AM" |
| university | string | University name | "Harvard University", "MIT", "Stanford University" |
| Timestamp | string | ISO extraction timestamp | "2026-03-28T21:20:00Z" |
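If you consume these records in Python, a small typed wrapper can catch missing fields early. The sketch below is only an illustration built from the field table above: `CourseRecord` and `parse_record` are hypothetical names, not part of the actor, and the string coercion simply mirrors the documented "string" type of every field.

```python
from dataclasses import dataclass, fields


# Field names copied from the reference table; note that "university"
# is lowercase in the actor's output while the other fields are capitalized.
@dataclass(frozen=True)
class CourseRecord:
    Course_Code: str
    Title: str
    Credits: str
    Semester: str
    Department: str
    Instructor: str
    Description: str
    Syllabus_Link: str
    Level: str
    Schedule: str
    university: str
    Timestamp: str


def parse_record(raw: dict) -> CourseRecord:
    """Build a CourseRecord, raising ValueError if any of the 12 fields is missing."""
    names = [f.name for f in fields(CourseRecord)]
    missing = [n for n in names if n not in raw]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    # Coerce to str because every field is documented as a string.
    return CourseRecord(**{n: str(raw[n]) for n in names})
```

Extra keys in the input dict are ignored, so the wrapper keeps working if later versions of the actor add fields.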
Input Parameters
{
  "demo_mode": false,
  "university_urls": [
    "https://ocw.mit.edu/search/?q=python&type=course",
    "https://online.stanford.edu/courses"
  ],
  "university_url": "",
  "keyword": "machine learning",
  "department": "Computer Science",
  "max_results": 30,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}
| Parameter | Type | Default | Description |
|---|---|---|---|
| demo_mode | boolean | false | Set to true to instantly receive 10 real sample courses from Harvard, MIT, Stanford, Yale, and Cornell (no scraping, no proxy needed) |
| university_urls | array or string | [] | List of university course catalog URLs to scrape. One per item or newline-separated string. |
| university_url | string | "" | Single university catalog URL shortcut, automatically added to the list |
| keyword | string | "" | Filter courses by keyword in title or department, e.g. "machine learning", "calculus", "economics" |
| department | string | "" | Filter by department name, e.g. "Computer Science", "Mathematics", "Economics" |
| max_results | integer | 30 | Maximum total course records to return across all URLs |
| proxyConfiguration | object | Off | Apify proxy settings, recommended for Harvard, MIT, and Stanford catalog pages |
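The table says `university_urls` may be an array or a newline-separated string, and that `university_url` is added to the list. A minimal sketch of how such input could be normalized is shown below. The function name `collect_urls` is hypothetical, and the whitespace trimming and de-duplication are assumptions; the actor's real merging logic is not documented here.

```python
def collect_urls(run_input: dict) -> list[str]:
    """Merge university_urls (list or newline-separated string) with university_url."""
    urls = run_input.get("university_urls") or []
    if isinstance(urls, str):
        # Newline-separated string form described in the parameter table.
        urls = urls.splitlines()
    single = run_input.get("university_url") or ""
    result: list[str] = []
    seen: set[str] = set()
    for url in list(urls) + [single]:
        url = url.strip()
        # Assumption: blank entries and exact duplicates are dropped.
        if url and url not in seen:
            seen.add(url)
            result.append(url)
    return result
```

Running this on your own input before starting a run is a cheap way to confirm how many catalog pages you are about to request.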
Example Inputs & Outputs
Example 1: Demo Mode (10 Real Courses Instantly)
Input:
{"demo_mode": true}
Output (first 2 of the 10 records shown, no scraping required):
[
  {
    "Course_Code": "CS50",
    "Title": "Introduction to Computer Science",
    "Credits": "4.0",
    "Semester": "Fall 2026",
    "Department": "Computer Science",
    "Instructor": "David J. Malan",
    "Description": "A broad and robust understanding of computer science and programming. Tracks include Python, SQL, HTML, CSS, and JavaScript.",
    "Syllabus_Link": "https://cs50.harvard.edu/college/2026/fall/",
    "Level": "Undergraduate",
    "Schedule": "Mon/Wed 10:00-11:30 AM",
    "university": "Harvard University",
    "Timestamp": "2026-03-28T21:20:00Z"
  },
  {
    "Course_Code": "CS229",
    "Title": "Machine Learning",
    "Credits": "3-4 units",
    "Semester": "Fall 2026",
    "Department": "Computer Science",
    "Instructor": "Andrew Ng",
    "Description": "Covers supervised, unsupervised, and reinforcement learning. Topics: linear regression, neural networks, SVMs, clustering, PCA.",
    "Syllabus_Link": "https://online.stanford.edu/courses/cs229-machine-learning",
    "Level": "Graduate",
    "Schedule": "Mon/Wed 4:30-5:50 PM",
    "university": "Stanford University",
    "Timestamp": "2026-03-28T21:20:00Z"
  }
]
Example 2: MIT OpenCourseWare (Python Courses)
Input:
{"university_urls": ["https://ocw.mit.edu/search/?q=python&type=course"], "keyword": "python", "max_results": 15}
Output: Up to 15 MIT OCW Python-related courses, with course codes (6.006, 6.034), titles, departments, MIT faculty instructors, description excerpts, and direct syllabus links.
Example 3: Stanford Online (Graduate AI Courses)
Input:
{"university_urls": ["https://online.stanford.edu/courses"], "keyword": "artificial intelligence", "department": "Computer Science", "max_results": 10}
Output: Up to 10 Stanford AI courses filtered by Computer Science department, with credit hours, instructor names, academic level, and course descriptions.
Example 4: Multiple Universities in One Run
Input:
{"university_urls": ["https://ocw.mit.edu/search/?q=machine+learning&type=course", "https://online.stanford.edu/courses", "https://oyc.yale.edu/courses"], "keyword": "machine learning", "max_results": 30}
Output: Up to 30 machine learning course records from MIT, Stanford, and Yale, unified in one structured dataset with consistent fields regardless of each university's catalog format.
Example 5: Custom University URL
Input:
{"university_url": "https://courses.cornell.edu/content.php?catoid=55&navoid=13552", "keyword": "data science", "max_results": 20}
Output: Up to 20 data science courses from Cornell's course catalog, extracted using the generic course code pattern detector and HTML parser.
Example 6: Department Filter
Input:
{"university_urls": ["https://ocw.mit.edu/search/?type=course"], "department": "Physics", "max_results": 15}
Output: Up to 15 MIT courses where the department field contains "Physics", filtered after extraction.
Demo Mode: 10 Real Courses Instantly
Set demo_mode: true to receive 10 pre-loaded real course records from top universities, returned in under 5 seconds without any web scraping.
Demo includes real courses from:
- Harvard University: CS50 (Intro to CS), MATH101 (Calculus I)
- MIT: 6.034 (Artificial Intelligence), 6.006 (Introduction to Algorithms), PHYS8.01 (Classical Mechanics)
- Stanford University: CS221 (AI Principles), CS229 (Machine Learning with Andrew Ng)
- Yale University: ECON115 (Economics of the Environment), HIST1255 (The American Revolution)
- Cornell University: LING3201 (Introduction to Linguistics)
When to use Demo Mode:
- Verifying output structure before building a pipeline
- Testing CRM or database import with real course data
- Showing stakeholders what the scraper produces
- Development and integration testing
How the University Course Catalog Scraper Works
This university course catalog scraper uses source-specific parsers for known universities and a universal generic parser for everything else:
Harvard University Parser
Reads JSON-LD Course and EducationalEvent schema markup first, extracting course code, title, credits, department, and description from structured data. Falls back to HTML table parsing for catalogs using traditional table layouts, detecting course codes and credit hour patterns automatically.
MIT OpenCourseWare Parser
Extracts course cards from MIT OCW's listing pages. Course codes are derived from the URL slug (e.g. /courses/6-006-introduction-to-algorithms → 6.006). Credits default to MIT's standard 12-unit system. Department and instructor data are extracted from card metadata elements.
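The slug-to-code step can be sketched roughly as follows. This is an illustration under stated assumptions, not the actor's actual code: `code_from_ocw_url` is a hypothetical helper, and the regex only covers slugs that start with a numeric department and course number (optionally with a letter suffix), so prefixed slugs are returned as None.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Assumed slug shape: "<dept>-<number>-<title words>", e.g. "6-006-introduction-to-algorithms".
_SLUG_CODE = re.compile(r"^(\d{1,2}[a-z]?)-(\d{2,4}[a-z]?)-")


def code_from_ocw_url(url: str) -> Optional[str]:
    """Derive a course code such as '6.006' from an OCW course URL, or None."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2 or parts[0] != "courses":
        return None  # not a course page (e.g. a search URL)
    match = _SLUG_CODE.match(parts[1].lower())
    if not match:
        return None  # slug does not follow the assumed numeric pattern
    return f"{match.group(1)}.{match.group(2)}".upper()


print(code_from_ocw_url("https://ocw.mit.edu/courses/6-006-introduction-to-algorithms"))
# prints 6.006
```

Returning None instead of guessing keeps records with unusual slugs easy to spot downstream.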
Stanford Online Parser
Uses JSON-LD schema detection as the primary method, with HTML card fallback. Course level (Graduate vs Undergraduate) is auto-detected from title keywords and course number ranges.
Generic Parser (Any University)
For any URL not matching a known university:
- JSON-LD Detection: Reads Course, EducationalEvent, and LearningResource schema.org markup, the most reliable extraction method for modern university websites
- Course Code Pattern Matching: Scans page text for standard course code patterns (CS101, MATH 301, ECON115, 6.034) using regex detection
- Credit Extraction: Finds credit/unit mentions near each course code using pattern matching
- Instructor Detection: Identifies instructor names following "Instructor:", "Professor:", or "Taught by:" labels
- Description Extraction: Pulls description text from <p> tags and description class elements near each course entry
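The first two generic-parser steps above (JSON-LD detection and course code pattern matching) can be approximated with the standard library alone. The sketch below is a hedged illustration: the function names and both regular expressions are assumptions chosen to match the example codes in this README, not the actor's real patterns, and nested `@graph` structures are deliberately not handled.

```python
import json
import re

# Extracts the body of <script type="application/ld+json"> blocks.
_JSONLD = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.IGNORECASE | re.DOTALL,
)
_WANTED = {"Course", "EducationalEvent", "LearningResource"}
# Heuristic: CS101, MATH 301, ECON115, or MIT-style 6.034.
_CODE = re.compile(r"\b(?:[A-Z]{2,5} ?\d{2,4}[A-Z]?|\d{1,2}\.\d{2,4})\b")


def _types(item: dict) -> set[str]:
    """Return the item's @type values as a set (schema.org allows str or list)."""
    raw = item.get("@type")
    if isinstance(raw, str):
        return {raw}
    if isinstance(raw, list):
        return {t for t in raw if isinstance(t, str)}
    return set()


def jsonld_courses(html: str) -> list[dict]:
    """Collect JSON-LD objects whose @type is one of the course-like types."""
    found = []
    for block in _JSONLD.findall(html):
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed markup instead of failing the page
        for item in data if isinstance(data, list) else [data]:
            if isinstance(item, dict) and _types(item) & _WANTED:
                found.append(item)
    return found


def course_codes(text: str) -> list[str]:
    """Return course-code-like tokens found in plain text, in order."""
    return _CODE.findall(text)
```

A regex like `_CODE` will produce some false positives on arbitrary page text, which is one reason structured JSON-LD is tried first.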
Keyword & Department Filtering
Keyword Filter (keyword)
Applied during extraction: only courses where the keyword appears in the title or department are returned. Case-insensitive partial match.
- keyword: "machine learning" → Machine Learning, Introduction to Machine Learning, Applied Machine Learning
- keyword: "calculus" → Calculus I, Multivariable Calculus, Calculus for Engineers
- keyword: "organic chemistry" → Organic Chemistry, Advanced Organic Chemistry
- keyword: "algorithms" → Introduction to Algorithms, Design and Analysis of Algorithms
Department Filter (department)
Applied after extraction: only courses where the department field contains the filter string are included. Case-insensitive partial match.
- department: "Computer Science" → CS courses only
- department: "Mathematics" → Math courses only
- department: "Economics" → Economics courses only
- department: "Electrical" → Electrical Engineering and Computer Science
Both filters can be used together for precise results, e.g. keyword: "AI" + department: "Computer Science" returns only AI-related CS courses.
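The matching rules described above (case-insensitive substring match, keyword against title or department, department against the department field) can be sketched as below. The function names are hypothetical, and applying both filters in one pass is a simplification; the actor applies the keyword during extraction and the department filter afterwards.

```python
def matches_keyword(course: dict, keyword: str) -> bool:
    """True if keyword is empty or appears in the title or department."""
    if not keyword:
        return True
    needle = keyword.casefold()
    return (
        needle in course.get("Title", "").casefold()
        or needle in course.get("Department", "").casefold()
    )


def matches_department(course: dict, department: str) -> bool:
    """True if department is empty or is contained in the Department field."""
    if not department:
        return True
    return department.casefold() in course.get("Department", "").casefold()


def apply_filters(courses, keyword="", department="", max_results=30):
    """Keep courses passing both filters, capped at max_results."""
    kept = [
        c for c in courses
        if matches_keyword(c, keyword) and matches_department(c, department)
    ]
    return kept[:max_results]
```

If matching really is plain substring matching, very short keywords such as "AI" can also hit inside longer words (for example "Explaining"), so a more specific keyword like "artificial intelligence" tends to give cleaner results.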
Course Level Detection
This university course catalog scraper automatically assigns an academic level to every course based on keywords in the title and course description:
| Level | Detection Keywords |
|---|---|
| Undergraduate | (default: no explicit graduate-level keyword detected) |
| Advanced Undergraduate | advanced, upper-level, junior, senior, 300-level, 400-level, 500-level |
| Graduate | graduate, grad, master, MBA, MSc |
| PhD | PhD, doctoral, dissertation, graduate seminar |
Level auto-detection means you can filter your results by academic level after extraction. For example, you can keep only Graduate and PhD courses for a research tool targeting postgraduate students.
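A keyword table like the one above can be turned into a small classifier. The sketch below is an assumption-laden illustration: the precedence (PhD first, then Graduate, then Advanced Undergraduate) and the whole-word matching are choices made here so that "graduate seminar" wins over "graduate" and "undergraduate" does not trigger "grad"; the actor's actual rule order is not documented.

```python
import re

# Keywords copied from the detection table; order encodes the assumed precedence.
_LEVEL_RULES = [
    ("PhD", ["phd", "doctoral", "dissertation", "graduate seminar"]),
    ("Graduate", ["graduate", "grad", "master", "mba", "msc"]),
    ("Advanced Undergraduate", ["advanced", "upper-level", "junior", "senior",
                                "300-level", "400-level", "500-level"]),
]


def detect_level(title: str, description: str = "") -> str:
    """Return the first level whose keyword appears as a whole word, else 'Undergraduate'."""
    text = f"{title} {description}".lower()
    for level, keywords in _LEVEL_RULES:
        for kw in keywords:
            # Lookarounds enforce whole-word matches, so "undergraduate" != "graduate".
            if re.search(rf"(?<!\w){re.escape(kw)}(?!\w)", text):
                return level
    return "Undergraduate"


def keep_postgraduate(records: list[dict]) -> list[dict]:
    """Post-run filter: keep only Graduate and PhD records."""
    return [r for r in records if r.get("Level") in {"Graduate", "PhD"}]
```

`keep_postgraduate` shows the after-the-run filtering mentioned above, applied to the `Level` field the actor already returns.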
Proxy Configuration
Recommended Setup
{"proxyConfiguration": {"useApifyProxy": true, "apifyProxyGroups": ["RESIDENTIAL"]}}
When to Use Proxy
- Harvard catalog pages: Harvard's catalog uses Cloudflare and benefits from residential IPs
- Stanford and MIT: residential proxy improves reliability for repeated runs
- Large-scale runs: any run with 10+ URLs benefits from proxy rotation
- Production scheduling: daily or weekly automated runs should always use residential proxy
When Proxy Is Optional
- Demo Mode: proxy is never needed in demo mode (no scraping occurs)
- MIT OpenCourseWare search: OCW is lightly protected and often works without proxy
- Development testing: small test runs on 1-2 URLs
Performance & Speed
Speed Benchmarks (with residential proxy)
| Mode | URLs | Max Results | Estimated Time |
|---|---|---|---|
| Demo Mode | N/A | 10 | ~3-5 seconds |
| MIT OCW (keyword search) | 1 | 20 | ~30-60 seconds |
| Stanford Online | 1 | 20 | ~30-60 seconds |
| Harvard catalog | 1 | 20 | ~45-90 seconds |
| Multiple universities | 3 | 30 | ~2-4 minutes |
| Multiple universities | 5 | 50 | ~4-7 minutes |
Fallback Behavior
If a university URL fails to return valid course data (e.g. access denied, empty page, or no parseable course patterns), the scraper automatically falls back to the Demo Mode data, ensuring you always receive some output rather than an empty dataset.
Scheduling for Semester Monitoring
Use Apify's built-in scheduler to run this university course catalog scraper at the start of each academic semester (September, January, June) to automatically capture updated course offerings. Use webhooks to push new course additions to your database or notification system.
FAQ
Q: Does this scraper require a university login or student account?
A: No. This university course catalog scraper only accesses publicly available catalog pages, the same pages visible to anyone browsing the university website without an account. Private course management systems (Blackboard, Canvas, Moodle) that require login are not supported.
Q: What is Demo Mode and when should I use it?
A: Demo Mode instantly returns 10 real pre-loaded course records from Harvard, MIT, Stanford, Yale, and Cornell without any scraping. Use it to test your integration pipeline, verify the output format, or show stakeholders what the data looks like before running a live scrape.
Q: Can I scrape any university, not just the ones listed?
A: Yes. Add any public university course catalog URL to university_urls. The generic parser handles most modern university websites by detecting JSON-LD schema markup and course code patterns. Success rate depends on how the catalog page is built; structured data markup gives the best results.
Q: Why does the scraper sometimes return "N/A" for credits or schedule?
A: Some university catalog pages do not display credit hours or schedule information on their listing pages. This varies by university and page type. Credit and schedule data is most reliably extracted from individual course detail pages. Future versions will include deeper detail page extraction.
Q: How accurate is the Course Level detection?
A: Level detection is keyword-based and works well for English-language catalogs. Graduate and PhD level courses are typically clearly labeled in their titles or descriptions. For catalogs that use numeric course levels (e.g. 100-400 level) without text labels, the generic "Undergraduate" level is assigned as default.
Q: Can I filter results to only show graduate courses?
A: Not directly via input, but the Level field is returned for every course. After the run, you can filter the JSON/CSV output to keep only records where Level == "Graduate" or Level == "PhD". A direct level filter input parameter is planned for a future version.
Q: Can I use this to scrape online course platforms like Coursera or edX?
A: No. This scraper is designed specifically for university course catalog pages (formal academic offerings). Online learning platforms like Coursera, edX, and Udemy have their own different structures and are not supported by this actor.
Q: What does the 2-hour free trial include?
A: Everything: Demo Mode, all supported universities, custom URL support, keyword and department filtering, full JSON output. After 2 hours, subscribe for $7.99/month to continue with unlimited runs.
Q: Can I export results to Excel or Google Sheets?
A: Yes. Download the dataset as CSV from the Apify Output tab (opens correctly in Excel) or use Apify's Google Sheets integration to push results automatically after each run.
Q: What happens if a university changes its catalog URL or page structure?
A: The generic parser handles most structural changes through JSON-LD detection and course code pattern matching. For significant redesigns of known universities (Harvard, MIT, Stanford), the actor is updated to maintain compatibility. Report broken URLs via the actor page.
Changelog
v1.0.0 (Current)
- Harvard University catalog parser: JSON-LD + HTML table fallback
- MIT OpenCourseWare parser: course card extraction + URL slug-to-code conversion
- Stanford Online parser: generic + JSON-LD
- Yale Open Courses support
- Cornell University course catalog support
- Generic universal parser: JSON-LD → course code regex → HTML structure (any public university)
- Demo Mode: 10 real pre-loaded courses from 5 universities, instant (no scraping)
- Auto university name detection from 15+ university domains
- Course level auto-detection (Undergraduate / Advanced Undergraduate / Graduate / PhD)
- Keyword filter, applied during extraction
- Department filter, applied after extraction
- Automatic fallback to Demo Mode when live scraping returns no results
- Credit hour extraction with unit format normalization
- Instructor name extraction from structured data and label patterns
- Syllabus link extraction for every course record
Legal & Terms of Use
This university course catalog scraper accesses publicly available course catalog information from university websites, the same data visible to anyone browsing these pages in a browser without an account.
Usage guidelines:
- Use extracted course data for legitimate education research, platform development, academic analysis, and business intelligence purposes
- Respect each university's Terms of Service regarding automated access to their websites
- Course catalog data (titles, codes, descriptions, instructor names) is factual public information; its use for research and aggregation is broadly accepted
- Do not use this tool to scrape gated course content, student records, or any information requiring authentication
- Syllabus links and course pages belong to their respective universities, so always attribute the source appropriately when displaying course data
Support & Feedback
- Bug or broken catalog? Contact via the Apify actor page. We fix broken parsers fast
- Need a specific university added? Request via the Apify Community forum or actor page
- Loving it? Please leave a ⭐ review. It helps other educators and researchers find this university course catalog scraper!
2-hour free trial → $7.99/month after. The most complete university course catalog scraper available on the Apify platform.
Built with ❤️ on Apify · University Course Catalog Scraper: Harvard, MIT, Stanford & More
Extract real course data from any university catalog: structured, clean, and ready to use