edX Course Scraper
Pricing: from $3.00 / 1,000 results
Scrape edX, one of the world's leading MOOC platforms. Search courses, browse by subject or university, or fetch specific course URLs. Extracts title, institution, level, duration, effort, pricing, enrollment count, skills, and more.
Developer: Crawler Bros
Scrape online courses from edX, one of the world's leading MOOC platforms, featuring courses from MIT, Harvard, Stanford, Berkeley, IBM, Microsoft, and 200+ top institutions worldwide. Acquired by 2U, edX hosts thousands of courses, professional certificates, MicroMasters programs, and degree programs across dozens of subjects.
Features
- Search mode — find courses by keyword (e.g., "python programming", "machine learning", "data science")
- Browse by subject — get all courses for a specific topic (computer science, data analysis, AI, finance, etc.)
- Browse by university — get all courses from a specific institution (harvardx, mitx, berkeleyx, ibm, etc.)
- Single course URL — scrape one or more specific edX course pages for complete details
- Filters: level (Introductory / Intermediate / Advanced), course type (Course / Professional Certificate / MicroMasters / etc.), free-only, language
- Output includes: title, institution, subject, level, duration, effort, pricing, enrollment count, skills, availability, pacing, image URL, course URL, and more
Input
| Field | Type | Description |
|---|---|---|
| mode | select | search (default), bySubject, byUniversity, byUrl |
| searchQuery | string | Text search query (mode=search) |
| subject | select | Subject slug, e.g. computer-science, data-analysis (mode=bySubject) |
| universitySlug | string | University slug, e.g. harvardx, mitx, ibm (mode=byUniversity) |
| courseUrls | array | List of edX course URLs to scrape (mode=byUrl) |
| filterLevel | select | Filter by level: Introductory / Intermediate / Advanced |
| filterCourseType | select | Filter by type: Course / Professional Certificate / MicroMasters / etc. |
| filterIsFree | boolean | Only return free courses |
| filterLanguage | string | ISO 639-1 language code (e.g. en, es, fr) |
| maxItems | integer | Max number of records to return (1–1000, default 50) |
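The mode-specific fields in the table above can be checked before a run is started. The following is a minimal sketch, not part of the actor: `build_input` is a hypothetical helper, and it assumes that each mode needs the field tagged with it in the table (for example `searchQuery` for `mode=search`). The actor itself may be more lenient.

```python
from typing import Any

# Field names and allowed values are copied from the input table above.
MODES = {"search", "bySubject", "byUniversity", "byUrl"}
LEVELS = {"Introductory", "Intermediate", "Advanced"}

# Assumption: each mode needs the field the table tags with that mode.
REQUIRED_BY_MODE = {
    "search": "searchQuery",
    "bySubject": "subject",
    "byUniversity": "universitySlug",
    "byUrl": "courseUrls",
}


def build_input(mode: str = "search", max_items: int = 50, **fields: Any) -> dict:
    """Return an input dict, raising ValueError on obviously invalid values."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    required = REQUIRED_BY_MODE[mode]
    if not fields.get(required):
        raise ValueError(f"mode={mode} needs a non-empty {required!r}")
    if not 1 <= max_items <= 1000:
        raise ValueError("maxItems must be between 1 and 1000")
    level = fields.get("filterLevel")
    if level is not None and level not in LEVELS:
        raise ValueError(f"unknown filterLevel: {level!r}")
    return {"mode": mode, "maxItems": max_items, **fields}


run_input = build_input(
    "bySubject",
    max_items=30,
    subject="artificial-intelligence",
    filterLevel="Intermediate",
)
print(run_input)
```

Catching a missing `searchQuery` or an out-of-range `maxItems` locally avoids paying for a run that returns nothing useful.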
Example Inputs
Search for Python courses:
{"mode": "search", "searchQuery": "python programming", "maxItems": 20}
Browse courses from Harvard (type Course only):
{"mode": "byUniversity", "universitySlug": "harvardx", "filterCourseType": "Course", "maxItems": 50}
Browse intermediate-level AI courses:
{"mode": "bySubject", "subject": "artificial-intelligence", "filterLevel": "Intermediate", "maxItems": 30}
Output
Each record contains:
| Field | Type | Description |
|---|---|---|
| courseId | string | edX product UUID |
| title | string | Course title |
| description | string | Full course description (HTML stripped) |
| institution | string | Primary university/institution name |
| institutions | array | All institution names (for multi-partner courses) |
| subject | string | Primary subject |
| subjects | array | All subject categories |
| level | string | Introductory / Intermediate / Advanced |
| durationWeeks | integer | Total duration in weeks |
| durationWeeksMin | integer | Minimum weeks to complete |
| durationWeeksMax | integer | Maximum weeks to complete |
| effortHoursPerWeekMin | number | Min hours per week effort |
| effortHoursPerWeekMax | number | Max hours per week effort |
| pacing | string | self_paced or instructor_paced |
| language | string | Primary language |
| availability | string | Current / Archived / etc. |
| isFree | boolean | Whether the course is free to audit |
| price | number | Full price in USD |
| originalPrice | number | Original price before discount |
| currency | string | Currency code (e.g. USD) |
| enrollmentCount | integer | Recent enrollment count |
| skills | array | Skills covered in the course |
| courseType | string | Course / Professional Certificate / MicroMasters / etc. |
| imageUrl | string | Course thumbnail URL |
| courseUrl | string | Full edX course URL |
| startDate | string | Go-live date (ISO format) |
| productSource | string | edX |
| sourceUrl | string | URL that was scraped |
| scrapedAt | string | UTC timestamp of scrape |
| recordType | string | Always "course" |
Sample Output
{"courseId": "b3c02aea-cbf6-4fc4-a730-0433860e2a35","title": "Python for Data Science","description": "Learn to use powerful, open-source, Python tools including Pandas, Git and Matplotlib...","institution": "The University of California, San Diego","subject": "Computer Science","subjects": ["Computer Science", "Data Analysis & Statistics"],"level": "Intermediate","durationWeeks": 10,"effortHoursPerWeekMin": 3,"effortHoursPerWeekMax": 5,"pacing": "self_paced","language": "English","availability": "Current","isFree": false,"price": 149.0,"currency": "USD","enrollmentCount": 333911,"skills": ["Python (Programming Language)", "Data Science", "Pandas"],"courseType": "Course","imageUrl": "https://prod-discovery.edx-cdn.org/media/course/image/b3c02aea.jpg","courseUrl": "https://www.edx.org/learn/python/the-university-of-california-san-diego-python-for-data-science","sourceUrl": "https://www.edx.org/learn/python","scrapedAt": "2026-06-06T12:00:00+00:00","recordType": "course"}
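As a worked example of using a record, the sketch below derives a rough total-effort range (weeks multiplied by hours per week) from the sample above and parses `scrapedAt`. The total is a derived estimate, not a field the actor returns, and `total_effort_hours` is a hypothetical helper. It reads fields with `.get` on the assumption that some may be absent, as `durationWeeksMin` is in the sample.

```python
import json
from datetime import datetime

# Trimmed copy of the sample record above; only the fields used here are kept.
record = json.loads("""
{"title": "Python for Data Science", "durationWeeks": 10,
 "effortHoursPerWeekMin": 3, "effortHoursPerWeekMax": 5,
 "isFree": false, "price": 149.0, "currency": "USD",
 "scrapedAt": "2026-06-06T12:00:00+00:00"}
""")


def total_effort_hours(rec):
    """Return (low, high) total hours, or None if a needed field is missing."""
    weeks = rec.get("durationWeeks")
    low = rec.get("effortHoursPerWeekMin")
    high = rec.get("effortHoursPerWeekMax")
    if weeks is None or low is None or high is None:
        return None
    return weeks * low, weeks * high


effort = total_effort_hours(record)  # 10 weeks * 3..5 h/week -> (30, 50)
scraped_at = datetime.fromisoformat(record["scrapedAt"])  # timezone-aware
print(record["title"], effort, scraped_at.date())
```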
FAQ
What subjects are supported?
All edX subjects are supported, including Computer Science, Data Analysis, AI, Business & Management, Engineering, Healthcare, Math, Physics, Social Sciences, Language, and more.
Can I get courses by a specific university?
Yes — use mode=byUniversity with a slug like harvardx, mitx, berkeleyx, microsoft, ibm, googlecloud, etc.
Are free courses available?
Many edX courses are free to audit. Use filterIsFree: true to return only free courses.
Does this include boot camps and degree programs?
Yes — edX hosts Boot Camps, MicroBachelors, MicroMasters, Professional Certificates, and full Degrees. Use filterCourseType to narrow down.
What is the daily test prefill?
{"mode": "search", "searchQuery": "python programming", "maxItems": 5}
Data Source
edX embeds course data in its Next.js server-rendered HTML pages as React Server Components (RSC) JSON payloads. This actor parses that embedded data without requiring authentication or API keys. The data is publicly accessible and does not require a proxy.
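To illustrate the general idea of reading embedded payloads, here is a minimal sketch. It is not the actor's code. It assumes the payloads are inlined the way Next.js App Router pages usually inline them, as `self.__next_f.push([1,"..."])` script calls, and it assumes a simplified `id:json` row layout. The HTML fragment is hand-written for illustration, and `extract_payload_chunks` is a hypothetical name.

```python
import json
import re

# Assumption: inlined payloads look like
#   <script>self.__next_f.push([1,"..."])</script>
PUSH_RE = re.compile(r"self\.__next_f\.push\((\[.*?\])\)\s*</script>", re.DOTALL)


def extract_payload_chunks(html: str) -> list:
    """Collect the string chunks passed to self.__next_f.push(...)."""
    chunks = []
    for match in PUSH_RE.finditer(html):
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # not plain JSON; skip rather than guess
        if isinstance(payload, list) and len(payload) == 2 and isinstance(payload[1], str):
            chunks.append(payload[1])
    return chunks


# Hypothetical, hand-written HTML fragment for illustration only.
sample_html = (
    '<script>self.__next_f.push([1,"5:{\\"title\\":\\"Python for Data Science\\"}\\n"])</script>'
)

chunks = extract_payload_chunks(sample_html)
# Simplified row layout "<id>:<json>"; real payload rows can be more varied.
row_id, _, row_body = chunks[0].strip().partition(":")
course = json.loads(row_body)
print(row_id, course["title"])
```

Real pages split the payload across many chunks and rows, so a working parser would need to join chunks and locate the rows that actually hold course data.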