EdX Course Scraper
Pricing
Pay per usage
Extract detailed edX course data, including titles, organizations, levels, languages, subjects, effort estimates, and associated skills. Suited to e-learning aggregators and market researchers who need structured course listings.
Developer

Shahid Irfan
edX Course Data Scraper
Extract comprehensive course data from edX quickly and reliably. Collect titles, organizations, durations, levels, languages, and more at scale. Ideal for research, analysis, and monitoring educational offerings over time.
Features
- Keyword search: Gather course data by topic or skill
- Language filtering: Focus on specific languages as needed
- Automatic pagination: Collect large result sets without manual paging
- Structured output: Ready for analytics, dashboards, and exports
- Scalable collection: Designed for small tests or large datasets
Use Cases
Market Research
Track trends across subjects, providers, and course levels to understand demand and gaps in the catalog.
Curriculum Planning
Compare offerings across institutions to design programs that align with current learning needs.
Competitive Intelligence
Monitor course listings over time and identify shifts in topics, focus areas, and availability.
Data Analysis
Build datasets for BI, reporting, or custom data science workflows.
Input Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| startUrl | String | No | - | Optional starting search URL |
| keyword | String | No | "python" | Search keyword for courses |
| language | String | No | "none" | Filter courses by language |
| results_wanted | Integer | No | 20 | Maximum number of courses to collect |
| max_pages | Integer | No | 10 | Safety cap on pages to visit |
| proxyConfiguration | Object | No | { "useApifyProxy": true, "apifyProxyGroups": ["RESIDENTIAL"] } | Proxy settings |
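As a rough illustration of how these parameters fit together, the following Python sketch assembles an input object from the names in the table. The helper `build_run_input` and its validation rules are hypothetical and not part of the actor. Leaving out `language` and `startUrl` when they are not supplied assumes the actor then falls back to its documented defaults.

```python
from typing import Any, Optional


def build_run_input(
    keyword: str = "python",
    language: Optional[str] = None,
    results_wanted: int = 20,
    max_pages: int = 10,
    start_url: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble an input object using the parameter names from the table."""
    # Hypothetical sanity checks; the actor may validate differently.
    if results_wanted < 1:
        raise ValueError("results_wanted must be at least 1")
    if max_pages < 1:
        raise ValueError("max_pages must be at least 1")

    run_input: dict[str, Any] = {
        "keyword": keyword,
        "results_wanted": results_wanted,
        "max_pages": max_pages,
        # Default proxy settings as listed in the table.
        "proxyConfiguration": {
            "useApifyProxy": True,
            "apifyProxyGroups": ["RESIDENTIAL"],
        },
    }
    # Optional fields are included only when provided (assumption: the actor
    # applies its own defaults when they are omitted).
    if language:
        run_input["language"] = language
    if start_url:
        run_input["startUrl"] = start_url
    return run_input


if __name__ == "__main__":
    print(build_run_input(keyword="data science", results_wanted=50))
```

The resulting dictionary could be pasted into the Apify Console as JSON, or passed as `run_input` to the `call` method of an actor client from the `apify-client` Python package, using this actor's ID, which is not stated here.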
Output Data
Each item in the dataset contains:
| Field | Type | Description |
|---|---|---|
| title | String | Course title |
| url | String | Course URL |
| organization | String | Institution or organization |
| level | Array | Course level(s) |
| language | Array | Course language(s) |
| short_description | String | Short summary |
| image_url | String | Course image URL |
| subjects | Array | Subject areas |
| weeks_to_complete | Number | Estimated duration in weeks |
| min_effort | Number | Minimum effort per week |
| max_effort | Number | Maximum effort per week |
| enrollment_count | Number | Recent enrollment count |
| marketing_url | String | Public course link |
| external_url | String | External course link if available |
| partner | Array | Partner organization names |
| partner_keys | Array | Partner identifiers |
| program_type | Array | Program types |
| learning_type | Array | Learning types |
| tags | Array | Tags or keywords |
| skills | Array | Skills associated with the course |
Usage Examples
Basic Extraction
{"keyword": "data science", "results_wanted": 50}
Language-Specific Collection
{"keyword": "python", "language": "English", "results_wanted": 100}
Large Dataset with Safety Cap
{"keyword": "business", "results_wanted": 200, "max_pages": 5}
Sample Output
{
  "title": "CS50's Introduction to Programming with Python",
  "url": "https://www.edx.org/learn/python/harvard-university-cs50-s-introduction-to-programming-with-python",
  "organization": "Harvard University",
  "level": ["Introductory"],
  "language": ["English"],
  "short_description": "An introduction to programming using Python, a popular language for general-purpose programming, data science, web programming, and more.",
  "image_url": "https://prod-discovery.edx-cdn.org/media/course/image/2cc794d0-316d-42f7-bbfd-25c34e4cd5df-033e46d516c0.png",
  "subjects": ["Computer Science", "Data Analysis & Statistics"],
  "weeks_to_complete": 10,
  "min_effort": 3,
  "max_effort": 9,
  "enrollment_count": 147611,
  "marketing_url": "https://www.edx.org/learn/python/harvard-university-cs50-s-introduction-to-programming-with-python",
  "external_url": null,
  "partner": ["Harvard University"],
  "partner_keys": ["HarvardX"],
  "program_type": ["Professional Certificate", "Professional Certificate"],
  "learning_type": ["Course", "Professional Certificate", "Professional Certificate"],
  "tags": [],
  "skills": [
    { "skill": "Debugging", "category": "Information Technology", "subcategory": "Software Development" }
  ]
}
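The sample shows two things worth handling downstream: array fields such as program_type can contain repeated values, and skills is a list of objects rather than plain strings. The sketch below is one possible way to tidy a single item in Python. The function names and the `total_effort_range` field are made up for illustration. The effort range is simply weeks multiplied by the per-week figures, in whatever unit the dataset uses, which the documentation does not state.

```python
from typing import Any


def unique(values: list) -> list:
    """Remove duplicates while keeping first-seen order."""
    seen: list = []
    for value in values:
        if value not in seen:
            seen.append(value)
    return seen


def summarize_course(item: dict[str, Any]) -> dict[str, Any]:
    """Build a compact summary of one dataset item (hypothetical helper)."""
    weeks = item.get("weeks_to_complete")
    min_effort = item.get("min_effort")
    max_effort = item.get("max_effort")

    # Only compute the range when all three numbers are present.
    total_effort_range = None
    if None not in (weeks, min_effort, max_effort):
        total_effort_range = (weeks * min_effort, weeks * max_effort)

    return {
        "title": item.get("title"),
        "organization": item.get("organization"),
        "program_type": unique(item.get("program_type") or []),
        "learning_type": unique(item.get("learning_type") or []),
        # Keep only the skill names from the skill objects.
        "skills": [s["skill"] for s in (item.get("skills") or []) if "skill" in s],
        "total_effort_range": total_effort_range,
    }
```

Using `item.get(...)` with fallbacks keeps the helper tolerant of items where some fields are absent or null.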
Tips for Best Results
Choose Clear Keywords
- Use specific topics for more relevant results
- Try variations of terms for broader coverage
Optimize Collection Size
- Start with 20 to 50 items for testing
- Increase results_wanted for larger datasets
Use Proxies for Stability
- Residential proxies improve reliability on large runs
Integrations
Connect your data with:
- Google Sheets: Export for analysis
- Airtable: Build searchable databases
- Slack: Get notifications
- Webhooks: Send to custom endpoints
- Make: Create automated workflows
- Zapier: Trigger actions
Export Formats
- JSON: For developers and APIs
- CSV: For spreadsheet analysis
- Excel: For business reporting
- XML: For system integrations
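The platform produces these exports itself. If you start from a JSON export and want a custom CSV, for example with array fields joined into a single cell, a small local conversion along the following lines may help. This is only a sketch using Python's standard library. The `"; "` separator and the function names are arbitrary choices, not something the actor defines.

```python
import csv
import io
import json
from typing import Any


def flatten_value(value: Any) -> str:
    """Turn a dataset value into a single CSV cell."""
    if value is None:
        return ""
    if isinstance(value, list):
        # Strings are joined directly; nested objects (such as skills)
        # are serialized as compact JSON.
        parts = [v if isinstance(v, str) else json.dumps(v, sort_keys=True) for v in value]
        return "; ".join(parts)
    return str(value)


def items_to_csv(items: list[dict[str, Any]], fields: list[str]) -> str:
    """Write the chosen fields of each item as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(fields)
    for item in items:
        writer.writerow([flatten_value(item.get(field)) for field in fields])
    return buffer.getvalue()


if __name__ == "__main__":
    rows = [{"title": "Example course", "subjects": ["Computer Science"], "external_url": None}]
    print(items_to_csv(rows, ["title", "subjects", "external_url"]))
```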
Frequently Asked Questions
How many courses can I collect?
You can collect all available results for a search. In practice, a run stops at whichever comes first: the results_wanted limit, the max_pages cap, or the end of the result set.
Can I filter by language?
Yes, use the language parameter to target specific languages.
Does the actor handle pagination automatically?
Yes, pagination is handled automatically until results_wanted courses are collected or max_pages pages have been visited.
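To make the interaction between results_wanted and max_pages concrete, here is a minimal Python sketch of a capped pagination loop. It is not the actor's actual implementation. `fetch_page` is a hypothetical callable standing in for whatever retrieves one page of results, and de-duplicating by url is an added assumption.

```python
from typing import Callable


def collect_courses(
    fetch_page: Callable[[int], list[dict]],
    results_wanted: int = 20,
    max_pages: int = 10,
) -> list[dict]:
    """Collect items page by page until a limit is reached."""
    collected: list[dict] = []
    seen_urls: set = set()
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:
            break  # no more results available
        for course in batch:
            url = course.get("url")
            if url in seen_urls:
                continue  # skip items already collected
            seen_urls.add(url)
            collected.append(course)
            if len(collected) >= results_wanted:
                return collected  # requested size reached
    return collected  # page cap reached or results exhausted
```

With a large results_wanted and a small max_pages, the page cap is what ends the run, which is the behavior the "Large Dataset with Safety Cap" example relies on.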
What if some fields are missing?
Some courses may not provide all fields, so empty values are possible.
Can I run this on a schedule?
Yes, schedule runs in the Apify Console for continuous monitoring.
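Once scheduled runs have produced more than one dataset, monitoring usually comes down to comparing snapshots. The following sketch, in Python, compares two lists of items by their url field. The function name is hypothetical, and treating url as a stable course identifier is an assumption.

```python
def diff_snapshots(previous: list[dict], current: list[dict]) -> dict[str, list[str]]:
    """Report course URLs that appeared or disappeared between two runs."""
    # Items without a url are ignored, since they cannot be matched.
    prev_urls = {c["url"] for c in previous if c.get("url")}
    curr_urls = {c["url"] for c in current if c.get("url")}
    return {
        "added": sorted(curr_urls - prev_urls),
        "removed": sorted(prev_urls - curr_urls),
    }
```

The same idea extends to tracking field changes, such as enrollment_count, by indexing each snapshot by url and comparing values for URLs present in both.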
Support
For issues or feature requests, contact support through the Apify Console.
Legal Notice
This actor is designed for legitimate data collection purposes. Users are responsible for ensuring compliance with website terms of service and applicable laws. Use data responsibly and respect rate limits.