edX Course Scraper
Pricing: Pay per usage
Scrape courses from edX. Extract title, provider, price, instructors, subjects, level, and more. Export to JSON, CSV, Excel.
Developer: Glass Ventures
Scrape course data from edX, the leading online learning platform. Extract titles, providers, prices, instructors, subjects, difficulty levels, and more.
What does edX Course Scraper do?
edX Course Scraper extracts structured data from edX's online course catalog. It uses edX's public catalog APIs to efficiently gather comprehensive course information without needing a browser.
The actor supports searching by keywords, filtering by subjects, and scraping specific course URLs. It automatically handles pagination and deduplication, making it easy to build datasets of thousands of courses.
Whether you're researching the online education market, comparing course offerings across universities, or building a course recommendation system, this actor provides the data you need in a clean, structured format.
Use Cases
- Market researchers -- analyze the online education landscape, track course offerings and pricing trends across universities
- Data analysts -- build datasets of courses for analysis, compare subjects, pricing, and enrollment across providers
- EdTech companies -- monitor competitor course offerings, identify gaps in the market
- Developers -- integrate edX course data into apps, build course comparison tools or recommendation engines
Features
- Search courses by keyword or subject area
- Scrape specific edX course URLs directly
- Extract rich metadata: title, provider, price, instructors, level, duration, effort
- Automatic pagination through large result sets
- Deduplication of courses across multiple searches
- Proxy support with automatic rotation
- Exports to JSON, CSV, Excel, or connect via API
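Deduplication across multiple searches can be pictured as keeping the first record seen for each course URL. The following is a minimal sketch of that idea, not the actor's actual implementation: the `dedupe_courses` helper, the choice of `url` as the key, and the sample URLs are all assumptions made for illustration.

```python
from typing import Iterable


def dedupe_courses(batches: Iterable[list[dict]]) -> list[dict]:
    """Merge result batches, keeping the first record seen per course URL."""
    seen: set[str] = set()
    unique: list[dict] = []
    for batch in batches:
        for course in batch:
            # Normalize trailing slashes and case so near-identical URLs match.
            key = course.get("url", "").rstrip("/").lower()
            if not key or key in seen:
                continue  # skip records without a URL and repeats
            seen.add(key)
            unique.append(course)
    return unique


# Two hypothetical searches that both return course "A".
ml_results = [{"url": "https://www.edx.org/learn/a", "title": "A"}]
py_results = [
    {"url": "https://www.edx.org/learn/a/", "title": "A"},
    {"url": "https://www.edx.org/learn/b", "title": "B"},
]
courses = dedupe_courses([ml_results, py_results])
```

Here `courses` contains one entry for "A" and one for "B", in the order they were first seen.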
How much will it cost?
| Results | Estimated Cost |
|---|---|
| 100 | ~$0.10 |
| 1,000 | ~$0.50 |
| 10,000 | ~$3.00 |
Cost breakdown per 1,000 results:
| Cost Component | Per 1,000 Results |
|---|---|
| Platform compute | ~$0.25 |
| Proxy (datacenter) | ~$0.25 |
| Total | ~$0.50 |
edX has a public API, so scraping is very efficient and inexpensive. Datacenter proxies work well.
How to use
- Go to the edX Course Scraper page on Apify Store
- Click "Start" or "Try for free"
- Enter search terms (e.g., "machine learning") or edX course URLs
- Set the maximum number of courses to scrape
- Click "Start" and wait for the results
Input parameters
| Parameter | Type | Description | Default |
|---|---|---|---|
| startUrls | array | edX course URLs to scrape | - |
| searchTerms | array | Search queries to find courses | - |
| subjects | array | Filter by subject area | - |
| maxItems | number | Max courses to return | 100 |
| maxConcurrency | number | Parallel requests | 10 |
| proxyConfig | object | Proxy settings | Apify Proxy |
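As an illustration of how the parameters in the table fit together, the sketch below assembles an input dictionary using the documented names and defaults. The `build_input` helper is hypothetical, and the rule that a run needs at least one of `searchTerms`, `subjects`, or `startUrls` is an assumption rather than documented behavior. The element shape of `startUrls` is not specified in the table, so it is passed through unchanged.

```python
def build_input(search_terms=None, subjects=None, start_urls=None,
                max_items=100, max_concurrency=10):
    """Assemble an input dict using the parameter names and defaults from the table."""
    if not (search_terms or subjects or start_urls):
        # Assumption: a run needs at least one source of courses.
        raise ValueError("provide searchTerms, subjects, or startUrls")
    if max_items < 1 or max_concurrency < 1:
        raise ValueError("maxItems and maxConcurrency must be positive")

    run_input = {"maxItems": max_items, "maxConcurrency": max_concurrency}
    if search_terms:
        run_input["searchTerms"] = list(search_terms)
    if subjects:
        run_input["subjects"] = list(subjects)
    if start_urls:
        run_input["startUrls"] = list(start_urls)  # shape passed through as given
    # proxyConfig is left out so the documented default (Apify Proxy) applies.
    return run_input


run_input = build_input(
    search_terms=["machine learning"],
    subjects=["Computer Science"],
    max_items=20,
)
```

The resulting `run_input` can be passed as the actor input in the API examples further down.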
Output
The actor produces a dataset with the following fields:

    {
      "url": "https://www.edx.org/learn/machine-learning/stanford-university-machine-learning",
      "title": "Machine Learning",
      "provider": "Stanford University",
      "shortDescription": "Learn about the most effective machine learning techniques...",
      "fullDescription": "This course provides a broad introduction to machine learning...",
      "subjects": ["Computer Science", "Data Analysis & Statistics"],
      "level": "Intermediate",
      "language": "en-us",
      "startDate": "2024-01-15T00:00:00Z",
      "endDate": null,
      "enrollmentCount": 4500000,
      "price": 79,
      "isFree": true,
      "instructors": ["Andrew Ng"],
      "imageUrl": "https://prod-discovery.edx-cdn.org/media/course/image/...",
      "duration": "11 weeks",
      "effort": "5-7 hours per week",
      "scrapedAt": "2024-01-15T10:30:00.000Z"
    }

| Field | Type | Description |
|---|---|---|
| url | string | Course page URL |
| title | string | Course title |
| provider | string | University or institution |
| shortDescription | string | Brief course description |
| fullDescription | string | Detailed course description |
| subjects | array | Subject areas |
| level | string | Introductory, Intermediate, or Advanced |
| language | string | Course language code |
| startDate | string | Course start date (ISO 8601) |
| endDate | string | Course end date (ISO 8601) |
| enrollmentCount | number | Number of enrolled students |
| price | number | Price in USD for verified certificate |
| isFree | boolean | Whether audit track is free |
| instructors | array | List of instructor names |
| imageUrl | string | Course thumbnail image |
| duration | string | Course duration (e.g., "6 weeks") |
| effort | string | Weekly time commitment |
| scrapedAt | string | ISO 8601 scrape timestamp |
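For downstream analysis it can help to load records into a typed structure. The sketch below maps a subset of the documented fields onto a dataclass and filters by subject and certificate price. The `Course` class and `filter_courses` helper are illustrative names, not part of the actor, and the sample record reuses values from the example output above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Course:
    """Typed view of a subset of the documented output fields."""
    url: str
    title: str
    provider: str
    level: Optional[str] = None
    price: Optional[float] = None    # USD, verified certificate track
    is_free: Optional[bool] = None   # whether the audit track is free
    subjects: list[str] = field(default_factory=list)

    @classmethod
    def from_item(cls, item: dict) -> "Course":
        return cls(
            url=item["url"],
            title=item["title"],
            provider=item["provider"],
            level=item.get("level"),
            price=item.get("price"),
            is_free=item.get("isFree"),
            subjects=item.get("subjects") or [],
        )


def filter_courses(courses, subject, max_price):
    """Keep courses in `subject` whose certificate price is known and <= max_price."""
    return [
        c for c in courses
        if subject in c.subjects and c.price is not None and c.price <= max_price
    ]


item = {
    "url": "https://www.edx.org/learn/machine-learning/stanford-university-machine-learning",
    "title": "Machine Learning",
    "provider": "Stanford University",
    "subjects": ["Computer Science", "Data Analysis & Statistics"],
    "level": "Intermediate",
    "price": 79,
    "isFree": True,
}
course = Course.from_item(item)
matches = filter_courses([course], "Computer Science", max_price=100)
```

Optional fields default to `None` or an empty list, so records with missing values still load.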
Integrations
Connect edX Course Scraper with other tools:
- Apify API -- REST API for programmatic access
- Webhooks -- get notified when a run finishes
- Zapier / Make -- connect to 5,000+ apps
- Google Sheets -- export directly to spreadsheets
API Example (Node.js)

    import { ApifyClient } from 'apify-client';

    const client = new ApifyClient({ token: 'YOUR_TOKEN' });

    // Start the actor and wait for the run to finish.
    const run = await client.actor('YOUR_USERNAME/edx-course-scraper').call({
        searchTerms: ['machine learning', 'python programming'],
        maxItems: 100,
    });

    // Read the scraped courses from the run's default dataset.
    const { items } = await client.dataset(run.defaultDatasetId).listItems();

API Example (Python)

    from apify_client import ApifyClient

    client = ApifyClient('YOUR_TOKEN')

    # Start the actor and wait for the run to finish.
    run = client.actor('YOUR_USERNAME/edx-course-scraper').call(run_input={
        'searchTerms': ['machine learning', 'python programming'],
        'maxItems': 100,
    })

    # Read the scraped courses from the run's default dataset.
    items = client.dataset(run['defaultDatasetId']).list_items().items

API Example (cURL)

    curl "https://api.apify.com/v2/acts/YOUR_USERNAME~edx-course-scraper/runs" \
      -X POST \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer YOUR_TOKEN" \
      -d '{"searchTerms": ["machine learning"], "maxItems": 100}'

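Once items have been fetched through the API, they can also be written to CSV locally. Array fields such as `subjects` and `instructors` need flattening first. The sketch below uses only the Python standard library; the `items_to_csv` helper, the column selection, and the `"; "` separator are illustrative choices, and the platform's own export options may format arrays differently.

```python
import csv
import io

FIELDS = ["url", "title", "provider", "level", "price", "subjects", "instructors"]


def items_to_csv(items, fields=FIELDS) -> str:
    """Render course records as CSV text, joining list values with '; '."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for item in items:
        row = {}
        for name in fields:
            value = item.get(name)
            if isinstance(value, list):
                value = "; ".join(str(v) for v in value)
            row[name] = "" if value is None else value  # missing fields become empty cells
        writer.writerow(row)
    return buffer.getvalue()


# Hypothetical record shaped like the documented output.
sample_items = [{
    "url": "https://www.edx.org/learn/x",
    "title": "Machine Learning",
    "provider": "Stanford University",
    "level": "Intermediate",
    "price": 79,
    "subjects": ["Computer Science", "Data Analysis & Statistics"],
    "instructors": ["Andrew Ng"],
}]
csv_text = items_to_csv(sample_items)
```

To save the result, write `csv_text` to a file opened with `newline=""` and UTF-8 encoding.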
Tips and tricks
- Start with a small maxItems (10-20) to test before running large scrapes
- edX's API is public and fast -- datacenter proxies work well, no need for residential
- Use searchTerms for broad discovery and startUrls for specific courses
- The actor tries the discovery API first for richer data (instructors, price, enrollment count)
FAQ
Q: Does this actor require login credentials? A: No. edX has public catalog APIs that don't require authentication.
Q: How fast is the scraping? A: Approximately 50-200 courses per minute depending on API response times and concurrency settings.
Q: What should I do if I get blocked? A: Switch to residential proxies in the Proxy Configuration settings. However, edX's API rarely blocks requests.
Q: Does this scrape course content/videos? A: No. This actor only scrapes course metadata (title, description, instructors, etc.), not the actual course content.
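The quoted throughput of 50-200 courses per minute gives a rough way to size a run before starting it. The sketch below turns that range into a best-case and worst-case duration. The `estimate_minutes` helper is illustrative, and actual run times will vary with API response times and concurrency.

```python
def estimate_minutes(courses: int, slow_rate: int = 50, fast_rate: int = 200):
    """Return (best_case, worst_case) run time in minutes for the quoted rate range."""
    if courses < 0:
        raise ValueError("courses must be non-negative")
    return courses / fast_rate, courses / slow_rate


best, worst = estimate_minutes(10_000)
print(f"10,000 courses: roughly {best:.0f} to {worst:.0f} minutes")
```

At the quoted rates, 10,000 courses would take roughly 50 to 200 minutes.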
Is it legal to scrape edX?
Web scraping of publicly available data is generally legal, based on precedents such as hiQ Labs v. LinkedIn. This actor only accesses publicly available API endpoints that do not require authentication. Always review and respect the target site's Terms of Service and robots.txt. For more information, see Apify's blog on web scraping legality.
Related Actors
- Coursera Course Scraper -- Scrape courses from Coursera
- Udemy Course Scraper -- Scrape courses from Udemy
Limitations
- Course enrollment counts may not be available for all courses
- Some course details (full description, instructors) depend on the discovery API being accessible
- The actor extracts metadata only, not course content or video materials
- Pricing information reflects the verified certificate track; audit is typically free
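Because some fields may be missing, it can be useful to check how complete a dataset is before analyzing it. The sketch below computes, for each field, the fraction of records with a usable value. The `field_coverage` helper and the sample records are hypothetical, and how the actor represents a missing value (absent key, null, or empty list) is an assumption, so all three are treated as missing.

```python
def field_coverage(items, fields):
    """Fraction of records in which each field is present and not None/empty."""
    total = len(items)
    coverage = {}
    for name in fields:
        filled = sum(
            1 for item in items
            if item.get(name) not in (None, "", [])
        )
        coverage[name] = filled / total if total else 0.0
    return coverage


# Hypothetical records: the second lacks enrollment and instructor data.
records = [
    {"title": "A", "enrollmentCount": 4500000, "instructors": ["Andrew Ng"]},
    {"title": "B", "enrollmentCount": None, "instructors": []},
]
coverage = field_coverage(records, ["title", "enrollmentCount", "instructors"])
```

A low coverage value for `instructors` or `fullDescription` may indicate that the discovery API was not reachable during the run.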
Changelog
- v0.1 (2026-04-23) -- Initial release