Pricing

Pay per usage

edX Course Scraper

Scrape courses from edX. Extract title, provider, price, instructors, subjects, level, and more. Export to JSON, CSV, Excel.

Developer

Glass Ventures

Last modified: 2 months ago

What does edX Course Scraper do?

edX Course Scraper extracts structured data from edX's online course catalog. It uses edX's public catalog APIs to efficiently gather comprehensive course information without needing a browser.

The actor supports searching by keywords, filtering by subjects, and scraping specific course URLs. It automatically handles pagination and deduplication, making it easy to build datasets of thousands of courses.

Whether you're researching the online education market, comparing course offerings across universities, or building a course recommendation system, this actor provides the data you need in a clean, structured format.

Use Cases

Market researchers -- analyze the online education landscape, track course offerings and pricing trends across universities
Data analysts -- build datasets of courses for analysis, compare subjects, pricing, and enrollment across providers
EdTech companies -- monitor competitor course offerings, identify gaps in the market
Developers -- integrate edX course data into apps, build course comparison tools or recommendation engines

Features

Search courses by keyword or subject area
Scrape specific edX course URLs directly
Extract rich metadata: title, provider, price, instructors, level, duration, effort
Automatic pagination through large result sets
Deduplication of courses across multiple searches
Proxy support with automatic rotation
Exports to JSON, CSV, Excel, or connect via API
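
The deduplication listed above could work along the following lines. This is a minimal sketch, not the actor's actual implementation: it assumes each record carries the `url` field described in the Output section, and the helper names `normalize_url` and `dedupe_courses` are hypothetical. Treating URLs that differ only by query string, fragment, or trailing slash as the same course is also an assumption.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Hypothetical dedup key: drop query string, fragment, and trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))


def dedupe_courses(batches):
    """Merge result batches from several searches, keeping the first copy of each course."""
    seen = set()
    unique = []
    for batch in batches:
        for course in batch:
            key = normalize_url(course["url"])
            if key not in seen:
                seen.add(key)
                unique.append(course)
    return unique


# Illustrative records only; the URLs are made up for this example.
ml_results = [
    {"url": "https://www.edx.org/learn/python/course-a", "title": "Course A"},
    {"url": "https://www.edx.org/learn/ai/course-b", "title": "Course B"},
]
python_results = [
    {"url": "https://www.edx.org/learn/python/course-a/?utm=x", "title": "Course A"},
]
courses = dedupe_courses([ml_results, python_results])
print(len(courses))
```

Keeping the first occurrence preserves the order in which courses were discovered, which makes repeated runs easier to compare.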

How much will it cost?

Results	Estimated Cost
100	~$0.10
1,000	~$0.50
10,000	~$3.00

Cost Component	Per 1,000 Results
Platform compute	~$0.25
Proxy (datacenter)	~$0.25
Total	~$0.50

edX has a public API, so scraping is very efficient and inexpensive. Datacenter proxies work well.
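
For run sizes that fall between the rows of the estimate table, a rough figure can be interpolated. The sketch below is only an approximation under a stated assumption: cost is taken to change linearly between the three published estimates, which the source does not confirm. The function name `estimate_cost` is hypothetical.

```python
# Published estimates from the table above: (results, approximate cost in USD).
COST_POINTS = [(100, 0.10), (1_000, 0.50), (10_000, 3.00)]


def estimate_cost(results: int) -> float:
    """Linearly interpolate between the published estimates (an assumption)."""
    lo_n, hi_n = COST_POINTS[0][0], COST_POINTS[-1][0]
    if not lo_n <= results <= hi_n:
        raise ValueError(f"estimate only covers {lo_n} to {hi_n} results")
    for (n0, c0), (n1, c1) in zip(COST_POINTS, COST_POINTS[1:]):
        if n0 <= results <= n1:
            fraction = (results - n0) / (n1 - n0)
            return round(c0 + fraction * (c1 - c0), 2)
    raise AssertionError("unreachable")


print(estimate_cost(5_500))
```

Actual charges depend on the run, so treat the result as a planning figure rather than a quote.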

How to use

Go to the edX Course Scraper page on Apify Store
Click "Start" or "Try for free"
Enter search terms (e.g., "machine learning") or edX course URLs
Set the maximum number of courses to scrape
Click "Start" and wait for the results

Input parameters

Parameter	Type	Description	Default
startUrls	array	edX course URLs to scrape	-
searchTerms	array	Search queries to find courses	-
subjects	array	Filter by subject area	-
maxItems	number	Max courses to return	100
maxConcurrency	number	Parallel requests	10
proxyConfig	object	Proxy settings	Apify Proxy
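
The parameters above can be assembled into a run input programmatically. The following is a sketch under assumptions, not a schema: `build_run_input` is a hypothetical helper, the rule that at least one of `searchTerms`, `startUrls`, or `subjects` must be given is a guess, and the exact shape of `startUrls` entries (plain strings here) is not stated in the source. `proxyConfig` is left out so that the actor's default applies.

```python
def build_run_input(search_terms=None, start_urls=None, subjects=None,
                    max_items=100, max_concurrency=10):
    """Assemble a run input dict using the parameter names from the table above."""
    if not (search_terms or start_urls or subjects):
        # Assumption: the actor needs at least one source of courses to scrape.
        raise ValueError("provide search_terms, start_urls, or subjects")
    if max_items < 1 or max_concurrency < 1:
        raise ValueError("max_items and max_concurrency must be positive")

    # Defaults mirror the table: maxItems 100, maxConcurrency 10.
    run_input = {"maxItems": max_items, "maxConcurrency": max_concurrency}
    if search_terms:
        run_input["searchTerms"] = list(search_terms)
    if start_urls:
        run_input["startUrls"] = list(start_urls)
    if subjects:
        run_input["subjects"] = list(subjects)
    return run_input


# A small test run, as suggested for first-time use.
test_input = build_run_input(search_terms=["machine learning"], max_items=20)
print(test_input)
```

The resulting dict can be passed as `run_input` in the Python API example further down.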

Output

The actor produces a dataset with the following fields:

{
    "url": "https://www.edx.org/learn/machine-learning/stanford-university-machine-learning",
    "title": "Machine Learning",
    "provider": "Stanford University",
    "shortDescription": "Learn about the most effective machine learning techniques...",
    "fullDescription": "This course provides a broad introduction to machine learning...",
    "subjects": ["Computer Science", "Data Analysis & Statistics"],
    "level": "Intermediate",
    "language": "en-us",
    "startDate": "2024-01-15T00:00:00Z",
    "endDate": null,
    "enrollmentCount": 4500000,
    "price": 79,
    "isFree": true,
    "instructors": ["Andrew Ng"],
    "imageUrl": "https://prod-discovery.edx-cdn.org/media/course/image/...",
    "duration": "11 weeks",
    "effort": "5-7 hours per week",
    "scrapedAt": "2024-01-15T10:30:00.000Z"
}

Field	Type	Description
url	string	Course page URL
title	string	Course title
provider	string	University or institution
shortDescription	string	Brief course description
fullDescription	string	Detailed course description
subjects	array	Subject areas
level	string	Introductory, Intermediate, or Advanced
language	string	Course language code
startDate	string	Course start date (ISO 8601)
endDate	string or null	Course end date (ISO 8601); may be null, as in the example above
enrollmentCount	number	Number of enrolled students
price	number	Price in USD for verified certificate
isFree	boolean	Whether audit track is free
instructors	array	List of instructor names
imageUrl	string	Course thumbnail image
duration	string	Course duration (e.g., "6 weeks")
effort	string	Weekly time commitment
scrapedAt	string	ISO 8601 scrape timestamp
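
When consuming the dataset, the date strings and the nullable `endDate` usually need a little handling. The sketch below shows one way to do that with the standard library only. `parse_iso` and `summarize` are hypothetical helper names, and the record is a trimmed copy of the example output above.

```python
from datetime import datetime


def parse_iso(value):
    """Parse an ISO 8601 string such as '2024-01-15T00:00:00Z'; pass None through."""
    if value is None:
        return None
    # datetime.fromisoformat() only accepts a trailing 'Z' from Python 3.11 on,
    # so rewrite it as an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize(course):
    """Return a small summary of one dataset record with parsed dates."""
    return {
        "title": course["title"],
        "starts": parse_iso(course.get("startDate")),
        "ends": parse_iso(course.get("endDate")),
        "instructors": ", ".join(course.get("instructors") or []),
        "certificate_usd": course.get("price"),
        "free_to_audit": bool(course.get("isFree")),
    }


record = {
    "title": "Machine Learning",
    "startDate": "2024-01-15T00:00:00Z",
    "endDate": None,
    "price": 79,
    "isFree": True,
    "instructors": ["Andrew Ng"],
    "scrapedAt": "2024-01-15T10:30:00.000Z",
}
summary = summarize(record)
print(summary["starts"], summary["ends"])
```

Using `.get()` for optional fields keeps the helper tolerant of records where a value such as the enrollment count or instructor list is missing.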

Integrations

Connect edX Course Scraper with other tools:

Apify API -- REST API for programmatic access
Webhooks -- get notified when a run finishes
Zapier / Make -- connect to 5,000+ apps
Google Sheets -- export directly to spreadsheets

API Example (Node.js)

import { ApifyClient } from 'apify-client';
// Authenticate with your Apify API token.
const client = new ApifyClient({ token: 'YOUR_TOKEN' });
// Start the actor and wait for the run to finish.
const run = await client.actor('YOUR_USERNAME/edx-course-scraper').call({
    searchTerms: ['machine learning', 'python programming'],
    maxItems: 100,
});
// Read the scraped courses from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();

API Example (Python)

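from apify_client import ApifyClient
# Authenticate with your Apify API token.
client = ApifyClient('YOUR_TOKEN')
# Start the actor and wait for the run to finish.
run = client.actor('YOUR_USERNAME/edx-course-scraper').call(run_input={
    'searchTerms': ['machine learning', 'python programming'],
    'maxItems': 100,
})
# Read the scraped courses from the run's default dataset.
items = client.dataset(run['defaultDatasetId']).list_items().items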

API Example (cURL)

# Starts a run asynchronously and returns the run details as JSON.
curl "https://api.apify.com/v2/acts/YOUR_USERNAME~edx-course-scraper/runs" \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{"searchTerms": ["machine learning"], "maxItems": 100}'

Tips and tricks

Start with a small maxItems (10-20) to test before running large scrapes
edX's API is public and fast -- datacenter proxies work well, no need for residential
Use searchTerms for broad discovery and startUrls for specific courses
The actor tries the discovery API first for richer data (instructors, price, enrollment count)

FAQ

Q: Does this actor require login credentials? A: No. edX has public catalog APIs that don't require authentication.

Q: How fast is the scraping? A: Approximately 50-200 courses per minute depending on API response times and concurrency settings.

Q: What should I do if I get blocked? A: Switch to residential proxies in the Proxy Configuration settings. However, edX's API rarely blocks requests.

Q: Does this scrape course content/videos? A: No. This actor only scrapes course metadata (title, description, instructors, etc.), not the actual course content.
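
The throughput quoted in the FAQ (approximately 50-200 courses per minute) can be turned into a rough run-time range. This is simple arithmetic on those two figures, with `estimate_minutes` as a hypothetical helper name. Real runs will vary with API response times and concurrency settings.

```python
def estimate_minutes(courses, slow_rate=50, fast_rate=200):
    """Return (best_case, worst_case) run time in minutes for a number of courses."""
    return courses / fast_rate, courses / slow_rate


best, worst = estimate_minutes(10_000)
print(f"{best:.0f} to {worst:.0f} minutes")  # prints "50 to 200 minutes"
```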

Is it legal to scrape edX?

Web scraping of publicly available data is generally considered legal, based on precedents such as hiQ Labs v. LinkedIn. This actor only accesses publicly available API endpoints that do not require authentication. Always review and respect the target site's Terms of Service and robots.txt. For more information, see Apify's blog on web scraping legality.

Related scrapers

Coursera Course Scraper -- Scrape courses from Coursera
Udemy Course Scraper -- Scrape courses from Udemy

Limitations

Course enrollment counts may not be available for all courses
Some course details (full description, instructors) depend on the discovery API being accessible
The actor extracts metadata only, not course content or video materials
Pricing information reflects the verified certificate track; audit is typically free

Changelog

v0.1 (2026-04-23) -- Initial release
