edX Course Scraper

Pricing: Pay per usage

Scrape courses from edX. Extract title, provider, price, instructors, subjects, level, and more. Export to JSON, CSV, Excel.

Rating: 0.0 (0 ratings)

Developer: Glass Ventures (maintained by Community)

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 4 days ago

Scrape course data from edX, one of the largest online learning platforms. Extract titles, providers, prices, instructors, subjects, difficulty levels, and more.

What does edX Course Scraper do?

edX Course Scraper extracts structured data from edX's online course catalog. It queries edX's public catalog APIs directly over HTTP, so it collects course information without running a browser.

The actor supports searching by keywords, filtering by subjects, and scraping specific course URLs. It automatically handles pagination and deduplication, making it easy to build datasets of thousands of courses.
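Pagination plus cross-search deduplication can be pictured with a minimal sketch. It assumes a hypothetical page fetcher that returns the courses on one page and a token for the next page (or None on the last page), and deduplicates on a hypothetical `key` field. The actor's real endpoints, response shape, and dedup key are not documented here, so every name below is illustrative.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical page shape: (courses on this page, token for the next page or None).
Page = Tuple[List[dict], Optional[str]]


def iter_unique_courses(
    queries: List[str],
    fetch_page: Callable[[str, Optional[str]], Page],
    max_items: int,
) -> Iterator[dict]:
    """Yield courses for every query, following page tokens and skipping duplicates."""
    if max_items <= 0:
        return
    seen = set()
    emitted = 0
    for query in queries:
        token = None
        while True:
            courses, token = fetch_page(query, token)
            for course in courses:
                key = course["key"]  # hypothetical unique course identifier
                if key in seen:
                    continue  # already found on an earlier page or by an earlier query
                seen.add(key)
                yield course
                emitted += 1
                if emitted >= max_items:
                    return  # stop early once maxItems is reached
            if token is None:
                break  # last page for this query


# Stubbed catalog standing in for real API responses.
FAKE_PAGES = {
    ("python", None): ([{"key": "a"}, {"key": "b"}], "p2"),
    ("python", "p2"): ([{"key": "c"}], None),
    ("data science", None): ([{"key": "b"}, {"key": "d"}], None),
}


def fake_fetch(query, token):
    return FAKE_PAGES[(query, token)]


courses = list(iter_unique_courses(["python", "data science"], fake_fetch, max_items=100))
print([c["key"] for c in courses])  # ['a', 'b', 'c', 'd']
```

Course "b" appears in both searches but is emitted once; a generator keeps memory flat and lets a maxItems limit stop fetching early.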

Whether you're researching the online education market, comparing course offerings across universities, or building a course recommendation system, this actor provides the data you need in a clean, structured format.

Use Cases

  • Market researchers -- analyze the online education landscape, track course offerings and pricing trends across universities
  • Data analysts -- build datasets of courses for analysis, compare subjects, pricing, and enrollment across providers
  • EdTech companies -- monitor competitor course offerings, identify gaps in the market
  • Developers -- integrate edX course data into apps, build course comparison tools or recommendation engines

Features

  • Search courses by keyword or subject area
  • Scrape specific edX course URLs directly
  • Extract rich metadata: title, provider, price, instructors, level, duration, effort
  • Automatic pagination through large result sets
  • Deduplication of courses across multiple searches
  • Proxy support with automatic rotation
  • Exports to JSON, CSV, Excel, or connect via API

How much will it cost?

Results    Estimated cost
100        ~$0.10
1,000      ~$0.50
10,000     ~$3.00

Cost component       Per 1,000 results
Platform compute     ~$0.25
Proxy (datacenter)   ~$0.25
Total                ~$0.50

edX has a public API, so scraping is very efficient and inexpensive. Datacenter proxies work well.
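The per-1,000 breakdown can be scaled into a rough linear estimate. This is only back-of-envelope arithmetic: the tiered estimates above are not strictly proportional (10,000 results are listed at about $3.00 rather than 10 × $0.50 = $5.00, and 100 results at $0.10 rather than $0.05), so treat both figures as approximations.

```python
# Per-1,000-result rates taken from the breakdown above (approximate, USD).
COMPUTE_PER_1K = 0.25
PROXY_PER_1K = 0.25


def linear_cost_estimate(results: int) -> float:
    """Scale the combined per-1,000 rate linearly to the requested result count."""
    return round((COMPUTE_PER_1K + PROXY_PER_1K) * results / 1000, 2)


for n in (100, 1_000, 10_000):
    print(f"{n:>6,} results: ~${linear_cost_estimate(n):.2f}")
```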

How to use

  1. Go to the edX Course Scraper page on Apify Store
  2. Click "Start" or "Try for free"
  3. Enter search terms (e.g., "machine learning") or edX course URLs
  4. Set the maximum number of courses to scrape
  5. Click "Start" and wait for the results

Input parameters

Parameter       Type    Description                     Default
startUrls       array   edX course URLs to scrape       -
searchTerms     array   Search queries to find courses  -
subjects        array   Filter by subject area          -
maxItems        number  Max courses to return           100
maxConcurrency  number  Parallel requests               10
proxyConfig     object  Proxy settings                  Apify Proxy
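The parameters above form a JSON input object. A sketch of assembling one in Python follows; `build_run_input` is a hypothetical helper, and its checks (requiring at least one of startUrls, searchTerms, or subjects, and requiring startUrls to be on edx.org) are assumptions for illustration rather than documented actor rules. The `{"url": ...}` shape for startUrls is also an assumption based on a common Apify convention.

```python
from typing import List, Optional
from urllib.parse import urlparse


def build_run_input(
    search_terms: Optional[List[str]] = None,
    start_urls: Optional[List[str]] = None,
    subjects: Optional[List[str]] = None,
    max_items: int = 100,
    max_concurrency: int = 10,
) -> dict:
    """Assemble an actor input object from the documented parameter names (hypothetical helper)."""
    if not (search_terms or start_urls or subjects):
        # Assumption: a run needs at least one source of courses.
        raise ValueError("provide searchTerms, startUrls, or subjects")
    if max_items < 1 or max_concurrency < 1:
        raise ValueError("maxItems and maxConcurrency must be positive")
    run_input = {"maxItems": max_items, "maxConcurrency": max_concurrency}
    if search_terms:
        run_input["searchTerms"] = search_terms
    if subjects:
        run_input["subjects"] = subjects
    if start_urls:
        for url in start_urls:
            host = urlparse(url).hostname or ""
            if host != "edx.org" and not host.endswith(".edx.org"):
                raise ValueError(f"not an edX URL: {url}")
        # Assumption: the common Apify {"url": ...} request format; check the
        # actor's input schema, since some actors accept plain strings instead.
        run_input["startUrls"] = [{"url": url} for url in start_urls]
    # proxyConfig is omitted so the documented default (Apify Proxy) applies.
    return run_input


print(build_run_input(search_terms=["machine learning"], max_items=20))
```

Starting with a small maxItems, as suggested under Tips and tricks, keeps test runs cheap.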

Output

The actor produces a dataset with the following fields:

{
  "url": "https://www.edx.org/learn/machine-learning/stanford-university-machine-learning",
  "title": "Machine Learning",
  "provider": "Stanford University",
  "shortDescription": "Learn about the most effective machine learning techniques...",
  "fullDescription": "This course provides a broad introduction to machine learning...",
  "subjects": ["Computer Science", "Data Analysis & Statistics"],
  "level": "Intermediate",
  "language": "en-us",
  "startDate": "2024-01-15T00:00:00Z",
  "endDate": null,
  "enrollmentCount": 4500000,
  "price": 79,
  "isFree": true,
  "instructors": ["Andrew Ng"],
  "imageUrl": "https://prod-discovery.edx-cdn.org/media/course/image/...",
  "duration": "11 weeks",
  "effort": "5-7 hours per week",
  "scrapedAt": "2024-01-15T10:30:00.000Z"
}

Field             Type      Description
url               string    Course page URL
title             string    Course title
provider          string    University or institution
shortDescription  string    Brief course description
fullDescription   string    Detailed course description
subjects          array     Subject areas
level             string    Introductory, Intermediate, or Advanced
language          string    Course language code
startDate         string    Course start date (ISO 8601)
endDate           string    Course end date (ISO 8601), or null if none is set
enrollmentCount   number    Number of enrolled students
price             number    Price in USD of the verified certificate
isFree            boolean   Whether the audit track is free
instructors       array     List of instructor names
imageUrl          string    Course thumbnail image URL
duration          string    Course duration (e.g., "6 weeks")
effort            string    Weekly time commitment
scrapedAt         string    ISO 8601 scrape timestamp
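Because records mix list fields, nullable dates, and separate price and audit fields, downstream code usually normalizes them. A sketch that keeps courses with a free audit track in a given subject and parses the ISO 8601 dates; the function names are illustrative, and the sample is a trimmed copy of the example record above.

```python
from datetime import datetime
from typing import List, Optional


def parse_iso(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 timestamp such as '2024-01-15T00:00:00Z'; None stays None."""
    if value is None:
        return None
    # Before Python 3.11, fromisoformat() rejects a trailing 'Z', so swap it for +00:00.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def free_courses_in_subject(items: List[dict], subject: str) -> List[dict]:
    """Keep courses with a free audit track that list the given subject."""
    selected = []
    for item in items:
        if item.get("isFree") and subject in (item.get("subjects") or []):
            selected.append({
                "title": item.get("title"),
                "provider": item.get("provider"),
                "start": parse_iso(item.get("startDate")),
                "end": parse_iso(item.get("endDate")),  # endDate can be null
                "certificatePrice": item.get("price"),  # verified track, USD
            })
    return selected


# Trimmed copy of the sample output record above.
sample = [{
    "title": "Machine Learning",
    "provider": "Stanford University",
    "subjects": ["Computer Science", "Data Analysis & Statistics"],
    "isFree": True,
    "price": 79,
    "startDate": "2024-01-15T00:00:00Z",
    "endDate": None,
}]

for course in free_courses_in_subject(sample, "Computer Science"):
    print(course["title"], course["start"].date(), course["certificatePrice"])  # Machine Learning 2024-01-15 79
```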

Integrations

Connect edX Course Scraper with other tools:

  • Apify API -- REST API for programmatic access
  • Webhooks -- get notified when a run finishes
  • Zapier / Make -- connect to 5,000+ apps
  • Google Sheets -- export directly to spreadsheets

API Example (Node.js)

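import { ApifyClient } from 'apify-client';

// Top-level await requires an ES module (e.g. a .mjs file or "type": "module").
const client = new ApifyClient({ token: 'YOUR_TOKEN' });

// Start the actor and wait for the run to finish.
const run = await client.actor('YOUR_USERNAME/edx-course-scraper').call({
    searchTerms: ['machine learning', 'python programming'],
    maxItems: 100,
});

// Fetch the scraped courses from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Fetched ${items.length} courses`);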

API Example (Python)

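from apify_client import ApifyClient

client = ApifyClient('YOUR_TOKEN')

# Start the actor and block until the run finishes.
run = client.actor('YOUR_USERNAME/edx-course-scraper').call(run_input={
    'searchTerms': ['machine learning', 'python programming'],
    'maxItems': 100,
})

# Read the scraped courses from the run's default dataset.
items = client.dataset(run['defaultDatasetId']).list_items().items
print(f'Fetched {len(items)} courses')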
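Dataset items contain list fields (subjects, instructors) that need flattening before they fit in a CSV cell. A standard-library sketch that serializes items, such as the `items` list fetched above, to CSV text; the column selection and the "; " separator are arbitrary choices for illustration.

```python
import csv
import io
from typing import List

COLUMNS = ["title", "provider", "level", "subjects", "instructors", "price", "isFree", "url"]


def items_to_csv(items: List[dict], columns: List[str] = COLUMNS) -> str:
    """Serialize dataset items to CSV text, joining list fields with '; '."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops fields not listed in `columns`.
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    for item in items:
        row = {
            key: "; ".join(value) if isinstance(value, list) else value
            for key, value in item.items()
        }
        writer.writerow(row)
    return buffer.getvalue()


example = [{
    "title": "Machine Learning",
    "provider": "Stanford University",
    "level": "Intermediate",
    "subjects": ["Computer Science", "Data Analysis & Statistics"],
    "instructors": ["Andrew Ng"],
    "price": 79,
    "isFree": True,
    "url": "https://www.edx.org/learn/machine-learning/stanford-university-machine-learning",
}]
print(items_to_csv(example))
```

To save the result, open the file with `newline=""` (e.g. `open("courses.csv", "w", newline="", encoding="utf-8")`) so the csv module's line endings are written unchanged.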

API Example (cURL)

# Start a run asynchronously; the JSON response describes the new run,
# including its defaultDatasetId.
curl "https://api.apify.com/v2/acts/YOUR_USERNAME~edx-course-scraper/runs" \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{"searchTerms": ["machine learning"], "maxItems": 100}'
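The same call works without an SDK. A standard-library sketch that builds, but does not send, the POST request the cURL command issues; `build_start_run_request` is a hypothetical helper name, and YOUR_USERNAME and YOUR_TOKEN remain placeholders.

```python
import json
import urllib.request


def build_start_run_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Build the POST .../v2/acts/{actorId}/runs request used in the cURL example."""
    # In API URLs the actor ID is written username~actor-name instead of username/actor-name.
    url = f"https://api.apify.com/v2/acts/{actor_id.replace('/', '~')}/runs"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


req = build_start_run_request(
    "YOUR_USERNAME/edx-course-scraper",
    "YOUR_TOKEN",
    {"searchTerms": ["machine learning"], "maxItems": 100},
)
print(req.get_method(), req.full_url)

# Sending it would start a real run (requires a valid token):
# with urllib.request.urlopen(req) as resp:
#     run = json.load(resp)["data"]
```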

Tips and tricks

  • Start with a small maxItems (10-20) to test before running large scrapes
  • edX's API is public and fast -- datacenter proxies work well, no need for residential
  • Use searchTerms for broad discovery and startUrls for specific courses
  • The actor tries the discovery API first for richer data (instructors, price, enrollment count)
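The last tip describes a fallback: use the richer discovery data when it is reachable, otherwise a simpler source. A sketch of that pattern follows; both fetchers are hypothetical stubs, and the actor's real endpoints and error handling are not documented here.

```python
from typing import Callable, Optional

# A fetcher takes a course key and returns a record dict, or None if not found.
Fetcher = Callable[[str], Optional[dict]]


def fetch_course(course_key: str, discovery: Fetcher, catalog: Fetcher) -> Optional[dict]:
    """Prefer the richer (hypothetical) discovery source; fall back to the catalog source."""
    try:
        record = discovery(course_key)
        if record is not None:
            return record
    except OSError:
        pass  # connection-level failure: fall through to the simpler source
    record = catalog(course_key)
    if record is None:
        return None
    # Mark fields the simpler source may lack as None, keeping any values it does provide.
    return {"instructors": None, "price": None, "enrollmentCount": None, **record}


# Stubs standing in for real API calls.
def discovery_down(course_key: str) -> Optional[dict]:
    raise ConnectionError(f"discovery API unavailable for {course_key}")


def catalog_only(course_key: str) -> Optional[dict]:
    return {"title": "Machine Learning", "provider": "Stanford University"}


print(fetch_course("demo-course", discovery_down, catalog_only))
```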

FAQ

Q: Does this actor require login credentials? A: No. edX has public catalog APIs that don't require authentication.

Q: How fast is the scraping? A: Approximately 50-200 courses per minute depending on API response times and concurrency settings.
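The quoted rates convert to run time by simple division (minutes = courses / courses per minute). A sketch using the 50-200 courses per minute range; actual speed depends on concurrency and API response times.

```python
from typing import Tuple


def run_time_range(courses: int, low_rate: float = 50, high_rate: float = 200) -> Tuple[float, float]:
    """Return (fastest, slowest) minutes for the quoted 50-200 courses/minute range."""
    if courses < 0 or low_rate <= 0 or high_rate <= 0:
        raise ValueError("courses must be non-negative and rates positive")
    return courses / high_rate, courses / low_rate


fastest, slowest = run_time_range(10_000)
print(f"10,000 courses: about {fastest:.0f} to {slowest:.0f} minutes")  # 10,000 courses: about 50 to 200 minutes
```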

Q: What should I do if I get blocked? A: Switch to residential proxies in the Proxy Configuration settings. However, edX's API rarely blocks requests.

Q: Does this scrape course content/videos? A: No. This actor only scrapes course metadata (title, description, instructors, etc.), not the actual course content.

Scraping publicly available data is generally considered legal, and the hiQ Labs v. LinkedIn case is a frequently cited precedent. This actor only accesses publicly available API endpoints that do not require authentication. Always review and respect the target site's Terms of Service and robots.txt. For more information, see Apify's blog post on the legality of web scraping.

Limitations

  • Course enrollment counts may not be available for all courses
  • Some course details (full description, instructors) depend on the discovery API being accessible
  • The actor extracts metadata only, not course content or video materials
  • Pricing information reflects the verified certificate track; audit is typically free

Changelog

  • v0.1 (2026-04-23) -- Initial release