EdX Course Scraper 🎓

Pricing

Pay per usage

Power your edtech insights with this EdX Course Scraper. Extract detailed online course data, including titles, institutions, levels, languages, durations, weekly effort, enrollment counts, subjects, and skills. Well suited to e-learning aggregators and market researchers. Streamline your education data collection today!

Developer

Shahid Irfan

Maintained by Community

edX Course Data Scraper

Extract comprehensive course data from edX quickly and reliably. Collect titles, organizations, durations, levels, languages, and more at scale. Ideal for research, analysis, and monitoring educational offerings over time.

Features

  • Keyword search: Gather course data by topic or skill
  • Language filtering: Restrict results to a specific course language
  • Automatic pagination: Collect large result sets without manual paging
  • Structured output: Ready for analytics, dashboards, and exports
  • Scalable collection: Designed for small tests or large datasets

Use Cases

Market Research

Track trends across subjects, providers, and course levels to understand demand and gaps in the catalog.
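
One simple way to spot trends is to tally values across the dataset items. The sketch below is a minimal Python example, assuming you have downloaded the dataset as JSON and loaded it as a list of dicts; the function name tally and the item values are hypothetical.

```python
from collections import Counter
from typing import Iterable


def tally(items: Iterable[dict], field: str) -> Counter:
    # Count how often each value appears in `field` across dataset items.
    # Works for string fields (e.g. "organization") and string-array fields
    # (e.g. "subjects", "level"); missing or null values are skipped.
    counts: Counter = Counter()
    for item in items:
        value = item.get(field)
        if value is None:
            continue
        if isinstance(value, list):
            counts.update(value)  # one count per array element
        else:
            counts[value] += 1
    return counts


# Hypothetical items, trimmed to the fields used here.
items = [
    {"organization": "Harvard University", "subjects": ["Computer Science", "Data Analysis & Statistics"]},
    {"organization": "Harvard University", "subjects": ["Computer Science"]},
    {"organization": "Example University", "subjects": None},
]

print(tally(items, "subjects").most_common(5))
print(tally(items, "organization").most_common(5))
```

Running the same tally on datasets collected at different dates gives a rough view of how the catalog shifts over time.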

Curriculum Planning

Compare offerings across institutions to design programs that align with current learning needs.

Competitive Intelligence

Monitor course listings over time and identify shifts in topics, focus areas, and availability.

Data Analysis

Build datasets for BI, reporting, or custom data science workflows.


Input Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| startUrl | String | No | (none) | Optional starting search URL |
| keyword | String | No | "python" | Search keyword for courses |
| language | String | No | "none" | Filter courses by language |
| results_wanted | Integer | No | 20 | Maximum number of courses to collect |
| max_pages | Integer | No | 10 | Safety cap on pages to visit |
| proxyConfiguration | Object | No | { "useApifyProxy": true, "apifyProxyGroups": ["RESIDENTIAL"] } | Proxy settings |
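
If you generate run inputs from code, it can help to mirror the parameter table in a typed structure so defaults stay consistent. This is a minimal Python sketch; the class name EdxScraperInput and its validation rule are assumptions for illustration, not part of the actor. Field names keep the table's camelCase and snake_case keys so the JSON matches what the actor expects.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class EdxScraperInput:
    # Hypothetical helper mirroring the input table; defaults copied from it.
    # Attribute names match the JSON keys exactly, hence the mixed casing.
    keyword: str = "python"
    language: str = "none"
    results_wanted: int = 20
    max_pages: int = 10
    startUrl: Optional[str] = None
    proxyConfiguration: dict = field(
        default_factory=lambda: {"useApifyProxy": True, "apifyProxyGroups": ["RESIDENTIAL"]}
    )

    def to_json(self) -> str:
        # This positivity check is an assumption, not documented by the actor.
        if self.results_wanted < 1 or self.max_pages < 1:
            raise ValueError("results_wanted and max_pages must be positive")
        data = asdict(self)
        if data["startUrl"] is None:
            del data["startUrl"]  # omit the optional field rather than sending null
        return json.dumps(data, indent=2)


print(EdxScraperInput(keyword="data science", results_wanted=50).to_json())
```

The printed JSON can be pasted into the actor's input editor or sent with whatever client you use to start runs.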

Output Data

Each item in the dataset contains:

| Field | Type | Description |
|---|---|---|
| title | String | Course title |
| url | String | Course URL |
| organization | String | Institution or organization |
| level | Array | Course level(s) |
| language | Array | Course language(s) |
| short_description | String | Short summary |
| image_url | String | Course image URL |
| subjects | Array | Subject areas |
| weeks_to_complete | Number | Estimated duration in weeks |
| min_effort | Number | Minimum effort per week (hours) |
| max_effort | Number | Maximum effort per week (hours) |
| enrollment_count | Number | Recent enrollment count |
| marketing_url | String | Public course link |
| external_url | String | External course link if available |
| partner | Array | Partner organization names |
| partner_keys | Array | Partner identifiers |
| program_type | Array | Program types |
| learning_type | Array | Learning types |
| tags | Array | Tags or keywords |
| skills | Array | Skills associated with the course (objects with skill, category, and subcategory) |
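
The duration and effort fields can be combined into an estimated total workload. This Python sketch assumes, as edX displays it, that min_effort and max_effort are hours per week; the function name total_effort_hours is hypothetical. The example values are the ones shown in the Sample Output section.

```python
from typing import Optional, Tuple


def total_effort_hours(item: dict) -> Optional[Tuple[float, float]]:
    # Estimated total hours = weeks_to_complete x effort per week.
    # Assumes min_effort/max_effort are hours per week.
    # Returns None when any of the three fields is missing or null.
    weeks = item.get("weeks_to_complete")
    low = item.get("min_effort")
    high = item.get("max_effort")
    if weeks is None or low is None or high is None:
        return None
    return (weeks * low, weeks * high)


# Values from the sample output (10 weeks, 3 to 9 hours per week).
sample = {"weeks_to_complete": 10, "min_effort": 3, "max_effort": 9}
print(total_effort_hours(sample))  # (30, 90)
```

For that course, 10 weeks × 3 to 9 hours per week gives an estimate of 30 to 90 hours in total.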

Usage Examples

Basic Extraction

{
  "keyword": "data science",
  "results_wanted": 50
}

Language-Specific Collection

{
  "keyword": "python",
  "language": "English",
  "results_wanted": 100
}

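Large Dataset with Safety Cap

The run stops at whichever limit is reached first, so here it ends after 5 pages even if fewer than 200 courses have been collected.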

{
  "keyword": "business",
  "results_wanted": 200,
  "max_pages": 5
}
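
The interaction between results_wanted and max_pages can be modelled as "stop at whichever limit comes first". This Python sketch illustrates that rule only; it is not the actor's implementation, and the page size of 12 used in fake_pages is a made-up number, since the real number of courses per result page may differ.

```python
from itertools import islice
from typing import Iterable, List


def collect(pages: Iterable[List[dict]], results_wanted: int, max_pages: int) -> List[dict]:
    # Models the two documented limits: stop after results_wanted items
    # or after max_pages pages, whichever comes first.
    # `pages` stands in for successive search-result pages (hypothetical).
    results: List[dict] = []
    for page in islice(pages, max_pages):  # never pulls more than max_pages pages
        for item in page:
            results.append(item)
            if len(results) >= results_wanted:
                return results
    return results


def fake_pages(page_size: int = 12):
    # Hypothetical endless result pages with a made-up page size.
    page_number = 0
    while True:
        yield [{"page": page_number, "index": i} for i in range(page_size)]
        page_number += 1


print(len(collect(fake_pages(), results_wanted=200, max_pages=5)))  # 60
print(len(collect(fake_pages(), results_wanted=50, max_pages=10)))  # 50
```

Under that hypothetical page size, the Large Dataset example would end with 60 courses rather than 200, which is why max_pages may need raising alongside results_wanted.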

Sample Output

{
  "title": "CS50's Introduction to Programming with Python",
  "url": "https://www.edx.org/learn/python/harvard-university-cs50-s-introduction-to-programming-with-python",
  "organization": "Harvard University",
  "level": ["Introductory"],
  "language": ["English"],
  "short_description": "An introduction to programming using Python, a popular language for general-purpose programming, data science, web programming, and more.",
  "image_url": "https://prod-discovery.edx-cdn.org/media/course/image/2cc794d0-316d-42f7-bbfd-25c34e4cd5df-033e46d516c0.png",
  "subjects": ["Computer Science", "Data Analysis & Statistics"],
  "weeks_to_complete": 10,
  "min_effort": 3,
  "max_effort": 9,
  "enrollment_count": 147611,
  "marketing_url": "https://www.edx.org/learn/python/harvard-university-cs50-s-introduction-to-programming-with-python",
  "external_url": null,
  "partner": ["Harvard University"],
  "partner_keys": ["HarvardX"],
  "program_type": ["Professional Certificate", "Professional Certificate"],
  "learning_type": ["Course", "Professional Certificate", "Professional Certificate"],
  "tags": [],
  "skills": [
    { "skill": "Debugging", "category": "Information Technology", "subcategory": "Software Development" }
  ]
}
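
In this sample, program_type and learning_type contain repeated values. If your analysis needs distinct values, a small post-processing step can drop repeats while keeping their original order. This Python sketch is one way to do it; the function name unique_in_order is hypothetical, and it only works on arrays of strings (not on skills, whose entries are objects).

```python
def unique_in_order(values: list) -> list:
    # dict keys keep insertion order, so the first occurrence of each value wins.
    # Values must be hashable (strings are fine; skill objects are not).
    return list(dict.fromkeys(values))


sample = {
    "program_type": ["Professional Certificate", "Professional Certificate"],
    "learning_type": ["Course", "Professional Certificate", "Professional Certificate"],
}
cleaned = {key: unique_in_order(values) for key, values in sample.items()}
print(cleaned)
```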

Tips for Best Results

Choose Clear Keywords

  • Use specific topics for more relevant results
  • Try variations of terms for broader coverage
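
When you run several keyword variations (for example "machine learning" and "deep learning"), the same course can appear in more than one run. A minimal Python sketch for merging runs, assuming url uniquely identifies a course; the function name merge_runs is hypothetical.

```python
from typing import Dict, Iterable, List


def merge_runs(*runs: Iterable[dict]) -> List[dict]:
    # Combine items from several runs, keeping the first item seen per course URL.
    # Items without a url are dropped, since they cannot be de-duplicated reliably.
    merged: Dict[str, dict] = {}
    for run in runs:
        for item in run:
            key = item.get("url")
            if key and key not in merged:
                merged[key] = item
    return list(merged.values())


# Hypothetical results from two keyword variations.
run_a = [{"url": "https://www.edx.org/learn/a", "title": "A"}, {"url": "https://www.edx.org/learn/b", "title": "B"}]
run_b = [{"url": "https://www.edx.org/learn/b", "title": "B"}, {"url": "https://www.edx.org/learn/c", "title": "C"}]
print([item["title"] for item in merge_runs(run_a, run_b)])
```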

Optimize Collection Size

  • Start with 20–50 items for testing
  • Increase results_wanted for larger datasets

Use Proxies for Stability

  • Residential proxies improve reliability on large runs

Integrations

Connect your data with:

  • Google Sheets: Export for analysis
  • Airtable: Build searchable databases
  • Slack: Get notifications
  • Webhooks: Send to custom endpoints
  • Make: Create automated workflows
  • Zapier: Trigger actions

Export Formats

  • JSON: For developers and APIs
  • CSV: For spreadsheet analysis
  • Excel: For business reporting
  • XML: For system integrations
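
If you download JSON and build your own CSV, array fields and the skill objects need flattening first. This Python sketch joins arrays with "; " and keeps only each skill's name; the column selection and the function name to_csv are illustrative assumptions, and the sample item is trimmed from the Sample Output section.

```python
import csv
import io
from typing import Iterable, List

# Illustrative column choice; add or drop fields as needed.
COLUMNS = ["title", "organization", "level", "language", "subjects", "weeks_to_complete", "skills"]


def to_csv(items: Iterable[dict], columns: List[str] = COLUMNS) -> str:
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    for item in items:
        row = {}
        for column in columns:
            value = item.get(column)
            if column == "skills" and isinstance(value, list):
                # Keep only the "skill" name from each skill object.
                value = [s.get("skill", "") for s in value if isinstance(s, dict)]
            if isinstance(value, list):
                value = "; ".join(str(v) for v in value)
            row[column] = "" if value is None else value  # missing fields become empty cells
        writer.writerow(row)
    return buffer.getvalue()


sample = {
    "title": "CS50's Introduction to Programming with Python",
    "organization": "Harvard University",
    "level": ["Introductory"],
    "language": ["English"],
    "subjects": ["Computer Science", "Data Analysis & Statistics"],
    "weeks_to_complete": 10,
    "skills": [{"skill": "Debugging", "category": "Information Technology", "subcategory": "Software Development"}],
}
print(to_csv([sample]))
```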

Frequently Asked Questions

How many courses can I collect?

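Collection stops when results_wanted courses have been collected or max_pages result pages (default 10) have been visited, whichever comes first. To gather larger sets, raise both values; the practical limit is the number of matching courses and your run duration.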

Can I filter by language?

Yes. Set the language parameter (for example, "English") to restrict results to courses in that language.

Does the actor handle pagination automatically?

Yes. The actor moves through result pages automatically until it reaches results_wanted or max_pages, whichever comes first.

What if some fields are missing?

Some courses may not provide all fields, so empty values are possible.
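
To see how often fields are empty in a given dataset, you can audit the items after a run. This Python sketch counts absent, null, empty-string, and empty-array values per field; the field list and the function name missing_fields are illustrative assumptions.

```python
from typing import Dict, Iterable, Sequence

# Illustrative selection of fields to audit.
EXPECTED = ("title", "url", "organization", "level", "language",
            "weeks_to_complete", "min_effort", "max_effort", "enrollment_count")


def missing_fields(items: Iterable[dict], expected: Sequence[str] = EXPECTED) -> Dict[str, int]:
    # Count, per field, items where the value is absent, null, "" or [].
    # A numeric 0 counts as present.
    counts = {name: 0 for name in expected}
    for item in items:
        for name in expected:
            value = item.get(name)
            if value is None or value == "" or value == []:
                counts[name] += 1
    return counts


# Hypothetical items with gaps.
items = [{"title": "A", "url": "u1", "external_url": None}, {"title": "", "url": "u2", "tags": []}]
print(missing_fields(items, ("title", "url", "external_url", "tags")))
```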

Can I run this on a schedule?

Yes. Create a schedule in the Apify Console to run the actor periodically and monitor course listings over time.
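
With scheduled runs, comparing consecutive datasets shows which courses appeared or disappeared. A minimal Python sketch, assuming url identifies a course; the function name diff_runs is hypothetical.

```python
from typing import Dict, Iterable, List


def diff_runs(previous: Iterable[dict], current: Iterable[dict]) -> Dict[str, List[str]]:
    # Compare two runs by course URL and list courses that appeared or disappeared.
    before = {item["url"] for item in previous if item.get("url")}
    after = {item["url"] for item in current if item.get("url")}
    return {"added": sorted(after - before), "removed": sorted(before - after)}


# Hypothetical URLs from two scheduled runs.
last_week = [{"url": "https://www.edx.org/learn/a"}, {"url": "https://www.edx.org/learn/b"}]
this_week = [{"url": "https://www.edx.org/learn/b"}, {"url": "https://www.edx.org/learn/c"}]
print(diff_runs(last_week, this_week))
```

Keep the input identical between runs: a course listed as removed may simply have fallen outside the results_wanted or max_pages limits rather than having left edX.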


Support

For issues or feature requests, contact support through the Apify Console.

This actor is designed for legitimate data collection purposes. Users are responsible for ensuring compliance with website terms of service and applicable laws. Use data responsibly and respect rate limits.