# MIT OpenCourseWare Scraper (`crawlerbros/mit-open-course-ware-scraper`) Actor

Scrape MIT OpenCourseWare (ocw.mit.edu) - 2,500+ free MIT courses with full metadata: title, department, level, instructors, topics, resource types, descriptions, and image URLs. Search by keyword, browse by department or level, or fetch a single course by URL.

- **URL**: https://apify.com/crawlerbros/mit-open-course-ware-scraper.md
- **Developed by:** [Crawler Bros](https://apify.com/crawlerbros) (community)
- **Categories:** Automation, Developer tools
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $3.00 / 1,000 results

This Actor is paid per event and usage: you are charged a fixed price for specific events plus Apify platform usage.
Because this Actor supports Apify Store discounts, the higher your subscription plan, the lower the price.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
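
As a rough illustration of the per-result charge, the sketch below estimates the event cost of a run at the listed starting rate of $3.00 per 1,000 results. It assumes the rate scales linearly per result and leaves out platform usage and plan discounts, which this page does not quantify. `estimate_result_cost` is a hypothetical helper, not part of any Apify library.

```python
from decimal import Decimal

# Listed starting rate from this page: $3.00 per 1,000 results.
RATE_PER_1000_RESULTS = Decimal("3.00")


def estimate_result_cost(num_results: int) -> Decimal:
    """Estimate the per-result event charge in USD (platform usage excluded)."""
    if num_results < 0:
        raise ValueError("num_results must be non-negative")
    cost = RATE_PER_1000_RESULTS * num_results / 1000
    return cost.quantize(Decimal("0.01"))


print(estimate_result_cost(50))    # 0.15
print(estimate_result_cost(2500))  # 7.50
```

Treat the result as a lower bound on the bill, since platform usage is charged on top of it.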

## What's an Apify Actor?

Actors are software tools that run on the Apify platform and cover all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server that can be used as a website, an API, or an MCP server.
"Actor" is always written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## MIT OpenCourseWare Scraper

Scrape course metadata from [MIT OpenCourseWare](https://ocw.mit.edu), the free, open publication of virtually all MIT course content. The Actor extracts structured data on 2,500+ courses across all MIT departments, with no authentication or proxy required.

### What You Get

Each course record contains:

| Field | Description |
|---|---|
| `courseId` | MIT course number (e.g. `6-006`, `18-900`) |
| `title` | Full course title |
| `department` | Department name (e.g. `Mathematics`) |
| `departmentUrl` | OCW department search URL |
| `level` | `Undergraduate` or `Graduate` |
| `term` | When taught (e.g. `Spring 2020`) |
| `instructors` | List of instructor names |
| `description` | Course description |
| `topics` | List of topic tags (e.g. `Algorithms and Data Structures`) |
| `imageUrl` | Course thumbnail URL |
| `courseUrl` | Canonical OCW URL |
| `hasLectureVideos` | `true` if lecture videos are available |
| `hasLectureNotes` | `true` if lecture notes are available |
| `resourceTypes` | List of available material types |
| `sourceUrl` | Same as `courseUrl` |
| `scrapedAt` | UTC ISO timestamp |
| `recordType` | Always `course` |

### Modes

#### `search` (default)

Full-text search across all OCW courses by keyword.

**Input:**

```json
{
  "mode": "search",
  "searchQuery": "machine learning",
  "maxItems": 20
}
````

#### `byDepartment`

Browse all courses in a specific MIT department.

**Input:**

```json
{
  "mode": "byDepartment",
  "department": "Mathematics",
  "maxItems": 50
}
```

#### `byLevel`

Browse all Undergraduate or Graduate courses.

**Input:**

```json
{
  "mode": "byLevel",
  "level": "Undergraduate",
  "maxItems": 100
}
```

#### `byUrl`

Fetch a single course by its OCW URL.

**Input:**

```json
{
  "mode": "byUrl",
  "courseUrl": "https://ocw.mit.edu/courses/6-006-introduction-to-algorithms-spring-2020/"
}
```
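
The four modes each rely on a different input field, so a small helper can keep inputs minimal. The sketch below is one way to build them in Python. `build_input` and `MODE_FIELD` are hypothetical names introduced for illustration, and requiring a value for every mode is an assumption (the input schema only marks `mode` as required).

```python
from typing import Any, Dict, Optional

# Field each mode relies on, as described in the Modes section above.
MODE_FIELD = {
    "search": "searchQuery",
    "byDepartment": "department",
    "byLevel": "level",
    "byUrl": "courseUrl",
}


def build_input(mode: str, value: str, max_items: Optional[int] = None) -> Dict[str, Any]:
    """Build an Actor input dict holding only the field the chosen mode uses."""
    if mode not in MODE_FIELD:
        raise ValueError(f"unknown mode: {mode!r}")
    actor_input: Dict[str, Any] = {"mode": mode, MODE_FIELD[mode]: value}
    if max_items is not None:
        actor_input["maxItems"] = max_items
    return actor_input


print(build_input("byDepartment", "Mathematics", 50))
# {'mode': 'byDepartment', 'department': 'Mathematics', 'maxItems': 50}
```

The returned dict can be passed as the run input to any of the clients shown in the API section.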

### Filters

| Filter | Type | Description |
|---|---|---|
| `filterDepartment` | string | Substring match on department name (`search` and `byLevel` modes) |
| `filterLevel` | select | `Undergraduate` or `Graduate` (`search` and `byDepartment` modes) |
| `minYear` | integer | Only courses taught in or after this year |
| `maxYear` | integer | Only courses taught in or before this year |
| `containsKeyword` | string | Keyword must appear in title or description |
| `maxItems` | integer | Maximum records to output (default 50, max 3000) |

### Example Output

```json
{
  "courseId": "6-867",
  "title": "Machine Learning",
  "department": "Electrical Engineering and Computer Science",
  "level": "Graduate",
  "term": "Fall 2006",
  "instructors": ["Prof. Tommi Jaakkola", "Ali Mohammad", "Rohit Singh"],
  "description": "An introductory course on machine learning covering classification, regression, boosting, SVMs, and Bayesian networks.",
  "topics": ["Computer Science", "Artificial Intelligence", "Mathematics"],
  "imageUrl": "https://ocw.mit.edu/courses/6-867-machine-learning-fall-2006/image.jpg",
  "courseUrl": "https://ocw.mit.edu/courses/6-867-machine-learning-fall-2006/",
  "hasLectureVideos": false,
  "hasLectureNotes": true,
  "resourceTypes": ["Lecture Notes", "Problem Sets", "Exams"],
  "sourceUrl": "https://ocw.mit.edu/courses/6-867-machine-learning-fall-2006/",
  "scrapedAt": "2026-06-07T10:00:00+00:00",
  "recordType": "course"
}
```
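
To show how a record like the one above might be consumed, here is a minimal Python sketch that parses a trimmed copy of the example and prints a one-line summary. `summarize` is a hypothetical helper, and it assumes every field it reads is present, as in the example.

```python
import json

# Trimmed copy of the example record above.
RAW = """{
  "courseId": "6-867",
  "title": "Machine Learning",
  "level": "Graduate",
  "term": "Fall 2006",
  "hasLectureVideos": false,
  "hasLectureNotes": true,
  "resourceTypes": ["Lecture Notes", "Problem Sets", "Exams"]
}"""


def summarize(record: dict) -> str:
    """Build a one-line summary from fields shown in the example output."""
    videos = "videos" if record.get("hasLectureVideos") else "no videos"
    resources = ", ".join(record.get("resourceTypes", []))
    return (
        f'{record["courseId"]} {record["title"]} '
        f'({record["level"]}, {record["term"]}; {videos}): {resources}'
    )


record = json.loads(RAW)
print(summarize(record))
# 6-867 Machine Learning (Graduate, Fall 2006; no videos): Lecture Notes, Problem Sets, Exams
```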

### Data Source

MIT OpenCourseWare (`ocw.mit.edu`) is MIT's free, open publication of course materials. All data is publicly available without authentication. The scraper uses the internal Elasticsearch API at `open.mit.edu/api/v0/search/`, the same API that powers the OCW website's search page.

### FAQs

**Q: Does this require login or API keys?**
A: No OCW login or API key is needed; all OCW content is freely public. Running the Actor through the Apify API still requires your Apify token.

**Q: How many courses are available?**
A: MIT OCW has approximately 2,500 published courses spanning 37 departments.

**Q: Can I filter by topic?**
A: Use `mode=search` with a topic keyword in `searchQuery`, or combine with `containsKeyword` for post-filter matching.

**Q: What departments are available?**
A: All 37 MIT departments, including Electrical Engineering and Computer Science, Mathematics, Physics, Economics, and Mechanical Engineering. See the `department` input dropdown for the full list.

**Q: Are lecture videos included?**
A: The `hasLectureVideos` field indicates if a course includes lecture videos. `resourceTypes` lists all available material types (lecture notes, problem sets, exams, etc.).

**Q: How fresh is the data?**
A: Data is fetched live from the OCW search API on each run. MIT typically publishes new courses each semester.

**Q: What is the daily test prefill?**
A: `{"mode": "search", "searchQuery": "machine learning", "maxItems": 5}` — always returns results as ML courses are among OCW's most popular.

# Actor input Schema

## `mode` (type: `string`):

What to scrape. 'search' — text search across all courses; 'byDepartment' — all courses in a department; 'byLevel' — filter by course level; 'byUrl' — single course by its OCW URL.

## `searchQuery` (type: `string`):

Free-text keyword search (mode=search). Examples: 'machine learning', 'quantum mechanics', 'algorithms'.

## `department` (type: `string`):

Department name to browse. Use exact name from the dropdown or partial name for substring match.

## `level` (type: `string`):

Course level to browse.

## `courseUrl` (type: `string`):

Direct OCW course URL, e.g. https://ocw.mit.edu/courses/6-006-introduction-to-algorithms-spring-2020/

## `filterDepartment` (type: `string`):

Narrow results to a specific department when using search or byLevel mode (substring match on department name).

## `filterLevel` (type: `string`):

Narrow results to 'Undergraduate' or 'Graduate' when using search or byDepartment mode.

## `minYear` (type: `integer`):

Only include courses taught in or after this year (parsed from the term, e.g. 2010).

## `maxYear` (type: `integer`):

Only include courses taught in or before this year.

## `containsKeyword` (type: `string`):

Case-insensitive substring that must appear in the course title or description.

## `maxItems` (type: `integer`):

Maximum number of courses to output. MIT OCW has ~2,500 courses total.
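
The `minYear`, `maxYear`, and `containsKeyword` post-filters described above could be approximated client-side as in the sketch below. This is not the Actor's implementation, which is not shown on this page. It assumes the year is the first four-digit number in `term` and that records without a parseable year are dropped whenever a year bound is set. `term_year` and `matches_filters` are hypothetical names.

```python
import re
from typing import Any, Dict, Optional


def term_year(term: str) -> Optional[int]:
    """Return the first four-digit year in a term such as 'Spring 2020'."""
    match = re.search(r"\b(?:19|20)\d{2}\b", term)
    return int(match.group(0)) if match else None


def matches_filters(
    record: Dict[str, Any],
    min_year: Optional[int] = None,
    max_year: Optional[int] = None,
    contains_keyword: Optional[str] = None,
) -> bool:
    """Check one course record against the year and keyword post-filters."""
    year = term_year(record.get("term") or "")
    if min_year is not None and (year is None or year < min_year):
        return False
    if max_year is not None and (year is None or year > max_year):
        return False
    if contains_keyword:
        needle = contains_keyword.lower()
        fields = (record.get("title") or "", record.get("description") or "")
        # Case-insensitive substring match on title or description.
        if not any(needle in field.lower() for field in fields):
            return False
    return True


course = {
    "title": "Machine Learning",
    "term": "Fall 2006",
    "description": "An introductory course on machine learning.",
}
print(matches_filters(course, min_year=2000, contains_keyword="LEARNING"))  # True
print(matches_filters(course, min_year=2010))                               # False
```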

## Actor input object example

```json
{
  "mode": "search",
  "searchQuery": "machine learning",
  "department": "Mathematics",
  "level": "Undergraduate",
  "courseUrl": "https://ocw.mit.edu/courses/6-006-introduction-to-algorithms-spring-2020/",
  "maxItems": 50
}
```
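
Before starting a run, it can help to check an input object against the limits stated in the OpenAPI specification in this document (`mode` is required and enumerated, `minYear` and `maxYear` range from 1990 to 2030, `maxItems` from 1 to 3000). The sketch below does that in plain Python. `validate_input` is a hypothetical helper, and the check that `minYear` does not exceed `maxYear` is an added assumption rather than something the schema states.

```python
from typing import Any, Dict, List

MODES = {"search", "byDepartment", "byLevel", "byUrl"}


def validate_input(actor_input: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the input looks valid."""
    problems: List[str] = []
    if actor_input.get("mode") not in MODES:
        problems.append("mode must be one of: " + ", ".join(sorted(MODES)))
    for key in ("minYear", "maxYear"):
        value = actor_input.get(key)
        if value is not None and not 1990 <= value <= 2030:
            problems.append(f"{key} must be between 1990 and 2030")
    max_items = actor_input.get("maxItems")
    if max_items is not None and not 1 <= max_items <= 3000:
        problems.append("maxItems must be between 1 and 3000")
    min_year, max_year = actor_input.get("minYear"), actor_input.get("maxYear")
    # Assumption: an inverted year range is treated as a mistake.
    if min_year is not None and max_year is not None and min_year > max_year:
        problems.append("minYear must not exceed maxYear")
    return problems


print(validate_input({"mode": "search", "searchQuery": "machine learning", "maxItems": 50}))
# []
print(validate_input({"mode": "byTopic", "maxItems": 5000}))
# ['mode must be one of: byDepartment, byLevel, byUrl, search', 'maxItems must be between 1 and 3000']
```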

# Actor output Schema

## `courses` (type: `string`):

Dataset containing all scraped MIT OCW course records.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "mode": "search",
    "searchQuery": "machine learning",
    "department": "Mathematics",
    "level": "Undergraduate",
    "courseUrl": "https://ocw.mit.edu/courses/6-006-introduction-to-algorithms-spring-2020/",
    "maxItems": 50
};

// Run the Actor and wait for it to finish
const run = await client.actor("crawlerbros/mit-open-course-ware-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "mode": "search",
    "searchQuery": "machine learning",
    "department": "Mathematics",
    "level": "Undergraduate",
    "courseUrl": "https://ocw.mit.edu/courses/6-006-introduction-to-algorithms-spring-2020/",
    "maxItems": 50,
}

# Run the Actor and wait for it to finish
run = client.actor("crawlerbros/mit-open-course-ware-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```
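
The example above hard-codes the token placeholder. A common alternative is to read the token from the environment so it never lands in source control. The sketch below assumes the token is stored in an environment variable named `APIFY_TOKEN`. Both that name and the `get_apify_token` helper are choices made for this illustration.

```python
import os
from typing import Mapping, Optional


def get_apify_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Read the API token from APIFY_TOKEN instead of hard-coding it."""
    env = os.environ if env is None else env
    token = env.get("APIFY_TOKEN", "").strip()
    if not token:
        raise RuntimeError("Set the APIFY_TOKEN environment variable first")
    return token


# Usage with the client from the example above:
#   client = ApifyClient(get_apify_token())
```

Failing early with a clear message avoids starting a run that would only be rejected for a missing token.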

## CLI example

```bash
echo '{
  "mode": "search",
  "searchQuery": "machine learning",
  "department": "Mathematics",
  "level": "Undergraduate",
  "courseUrl": "https://ocw.mit.edu/courses/6-006-introduction-to-algorithms-spring-2020/",
  "maxItems": 50
}' |
apify call crawlerbros/mit-open-course-ware-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=crawlerbros/mit-open-course-ware-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "MIT OpenCourseWare Scraper",
        "description": "Scrape MIT OpenCourseWare (ocw.mit.edu) - 2,500+ free MIT courses with full metadata: title, department, level, instructors, topics, resource types, descriptions, and image URLs. Search by keyword, browse by department or level, or fetch a single course by URL.",
        "version": "0.1",
        "x-build-id": "OQ7phXMBLbha6n6jt"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/crawlerbros~mit-open-course-ware-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-crawlerbros-mit-open-course-ware-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/crawlerbros~mit-open-course-ware-scraper/runs": {
            "post": {
                "operationId": "runs-sync-crawlerbros-mit-open-course-ware-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/crawlerbros~mit-open-course-ware-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-crawlerbros-mit-open-course-ware-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "mode"
                ],
                "properties": {
                    "mode": {
                        "title": "Mode",
                        "enum": [
                            "search",
                            "byDepartment",
                            "byLevel",
                            "byUrl"
                        ],
                        "type": "string",
                        "description": "What to scrape. 'search' — text search across all courses; 'byDepartment' — all courses in a department; 'byLevel' — filter by course level; 'byUrl' — single course by its OCW URL.",
                        "default": "search"
                    },
                    "searchQuery": {
                        "title": "Search query",
                        "type": "string",
                        "description": "Free-text keyword search (mode=search). Examples: 'machine learning', 'quantum mechanics', 'algorithms'."
                    },
                    "department": {
                        "title": "Department (mode=byDepartment)",
                        "enum": [
                            "Aeronautics and Astronautics",
                            "Anthropology",
                            "Architecture",
                            "Athletics, Physical Education and Recreation",
                            "Biological Engineering",
                            "Biology",
                            "Brain and Cognitive Sciences",
                            "Chemical Engineering",
                            "Chemistry",
                            "Civil and Environmental Engineering",
                            "Comparative Media Studies/Writing",
                            "Concourse",
                            "Earth, Atmospheric, and Planetary Sciences",
                            "Economics",
                            "Edgerton Center",
                            "Electrical Engineering and Computer Science",
                            "Engineering Systems Division",
                            "Experimental Study Group",
                            "Global Studies and Languages",
                            "Health Sciences and Technology",
                            "History",
                            "Institute for Data, Systems, and Society",
                            "Linguistics and Philosophy",
                            "Literature",
                            "Materials Science and Engineering",
                            "Mathematics",
                            "Mechanical Engineering",
                            "Media Arts and Sciences",
                            "Music and Theater Arts",
                            "Nuclear Science and Engineering",
                            "Physics",
                            "Political Science",
                            "Science, Technology, and Society",
                            "Sloan School of Management",
                            "Special Programs",
                            "Urban Studies and Planning",
                            "Women's and Gender Studies"
                        ],
                        "type": "string",
                        "description": "Department name to browse. Use exact name from the dropdown or partial name for substring match."
                    },
                    "level": {
                        "title": "Level (mode=byLevel)",
                        "enum": [
                            "Undergraduate",
                            "Graduate"
                        ],
                        "type": "string",
                        "description": "Course level to browse."
                    },
                    "courseUrl": {
                        "title": "Course URL (mode=byUrl)",
                        "type": "string",
                        "description": "Direct OCW course URL, e.g. https://ocw.mit.edu/courses/6-006-introduction-to-algorithms-spring-2020/"
                    },
                    "filterDepartment": {
                        "title": "Filter by department (search/byLevel modes)",
                        "type": "string",
                        "description": "Narrow results to a specific department when using search or byLevel mode (substring match on department name)."
                    },
                    "filterLevel": {
                        "title": "Filter by level (search/byDepartment modes)",
                        "enum": [
                            "",
                            "Undergraduate",
                            "Graduate"
                        ],
                        "type": "string",
                        "description": "Narrow results to 'Undergraduate' or 'Graduate' when using search or byDepartment mode."
                    },
                    "minYear": {
                        "title": "Min year",
                        "minimum": 1990,
                        "maximum": 2030,
                        "type": "integer",
                        "description": "Only include courses taught in or after this year (parsed from the term, e.g. 2010)."
                    },
                    "maxYear": {
                        "title": "Max year",
                        "minimum": 1990,
                        "maximum": 2030,
                        "type": "integer",
                        "description": "Only include courses taught in or before this year."
                    },
                    "containsKeyword": {
                        "title": "Contains keyword (post-filter)",
                        "type": "string",
                        "description": "Case-insensitive substring that must appear in the course title or description."
                    },
                    "maxItems": {
                        "title": "Max items",
                        "minimum": 1,
                        "maximum": 3000,
                        "type": "integer",
                        "description": "Maximum number of courses to output. MIT OCW has ~2,500 courses total."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
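
For projects without a client library, the `run-sync-get-dataset-items` path from the specification above can be called directly. The following Python sketch uses only the standard library and passes the token as the `token` query parameter, as the specification describes. `build_request` and `run_actor` are hypothetical helper names, and parsing the response as JSON is an assumption, since the specification does not define a response schema for this path.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "crawlerbros~mit-open-course-ware-scraper"


def build_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Prepare the POST request for the run-sync-get-dataset-items path."""
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_PATH}/run-sync-get-dataset-items?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_actor(token: str, actor_input: dict, timeout: float = 300.0):
    """Send the request and decode the body (assumed to be JSON)."""
    request = build_request(token, actor_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example call (performs a real, billable run, so it is left commented out):
#   items = run_actor("<YOUR_API_TOKEN>", {"mode": "search", "searchQuery": "algorithms", "maxItems": 5})
```

Because the call blocks until the run finishes, the `timeout` value here is a guess and may need raising for large `maxItems` values.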
