# Baseball Reference Scraper - Player Stats (`lulzasaur/baseball-reference-scraper`) Actor

Scrape Baseball-Reference.com for player stats, career data, and team rosters. Extract batting, pitching, and biographical info using structured data-stat attributes and microdata.

- **URL:** https://apify.com/lulzasaur/baseball-reference-scraper.md
- **Developed by:** [lulz bot](https://apify.com/lulzasaur) (community)
- **Categories:** Other
- **Stats:** 1 total user, 0 monthly users, 50.0% of runs succeeded
- **User rating:** No ratings yet

## Pricing

from $5.00 / 1,000 results

This Actor is paid per event: you are not charged for Apify platform usage, only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anywhere from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server that can be used as a website, an API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation as a [Markdown index](https://docs.apify.com/llms.txt) and as [Markdown full text](https://docs.apify.com/llms-full.txt).


# README

## Baseball Reference Scraper

Scrape MLB player statistics, career data, and team rosters from [Baseball-Reference.com](https://www.baseball-reference.com). Extract batting stats, pitching stats, biographical information, and full team rosters using Baseball-Reference's structured `data-stat` attributes and schema.org JSON-LD markup.

### Features

- **Player Search** - Search for players by name and get links to their profile pages
- **Player Stats** - Extract full career batting and pitching stats for any player
- **Team Roster** - Get complete team rosters with player stats for any season
- **Structured Data** - Leverages `data-stat` attributes and JSON-LD schema.org markup for reliable extraction
- **Biographical Info** - Player name, position, bats/throws, height, weight, birth date, birth place, debut date
- **Career Totals** - Includes career total rows alongside season-by-season breakdowns
- **Two-Way Players** - Handles players with both batting and pitching stats (e.g., Shohei Ohtani)
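The **Structured Data** and **Biographical Info** features rely on schema.org `Person` markup embedded as JSON-LD. The Actor itself is built on Node.js libraries (`got-scraping` + `cheerio`), so the following is not its code. It is a minimal Python standard-library sketch of the same idea, assuming the page contains a `<script type="application/ld+json">` block whose `@type` is `Person`. The helper names and the sample HTML are hypothetical.

```python
import json
from html.parser import HTMLParser
from typing import Optional


class JsonLdCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def extract_person(html: str) -> Optional[dict]:
    """Return the first schema.org Person object found in the page's JSON-LD, if any."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip a malformed block instead of failing the whole page
        # A JSON-LD block can hold a single object or a list of objects.
        for obj in data if isinstance(data, list) else [data]:
            if isinstance(obj, dict) and obj.get("@type") == "Person":
                return obj
    return None


# Hypothetical sample; real pages carry more fields.
sample = (
    '<html><head><script type="application/ld+json">'
    '{"@type": "Person", "name": "Mike Trout", "birthDate": "1991-08-07"}'
    "</script></head></html>"
)
person = extract_person(sample)
print(person["name"], person["birthDate"])  # Mike Trout 1991-08-07
```

Returning `None` when no `Person` object is found lets a caller treat missing biographical data as optional rather than as a hard failure.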

### Modes

#### Player Search (`playerSearch`)

Search for players by name. Returns matching player names and profile URLs.

**Input:**
- `query` - Player name to search for (e.g., "Mike Trout", "Rodriguez")
- `limit` - Maximum number of results (default: 50)

**Output:**
```json
{
  "type": "searchResult",
  "playerName": "Mike Trout",
  "url": "https://www.baseball-reference.com/players/t/troutmi01.shtml",
  "directMatch": true,
  "query": "Mike Trout"
}
````
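Search results return only names and profile URLs, so getting a player's stats is a two-step flow: search, then pass the chosen `url` to `playerStats` mode as `playerUrl`. A minimal sketch, assuming dataset items shaped like the sample output above. `pick_player_url` and `player_stats_input` are hypothetical helpers, and the name-match fallback is our own heuristic, not something the Actor does.

```python
from typing import Optional


def pick_player_url(results: list, query: str) -> Optional[str]:
    """Pick the profile URL to pass as `playerUrl` in playerStats mode."""
    hits = [r for r in results if r.get("type") == "searchResult"]
    # Prefer a result the Actor flagged as a direct match for this query.
    for hit in hits:
        if hit.get("directMatch") and hit.get("query") == query:
            return hit.get("url")
    # Fall back to a case-insensitive exact name match (hypothetical heuristic).
    for hit in hits:
        if hit.get("playerName", "").lower() == query.lower():
            return hit.get("url")
    return None


def player_stats_input(player_url: str, season: Optional[int] = None) -> dict:
    """Build a playerStats run input; `season` is optional, as in the input schema."""
    run_input = {"mode": "playerStats", "playerUrl": player_url}
    if season is not None:
        run_input["season"] = season
    return run_input


results = [{
    "type": "searchResult",
    "playerName": "Mike Trout",
    "url": "https://www.baseball-reference.com/players/t/troutmi01.shtml",
    "directMatch": True,
    "query": "Mike Trout",
}]
print(player_stats_input(pick_player_url(results, "Mike Trout"), season=2012))
```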

#### Player Stats (`playerStats`)

Get full career statistics for a specific player. Returns batting and/or pitching stats by season plus career totals.

**Input:**

- `playerUrl` - Full Baseball-Reference player URL
- `season` - Optional year to filter to a specific season
- `limit` - Maximum number of season rows to return

**Output:**

```json
{
  "type": "playerStats",
  "playerName": "Mike Trout",
  "position": "Centerfielder",
  "bats": "Right",
  "throws": "Right",
  "height": "6-1",
  "weight": "235 lbs",
  "birthDate": "1991-08-07",
  "team": "Los Angeles Angels",
  "battingSeasons": [
    {
      "season": "2012",
      "team": "LAA",
      "gamesPlayed": 139,
      "atBats": 559,
      "hits": 182,
      "homeRuns": 30,
      "rbi": 83,
      "battingAverage": ".326",
      "ops": ".963"
    }
  ],
  "battingCareerTotals": {
    "gamesPlayed": 1240,
    "homeRuns": 378,
    "battingAverage": ".299"
  }
}
```
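In the sample output, counting stats (`gamesPlayed`, `hits`, `homeRuns`) are integers, while rate stats (`battingAverage`, `ops`) are strings in Baseball-Reference's leading-dot style. A minimal sketch for converting them before doing arithmetic, assuming every rate stat follows that string format; the helper names are hypothetical. Recomputing `hits / atBats` is a cheap consistency check on a season row.

```python
def parse_rate(value: str) -> float:
    """Convert a rate string such as '.326' or '1.024' to a float ('' -> 0.0)."""
    return float(value) if value else 0.0


def format_rate(numerator: int, denominator: int) -> str:
    """Format a ratio the way the sample output does, e.g. 182/559 -> '.326'."""
    if denominator == 0:
        return ""  # undefined, e.g. a season with no at-bats
    text = f"{numerator / denominator:.3f}"
    # Drop the leading zero for values below 1; keep it for 1.000 and above.
    return text[1:] if text.startswith("0") else text


season = {"season": "2012", "atBats": 559, "hits": 182, "battingAverage": ".326", "ops": ".963"}
recomputed = format_rate(season["hits"], season["atBats"])
print(recomputed, recomputed == season["battingAverage"])  # .326 True
print(parse_rate(season["ops"]) > parse_rate(season["battingAverage"]))  # True
```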

#### Team Roster (`teamRoster`)

Get the full roster and stats for a team in a given season.

**Input:**

- `teamUrl` - Full Baseball-Reference team page URL (e.g., `https://www.baseball-reference.com/teams/LAA/2024.shtml`)
- `limit` - Maximum number of players to return

**Output:**

```json
{
  "type": "rosterPlayer",
  "playerName": "Mike Trout",
  "position": "CF",
  "team": "Los Angeles Angels",
  "season": "2024",
  "gamesPlayed": 29,
  "homeRuns": 10,
  "battingAverage": ".220",
  "url": "https://www.baseball-reference.com/players/t/troutmi01.shtml"
}
```
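Roster items are flat records, one per player, so ordinary list operations are enough to summarize a team. A minimal sketch that ranks hitters by `homeRuns`, assuming some roster rows (for example, pitchers) may lack batting fields, which the sample output does not confirm. The function name and the extra rows are hypothetical.

```python
def top_home_run_hitters(items: list, n: int = 5) -> list:
    """Return (playerName, homeRuns) pairs for the n roster players with the most HR."""
    hitters = [
        (item["playerName"], item["homeRuns"])
        for item in items
        # Skip non-roster items and players without an integer homeRuns value.
        if item.get("type") == "rosterPlayer" and isinstance(item.get("homeRuns"), int)
    ]
    # Most home runs first; ties broken alphabetically for a stable order.
    hitters.sort(key=lambda pair: (-pair[1], pair[0]))
    return hitters[:n]


roster = [
    {"type": "rosterPlayer", "playerName": "Mike Trout", "homeRuns": 10},
    {"type": "rosterPlayer", "playerName": "Player A", "homeRuns": 25},  # made-up row
    {"type": "rosterPlayer", "playerName": "Pitcher B"},  # made-up row, no batting line
]
print(top_home_run_hitters(roster, n=2))  # [('Player A', 25), ('Mike Trout', 10)]
```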

### Common Team URL Codes

| Team | Code | Example URL |
|------|------|-------------|
| New York Yankees | NYY | `/teams/NYY/2024.shtml` |
| Los Angeles Dodgers | LAD | `/teams/LAD/2024.shtml` |
| Los Angeles Angels | LAA | `/teams/LAA/2024.shtml` |
| Boston Red Sox | BOS | `/teams/BOS/2024.shtml` |
| Chicago Cubs | CHC | `/teams/CHC/2024.shtml` |
| Houston Astros | HOU | `/teams/HOU/2024.shtml` |
| Atlanta Braves | ATL | `/teams/ATL/2024.shtml` |
| San Francisco Giants | SFG | `/teams/SFG/2024.shtml` |

### Batting Stats Extracted

Games (G), Plate Appearances (PA), At Bats (AB), Runs (R), Hits (H), Doubles (2B), Triples (3B), Home Runs (HR), RBI, Stolen Bases (SB), Caught Stealing (CS), Walks (BB), Strikeouts (SO), Batting Average (AVG), On-Base Percentage (OBP), Slugging Percentage (SLG), OPS, OPS+, Total Bases (TB), Grounded Into Double Plays (GDP), Hit By Pitch (HBP), Sacrifice Hits (SH), Sacrifice Flies (SF), Intentional Walks (IBB)
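Several of these columns are derived from the counting stats, which makes them handy for sanity-checking a row. A minimal sketch using the standard MLB definitions of TB, OBP, SLG, and OPS (OPS = OBP + SLG). The function and parameter names are hypothetical and do not match the Actor's output keys. OPS+ is league- and park-adjusted, so it cannot be recomputed from a single row.

```python
def total_bases(hits: int, doubles: int, triples: int, home_runs: int) -> int:
    """TB = 1B + 2*2B + 3*3B + 4*HR, rewritten in terms of total hits."""
    return hits + doubles + 2 * triples + 3 * home_runs


def on_base_pct(hits: int, walks: int, hbp: int, at_bats: int, sac_flies: int) -> float:
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    denominator = at_bats + walks + hbp + sac_flies
    return (hits + walks + hbp) / denominator if denominator else 0.0


def slugging_pct(tb: int, at_bats: int) -> float:
    """SLG = TB / AB."""
    return tb / at_bats if at_bats else 0.0


# Made-up counting stats, not a real player's line.
tb = total_bases(hits=10, doubles=2, triples=1, home_runs=1)
obp = on_base_pct(hits=10, walks=2, hbp=1, at_bats=30, sac_flies=1)
slg = slugging_pct(tb, at_bats=30)
print(tb, round(obp, 3), round(slg, 3), round(obp + slg, 3))  # 17 0.382 0.567 0.949
```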

### Pitching Stats Extracted

Wins (W), Losses (L), Win-Loss % (W-L%), ERA, Games (G), Games Started (GS), Complete Games (CG), Shutouts (SHO), Saves (SV), Innings Pitched (IP), Hits Allowed (H), Earned Runs (ER), Home Runs Allowed (HR), Walks (BB), Strikeouts (SO), WHIP, ERA+, H/9, HR/9, BB/9, SO/9, SO/W
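Innings pitched are conventionally written with thirds of an inning after the dot (`6.2` means 6 and 2/3 innings), so IP cannot be treated as a plain decimal when recomputing ERA or WHIP. A minimal sketch, assuming the Actor returns IP in that display notation (the sample output above shows no pitching rows). The helper names are hypothetical.

```python
def innings_to_outs(ip: str) -> int:
    """Convert IP notation ('6.2' = 6 and 2/3 innings) to a count of outs."""
    whole, _, partial = ip.partition(".")
    if partial not in ("", "0", "1", "2"):
        raise ValueError(f"invalid innings-pitched value: {ip!r}")
    return int(whole) * 3 + int(partial or 0)


def era(earned_runs: int, ip: str) -> float:
    """ERA = 9 * ER / IP; with IP = outs / 3 this becomes 27 * ER / outs."""
    outs = innings_to_outs(ip)
    # No outs recorded: this sketch reports infinity rather than dividing by zero.
    return 27 * earned_runs / outs if outs else float("inf")


def whip(walks: int, hits: int, ip: str) -> float:
    """WHIP = (BB + H) / IP = 3 * (BB + H) / outs."""
    outs = innings_to_outs(ip)
    return 3 * (walks + hits) / outs if outs else float("inf")


print(innings_to_outs("6.2"))       # 20
print(round(era(3, "6.2"), 2))      # 4.05
print(round(whip(2, 5, "7.0"), 2))  # 1.0
```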

### Proxy Configuration

Baseball-Reference may rate-limit aggressive scraping, so residential proxies are recommended for large-scale runs. The example input uses Apify Proxy's `RESIDENTIAL` group; the input schema's own default only sets `useApifyProxy: true`.

### Technical Notes

- Uses `got-scraping` + `cheerio` for fast HTML parsing (no browser needed)
- Extracts data from `data-stat` attributes on table cells for reliable field mapping
- Parses JSON-LD `schema.org/Person` data for biographical information
- Handles HTML comments (Baseball-Reference hides some tables inside comments)
- Respects rate limits with built-in delays and retry logic
- Pay-per-event charging: billed per result returned
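The notes on `data-stat` attributes and HTML comments describe two techniques: reading cells by their `data-stat` name rather than by column position, and unwrapping tables that Baseball-Reference ships inside HTML comments. The Actor does this in Node.js with `cheerio`; what follows is an independent Python standard-library sketch of the same idea, not the Actor's code. The table id `batting_demo` and the `data-stat` names in the sample are made up, and header rows inside `<thead>` are skipped.

```python
import re
from html.parser import HTMLParser

# Non-greedy match of one HTML comment, including newlines.
COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)


def uncomment_tables(html: str) -> str:
    """Unwrap comments that contain a <table>; leave all other comments untouched."""
    return COMMENT_RE.sub(
        lambda m: m.group(1) if "<table" in m.group(1) else m.group(0), html
    )


class DataStatRows(HTMLParser):
    """Collect body rows of one table as {data-stat: cell text} dicts."""

    def __init__(self, table_id: str):
        super().__init__()
        self.table_id = table_id
        self.in_table = False
        self.in_thead = False
        self.row = None           # dict for the row being read, else None
        self.current_stat = None  # data-stat of the cell being read, else None
        self.rows = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "table" and attrs.get("id") == self.table_id:
            self.in_table = True
        elif not self.in_table:
            return
        elif tag == "thead":
            self.in_thead = True
        elif tag == "tr" and not self.in_thead:
            self.row = {}
        elif tag in ("th", "td") and self.row is not None and "data-stat" in attrs:
            self.current_stat = attrs["data-stat"]
            self.row[self.current_stat] = ""

    def handle_data(self, data):
        # Text inside nested tags such as <a> is appended to the current cell.
        if self.current_stat is not None:
            self.row[self.current_stat] += data

    def handle_endtag(self, tag):
        if not self.in_table:
            return
        if tag in ("th", "td"):
            self.current_stat = None
        elif tag == "thead":
            self.in_thead = False
        elif tag == "tr":
            if self.row:
                self.rows.append({k: v.strip() for k, v in self.row.items()})
            self.row = None
        elif tag == "table":
            self.in_table = False


def parse_table(html: str, table_id: str) -> list:
    """Parse one table by id, including a copy hidden inside an HTML comment."""
    parser = DataStatRows(table_id)
    parser.feed(uncomment_tables(html))
    parser.close()
    return parser.rows


sample = """<div id="all_batting"><!--
<table id="batting_demo">
<thead><tr><th data-stat="year">Year</th><th data-stat="HR">HR</th></tr></thead>
<tbody>
<tr><th data-stat="year"><a href="/x">2012</a></th><td data-stat="HR">30</td></tr>
</tbody></table>
--></div>"""
print(parse_table(sample, "batting_demo"))  # [{'year': '2012', 'HR': '30'}]
```

Keying cells by `data-stat` means a reordered or inserted column does not shift values into the wrong field.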

# Actor input Schema

## `mode` (type: `string`):

What to scrape: search for players by name, get full stats for a specific player, or get a team roster.

## `query` (type: `string`):

Player name to search for (e.g., 'Mike Trout', 'Rodriguez', 'Ohtani'). Used in playerSearch mode.

## `playerUrl` (type: `string`):

Full Baseball-Reference player URL (e.g., 'https://www.baseball-reference.com/players/t/troutmi01.shtml'). Used in playerStats mode.

## `teamUrl` (type: `string`):

Full Baseball-Reference team page URL (e.g., 'https://www.baseball-reference.com/teams/LAA/2024.shtml'). Used in teamRoster mode.

## `season` (type: `integer`):

Optional season year to filter stats (e.g., 2024). If not set, returns all seasons for playerStats or latest for teamRoster.

## `limit` (type: `integer`):

Maximum number of results to return. For playerSearch, limits search results. For playerStats, limits season rows. For teamRoster, limits roster players.

## `proxyConfiguration` (type: `object`):

Proxy settings. Recommended for large-scale runs to avoid rate limiting by Baseball-Reference.

## Actor input object example

```json
{
  "mode": "playerSearch",
  "query": "Mike Trout",
  "playerUrl": "https://www.baseball-reference.com/players/t/troutmi01.shtml",
  "teamUrl": "https://www.baseball-reference.com/teams/LAA/2024.shtml",
  "limit": 50,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": [
      "RESIDENTIAL"
    ]
  }
}
```
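The example above fills in `query`, `playerUrl`, and `teamUrl` at once, but according to the field descriptions each one is used by only one mode. A minimal sketch of building a per-mode input, assuming the Actor accepts inputs that omit the unused fields. `build_input` is a hypothetical helper; its range checks copy the `season` (1871 to 2030) and `limit` (1 to 500, default 50) bounds from the input schema.

```python
from typing import Optional

# Which input field each mode reads, per the input schema descriptions.
MODE_FIELD = {
    "playerSearch": "query",
    "playerStats": "playerUrl",
    "teamRoster": "teamUrl",
}


def build_input(mode: str, value: str, season: Optional[int] = None,
                limit: int = 50, residential_proxy: bool = True) -> dict:
    """Build a run input that contains only the field the chosen mode uses."""
    if mode not in MODE_FIELD:
        raise ValueError(f"unknown mode: {mode!r}")
    if not 1 <= limit <= 500:  # schema bounds for `limit`
        raise ValueError(f"limit out of range: {limit}")
    run_input = {"mode": mode, MODE_FIELD[mode]: value, "limit": limit}
    if season is not None:
        if not 1871 <= season <= 2030:  # schema bounds for `season`
            raise ValueError(f"season out of range: {season}")
        run_input["season"] = season
    proxy = {"useApifyProxy": True}
    if residential_proxy:
        proxy["apifyProxyGroups"] = ["RESIDENTIAL"]
    run_input["proxyConfiguration"] = proxy
    return run_input


print(build_input("teamRoster", "https://www.baseball-reference.com/teams/LAA/2024.shtml"))
```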

# Actor output Schema

## `results` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

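// Prepare Actor input
// `mode` defaults to "playerSearch"; `playerUrl` and `teamUrl` are only read
// in playerStats and teamRoster mode, respectively.
const input = {
    "mode": "playerSearch",
    "query": "Mike Trout",
    "playerUrl": "https://www.baseball-reference.com/players/t/troutmi01.shtml",
    "teamUrl": "https://www.baseball-reference.com/teams/LAA/2024.shtml",
    "proxyConfiguration": {
        "useApifyProxy": true,
        "apifyProxyGroups": [
            "RESIDENTIAL"
        ]
    }
};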

// Run the Actor and wait for it to finish
const run = await client.actor("lulzasaur/baseball-reference-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

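# Prepare the Actor input.
# `mode` defaults to "playerSearch"; `playerUrl` and `teamUrl` are only read
# in playerStats and teamRoster mode, respectively.
run_input = {
    "mode": "playerSearch",
    "query": "Mike Trout",
    "playerUrl": "https://www.baseball-reference.com/players/t/troutmi01.shtml",
    "teamUrl": "https://www.baseball-reference.com/teams/LAA/2024.shtml",
    "proxyConfiguration": {
        "useApifyProxy": True,
        "apifyProxyGroups": ["RESIDENTIAL"],
    },
}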

# Run the Actor and wait for it to finish
run = client.actor("lulzasaur/baseball-reference-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "mode": "playerSearch",
  "query": "Mike Trout",
  "playerUrl": "https://www.baseball-reference.com/players/t/troutmi01.shtml",
  "teamUrl": "https://www.baseball-reference.com/teams/LAA/2024.shtml",
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": [
      "RESIDENTIAL"
    ]
  }
}' |
apify call lulzasaur/baseball-reference-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=lulzasaur/baseball-reference-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Baseball Reference Scraper - Player Stats",
        "description": "Scrape Baseball-Reference.com for player stats, career data, and team rosters. Extract batting, pitching, and biographical info using structured data-stat attributes and microdata.",
        "version": "1.0",
        "x-build-id": "kWNR2FZHmLVcG1daz"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/lulzasaur~baseball-reference-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-lulzasaur-baseball-reference-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/lulzasaur~baseball-reference-scraper/runs": {
            "post": {
                "operationId": "runs-sync-lulzasaur-baseball-reference-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/lulzasaur~baseball-reference-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-lulzasaur-baseball-reference-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "mode": {
                        "title": "Mode",
                        "enum": [
                            "playerSearch",
                            "playerStats",
                            "teamRoster"
                        ],
                        "type": "string",
                        "description": "What to scrape: search for players by name, get full stats for a specific player, or get a team roster.",
                        "default": "playerSearch"
                    },
                    "query": {
                        "title": "Search Query (Player Search mode)",
                        "type": "string",
                        "description": "Player name to search for (e.g., 'Mike Trout', 'Rodriguez', 'Ohtani'). Used in playerSearch mode.",
                        "default": "Mike Trout"
                    },
                    "playerUrl": {
                        "title": "Player URL (Player Stats mode)",
                        "type": "string",
                        "description": "Full Baseball-Reference player URL (e.g., 'https://www.baseball-reference.com/players/t/troutmi01.shtml'). Used in playerStats mode."
                    },
                    "teamUrl": {
                        "title": "Team URL (Team Roster mode)",
                        "type": "string",
                        "description": "Full Baseball-Reference team page URL (e.g., 'https://www.baseball-reference.com/teams/LAA/2024.shtml'). Used in teamRoster mode."
                    },
                    "season": {
                        "title": "Season Year",
                        "minimum": 1871,
                        "maximum": 2030,
                        "type": "integer",
                        "description": "Optional season year to filter stats (e.g., 2024). If not set, returns all seasons for playerStats or latest for teamRoster."
                    },
                    "limit": {
                        "title": "Max Results",
                        "minimum": 1,
                        "maximum": 500,
                        "type": "integer",
                        "description": "Maximum number of results to return. For playerSearch, limits search results. For playerStats, limits season rows.",
                        "default": 50
                    },
                    "proxyConfiguration": {
                        "title": "Proxy Configuration",
                        "type": "object",
                        "description": "Proxy settings. Recommended for large-scale runs to avoid rate limiting by Baseball-Reference.",
                        "default": {
                            "useApifyProxy": true
                        }
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
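For languages without an Apify client, the spec above is enough to call the Actor over plain HTTP. A minimal Python standard-library sketch of the `run-sync-get-dataset-items` endpoint, passing the token as the `token` query parameter as the spec declares. `build_run_sync_request` and `run_sync` are hypothetical names. Because this endpoint holds the connection open until the run finishes, long runs may be better served by `POST /acts/lulzasaur~baseball-reference-scraper/runs` and reading the dataset afterwards.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "lulzasaur~baseball-reference-scraper"  # "~" replaces "/" in the URL path


def build_run_sync_request(token: str, run_input: dict) -> urllib.request.Request:
    """Build the POST request for /run-sync-get-dataset-items as declared in the spec."""
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_PATH}/run-sync-get-dataset-items?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sync(token: str, run_input: dict, timeout: float = 300.0) -> list:
    """Run the Actor, wait for it to finish, and return the dataset items."""
    request = build_run_sync_request(token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Usage (needs a real token and network access):
# items = run_sync("<YOUR_API_TOKEN>", {"mode": "playerSearch", "query": "Mike Trout"})
```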
