GitHub Issues Scraper

Scrape GitHub issues from repos, orgs, or search queries. Extract titles, labels, assignees, comments, reactions. Export to JSON, CSV, Excel.

Pricing: Pay per usage
Developer: Glass Ventures (Maintained by Community)
Actor stats: 0 bookmarks · 2 total users · 1 monthly active user · last modified 2 days ago

Scrape GitHub issues from any repository, organization, or search query using the GitHub REST API. Extract titles, bodies, labels, assignees, comments, reactions, milestones, and more.

What does GitHub Issues Scraper do?

GitHub Issues Scraper lets you extract structured issue data from GitHub repositories at scale. Instead of manually browsing through hundreds of issues or writing custom API scripts, this actor handles pagination, rate limiting, and data normalization automatically.

It works with individual repositories, entire organizations (scrapes issues from all public repos), and GitHub's powerful issue search. The actor hits the official GitHub REST API directly, so the data is always accurate and complete.
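For reference, the endpoint behind repository scraping is GitHub's `GET /repos/{owner}/{repo}/issues`. A minimal sketch of how a paginated request URL for it can be built (the helper name is illustrative, not part of the actor):

```python
def issues_url(repo: str, state: str = "all", page: int = 1, per_page: int = 100) -> str:
    """Build a GitHub REST API URL that lists a repository's issues."""
    # per_page maxes out at 100; pull requests are included in the results
    # and must be filtered out downstream if you only want issues.
    return (
        f"https://api.github.com/repos/{repo}/issues"
        f"?state={state}&per_page={per_page}&page={page}"
    )

# Fetch page after page until a page comes back with fewer than per_page items.
second_page = issues_url("apify/crawlee", state="open", page=2)
```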

Whether you need to track open bugs across a competitor's repos, analyze issue trends for market research, or export issues for project management dashboards, this actor delivers clean, structured data ready for analysis.

Use Cases

  • Open source maintainers -- Export all issues from your repositories for analysis and triage in spreadsheets or dashboards
  • Market researchers -- Track competitor product issues and feature requests to identify market gaps
  • Data analysts -- Analyze issue trends, response times, and community engagement across repositories
  • Developers -- Monitor libraries and frameworks for bugs and breaking changes
  • Project managers -- Bulk export issues with labels, milestones, and assignees for reporting

Features

  • Scrape issues from any public GitHub repository
  • Scrape all repos in an organization automatically
  • Search issues across all of GitHub with search queries
  • Filter by issue state (open, closed, all)
  • Optionally fetch all comments for each issue
  • Extract reactions, labels, assignees, milestones
  • Support for GitHub personal access tokens (5,000 requests/hour vs 60)
  • Automatic pagination and rate limit handling
  • Exports to JSON, CSV, Excel, or connect via API

How much will it cost?

GitHub Issues Scraper uses the GitHub REST API, which is very efficient. Most runs complete quickly with minimal compute.

Results    Estimated Cost
100        ~$0.01
1,000      ~$0.05
10,000     ~$0.25

Cost Component      Per 1,000 Results
Platform compute    ~$0.03
Proxy (optional)    ~$0.00
Total               ~$0.03

Note: Fetching comments increases API calls and run time. Without a GitHub token, the rate limit is 60 requests/hour, which limits throughput.
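To see why comments raise cost: listing issues takes roughly one API request per 100 issues, while fetching comments adds at least one extra request per issue. A back-of-the-envelope sketch (assuming 100 items per listing page):

```python
import math

def estimated_requests(num_issues: int, include_comments: bool = False) -> int:
    """Rough API request count for a run of the given size."""
    calls = math.ceil(num_issues / 100)  # one listing call per page of 100 issues
    if include_comments:
        calls += num_issues  # at least one extra call per issue with comments
    return calls
```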

How to use

  1. Go to the GitHub Issues Scraper page on Apify Store
  2. Click "Start" or "Try for free"
  3. Enter GitHub repository URLs (e.g., https://github.com/apify/crawlee) or search terms
  4. Optionally set the issue state filter and whether to include comments
  5. Set the maximum number of issues to scrape
  6. Click "Start" and wait for the results

Input parameters

Parameter        Type     Description                                    Default
startUrls        array    GitHub repo or org URLs                        -
searchTerms      array    Search queries for GitHub issues               -
issueState       string   Filter: all, open, or closed                   all
includeComments  boolean  Fetch comments for each issue                  false
githubToken      string   Personal access token for higher rate limits   -
maxItems         number   Max issues to return                           100
maxConcurrency   number   Parallel API requests                          5
proxyConfig      object   Proxy settings                                 Apify Proxy
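Put together, a typical input object for the parameters above might look like this (all values are illustrative):

```json
{
  "startUrls": [{ "url": "https://github.com/apify/crawlee" }],
  "issueState": "open",
  "includeComments": false,
  "githubToken": "ghp_yourTokenHere",
  "maxItems": 500,
  "maxConcurrency": 5
}
```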

Output

The actor produces a dataset with the following fields:

{
  "url": "https://github.com/apify/crawlee/issues/1234",
  "issueNumber": 1234,
  "title": "Bug: CheerioCrawler timeout on large pages",
  "body": "## Description\nWhen crawling pages larger than 5MB...",
  "state": "open",
  "author": "username",
  "authorUrl": "https://github.com/username",
  "labels": ["bug", "priority-high"],
  "assignees": ["maintainer1"],
  "commentsCount": 5,
  "reactionsCount": 3,
  "reactions": { "+1": 2, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 1, "rocket": 0, "eyes": 0 },
  "milestone": "v3.5.0",
  "isPullRequest": false,
  "repository": "apify/crawlee",
  "createdAt": "2024-01-15T10:30:00Z",
  "updatedAt": "2024-01-20T14:00:00Z",
  "closedAt": null,
  "comments": null,
  "scrapedAt": "2024-01-25T12:00:00.000Z"
}
Field           Type     Description
url             string   Issue URL on GitHub
issueNumber     number   Issue number in the repository
title           string   Issue title
body            string   Issue body content (Markdown)
state           string   open or closed
author          string   Username of issue creator
authorUrl       string   Profile URL of issue creator
labels          array    List of label names
assignees       array    List of assigned usernames
commentsCount   number   Number of comments
reactionsCount  number   Total reaction count
reactions       object   Reaction breakdown by type
milestone       string   Milestone name
isPullRequest   boolean  Whether entry is a pull request
repository      string   Repository full name (owner/repo)
createdAt       string   ISO 8601 creation date
updatedAt       string   ISO 8601 last update date
closedAt        string   ISO 8601 close date
comments        array    Full comment data (when includeComments is true)
scrapedAt       string   ISO 8601 scrape timestamp
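Because the dataset mixes issues and pull requests (flagged via isPullRequest), a small post-processing step is often useful. A sketch with a hypothetical helper, not part of the actor:

```python
def split_items(items: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split dataset items into true issues and pull requests via isPullRequest."""
    issues = [it for it in items if not it.get("isPullRequest")]
    pulls = [it for it in items if it.get("isPullRequest")]
    return issues, pulls
```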

Integrations

Connect GitHub Issues Scraper with other tools:

  • Apify API -- REST API for programmatic access
  • Webhooks -- get notified when a run finishes
  • Zapier / Make -- connect to 5,000+ apps
  • Google Sheets -- export directly to spreadsheets

API Example (Node.js)

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_TOKEN' });

const run = await client.actor('YOUR_USERNAME/github-issues-scraper').call({
    startUrls: [{ url: 'https://github.com/apify/crawlee' }],
    maxItems: 100,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();

API Example (Python)

from apify_client import ApifyClient

client = ApifyClient('YOUR_TOKEN')

run = client.actor('YOUR_USERNAME/github-issues-scraper').call(run_input={
    'startUrls': [{'url': 'https://github.com/apify/crawlee'}],
    'maxItems': 100,
})

items = client.dataset(run['defaultDatasetId']).list_items().items

API Example (cURL)

curl "https://api.apify.com/v2/acts/YOUR_USERNAME~github-issues-scraper/runs" \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{"startUrls": [{"url": "https://github.com/apify/crawlee"}], "maxItems": 100}'

Tips and tricks

  • Start with a small maxItems (10-20) to test before running large scrapes
  • Add a GitHub personal access token to increase rate limits from 60 to 5,000 requests/hour
  • Use issueState: "open" to only get active issues and reduce data volume
  • Enable includeComments only when you need comment data, as it significantly increases API calls
  • For organization URLs, all public repos will be scraped -- combine with maxItems to control volume
  • GitHub search is limited to 1,000 results per query -- use specific search terms for best results
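One common way to work around the 1,000-result search cap mentioned above is to split a broad query into date windows with GitHub's created: search qualifier. An illustrative sketch:

```python
def yearly_queries(base_query: str, years: range) -> list[str]:
    """Split one broad GitHub issue search into per-year queries,
    so each stays under the 1,000-result search cap."""
    return [f"{base_query} created:{y}-01-01..{y}-12-31" for y in years]

queries = yearly_queries("repo:apify/crawlee label:bug", range(2023, 2025))
```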

FAQ

Q: Does this actor require login credentials? A: No. GitHub's REST API is publicly accessible. However, adding a personal access token increases the rate limit from 60 to 5,000 requests per hour.

Q: How fast is the scraping? A: Without a token: ~50-60 issues per hour (rate limited). With a token: ~5,000-10,000 issues per hour depending on whether comments are included.

Q: What should I do if I get rate limited? A: Add a GitHub personal access token in the Authentication section. You can create one at github.com/settings/tokens with no special permissions needed for public repos.

Q: Does it scrape pull requests too? A: GitHub's issues API includes pull requests. Each item has an isPullRequest field so you can filter them out if needed.

Q: Can I scrape private repositories? A: Yes, if you provide a GitHub personal access token that has access to the private repository.

This actor uses the official GitHub REST API, which is a public API designed for programmatic access. It respects rate limits and follows GitHub's API usage guidelines. GitHub's API Terms of Service permit accessing public data. Always review GitHub's Terms of Service for your specific use case. For more information, see Apify's blog on web scraping legality.

Limitations

  • Without a GitHub token, rate limit is 60 API requests per hour
  • GitHub search API returns a maximum of 1,000 results per query
  • Only public repositories are accessible without authentication
  • Pull requests are included in the issues API (filterable via isPullRequest field)

Changelog

  • v0.1 (2026-04-23) -- Initial release