GitHub Repository & Issue Scraper
Status: Under maintenance
Pricing: Pay per usage
Developer: Automly (maintained by Community)
Extract repository metadata, issues, pull requests, and contributor profiles from GitHub using the official REST API. This actor is ideal for developer lead generation, competitive open-source analysis, building talent pipelines, and monitoring repository health.
Why use this actor?
- No scraping complexity: uses the official GitHub REST API for reliable, structured data.
- Developer lead generation: find repositories by language, stars, or topic and extract public contributor profile details.
- Competitive research: track open issues and pull requests across competitor projects.
- Talent sourcing: extract contributor profiles with public email, company, and location.
- RAG & AI pipelines: feed repository descriptions, topics, and issue titles into vector databases.
Features
- Search repositories by GitHub query syntax (language, stars, topics, etc.)
- Scrape specific repositories by URL or `owner/repo` string
- Extract open issues with labels, comment counts, and author usernames
- Extract open pull requests with merge and draft status
- Extract contributor profiles with public email, company, location, and bio
- Configurable per-repository limits to control run scope
- Optional GitHub token for 5000 requests/hour (vs 60/hour unauthenticated)
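Repositories can be given either as full URLs or as `owner/repo` strings. A minimal sketch of how both forms could be normalized to an `(owner, repo)` pair, assuming a regex-based parser; the function name `parse_repo_ref` is hypothetical and not part of the actor:

```python
import re
from typing import Tuple

# Accepts "owner/repo" or "https://github.com/owner/repo", optionally
# followed by ".git", a trailing slash, or a sub-path such as "/issues".
_REPO_RE = re.compile(
    r"^(?:https?://(?:www\.)?github\.com/)?"  # optional URL prefix
    r"([A-Za-z0-9-]+)/([A-Za-z0-9._-]+?)"     # owner and repository name
    r"(?:\.git)?(?:/.*)?$"                    # optional .git suffix or sub-path
)


def parse_repo_ref(ref: str) -> Tuple[str, str]:
    """Return (owner, repo) for a GitHub repository URL or 'owner/repo' string."""
    match = _REPO_RE.match(ref.strip())
    if not match:
        raise ValueError(f"Not a GitHub repository reference: {ref!r}")
    return match.group(1), match.group(2)
```

The lazy repository-name group plus the `$` anchor lets names containing dots (such as `socket.io`) survive while a trailing `.git` is still stripped.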
Input
| Field | Type | Default | Description |
|---|---|---|---|
| searchQuery | string | — | GitHub search query, e.g. language:python stars:>1000 |
| repoUrls | array | — | List of repository URLs or owner/repo strings |
| extractIssues | boolean | false | Extract open issues per repository |
| extractPullRequests | boolean | false | Extract open pull requests per repository |
| extractUsers | boolean | false | Extract contributor profiles per repository |
| maxResults | integer | 100 | Maximum total records to return (1–1000) |
| githubToken | string | — | GitHub personal access token for higher rate limits |
| maxIssuesPerRepo | integer | 30 | Max issues per repository |
| maxPullRequestsPerRepo | integer | 30 | Max pull requests per repository |
| maxUsersPerRepo | integer | 30 | Max contributors per repository |
Example input
{"searchQuery": "language:typescript stars:>5000","extractIssues": true,"extractUsers": true,"maxResults": 50,"maxIssuesPerRepo": 10,"maxUsersPerRepo": 10}
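Fields omitted from an input fall back to the defaults in the Input table. A sketch of that merge, assuming that at least one of `searchQuery` or `repoUrls` is required and that `maxResults` is clamped to its documented 1–1000 range (both are assumptions; the helper name `normalize_input` is hypothetical):

```python
from typing import Any, Dict

# Defaults taken from the Input table; the merge logic itself is illustrative.
DEFAULTS: Dict[str, Any] = {
    "extractIssues": False,
    "extractPullRequests": False,
    "extractUsers": False,
    "maxResults": 100,
    "maxIssuesPerRepo": 30,
    "maxPullRequestsPerRepo": 30,
    "maxUsersPerRepo": 30,
}


def normalize_input(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Overlay user input on the defaults and clamp maxResults to 1-1000."""
    cfg = {**DEFAULTS, **raw}
    # Assumed requirement: the run needs something to search or scrape.
    if not cfg.get("searchQuery") and not cfg.get("repoUrls"):
        raise ValueError("Provide searchQuery or repoUrls")
    cfg["maxResults"] = max(1, min(int(cfg["maxResults"]), 1000))
    return cfg
```

Applied to the example input, `extractPullRequests` stays `False` and `maxPullRequestsPerRepo` falls back to 30.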
Output
Each record includes a `type` field (`repository`, `issue`, `pullRequest`, or `user`) that identifies the entity it describes.
Repository
| Field | Type | Description |
|---|---|---|
| type | string | repository |
| url | string | GitHub repository URL |
| owner | string | Repository owner |
| name | string | Repository name |
| fullName | string | owner/name |
| description | string | Repository description |
| stars | integer | Stargazer count |
| forks | integer | Fork count |
| openIssues | integer | Open issue count |
| language | string | Primary language |
| license | string | SPDX license identifier |
| createdAt | string | ISO 8601 creation timestamp |
| updatedAt | string | ISO 8601 update timestamp |
| topics | array | Repository topics |
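The Repository fields line up closely with GitHub's REST repository object. A sketch of one plausible mapping, assuming the actor reads fields such as `html_url`, `stargazers_count`, and `license.spdx_id` from that object (the function name is hypothetical). Note that GitHub's `open_issues_count` includes open pull requests, so `openIssues` would too if mapped directly:

```python
from typing import Any, Dict


def to_repository_record(repo: Dict[str, Any]) -> Dict[str, Any]:
    """Map a GitHub REST repository object to the Repository schema."""
    # "license" is null when GitHub detects no license.
    license_info = repo.get("license") or {}
    return {
        "type": "repository",
        "url": repo["html_url"],
        "owner": repo["owner"]["login"],
        "name": repo["name"],
        "fullName": repo["full_name"],
        "description": repo.get("description"),
        "stars": repo["stargazers_count"],
        "forks": repo["forks_count"],
        "openIssues": repo["open_issues_count"],  # GitHub counts open PRs here too
        "language": repo.get("language"),
        "license": license_info.get("spdx_id"),
        "createdAt": repo["created_at"],
        "updatedAt": repo["updated_at"],
        "topics": repo.get("topics", []),
    }
```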
Issue
| Field | Type | Description |
|---|---|---|
| type | string | issue |
| repository | string | Parent repository |
| url | string | Issue URL |
| number | integer | Issue number |
| title | string | Issue title |
| state | string | Issue state |
| author | string | Author username |
| labels | array | Label names |
| createdAt | string | ISO 8601 timestamp |
| updatedAt | string | ISO 8601 timestamp |
| comments | integer | Comment count |
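GitHub's issues endpoint (`GET /repos/{owner}/{repo}/issues`) also returns pull requests, which carry a `pull_request` key. A sketch of how issue records could be built while skipping those items; the function name and the use of `owner/name` for `repository` are assumptions:

```python
from typing import Any, Dict, List


def to_issue_records(full_name: str, items: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Convert issues-endpoint items to Issue records, skipping pull requests."""
    records = []
    for item in items:
        if "pull_request" in item:  # the issues endpoint also lists PRs
            continue
        records.append({
            "type": "issue",
            "repository": full_name,
            "url": item["html_url"],
            "number": item["number"],
            "title": item["title"],
            "state": item["state"],
            "author": (item.get("user") or {}).get("login"),
            "labels": [label["name"] for label in item.get("labels", [])],
            "createdAt": item["created_at"],
            "updatedAt": item["updated_at"],
            "comments": item["comments"],  # a count, not the comment bodies
        })
    return records
```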
Pull Request
| Field | Type | Description |
|---|---|---|
| type | string | pullRequest |
| repository | string | Parent repository |
| url | string | PR URL |
| number | integer | PR number |
| title | string | PR title |
| state | string | PR state |
| author | string | Author username |
| createdAt | string | ISO 8601 timestamp |
| updatedAt | string | ISO 8601 timestamp |
| merged | boolean | Merged status |
| draft | boolean | Draft status |
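Items from GitHub's pull request list endpoint (`GET /repos/{owner}/{repo}/pulls`) include `merged_at` and `draft`, but not a `merged` boolean. A sketch of how the Pull Request fields could be derived, assuming `merged` comes from `merged_at` (the function name is hypothetical):

```python
from typing import Any, Dict


def to_pull_request_record(full_name: str, pr: Dict[str, Any]) -> Dict[str, Any]:
    """Map a pull request list item to the Pull Request schema."""
    return {
        "type": "pullRequest",
        "repository": full_name,
        "url": pr["html_url"],
        "number": pr["number"],
        "title": pr["title"],
        "state": pr["state"],
        "author": (pr.get("user") or {}).get("login"),
        "createdAt": pr["created_at"],
        "updatedAt": pr["updated_at"],
        # merged_at stays null until the PR is merged.
        "merged": pr.get("merged_at") is not None,
        "draft": bool(pr.get("draft", False)),
    }
```

Because the actor extracts open pull requests, and a merged PR is closed, `merged` would normally be `false` in this sketch; `draft` is the more informative flag.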
User
| Field | Type | Description |
|---|---|---|
| type | string | user |
| repository | string | Source repository |
| url | string | Profile URL |
| username | string | GitHub username |
| name | string | Display name |
| company | string | Company |
| blog | string | Blog URL |
| location | string | Location |
| email | string | Public email |
| bio | string | Bio |
| publicRepos | integer | Public repository count |
| followers | integer | Follower count |
| following | integer | Following count |
| createdAt | string | ISO 8601 timestamp |
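Contributor profiles are plausibly gathered by listing `GET /repos/{owner}/{repo}/contributors` and then fetching `GET /users/{username}` for each login; the actor does not document its exact calls. A sketch of mapping one profile response to the User schema, assuming the public-email field is named `email` (the function name is hypothetical):

```python
from typing import Any, Dict


def to_user_record(full_name: str, user: Dict[str, Any]) -> Dict[str, Any]:
    """Map a GitHub user profile object to the User schema."""
    return {
        "type": "user",
        "repository": full_name,
        "url": user["html_url"],
        "username": user["login"],
        "name": user.get("name"),
        "company": user.get("company"),
        "blog": user.get("blog"),
        "location": user.get("location"),
        # GitHub returns null unless the user has made an email public.
        "email": user.get("email"),  # field name assumed; see the User table
        "bio": user.get("bio"),
        "publicRepos": user["public_repos"],
        "followers": user["followers"],
        "following": user["following"],
        "createdAt": user["created_at"],
    }
```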
Limits and caveats
- Unauthenticated requests are limited to 60 per hour. Provide a `githubToken` for 5000 per hour.
- The GitHub Search API returns at most 1000 results per query.
- Only public repositories are accessible without additional permissions.
- User emails are only returned if the user has chosen to make them public.
- The actor treats `maxResults` as a hard cap across all entity types.
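A single `maxResults` cap shared by repositories, issues, pull requests, and users, combined with the per-repository limits, could be enforced with a small budget object. This is a sketch under that assumption; `RecordBudget` is a hypothetical name and the actor's real ordering of records is not documented:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List, Optional


@dataclass
class RecordBudget:
    """Collects records of any type until a global maxResults cap is reached."""
    max_results: int
    records: List[Dict[str, Any]] = field(default_factory=list)

    @property
    def exhausted(self) -> bool:
        return len(self.records) >= self.max_results

    def push(self, items: Iterable[Dict[str, Any]],
             per_repo_limit: Optional[int] = None) -> int:
        """Add items, stopping at the per-repo limit or the global cap.

        Returns how many items from this batch were accepted.
        """
        accepted = 0
        for item in items:
            if self.exhausted or (per_repo_limit is not None and accepted >= per_repo_limit):
                break
            self.records.append(item)
            accepted += 1
        return accepted
```

Because the cap is shared in this sketch, enabling issues and users leaves less room for repositories: with `maxResults` of 5, three repository records leave space for only two issues.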
Pricing
This actor uses Pay Per Event pricing. You are charged only for successfully extracted data.
| Event | Price | Description |
|---|---|---|
| Repository scraped | $0.005 | Each repository successfully extracted |
| Issue scraped | $0.002 | Each issue successfully extracted |
| Pull request scraped | $0.002 | Each pull request successfully extracted |
| User scraped | $0.005 | Each contributor profile successfully extracted |
Tiered discounts apply based on your Apify subscription level. A small actor-start fee may also apply.
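The per-event prices make it straightforward to estimate a run's cost from the number of records of each `type`. A sketch using `Decimal` to avoid floating-point rounding; it ignores tier discounts and any actor-start fee, and `estimate_cost` is a hypothetical helper:

```python
from decimal import Decimal
from typing import Dict

# Per-event prices (USD) from the Pricing table, keyed by record type.
PRICES: Dict[str, Decimal] = {
    "repository": Decimal("0.005"),
    "issue": Decimal("0.002"),
    "pullRequest": Decimal("0.002"),
    "user": Decimal("0.005"),
}


def estimate_cost(counts: Dict[str, int]) -> Decimal:
    """Sum count * price per record type; unknown types raise KeyError."""
    return sum((PRICES[kind] * n for kind, n in counts.items()), Decimal("0"))
```

For example, 10 repositories, 100 issues, and 100 user profiles come to 10 × $0.005 + 100 × $0.002 + 100 × $0.005 = $0.75 before discounts and fees.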
FAQ
Do I need a GitHub account? No, but providing a GitHub personal access token dramatically increases your rate limit from 60 to 5000 requests per hour.
Can I scrape private repositories? No. This actor only accesses public data available through the GitHub REST API.
What happens if I hit the rate limit? The actor will log a warning and stop gracefully. Provide a token to avoid this.
Is the data real-time? Data reflects the current state of GitHub at the time of the run.
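GitHub reports the remaining quota in the `X-RateLimit-Remaining` response header and the reset time, in Unix seconds, in `X-RateLimit-Reset`. A sketch of the "log a warning and stop" behavior built on those headers; `should_stop` and its `reserve` parameter are hypothetical:

```python
import logging
from datetime import datetime, timezone
from typing import Mapping

log = logging.getLogger("github-scraper")


def should_stop(headers: Mapping[str, str], reserve: int = 0) -> bool:
    """Return True, and log a warning, once the GitHub rate limit is spent.

    `headers` should be case-insensitive (e.g. a requests Response.headers);
    a plain dict must use the exact casing below. A missing header is
    treated as "quota available".
    """
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > reserve:
        return False
    reset = int(headers.get("X-RateLimit-Reset", "0"))
    reset_at = datetime.fromtimestamp(reset, tz=timezone.utc).isoformat()
    log.warning("GitHub rate limit exhausted; resets at %s. Stopping.", reset_at)
    return True
```

Setting `reserve` above zero would stop slightly early, leaving a few requests for finishing work such as flushing results.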