GitHub Repository & Issue Scraper

Under maintenance

Developer

Automly

Maintained by Community

Extract repository metadata, issues, pull requests, and contributor profiles from GitHub using the official REST API. This actor is ideal for developer lead generation, competitive open-source analysis, building talent pipelines, and monitoring repository health.

Why use this actor?

  • No scraping complexity — Uses the official GitHub REST API for reliable, structured data.
  • Developer lead generation — Find repositories by language, stars, or topic and extract contributor contact details.
  • Competitive research — Track open issues and pull requests across competitor projects.
  • Talent sourcing — Extract contributor profiles with public emails, company, and location.
  • RAG & AI pipelines — Feed repository descriptions, issues, and documentation into vector databases.

Features

  • Search repositories by GitHub query syntax (language, stars, topics, etc.)
  • Scrape specific repositories by URL or owner/repo string
  • Extract open issues with labels, comments, and author details
  • Extract open pull requests with merge and draft status
  • Extract contributor profiles with email, company, location, and bio
  • Configurable per-repository limits to control run scope
  • Optional GitHub token for 5000 requests/hour (vs 60/hour unauthenticated)

Input

| Field | Type | Default | Description |
|---|---|---|---|
| searchQuery | string | | GitHub search query, e.g. language:python stars:>1000 |
| repoUrls | array | | List of repository URLs or owner/repo strings |
| extractIssues | boolean | false | Extract open issues per repository |
| extractPullRequests | boolean | false | Extract open pull requests per repository |
| extractUsers | boolean | false | Extract contributor profiles per repository |
| maxResults | integer | 100 | Maximum total records to return (1–1000) |
| githubToken | string | | GitHub personal access token for higher rate limits |
| maxIssuesPerRepo | integer | 30 | Max issues per repository |
| maxPullRequestsPerRepo | integer | 30 | Max pull requests per repository |
| maxUsersPerRepo | integer | 30 | Max contributors per repository |
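
The repoUrls field accepts either full URLs or owner/repo strings. As an illustration of how the two forms relate, here is a minimal Python sketch that reduces both to owner/repo. The helper name normalize_repo_ref is hypothetical and this is not the actor's own code; it only shows one plausible way to pre-clean a list before passing it in.

```python
from urllib.parse import urlparse


def normalize_repo_ref(ref: str) -> str:
    """Reduce a GitHub URL or an owner/repo string to 'owner/repo'.

    Hypothetical helper for cleaning a repoUrls list before a run;
    the actor's real parsing rules are not documented here.
    """
    ref = ref.strip()
    if "://" in ref:
        parsed = urlparse(ref)
        if parsed.netloc.lower() not in ("github.com", "www.github.com"):
            raise ValueError(f"not a GitHub URL: {ref!r}")
        path = parsed.path
    else:
        path = ref

    # Keep only non-empty path segments; the first two are owner and repo.
    parts = [p for p in path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"expected owner/repo, got: {ref!r}")

    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return f"{owner}/{repo}"


# Deduplicate a mixed list while preserving order.
refs = [
    "https://github.com/octocat/Hello-World",
    "octocat/Hello-World",
    "https://github.com/octocat/Hello-World/issues/1",
]
unique_refs = list(dict.fromkeys(normalize_repo_ref(r) for r in refs))
```

Normalizing first avoids listing the same repository twice under different spellings, which would otherwise count against maxResults.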

Example input

{
  "searchQuery": "language:typescript stars:>5000",
  "extractIssues": true,
  "extractUsers": true,
  "maxResults": 50,
  "maxIssuesPerRepo": 10,
  "maxUsersPerRepo": 10
}
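
If the input is assembled programmatically, a small builder can catch mistakes before a run is started. The sketch below is an assumption-laden illustration: build_input is a hypothetical helper, the 1 to 1000 range for maxResults comes from the input table, and the rule that at least one of searchQuery or repoUrls must be present is an assumption rather than documented behavior.

```python
import json

# Optional fields taken from the input table above.
ALLOWED_OPTIONS = {
    "extractIssues",
    "extractPullRequests",
    "extractUsers",
    "githubToken",
    "maxIssuesPerRepo",
    "maxPullRequestsPerRepo",
    "maxUsersPerRepo",
}


def build_input(search_query=None, repo_urls=None, max_results=100, **options):
    """Build an input object for the actor (hypothetical helper)."""
    # Assumption: a run needs something to look at, so require one source.
    if not search_query and not repo_urls:
        raise ValueError("provide searchQuery, repoUrls, or both")
    # Range documented in the input table.
    if not 1 <= max_results <= 1000:
        raise ValueError("maxResults must be between 1 and 1000")

    unknown = set(options) - ALLOWED_OPTIONS
    if unknown:
        raise ValueError(f"unknown input fields: {sorted(unknown)}")

    run_input = {"maxResults": max_results}
    if search_query:
        run_input["searchQuery"] = search_query
    if repo_urls:
        run_input["repoUrls"] = list(repo_urls)
    run_input.update(options)
    return run_input


# Reproduces the example input shown above.
example = build_input(
    search_query="language:typescript stars:>5000",
    max_results=50,
    extractIssues=True,
    extractUsers=True,
    maxIssuesPerRepo=10,
    maxUsersPerRepo=10,
)
print(json.dumps(example, indent=2))
```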

Output

Each record includes a type field to distinguish entities.

Repository

| Field | Type | Description |
|---|---|---|
| type | string | repository |
| url | string | GitHub repository URL |
| owner | string | Repository owner |
| name | string | Repository name |
| fullName | string | owner/name |
| description | string | Repository description |
| stars | integer | Stargazer count |
| forks | integer | Fork count |
| openIssues | integer | Open issue count |
| language | string | Primary language |
| license | string | SPDX license identifier |
| createdAt | string | ISO 8601 creation timestamp |
| updatedAt | string | ISO 8601 update timestamp |
| topics | array | Repository topics |

Issue

| Field | Type | Description |
|---|---|---|
| type | string | issue |
| repository | string | Parent repository |
| url | string | Issue URL |
| number | integer | Issue number |
| title | string | Issue title |
| state | string | Issue state |
| author | string | Author username |
| labels | array | Label names |
| createdAt | string | ISO 8601 timestamp |
| updatedAt | string | ISO 8601 timestamp |
| comments | integer | Comment count |

Pull Request

| Field | Type | Description |
|---|---|---|
| type | string | pullRequest |
| repository | string | Parent repository |
| url | string | PR URL |
| number | integer | PR number |
| title | string | PR title |
| state | string | PR state |
| author | string | Author username |
| createdAt | string | ISO 8601 timestamp |
| updatedAt | string | ISO 8601 timestamp |
| merged | boolean | Merged status |
| draft | boolean | Draft status |

User

| Field | Type | Description |
|---|---|---|
| type | string | user |
| repository | string | Source repository |
| url | string | Profile URL |
| username | string | GitHub username |
| name | string | Display name |
| company | string | Company |
| blog | string | Blog URL |
| location | string | Location |
| email | string | Public email |
| bio | string | Bio |
| publicRepos | integer | Public repository count |
| followers | integer | Follower count |
| following | integer | Following count |
| createdAt | string | ISO 8601 timestamp |
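
Because all four entity kinds arrive in one dataset, a consumer usually splits records on the type field first. The following Python sketch shows one way to do that. The sample records are made-up placeholders that use only field names from the tables above, and group_by_type is a hypothetical helper.

```python
from collections import defaultdict


def group_by_type(records):
    """Split a mixed list of output records into lists keyed by 'type'."""
    groups = defaultdict(list)
    for record in records:
        # Records without a type field are kept under 'unknown'.
        groups[record.get("type", "unknown")].append(record)
    return dict(groups)


# Made-up sample data for illustration only.
records = [
    {"type": "repository", "fullName": "octocat/Hello-World", "stars": 1},
    {"type": "issue", "number": 1, "title": "Example issue", "labels": []},
    {"type": "user", "username": "octocat", "email": None},
    {"type": "user", "username": "example-user", "email": "user@example.com"},
]

groups = group_by_type(records)

# Users only carry an email when they have made it public.
contactable = [u for u in groups.get("user", []) if u.get("email")]
```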

Limits and caveats

  • Unauthenticated requests are limited to 60 per hour. Provide a githubToken for 5000 per hour.
  • The GitHub Search API returns at most 1000 results per query, so narrower queries are needed to reach repositories beyond that limit.
  • Only public repositories are accessible without additional permissions.
  • User emails are only returned if the user has chosen to make them public.
  • The actor respects maxResults as a hard cap across all entity types.
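
To see how much of the hourly quota a token has left before starting a run, the rate-limit headers that GitHub attaches to REST API responses can be read directly. The sketch below parses x-ratelimit-limit, x-ratelimit-remaining, and x-ratelimit-reset (a UTC epoch timestamp in seconds) from a headers mapping. The function name rate_limit_status is hypothetical, the header values in the example are invented, and nothing here reflects how the actor itself tracks its quota.

```python
import time


def rate_limit_status(headers, now=None):
    """Summarize GitHub rate-limit headers from a REST API response.

    `headers` is any mapping of header names to string values;
    `now` is the current time as epoch seconds (defaults to time.time()).
    """
    if now is None:
        now = time.time()
    # HTTP header names are case-insensitive, so normalize them.
    lowered = {key.lower(): value for key, value in headers.items()}

    limit = int(lowered["x-ratelimit-limit"])
    remaining = int(lowered["x-ratelimit-remaining"])
    reset_at = int(lowered["x-ratelimit-reset"])

    return {
        "limit": limit,
        "remaining": remaining,
        "seconds_until_reset": max(0, reset_at - int(now)),
    }


# Invented header values for an exhausted unauthenticated quota.
status = rate_limit_status(
    {
        "X-RateLimit-Limit": "60",
        "X-RateLimit-Remaining": "0",
        "X-RateLimit-Reset": "1700003600",
    },
    now=1700000000,
)
```

If remaining is close to zero, supplying a githubToken or waiting for the reset time avoids a run that stops early.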

Pricing

This actor uses Pay Per Event pricing. You are charged only for successfully extracted data.

| Event | Price | Description |
|---|---|---|
| Repository scraped | $0.005 | Each repository successfully extracted |
| Issue scraped | $0.002 | Each issue successfully extracted |
| Pull request scraped | $0.002 | Each pull request successfully extracted |
| User scraped | $0.005 | Each contributor profile successfully extracted |

Tiered discounts apply based on your Apify subscription level. A small actor-start fee may also apply.
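
A run's event charges can be estimated from the table above by multiplying each record count by its price. The Python sketch below does that arithmetic with Decimal to avoid floating-point rounding. estimate_cost is a hypothetical helper, and the result deliberately excludes tiered discounts and any actor-start fee, whose amounts are not stated here.

```python
from decimal import Decimal

# Per-event prices in USD, keyed by the record's `type` value.
PRICES = {
    "repository": Decimal("0.005"),
    "issue": Decimal("0.002"),
    "pullRequest": Decimal("0.002"),
    "user": Decimal("0.005"),
}


def estimate_cost(counts):
    """Estimate event charges in USD for a mapping of type -> record count.

    Excludes subscription discounts and any actor-start fee.
    """
    unknown = set(counts) - set(PRICES)
    if unknown:
        raise ValueError(f"unknown record types: {sorted(unknown)}")
    return sum(
        (PRICES[kind] * count for kind, count in counts.items()),
        Decimal("0"),
    )


# 10 repositories, 100 issues, 100 contributor profiles:
# 10 * 0.005 + 100 * 0.002 + 100 * 0.005 = 0.75 USD
total = estimate_cost({"repository": 10, "issue": 100, "user": 100})
```

Since maxResults caps the total record count, the most a single run can cost in event charges is maxResults times the highest unit price, i.e. $5.00 at the maximum of 1000 records.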

FAQ

Do I need a GitHub account?
No, but providing a GitHub personal access token dramatically increases your rate limit from 60 to 5000 requests per hour.

Can I scrape private repositories?
No. This actor only accesses public data available through the GitHub REST API.

What happens if I hit the rate limit?
The actor will log a warning and stop gracefully. Provide a token to avoid this.

Is the data real-time?
Data reflects the current state of GitHub at the time of the run.