Media Bias/Fact Check Source Credibility Scraper
Pricing
Pay per event
Pull structured source-credibility records from Media Bias/Fact Check (MBFC), the largest media-source reliability database (roughly 7,000 profiles). Returns bias rating, factual-reporting tier, MBFC credibility rating, country press-freedom rating, media type, traffic tier, and the full History, Funding, and Analysis prose.
Developer
BowTiedRaccoon
Maintained by Community
Pull structured source-credibility records from Media Bias/Fact Check (MBFC), the largest media-source reliability database on the internet (roughly 7,000 profiles).
For each outlet, returns:
- Bias rating (normalized slug + verbatim MBFC label)
- Factual reporting tier
- MBFC credibility rating
- Country and country press-freedom rating
- Media type and traffic tier
- Full History, Funded by / Ownership, and Analysis / Bias prose sections
Use Cases
- Fact-checking pipelines — source-level trust layer alongside claim-level fact-checkers (PolitiFact, Snopes)
- Disinformation research — identify and filter conspiracy-pseudoscience / questionable sources
- Brand safety — screen media sources before advertising placement
- RAG source-filtering — weight or exclude sources by credibility rating before ingestion
- OSINT / media-literacy — annotate article URLs with source bias and credibility metadata
Input
| Field | Type | Default | Description |
|---|---|---|---|
| mode | string (required) | all | all = full corpus; category = selected bias categories; seed = explicit profile URLs |
| categories | string[] | (none) | Bias categories when mode=category. Options: center, left-center, left, right-center, right, pro-science, conspiracy-pseudoscience, questionable, satire |
| seedUrls | string[] | (none) | Explicit MBFC profile URLs when mode=seed |
| maxItems | integer | 200 | Maximum source profiles to return (0 = unlimited) |
| includeBody | boolean | true | Include History / Funding / Analysis prose sections |
| proxyConfiguration | object | no proxy | Optional proxy (MBFC does not require a proxy for requests sent with a plain user agent) |
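The field rules in the table above can be sketched as a small input builder. This is only an illustration: `build_input` and its argument names are hypothetical helpers, not part of the actor, and the validation rules are inferred from the table rather than taken from the actor's input schema.

```python
from typing import Any, Dict, List, Optional

# Category slugs and mode names copied from the Input table.
VALID_CATEGORIES = {
    "center", "left-center", "left", "right-center", "right",
    "pro-science", "conspiracy-pseudoscience", "questionable", "satire",
}
VALID_MODES = {"all", "category", "seed"}


def build_input(mode: str = "all",
                categories: Optional[List[str]] = None,
                seed_urls: Optional[List[str]] = None,
                max_items: int = 200,
                include_body: bool = True) -> Dict[str, Any]:
    """Build a run-input dict (hypothetical helper, not the actor's own code)."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if max_items < 0:
        raise ValueError("maxItems must be >= 0 (0 means unlimited)")

    run_input: Dict[str, Any] = {
        "mode": mode,
        "maxItems": max_items,
        "includeBody": include_body,
    }
    if mode == "category":
        # categories only applies to mode=category
        unknown = set(categories or []) - VALID_CATEGORIES
        if not categories or unknown:
            raise ValueError(f"invalid categories: {sorted(unknown) or 'none given'}")
        run_input["categories"] = list(categories)
    elif mode == "seed":
        # seedUrls only applies to mode=seed
        if not seed_urls:
            raise ValueError("mode=seed needs at least one MBFC profile URL")
        run_input["seedUrls"] = list(seed_urls)
    return run_input
```

For example, `build_input("category", categories=["conspiracy-pseudoscience"], max_items=50)` yields a dict that could be pasted into the actor's input form as JSON.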
Output
Each record contains:
    {
      "sourceName": "247Sports",
      "sourceUrl": "https://mediabiasfactcheck.com/247sports-bias-and-credibility/",
      "sourceHomepage": "https://247sports.com",
      "biasRating": "center",
      "rawBiasRating": "LEAST BIASED",
      "factualReporting": "HIGH",
      "credibilityRating": "HIGH CREDIBILITY",
      "country": "United States",
      "countryFreedomRating": "MOSTLY FREE",
      "mediaType": "Website",
      "trafficPopularity": "High Traffic",
      "categoryIndex": "center",
      "history": "247Sports, established in 2010 by Shannon Terry...",
      "fundedByOwnership": "247Sports is owned by CBS Interactive...",
      "analysisBias": "247Sports focuses on sports news...",
      "lastUpdated": "April 16, 2024",
      "reviewedBy": "",
      "articleJsonLd": { "...": "..." },
      "bodyMarkdown": "## History\n\n...",
      "status": "success",
      "errorMsg": ""
    }
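One way to consume these records, for instance in the URL-annotation use case, is to index them by homepage domain. The sketch below assumes records are plain dicts with the field names shown above; `domain_of`, `index_by_domain`, and `annotate` are hypothetical names, and the matching is exact-domain only (subdomains other than `www.` are not resolved).

```python
from urllib.parse import urlparse


def domain_of(url):
    """Lower-cased hostname with a leading 'www.' removed."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def index_by_domain(records):
    """Map homepage domain -> record, keeping only rows with status 'success'."""
    return {
        domain_of(r["sourceHomepage"]): r
        for r in records
        if r.get("status") == "success" and r.get("sourceHomepage")
    }


def annotate(article_url, index):
    """Return the rating fields for an article URL, or None if the source is unknown."""
    record = index.get(domain_of(article_url))
    if record is None:
        return None
    fields = ("sourceName", "biasRating", "factualReporting", "credibilityRating")
    return {name: record.get(name, "") for name in fields}
```

A dict lookup keeps annotation constant-time per article, which matters when the index holds the full corpus.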
Bias Rating Normalization
| MBFC Label | Normalized Slug |
|---|---|
| LEAST BIASED | center |
| LEFT-CENTER BIAS | left-center |
| LEFT BIAS | left |
| RIGHT-CENTER BIAS | right-center |
| RIGHT BIAS | right |
| PRO-SCIENCE | pro-science |
| CONSPIRACY-PSEUDOSCIENCE | conspiracy-pseudoscience |
| QUESTIONABLE SOURCE | questionable |
| SATIRE | satire |
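The table above amounts to a fixed lookup. A minimal sketch follows; `normalize_bias` is a hypothetical name, and the whitespace and case folding is an assumption about how forgiving the match should be, not a description of the actor's code.

```python
from typing import Optional

# Verbatim MBFC label -> normalized slug, copied from the table above.
RAW_TO_SLUG = {
    "LEAST BIASED": "center",
    "LEFT-CENTER BIAS": "left-center",
    "LEFT BIAS": "left",
    "RIGHT-CENTER BIAS": "right-center",
    "RIGHT BIAS": "right",
    "PRO-SCIENCE": "pro-science",
    "CONSPIRACY-PSEUDOSCIENCE": "conspiracy-pseudoscience",
    "QUESTIONABLE SOURCE": "questionable",
    "SATIRE": "satire",
}


def normalize_bias(raw_label: str) -> Optional[str]:
    """Return the slug for an MBFC label, or None if the label is not in the table."""
    # Upper-case and collapse runs of whitespace before the lookup.
    key = " ".join(raw_label.upper().split())
    return RAW_TO_SLUG.get(key)
```

Returning `None` for unknown labels, rather than guessing, lets the caller keep the verbatim label (`rawBiasRating`) and decide what to do with it.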
Predefined Dataset Views
- Source Credibility Table — sourceName, sourceHomepage, biasRating, factualReporting, credibilityRating, country
- Low-Credibility Sources — focused view for conspiracy-pseudoscience, questionable, and low-credibility sources
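The two views can be approximated locally on exported records. In this sketch the column list comes from the Source Credibility Table description above; the low-credibility test is an assumption, since the README does not spell out the exact filter or the exact label MBFC uses for low credibility.

```python
# Columns of the Source Credibility Table view, as listed above.
TABLE_FIELDS = (
    "sourceName", "sourceHomepage", "biasRating",
    "factualReporting", "credibilityRating", "country",
)

# Bias slugs named in the Low-Credibility Sources view description.
LOW_BIAS_SLUGS = {"conspiracy-pseudoscience", "questionable"}


def table_row(record):
    """Project a full record down to the table-view columns."""
    return {name: record.get(name, "") for name in TABLE_FIELDS}


def is_low_credibility(record):
    """ASSUMPTION: a credibilityRating starting with 'LOW' marks a low-credibility source."""
    rating = record.get("credibilityRating", "").upper()
    return record.get("biasRating") in LOW_BIAS_SLUGS or rating.startswith("LOW")


def low_credibility_view(records):
    """Table rows for records that pass the assumed low-credibility test."""
    return [table_row(r) for r in records if is_low_credibility(r)]
```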
Architecture
Pure HTTP two-level hierarchical crawl using CoreCrawler. No browser, no proxy required.
Level 1 (category): Walk each MBFC bias-category index page (/center/, /leftcenter/, /left/, /right-center/, /right/, /pro-science/, /conspiracy/, /questionable/, /satire/).
- Parse the <table> of source profile links.
- Read each link text, formatted as "Source Name (domain.com)", and extract sourceHomepage from it.
Level 2 (profile): Fetch each source profile page.
- Parse the "Detailed Report" block for structured fields.
- Extract the History / Funded by / Analysis sections.
- Extract the JSON-LD Article metadata.
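The Level 1 step of splitting a link text such as "Source Name (domain.com)" into a name and a homepage can be sketched with a regular expression. This is an illustration, not the actor's parser: `parse_link_text` is a hypothetical name, and prefixing the domain with `https://` is an assumption based on the sample record's `sourceHomepage`.

```python
import re

# Name, then a final parenthesised token with no spaces (the domain).
LINK_TEXT = re.compile(r"^\s*(?P<name>.+?)\s*\((?P<domain>[^()\s]+)\)\s*$")


def parse_link_text(text):
    """Return (source_name, homepage_url_or_None) for an index-page link text."""
    match = LINK_TEXT.match(text)
    if match is None:
        # No trailing "(domain)" part: keep the text as the name.
        return text.strip(), None
    # ASSUMPTION: homepages are served over https.
    return match.group("name"), "https://" + match.group("domain").lower()
```

The lazy `name` group lets names that themselves contain parentheses still resolve to the last parenthesised token as the domain.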
Rate-limit handling: CoreCrawler detects 429 responses and backs off exponentially. MBFC enforces per-IP rate limits on aggressive crawlers; the actor uses polite concurrency (3 concurrent requests max).
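The back-off behaviour described above can be illustrated generically. The sketch below is not CoreCrawler's implementation: the function names, base delay, cap, retry count, and jitter are all assumptions, and `fetch` is any caller-supplied function returning a `(status, body)` pair. Only the cap of 3 concurrent requests comes from the text.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENCY = 3  # from the README: 3 concurrent requests max


def backoff_delays(base=1.0, factor=2.0, cap=60.0, retries=5):
    """Exponentially growing delays in seconds, capped at `cap`."""
    return [min(cap, base * factor ** attempt) for attempt in range(retries)]


def fetch_with_backoff(fetch, url, retries=5, base=1.0, sleep=time.sleep):
    """Call fetch(url); on HTTP 429, wait and retry with growing delays."""
    for delay in backoff_delays(base=base, retries=retries):
        status, body = fetch(url)
        if status != 429:
            return status, body
        # Add up to 50% random jitter so parallel workers do not retry in lockstep.
        sleep(delay + random.uniform(0, delay / 2))
    # One final attempt after the last wait; its result is returned as-is.
    return fetch(url)


def crawl(urls, fetch, max_workers=MAX_CONCURRENCY):
    """Fetch all URLs with at most `max_workers` requests in flight, preserving order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda u: fetch_with_backoff(fetch, u), urls))
```

Passing `sleep` as a parameter keeps the retry logic testable without real waiting.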
Crawl Modes
mode=all (default): Walks all 9 bias-category index pages and fetches every source profile in the corpus (~7,000 profiles total), subject to maxItems; set maxItems to 0 to lift the default cap of 200. Suitable for full-archive downloads.
mode=category: Walks only the specified bias categories. Useful for targeted pulls (e.g., all conspiracy-pseudoscience sources for a disinfo pipeline).
mode=seed: Fetches only the explicitly provided MBFC profile URLs. Suitable for spot-lookups or updating specific source records.
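How the three modes translate into starting requests can be sketched as a small planner. The names here are hypothetical, the base URL is taken from the sample record's `sourceUrl`, and the pairing of category slugs with index paths is inferred from the matching order of the two lists in this README, so treat it as an assumption.

```python
BASE_URL = "https://mediabiasfactcheck.com"

# ASSUMPTION: slug -> index path pairing inferred from list order in the README.
CATEGORY_PATHS = {
    "center": "/center/",
    "left-center": "/leftcenter/",
    "left": "/left/",
    "right-center": "/right-center/",
    "right": "/right/",
    "pro-science": "/pro-science/",
    "conspiracy-pseudoscience": "/conspiracy/",
    "questionable": "/questionable/",
    "satire": "/satire/",
}


def plan_requests(run_input):
    """Return (level, url) pairs to start the crawl from, based on the mode."""
    mode = run_input.get("mode", "all")
    if mode == "seed":
        # Seed mode skips Level 1 and goes straight to profile pages.
        return [("profile", url) for url in run_input.get("seedUrls", [])]
    if mode == "category":
        slugs = run_input.get("categories", [])
    elif mode == "all":
        slugs = list(CATEGORY_PATHS)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return [("category", BASE_URL + CATEGORY_PATHS[slug]) for slug in slugs]
```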
Performance
- Default memory: 512 MB
- Full corpus run: ~7,000 profiles at polite 1-3 req/s ≈ 2-4 hours
- Category run (e.g., center ~500 profiles): ~15-30 minutes
- Seed run (single URL): under 1 minute
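The run-time figures above can be sanity-checked with simple arithmetic. The sketch below only divides profile counts by rates; it assumes one request per profile and ignores index pages and retries. On those assumptions, 7,000 requests take about 1.9 hours at 1 req/s and about 0.65 hours at 3 req/s, so the quoted 2-4 hours corresponds to an effective throughput of roughly 0.5-1 profiles per second. The README does not say what accounts for the difference.

```python
def run_hours(profiles, req_per_s):
    """Hours needed to fetch `profiles` pages at a steady request rate."""
    return profiles / req_per_s / 3600.0


def implied_rate(profiles, hours):
    """Effective profiles per second implied by a quoted run time."""
    return profiles / (hours * 3600.0)


if __name__ == "__main__":
    # Full corpus at the quoted request rates.
    for rate in (1.0, 3.0):
        print(f"7000 profiles at {rate} req/s: {run_hours(7000, rate):.2f} h")
    # Effective throughput implied by the quoted wall-clock times.
    for hours in (2.0, 4.0):
        print(f"7000 profiles in {hours} h: {implied_rate(7000, hours):.2f} profiles/s")
    for minutes in (15.0, 30.0):
        print(f"500 profiles in {minutes} min: {implied_rate(500, minutes / 60):.2f} profiles/s")
```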