Media Bias/Fact Check Source Credibility Scraper

Pricing

Pay per event

Developer

BowTiedRaccoon

Maintained by Community

Pull structured source-credibility records from Media Bias/Fact Check (MBFC), the largest media-source reliability database on the internet (roughly 7,000 profiles).

For each outlet, returns:

  • Bias rating (normalized slug + verbatim MBFC label)
  • Factual reporting tier
  • MBFC credibility rating
  • Country and country press-freedom rating
  • Media type and traffic tier
  • Full History, Funded by / Ownership, and Analysis / Bias prose sections

Use Cases

  • Fact-checking pipelines — source-level trust layer alongside claim-level fact-checkers (PolitiFact, Snopes)
  • Disinformation research — identify and filter conspiracy-pseudoscience / questionable sources
  • Brand safety — screen media sources before advertising placement
  • RAG source-filtering — weight or exclude sources by credibility rating before ingestion
  • OSINT / media-literacy — annotate article URLs with source bias and credibility metadata
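As an illustration of the annotation use case, the sketch below matches an article URL to a source record by hostname, using the `sourceHomepage` field described in the Output section. The helper names (`build_index`, `annotate`) and the subdomain fallback are assumptions made for this example, not part of the actor.

```python
from typing import Dict, Iterable, Optional
from urllib.parse import urlparse


def _host(url: str) -> str:
    """Lower-cased hostname without a leading 'www.' (empty string if absent)."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def build_index(records: Iterable[dict]) -> Dict[str, dict]:
    """Index scraped records by the hostname of their sourceHomepage."""
    return {_host(r["sourceHomepage"]): r for r in records if r.get("sourceHomepage")}


def annotate(article_url: str, index: Dict[str, dict]) -> Optional[dict]:
    """Return bias and credibility metadata for an article URL, or None."""
    parts = _host(article_url).split(".")
    # Try the full host first, then progressively shorter parent domains
    # (stops before the bare top-level domain).
    for i in range(len(parts) - 1):
        rec = index.get(".".join(parts[i:]))
        if rec is not None:
            return {
                "sourceName": rec.get("sourceName"),
                "biasRating": rec.get("biasRating"),
                "factualReporting": rec.get("factualReporting"),
                "credibilityRating": rec.get("credibilityRating"),
            }
    return None
```

Matching on hostname is a simplification: outlets that publish under several domains would need extra entries in the index.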

Input

| Field | Type | Default | Description |
|---|---|---|---|
| mode | string (required) | all | all = full corpus; category = selected bias categories; seed = explicit profile URLs |
| categories | string[] | | Bias categories when mode=category. Options: center, left-center, left, right-center, right, pro-science, conspiracy-pseudoscience, questionable, satire |
| seedUrls | string[] | | Explicit MBFC profile URLs when mode=seed |
| maxItems | integer | 200 | Maximum source profiles to return (0 = unlimited) |
| includeBody | boolean | true | Include History / Funding / Analysis prose sections |
| proxyConfiguration | object | no proxy | Optional proxy (MBFC does not require proxy on plain UA) |
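A minimal sketch of assembling a run input that follows the table above. The function name `build_run_input` is hypothetical, and the checks (for example, rejecting mode=category without categories) are assumptions about sensible client-side validation rather than documented actor behavior.

```python
from typing import Any, Dict, List, Optional

VALID_MODES = {"all", "category", "seed"}
VALID_CATEGORIES = {
    "center", "left-center", "left", "right-center", "right",
    "pro-science", "conspiracy-pseudoscience", "questionable", "satire",
}


def build_run_input(
    mode: str = "all",
    categories: Optional[List[str]] = None,
    seed_urls: Optional[List[str]] = None,
    max_items: int = 200,
    include_body: bool = True,
) -> Dict[str, Any]:
    """Build an input dict using the field names and defaults from the table."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if max_items < 0:
        raise ValueError("maxItems must be >= 0 (0 = unlimited)")

    run_input: Dict[str, Any] = {
        "mode": mode,
        "maxItems": max_items,
        "includeBody": include_body,
    }
    if mode == "category":
        # Assumption: at least one category is needed for a category run.
        if not categories:
            raise ValueError("mode=category needs at least one category")
        unknown = sorted(set(categories) - VALID_CATEGORIES)
        if unknown:
            raise ValueError(f"unknown categories: {unknown}")
        run_input["categories"] = list(categories)
    elif mode == "seed":
        # Assumption: at least one URL is needed for a seed run.
        if not seed_urls:
            raise ValueError("mode=seed needs at least one profile URL")
        run_input["seedUrls"] = list(seed_urls)
    return run_input
```

The resulting dict can be pasted into the actor's input editor as JSON or passed as the run input when starting the actor through an API client.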

Output

Each record contains:

{
  "sourceName": "247Sports",
  "sourceUrl": "https://mediabiasfactcheck.com/247sports-bias-and-credibility/",
  "sourceHomepage": "https://247sports.com",
  "biasRating": "center",
  "rawBiasRating": "LEAST BIASED",
  "factualReporting": "HIGH",
  "credibilityRating": "HIGH CREDIBILITY",
  "country": "United States",
  "countryFreedomRating": "MOSTLY FREE",
  "mediaType": "Website",
  "trafficPopularity": "High Traffic",
  "categoryIndex": "center",
  "history": "247Sports, established in 2010 by Shannon Terry...",
  "fundedByOwnership": "247Sports is owned by CBS Interactive...",
  "analysisBias": "247Sports focuses on sports news...",
  "lastUpdated": "April 16, 2024",
  "reviewedBy": "",
  "articleJsonLd": { "...": "..." },
  "bodyMarkdown": "## History\n\n...",
  "status": "success",
  "errorMsg": ""
}
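For the RAG source-filtering use case, records in this shape could be filtered before ingestion roughly as follows. This is a sketch: the function name `trusted_sources` is hypothetical, and the default allow-list contains only the `HIGH CREDIBILITY` value seen in the sample record, so any other credibility labels you want to accept must be supplied by the caller.

```python
from typing import AbstractSet, Iterable, Iterator

# Normalized bias slugs that this example treats as unsuitable (caller-tunable).
DEFAULT_BLOCKED_BIAS = frozenset({"conspiracy-pseudoscience", "questionable", "satire"})


def trusted_sources(
    records: Iterable[dict],
    allowed_credibility: AbstractSet[str] = frozenset({"HIGH CREDIBILITY"}),
    blocked_bias: AbstractSet[str] = DEFAULT_BLOCKED_BIAS,
) -> Iterator[dict]:
    """Yield successfully scraped records that pass both filters."""
    for rec in records:
        if rec.get("status") != "success":
            continue  # skip records the scraper flagged as failed
        if rec.get("credibilityRating") not in allowed_credibility:
            continue
        if rec.get("biasRating") in blocked_bias:
            continue
        yield rec
```

Filtering on `status` first keeps partially scraped records, whose rating fields may be empty, out of the downstream index.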

Bias Rating Normalization

| MBFC Label | Normalized Slug |
|---|---|
| LEAST BIASED | center |
| LEFT-CENTER BIAS | left-center |
| LEFT BIAS | left |
| RIGHT-CENTER BIAS | right-center |
| RIGHT BIAS | right |
| PRO-SCIENCE | pro-science |
| CONSPIRACY-PSEUDOSCIENCE | conspiracy-pseudoscience |
| QUESTIONABLE SOURCE | questionable |
| SATIRE | satire |
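The mapping above can be expressed as a lookup table, which is handy when re-normalizing `rawBiasRating` values downstream. The case and whitespace handling in `normalize_bias` is an assumption for robustness; the actor's own normalization logic is not documented here.

```python
from typing import Optional

MBFC_LABEL_TO_SLUG = {
    "LEAST BIASED": "center",
    "LEFT-CENTER BIAS": "left-center",
    "LEFT BIAS": "left",
    "RIGHT-CENTER BIAS": "right-center",
    "RIGHT BIAS": "right",
    "PRO-SCIENCE": "pro-science",
    "CONSPIRACY-PSEUDOSCIENCE": "conspiracy-pseudoscience",
    "QUESTIONABLE SOURCE": "questionable",
    "SATIRE": "satire",
}


def normalize_bias(raw_label: str) -> Optional[str]:
    """Map a verbatim MBFC label to its slug; None if the label is unknown."""
    # Upper-case and collapse runs of whitespace before the lookup.
    key = " ".join(raw_label.upper().split())
    return MBFC_LABEL_TO_SLUG.get(key)
```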

Predefined Dataset Views

  • Source Credibility Table — sourceName, sourceHomepage, biasRating, factualReporting, credibilityRating, country
  • Low-Credibility Sources — focused view for conspiracy-pseudoscience, questionable, and low-credibility sources

Architecture

Pure HTTP two-level hierarchical crawl using CoreCrawler. No browser, no proxy required.

Level 1 (category): Walk each MBFC bias-category index page
    (/center/ /leftcenter/ /left/ /right-center/ /right/
     /pro-science/ /conspiracy/ /questionable/ /satire/)
    → parse <table> of source profile links
    → link text "Source Name (domain.com)" → extract sourceHomepage

Level 2 (profile): Fetch each source profile page
    → parse "Detailed Report" block for structured fields
    → extract History / Funded by / Analysis sections
    → extract JSON-LD Article metadata
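The Level 1 step that turns link text such as "Source Name (domain.com)" into a name and homepage could look roughly like this. The regular expression, the function name `parse_link_text`, and the choice of an `https://` prefix (suggested by the sample record's `sourceHomepage`) are assumptions; the actor's actual parsing may differ.

```python
import re
from typing import Optional, Tuple

# Name, then a final parenthesized token that contains a dot and no spaces.
_LINK_TEXT = re.compile(r"^(?P<name>.+?)\s*\((?P<domain>[^()\s]+\.[^()\s]+)\)\s*$")


def parse_link_text(text: str) -> Tuple[str, Optional[str]]:
    """Split 'Source Name (domain.com)' into (name, homepage URL or None)."""
    cleaned = text.strip()
    match = _LINK_TEXT.match(cleaned)
    if match is None:
        # No trailing "(domain)" part: keep the text as the name.
        return cleaned, None
    return match.group("name"), "https://" + match.group("domain").lower()
```

Requiring a dot inside the final parentheses keeps names like "Foo (Bar) (foo.com)" intact, since only the last, domain-like group is treated as the homepage.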

Rate-limit handling: CoreCrawler detects 429 responses and backs off exponentially. MBFC enforces per-IP rate limits on aggressive crawlers; the actor uses polite concurrency (3 concurrent requests max).
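For readers unfamiliar with the pattern, here is a generic sketch of exponential backoff on 429 responses. It is not CoreCrawler's code: the base delay, cap, retry count, and the injected `fetch` callable (returning a status code and body) are all assumptions chosen for illustration.

```python
import time
from typing import Callable, Tuple


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Delay before retry number `attempt` (0-based): base * 2**attempt, capped."""
    return min(cap, base * (2 ** attempt))


def fetch_with_backoff(
    fetch: Callable[[str], Tuple[int, str]],
    url: str,
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call fetch(url); on HTTP 429, wait with doubling delays and retry."""
    for attempt in range(max_retries + 1):
        status, body = fetch(url)
        if status != 429:
            return status, body
        if attempt < max_retries:
            sleep(backoff_delay(attempt))
    raise RuntimeError(f"still rate-limited after {max_retries} retries: {url}")
```

Passing `sleep` as a parameter makes the retry schedule easy to inspect in tests without actually waiting.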

Crawl Modes

mode=all (default): Walks all 9 bias-category index pages and fetches every source profile in the corpus (~7,000 profiles total). Suitable for full-archive downloads. Because maxItems defaults to 200, set maxItems to 0 to retrieve the full corpus.

mode=category: Walks only the specified bias categories. Useful for targeted pulls (e.g., all conspiracy-pseudoscience sources for a disinfo pipeline).

mode=seed: Fetches only the explicitly provided MBFC profile URLs. Suitable for spot-lookups or updating specific source records.
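The three modes can be summarized as a choice of which index pages to walk. In the sketch below, the pairing of category slugs with index paths is inferred from the matching order of the two lists in this README, so treat it as an assumption; `index_paths` is a hypothetical helper, not an actor function.

```python
from typing import List, Optional

# Category slug -> index path. Pairing inferred from list order (assumption).
CATEGORY_PATHS = {
    "center": "/center/",
    "left-center": "/leftcenter/",
    "left": "/left/",
    "right-center": "/right-center/",
    "right": "/right/",
    "pro-science": "/pro-science/",
    "conspiracy-pseudoscience": "/conspiracy/",
    "questionable": "/questionable/",
    "satire": "/satire/",
}


def index_paths(mode: str, categories: Optional[List[str]] = None) -> List[str]:
    """Return the category index paths a run in the given mode would walk."""
    if mode == "all":
        return list(CATEGORY_PATHS.values())
    if mode == "category":
        return [CATEGORY_PATHS[slug] for slug in (categories or [])]
    if mode == "seed":
        return []  # seed runs skip Level 1 and fetch profile URLs directly
    raise ValueError(f"unknown mode: {mode!r}")
```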

Performance

  • Default memory: 512 MB
  • Full corpus run: ~7,000 profiles at polite 1-3 req/s ≈ 2-4 hours
  • Category run (e.g., center ~500 profiles): ~15-30 minutes
  • Seed run (single URL): under 1 minute
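To sanity-check these estimates for your own run size, the arithmetic is simple. Pure request time for 7,000 profiles at 1-3 req/s works out to roughly 39 to 117 minutes, so the quoted 2-4 hours presumably also covers index pages, backoff pauses, and other overhead; that reading is an assumption, as are the helper names below.

```python
def request_minutes(profiles: int, requests_per_second: float) -> float:
    """Minutes needed to issue `profiles` requests at a steady rate."""
    return profiles / requests_per_second / 60.0


def implied_rate(profiles: int, minutes: float) -> float:
    """Effective profiles per second implied by a wall-clock duration."""
    return profiles / (minutes * 60.0)


if __name__ == "__main__":
    # Lower and upper bounds on pure request time for a full-corpus run.
    print(round(request_minutes(7000, 3), 1), round(request_minutes(7000, 1), 1))
    # Effective throughput implied by the quoted 2-4 hour full-corpus estimate.
    print(round(implied_rate(7000, 240), 2), round(implied_rate(7000, 120), 2))
```

Planning with the implied effective rate (about 0.5 to 1 profile per second for a full run) is the more conservative choice.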