Afrocentric Data Collector & Annotator

Pricing

from $0.01 / 1,000 results

Intelligent web scraper that collects and annotates African-focused content from news sites, business platforms, and cultural resources. Automatically categorizes data by region, country, topics, and themes with configurable AI-powered insights.

Developer

Kayode Balogun

Maintained by Community

AfroScrape Intelligence 🌍

Intelligent web scraper that collects and annotates African-focused content with automated categorization and AI-powered insights.

🎯 Overview

AfroScrape Intelligence is a specialized web scraping actor designed to collect, analyze, and annotate content related to African culture, business, innovation, and news from across the continent and diaspora. It automatically categorizes content by region, country, topics, and themes with optional AI-powered deep analysis.

✨ Features

  • ๐ŸŒ 54 African Countries - Comprehensive coverage across all 5 regions
  • ๐Ÿค– Smart Annotation - Automatic categorization by topics, themes, and regions
  • ๐ŸŽฏ Relevance Scoring - Filter high-quality Afrocentric content (0-10 scale)
  • ๐Ÿง  AI-Powered Analysis - Optional Claude AI integration for deep insights
  • ๐Ÿ“Š Multiple Data Types - News, business, culture, and directory content
  • โšก Fast & Efficient - Built on Crawlee for optimal performance

🚀 Quick Start

Input Configuration

{
  "dataType": "news",
  "maxRequestsPerCrawl": 100,
  "annotateWithAI": false,
  "minRelevanceScore": 2,
  "startUrls": [
    {
      "url": "https://africanews.com"
    }
  ]
}

Input Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| startUrls | Array | [] | URLs to start crawling from |
| dataType | String | "news" | Type of content: news, business, culture, or directory |
| maxRequestsPerCrawl | Integer | 100 | Maximum number of pages to crawl |
| annotateWithAI | Boolean | false | Enable Claude AI for advanced annotation |
| minRelevanceScore | Number | 1 | Minimum relevance score (0-10) an item needs to be saved |
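
The documented defaults can be applied on the client side before a run is started. The sketch below is only an illustration: `build_input` is a hypothetical helper, not part of the actor, and its checks are assumptions derived from the parameter table (the four `dataType` values and the 0-10 score range).

```python
# Hypothetical helper, not part of the actor: it only mirrors the parameter table.
ALLOWED_DATA_TYPES = {"news", "business", "culture", "directory"}

DEFAULTS = {
    "startUrls": [],
    "dataType": "news",
    "maxRequestsPerCrawl": 100,
    "annotateWithAI": False,
    "minRelevanceScore": 1,
}


def build_input(**overrides):
    """Merge overrides into the documented defaults and sanity-check the result."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown parameters: {sorted(unknown)}")

    run_input = {**DEFAULTS, **overrides}
    # Copy the list so callers never share the default list object.
    run_input["startUrls"] = list(run_input["startUrls"])

    if run_input["dataType"] not in ALLOWED_DATA_TYPES:
        raise ValueError(f"Unsupported dataType: {run_input['dataType']!r}")
    if not 0 <= run_input["minRelevanceScore"] <= 10:
        raise ValueError("minRelevanceScore must be between 0 and 10")
    if run_input["maxRequestsPerCrawl"] < 1:
        raise ValueError("maxRequestsPerCrawl must be at least 1")
    return run_input


example = build_input(dataType="business", minRelevanceScore=5)
print(example)
```

Catching a misspelled parameter or an out-of-range score locally avoids spending a run on an input the actor may ignore or reject.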

📦 Output Format

{
  "url": "https://example.com/article",
  "title": "African Tech Startups Raise $500M in Q1 2024",
  "description": "Venture capital investment in African startups reaches new heights...",
  "content": "Full article content...",
  "author": "John Doe",
  "publishDate": "2024-01-15T10:30:00Z",
  "category": "Business",
  "tags": ["startups", "funding", "technology"],
  "images": [
    {
      "src": "https://example.com/image.jpg",
      "alt": "African tech entrepreneurs"
    }
  ],
  "annotations": {
    "regions": ["West Africa", "East Africa"],
    "countries": ["Nigeria", "Kenya"],
    "topics": ["Business", "Technology", "Innovation"],
    "themes": ["Entrepreneurship", "Economic Development"],
    "languages": ["English"],
    "mentions": {
      "Nigeria": 3,
      "Kenya": 2
    },
    "relevanceScore": 8.5,
    "timestamp": "2024-01-18T12:00:00Z"
  },
  "collectedAt": "2024-01-18T12:00:00Z"
}
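
Because every item carries the same `annotations` object, results can be aggregated with a few lines of code. The following is a minimal sketch, assuming items shaped like the example above; `summarize` is a hypothetical helper and the sample record is trimmed from that example.

```python
from collections import Counter


def summarize(items, min_score=0.0):
    """Count country and topic annotations across items at or above min_score."""
    countries, topics = Counter(), Counter()
    for item in items:
        annotations = item.get("annotations", {})
        if annotations.get("relevanceScore", 0) < min_score:
            continue  # skip items below the requested relevance threshold
        countries.update(annotations.get("countries", []))
        topics.update(annotations.get("topics", []))
    return countries, topics


# Sample record trimmed from the output example above.
sample_items = [
    {
        "title": "African Tech Startups Raise $500M in Q1 2024",
        "annotations": {
            "countries": ["Nigeria", "Kenya"],
            "topics": ["Business", "Technology", "Innovation"],
            "relevanceScore": 8.5,
        },
    }
]

countries, topics = summarize(sample_items, min_score=5)
print(countries.most_common())
print(topics.most_common())
```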

🎯 Use Cases

Market Research & Investment

// Track African startup ecosystem
{
  "dataType": "business",
  "minRelevanceScore": 5,
  "maxRequestsPerCrawl": 200
}

Academic Research

// Collect data on African politics and culture
{
  "dataType": "culture",
  "annotateWithAI": true,
  "minRelevanceScore": 3
}

News Aggregation

// Aggregate African news from multiple sources
{
  "dataType": "news",
  "startUrls": [
    { "url": "https://africanews.com" },
    { "url": "https://www.africa.com/news/" }
  ],
  "maxRequestsPerCrawl": 500
}

🔧 Advanced Usage

Enable AI Annotation

To use AI-powered annotation with Claude, set annotateWithAI: true. This provides:

  • Cultural significance analysis
  • Key insights extraction
  • Enhanced topic detection
  • Deeper thematic analysis

Note: Requires Claude API access (handled automatically in Apify environment)

Custom URL Lists

{
  "startUrls": [
    { "url": "https://techcabal.com" },
    { "url": "https://www.businessdailyafrica.com" },
    { "url": "https://www.okayafrica.com" }
  ],
  "dataType": "business",
  "maxRequestsPerCrawl": 300
}
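
When the source list lives in a spreadsheet or text file, the `startUrls` array can be generated rather than typed by hand. This is a small sketch under that assumption; `to_start_urls` is a hypothetical helper that wraps plain URL strings in the `{ "url": ... }` objects shown above.

```python
def to_start_urls(urls):
    """Wrap plain URL strings as startUrls objects, skipping blanks and duplicates."""
    seen = set()
    start_urls = []
    for url in urls:
        url = url.strip()
        if not url or url in seen:
            continue
        seen.add(url)
        start_urls.append({"url": url})
    return start_urls


run_input = {
    "startUrls": to_start_urls([
        "https://techcabal.com",
        "https://www.businessdailyafrica.com",
        "https://www.okayafrica.com",
        "https://techcabal.com",  # duplicate, dropped by the helper
    ]),
    "dataType": "business",
    "maxRequestsPerCrawl": 300,
}
print(run_input["startUrls"])
```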

Filter by Relevance

Set minRelevanceScore to filter content:

  • 1-3: Minimal African mentions
  • 4-6: Moderate African focus
  • 7-10: Strong Afrocentric content
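
The same bands can be applied to collected items after a run, for example to label rows in a report. The sketch below is an assumption about how to treat fractional scores such as 8.5, which the list above does not cover: each band's lower bound is used as the cutoff, and anything below 4 counts as minimal. `relevance_band` is a hypothetical helper, not part of the actor.

```python
def relevance_band(score):
    """Map a 0-10 relevance score to the bands listed above.

    Assumption: band lower bounds act as cutoffs, so 3.9 is still
    "minimal" and 6.5 is still "moderate".
    """
    if not 0 <= score <= 10:
        raise ValueError("score must be between 0 and 10")
    if score >= 7:
        return "strong"
    if score >= 4:
        return "moderate"
    return "minimal"


for score in (2, 4, 6.5, 8.5):
    print(score, relevance_band(score))
```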

📊 Data Categories

Regions (5)

  • North Africa
  • West Africa
  • East Africa
  • Central Africa
  • Southern Africa

Countries (54)

All African nations included with automatic detection

Topics (13)

Business, Technology, Culture, Politics, Education, Health, Arts, Music, Sports, Innovation, Finance, Agriculture, Environment

Themes (9)

Diaspora, Pan-Africanism, Decolonization, Economic Development, Cultural Heritage, Youth Empowerment, Women Leadership, Digital Transformation, Entrepreneurship

Languages (10)

English, French, Arabic, Swahili, Hausa, Yoruba, Amharic, Zulu, Portuguese, Somali

🔌 API Integration

Using Apify API Client (Node.js)

import { ApifyClient } from 'apify-client';

// Authenticate with your Apify API token
const client = new ApifyClient({
    token: 'YOUR_APIFY_TOKEN',
});

const input = {
    dataType: "news",
    maxRequestsPerCrawl: 100,
    minRelevanceScore: 5
};

// Start the actor and wait for the run to finish (top-level await requires an ES module)
const run = await client.actor("YOUR_USERNAME/afroscrape-intelligence").call(input);

// Read the collected items from the run's default dataset
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Collected ${items.length} Afrocentric articles`);

Using REST API

curl -X POST https://api.apify.com/v2/acts/YOUR_USERNAME~afroscrape-intelligence/runs \
  -H "Authorization: Bearer YOUR_APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "dataType": "business",
    "maxRequestsPerCrawl": 50,
    "minRelevanceScore": 4
  }'

Python Integration

from apify_client import ApifyClient

# Authenticate with your Apify API token
client = ApifyClient('YOUR_APIFY_TOKEN')

run_input = {
    "dataType": "news",
    "maxRequestsPerCrawl": 100,
    "minRelevanceScore": 5
}

# Start the actor and wait for the run to finish
run = client.actor('YOUR_USERNAME/afroscrape-intelligence').call(run_input=run_input)

# Iterate over the items stored in the run's default dataset
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"Title: {item['title']}")
    print(f"Countries: {item['annotations']['countries']}")
    print(f"Score: {item['annotations']['relevanceScore']}")
    print("---")

📈 Performance Tips

  1. Start Small: Begin with maxRequestsPerCrawl: 50 to test
  2. Use Relevance Filtering: Set minRelevanceScore: 3+ to reduce noise
  3. Choose Specific Sources: Provide targeted startUrls for better results
  4. Monitor Costs: AI annotation increases compute time and costs
  5. Batch Processing: For large datasets, run multiple smaller jobs
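
The batch-processing tip can be sketched as follows. This is one possible approach, not the actor's own mechanism: `batch_inputs` is a hypothetical helper that splits a long URL list into several smaller run inputs, each of which could then be started on its own with the client calls shown under Python Integration.

```python
def batch_inputs(urls, batch_size, **shared):
    """Split urls into chunks and build one run input per chunk.

    Keyword arguments in `shared` (for example dataType) are copied
    into every generated input.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        {**shared, "startUrls": [{"url": u} for u in urls[i:i + batch_size]]}
        for i in range(0, len(urls), batch_size)
    ]


urls = [
    "https://africanews.com",
    "https://techcabal.com",
    "https://www.okayafrica.com",
]
batches = batch_inputs(urls, 2, dataType="news", maxRequestsPerCrawl=50)
for number, run_input in enumerate(batches, start=1):
    print(number, [entry["url"] for entry in run_input["startUrls"]])
```

Smaller batches also make it easier to see which source caused a slow or failed run.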

๐Ÿ› ๏ธ Troubleshooting

Low Data Quality

  • Increase minRelevanceScore to 5+
  • Use more specific startUrls
  • Enable annotateWithAI for better filtering

Too Few Results

  • Decrease minRelevanceScore
  • Increase maxRequestsPerCrawl
  • Check if source websites are accessible

Slow Performance

  • Reduce maxRequestsPerCrawl
  • Disable annotateWithAI for faster runs
  • Use Apify Proxy if requests are being blocked or throttled, since fewer retries shorten runs

📚 Resources

๐Ÿค Support

📄 License

Apache-2.0 License - see LICENSE file for details

๐Ÿ™ Acknowledgments

Built with:


Made with ❤️ for the African community and diaspora

Logo Concept
Here's a description for a logo designer or AI image generator:

Logo Concept for "AfroScrape Intelligence":

Design Elements:

  • Central icon: Stylized African continent silhouette
  • Overlay: Data points/nodes connected by lines (representing data collection)
  • Color scheme:
    • Primary: Deep orange/gold (#FF6B35) - representing African sunset
    • Secondary: Teal/cyan (#00D9FF) - representing technology/data
    • Accent: Dark navy (#1A2238) - professional touch

Style: Modern, clean, geometric

Typography: Bold sans-serif for "AfroScrape", lighter weight for "Intelligence"

Icon treatment: Gradient from orange to teal, with connecting lines flowing from the continent shape

Alternative versions:

  1. Square icon (for app icons)
  2. Horizontal lockup (for headers)
  3. Monochrome version (for dark/light backgrounds)