# Hespress News Scraper (`scraper_guru/hespress-scraper`) Actor

The Hespress News Scraper is a high-performance, robust data extraction tool designed to gather news articles from Morocco's leading news portal, Hespress. It supports full multilingual extraction across all of Hespress's regional subdomains.

- **URL**: https://apify.com/scraper_guru/hespress-scraper.md
- **Developed by:** [LIAICHI MUSTAPHA](https://apify.com/scraper_guru) (community)
- **Categories:** AI, News, SEO tools
- **Stats:** 2 total users, 0 monthly users, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $5.00 / 1,000 results

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
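
As a rough budgeting aid, the listed starting rate can be turned into a per-run estimate. The sketch below assumes a flat $5.00 per 1,000 results and that each scraped article counts as one billable result; the actual event prices are set on the Actor's pricing page and may differ. `estimate_cost_usd` is a hypothetical helper name, not part of any Apify library.

```python
from decimal import Decimal

# Assumption: the flat starting rate from the listing ($5.00 per 1,000 results).
PRICE_PER_1000_RESULTS_USD = Decimal("5.00")


def estimate_cost_usd(result_count: int) -> Decimal:
    """Estimate the charge for a run that returns `result_count` results."""
    if result_count < 0:
        raise ValueError("result_count must be non-negative")
    cost = PRICE_PER_1000_RESULTS_USD * result_count / 1000
    # Round to whole cents for display.
    return cost.quantize(Decimal("0.01"))


# Estimate for a run capped at 100 articles.
print(estimate_cost_usd(100))
```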

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Hespress News Scraper: The Gateway to Moroccan & MENA Intelligence 🇲🇦📰

![Hespress Scraper Logo](https://raw.githubusercontent.com/apify/actor-templates/master/templates/typescript-cheerio/logo.png)

Unlock the pulse of Morocco and North Africa with a robust, high-performance scraper built for **Hespress**, Morocco's #1 digital news portal. This tool doesn't just scrape web pages; it extracts **structured, AI-ready intelligence** from the epicenter of the MENA region's geopolitical, economic, and social discourse.

---

### 💎 The Value Proposition: Why Hespress Data Matters

In the age of AI and real-time analytics, **data is the new oil**, but regional data is often locked behind complex DOM structures, anti-bot walls (like Cloudflare), and multi-language fragmentation. 

Hespress is the undisputed leader in Moroccan digital news, publishing thousands of articles that shape public opinion and report on critical developments across Africa and the Middle East. By utilizing this scraper, you gain immediate access to:

- **Real-Time Market Sentiment:** Track how Moroccan consumers and businesses are reacting to global and local economic shifts.
- **Geopolitical Intelligence:** Monitor developments in North Africa, sub-Saharan relations, and MENA diplomacy directly from the source.
- **High-Quality Training Data:** Arabic data is historically scarce for training Large Language Models (LLMs). This scraper provides a large, clean corpus of Modern Standard Arabic, alongside French and English equivalents.

---

### 🌍 The "MENA Data Mine" Vision

We are on a mission to build the **ultimate Data Mine for the Middle East and North Africa (MENA) region.**

Historically, the MENA region has been underserved by global data providers, creating a massive blind spot for researchers, AI developers, and multinational businesses. **The Hespress News Scraper is a foundational pillar of this vision.** 

By bridging the data gap, we empower developers and enterprises to:
1. **Train smarter, culturally-aware AI agents** that understand the nuances of Moroccan and Arab dialects.
2. **Execute cross-border market research** with localized, high-fidelity data.
3. **Build predictive models** for the African and Middle Eastern markets based on structured news cycles rather than guesswork.

This isn't just a scraper; it's an infrastructure layer for the next generation of MENA-focused technology.

---

### 🚀 Use Cases: Driving ROI with Structured News

#### 🤖 For AI Engineers & LLM Builders
- **RAG (Retrieval-Augmented Generation):** Feed clean, tag-enriched articles into vector databases to build highly accurate, context-aware chatbots that know what is happening in Morocco today.
- **Dataset Generation:** Construct massive, multilingual (Arabic, French, English) datasets for fine-tuning open-source LLMs like Llama 3 or Mistral.
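
To make the RAG use case concrete, here is a minimal sketch of splitting an article's `content` into overlapping chunks before embedding. The function name `chunk_article`, the chunk sizes, and the output record layout are illustrative assumptions and not part of the Actor; only the `url`, `title`, and `content` field names come from the output schema.

```python
from typing import Iterator


def chunk_article(article: dict, max_chars: int = 500, overlap: int = 50) -> Iterator[dict]:
    """Yield overlapping character windows of an article's content."""
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be >= 0 and smaller than max_chars")
    text = article.get("content") or ""
    step = max_chars - overlap
    for index, start in enumerate(range(0, len(text), step)):
        yield {
            "url": article.get("url"),
            "title": article.get("title"),
            "chunk_index": index,
            "text": text[start:start + max_chars],
        }
        # Stop once the current window already reaches the end of the text.
        if start + max_chars >= len(text):
            break
```

Character windows are used here only for simplicity; a token-based or sentence-based splitter may suit Arabic and French text better.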

#### 📊 For Market Researchers & Quants
- **Trend Spotting:** Analyze the frequency of specific keywords (e.g., "Inflation", "Phosphate", "Startups") across publication dates to predict market movements.
- **Competitor & Policy Tracking:** Automatically alert your team when specific government policies, ministries, or competitors are mentioned in the news.
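
A keyword trend count over scraped items can be sketched as follows. This assumes each item carries `title`, `content`, and a `publishedAt` string shaped like the sample output (`<weekday> <day> <month> <year> - <HH:MM>`); `keyword_counts_by_day` is a hypothetical helper, and plain substring matching is a simplification that ignores word boundaries and Arabic morphology.

```python
from collections import Counter
from typing import Iterable


def keyword_counts_by_day(articles: Iterable[dict], keyword: str) -> Counter:
    """Count articles per publication day whose title or content mentions `keyword`."""
    needle = keyword.casefold()
    counts: Counter = Counter()
    for article in articles:
        title = article.get("title") or ""
        content = article.get("content") or ""
        if needle in f"{title} {content}".casefold():
            # Assumption: the date and the time are separated by " - "; keep the date part.
            day = (article.get("publishedAt") or "unknown").split(" - ")[0]
            counts[day] += 1
    return counts
```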

#### 🏢 For PR & Media Agencies
- **Brand Monitoring:** Track brand mentions, measure sentiment over time, and analyze the authors and categories driving the narrative.

---

### ⚡ Technical Superiority

Scraping modern media sites is notoriously difficult. We built this Actor to handle the common obstacles:

- **Anti-Bot Resilient:** Integrates with Apify's **Residential Proxies** to avoid Cloudflare's 403 Forbidden errors and CAPTCHAs.
- **True Multilingual Routing:** It doesn't just scrape one site; it concurrently navigates and normalizes data from:
  - 🇲🇦 `www.hespress.com` (Arabic)
  - 🇫🇷 `fr.hespress.com` (French)
  - 🇬🇧 `en.hespress.com` (English)
- **Ultra-Fast Engine:** Built on Crawlee's `CheerioCrawler`, skipping heavy browser rendering to extract thousands of articles at lightning speed with minimal compute costs.
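
When post-processing results, the edition can be recovered from an item's `url` using the hostnames listed above. This is a minimal sketch on the consumer side, not the Actor's internal routing; the two-letter language codes and the `language_for_url` name are assumptions chosen for illustration.

```python
from typing import Optional
from urllib.parse import urlparse

# Hostname-to-language mapping based on the editions listed above.
HOST_LANGUAGES = {
    "www.hespress.com": "ar",
    "hespress.com": "ar",
    "fr.hespress.com": "fr",
    "en.hespress.com": "en",
}


def language_for_url(url: str) -> Optional[str]:
    """Return the assumed language code for a Hespress URL, or None if unknown."""
    host = (urlparse(url).hostname or "").lower()
    return HOST_LANGUAGES.get(host)
```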

---

### 🛠️ Extracted Data Schema

For every article scraped, you receive a perfectly structured JSON object ready for your database or data pipeline:

```json
{
  "url": "https://en.hespress.com/136587-parkinsons-disease-in-morocco.html",
  "title": "Parkinson's disease in Morocco: Rising challenges in diagnosis, treatment, and coverage",
  "author": "Hespress EN",
  "publishedAt": "Sunday 25 April 2026 - 14:30",
  "category": "Health",
  "tags": ["Morocco", "Health", "Parkinson"],
  "coverImage": "https://en.hespress.com/wp-content/uploads/example.jpg",
  "content": "Despite agricultural abundance, rural areas in Morocco are facing severe challenges regarding..."
}
````
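
The `publishedAt` value is a display string rather than an ISO timestamp, so downstream code will usually need to parse it. The sketch below handles only the English format shown in the sample (`<weekday> <day> <month> <year> - <HH:MM>`); the Arabic and French editions may format dates differently, and the time zone is not stated, so the result is a naive `datetime`. `parse_published_at` is a hypothetical helper.

```python
from datetime import datetime

# English month names, mapped explicitly so parsing does not depend on the system locale.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June", "July",
         "August", "September", "October", "November", "December"],
        start=1,
    )
}


def parse_published_at(value: str) -> datetime:
    """Parse '<weekday> <day> <month> <year> - <HH:MM>' into a naive datetime."""
    date_part, time_part = value.split(" - ")
    # The weekday name is ignored; the calendar date alone determines the result.
    _weekday, day, month, year = date_part.split()
    hour, minute = time_part.split(":")
    return datetime(int(year), MONTHS[month], int(day), int(hour), int(minute))
```

Strings that do not match this shape raise `ValueError` or `KeyError`, so callers handling mixed-language results may want to catch those and keep the raw string.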

---

### ⚙️ Input Configuration

Easily control the scale and scope of your extraction:

| Parameter | Type | Description |
|-----------|------|-------------|
| **Start URLs** | `Array` | Define exact sections to scrape (e.g., only the "Economy" page) or leave the default homepages to scrape everything. |
| **Max Items** | `Integer` | Set a hard limit on the number of articles to extract. Because the Actor is billed per result, this also caps the cost of a run. |
| **Proxy Configuration** | `Object` | *Crucial:* Always enable Apify Proxy. **Residential IPs** are highly recommended to reduce the chance of being blocked by Hespress's anti-bot protection. |

---

*Unlock the data of tomorrow, today. Welcome to the MENA Data Mine.*

# Actor input Schema

## `startUrls` (type: `array`):

List of Hespress URLs to start scraping from. Supports hespress.com, fr.hespress.com, and en.hespress.com.

## `maxItems` (type: `integer`):

Maximum number of articles to scrape. Minimum 1, default 100.

## `proxyConfiguration` (type: `object`):

It's highly recommended to use Residential proxies to avoid blocking by Moroccan websites.

## Actor input object example

```json
{
  "startUrls": [
    {
      "url": "https://www.hespress.com/"
    },
    {
      "url": "https://fr.hespress.com/"
    },
    {
      "url": "https://en.hespress.com/"
    }
  ],
  "maxItems": 100,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
```
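
The example above enables Apify Proxy without selecting a proxy group. A small helper for building the input, with residential proxies requested, might look like the sketch below. It uses the `apifyProxyGroups` field from Apify's standard proxy input; that this Actor passes the field through unchanged is an assumption, and `build_input` is a hypothetical name.

```python
from typing import Iterable


def build_input(start_urls: Iterable[str], max_items: int = 100, residential: bool = True) -> dict:
    """Build an input object matching the Actor input schema."""
    if max_items < 1:
        raise ValueError("maxItems must be at least 1")
    proxy: dict = {"useApifyProxy": True}
    if residential:
        # Assumption: the Actor honors the standard Apify proxy group selection.
        proxy["apifyProxyGroups"] = ["RESIDENTIAL"]
    return {
        "startUrls": [{"url": url} for url in start_urls],
        "maxItems": max_items,
        "proxyConfiguration": proxy,
    }
```

The returned dictionary can be passed as `run_input` to the Python client or serialized to JSON for the CLI and REST API.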

# Actor output Schema

## `articles` (type: `string`):

The structured articles containing title, content, URL, author, category, published date, tags, and cover image.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "startUrls": [
        {
            "url": "https://www.hespress.com/"
        },
        {
            "url": "https://fr.hespress.com/"
        },
        {
            "url": "https://en.hespress.com/"
        }
    ],
    "proxyConfiguration": {
        "useApifyProxy": true
    }
};

// Run the Actor and wait for it to finish
const run = await client.actor("scraper_guru/hespress-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "startUrls": [
        { "url": "https://www.hespress.com/" },
        { "url": "https://fr.hespress.com/" },
        { "url": "https://en.hespress.com/" },
    ],
    "proxyConfiguration": { "useApifyProxy": True },
}

# Run the Actor and wait for it to finish
run = client.actor("scraper_guru/hespress-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "startUrls": [
    {
      "url": "https://www.hespress.com/"
    },
    {
      "url": "https://fr.hespress.com/"
    },
    {
      "url": "https://en.hespress.com/"
    }
  ],
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}' |
apify call scraper_guru/hespress-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=scraper_guru/hespress-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Hespress News Scraper",
        "description": "The Hespress News Scraper is a high-performance, robust data extraction tool designed to gather news articles from Morocco's leading news portal, Hespress. It supports full multilingual extraction across all of Hespress's regional subdomains.",
        "version": "1.0",
        "x-build-id": "St6heUNF0y9OedavF"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/scraper_guru~hespress-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-scraper_guru-hespress-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/scraper_guru~hespress-scraper/runs": {
            "post": {
                "operationId": "runs-sync-scraper_guru-hespress-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/scraper_guru~hespress-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-scraper_guru-hespress-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "startUrls"
                ],
                "properties": {
                    "startUrls": {
                        "title": "Start URLs",
                        "type": "array",
                        "description": "List of Hespress URLs to start scraping from. Supports hespress.com, fr.hespress.com, and en.hespress.com.",
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "URL of a web page",
                                    "format": "uri"
                                }
                            }
                        }
                    },
                    "maxItems": {
                        "title": "Maximum items",
                        "minimum": 1,
                        "type": "integer",
                        "description": "Maximum number of articles to scrape",
                        "default": 100
                    },
                    "proxyConfiguration": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "It's highly recommended to use Residential proxies to avoid blocking by Moroccan websites."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
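
For projects that call the REST API directly, the `run-sync-get-dataset-items` path from the specification above can be used with the Python standard library alone. This is a minimal sketch: the helper names and the timeout value are illustrative, the token is sent as the `token` query parameter as described in the specification, and the response is assumed to be a JSON array of dataset items.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/scraper_guru~hespress-scraper/run-sync-get-dataset-items"


def build_run_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Build the POST request that runs the Actor and returns its dataset items."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        url=f"{API_BASE}{ACTOR_PATH}?{query}",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_and_fetch_items(token: str, actor_input: dict, timeout: float = 300.0) -> list:
    """Send the request and decode the response body (assumed to be a JSON array)."""
    request = build_run_request(token, actor_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Because the synchronous endpoint holds the connection open until the run finishes, long crawls may be better served by the `/runs` endpoint followed by a separate dataset fetch.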
