Europarliament voting

Deprecated

Pricing

Pay per usage

See alternative Actors

Go to Apify Store

Europarliament voting

Deprecated

See alternative Actors

Parse votes of the european parliament.

Pricing

Pay per usage

Rating

0.0

(0)

Developer

Mojzis Stupka

Maintained by Community

Actor stats

Bookmarked

Total users

Monthly active users

2 years ago

Last modified

.actor/Dockerfile

# First, specify the base Docker image.
# You can see the Docker images from Apify at https://hub.docker.com/r/apify/.
# You can also use any other image from Docker Hub.
FROM apify/actor-python:3.11

# Second, copy just requirements.txt into the Actor image,
# since it should be the only file that affects the dependency install in the next step,
# in order to speed up the build
COPY requirements.txt ./

# Install the packages specified in requirements.txt,
# Print the installed Python version, pip version
# and all installed packages with their versions for debugging
RUN echo "Python version:" \
 && python --version \
 && echo "Pip version:" \
 && pip --version \
 && echo "Installing dependencies:" \
 && pip install -r requirements.txt \
 && echo "All installed Python packages:" \
 && pip freeze

# Next, copy the remaining files and directories with the source code.
# Since we do this after installing the dependencies, quick build will be really fast
# for most source file changes.
COPY . ./

# Use compileall to ensure the runnability of the Actor Python code.
RUN python3 -m compileall -q .

# Specify how to launch the source code of your Actor.
# By default, the "python3 -m src" command is run
CMD ["python3", "-m", "src"]

.actor/actor.json

{
    "actorSpecification": 1,
    "name": "europarliament-voting",
    "title": "Europian Parliament voting",
    "description": "Scrapes votes of the members of the parliament.",
    "version": "0.1",
    "meta": {
        "templateId": "python-beautifulsoup"
    },
    "input": "./input_schema.json",
    "dockerfile": "./Dockerfile",
    "storages": {
        "dataset": {
            "actorSpecification": 1,
            "title": "Votes",
            "views": {
                "titles": {
                    "title": "Votes",
                    "transformation": {
                        "fields": [
                            "id",
                            "ident",
                            "vote"
                        ]
                    },
                    "display": {
                        "component": "table",
                        "properties": {
                            "id": {
                                "label": "Pers ID",
                                "format": "number"
                            },
                            "ident": {
                                "label": "Identifier of the vote",
                                "format": "number"
                            },
                            "vote": {
                                "label": "How did they vote",
                                "format": "text"
                            }

                        }
                    }
                }
            }
        }
    }
}

.actor/input_schema.json

{
    "title": "Python BeautifulSoup Scraper",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "start_urls": {
            "title": "Voting results urls",
            "type": "array",
            "description": "Enter the url of the document holding the results.",
            "prefill": [
                { "url": "https://www.europarl.europa.eu/doceo/document/PV-9-2024-02-08-RCV_EN.xml" }
            ],
            "editor": "requestListSources"
        },
        "vote_ident": {
            "title": "Roll Call Vote Result Identifier",
            "type": "integer",
            "description": "Pick which vote should be parsed",
            "default": 1
        }
    },
    "required": ["start_urls","vote_ident"]
}

src/main.py

1"""
2This module serves as the entry point for executing the Apify Actor. It handles the configuration of logging
3settings. The `main()` coroutine is then executed using `asyncio.run()`.
4
5Feel free to modify this file to suit your specific needs.
6"""
7
8import asyncio
9import logging
10
11from apify.log import ActorLogFormatter
12
13from .main import main
14
15# Configure loggers
16handler = logging.StreamHandler()
17handler.setFormatter(ActorLogFormatter())
18
19apify_client_logger = logging.getLogger('apify_client')
20apify_client_logger.setLevel(logging.INFO)
21apify_client_logger.addHandler(handler)
22
23apify_logger = logging.getLogger('apify')
24apify_logger.setLevel(logging.DEBUG)
25apify_logger.addHandler(handler)
26
27# Execute the Actor main coroutine
28asyncio.run(main())

src/main.py

1"""
2This module defines the `main()` coroutine for the Apify Actor, executed from the `__main__.py` file.
3
4Feel free to modify this file to suit your specific needs.
5
6To build Apify Actors, utilize the Apify SDK toolkit, read more at the official documentation:
7https://docs.apify.com/sdk/python
8"""
9
10from urllib.parse import urljoin
11
12from bs4 import BeautifulSoup
13from httpx import AsyncClient
14
15from apify import Actor
16
17
18async def main() -> None:
19    """
20    The main coroutine is being executed using `asyncio.run()`, so do not attempt to make a normal function
21    out of it, it will not work. Asynchronous execution is required for communication with Apify platform,
22    and it also enhances performance in the field of web scraping significantly.
23    """
24    async with Actor:
25        # Read the Actor input
26        actor_input = await Actor.get_input() or {}
27        start_urls = actor_input.get('start_urls', [{'url': 'https://www.europarl.europa.eu/doceo/document/PV-9-2024-02-08-RCV_EN.xml'}])
28        vote_ident = actor_input.get('vote_ident', 1)
29
30        if not start_urls:
31            Actor.log.info('No start URLs specified in actor input, exiting...')
32            await Actor.exit()
33
34        # Enqueue the starting URLs in the default request queue
35        default_queue = await Actor.open_request_queue()
36        for start_url in start_urls:
37            url = start_url.get('url')
38            Actor.log.info(f'Enqueuing {url} ...')
39            await default_queue.add_request({'url': url, 'userData': {'depth': 0}})
40
41        # Process the requests in the queue one by one
42        while request := await default_queue.fetch_next_request():
43            url = request['url']
44            depth = request['userData']['depth']
45            Actor.log.info(f'Scraping {url} ...')
46
47            try:
48                # Fetch the URL using `httpx`
49                async with AsyncClient() as client:
50                    response = await client.get(url, follow_redirects=True)
51
52                # Parse the response using `BeautifulSoup`
53                soup = BeautifulSoup(response.content, 'xml')
54                rolcall = soup.find("RollCallVote.Result",attrs={"Identifier":vote_ident})
55                positions = ["For","Against","Abstention"]
56
57                data = []
58
59                for pos in positions:
60                    voted = rolcall.find(f"Result.{pos}")
61                    members = voted.find_all("PoliticalGroup.Member.Name")
62                    data += [ {"id": m["PersId"], "ident": vote_ident, "vote":pos} for m in members]
63                for m in data:
64                    await Actor.push_data(m)    
65            except Exception:
66                Actor.log.exception(f'Cannot extract data from {url}.')
67            finally:
68                # Mark the request as handled so it's not processed again
69                await default_queue.mark_request_as_handled(request)

.dockerignore

# configurations
.idea

# crawlee and apify storage folders
apify_storage
crawlee_storage
storage

# installed files
.venv

# git folder
.git

.editorconfig

root = true

[*]
indent_style = space
indent_size = 4
charset = utf-8
trim_trailing_whitespace = true
insert_final_newline = true
end_of_line = lf

.gitignore

# This file tells Git which files shouldn't be added to source control

.idea
.DS_Store

apify_storage
storage

.venv/
.env/
__pypackages__
dist/
build/
*.egg-info/
*.egg

__pycache__

.mypy_cache
.dmypy.json
dmypy.json
.pytest_cache
.ruff_cache

.scrapy
*.log

requirements.txt

1# Feel free to add your Python dependencies below. For formatting guidelines, see:
2# https://pip.pypa.io/en/latest/reference/requirements-file-format/
3
4apify ~= 1.5.5
5beautifulsoup4 ~= 4.12.2
6httpx ~= 0.25.2
7types-beautifulsoup4 ~= 4.12.0.7
8lxml ~= 5.1.0

Linkedin Sales Navigator Scraper

bestscrapers/linkedin-sales-navigator-scraper

A powerful tool designed to extract detailed lead information from a Sales Navigator URL — without the need for cookies or login.

Linkedin Scrapers

527

3.8

(14)

Youtube Comment Scraper

topaz_sharingan/Youtube-Comment-Scraper

Introducing the YouTube Comment Scraper — a dynamic tool designed to unlock the wealth of insights hidden within YouTube video comments. Whether you're a content creator, researcher, or simply curious, this actor empowers you to effortlessly extract and analyze valuable information, including commen

Moses Bilal

146

Twitter/X Video Transcript Scraper ｜ Free｜ 2025

yeahjjyy/twitter-x-video-transcript-scraper-free-2025

The Tool is a powerful and efficient tool designed to extract transcripts from videos posted on Twitter. It caters to researchers analyzing trends, content creators gathering insights, and businesses monitoring industry conversations, enabling quick and precise collection of video transcript data.

yeahjjyy

PR⭕DUCT HUNT Scraper HD ⭐💯

jupri/producthunt

💫 All-in-one Producthunt.com Scraper

cat

5.0

(1)

PR⭕DUCT HUNT Leaderboard Scraper

jupri/producthunt-leaderboard

💫 Scrape ProductHunt Leaderboard

cat

5.0

(1)

Product Hunt Scraper by day/week/month/year + product + makers

deltaspider/product-hunt-scraper

A high-performance web scraper for extracting comprehensive product launch data from Product Hunt leaderboards. Perfect for founders, investors, and researchers to gather product performance insights for market analysis, competitive intelligence, and identifying trending products.

delta spider

Stackoverflow Academic Research Exporter

red.cars/stackoverflow-academic-research-exporter

Professional Stack Overflow data export for academic research, thesis projects, and educational analysis - Export questions, answers, and community discussions by tags, search queries, and date ranges. Computer science research, programming education studies, and technology trend analysis.

AutomateLab

Linkedin Sales Navigator Scraper

freshdata/linkedin-sales-navigator-scraper

A powerful tool designed to extract detailed leads information from a sales navigator URL — without the need for cookies or login.

FreshData

480

5.0

(2)

Company House GOV.UK

nocodeventure/company-house-gov-uk

Get all the information you need about UK companies in seconds! This scraper grabs data from the official Companies House website - that's where the UK government keeps records of every company.

No-Code Venture

Anon Lab

bikrambiswas/anon-lab

Anon Lab is an open research workspace that transforms anonymity and privacy papers into readable explanations and executable code. Built on the Free Haven Anonymity Bibliography, it makes decades of privacy research interactive, reproducible, and usable for developers and researchers.

Bikram Biswas

5.0

(3)

Pikabu Comments Scraper

powerai/pikabu-comments-scraper

Scrape comments from Pikabu.ru posts with automatic pagination and comprehensive comment data extraction including ratings, emotions, and author information.