Fanfix.com scraper
Deprecated

This Actor is unavailable because the developer has decided to deprecate it.

iskander/fanfix-com-scraper

.actor/Dockerfile

# First, specify the base Docker image.
# You can see the Docker images from Apify at https://hub.docker.com/r/apify/.
# You can also use any other image from Docker Hub.
FROM apify/actor-python:3.11

# Second, copy just requirements.txt into the Actor image,
# since it should be the only file that affects the dependency install in the next step,
# in order to speed up the build
COPY requirements.txt ./

# Install the packages specified in requirements.txt,
# Print the installed Python version, pip version
# and all installed packages with their versions for debugging
RUN echo "Python version:" \
 && python --version \
 && echo "Pip version:" \
 && pip --version \
 && echo "Installing dependencies:" \
 && pip install -r requirements.txt \
 && echo "All installed Python packages:" \
 && pip freeze

# Next, copy the remaining files and directories with the source code.
# Since we do this after installing the dependencies, quick build will be really fast
# for most source file changes.
COPY . ./

# Use compileall to ensure the runnability of the Actor Python code.
RUN python3 -m compileall -q .

# Specify how to launch the source code of your Actor.
# By default, the "python3 -m src" command is run
CMD ["python3", "-m", "src"]

.actor/actor.json

{
    "actorSpecification": 1,
    "name": "fanfix-com-scraper",
    "title": "Fanfix.com scraper",
    "description": "Scrape data from model profiles",
    "version": "0.1",
    "meta": {
        "templateId": "python-start"
    },
    "input": "./input_schema.json",
    "dockerfile": "./Dockerfile"
}

.actor/input_schema.json

{
    "title": "Scrape data from a web page",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "url": {
            "title": "URLs of the user profiles",
            "type": "array",
            "description": "The URLs of the user profiles you want to get the data from.",
            "editor": "requestListSources",
            "prefill": [
                { "url": "https://app.fanfix.io/@kaylavoid" }
            ]
        }
    },
    "required": ["url"]
}
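
For reference, an input matching this schema can also be supplied when starting the Actor from Python with the official Apify API client. The following is a minimal sketch, not part of the Actor's source; the API token is a placeholder and the Actor ID is taken from the listing above.

from apify_client import ApifyClient

# Minimal sketch: call the Actor with an input matching the schema above.
# '<YOUR_APIFY_TOKEN>' is a placeholder for your own API token.
client = ApifyClient('<YOUR_APIFY_TOKEN>')

run_input = {
    'url': [
        {'url': 'https://app.fanfix.io/@kaylavoid'},
    ],
}

# Start the Actor run and wait for it to finish.
run = client.actor('iskander/fanfix-com-scraper').call(run_input=run_input)

# Read the items the run pushed to its default dataset.
for item in client.dataset(run['defaultDatasetId']).iterate_items():
    print(item)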

src/__main__.py

1"""
2This module serves as the entry point for executing the Apify Actor. It handles the configuration of logging
3settings. The `main()` coroutine is then executed using `asyncio.run()`.
4
5Feel free to modify this file to suit your specific needs.
6"""
7
8import asyncio
9import logging
10
11from apify.log import ActorLogFormatter
12
13from .main import main
14
15# Configure loggers
16handler = logging.StreamHandler()
17handler.setFormatter(ActorLogFormatter())
18
19apify_client_logger = logging.getLogger('apify_client')
20apify_client_logger.setLevel(logging.INFO)
21apify_client_logger.addHandler(handler)
22
23apify_logger = logging.getLogger('apify')
24apify_logger.setLevel(logging.DEBUG)
25apify_logger.addHandler(handler)
26
27# Execute the Actor main coroutine
28asyncio.run(main())

src/main.py

# Beautiful Soup - library for pulling data out of HTML and XML files, read more at
# https://www.crummy.com/software/BeautifulSoup/bs4/doc
from bs4 import BeautifulSoup

# HTTPX - library for making asynchronous HTTP requests in Python, read more at https://www.python-httpx.org/
from httpx import AsyncClient

# Apify SDK - toolkit for building Apify Actors, read more at https://docs.apify.com/sdk/python
from apify import Actor


async def main() -> None:
    async with Actor:
        # Read the Actor input; the "url" field is a list of request objects from the input schema.
        actor_input = await Actor.get_input() or {}
        urls = actor_input.get('url', [])

        for url in urls:
            # Fetch the profile page, following redirects (TLS certificate verification is disabled).
            async with AsyncClient(verify=False) as client:
                response = await client.get(url['url'], follow_redirects=True)

            soup = BeautifulSoup(response.content, 'html.parser')

            # Extract the profile fields from the page.
            full_name = soup.select('div[data-testid="creator-fullname-stack-ds"]')[0].text
            bio = soup.select('div[data-testid="profile-header-card-content-bio-ds"]')[0].text
            profile_image = soup.select('img[alt="Profile Picture"]')[0]['src']
            model_object = {'full_name': full_name, 'bio': bio, 'profile_image': profile_image}
            Actor.log.info(f'Scraped: {full_name}')

            # Store the scraped record in the Actor's default dataset.
            await Actor.push_data(model_object)
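
The CSS selectors used above can also be checked outside the Apify runtime. The following is a minimal sketch, not part of the Actor's source; it assumes a profile page saved locally as profile.html (a hypothetical file name used only for illustration).

from bs4 import BeautifulSoup

# Parse a locally saved profile page instead of fetching it over HTTP.
# 'profile.html' is a hypothetical file used only to illustrate the selectors.
with open('profile.html', encoding='utf-8') as f:
    soup = BeautifulSoup(f.read(), 'html.parser')

# The same data-testid selectors as in src/main.py; select() returns a list,
# so a missing element yields an empty list instead of raising an exception.
full_name = soup.select('div[data-testid="creator-fullname-stack-ds"]')
bio = soup.select('div[data-testid="profile-header-card-content-bio-ds"]')
profile_image = soup.select('img[alt="Profile Picture"]')

print('full_name:', full_name[0].text if full_name else None)
print('bio:', bio[0].text if bio else None)
print('profile_image:', profile_image[0]['src'] if profile_image else None)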

.dockerignore

# configurations
.idea

# crawlee and apify storage folders
apify_storage
crawlee_storage
storage

# installed files
.venv

# git folder
.git

.editorconfig

root = true

[*]
indent_style = space
indent_size = 4
charset = utf-8
trim_trailing_whitespace = true
insert_final_newline = true
end_of_line = lf

.gitignore

# This file tells Git which files shouldn't be added to source control

.idea
.DS_Store

apify_storage
storage/*
!storage/key_value_stores
storage/key_value_stores/*
!storage/key_value_stores/default
storage/key_value_stores/default/*
!storage/key_value_stores/default/INPUT.json

.venv/
.env/
__pypackages__
dist/
build/
*.egg-info/
*.egg

__pycache__

.mypy_cache
.dmypy.json
dmypy.json
.pytest_cache
.ruff_cache

.scrapy
*.log

requirements.txt

# Feel free to add your Python dependencies below. For formatting guidelines, see:
# https://pip.pypa.io/en/latest/reference/requirements-file-format/

apify ~= 1.6.0
beautifulsoup4 ~= 4.12.2
httpx ~= 0.25.2
types-beautifulsoup4 ~= 4.12.0.7
Maintained by Community