Fanfix.com scraper
Deprecated
Rating: 0.0 (0)
Pricing: Pay per usage
Total users: 37
Monthly users: 2
Runs succeeded: 0%
Last modified: a year ago
.actor/Dockerfile
# First, specify the base Docker image.
# You can see the Docker images from Apify at https://hub.docker.com/r/apify/.
# You can also use any other image from Docker Hub.
FROM apify/actor-python:3.11

# Second, copy just requirements.txt into the Actor image,
# since it should be the only file that affects the dependency install in the next step,
# in order to speed up the build
COPY requirements.txt ./

# Install the packages specified in requirements.txt,
# Print the installed Python version, pip version
# and all installed packages with their versions for debugging
RUN echo "Python version:" \
 && python --version \
 && echo "Pip version:" \
 && pip --version \
 && echo "Installing dependencies:" \
 && pip install -r requirements.txt \
 && echo "All installed Python packages:" \
 && pip freeze

# Next, copy the remaining files and directories with the source code.
# Since we do this after installing the dependencies, quick build will be really fast
# for most source file changes.
COPY . ./

# Use compileall to ensure the runnability of the Actor Python code.
RUN python3 -m compileall -q .

# Specify how to launch the source code of your Actor.
# By default, the "python3 -m src" command is run
CMD ["python3", "-m", "src"]
.actor/actor.json
{ "actorSpecification": 1, "name": "Fanfix.com scraper", "title": "Fanfix.com scraper", "description": "Scrape data from models profiles", "version": "0.1", "meta": { "templateId": "python-start" }, "input": "./input_schema.json", "dockerfile": "./Dockerfile"}
.actor/input_schema.json
{ "title": "Scrape data from a web page", "type": "object", "schemaVersion": 1, "properties": { "url": { "title": "URL of the user profile", "type": "array", "description": "The URL of user profile you want to get the data from.", "editor": "requestListSources", "prefill": [ { "url": "https://app.fanfix.io/@kaylavoid" } ]
} }, "required": ["url"]}
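For reference, a run input that satisfies this schema looks like the sketch below. The profile URL is simply the schema's own prefill value, and the `example_input` name is only for illustration:

# Example Actor input matching the schema above.
# "url" is a request-list source array; each entry carries one profile URL.
example_input = {
    "url": [
        {"url": "https://app.fanfix.io/@kaylavoid"},
    ],
}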
src/__main__.py
1"""2This module serves as the entry point for executing the Apify Actor. It handles the configuration of logging3settings. The `main()` coroutine is then executed using `asyncio.run()`.4
5Feel free to modify this file to suit your specific needs.6"""7
8import asyncio9import logging10
11from apify.log import ActorLogFormatter12
13from .main import main14
15# Configure loggers16handler = logging.StreamHandler()17handler.setFormatter(ActorLogFormatter())18
19apify_client_logger = logging.getLogger('apify_client')20apify_client_logger.setLevel(logging.INFO)21apify_client_logger.addHandler(handler)22
23apify_logger = logging.getLogger('apify')24apify_logger.setLevel(logging.DEBUG)25apify_logger.addHandler(handler)26
27# Execute the Actor main coroutine28asyncio.run(main())
src/main.py
# Beautiful Soup - library for pulling data out of HTML and XML files, read more at
# https://www.crummy.com/software/BeautifulSoup/bs4/doc
from bs4 import BeautifulSoup

# HTTPX - library for making asynchronous HTTP requests in Python, read more at https://www.python-httpx.org/
from httpx import AsyncClient

# Apify SDK - toolkit for building Apify Actors, read more at https://docs.apify.com/sdk/python
from apify import Actor


async def main() -> None:
    async with Actor:
        actor_input = await Actor.get_input() or {}
        urls = actor_input.get('url')

        for url in urls:
            async with AsyncClient(verify=False) as client:
                response = await client.get(url['url'], follow_redirects=True)

            soup = BeautifulSoup(response.content, 'html.parser')

            full_name = soup.select('div[data-testid="creator-fullname-stack-ds"]')[0].text
            bio = soup.select('div[data-testid="profile-header-card-content-bio-ds"]')[0].text
            profile_image = soup.select('img[alt="Profile Picture"]')[0]['src']
            model_object = {'full_name': full_name, 'bio': bio, 'profile_image': profile_image}
            Actor.log.info(f'scraped: {full_name}')

            await Actor.push_data(model_object)
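Note that each `soup.select(...)[0]` lookup raises an `IndexError` when a selector matches nothing, for example after a markup change on Fanfix. A minimal sketch of a more defensive extraction step, reusing the same selectors as above (the `extract_profile` helper name is hypothetical and not part of the Actor):

from bs4 import BeautifulSoup


def extract_profile(html: str) -> dict:
    """Extract profile fields from a Fanfix profile page, tolerating missing elements."""
    soup = BeautifulSoup(html, 'html.parser')

    full_name_el = soup.select_one('div[data-testid="creator-fullname-stack-ds"]')
    bio_el = soup.select_one('div[data-testid="profile-header-card-content-bio-ds"]')
    image_el = soup.select_one('img[alt="Profile Picture"]')

    # Return None for any field whose element is missing instead of raising.
    return {
        'full_name': full_name_el.get_text(strip=True) if full_name_el else None,
        'bio': bio_el.get_text(strip=True) if bio_el else None,
        'profile_image': image_el.get('src') if image_el else None,
    }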
.dockerignore
# configurations
.idea

# crawlee and apify storage folders
apify_storage
crawlee_storage
storage

# installed files
.venv

# git folder
.git
.editorconfig
root = true
[*]
indent_style = space
indent_size = 4
charset = utf-8
trim_trailing_whitespace = true
insert_final_newline = true
end_of_line = lf
.gitignore
# This file tells Git which files shouldn't be added to source control
.idea
.DS_Store

apify_storage
storage/*
!storage/key_value_stores
storage/key_value_stores/*
!storage/key_value_stores/default
storage/key_value_stores/default/*
!storage/key_value_stores/default/INPUT.json

.venv/
.env/
__pypackages__
dist/
build/
*.egg-info/
*.egg
__pycache__
.mypy_cache
.dmypy.json
dmypy.json
.pytest_cache
.ruff_cache

.scrapy
*.log
requirements.txt
# Feel free to add your Python dependencies below. For formatting guidelines, see:
# https://pip.pypa.io/en/latest/reference/requirements-file-format/

apify ~= 1.6.0
beautifulsoup4 ~= 4.12.2
httpx ~= 0.25.2
types-beautifulsoup4 ~= 4.12.0.7