PHAPI - Extract video search results from Pornhub

Try for free

Pay $5.00 for 1,000 results

plowdata/phapi

A scraper for the popular adult entertainment platform Pornhub. Supports search queries and user-configurable filters. Download your data as an HTML table, JSON, CSV, XML, Excel, RSS, or JSONL.

PHAPI

The PornHub API, or phapi for short, is a web scraper for the popular adult entertainment website PornHub.
It has built-in support for scraping any search, bypassing most of the platform's anti-scraping measures.
While the proxy is configurable, don't disable it: Apify's datacenters appear to be located in Virginia, where PH's services are currently blocked entirely. Apart from this, there's no need for any of the specialized proxies. Apify's default datacenter proxies work well, and the scraper handles rotation automatically (e.g. if one of the proxy URLs is blocked, it switches to a new one).

Usage

The only required option is the search query. All others, such as sorting or the number of pages, are optional, and their defaults are explained in the input configuration. (Note: the default values mirror PH's defaults.)
Queries are escaped automatically, the same way PH does it, so you can pass in any query as-is.
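To illustrate what this escaping involves, here is a minimal TypeScript sketch; it is not the actor's own code. It assumes that PH encodes search terms like an HTML form (application/x-www-form-urlencoded, so spaces become + and reserved characters such as & are percent-encoded) and that the query travels in a parameter named search. Both the helper name buildSearchQueryString and the parameter name are hypothetical.

```typescript
// Hypothetical helper, not part of the actor.
// ASSUMPTION: PH uses form-style encoding and a "search" query parameter.
function buildSearchQueryString(query: string): string {
    // URLSearchParams serializes as application/x-www-form-urlencoded:
    // spaces become "+", and characters such as "&" or "=" are percent-encoded.
    return new URLSearchParams({ search: query }).toString();
}

console.log(buildSearchQueryString("hello world & more"));
// prints: search=hello+world+%26+more
```

Because the scraper already does this for you, a helper like this is only useful if you want to reproduce or inspect the request URLs yourself.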

Output

The output is a dataset with the following schema (using zod):

```typescript
import { z } from "zod";

const UserInformationShortSchema = z.object({
    type: z.string(), // e.g. model, pornstar, user or channel
    slug: z.string(),
    name: z.string(),
    profilePicture: z.string().optional(),
    isContentPartner: z.boolean(),
    isVerified: z.boolean(),
    isPremium: z.boolean(),
    isAwardsWinner: z.boolean(),
});

const ResultCountSchema = z.object({
    from: z.number(),
    to: z.number(),
    total: z.number(),
});

const SearchCategorySchema = z.object({
    title: z.string(),
    slug: z.string(),
    id: z.number(),
    count: z.number(), // Number of videos in this search that fall under this category
    video_count: z.number(), // Total number of videos in this category
});

const VideoPreviewSchema = z.object({
    videoId: z.string(), // Not confirmed to be unique
    segment: z.string(),
    viewKey: z.string(), // The id used to view the video, e.g. .../view_video.php?viewkey=...
    title: z.string().optional(),
    thumbnail: z.string().optional(),
    duration: z.object({
        hours: z.number(),
        minutes: z.number(),
        seconds: z.number(),
    }),
    uploader: UserInformationShortSchema,
    viewCount: z.number(),
    rating: z.number(),
});

// One dataset item per video; the page-level fields repeat for every video on a page.
const SearchOutputSchema = z.object({
    page: z.number(),
    video: VideoPreviewSchema,
    resultCount: ResultCountSchema,
    correctionSuggestion: z.string().optional(),
    categories: z.array(SearchCategorySchema),
});

type SearchOutput = z.infer<typeof SearchOutputSchema>;
```
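Since each dataset item describes one video, consumers usually need a few small helpers around this schema. The sketch below is a hypothetical example, not part of the actor: it mirrors a subset of the schema with plain TypeScript interfaces so it runs without zod, converts the split duration object into seconds, and regroups the per-video items by page number. The names durationToSeconds and groupByPage are illustrative.

```typescript
// Plain interfaces mirroring part of SearchOutputSchema (no zod dependency).
interface ResultCount { from: number; to: number; total: number }
interface Duration { hours: number; minutes: number; seconds: number }
interface VideoPreview { viewKey: string; title?: string; duration: Duration }
interface SearchOutputItem {
    page: number;
    video: VideoPreview;
    resultCount: ResultCount;
    correctionSuggestion?: string;
}

// Converts the { hours, minutes, seconds } object into a total number of seconds.
function durationToSeconds(d: Duration): number {
    return d.hours * 3600 + d.minutes * 60 + d.seconds;
}

// Groups per-video dataset items back into pages, keyed by page number,
// preserving the order in which the items appear in the dataset.
function groupByPage(items: SearchOutputItem[]): Map<number, VideoPreview[]> {
    const pages = new Map<number, VideoPreview[]>();
    for (const item of items) {
        const videos = pages.get(item.page) ?? [];
        videos.push(item.video);
        pages.set(item.page, videos);
    }
    return pages;
}
```

For example, durationToSeconds({ hours: 1, minutes: 2, seconds: 3 }) returns 3723.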

Most of the properties should be self-explanatory, but here's a quick rundown:
The result count gives the range of results shown on the page (from and to) together with the total number of results for the query. The correction suggestion, if present and not an empty string, is the suggestion PH shows when the search query appears to be misspelled. The categories are the ones shown on the left side of the search page, i.e. all categories matching the search query, each with the number of search results that fall under it (count) and the category's total number of videos (video_count).
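The two category counters and the result range can be combined as in the following sketch. It is hypothetical (rankCategories and isLastPage are illustrative names, not part of the actor), and isLastPage rests on an assumption: that to reaches total on the last page of results.

```typescript
interface SearchCategory {
    title: string;
    slug: string;
    id: number;
    count: number;       // videos in this search that fall under the category
    video_count: number; // total videos in the category
}

interface ResultCount { from: number; to: number; total: number }

// Sorts categories by how many of the search's videos they contain and adds the
// fraction of the whole category that the search covers.
function rankCategories(categories: SearchCategory[]): { title: string; count: number; share: number }[] {
    return [...categories]
        .sort((a, b) => b.count - a.count)
        .map((c) => ({
            title: c.title,
            count: c.count,
            // Avoid dividing by zero if a category reports no videos.
            share: c.video_count > 0 ? c.count / c.video_count : 0,
        }));
}

// ASSUMPTION: on the last page, "to" equals (or exceeds) "total".
function isLastPage(rc: ResultCount): boolean {
    return rc.to >= rc.total;
}
```

The input array is copied before sorting, so the categories stored in the dataset item are left untouched.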
Each result's video is one of the videos on the page. Its uploader is a short version of the uploader's information: the type (e.g. model, pornstar, user or channel), the name, the slug, and the profile picture (if available).
If the uploader is a model or pornstar, it can carry additional information, such as whether they are verified, premium, or an awards winner. The content-partner status appears to apply only to channels; if this changes on PH's side, the value will be passed through as well.
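As an example of working with the uploader information, the hypothetical sketch below counts videos per uploader type and lists verified uploaders once each. The names countByUploaderType and verifiedSlugs are illustrative. The type is kept as a plain string because the schema does not restrict it to a fixed set of values, and the exact spelling PH uses for each type is not confirmed here.

```typescript
interface UserInformationShort {
    type: string; // e.g. model, pornstar, user or channel; not a closed set in the schema
    slug: string;
    name: string;
    profilePicture?: string;
    isContentPartner: boolean;
    isVerified: boolean;
    isPremium: boolean;
    isAwardsWinner: boolean;
}

// Counts how many videos each uploader type contributed.
function countByUploaderType(uploaders: UserInformationShort[]): Record<string, number> {
    const counts: Record<string, number> = {};
    for (const u of uploaders) {
        counts[u.type] = (counts[u.type] ?? 0) + 1;
    }
    return counts;
}

// Returns the slugs of verified uploaders, each listed once, in first-seen order.
function verifiedSlugs(uploaders: UserInformationShort[]): string[] {
    return Array.from(new Set(uploaders.filter((u) => u.isVerified).map((u) => u.slug)));
}
```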

What are results and what does the pricing mean?

Each result corresponds to a single video. Since a page holds multiple videos, the page-level properties of a result (page, result count, correction suggestion and categories) are the same for all videos on that page.
The first page usually contains about 32 videos and each subsequent page about 44 (the last page can contain fewer), so if you set the number of pages to 5, you should get around 209 videos.
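The arithmetic behind that estimate, and what it means for the price of $5.00 per 1,000 results, can be sketched as follows. This is a rough, hypothetical calculator: it assumes about 32 videos on the first page and about 44 on each later page, which is one reading of the figures above, and real page sizes vary.

```typescript
// ASSUMPTION: ~32 videos on page 1 and ~44 on every later page; real counts vary,
// and the last page of a search can be shorter.
function estimateResults(pages: number, firstPage = 32, laterPages = 44): number {
    if (pages <= 0) return 0;
    return firstPage + (pages - 1) * laterPages;
}

// Cost in USD at the listed price of $5.00 per 1,000 results (one result = one video).
function estimateCostUsd(results: number, pricePer1000 = 5): number {
    return (results * pricePer1000) / 1000;
}

const estimated = estimateResults(5); // 32 + 4 * 44 = 208, close to the ~209 quoted
console.log(estimated, estimateCostUsd(estimated).toFixed(2));
// prints: 208 1.04
```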

Example - correction suggestion

If you want to build something that takes a user's query, corrects it, and then scrapes the corrected query, you can run the scraper with the number of pages set to 1, check whether the correction suggestion is present and not an empty string, and if so, use it as the new query.
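The decision step of that flow might look like the following TypeScript sketch. It is hypothetical: resolveQuery is an illustrative name, and the two scraper runs around it are left out. Because every result on a page carries the same correction suggestion, reading the first result of the one-page run is enough.

```typescript
interface FirstPageItem {
    correctionSuggestion?: string;
}

// Returns the query for the follow-up run: PH's suggestion when one was offered,
// otherwise the original query. Missing, empty, or whitespace-only suggestions
// count as "no suggestion"; an empty first page also falls back to the original.
function resolveQuery(original: string, firstPage: FirstPageItem[]): string {
    const suggestion = firstPage[0]?.correctionSuggestion?.trim();
    return suggestion ? suggestion : original;
}
```

For example, resolveQuery("helo", [{ correctionSuggestion: "hello" }]) returns "hello", while resolveQuery("hello", [{ correctionSuggestion: "" }]) returns "hello" unchanged.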

Developer
Maintained by Community
Actor metrics
  • 9 monthly users
  • 1 star
  • 100.0% runs succeeded
  • Created in Aug 2024
  • Modified about 2 months ago