Fast Instagram Post Scraper 🚀 avatar

Fast Instagram Post Scraper 🚀

Try for free

No credit card required

Go to Store
Fast Instagram Post Scraper 🚀

Fast Instagram Post Scraper 🚀

instagram-scraper/fast-instagram-post-scraper
Try for free

No credit card required

Instagram Post Scraper. Post data: hashtags, comment_count, like_count, usertags, images, videos, shortcode and etc. Scrape Instagram Posts with Ease and Speed

Developer
Maintained by Community

Actor Metrics

  • 31 monthly users

  • No reviews yet

  • 1 bookmark

  • 98% runs succeeded

  • 2.3 days response time

  • Created in Feb 2025

  • Modified 13 days ago

LB

FEATURE REQUEST / View count

Open
Labsed opened this issue
24 days ago

Hey,

Is it possible to add view_count to the post item?

Best

LB

Labsed

24 days ago

And also the video duration.

IS

view_count has been added, but it looks like it's basically null, and I need to make sure that the video duration is available via the video_dash_manifest info.

LB

Labsed

17 days ago

Thanks for considering the feedback. Attaching my crawler in case it can help you with the manifest manipulation, or any other means.

1import {
2  createHttpRouter,
3  HttpCrawler,
4  HttpCrawlerOptions,
5  HttpCrawlingContext,
6  ProxyConfiguration,
7} from "crawlee";
8import jp from "jsonpath";
9import * as cheerio from "cheerio";
10import Video from "../../entities/video";
11import Channel from "../../entities/channel";
12import ChannelStats from "../../entities/channel-stats";
13import VideoStats from "../../entities/video-stats";
14import { CrawlerInterface, Platform } from "../../types";
15
16export class InstagramCrawler
17  extends HttpCrawler<any>
18  implements CrawlerInterface
19{
20  constructor(options?: HttpCrawlerOptions) {
21    const router = createHttpRouter();
22
23    router.addDefaultHandler(async ({ request, json }: HttpCrawlingContext) => {
24      const nodes: Record<string, any>[] = jp.query(
25        json,
26        "$.data.user.edge_owner_to_timeline_media.edges[?(@.node.__typename=='GraphVideo')].node",
27      );
28
29      for (const node of nodes) {
30        const video = Video.create({
31          channel: request.userData.channel,
32          videoId: node.shortcode,
33          title: node.edge_media_to_caption?.edges[0].node.text ?? "",
34          duration: extractDurationFromManifest(
35            node.dash_info.video_dash_manifest,
36          ),
37          publishedAt: new Date(node.taken_at_timestamp * 1000),
38        });
39
40        await Video.upsert(video, ["channel", "videoId"]);
41
42        await VideoStats.save({
43          video,
44          views: node.video_view_count,
45          comments: node.edge_media_to_comment.count,
46          reactions: node.edge_media_preview_like.count,
47          topReactions: [
48            { name: "Like", count: node.edge_media_preview_like.count },
49          ],
50        });
51      }
52    });
53
54    router.addHandler(
55      "stats",
56      async ({ json, request }: HttpCrawlingContext) => {
57        const data: Record<string, any> = jp.value(
58          json,
59          "$..data.xdt_shortcode_media",
60        );
61
62        await Video.update(
63          { id: request.userData.video.id },
64          {
65            title: data.edge_media_to_caption?.edges[0].node.text ?? "",
66            duration: parseInt(data.video_duration),
67            publishedAt: new Date(data.taken_at_timestamp * 1000),
68          },
69        );
70
71        await VideoStats.save({
72          video: request.userData.video,
73          views: data.video_view_count,
74          comments: data.edge_media_to_parent_comment.count,
75          reactions: data.edge_media_preview_like.count,
76          topReactions: [
77            { name: "Like", count: data.edge_media_preview_like.count },
78          ],
79        });
80      },
81    );
82
83    router.addHandler("cadd", async ({ log, json }: HttpCrawlingContext) => {
84      const data: Record<string, any> = jp.value(json, "$..data.user");
85
86      log.info("Got channel data", data);
87
88      const channel = Channel.create({
89        platform: Platform.INSTAGRAM,
90        channelId: data.id,
91        username: data.username,
92        name: data.full_name,
93      });
94
95      await Channel.upsert(channel, ["platform", "channelId"]);
96
97      log.info("Channel saved", channel);
98    });
99
100    router.addHandler(
101      "cstats",
102      async ({ request, json }: HttpCrawlingContext) => {
103        await ChannelStats.save({
104          channel: request.userData.channel,
105          followers: json.data.user.edge_followed_by.count,
106        });
107      },
108    );
109
110    let proxyConfiguration: ProxyConfiguration | undefined;
111
112    if (process.env.IG_PROXY_URL) {
113      proxyConfiguration = new ProxyConfiguration({
114        proxyUrls: [process.env.IG_PROXY_URL],
115      });
116    }
117
118    super({
119      ...options,
120      proxyConfiguration,
121      requestHandler: router,
122      preNavigationHooks: [
123        async (_: HttpCrawlingContext, gotOptions: any) => {
124          gotOptions.headers = {
125            "X-IG-App-ID": "936619743392459",
126          };
127        },
128      ],
129    });
130  }
131
132  addChannelAddRequest(url: string) {
133    return this.addRequests([{ url, label: "cadd" }]);
134  }
135
136  addChannelStatsRequests(channels: Channel[]) {
137    return this.addRequests(
138      channels.map((channel) => ({
139        url: `https://www.instagram.com/api/v1/users/web_profile_info/?username=${channel.username}`,
140        label: "cstats",
141        userData: { channel },
142      })),
143    );
144  }
145
146  addChannelVideoRequests(channels: Channel[]) {
147    return this.addRequests(
148      channels.map((channel) => ({
149        url: `https://www.instagram.com/api/v1/users/web_profile_info/?username=${channel.username}`,
150        userData: { channel },
151      })),
152    );
153  }
154
155  addVideoStatsRequests(videos: Video[]) {
156    return this.addRequests(
157      videos.map((video) => ({
158        url: "https://www.instagram.com/graphql/query",
159        method: "POST",
160        headers: { "Content-Type": "application/x-www-form-urlencoded" },
161        payload: new URLSearchParams({
162          doc_id: "8845758582119845",
163          variables: JSON.stringify({ shortcode: video.videoId }),
164        }).toString(),
165        useExtendedUniqueKey: true,
166        label: "stats",
167        userData: { video },
168      })),
169    );
170  }
171}
172
173function extractDurationFromManifest(manifestXml: string): number {
174  const match = cheerio
175    .load(manifestXml, { xmlMode: true })("MPD")
176    .attr("mediaPresentationDuration")
177    ?.match(/^PT([\d.]+)S$/);
178
179  return match?.[1] ? parseInt(match[1]) : 0;
180}
IS

Thanks a lot, I found the video_duration parameter, I grabbed the post collection and the parameter was not as comprehensive as the single page information