x-scraper

Under maintenance

Developed by Arron Taylor
Maintained by Community
Scrapes metadata from an X post (title, image, description). Sometimes this fails: X is pretty restrictive about which IP addresses can hit its servers.

Rating: 0.0 (0)
Pricing: Pay per usage
Total users: 2
Monthly users: 2
Runs succeeded: >99%
Last modified: 3 days ago
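
To call the Actor programmatically, a minimal sketch with the Apify JavaScript client is shown below. The Actor ID placeholder, the APIFY_TOKEN environment variable, and the tweet URL (copied from the default in src/main.js) are assumptions for illustration, not part of this repository.

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

// '<username>/x-scraper' is a placeholder; use the Actor's real ID from the Apify Console.
const run = await client.actor('<username>/x-scraper').call({
    startUrls: [{ url: 'https://x.com/baconbrix/status/1910752770593816703?s=12' }],
});

// Each successful run pushes a single item with the scraped metadata.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items[0]);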

.actor/Dockerfile

# Specify the base Docker image. You can read more about
# the available images at https://crawlee.dev/docs/guides/docker-images
# You can also use any other image from Docker Hub.
FROM apify/actor-node-puppeteer-chrome:20
# Check preinstalled packages
RUN npm ls crawlee apify puppeteer playwright
# Copy just package.json and package-lock.json
# to speed up the build using Docker layer cache.
COPY --chown=myuser package*.json ./
# Install NPM packages, skip optional and development dependencies to
# keep the image small. Avoid logging too much and print the dependency
# tree for debugging
RUN npm --quiet set progress=false \
&& npm install --omit=dev --omit=optional \
&& echo "Installed NPM packages:" \
&& (npm list --omit=dev --all || true) \
&& echo "Node.js version:" \
&& node --version \
&& echo "NPM version:" \
&& npm --version \
&& rm -r ~/.npm
# Next, copy the remaining files and directories with the source code.
# Since we do this after NPM install, quick build will be really fast
# for most source file changes.
COPY --chown=myuser . ./
# Run the image. If you know you won't need headful browsers,
# you can remove the XVFB start script for a micro perf gain.
CMD ./start_xvfb_and_run_cmd.sh && npm start --silent

.actor/actor.json

{
    "actorSpecification": 1,
    "name": "my-actor",
    "title": "Project Puppeteer Crawler JavaScript",
    "description": "Crawlee and Puppeteer project in JavaScript.",
    "version": "0.0",
    "meta": {
        "templateId": "js-crawlee-puppeteer-chrome"
    },
    "input": "./input_schema.json",
    "dockerfile": "./Dockerfile"
}

.actor/input_schema.json

{
    "title": "PuppeteerCrawler Template",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "startUrls": {
            "title": "Start URLs",
            "type": "array",
            "description": "URLs to start with.",
            "editor": "requestListSources",
            "prefill": [
                {
                    "url": "https://apify.com"
                }
            ]
        }
    }
}
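
For reference, a run against a single X post could use an input like the sketch below. The proxyConfig field is not declared in the schema above, but src/main.js reads it and passes it to Actor.createProxyConfiguration, so the RESIDENTIAL proxy group shown here is only an illustrative assumption.

{
    "startUrls": [
        { "url": "https://x.com/baconbrix/status/1910752770593816703?s=12" }
    ],
    "proxyConfig": {
        "groups": ["RESIDENTIAL"]
    }
}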

src/main.js

import { Actor } from 'apify';
import { PuppeteerCrawler, Dataset, RequestQueue } from 'crawlee';

await Actor.init();

const {
    startUrls = [{ url: 'https://x.com/baconbrix/status/1910752770593816703?s=12' }],
    proxyConfig = null,
} = await Actor.getInput() ?? {};

if (!startUrls || startUrls.length !== 1) {
    throw new Error('startUrls must be an array with exactly one URL');
}

const proxyConfiguration = proxyConfig
    ? await Actor.createProxyConfiguration(proxyConfig)
    : await Actor.createProxyConfiguration();

const requestQueue = await RequestQueue.open();

const crawler = new PuppeteerCrawler({
    proxyConfiguration,
    requestQueue,
    maxRequestsPerCrawl: 1,

    launchContext: {
        useChrome: true,
        launchOptions: {
            headless: true,
        },
    },

    requestHandler: async ({ page, request, log }) => {
        log.info(`Scraping X post: ${request.url}`);

        // Wait for tweet content to load with retry
        const maxRetries = 3;
        let retries = 0;
        let articleFound = false;

        while (retries < maxRetries && !articleFound) {
            try {
                await page.waitForSelector('article', { timeout: 20000 });
                articleFound = true;
            } catch (e) {
                retries++;
                if (retries < maxRetries) {
                    log.warning(`Article not found, retrying (${retries}/${maxRetries})...`);
                    await page.reload({ waitUntil: 'networkidle2' });
                } else {
                    throw new Error(`Article not found after ${maxRetries} attempts on ${request.url}`);
                }
            }
        }

        const data = await page.evaluate(() => {
            const result = {
                url: window.location.href,
                title: '',
                image: '',
                ogTitle: '',
                ogDescription: '',
                ogImage: '',
                description: '',
                rawImages: [],
            };

            // Grab main tweet content
            const article = document.querySelector('article');
            if (article) {
                const textElement = article.querySelector('div[lang]');
                if (textElement) result.title = textElement.innerText.trim();

                const textElements = article.querySelectorAll('div[lang]');
                const allText = Array.from(textElements)
                    .map(el => el.innerText.trim())
                    .filter(Boolean)
                    .join('\n');

                result.content = allText;

                const imgTags = article.querySelectorAll('img');
                const imgUrls = Array.from(imgTags)
                    .map(img => img.src)
                    .filter(src => !src.includes('profile_images') && !src.includes('emoji'));
                if (imgUrls.length > 0) {
                    result.image = imgUrls[0];
                    result.rawImages = imgUrls;
                }
            }

            // Fallback metadata
            const getMeta = (name) =>
                document.querySelector(`meta[property="${name}"]`)?.content ||
                document.querySelector(`meta[name="${name}"]`)?.content || '';

            result.ogTitle = getMeta('og:title');
            result.ogDescription = getMeta('og:description');
            result.ogImage = getMeta('og:image');
            result.description = getMeta('description');

            return result;
        });

        await Dataset.pushData(data);
    },
});

await requestQueue.addRequest({ url: startUrls[0].url });
await crawler.run();
await Actor.exit();
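
For orientation, a successful run pushes one dataset item shaped like the result object built in the requestHandler above. The values in this sketch are placeholders, not real output:

{
    "url": "https://x.com/baconbrix/status/1910752770593816703?s=12",
    "title": "First line of the tweet text",
    "content": "All tweet text blocks joined with newlines",
    "image": "https://pbs.twimg.com/media/<id>?format=jpg",
    "rawImages": ["https://pbs.twimg.com/media/<id>?format=jpg"],
    "ogTitle": "",
    "ogDescription": "",
    "ogImage": "",
    "description": ""
}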

src/routes.js

// Router left over from the Crawlee Puppeteer template; src/main.js defines its own
// inline requestHandler and does not import this file.
import { Dataset, createPuppeteerRouter } from 'crawlee';

export const router = createPuppeteerRouter();

router.addDefaultHandler(async ({ enqueueLinks, log }) => {
    log.info(`enqueueing new URLs`);
    await enqueueLinks({
        globs: ['https://apify.com/*'],
        label: 'detail',
    });
});

router.addHandler('detail', async ({ request, page, log }) => {
    const title = await page.title();
    log.info(`${title}`, { url: request.loadedUrl });

    await Dataset.pushData({
        url: request.loadedUrl,
        title,
    });
});

.dockerignore

# configurations
.idea
# crawlee and apify storage folders
apify_storage
crawlee_storage
storage
# installed files
node_modules
# git folder
.git

.editorconfig

root = true
[*]
indent_style = space
indent_size = 4
charset = utf-8
trim_trailing_whitespace = true
insert_final_newline = true
end_of_line = lf

.eslintrc

{
    "extends": "@apify",
    "root": true
}

.gitignore

# This file tells Git which files shouldn't be added to source control
.DS_Store
.idea
.zed
dist
node_modules
apify_storage
storage

package.json

{
    "name": "crawlee-puppeteer-javascript",
    "version": "0.0.1",
    "type": "module",
    "description": "This is an example of an Apify actor.",
    "dependencies": {
        "apify": "^3.2.6",
        "crawlee": "^3.11.5",
        "puppeteer": "*"
    },
    "devDependencies": {
        "@apify/eslint-config": "^0.4.0",
        "eslint": "^8.50.0"
    },
    "scripts": {
        "start": "node src/main.js",
        "test": "echo \"Error: oops, the actor has no tests yet, sad!\" && exit 1"
    },
    "author": "It's not you it's me",
    "license": "ISC"
}