Pricing

from $0.10 / 1,000 results

Privacy Stack

Privacy researcher & developer building production Apify actors for arXiv privacy research. Privacy Stack brings 5,000+ real arXiv privacy papers into one place, carefully verified with no fake URLs and no duplicates. Categories: Internet Privacy, Data Privacy, Crypto Privacy, Post-Quantum Privacy.

Developer

Bikram Biswas

Last modified: 2 months ago

Privacy Stack 🔐 – 5000 Real arXiv Privacy Papers for Researchers & Builders

Ultimate Privacy Research Scraper – Converts peer‑reviewed privacy & anonymity papers into clean, reproducible, auditable datasets ready for analysis, dashboards, and code.

Live Actor: https://console.apify.com/oblate_wildcat/privacy-stack
GitHub: https://github.com/BikramBiswas786/privacy-stack

💡 What Is Privacy Stack?

Privacy Stack is an Apify Actor that builds a large‑scale, high‑quality research corpus of real arXiv papers in security & privacy.

It scrapes and normalizes 5000 unique arXiv cs.CR papers across 4 critical categories, making it easy to explore, filter, and build on top of the latest privacy research without touching the arXiv UI.

You get:

A clean JSON/CSV dataset you can drop into analysis pipelines
Strong deduplication guarantees
A stable schema designed for LLMs, dashboards, and downstream tools

📊 Categories (4 × 1250 Papers)

By default, each run targets 1250 papers per category, for a total of 5000 unique cs.CR papers:

🌐 Internet Privacy
- Tor, mix networks, I2P, VPNs, onion routing
- Traffic analysis attacks & defenses
- Website fingerprinting, metadata‑hiding systems
🔐 Crypto Privacy
- Zero‑knowledge proofs (zk‑SNARKs, zk‑STARKs)
- FHE, MPC, Bulletproofs, Pedersen commitments
- Privacy coins (Zcash, Monero), mixer protocols, CoinJoin
📊 Data Privacy
- Differential privacy (local & global)
- Federated learning, secure aggregation
- Synthetic data, re‑identification resistance, anonymization
⚛️ Post‑Quantum / PQ Security
- Kyber, Dilithium, SPHINCS+, Falcon
- Lattice‑based crypto, hash‑based signatures
- PQ‑safe anonymous communication & key exchange

Each paper is tagged with a primary category plus the full arXiv category string, so you can slice the dataset however you want.

🚀 Key Features

✅ 100% real arXiv papers
Directly scraped from arxiv.org cs.CR – no synthetic titles, no hallucinations, no fake IDs.
✅ 5000 UNIQUE papers
Global deduplication by arXiv ID, plus per‑category deduplication so the same paper is never counted twice within a category.
✅ Balanced categories
1250 papers for each of the 4 categories → balanced training/test sets for ML and fair comparisons between research areas.
✅ Production‑grade dataset schema
Designed for:
- LLM context building
- dashboards (Grafana/Metabase/Superset)
- offline analytics (Python/pandas, DuckDB, BigQuery)
✅ Zero manual setup on Apify
No requirements.txt needed – runs on Apify’s managed Python runtime.
✅ Repeatable & auditable
Same input → same structure, easy to diff across runs as new papers appear on arXiv.
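Diffing two runs can be sketched as below. This is a minimal illustration, not part of the Actor: `diff_runs` is a hypothetical helper, it assumes each item carries the `arxiv_id` field from the dataset schema, and the IDs in the example are placeholders.

```python
from typing import Iterable


def diff_runs(old_items: Iterable[dict], new_items: Iterable[dict]) -> dict:
    """Compare two runs by arxiv_id and report which IDs changed."""
    old_ids = {item["arxiv_id"] for item in old_items}
    new_ids = {item["arxiv_id"] for item in new_items}
    return {
        "added": sorted(new_ids - old_ids),      # only in the newer run
        "removed": sorted(old_ids - new_ids),    # only in the older run
        "unchanged": sorted(old_ids & new_ids),  # present in both runs
    }


# Placeholder IDs, for illustration only.
old_run = [{"arxiv_id": "2507.12345"}, {"arxiv_id": "2512.21047"}]
new_run = [{"arxiv_id": "2512.21047"}, {"arxiv_id": "2601.00001"}]
print(diff_runs(old_run, new_run))
```

Comparing by ID rather than by title keeps the diff stable even if arXiv metadata such as the title or abstract is revised between runs.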

🧱 Dataset Schema

Each paper in the dataset has a consistent JSON structure:

{
  "id": 1,
  "title": "Device-Independent Anonymous Communication",
  "arxiv_id": "2512.21047",
  "full_category": "cs.CR (Internet Privacy)",
  "short_category": "internet_privacy",
  "authors": ["John Doe", "Jane Smith"],
  "url": "https://arxiv.org/abs/2512.21047",
  "pdf_url": "https://arxiv.org/pdf/2512.21047.pdf",
  "is_real_arxiv": true,
  "published": "2025-12-21",
  "updated": "2025-12-23",
  "abstract": "We propose a device-independent protocol for anonymous communication...",
  "source_run_id": "RUN_ID_FOR_AUDIT"
}
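Before feeding records into a pipeline, it can help to check them against this structure. The sketch below is one possible check, not something the Actor ships: `validate_record` and `REQUIRED_FIELDS` are hypothetical names, and the choice of required fields is an assumption based on the fields shown above.

```python
from urllib.parse import urlparse

# Assumption: these fields are treated as mandatory for downstream use.
REQUIRED_FIELDS = ("id", "title", "arxiv_id", "short_category", "url", "pdf_url")


def validate_record(record: dict) -> list[str]:
    """Return a list of problems found in one record (empty list means OK)."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in record]
    if problems:
        return problems
    # Both links should point back to arxiv.org.
    for key in ("url", "pdf_url"):
        if urlparse(record[key]).hostname != "arxiv.org":
            problems.append(f"{key} does not point to arxiv.org")
    # The abstract URL should contain the record's own arXiv ID.
    if record["arxiv_id"] not in record["url"]:
        problems.append("url does not contain arxiv_id")
    return problems


example = {
    "id": 1,
    "title": "Device-Independent Anonymous Communication",
    "arxiv_id": "2512.21047",
    "short_category": "internet_privacy",
    "url": "https://arxiv.org/abs/2512.21047",
    "pdf_url": "https://arxiv.org/pdf/2512.21047.pdf",
}
print(validate_record(example))
```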

📊 Apify Console Output Tabs

When you run Privacy Stack in Apify Console, the Output tab is split into multiple views (defined by the dataset schema):

📚 All Papers (5000) – full corpus merged
🌐 Internet Privacy (1250) – Tor, mixnets, I2P, traffic analysis
🔐 Crypto Privacy (1250) – ZK, FHE, MPC, crypto protocols
📊 Data Privacy (1250) – DP, FL, anonymization, re‑identification
⚛️ Post‑Quantum (1250) – Kyber, Dilithium, PQ anonymous systems
📋 Live Logs – scrape progress, dedup stats, category counts

Each view is sortable & filterable directly in the Apify Console, and also accessible as CSV/JSON via API.

📥 Sample Output Snippet

{
  "id": 42,
  "title": "Traffic Analysis Resistant Mix Networks for the Modern Internet",
  "arxiv_id": "2507.12345",
  "full_category": "cs.CR (Cryptography and Security)",
  "short_category": "internet_privacy",
  "authors": ["Alice Anon", "Bob Mixnet"],
  "url": "https://arxiv.org/abs/2507.12345",
  "pdf_url": "https://arxiv.org/pdf/2507.12345.pdf",
  "is_real_arxiv": true
}

⚙️ How It Works (High‑Level)

1. Input: categories + maximum papers per category (defaults to 1250 × 4).
2. Fetch arXiv feeds / search results for each category (cs.CR + keywords / sub-tags).
3. Normalize results into the unified schema:
- parse title, authors, IDs, URLs, dates, category strings
4. Deduplicate:
- global deduplication by arxiv_id
- ensure each category’s slice has only unique entries
5. Store into Apify Dataset with multiple views (all + per‑category).

The output structure is the same on every run, but the papers themselves will change as newer ones appear on arXiv.
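The deduplication step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the Actor's actual code: `dedupe_and_cap` is a hypothetical function, records are assumed to be already normalized, and `crypto_privacy` is an assumed category slug (only `internet_privacy` appears in the examples in this README).

```python
from collections import defaultdict


def dedupe_and_cap(records, max_per_category=1250):
    """Keep the first occurrence of each arxiv_id, up to a per-category cap."""
    seen_ids = set()
    by_category = defaultdict(list)
    for record in records:
        arxiv_id = record["arxiv_id"]
        category = record["short_category"]
        if arxiv_id in seen_ids:
            continue  # global deduplication: each ID appears once overall
        if len(by_category[category]) >= max_per_category:
            continue  # this category already holds its full quota
        seen_ids.add(arxiv_id)
        by_category[category].append(record)
    return dict(by_category)


records = [
    {"arxiv_id": "a", "short_category": "internet_privacy"},
    {"arxiv_id": "a", "short_category": "crypto_privacy"},  # duplicate ID, dropped
    {"arxiv_id": "b", "short_category": "internet_privacy"},
    {"arxiv_id": "c", "short_category": "internet_privacy"},  # over the cap of 2
]
result = dedupe_and_cap(records, max_per_category=2)
print({name: [r["arxiv_id"] for r in rows] for name, rows in result.items()})
```

In this sketch a paper that matches two categories stays in whichever category saw it first, which is one simple way to keep the global count free of duplicates.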

🚀 Quick Start

1. Run from Apify Console

Open: Privacy Stack Actor
https://console.apify.com/oblate_wildcat/privacy-stack
Set input (optional):
- maxPapersPerCategory: default 1250
- category toggles (if you want only 1–2 categories)
Click Run
When it finishes, open the Output tab:
- Browse All Papers
- Or switch to specific category views
Export as:
- JSON (items?clean=true)
- CSV (items?format=csv)
- HTML table (for quick browsing)
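The export options above map onto query parameters of the Apify dataset items endpoint. A small helper for building those URLs might look like this; `dataset_items_url` is a hypothetical name, and `<DATASET_ID>` is a placeholder you replace with the ID of your own run's dataset.

```python
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2/datasets"


def dataset_items_url(dataset_id: str, fmt: str = "json", clean: bool = True) -> str:
    """Build the items URL for a dataset in the requested export format."""
    params = {"format": fmt}
    if clean:
        params["clean"] = "true"  # same flag as in items?clean=true
    return f"{API_BASE}/{dataset_id}/items?{urlencode(params)}"


# <DATASET_ID> is a placeholder, not a real dataset.
print(dataset_items_url("<DATASET_ID>", fmt="csv"))
print(dataset_items_url("<DATASET_ID>", fmt="html", clean=False))
```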

2. Run via Apify CLI

$ apify run privacy-stack-research-scraper

This will:

- run the Actor locally
- store the dataset in ./storage/datasets/default/

You can then inspect OUTPUT.json or CSV in that folder.
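To load a local run's results into Python, something like the sketch below may work. It assumes the dataset folder contains JSON files holding either one record or a list of records, and that files whose names start with `__` are metadata rather than papers; both are assumptions about the local storage layout, so check the folder contents first. `load_local_items` is a hypothetical helper.

```python
import json
from pathlib import Path


def load_local_items(dataset_dir: str = "./storage/datasets/default") -> list[dict]:
    """Load every record found in the *.json files under dataset_dir."""
    items = []
    for path in sorted(Path(dataset_dir).glob("*.json")):
        if path.name.startswith("__"):
            continue  # assumption: such files hold metadata, not papers
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)
        # A file may hold a single record or a list of records.
        if isinstance(data, list):
            items.extend(data)
        else:
            items.append(data)
    return items
```

Calling `load_local_items()` with no arguments reads from the default folder named above and returns an empty list if no JSON files are found there.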

🧪 Example: Using the Dataset in Python

import pandas as pd
import requests

# Replace <DATASET_ID> with the ID of the dataset produced by your run.
DATASET_URL = "https://api.apify.com/v2/datasets/<DATASET_ID>/items?clean=true"

# Fetch all items as JSON; fail on HTTP errors or a stalled connection.
res = requests.get(DATASET_URL, timeout=60)
res.raise_for_status()
items = res.json()

df = pd.DataFrame(items)

# Example: show ZK papers (matched on the title)
zk_df = df[df["title"].str.contains("zero-knowledge", case=False, na=False)]
print(zk_df[["title", "arxiv_id", "url"]].head())

# Example: count papers per short_category
print(df["short_category"].value_counts())

🧠 Typical Use Cases

Literature review for PhD / MSc / paper writing
Quickly get 5000+ relevant cs.CR papers organized by topical area.
Benchmark building
Curate evaluation sets for LLMs, anonymization tools, or privacy frameworks.
Trend analysis
See how research volume changes over time in areas like ZK proofs or post‑quantum crypto.
Dataset for downstream models
Use title + abstract as input for topic modeling, embeddings, or semantic search.
Meta‑research
Study the evolution of anonymity, privacy‑preserving ML, and PQ crypto.
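As an example of the trend-analysis use case, the sketch below counts papers per publication year using only the standard library. It assumes each item has the `published` field in `YYYY-MM-DD` form, as in the dataset schema; `papers_per_year` is a hypothetical helper and the sample records are made up for illustration.

```python
from collections import Counter


def papers_per_year(items, short_category=None):
    """Count papers by publication year, optionally for a single category."""
    counts = Counter()
    for item in items:
        if short_category and item.get("short_category") != short_category:
            continue
        published = item.get("published")
        if not published:
            continue  # skip records without a date rather than guessing one
        counts[published[:4]] += 1  # first four characters are the year
    return dict(sorted(counts.items()))


# Made-up sample records, for illustration only.
sample = [
    {"short_category": "internet_privacy", "published": "2025-12-21"},
    {"short_category": "internet_privacy", "published": "2024-05-05"},
    {"short_category": "data_privacy", "published": "2025-01-02"},
    {"short_category": "data_privacy"},
]
print(papers_per_year(sample))
print(papers_per_year(sample, short_category="internet_privacy"))
```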

🔐 Design Principles

Real papers only – every record must correspond to a real arXiv entry.
Transparent scraping – URLs always point back to arxiv.org.
No guessing / hallucinating metadata – if arXiv does not provide it, it is not faked.
Reproducibility – the same input over the same time window yields a dataset with the same schema.

📦 Actor Input (Suggested Schema)

Typical input fields (simplified):

{
  "maxPapersPerCategory": 1250,
  "includeInternetPrivacy": true,
  "includeCryptoPrivacy": true,
  "includeDataPrivacy": true,
  "includePostQuantum": true
}

This could be extended in the future (e.g., year range, specific arXiv query strings, exclusion filters).
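One way to interpret such an input is to merge it over defaults and turn the toggles into a list of categories. The sketch below is illustrative only: `resolve_input` and `CATEGORY_FLAGS` are hypothetical names, and the category slugs other than `internet_privacy` are assumptions, since only that one appears in this README's examples.

```python
DEFAULT_INPUT = {
    "maxPapersPerCategory": 1250,
    "includeInternetPrivacy": True,
    "includeCryptoPrivacy": True,
    "includeDataPrivacy": True,
    "includePostQuantum": True,
}

# Assumed mapping from input toggles to short_category slugs.
CATEGORY_FLAGS = {
    "includeInternetPrivacy": "internet_privacy",
    "includeCryptoPrivacy": "crypto_privacy",
    "includeDataPrivacy": "data_privacy",
    "includePostQuantum": "post_quantum",
}


def resolve_input(raw_input=None):
    """Merge user input over the defaults and return (limit, categories)."""
    merged = {**DEFAULT_INPUT, **(raw_input or {})}
    limit = merged["maxPapersPerCategory"]
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(limit, bool) or not isinstance(limit, int) or limit < 1:
        raise ValueError("maxPapersPerCategory must be a positive integer")
    categories = [slug for flag, slug in CATEGORY_FLAGS.items() if merged[flag]]
    return limit, categories


print(resolve_input({"maxPapersPerCategory": 100, "includeDataPrivacy": False}))
```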

🧑‍💻 About the Author

Bikram Biswas (@BikramBiswas786)

Quantum & privacy tooling developer
Creator of Anon Lab (interactive privacy paper explorer)
Active on Apify building research‑grade Actors for security, privacy, and data aggregation.

Apify profile: https://apify.com/bikrambiswas

📄 Citation

If Privacy Stack helps in your work, you can cite it as:

@software{biswas2025privacystack,
  author = {Biswas, Bikram},
  title = {Privacy Stack: 5000 Real arXiv Privacy Papers for Researchers},
  year = {2025},
  url = {https://apify.com/oblate_wildcat/privacy-stack}
}

📝 License & Ethics

Use this dataset responsibly.
All papers belong to their respective authors and arXiv.
This Actor only organizes metadata and links; it does not strip or redistribute paywalled content.

Privacy Stack turns scattered security & privacy literature into a single, structured research surface you can actually build on.
Run it, export it, and plug it straight into your research pipeline.
