Mixnet-paper-scraper

Pricing: from $0.10 / 1,000 results


Scrapes academic papers on mixnets, Nym, and privacy technology from arXiv and a curated set of verified research sources. Filters by keyword and year and returns each paper's title, authors, abstract, publication year, and PDF link. Built for privacy researchers and developers. Uses the arXiv API, with verified Nym papers as a fallback.


Rating: 5.0 (1 rating)

Developer: Bikram Biswas (Maintained by Community)

Actor stats: 0 bookmarked, 9 total users, 1 monthly active user, last modified 18 days ago


Apify Actor $1M Challenge Participant

Crafted by Bikram Biswas | Private Actor | Created: December 2025 | Version: 0.0.13

This Actor collects academic papers on mixnets, Nym, and privacy technologies from arXiv (via the official API) and a curated set of verified sources. It filters by keyword and year, optionally adds rule-based "AI" insights, and outputs structured data: titles, authors, abstracts, PDF links, and relevance scores. It is aimed at researchers building anonymity tools or analyzing the Nym ecosystem.

Key Features

  • API-Driven Sourcing: Queries the arXiv cs.CR (Cryptography and Security) category with pagination to pick up recent papers.
  • Curated Fallback: Includes ~70 verified Nym/Mixnet papers to fill gaps in arXiv coverage.
  • Filtering & Customization: Filter by keyword, year, and maximum result count; AI enhancements are optional.
  • Output: JSON records in an Apify Dataset, with a schema for tabular Console views.
  • Compliance: Uses the official arXiv API rather than scraping web pages.
  • Performance: Handles 200+ results per run, typically in under 10 seconds.
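As a rough illustration of paginated sourcing from arXiv cs.CR, here is a standard-library sketch that builds one page of an arXiv API query and parses the Atom feed it returns. The Actor's own source is not shown here, so the helper names (build_query_url, parse_feed), the page size, the sort order, and taking the year from the first-version `published` date are all assumptions.

```python
import urllib.parse
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
NS = {"atom": "http://www.w3.org/2005/Atom"}  # arXiv answers with Atom 1.0 feeds


def build_query_url(keyword: str, start: int = 0, page_size: int = 50) -> str:
    """URL for one page of cs.CR results matching keyword (hypothetical helper)."""
    params = {
        "search_query": f'cat:cs.CR AND all:"{keyword}"',
        "start": start,            # offset of the first result on this page
        "max_results": page_size,  # results per page
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def _text(node: ET.Element, tag: str) -> str:
    """Child text with arXiv's hard line breaks collapsed to single spaces."""
    return " ".join(node.findtext(tag, default="", namespaces=NS).split())


def parse_feed(xml_text: str) -> list[dict]:
    """Turn an arXiv Atom feed into plain paper dicts (hypothetical helper)."""
    papers = []
    for entry in ET.fromstring(xml_text).findall("atom:entry", NS):
        pdf_links = [link.get("href", "") for link in entry.findall("atom:link", NS)
                     if link.get("title") == "pdf"]
        papers.append({
            "title": _text(entry, "atom:title"),
            "authors": ", ".join(_text(a, "atom:name")
                                 for a in entry.findall("atom:author", NS)),
            "abstract": _text(entry, "atom:summary"),
            "year": _text(entry, "atom:published")[:4],  # assumption: v1 date
            "link": _text(entry, "atom:id"),
            "pdf_url": pdf_links[0] if pdf_links else "",
            "source": "arXiv",
        })
    return papers
```

Pagination then amounts to stepping `start` by the page size, e.g. `[build_query_url("mixnet", s) for s in range(0, 150, 50)]`, and fetching each URL in turn.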
<h2>How It Works</h2>
<ol>
<li><strong>Input Parsing</strong>: Reads the keyword, year, and other settings from the JSON input.</li>
<li><strong>arXiv Query</strong>: Searches arXiv page by page and parses the returned Atom XML for metadata.</li>
<li><strong>Integration</strong>: Merges the results with the verified papers and removes duplicates by URL.</li>
<li><strong>Processing</strong>: Filters and sorts by relevance, adding AI gists if enabled.</li>
<li><strong>Output</strong>: Pushes the results to the Dataset for access and export.</li>
</ol>
<p><em>Note</em>: AI features use simple rule-based logic (e.g., string matching) and make no external LLM calls, which keeps runs fast.</p>
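<p>The Integration and Processing steps, together with the string-matching note above, could look roughly like the sketch below. It is a hypothetical illustration, not the Actor's code: the helper names, the dedupe key (<code>pdf_url</code>, falling back to <code>link</code>), and the score (share of query words found in title plus abstract) are assumptions.</p>

```python
def relevance(paper: dict, terms: list[str]) -> tuple[float, list[str]]:
    """Share of query terms that appear (as substrings) in title + abstract."""
    text = f"{paper.get('title', '')} {paper.get('abstract', '')}".lower()
    matched = [term for term in terms if term in text]
    return (len(matched) / len(terms) if terms else 0.0), matched


def process(arxiv: list[dict], verified: list[dict], keyword: str = "mixnet",
            year: str = "", max_results: int = 150) -> list[dict]:
    """Merge sources, dedupe by URL, filter by keyword/year, score, sort, cap."""
    seen: set[str] = set()
    merged = []
    for paper in arxiv + verified:  # arXiv first, so it wins on duplicates
        url = paper.get("pdf_url") or paper.get("link") or ""
        if url in seen:
            continue
        if url:
            seen.add(url)  # papers without any URL are kept, never deduped
        merged.append(paper)

    terms = keyword.lower().split()
    results = []
    for paper in merged:
        if year and paper.get("year") != year:
            continue
        score, matched = relevance(paper, terms)
        if terms and not matched:
            continue  # no query word found: drop the paper
        results.append({**paper, "relevance_score": round(score, 2),
                        "keywords_matched": matched})

    results.sort(key=lambda p: p["relevance_score"], reverse=True)
    return results[:max_results]
```

<p>Plain substring matching is cheap but loose: "nym" also matches inside "anonymity". Matching each term with a <code>\b</code> word-boundary regular expression would be stricter.</p>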
<h2>Input Configuration</h2>
<p>Configure the Actor in Apify Console or through the API. Input is a JSON object, and every field is optional with a default:</p>
<table>
<thead>
<tr>
<th>Field</th>
<th>Type</th>
<th>Required</th>
<th>Default</th>
<th>Description</th>
<th>Example</th>
</tr>
</thead>
<tbody>
<tr>
<td>keyword</td>
<td>string</td>
<td>No</td>
<td>"mixnet"</td>
<td>Search term (lowercase; multi-word phrases allowed). Matched against titles and abstracts.</td>
<td>"nym privacy"</td>
</tr>
<tr>
<td>year</td>
<td>string</td>
<td>No</td>
<td>"" (all years)</td>
<td>Exact 4-digit year filter.</td>
<td>"2024"</td>
</tr>
<tr>
<td>max_results</td>
<td>integer</td>
<td>No</td>
<td>150</td>
<td>Maximum number of papers to return (1-200).</td>
<td>50</td>
</tr>
<tr>
<td>include_summary</td>
<td>boolean</td>
<td>No</td>
<td>true</td>
<td>Enable AI gists/relevance scores.</td>
<td>false</td>
</tr>
<tr>
<td>proxyConfiguration</td>
<td>object</td>
<td>No</td>
<td>{ "useApifyProxy": false }</td>
<td>Proxy settings for API calls.</td>
<td>{ "useApifyProxy": true }</td>
</tr>
</tbody>
</table>
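<p>The defaults and limits in this table translate directly into a small input parser. This is a sketch under assumptions: the <code>ScraperInput</code> class and <code>parse_input</code> helper are hypothetical, and clamping out-of-range <code>max_results</code> (rather than rejecting it) is a design choice, not documented behavior. Inside an Actor, the raw dict would typically come from <code>await Actor.get_input()</code> in the Apify Python SDK.</p>

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ScraperInput:
    """Typed view of the input table (hypothetical)."""
    keyword: str = "mixnet"
    year: str = ""  # "" means all years
    max_results: int = 150
    include_summary: bool = True
    proxyConfiguration: dict = field(default_factory=lambda: {"useApifyProxy": False})


def parse_input(raw: Optional[dict]) -> ScraperInput:
    """Apply the documented defaults and basic validation to a raw input dict."""
    raw = raw or {}
    year = str(raw.get("year", "")).strip()
    if year and not (len(year) == 4 and year.isdigit()):
        raise ValueError(f"year must be a 4-digit string, got {year!r}")
    max_results = int(raw.get("max_results", 150))
    return ScraperInput(
        keyword=str(raw.get("keyword", "mixnet")).strip().lower() or "mixnet",
        year=year,
        max_results=min(max(max_results, 1), 200),  # documented range 1-200
        include_summary=bool(raw.get("include_summary", True)),
        proxyConfiguration=raw.get("proxyConfiguration") or {"useApifyProxy": False},
    )
```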
<p><strong>Full Input Schema</strong> (for advanced users):</p>
<pre><code>{
"title": "Mixnet-Paper-Scraper Input",
"type": "object",
"schemaVersion": 1,
"properties": { ... } // one entry per field in the table above
}</code></pre>
<h2>Usage Examples</h2>
<h3>Console Run</h3>
<p>In Apify Console, open the <em>Input</em> tab, enter the JSON below, and click <em>Start</em>.</p>
<pre><code>{
"keyword": "post-quantum mixnet",
"year": "2025",
"max_results": 20,
"include_summary": true
}</code></pre>
<h3>API Call</h3>
<p>Start a run programmatically with cURL or an Apify API client:</p>
<pre><code>curl -X POST \
'https://api.apify.com/v2/acts/bikrambiswas~mixnet-paper-scraper/runs?token=YOUR_APIFY_TOKEN' \
-H 'Content-Type: application/json' \
-d '{
"keyword": "nym anonymity",
"max_results": 100
}'</code></pre>
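<p>For a scripted alternative to cURL, here is a standard-library Python sketch that prepares a call to Apify's <code>run-sync-get-dataset-items</code> endpoint, which starts the Actor, waits for the run to finish, and returns the dataset items in a single response. The helper names are hypothetical; the token is sent in an <code>Authorization: Bearer</code> header instead of the query string.</p>

```python
import json
import urllib.request

# Actor ID as used in the cURL example above.
ACTOR_ID = "bikrambiswas~mixnet-paper-scraper"
API_URL = f"https://api.apify.com/v2/acts/{ACTOR_ID}/run-sync-get-dataset-items"


def build_run_request(token: str, run_input: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST that runs the Actor with run_input."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # keeps the token out of URLs and logs
        },
        method="POST",
    )


def run_and_fetch(token: str, run_input: dict) -> list[dict]:
    """Send the request and return the dataset items as a list of dicts."""
    with urllib.request.urlopen(build_run_request(token, run_input), timeout=300) as resp:
        return json.load(resp)
```

<p>Usage: <code>papers = run_and_fetch(os.environ["APIFY_TOKEN"], {"keyword": "nym anonymity", "max_results": 100})</code>.</p>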
<h3>Integration with Zapier/Make</h3>
<p>Send the output Dataset to Google Sheets: trigger the zap or scenario when a run finishes, then map fields such as <code>title</code> and <code>pdf_url</code> to columns.</p>
<h3>Local Development</h3>
<p>Pull with Apify CLI:</p>
<pre><code>apify pull bikrambiswas/mixnet-paper-scraper</code></pre>
<p>Edit the code locally (e.g., in VS Code), then deploy your changes with <code>apify push</code>.</p>
<h2>Output Example</h2>
<p>Results appear in the Dataset tab, displayed as a table via the output schema. Sample item:</p>
<pre><code>{
"title": "Outfox: a Postquantum Packet Format for Layered Mixnets",
"authors": "Alfredo Rial, Ania M. Piotrowska, Harry Halpin",
"abstract": "Post-quantum secure packet format...",
"gist": "AI insight: Enhances mixnet security against quantum attacks.",
"arxiv_id": "2412.19937",
"pdf_url": "https://arxiv.org/pdf/2412.19937.pdf",
"link": "https://arxiv.org/abs/2412.19937",
"year": "2025",
"source": "arXiv",
"keywords_matched": ["mixnet", "post-quantum"],
"research_area": "Cryptography",
"relevance_score": 0.99,
"scraped_at": "2025-12-29T00:00:00Z"
}</code></pre>
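<p>The sample's <code>pdf_url</code> and <code>link</code> follow arXiv's standard URL patterns, so an item can be assembled from the arXiv ID alone. This is a minimal sketch with a hypothetical <code>make_item</code> helper. It assumes the 300-character abstract truncation that the output schema describes and the second-resolution UTC timestamp format shown in the sample.</p>

```python
from datetime import datetime, timezone


def make_item(arxiv_id: str, title: str, authors: list[str], abstract: str,
              year: str, source: str = "arXiv") -> dict:
    """Assemble one dataset item shaped like the sample (hypothetical helper)."""
    return {
        "title": title,
        "authors": ", ".join(authors),  # comma-separated, as in the sample
        "abstract": abstract[:300],     # schema describes a 300-char truncation
        "arxiv_id": arxiv_id,
        "pdf_url": f"https://arxiv.org/pdf/{arxiv_id}.pdf",
        "link": f"https://arxiv.org/abs/{arxiv_id}",
        "year": year,
        "source": source,
        "scraped_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```

<p>In Actor code, each item would then be stored with <code>await Actor.push_data(item)</code> from the Apify Python SDK.</p>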
<h2>Output Schema</h2>
<p>Defined in <code>.actor/output_schema.json</code> for organized Console tables. Add this to your Actor source:</p>
<pre><code>{
"actorOutputSchemaVersion": 1,
"title": "Mixnet-Paper-Scraper Output",
"description": "Array of paper objects with metadata and insights. Displays as table in Apify Console.",
"type": "array",
"items": {
"type": "object",
"properties": {
"title": {
"type": "string",
"title": "Title",
"description": "Paper's full title."
},
"authors": {
"type": "string",
"title": "Authors",
"description": "Comma-separated author names."
},
"abstract": {
"type": "string",
"title": "Abstract",
"description": "Truncated summary (300 chars)."
},
"gist": {
"type": "string",
"title": "AI Gist",
"description": "Concise AI-generated insight (if enabled)."
},
"arxiv_id": {
"type": "string",
"title": "arXiv ID",
"description": "Unique arXiv identifier."
},
"pdf_url": {
"type": "string",
"title": "PDF URL",
"description": "Direct PDF download link."
},
"link": {
"type": "string",
"title": "Abstract Link",
"description": "URL to abstract page."
},
"year": {
"type": "string",
"title": "Year",
"description": "Publication year."
},
"source": {
"type": "string",
"title": "Source",
"description": "e.g., 'arXiv' or 'Nym Verified'."
},
"keywords_matched": {
"type": "array",
"title": "Matched Keywords",
"description": "Input keywords found in paper."
},
"research_area": {
"type": "string",
"title": "Research Area",
"description": "Categorized field (e.g., 'Anonymity')."
},
"relevance_score": {
"type": "number",
"title": "Relevance Score",
"description": "0-1 score for keyword match."
},
"scraped_at": {
"type": "string",
"title": "Scraped At",
"description": "ISO UTC timestamp."
}
},
"required": ["title", "authors", "year", "pdf_url"]
}
}</code></pre>
<p><strong>Why This Schema?</strong>: It gives every field a clear name and type, so the Console can show results as a readable table instead of raw JSON. Most Actors that produce tabular data benefit from one.</p>
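<p>To catch malformed items before they reach the Dataset, here is a dependency-free sketch of a <code>schema_errors</code> check. It enforces the four required fields, the declared types, and the 0-1 range of <code>relevance_score</code>. It is a hand-rolled approximation of the schema above, not Apify's own validation; a JSON Schema library could do the same job.</p>

```python
# Python types for each property in the output schema.
FIELD_TYPES = {
    "title": str, "authors": str, "abstract": str, "gist": str,
    "arxiv_id": str, "pdf_url": str, "link": str, "year": str,
    "source": str, "keywords_matched": list, "research_area": str,
    "relevance_score": (int, float), "scraped_at": str,
}
REQUIRED = ("title", "authors", "year", "pdf_url")


def schema_errors(item: dict) -> list[str]:
    """List schema problems in one dataset item; empty means it conforms."""
    errors = [f"missing required field: {key}" for key in REQUIRED if key not in item]
    for key, value in item.items():
        expected = FIELD_TYPES.get(key)
        if expected is None:
            continue  # the schema does not forbid extra fields
        # bool is a subclass of int, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f"{key} has type {type(value).__name__}")
    score = item.get("relevance_score")
    if isinstance(score, (int, float)) and not isinstance(score, bool) and not 0 <= score <= 1:
        errors.append("relevance_score outside 0-1")
    return errors
```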
<h2>Troubleshooting</h2>
<ul>
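<li><strong>No Results</strong>: Check the keyword and year; try a broader keyword or clear the year filter.</li>
<li><strong>API Errors</strong>: Often caused by arXiv rate limits; wait and retry, or enable the proxy in <code>proxyConfiguration</code>.</li>
<li><strong>Issues</strong>: 0 open at the time of writing; report new problems through the Actor's Issues tab in Apify Store.</li>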
</ul>
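<p>arXiv's API user manual asks clients to leave about three seconds between consecutive requests, so pacing calls complements the proxy option. Below is a minimal sketch of a hypothetical <code>PoliteCaller</code> that enforces that spacing and retries transient failures with exponential backoff. It retries <code>OSError</code>, which covers urllib's <code>URLError</code> and <code>HTTPError</code>; httpx raises its own exception types, which this sketch does not catch. The clock and sleep functions are injectable so the timing can be tested.</p>

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class PoliteCaller:
    """Keep calls >= interval seconds apart; retry OSError with exponential backoff."""

    def __init__(self, interval: float = 3.0, retries: int = 3,
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval = interval
        self.retries = retries
        self.sleep = sleep
        self.clock = clock
        self._last: Optional[float] = None  # start time of the previous call

    def call(self, fn: Callable[[], T]) -> T:
        for attempt in range(self.retries + 1):
            if self._last is not None:
                wait = self.interval - (self.clock() - self._last)
                if wait > 0:
                    self.sleep(wait)  # honour the minimum spacing
            self._last = self.clock()
            try:
                return fn()
            except OSError:  # covers urllib.error.URLError and HTTPError
                if attempt == self.retries:
                    raise
                self.sleep(self.interval * 2 ** attempt)  # 3 s, 6 s, 12 s, ...
        raise RuntimeError("unreachable")
```

<p>Usage: <code>caller.call(lambda: urllib.request.urlopen(url, timeout=30).read())</code> for each page URL.</p>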
<h2>Resources & Community</h2>
<ul>
<li><a href="https://arxiv.org/help/api/user-manual">arXiv API Docs</a></li>
<li><a href="https://nymtech.net">Nym Network</a></li>
<li><a href="https://docs.apify.com">Apify Docs</a></li>
<li>Join: Reddit r/privacy, Discord Nym channels.</li>
</ul>
# Mixnet Paper Scraper

A robust Apify Actor to scrape research papers from arXiv related to mix networks and anonymous communication systems. This scraper collects paper metadata, PDF links, and provides basic relevance scores and gists. Designed to fetch up to 400 papers using focused keywords.


## 🚀 Features

  • Fetches papers from arXiv API.
  • Keywords used for search:
    • mixnet
    • mix network
    • nym mixnet
  • Metadata collected per paper:
    • Title
    • Authors (up to 5)
    • Year of publication
    • PDF URL
    • Relevance score (0.5–1.0)
    • Gist (summary of research focus)
    • Scraped timestamp
  • Automatically deduplicates papers by PDF URL.
  • Fully asynchronous using httpx and Apify SDK.
  • Hardcoded to fetch up to 400 papers, depending on availability.
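The feature list above can be sketched with asyncio: query the three keywords concurrently, drop duplicate PDF URLs, keep at most five authors, and stop at 400 papers. This is an assumption-laden sketch, not the Actor's code: `collect` and `Fetcher` are hypothetical names, and the fetcher is injected so the example runs without network access (in the real Actor it would presumably wrap an `httpx.AsyncClient` request). Because the three queries fire at once, a real fetcher against arXiv would still need to respect arXiv's request spacing.

```python
import asyncio
from typing import Awaitable, Callable

KEYWORDS = ["mixnet", "mix network", "nym mixnet"]
MAX_PAPERS = 400
MAX_AUTHORS = 5

# A fetcher maps one keyword to a list of paper dicts (authors as a list).
Fetcher = Callable[[str], Awaitable[list[dict]]]


async def collect(fetch: Fetcher) -> list[dict]:
    """Query every keyword concurrently, dedupe by PDF URL and cap the total."""
    batches = await asyncio.gather(*(fetch(k) for k in KEYWORDS))  # keeps keyword order
    seen: set[str] = set()
    papers: list[dict] = []
    for batch in batches:
        for paper in batch:
            url = paper.get("pdf_url")
            if not url or url in seen:
                continue
            seen.add(url)
            authors = paper.get("authors", [])[:MAX_AUTHORS]
            papers.append({**paper, "authors": ", ".join(authors)})
            if len(papers) >= MAX_PAPERS:
                return papers
    return papers
```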

## 📦 Installation

  1. Clone or download the repository.
  2. Ensure Python >= 3.13 is installed.
  3. Install required dependencies:
     $ pip install apify httpx