Pricing

from $0.10 / 1,000 results

Mixnet-paper-scraper

Scrapes academic papers on mixnets, Nym, and privacy technology from arXiv and verified research sources. Filters by keyword and year, and returns each paper's title, authors, abstract, publication year, and PDF link. Aimed at privacy researchers and developers. Uses the arXiv API, with a curated set of Nym papers as a fallback.

Rating

5.0

(1)

Developer

Bikram Biswas

2 months ago

Last modified

Key Features

API-Driven Sourcing: Fetches from the arXiv cs.CR category, paginating to pick up recent results.
Curated Fallback: Includes ~70 verified Nym/Mixnet papers to broaden coverage.
Filtering & Customization: Filter by keyword, year, and maximum result count; AI enhancements are optional.
Output: JSON items in an Apify Dataset, with a schema for tabular views in the Console.
Compliance: Uses the arXiv API rather than scraping pages.
Performance: Handles 200+ results efficiently (runs finish in under 10 seconds).

<h2>How It Works</h2>
<ol>
  <li><strong>Input Parsing</strong>: Reads keyword, year, etc., from JSON input.</li>
  <li><strong>arXiv Query</strong>: Searches with pagination, parses XML for metadata.</li>
  <li><strong>Integration</strong>: Merges with verified papers, dedupes by URL.</li>
  <li><strong>Processing</strong>: Filters, sorts by relevance, adds AI gists if enabled.</li>
  <li><strong>Output</strong>: Pushes to Dataset for easy access/export.</li>
</ol>
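
<p>The arXiv query and merge steps above can be sketched roughly as follows. This is an illustrative sketch, not the Actor's actual source: the function names, the exact search query, and the use of <code>link</code> as the dedupe key are assumptions. Only the arXiv API endpoint and its Atom feed layout are standard.</p>

```python
import urllib.parse
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # XML namespace used by arXiv API feeds


def build_query_url(keyword: str, start: int = 0, page_size: int = 50) -> str:
    """Build one paginated arXiv API URL (the query shape is an assumption)."""
    params = {
        "search_query": f'cat:cs.CR AND all:"{keyword}"',
        "start": start,            # offset of the first result on this page
        "max_results": page_size,  # number of results per page
    }
    return "http://export.arxiv.org/api/query?" + urllib.parse.urlencode(params)


def parse_feed(xml_text: str) -> list[dict]:
    """Extract basic metadata from an arXiv Atom response."""
    papers = []
    for entry in ET.fromstring(xml_text).iter(f"{ATOM}entry"):
        papers.append({
            # Collapse the line breaks arXiv inserts into titles and abstracts.
            "title": " ".join(entry.findtext(f"{ATOM}title", default="").split()),
            "authors": ", ".join(
                author.findtext(f"{ATOM}name", default="").strip()
                for author in entry.iter(f"{ATOM}author")
            ),
            "abstract": " ".join(entry.findtext(f"{ATOM}summary", default="").split()),
            "year": entry.findtext(f"{ATOM}published", default="")[:4],
            "link": entry.findtext(f"{ATOM}id", default="").strip(),
            "source": "arXiv",
        })
    return papers


def merge_dedupe(api_papers: list[dict], curated_papers: list[dict]) -> list[dict]:
    """Merge both lists, keeping the first paper seen for each URL."""
    seen, merged = set(), []
    for paper in [*api_papers, *curated_papers]:
        if paper["link"] not in seen:
            seen.add(paper["link"])
            merged.append(paper)
    return merged
```

<p>Exact URL comparison is the simplest form of deduplication. arXiv entry IDs usually carry a version suffix (such as <code>v1</code>), so a real implementation would probably normalize URLs before comparing them.</p>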

<p><em>Note</em>: The AI features use simple logic (e.g., string matching). No external LLM is called, which keeps runs fast.</p>
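
<p>As a rough idea of what string-matching relevance could look like, here is a minimal sketch. The weighting (a title hit counts double) and the function names are hypothetical and not taken from the Actor. Only the <code>keywords_matched</code> and <code>relevance_score</code> field names come from the sample output.</p>

```python
def score_paper(paper: dict, keyword: str) -> dict:
    """Annotate a paper with matched terms and a naive relevance score."""
    terms = keyword.lower().split()
    title = paper.get("title", "").lower()
    abstract = paper.get("abstract", "").lower()
    # A term matches if it occurs as a substring of the title or abstract.
    matched = [t for t in terms if t in title or t in abstract]
    # Hypothetical weighting: a title hit is worth 2 points, abstract-only 1.
    points = sum(2 if t in title else 1 for t in matched)
    score = round(points / (2 * len(terms)), 2) if terms else 0.0
    return {**paper, "keywords_matched": matched, "relevance_score": score}


def rank_papers(papers: list[dict], keyword: str) -> list[dict]:
    """Score every paper and sort by descending relevance."""
    scored = [score_paper(p, keyword) for p in papers]
    return sorted(scored, key=lambda p: p["relevance_score"], reverse=True)
```

<p>Substring matching is cheap and deterministic, but it misses spelling variants: under this sketch "post-quantum" does not match a title that says "Postquantum".</p>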

<h2>Input Configuration</h2>
<p>Configure via Apify Console or API. Inputs are JSON-based with defaults. Below is a table of fields:</p>

<table>
  <thead>
    <tr>
      <th>Field</th>
      <th>Type</th>
      <th>Required</th>
      <th>Default</th>
      <th>Description</th>
      <th>Example</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>keyword</td>
      <td>string</td>
      <td>No</td>
      <td>"mixnet"</td>
      <td>Search term (lowercase, phrases OK). Filters titles/abstracts.</td>
      <td>"nym privacy"</td>
    </tr>
    <tr>
      <td>year</td>
      <td>string</td>
      <td>No</td>
      <td>"" (all years)</td>
      <td>Exact 4-digit year filter.</td>
      <td>"2024"</td>
    </tr>
    <tr>
      <td>max_results</td>
      <td>integer</td>
      <td>No</td>
      <td>150</td>
      <td>Max papers (1-200).</td>
      <td>50</td>
    </tr>
    <tr>
      <td>include_summary</td>
      <td>boolean</td>
      <td>No</td>
      <td>true</td>
      <td>Enable AI gists/relevance scores.</td>
      <td>false</td>
    </tr>
    <tr>
      <td>proxyConfiguration</td>
      <td>object</td>
      <td>No</td>
      <td>{ "useApifyProxy": false }</td>
      <td>Proxy settings for API calls.</td>
      <td>{ "useApifyProxy": true }</td>
    </tr>
  </tbody>
</table>
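
<p>The defaults and limits in the table could be applied with a helper along these lines. This is a sketch under stated assumptions: <code>normalize_input</code> is a hypothetical name, and clamping <code>max_results</code> into the 1-200 range (rather than rejecting out-of-range values) is a guess at the behavior, not something the Actor documents.</p>

```python
from typing import Optional

# Defaults copied from the input table above.
DEFAULTS = {
    "keyword": "mixnet",
    "year": "",
    "max_results": 150,
    "include_summary": True,
    "proxyConfiguration": {"useApifyProxy": False},
}


def normalize_input(raw: Optional[dict]) -> dict:
    """Fill in defaults and validate the documented constraints."""
    cfg = {**DEFAULTS, **(raw or {})}
    cfg["keyword"] = str(cfg["keyword"]).strip().lower()

    year = str(cfg["year"]).strip()
    if year and not (len(year) == 4 and year.isdigit()):
        raise ValueError(f"year must be a 4-digit string, got {year!r}")
    cfg["year"] = year

    # Assumption: out-of-range values are clamped rather than rejected.
    cfg["max_results"] = max(1, min(200, int(cfg["max_results"])))
    cfg["include_summary"] = bool(cfg["include_summary"])
    return cfg
```

<p>For example, <code>normalize_input({"keyword": " Nym Privacy ", "max_results": 500})</code> yields the keyword <code>"nym privacy"</code> and a <code>max_results</code> of 200, with the remaining fields taken from the defaults.</p>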

<p><strong>Full Input Schema</strong> (for advanced users):</p>
<pre><code>{
  "title": "Mixnet-Paper-Scraper Input",
  "type": "object",
  "schemaVersion": 1,
  "properties": { ... }  // property definitions omitted
}</code></pre>

<h2>Usage Examples</h2>
<h3>Console Run</h3>
<p>In Apify Console: Go to <em>Input</em> tab, enter JSON, and run.</p>
<pre><code>{
  "keyword": "post-quantum mixnet",
  "year": "2025",
  "max_results": 20,
  "include_summary": true
}</code></pre>

<h3>API Call</h3>
<p>Use cURL or SDK to run programmatically:</p>
<pre><code>curl -X POST \
  'https://api.apify.com/v2/acts/bikrambiswas~mixnet-paper-scraper/runs?token=YOUR_APIFY_TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{
    "keyword": "nym anonymity",
    "max_results": 100
  }'</code></pre>
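
<p>The same call can be made from Python with only the standard library. The sketch below mirrors the cURL request but passes the token in an <code>Authorization: Bearer</code> header instead of the query string. The helper names are illustrative, and error handling and polling for run completion are left out.</p>

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_ID = "bikrambiswas~mixnet-paper-scraper"  # same ID as in the cURL call


def build_run_request(token: str, run_input: dict) -> urllib.request.Request:
    """Prepare the POST request that starts an Actor run."""
    return urllib.request.Request(
        url=f"{API_BASE}/acts/{ACTOR_ID}/runs",
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def start_run(token: str, run_input: dict) -> dict:
    """Send the request and return the run object from the response."""
    request = build_run_request(token, run_input)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["data"]
```

<p>Calling <code>start_run("YOUR_APIFY_TOKEN", {"keyword": "nym anonymity", "max_results": 100})</code> returns the run object, whose <code>defaultDatasetId</code> identifies the Dataset holding the results.</p>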

<h3>Integration with Zapier/Make</h3>
<p>Connect output Dataset to Google Sheets: Trigger on run finish, map fields like title/PDF.</p>

<h3>Local Development</h3>
<p>Pull with Apify CLI:</p>
<pre><code>apify pull bikrambiswas/mixnet-paper-scraper</code></pre>
<p>Edit in VS Code, push updates.</p>

<h2>Output Example</h2>
<p>Results in Dataset tab (tabular view via schema). Sample item:</p>
<pre><code>{
  "title": "Outfox: a Postquantum Packet Format for Layered Mixnets",
  "authors": "Alfredo Rial, Ania M. Piotrowska, Harry Halpin",
  "abstract": "Post-quantum secure packet format...",
  "gist": "AI insight: Enhances mixnet security against quantum attacks.",
  "arxiv_id": "2412.19937",
  "pdf_url": "https://arxiv.org/pdf/2412.19937.pdf",
  "link": "https://arxiv.org/abs/2412.19937",
  "year": "2025",
  "source": "arXiv",
  "keywords_matched": ["mixnet", "post-quantum"],
  "research_area": "Cryptography",
  "relevance_score": 0.99,
  "scraped_at": "2025-12-29T00:00:00Z"
}</code></pre>
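
<p>Items shaped like the sample above are easy to post-process once downloaded. As one possibility, the sketch below flattens a list of items into CSV with the standard library. The column selection and the <code>items_to_csv</code> name are illustrative choices, not part of the Actor.</p>

```python
import csv
import io

# Columns to export; any subset of the item fields would work.
FIELDS = ("title", "authors", "year", "pdf_url", "relevance_score")


def items_to_csv(items: list[dict], fields: tuple = FIELDS) -> str:
    """Render dataset items as CSV text, leaving missing fields blank."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(fields))
    writer.writeheader()
    for item in items:
        # Keep only the selected columns; absent keys become empty cells.
        writer.writerow({name: item.get(name, "") for name in fields})
    return buffer.getvalue()
```

<p>Fields that contain commas, such as <code>authors</code>, are quoted automatically by the <code>csv</code> module, so the output opens cleanly in spreadsheet tools.</p>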

<h2>Output Schema</h2>
<p>Defined in <code>.actor/output_schema.json</code> for organized Console tables. Add this to your Actor source:</p>
<pre><code>{
  "actorOutputSchemaVersion": 1,
  "title": "Mixnet-Paper-Scraper Output",
  "description": "Array of paper objects with metadata and insights. Displays as table in Apify Console.",
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "title": {
        "type": "string",
        "title": "Title",
        "description": "Paper's full title."
      },
      "authors": {
                                                                                                                                                                                                                                                                                                                                                                                                                                "type": "string",
                                                                                                                                                                                                                                                                                                                                                                                                                                        "title": "Authors",
                                                                                                                                                                                                                                                                                                                                                                                                                                                "description": "Comma-separated author names."
                                                                                                                                                                                                                                                                                                                                                                                                                                                      },
                                                                                                                                                                                                                                                                                                                                                                                                                                                            "abstract": {
                                                                                                                                                                                                                                                                                                                                                                                                                                                                    "type": "string",
                                                                                                                                                                                                                                                                                                                                                                                                                                                                            "title": "Abstract",
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    "description": "Truncated summary (300 chars)."
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          },
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                "gist": {
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        "type": "string",
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                "title": "AI Gist",
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        "description": "Concise AI-generated insight (if enabled)."
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              },
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    "arxiv_id": {
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            "type": "string",
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    "title": "arXiv ID",
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            "description": "Unique arXiv identifier."
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  },
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        "pdf_url": {
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                "type": "string",
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        "title": "PDF URL",
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                "description": "Direct PDF download link."
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      },
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            "link": {
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    "type": "string",
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            "title": "Abstract Link",
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    "description": "URL to abstract page."
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          },
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                "year": {
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        "type": "string",
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                "title": "Year",
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        "description": "Publication year."
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              },
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    "source": {
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            "type": "string",
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    "title": "Source",
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            "description": "e.g., 'arXiv' or 'Nym Verified'."
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  },
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        "keywords_matched": {
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                "type": "array",
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        "title": "Matched Keywords",
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                "description": "Input keywords found in paper."
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      },
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            "research_area": {
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    "type": "string",
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            "title": "Research Area",
        "description": "Categorized field (e.g., 'Anonymity')."
      },
      "relevance_score": {
        "type": "number",
        "title": "Relevance Score",
        "description": "0-1 score for keyword match."
      },
      "scraped_at": {
        "type": "string",
        "title": "Scraped At",
        "description": "ISO UTC timestamp."
      }
    },
    "required": ["title", "authors", "year", "pdf_url"]
  }
}</code></pre>

<p><strong>Why This Schema?</strong> It gives every field a clear name and type, so the Apify Console can render the dataset as a table instead of raw JSON. Any Actor benefits from declaring one, because the output becomes readable without post-processing.</p>
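
<p>As an illustration of how the schema above can be used downstream, here is a minimal sketch that checks one dataset item against the <code>required</code> list and the two field types visible in the schema. The helper name <code>check_record</code> and the sample values are hypothetical; they are not part of the Actor.</p>

```python
from datetime import datetime, timezone

# Required keys, copied from the schema's "required" list.
REQUIRED_FIELDS = ("title", "authors", "year", "pdf_url")


def check_record(record: dict) -> list[str]:
    """Return a list of problems found in one dataset item (empty if none)."""
    problems = [
        f"missing required field: {name}"
        for name in REQUIRED_FIELDS
        if name not in record
    ]

    score = record.get("relevance_score")
    if score is not None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(score, bool) or not isinstance(score, (int, float)):
            problems.append("relevance_score must be a number")
        elif not 0 <= score <= 1:
            problems.append("relevance_score must lie between 0 and 1")

    stamp = record.get("scraped_at")
    if stamp is not None and not isinstance(stamp, str):
        problems.append("scraped_at must be a string")

    return problems


sample = {  # illustrative values only, not real Actor output
    "title": "Example paper",
    "authors": ["A. Author"],
    "year": 2024,
    "pdf_url": "https://example.org/paper.pdf",
    "relevance_score": 0.8,
    "scraped_at": datetime.now(timezone.utc).isoformat(),
}
print(check_record(sample))  # []
print(check_record({"title": "only a title"}))
```

<p>A check like this is only a convenience for consumers of the dataset; the schema itself remains the source of truth for field names and types.</p>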

<h2>Troubleshooting</h2>
<ul>
  <li><strong>No Results</strong>: Check the keyword and year filters; broaden them if nothing matches.</li>
  <li><strong>API Errors</strong>: These usually indicate rate limiting; enable a proxy and run again.</li>
  <li><strong>Issues</strong>: None open at the time of writing; report new ones in Apify Monitoring.</li>
</ul>
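
<p>For callers who hit the API errors mentioned above from their own code, a retry with growing pauses is a common complement to a proxy. The sketch below is an assumption about how one might do that, not the Actor's implementation: <code>call_with_backoff</code> is a hypothetical helper, and the 3-second base delay is chosen to mirror the pause the arXiv API manual asks clients to leave between requests.</p>

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fetch: Callable[[], T],
    retries: int = 3,
    base_delay: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(); after each failure wait base_delay * 2**attempt, then retry."""
    if retries < 0:
        raise ValueError("retries must be >= 0")
    for attempt in range(retries + 1):
        try:
            return fetch()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable")  # keeps type checkers satisfied


# Demo with a simulated failure; no network access is involved.
attempts = {"count": 0}
waits: list[float] = []


def flaky() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated rate limit")
    return "ok"


result = call_with_backoff(flaky, sleep=waits.append)
print(result, waits)  # ok [3.0, 6.0]
```

<p>Passing <code>sleep</code> as a parameter keeps the helper testable: the demo records the delays instead of actually waiting.</p>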

<h2>Resources &amp; Community</h2>
<ul>
  <li><a href="https://arxiv.org/help/api/user-manual">arXiv API Docs</a></li>
  <li><a href="https://nymtech.net">Nym Network</a></li>
  <li><a href="https://docs.apify.com">Apify Docs</a></li>
  <li>Join the discussion: r/privacy on Reddit and the Nym channels on Discord.</li>
</ul>

<p><em>Promote</em>: Share this Actor in privacy forums to help it grow.</p>





# Mixnet Paper Scraper

An Apify Actor that collects research papers on mix networks and anonymous communication systems from arXiv. For each paper it records metadata and a PDF link, and adds a basic relevance score and a gist. A run fetches up to 400 papers using a fixed set of focused keywords.

## 🚀 Features

- Fetches papers from the arXiv API.
- Searches with these keywords:
  - mixnet
  - mix network
  - nym mixnet
- Collects the following metadata per paper:
  - Title
  - Authors (up to 5)
  - Year of publication
  - PDF URL
  - Relevance score (0.5–1.0)
  - Gist (summary of research focus)
  - Scraped timestamp
- Automatically deduplicates papers by PDF URL.
- Fully asynchronous, using httpx and the Apify SDK.
- Hardcoded to fetch up to 400 papers, depending on availability.
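
The fetch, parse, and deduplicate steps listed above can be sketched as follows. This is a simplified, synchronous illustration using only the standard library, not the Actor's own asynchronous httpx code. The function names (`build_query_url`, `parse_feed`, `dedupe_by_pdf_url`), the page size, and the sample feed with its placeholder identifier are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ATOM = {"atom": "http://www.w3.org/2005/Atom"}  # namespace of arXiv's Atom feed
KEYWORDS = ["mixnet", "mix network", "nym mixnet"]  # from the list above


def build_query_url(keyword: str, start: int = 0, page_size: int = 100) -> str:
    """Build one paginated arXiv API query URL for a quoted phrase search."""
    params = {
        "search_query": f'all:"{keyword}"',
        "start": start,
        "max_results": page_size,
    }
    return "http://export.arxiv.org/api/query?" + urlencode(params)


def parse_feed(xml_text: str) -> list[dict]:
    """Extract title, authors (at most 5), year, and PDF URL from an Atom feed."""
    papers = []
    for entry in ET.fromstring(xml_text).findall("atom:entry", ATOM):
        pdf_url = next(
            (link.get("href") for link in entry.findall("atom:link", ATOM)
             if link.get("title") == "pdf"),
            None,
        )
        if pdf_url is None:
            continue  # skip entries without a PDF link
        published = entry.findtext("atom:published", "", ATOM)
        papers.append({
            # Collapse line breaks and repeated spaces inside the title.
            "title": " ".join(entry.findtext("atom:title", "", ATOM).split()),
            "authors": [a.findtext("atom:name", "", ATOM)
                        for a in entry.findall("atom:author", ATOM)][:5],
            "year": int(published[:4]) if published[:4].isdigit() else None,
            "pdf_url": pdf_url,
        })
    return papers


def dedupe_by_pdf_url(papers: list[dict], limit: int = 400) -> list[dict]:
    """Keep the first paper seen for each PDF URL, up to `limit` papers."""
    seen: set[str] = set()
    unique = []
    for paper in papers:
        if paper["pdf_url"] not in seen:
            seen.add(paper["pdf_url"])
            unique.append(paper)
    return unique[:limit]


# Made-up feed with a placeholder identifier, used instead of a live request.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Example  Mixnet
      Paper</title>
    <published>2021-05-04T00:00:00Z</published>
    <author><name>A. Author</name></author>
    <author><name>B. Author</name></author>
    <link title="pdf" href="http://arxiv.org/pdf/0000.00000v1" rel="related" type="application/pdf"/>
  </entry>
</feed>"""

papers = dedupe_by_pdf_url(parse_feed(SAMPLE_FEED) * 2)  # same entry twice
print(len(papers), papers[0]["title"])  # 1 Example Mixnet Paper
print(build_query_url(KEYWORDS[1]))
```

In a real run, each URL from `build_query_url` would be fetched page by page for every keyword, and the parsed entries from all pages would be passed through `dedupe_by_pdf_url` once at the end, since the same paper can match more than one keyword.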

## 📦 Installation

1. Clone or download the repository.
2. Ensure Python 3.13 or later is installed.
3. Install the required dependencies: `pip install apify httpx`
