# Legal News Aggregator - National Law Review Articles (`jungle_synthesizer/legal-news-aggregator-scraper`) Actor

Extract attorney-authored legal news and analysis articles from the National Law Review. Returns title, author, law firm, publication date, practice areas, jurisdictions, summary, and full text. First legal news aggregator on Apify.

- **URL**: https://apify.com/jungle\_synthesizer/legal-news-aggregator-scraper.md
- **Developed by:** [BowTiedRaccoon](https://apify.com/jungle_synthesizer) (community)
- **Categories:** News, Other, Lead generation
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

Pay per event

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# MacOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## National Law Review Legal News Scraper

Scrape attorney-authored legal news and analysis articles from the [National Law Review](https://natlawreview.com). Returns article title, author, law firm, publication date, practice areas, jurisdictions, summary, full text, and lead image URL for 33,000+ articles across every major US practice area.

---

### Legal News Scraper Features

- Extracts 12 fields per article: URL, title, author, firm, date, practice areas, jurisdictions, summary, full body text, image, source, and scrape timestamp
- Pulls from the full sitemap index — 33,000+ articles spanning a decade of legal commentary
- Sorts newest-first by lastmod, so `maxItems: 100` returns the most recent 100 articles, not a random slice
- Accepts a direct URL list too, for targeted scrapes of specific articles
- No proxies, no browser, no CAPTCHA — just clean HTML from a server-rendered site
- Parses JSON-LD schema.org `NewsArticle` metadata, which is about as stable as web data gets
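
To illustrate the JSON-LD point above, here is a minimal Python sketch of pulling a schema.org `NewsArticle` object out of a page. It is not the Actor's own code (the Actor uses CheerioCrawler); the class and function names and the sample HTML are hypothetical, and the exact JSON-LD layout on natlawreview.com is an assumption.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def find_news_article(html):
    """Return the first JSON-LD object whose @type is NewsArticle, or None."""
    collector = JsonLdCollector()
    collector.feed(html)
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the page
        # JSON-LD may be a single object, a list, or wrapped in "@graph".
        if isinstance(data, dict) and isinstance(data.get("@graph"), list):
            candidates = data["@graph"]
        elif isinstance(data, list):
            candidates = data
        else:
            candidates = [data]
        for node in candidates:
            if isinstance(node, dict) and node.get("@type") == "NewsArticle":
                return node
    return None


# Made-up page for illustration only.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "NewsArticle",
 "headline": "Example headline", "datePublished": "2025-09-22"}
</script>
</head><body></body></html>
"""

article = find_news_article(SAMPLE_HTML)
print(article["headline"], article["datePublished"])
```

Because the metadata lives in a structured block rather than in page layout, a parser like this keeps working when the site's CSS classes change.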

---

### Who Uses National Law Review Data?

- **Law firm marketing teams** — Track which firms and attorneys publish on which practice areas, benchmark thought-leadership output
- **Compliance and regulatory teams** — Monitor new analysis on regulatory changes across jurisdictions you care about
- **Legal tech startups** — Build datasets of attorney-authored content for search, summarization, or LLM training
- **Market intelligence analysts** — Track sentiment, topic frequency, and firm activity across the legal industry
- **Dataset builders** — Collect a deep corpus of structured legal writing without scraping paywalled publications

---

### How the Legal News Scraper Works

1. **Walk the sitemap** — Fetches the National Law Review sitemap index and each child sitemap, collecting article URLs with their last-modified timestamps
2. **Sort and slice** — Orders articles newest-first, then caps the list to `maxItems`
3. **Fetch each article** — CheerioCrawler pulls each page at moderate concurrency, respecting rate limits
4. **Parse and save** — Pulls JSON-LD metadata for the authoritative fields and CSS selectors for the body, practice areas, and jurisdictions

Skip steps 1 and 2 by passing a list of article URLs directly. The scraper handles that mode too, since sometimes you already know which articles you want.
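
Steps 1 and 2 can be sketched in a few lines of Python. This is an illustration of the sort-and-slice idea, not the Actor's implementation: the function name `newest_first`, the sample slugs, and the timestamps are made up, and the sketch assumes every `lastmod` value uses the same ISO 8601 format so that string order matches chronological order.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Made-up child sitemap for illustration only.
SAMPLE_SITEMAP = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://natlawreview.com/article/example-a</loc><lastmod>2025-01-10</lastmod></url>
  <url><loc>https://natlawreview.com/article/example-b</loc><lastmod>2025-09-22</lastmod></url>
  <url><loc>https://natlawreview.com/article/example-c</loc><lastmod>2024-06-01</lastmod></url>
</urlset>
""".strip()


def newest_first(sitemap_xml, max_items):
    """Return article URLs sorted by lastmod, newest first, capped at max_items.

    max_items == 0 means "no cap", mirroring the Actor's input convention.
    """
    root = ET.fromstring(sitemap_xml)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = url.findtext("sm:lastmod", default="", namespaces=SITEMAP_NS).strip()
        if loc:
            entries.append((lastmod, loc))
    # Entries without a lastmod carry an empty string and therefore sort last.
    entries.sort(key=lambda entry: entry[0], reverse=True)
    limit = len(entries) if max_items == 0 else max_items
    return [loc for _, loc in entries[:limit]]


for url in newest_first(SAMPLE_SITEMAP, 2):
    print(url)
```

Sorting before slicing is what makes a small `maxItems` return the most recent articles instead of whatever happens to come first in the sitemap.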

---

### Input

```json
{
  "maxItems": 100,
  "sp_intended_usage": "Compliance monitoring across tax and employment practice areas",
  "sp_improvement_suggestions": "None"
}
````

Or target specific articles by URL:

```json
{
  "articleUrls": [
    { "url": "https://natlawreview.com/article/major-h-1b-changes-announced-including-new-100000-fee" },
    { "url": "https://natlawreview.com/article/whats-domain-name-explainer-domain-investing" }
  ],
  "maxItems": 2,
  "sp_intended_usage": "Targeted research",
  "sp_improvement_suggestions": "None"
}
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| maxItems | integer | 10 | Maximum number of articles to scrape. Articles are sorted newest-first when walking the sitemap. Set to 0 for unlimited. |
| articleUrls | array | `[]` | Optional list of specific article URLs. When provided, the sitemap walk is skipped and only these URLs are crawled. |
| proxyConfiguration | object | none | Proxy settings. Not required — National Law Review is a public site with no anti-bot protection. |
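
When building the input in code, a small helper can catch mistakes before a run starts. The sketch below is a hypothetical convenience function (`build_input` is not part of any Apify library); it follows the rules in this Actor's input schema: `sp_intended_usage` and `sp_improvement_suggestions` are required and non-empty, `maxItems` cannot be negative, and each entry in `articleUrls` is an object with a `url` key.

```python
def build_input(intended_usage, article_urls=None, max_items=10,
                improvement_suggestions="None"):
    """Build an Actor input dict, wrapping plain URL strings as {"url": ...}."""
    if not intended_usage or not improvement_suggestions:
        raise ValueError(
            "sp_intended_usage and sp_improvement_suggestions must be non-empty"
        )
    if max_items < 0:
        raise ValueError("maxItems must be 0 (unlimited) or a positive integer")
    return {
        "sp_intended_usage": intended_usage,
        "sp_improvement_suggestions": improvement_suggestions,
        "articleUrls": [{"url": url} for url in (article_urls or [])],
        "maxItems": max_items,
    }


targeted = build_input(
    "Targeted research",
    article_urls=[
        "https://natlawreview.com/article/major-h-1b-changes-announced-including-new-100000-fee",
    ],
    max_items=1,
)
print(targeted)
```

The returned dict can be passed as the run input to any of the client examples in the API section.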

---

### Legal News Scraper Output Fields

```json
{
  "article_url": "https://natlawreview.com/article/major-h-1b-changes-announced-including-new-100000-fee",
  "title": "Major H-1B Changes Announced, Including New $100,000 Fee",
  "source_site": "natlawreview",
  "author_name": "Norris McLaughlin P.A.",
  "author_firm": "Norris McLaughlin  P.A.",
  "publication_date": "2025-09-22",
  "summary": "In a series of startling and conflicting announcements that caused a great deal of panic over the weekend for H-1B holders and their employers, President Trump ",
  "full_text": "In a series of startling and conflicting announcements ... particularly small business and nonprofits.",
  "practice_areas": [
    "Immigration",
    "Labor Employment",
    "Administrative Regulatory"
  ],
  "jurisdictions": [
    "All Federal"
  ],
  "image_url": "https://natlawreview.com/sites/default/files/2025-09/H1B%20Visa%20Lottery%20Employment%20Immigration_2.jpg",
  "scraped_at": "2026-04-18T01:19:26.230Z"
}
```

| Field | Type | Description |
|-------|------|-------------|
| article\_url | string | Canonical article URL |
| title | string | Article headline |
| source\_site | string | Source publication — currently always `natlawreview` |
| author\_name | string | Attorney or author page name |
| author\_firm | string | Law firm the author works for |
| publication\_date | string | Publication date in ISO 8601 format (YYYY-MM-DD) |
| summary | string | Short article summary from JSON-LD description |
| full\_text | string | Full article body as plain text, HTML tags stripped and entities decoded |
| practice\_areas | array | Practice areas tagged on the article (e.g. `Construction Law`, `Real Estate`) |
| jurisdictions | array | Jurisdictions tagged on the article (e.g. `Florida`, `All Federal`) |
| image\_url | string | Lead image URL, if present |
| scraped\_at | string | ISO 8601 timestamp of when the article was scraped |
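
As a rough sketch of how these fields might be consumed downstream, the Python snippet below counts practice-area tags and finds the latest publication date across a list of records. The function name `summarize` is hypothetical, the first record is trimmed from the sample output above, and the second record is invented for illustration.

```python
from collections import Counter
from datetime import date


def summarize(items):
    """Count practice-area tags and find the most recent publication date."""
    areas = Counter()
    newest = None
    for item in items:
        # Array fields may be missing or empty on some articles.
        areas.update(item.get("practice_areas") or [])
        published = item.get("publication_date")
        if published:
            published_on = date.fromisoformat(published)  # YYYY-MM-DD
            if newest is None or published_on > newest:
                newest = published_on
    return areas, newest


records = [
    {
        "title": "Major H-1B Changes Announced, Including New $100,000 Fee",
        "publication_date": "2025-09-22",
        "practice_areas": ["Immigration", "Labor Employment", "Administrative Regulatory"],
    },
    {
        # Invented record, not real Actor output.
        "title": "Hypothetical second article",
        "publication_date": "2025-08-01",
        "practice_areas": ["Immigration", "Tax"],
    },
]

area_counts, latest = summarize(records)
print(area_counts.most_common(1), latest.isoformat())
```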

---

### FAQ

#### How do I scrape the latest articles from the National Law Review?

Run the scraper with `articleUrls` left empty. It walks the sitemap, sorts by last-modified date, and returns the most recent `maxItems` articles (the input schema's default is 10). Change `maxItems` to scrape more or fewer.

#### How do I scrape specific National Law Review articles by URL?

Pass a list of `articleUrls` in the input. The scraper skips the sitemap walk and fetches only the URLs you provide. Useful for re-scraping specific articles or building custom pipelines.

#### How much does the National Law Review Scraper cost to run?

The scraper is priced at $0.10 per Actor start plus $0.001 per article record. A 100-article run costs about $0.20 and finishes in under a minute. A full 33,000-article sitemap walk runs in under 10 minutes and costs about $33.10 at the same rates.
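
The pricing arithmetic can be checked with a short Python sketch. The function name `estimate_cost` is hypothetical; the two rates are the ones quoted in this answer, and the estimate assumes one start event per run and one record event per article.

```python
from decimal import Decimal

START_FEE = Decimal("0.10")         # charged once per Actor start
PER_ARTICLE_FEE = Decimal("0.001")  # charged per article record


def estimate_cost(article_count):
    """Estimated run cost in USD for a given number of article records."""
    return START_FEE + PER_ARTICLE_FEE * article_count


for count in (100, 33_000):
    print(f"{count} articles: ${estimate_cost(count):.2f}")
```

`Decimal` is used instead of floats so that small per-record amounts add up exactly.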

#### Does the scraper need proxies?

No. The National Law Review is a public Drupal site served through Varnish cache. No Cloudflare, no CAPTCHAs, no rate limiting in practice — the scraper ships with proxy settings disabled by default.

#### What practice areas does the National Law Review cover?

Every major US practice area, roughly. Construction law, immigration, tax, labor and employment, IP, real estate, financial services, environmental, health care, and dozens more. Each article is tagged with its practice areas and jurisdictions in the output.

---

### Need More Features?

Need custom fields, filters, or coverage of additional legal news sites (JD Supra, Above the Law, Mondaq)? [File an issue](https://console.apify.com/actors/issues) or get in touch.

### Why Use the Legal News Aggregator Scraper?

- **First of its kind** — No other Apify actor targets legal news and analysis. This is the only one.
- **Clean structured output** — JSON-LD-backed fields mean consistent author, firm, and date attribution across tens of thousands of articles, which saves you the cleanup pass you were going to run anyway
- **Affordable** — ~$0.001 per article, no proxy costs, no browser costs

# Actor input Schema

## `sp_intended_usage` (type: `string`):

Please describe how you plan to use the data extracted by this crawler.

## `sp_improvement_suggestions` (type: `string`):

Provide any feedback or suggestions for improvements.

## `sp_contact` (type: `string`):

Provide your email address so we can get in touch with you.

## `articleUrls` (type: `array`):

Optional list of specific National Law Review article URLs to scrape (e.g. https://natlawreview.com/article/<slug>). When provided, the sitemap walk is skipped and only these URLs are crawled. Leave empty to scrape the latest articles from the sitemap.

## `maxItems` (type: `integer`):

Maximum number of articles to scrape. Articles are sorted newest first when walking the sitemap. Set to 0 for unlimited.

## `proxyConfiguration` (type: `object`):

Select proxies. National Law Review is a public website and does not require proxies.

## Actor input object example

```json
{
  "sp_intended_usage": "Describe your intended use...",
  "sp_improvement_suggestions": "Share your suggestions here...",
  "sp_contact": "Share your email here...",
  "articleUrls": [],
  "maxItems": 10,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}
```

# Actor output Schema

## `results` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "sp_intended_usage": "Describe your intended use...",
    "sp_improvement_suggestions": "Share your suggestions here...",
    "sp_contact": "Share your email here...",
    "articleUrls": [],
    "maxItems": 10,
    "proxyConfiguration": {
        "useApifyProxy": false
    }
};

// Run the Actor and wait for it to finish
const run = await client.actor("jungle_synthesizer/legal-news-aggregator-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "sp_intended_usage": "Describe your intended use...",
    "sp_improvement_suggestions": "Share your suggestions here...",
    "sp_contact": "Share your email here...",
    "articleUrls": [],
    "maxItems": 10,
    "proxyConfiguration": { "useApifyProxy": False },
}

# Run the Actor and wait for it to finish
run = client.actor("jungle_synthesizer/legal-news-aggregator-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "sp_intended_usage": "Describe your intended use...",
  "sp_improvement_suggestions": "Share your suggestions here...",
  "sp_contact": "Share your email here...",
  "articleUrls": [],
  "maxItems": 10,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}' |
apify call jungle_synthesizer/legal-news-aggregator-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=jungle_synthesizer/legal-news-aggregator-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Legal News Aggregator - National Law Review Articles",
        "description": "Extract attorney-authored legal news and analysis articles from the National Law Review. Returns title, author, law firm, publication date, practice areas, jurisdictions, summary, and full text. First legal news aggregator on Apify.",
        "version": "1.0",
        "x-build-id": "DBMjBNwdpbXomin3m"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/jungle_synthesizer~legal-news-aggregator-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-jungle_synthesizer-legal-news-aggregator-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/jungle_synthesizer~legal-news-aggregator-scraper/runs": {
            "post": {
                "operationId": "runs-sync-jungle_synthesizer-legal-news-aggregator-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/jungle_synthesizer~legal-news-aggregator-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-jungle_synthesizer-legal-news-aggregator-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "sp_intended_usage",
                    "sp_improvement_suggestions"
                ],
                "properties": {
                    "sp_intended_usage": {
                        "title": "What is the intended usage of this data?",
                        "minLength": 1,
                        "type": "string",
                        "description": "Please describe how you plan to use the data extracted by this crawler."
                    },
                    "sp_improvement_suggestions": {
                        "title": "How can we improve this crawler for you?",
                        "minLength": 1,
                        "type": "string",
                        "description": "Provide any feedback or suggestions for improvements."
                    },
                    "sp_contact": {
                        "title": "Contact Email",
                        "minLength": 1,
                        "type": "string",
                        "description": "Provide your email address so we can get in touch with you."
                    },
                    "articleUrls": {
                        "title": "Article URLs",
                        "type": "array",
                        "description": "Optional list of specific National Law Review article URLs to scrape (e.g. https://natlawreview.com/article/<slug>). When provided, the sitemap walk is skipped and only these URLs are crawled. Leave empty to scrape the latest articles from the sitemap.",
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "URL of a web page",
                                    "format": "uri"
                                }
                            }
                        }
                    },
                    "maxItems": {
                        "title": "Max Items",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Maximum number of articles to scrape. Articles are sorted newest first when walking the sitemap. Set to 0 for unlimited.",
                        "default": 10
                    },
                    "proxyConfiguration": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Select proxies. National Law Review is a public website and does not require proxies."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
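
The `run-sync-get-dataset-items` path from the specification above can also be called without a client library. The following Python sketch uses only the standard library; the helper names `build_request` and `run_actor` are hypothetical, and it assumes the endpoint returns the dataset items as a JSON body, which the specification above does not spell out.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = (
    "/acts/jungle_synthesizer~legal-news-aggregator-scraper"
    "/run-sync-get-dataset-items"
)


def build_request(token, actor_input):
    """Build the POST request: token in the query string, input as JSON body."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        url=f"{API_BASE}{ACTOR_PATH}?{query}",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_actor(token, actor_input, timeout=300):
    """Send the request and decode the response (assumed to be JSON)."""
    request = build_request(token, actor_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request does not touch the network; call run_actor() with a
# real token to actually start a run.
example_request = build_request(
    "<YOUR_API_TOKEN>",
    {
        "sp_intended_usage": "Describe your intended use...",
        "sp_improvement_suggestions": "Share your suggestions here...",
        "maxItems": 5,
    },
)
print(example_request.get_method(), example_request.full_url)
```

Synchronous endpoints hold the connection open until the run finishes, so this approach fits small runs; for large `maxItems` values the asynchronous `/runs` path is likely the safer choice.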
