# Construction Dive Scraper (`lexis-solutions/constructiondive-com-scraper`) Actor

Construction Dive scraper for USA construction news and press releases: extract articles, contacts, images, and metadata from ConstructionDive.com Deep Dive section for construction tech, PR, and market research workflows.

- **URL**: https://apify.com/lexis-solutions/constructiondive-com-scraper.md
- **Developed by:** [Lexis Solutions](https://apify.com/lexis-solutions) (community)
- **Categories:** News, Lead generation, AI
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $2.90 / 1,000 articles

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.
Because this Actor supports Apify Store discounts, the price drops as your subscription plan tier rises.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
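
As a rough budgeting aid, the per-1,000 rate above can be turned into a per-run estimate. This is a minimal sketch: `estimate_cost_usd` is a hypothetical helper, and its default rate is the listed "from" price, so pass the rate that applies to your own plan.

```python
def estimate_cost_usd(articles: int, price_per_1000: float = 2.90) -> float:
    """Estimate the pay-per-event charge for a number of extracted articles.

    The default rate is the listed "from" price; it is an assumption that
    this rate applies to your subscription plan.
    """
    if articles < 0:
        raise ValueError("articles must be non-negative")
    return round(articles * price_per_1000 / 1000, 2)


if __name__ == "__main__":
    # Estimated charge for 5,000 articles at the default rate.
    print(estimate_cost_usd(5000))
```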

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

![Construction Dive Scraper](https://i.postimg.cc/SKzhZXzh/image.png)

### What does the Construction Dive Scraper do?

This Actor crawls Construction Dive (constructiondive.com) to collect article and press-release content. It supports both listing/search pages (topic pages, the press-release index) and detail pages (news articles and press releases). The scraper normalizes the extracted content into structured dataset items for downstream analysis.

### What data can I extract from ConstructionDive.com with this scraper?

With this Actor you can extract:

- Article-level metadata: `url`, `title`, `publishDate`, `subHeading`, `description` (cleaned paragraphs and headings)
- Authors: `author`, `authorTitle`
- Lead image: `articleImage`, `imageCaption`, `sourceText`
- Inline and gallery images: `images` array with `imageUrl`, `captionText`, `sourceText`
- Press release fields (when page is a press release): `contactName`, `contactEmail`, `contactPhone`, `aboutSection`

### How to use this Scraper

This Actor runs on Apify or locally. Provide `startUrls` (search/listing pages or direct article pages) and `maxItems` to limit how many detail pages are enqueued per start URL.

Steps:

1. Provide start URLs (search/listing or detail pages) in the input or via the `startUrls` console field.
2. Set `maxItems` to limit how many detail pages to collect per start URL.
3. Run the Actor and download the dataset when complete.

### Input

The Actor accepts the following input parameters:

- `startUrls` (array of objects) - **Required**. URLs to start with. Can be search/listing pages or individual article/press-release pages.
- `maxItems` (integer) - The maximum number of detail pages to enqueue per start URL. Example: 5
- `proxyConfiguration` (object) - Proxy configuration settings.

#### Supported URL Examples

- Topic/search pages / press release index: https://www.constructiondive.com/press-release, https://www.constructiondive.com/topic/commercial-building/
- Site root / landing: https://www.constructiondive.com/
- Article/detail pages: https://www.constructiondive.com/news/data-centers-community-benefits-spec-new-york-build-panel/816335/

Example input:

```json
{
    "startUrls": [
        { "url": "https://www.constructiondive.com/press-release/" },
        { "url": "https://www.constructiondive.com/topic/commercial-building/" }
    ],
    "maxItems": 5,
    "proxyConfiguration": { "useApifyProxy": true }
}
````

#### Note: The `startUrls` field is required. Using a proxy is highly recommended for large-scale scraping, so the Actor defaults to the Datacenter proxy.
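
The input rules above can be checked before a run starts. The sketch below builds an input object with the three documented fields; `build_run_input` is a hypothetical helper, and the hostname check is an added guard that is not part of the Actor itself.

```python
from urllib.parse import urlparse


def build_run_input(urls, max_items=5, use_apify_proxy=True):
    """Build an input object with the documented fields.

    The hostname guard is an assumption added for illustration; the Actor
    may accept or reject URLs differently.
    """
    if not urls:
        raise ValueError("startUrls is required and must not be empty")
    for url in urls:
        host = urlparse(url).netloc.lower()
        if host != "constructiondive.com" and not host.endswith(".constructiondive.com"):
            raise ValueError(f"not a Construction Dive URL: {url}")
    if max_items < 1:
        raise ValueError("maxItems must be a positive integer")
    return {
        "startUrls": [{"url": url} for url in urls],
        "maxItems": max_items,
        "proxyConfiguration": {"useApifyProxy": use_apify_proxy},
    }


if __name__ == "__main__":
    print(build_run_input(["https://www.constructiondive.com/press-release/"]))
```

The returned dictionary can be passed as the run input in the Python client example in the [API](#api) section.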

### Output

The scraped data is saved to the default dataset. Each item represents either a news article or a press release depending on the page type. You can download the dataset in various formats such as JSON, HTML, CSV, or Excel.
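
Besides the Console download, dataset items can be fetched in a chosen format through the Apify API. The sketch below only builds the download URL; `dataset_items_url` is a hypothetical helper, and the `format` values are limited here to the four formats named above.

```python
from typing import Optional
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"
# Formats named in this README, mapped to the API's `format` values.
SUPPORTED_FORMATS = {"json", "html", "csv", "xlsx"}


def dataset_items_url(dataset_id: str, fmt: str = "json", token: Optional[str] = None) -> str:
    """Build the URL that returns a dataset's items in the given format."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    params = {"format": fmt}
    if token:
        # Needed only when the dataset is not publicly readable.
        params["token"] = token
    return f"{API_BASE}/datasets/{dataset_id}/items?{urlencode(params)}"


if __name__ == "__main__":
    # "<DATASET_ID>" is a placeholder for run["defaultDatasetId"].
    print(dataset_items_url("<DATASET_ID>", "csv"))
```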

#### Common Fields

- `url`, `title`, `publishDate`, `subHeading`, `description`
- `author`, `authorTitle` (news pages)
- `articleImage`, `imageCaption`, `sourceText` (lead image)
- `images` (array of image objects: `imageUrl`, `captionText`, `sourceText`)
- `contactName`, `contactEmail`, `contactPhone`, `aboutSection` (press releases)

All absent values are explicitly set to `null`.
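
Because absent fields are `null` rather than missing, the two item types can be told apart downstream. The following is a heuristic sketch, not the Actor's own logic: `item_kind` is a hypothetical helper that checks the press-release fields and falls back to the `/press-release/` URL path seen in the examples in this README.

```python
PRESS_RELEASE_FIELDS = ("contactName", "contactEmail", "contactPhone", "aboutSection")


def item_kind(item: dict) -> str:
    """Classify a dataset item as "press_release" or "news" (heuristic)."""
    # Any populated press-release field marks the item as a press release.
    if any(item.get(field) is not None for field in PRESS_RELEASE_FIELDS):
        return "press_release"
    # Fallback assumption: press-release URLs contain this path segment.
    if "/press-release/" in (item.get("url") or ""):
        return "press_release"
    return "news"


if __name__ == "__main__":
    sample = {
        "url": "https://www.constructiondive.com/news/data-centers-community-benefits-spec-new-york-build-panel/816335/",
        "contactName": None,
        "contactEmail": None,
        "contactPhone": None,
        "aboutSection": None,
    }
    print(item_kind(sample))
```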

#### Example Output - Press Release

```json
{
    "url": "https://www.constructiondive.com/press-release/20260330-southern-impression-homes-jlm-living-advance-the-eleanor-bloomingdale-b/",
    "title": "Southern Impression Homes & JLM Living Advance The Eleanor - Bloomingdale BTR Community | Construction Dive",
    "publishDate": "March 30, 2026",
    "contactName": "Kara Pound",
    "contactEmail": "kara@oldcitypr.com",
    "contactPhone": "386-237-4500",
    "description": "JACKSONVILLE, Fla. — Southern Impression Homes (SIH), a leading full-service property development group specializing in Build-to-Rent (BTR) communities, announces its joint project with JLM Living on The Eleanor - Bloomingdale, a 253-unit single-family rental community currently under construction on Little Neck Road in Bloomingdale, Georgia, just outside Savannah. Vertical construction commenced in October 2025, with the first units expected in April 2026 and full project completion slated for early 2027. \"The Eleanor - Bloomingdale demonstrates our ability to seamlessly integrate design, development, and construction into a single execution platform,\" said Chris Funk, President and CEO of Southern Impression Homes...",
    "aboutSection": "Search Home Topics Commercial Corporate News Economy Infrastructure Labor Safety Tech Sustainability Legal/Regs Deep Dive Opinion Library Events Press Releases Get Construction Dive in your inbox...",
    "subHeading": "253-unit Bloomingdale development highlights strength of vertically-integrated partnership",
    "author": null,
    "authorTitle": null,
    "articleImage": null,
    "imageCaption": null,
    "sourceText": null,
    "images": []
}
```

#### Example Output - News Article

```json
{
    "url": "https://www.constructiondive.com/news/data-centers-community-benefits-spec-new-york-build-panel/816335/",
    "title": "Data centers must prove their community worth, panel says",
    "publishDate": "April 2, 2026",
    "author": "Kate Serpico",
    "authorTitle": "Senior Editor",
    "subHeading": "As AI drives demand for data centers, developers face growing pressure to deliver tangible local benefits",
    "description": "NEW YORK — Data center developers rushing to build facilities to support artificial intelligence applications must prove their worth to local communities, industry experts said at a panel discussion here...",
    "articleImage": "https://www.constructiondive.com/imgproxy/...",
    "imageCaption": "Aerial view of data center construction site",
    "sourceText": "Permission granted by Construction Photography",
    "images": [
        {
            "imageUrl": "https://www.constructiondive.com/imgproxy/...",
            "captionText": "Interior view of data center server room",
            "sourceText": "Construction Photography"
        }
    ],
    "contactName": null,
    "contactEmail": null,
    "contactPhone": null,
    "aboutSection": null
}
```
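
In both examples `publishDate` is a human-readable string such as `March 30, 2026`. A minimal sketch for converting it to a date object, assuming every item uses this same English "Month D, YYYY" format (`parse_publish_date` is a hypothetical helper):

```python
from datetime import date, datetime
from typing import Optional


def parse_publish_date(value: Optional[str]) -> Optional[date]:
    """Parse a publishDate like "March 30, 2026"; pass None through."""
    if value is None:
        return None
    # %B is the full month name; this relies on an English locale.
    return datetime.strptime(value.strip(), "%B %d, %Y").date()


if __name__ == "__main__":
    print(parse_publish_date("March 30, 2026").isoformat())
```

A string in any other format raises `ValueError`, which makes unexpected date formats visible instead of silently passing them through.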

#### Notes and Limitations

- The Actor depends on the current Construction Dive HTML structure; update selectors if the site changes.
- Respect site Terms of Service and robots.txt. Use proxies and throttling to reduce blocking risk.
- Pagination links that are relative (e.g., `?page=2`) are resolved against the current page URL to produce absolute next-page links.
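
The pagination rule in the last bullet can be illustrated with the standard library. This is a sketch of the general technique, not the Actor's source code; `resolve_next_page` is a hypothetical name.

```python
from urllib.parse import urljoin


def resolve_next_page(current_url: str, href: str) -> str:
    """Resolve a possibly relative pagination href against the current page URL."""
    return urljoin(current_url, href)


if __name__ == "__main__":
    current = "https://www.constructiondive.com/topic/commercial-building/"
    print(resolve_next_page(current, "?page=2"))
    # https://www.constructiondive.com/topic/commercial-building/?page=2
```

Links that are already absolute pass through `urljoin` unchanged, so the same call covers both cases.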

#### 🔍 Looking to Scrape more News Websites?

In addition to this Actor, you can explore our suite of dedicated scrapers tailored for other popular news websites. Each scraper is optimized for its target site to ensure accurate, efficient, and high-performance data extraction.

| Scraper                                                                            | Country | Description                                                                                                                                                                                                                                                               |
| ---------------------------------------------------------------------------------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [Ynet.co.il Scraper](https://apify.com/lexis-solutions/ynet)                       | Israel  | Scrape news content from ynet.co.il to gather headlines, summaries, and metadata. Ideal for news aggregation, market analysis, and tracking real-time trends. Fast, structured, and customizable extraction from an Israel-based source.                                  |
| [ElEspanol.com Scraper](https://apify.com/lexis-solutions/elespanol)               | Spain   | Scrape news content from El Español - including headlines, summaries, article bodies, authors, and publish dates. Ideal for news aggregation, market analysis, and trend tracking. Fast, structured, and customizable extraction from Spain’s leading news source.        |
| [Reddit Answers Scraper](https://apify.com/lexis-solutions/reddit-answers-scraper) | Global  | Unlock structured AI-powered Q\&A from Reddit Answers—extract organized answers, source subreddits, related posts, and suggested topics. Perfect for market research, content creation, SEO strategy, and knowledge base building. Fast, reliable, and fully customizable. |

Explore these solutions to expand your data collection capabilities across news websites.

***

👀 p.s.

Need changes or a custom export format (CSV/JSONL)? I can add dataset schema views or additional fields.

Contact the maintainer or open an issue in the repo for improvements.

Image Credit: https://www.constructiondive.com/

# Actor input Schema

## `startUrls` (type: `array`):

URLs to start with.

## `maxItems` (type: `integer`):

Maximum number of articles that will be extracted.

## `proxyConfiguration` (type: `object`):

Select proxies to be used by your crawler.

## Actor input object example

```json
{
  "startUrls": [
    {
      "url": "https://www.constructiondive.com/"
    }
  ],
  "maxItems": 5,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
```

# Actor output Schema

## `results` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "startUrls": [
        {
            "url": "https://www.constructiondive.com/"
        }
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("lexis-solutions/constructiondive-com-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "startUrls": [{ "url": "https://www.constructiondive.com/" }] }

# Run the Actor and wait for it to finish
run = client.actor("lexis-solutions/constructiondive-com-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "startUrls": [
    {
      "url": "https://www.constructiondive.com/"
    }
  ]
}' |
apify call lexis-solutions/constructiondive-com-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=lexis-solutions/constructiondive-com-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Construction Dive Scraper",
        "description": "Construction Dive scraper for USA construction news and press releases: extract articles, contacts, images, and metadata from ConstructionDive.com Deep Dive section for construction tech, PR, and market research workflows.",
        "version": "1.0",
        "x-build-id": "kHKe0YbFZmlI6kCXd"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/lexis-solutions~constructiondive-com-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-lexis-solutions-constructiondive-com-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/lexis-solutions~constructiondive-com-scraper/runs": {
            "post": {
                "operationId": "runs-sync-lexis-solutions-constructiondive-com-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/lexis-solutions~constructiondive-com-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-lexis-solutions-constructiondive-com-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "startUrls"
                ],
                "properties": {
                    "startUrls": {
                        "title": "Start URLs",
                        "type": "array",
                        "description": "URLs to start with.",
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "URL of a web page",
                                    "format": "uri"
                                }
                            }
                        }
                    },
                    "maxItems": {
                        "title": "Maximum number of items",
                        "type": "integer",
                        "description": "Maximum number of articles that will be extracted.",
                        "default": 5
                    },
                    "proxyConfiguration": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Select proxies to be used by your crawler.",
                        "default": {
                            "useApifyProxy": true
                        }
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
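
For projects without an Apify client, the `run-sync-get-dataset-items` path from the specification above can be called with the Python standard library. The sketch below only constructs the request: the token goes in the query string and the input in the JSON body, as the specification declares. `build_sync_request` is a hypothetical helper name.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/lexis-solutions~constructiondive-com-scraper/run-sync-get-dataset-items"


def build_sync_request(token: str, run_input: dict) -> Request:
    """Build a POST request that runs the Actor and returns its dataset items."""
    url = f"{API_BASE}{ACTOR_PATH}?{urlencode({'token': token})}"
    body = json.dumps(run_input).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_sync_request(
        "<YOUR_API_TOKEN>",
        {"startUrls": [{"url": "https://www.constructiondive.com/"}], "maxItems": 5},
    )
    print(request.get_method(), request.full_url)
```

To send it, pass the request to `urllib.request.urlopen` and decode the response body as JSON. Since the call waits for the run to finish, a generous timeout is likely needed.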
