LightBurn Forum Scrapper
Pricing
from $0.05 / 1,000 results
LightBurn Forum Crawler extracts LightBurn forum topics, posts, and replies into clean, flat CSV/JSON records for semantic analysis, with one row per post or comment including type, original IDs, author, cleaned text, URLs, timestamps, likes, source, and matched keyword when applicable.
Developer
Zhenyu Towne
Maintained by Community
LightBurn Forum Semantic Crawler
Extract clean, semantic-analysis-ready posts and comments from the LightBurn Software forum.
This Actor crawls LightBurn forum topics through the public Discourse JSON API and exports one flat dataset row per original post or reply. The output is designed for NLP, LLM, embedding, semantic search, clustering, topic modeling, support trend analysis, and spreadsheet workflows.
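As a rough illustration of the Discourse JSON API mentioned above, the sketch below builds the kind of URLs a crawler like this might request. The paths `/latest.json`, `/search.json`, and `/t/{id}.json` are standard Discourse routes, but which of them this Actor actually calls, and with which parameters, is an assumption. The helper names are hypothetical.

```python
from urllib.parse import urlencode

# Default forum base URL, taken from the baseUrl input option.
BASE_URL = "https://forum.lightburnsoftware.com"


def latest_url(page: int, base_url: str = BASE_URL) -> str:
    """URL for one page of the latest-topics listing (assumed route)."""
    return f"{base_url}/latest.json?{urlencode({'page': page})}"


def search_url(keyword: str, page: int, base_url: str = BASE_URL) -> str:
    """URL for one page of keyword search results (assumed route)."""
    return f"{base_url}/search.json?{urlencode({'q': keyword, 'page': page})}"


def topic_url(topic_id: int, base_url: str = BASE_URL) -> str:
    """URL for a topic's JSON, which carries its posts and replies."""
    return f"{base_url}/t/{topic_id}.json"
```

Encoding the query with `urlencode` keeps multi-word keywords safe to place in a URL.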
What It Does
- Crawls the latest LightBurn forum topics or searches by keyword.
- Fetches topic posts and replies.
- Converts forum HTML into clean plain text.
- Exports one row per `post` or `comment`.
- Preserves original Discourse post and topic IDs.
- Includes author, URL, timestamp, likes, source mode, and matched keyword.
- Removes image data from the main text field so exports are easier to analyze.
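The HTML-to-text step in the list above could look roughly like the following. This is a minimal sketch using only the Python standard library, not the Actor's actual cleaning code; the set of block-level tags and the whitespace rules are assumptions. Because only text nodes are collected, `<img>` tags and their attributes never reach the output.

```python
import re
from html.parser import HTMLParser

# Tags treated as line breaks (an assumed, non-exhaustive set).
BLOCK_TAGS = {"p", "br", "div", "li", "blockquote", "pre", "h1", "h2", "h3", "h4"}


class _TextExtractor(HTMLParser):
    """Collects text nodes and inserts newlines around block-level tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html: str) -> str:
    """Strip tags, decode entities, and collapse whitespace."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)
```

For example, `html_to_text('<p>Check your <b>Units</b> settings.</p>')` returns `Check your Units settings.`.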
Use Cases
- Build a semantic search index from LightBurn forum discussions.
- Analyze common user issues and support patterns.
- Cluster posts by topic or intent.
- Prepare forum text for embeddings or LLM classification.
- Export clean CSV or JSON data for spreadsheets and BI tools.
Input Options
| Field | Description |
|---|---|
| `baseUrl` | Forum base URL. Defaults to `https://forum.lightburnsoftware.com`. |
| `keywords` | Optional keyword or comma-separated keywords. If empty, the Actor crawls latest topics. |
| `startDate` | Optional start date filter, for example `2026-01-01`. |
| `endDate` | Optional end date filter, for example `2026-01-31`. |
| `timeField` | Date field used for filtering: `created_at`, `last_posted_at`, or `bumped_at`. |
| `maxTopics` | Maximum number of topics to process. |
| `maxPages` | Maximum number of listing or search pages to scan. |
| `includeReplies` | Set to `false` to export only the original topic post. |
| `maxPostsPerTopic` | Maximum number of posts/comments exported from each topic. |
| `categoryIds` | Optional list of Discourse category IDs to include. |
| `requestDelayMillis` | Delay between forum API requests, in milliseconds. |
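To make the `keywords`, `startDate`, `endDate`, and `timeField` options more concrete, here is a small sketch of how such inputs might be interpreted. It is not the Actor's implementation: the function names are hypothetical, and treating both date bounds as inclusive and comparing on the UTC calendar date are assumptions.

```python
from datetime import date, datetime


def parse_keywords(raw: str) -> list:
    """Split a comma-separated keyword string, dropping empty entries."""
    return [k.strip() for k in raw.split(",") if k.strip()]


def in_date_range(topic: dict, time_field: str = "created_at",
                  start_date: str = None, end_date: str = None) -> bool:
    """Return True if the topic's chosen timestamp falls within the range.

    Assumes ISO 8601 timestamps such as 2026-01-15T10:00:00.000Z and
    inclusive YYYY-MM-DD bounds.
    """
    raw = topic.get(time_field)
    if raw is None:
        return False
    # Older Python versions do not accept a trailing "Z", so normalize it.
    day = datetime.fromisoformat(raw.replace("Z", "+00:00")).date()
    if start_date and day < date.fromisoformat(start_date):
        return False
    if end_date and day > date.fromisoformat(end_date):
        return False
    return True
```

A topic created on 2026-01-15 passes a filter of `2026-01-01` to `2026-01-31`, while one created on 2026-02-01 does not.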
Output
Each dataset item is a single flat record.
| Field | Description |
|---|---|
| `recordType` | `post` for the first post in a topic, `comment` for replies. |
| `originalPostId` | Original Discourse post ID. |
| `originalTopicId` | Original Discourse topic ID. |
| `topicTitle` | Forum topic title. |
| `topicUrl` | URL of the forum topic. |
| `postUrl` | Direct URL to the post or comment. |
| `postNumber` | Post number within the topic. |
| `replyToPostNumber` | Referenced post number when the comment is a reply. |
| `authorUsername` | Forum username. |
| `authorName` | Display name when available. |
| `originalText` | Cleaned plain text extracted from the post body. |
| `createdAt` | Post creation timestamp. |
| `updatedAt` | Post update timestamp. |
| `likeCount` | Number of likes on the post. |
| `source` | `latest` or `search`. |
| `matchedKeyword` | Keyword that matched the topic in search mode. |
| `crawledAt` | Timestamp when the row was exported. |
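The mapping from a forum post to one flat row might be sketched as follows. The output keys come from the table above; the input keys (`post_number`, `reply_to_post_number`, `username`, and so on) are assumptions modeled on Discourse post JSON, `like_count` is a placeholder key because the Actor's source for likes is not documented here, and `to_row` is a hypothetical helper.

```python
from datetime import datetime, timezone


def to_row(post: dict, topic: dict, text: str, source: str = "latest",
           matched_keyword: str = None,
           base_url: str = "https://forum.lightburnsoftware.com") -> dict:
    """Map one post plus its topic onto the flat output schema."""
    topic_url = f"{base_url}/t/{topic['slug']}/{topic['id']}"
    number = post["post_number"]
    crawled = datetime.now(timezone.utc).isoformat(timespec="milliseconds")
    return {
        # The first post in a topic is a "post"; every reply is a "comment".
        "recordType": "post" if number == 1 else "comment",
        "originalPostId": post["id"],
        "originalTopicId": topic["id"],
        "topicTitle": topic["title"],
        "topicUrl": topic_url,
        "postUrl": f"{topic_url}/{number}",
        "postNumber": number,
        "replyToPostNumber": post.get("reply_to_post_number"),
        "authorUsername": post["username"],
        "authorName": post.get("name"),
        "originalText": text,  # already-cleaned plain text
        "createdAt": post["created_at"],
        "updatedAt": post["updated_at"],
        "likeCount": post.get("like_count", 0),  # placeholder key (assumption)
        "source": source,
        "matchedKeyword": matched_keyword,
        "crawledAt": crawled.replace("+00:00", "Z"),
    }
```

Keeping every value a scalar or `None` is what lets the same row serialize cleanly to both CSV and JSON.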
Example Output
    {
      "recordType": "comment",
      "originalPostId": 605754,
      "originalTopicId": 190079,
      "topicTitle": "Downloaded 2.1.01 and my laser will not come to full power",
      "topicUrl": "https://forum.lightburnsoftware.com/t/example-topic/190079",
      "postUrl": "https://forum.lightburnsoftware.com/t/example-topic/190079/2",
      "postNumber": 2,
      "replyToPostNumber": null,
      "authorUsername": "MikeyH",
      "authorName": "Mike Hembrey",
      "originalText": "Check your Units settings. The upgrade might have flipped the switch.",
      "createdAt": "2026-05-25T22:02:31.813Z",
      "updatedAt": "2026-05-25T22:02:31.813Z",
      "likeCount": 0,
      "source": "latest",
      "matchedKeyword": null,
      "crawledAt": "2026-05-26T03:07:28.862Z"
    }
Notes
This Actor is built for structured text extraction. It does not download images or include image URLs in the main semantic text field. The resulting dataset is intentionally flat so CSV and JSON exports remain easy to analyze.
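Because every row is flat, downstream analysis can regroup rows as needed. The following sketch shows one possible approach, not a feature of the Actor: it rebuilds per-topic threads from exported rows, for example before chunking text for embeddings. The function name and the `author: text` layout are illustrative choices.

```python
from collections import defaultdict


def threads_from_rows(rows: list) -> dict:
    """Group flat rows by topic and join their text in post order."""
    by_topic = defaultdict(list)
    for row in rows:
        by_topic[row["originalTopicId"]].append(row)

    threads = {}
    for topic_id, items in by_topic.items():
        # postNumber gives the reading order within a topic.
        ordered = sorted(items, key=lambda r: r["postNumber"])
        threads[topic_id] = "\n\n".join(
            f"{r['authorUsername']}: {r['originalText']}" for r in ordered
        )
    return threads
```

The input can be the list obtained by loading a JSON export with `json.load`, since each dataset item already carries the `originalTopicId`, `postNumber`, `authorUsername`, and `originalText` fields used here.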