R3 | DE

R³-DE is a multi-layer NLP pipeline that transforms raw text or scraped content into structured records with speakers, entities, actions, states, intent, sentiment, causality, confidence scores, and prompt–completion pairs, ideal for training chatbots, instruction-tuned LLMs, and safety-critical AI.

Pricing: from $0.10 / 1,000 results
Rating: 5.0 (4)
Developer: GUN | METAL

Maintained by Community

Actor stats

Bookmarked: 1
Total users: 5
Monthly active users: 1
Last modified: 8 days ago


R³-DE: Rich Recursive Reasoning & Dialogue Extraction Pipeline

R³-DE (Rich Recursive Reasoning & Dialogue Extraction) is a powerful, multi-layer Natural Language Understanding (NLU) pipeline designed to transform raw text or web-scraped content (dialogues, transcripts, articles, forum posts, interviews, meeting notes, etc.) into highly structured, AI-training-ready datasets.

It extracts speakers, entities, actions, states, intents, sentiment, temporal information, semantic clusters, causal relationships, uncertainty/confidence scores, and clean prompt-completion pairs — making it ideal for:

  • Training chatbots and conversational agents
  • Fine-tuning instruction-tuned LLMs
  • Building safety-critical assistants
  • Creating high-quality synthetic dialogue datasets
  • Analyzing multi-turn conversations with causal & uncertainty modeling

Features

  • Web scraping support [Upcoming]
  • Speaker-aware sentence splitting & improved speaker detection
  • Rich entity, action, state, intent & sentiment extraction
  • Semantic clustering of utterances
  • Causal inference between triggers and states (DoWhy + linear regression; see the sketch after this list)
  • Uncertainty & confidence scoring (Gaussian Process Regression)
  • Speaker → Trigger knowledge graph (NetworkX)
  • Sequential turn-based fallback graph
  • Clean prompt-completion pairs ready for LLM fine-tuning
  • Detailed metadata: average confidence, info density, cluster count, etc.
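
The sketch below illustrates the causal-inference feature with DoWhy's backdoor linear-regression estimator. It is illustrative only: the column names and toy data are invented and are not the Actor's internal schema.

# Illustrative sketch of trigger -> state causal estimation with DoWhy.
# All column names and values are invented for the example.
import pandas as pd
from dowhy import CausalModel

df = pd.DataFrame({
    "trigger_present": [1, 0, 1, 1, 0, 0, 1, 0],                    # trigger occurred?
    "state_score":     [0.9, 0.2, 0.8, 0.7, 0.1, 0.3, 0.95, 0.15],  # state intensity
    "sentiment":       [0.1, 0.0, 0.2, 0.1, -0.1, 0.0, 0.3, -0.2],  # assumed common cause
})

model = CausalModel(
    data=df,
    treatment="trigger_present",
    outcome="state_score",
    common_causes=["sentiment"],
)
estimand = model.identify_effect(proceed_when_unidentifiable=True)
estimate = model.estimate_effect(estimand, method_name="backdoor.linear_regression")
print(estimate.value)  # estimated average causal effect of the trigger on the state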

Output Structure (Simplified Example)

{
  "structured_records": [
    {
      "speaker": "Alice",
      "entity": "meeting tomorrow",
      "state": "schedule meeting tomorrow at 3pm",
      "trigger": "schedule",
      "intent": "command",
      "sentiment": 0.0,
      "confidence": 0.92,
      "original_text": "Let's schedule a meeting tomorrow at 3pm..."
    }
  ],
  "sequences": [
    {
      "prompt": "[Alice @ inferred] Let's schedule a meeting tomorrow at 3pm...",
      "completion": "[Command] schedule meeting tomorrow at 3pm (trigger: schedule)"
    }
  ],
  "metadata": {
    "num_records": 42,
    "average_confidence": 0.887,
    "num_clusters_discovered": 4
  }
}
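
As a quick illustration of the "training-ready" claim, the sequences field above can be dumped straight to JSONL for fine-tuning jobs. The file names below are placeholders for a downloaded run result, not part of the Actor's output:

# Convert the Actor's "sequences" into a prompt-completion JSONL file.
import json

with open("r3de_output.json") as f:   # placeholder path to a run's output
    output = json.load(f)

with open("finetune.jsonl", "w") as f:
    for pair in output["sequences"]:
        f.write(json.dumps({"prompt": pair["prompt"],
                            "completion": pair["completion"]}) + "\n")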

Dataset Usage

1. Training & Evaluating ML Models

The dataset acts as high-quality labeled data.

  • Used to train classifiers, sequence models, or detection systems
  • Ideal for supervised learning due to strict field typing
  • Excellent for benchmarking, since labels are consistent and reproducible

Typical uses:

  • Intent classification (see the sketch after this list)
  • Event sequence prediction
  • Anomaly or intrusion detection
  • Policy or rule-learning systems
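
A minimal sketch of the intent-classification use case, assuming records were exported from a run (the file name and train/test split are assumptions; field names follow the example schema above):

# Train a simple intent classifier on structured_records:
# original_text is the input, intent is the label.
import json
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

records = json.load(open("r3de_output.json"))["structured_records"]
texts = [r["original_text"] for r in records]
labels = [r["intent"] for r in records]

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=0)

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))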

2. Ground-Truth for System Validation

Because outputs are deterministic:

  • The dataset becomes a reference ground-truth
  • Used to validate pipelines, APIs, or agents
  • Enables regression testing (“did the system behavior change?”); see the sketch after this section

Especially valuable in:

  • Security systems
  • Compliance workflows
  • AI agent evaluation
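
Determinism makes a byte-for-byte regression test trivial. In the sketch below, run_pipeline is a hypothetical wrapper around an Actor run and the reference file is a previously approved output:

# Regression test: identical input must reproduce the stored reference output.
import json

def test_output_unchanged():
    with open("reference_output.json") as f:
        reference = json.load(f)
    current = run_pipeline("input.txt")  # hypothetical Actor-run wrapper
    assert current["structured_records"] == reference["structured_records"]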

3. Synthetic Data for Controlled Experiments

The inference contract (described under “Inference Contract Proof” below) makes the dataset synthetic but trustworthy.

  • Safe to share (no real user data)
  • Fully reproducible
  • Tunable by modifying input frames

Used in:

  • Research experiments
  • Model ablation studies
  • Stress-testing edge cases
  • Simulating rare or costly scenarios

4. Feature Stores & Analytics Pipelines

Since the dataset is CSV-based and schema-locked:

  • Directly ingestible into data warehouses
  • Easy to convert into feature vectors
  • Compatible with SQL, Spark, Pandas, and BI tools (see the Pandas sketch after this section)

Supports:

  • Time-series analysis
  • Behavior modeling
  • Trend detection
  • KPI computation
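
For example, loading the schema-locked CSV into Pandas for a couple of simple KPIs (the file name is an assumption; columns mirror the record fields shown earlier):

# Load the exported CSV and compute basic behavioral KPIs.
import pandas as pd

df = pd.read_csv("r3de_records.csv")
print(df.groupby("speaker")["confidence"].mean())  # per-speaker average confidence
print(df["intent"].value_counts(normalize=True))   # intent distribution for trend analysis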

5. Evaluation of AI / Agentic Systems

The dataset can function as a test harness for AI agents.

  • Compare expected vs. actual actions
  • Measure precision, recall, latency, or policy adherence (see the sketch after this section)
  • Detect hallucinations or invalid state transitions

Particularly effective for:

  • Voice AI
  • Conversational agents
  • Autonomous decision systems
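
A sketch of the harness idea, where agent_predict is a hypothetical stand-in for the system under test and the dataset's triggers serve as the expected actions:

# Score an agent's predicted triggers against the dataset's ground truth.
import json
from sklearn.metrics import precision_score, recall_score

records = json.load(open("r3de_output.json"))["structured_records"]
expected = [r["trigger"] for r in records]
actual = [agent_predict(r["original_text"]) for r in records]  # hypothetical agent call

print("precision:", precision_score(expected, actual, average="micro"))
print("recall:", recall_score(expected, actual, average="micro"))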

Benefits of Using R³-DE

1. Raw → Rich, Automatically

R³-DE converts unstructured text into high-fidelity, structured datasets without requiring labels, schemas, or manual annotation. Everything is inferred directly from the data.

2. Training-Ready AI Data

Outputs include prompt–completion pairs, speaker roles, intents, and semantic states, making the dataset immediately usable for training chatbots, instruction-tuned LLMs, and conversational agents.

3. Causality-Aware Understanding

Unlike traditional NLP pipelines, R³-DE models causal relationships between triggers and outcomes, enabling deeper reasoning and safer decision-making systems.

4. Built-In Confidence & Uncertainty

Each extracted record includes confidence scores and noise estimates, allowing downstream systems to reason about reliability, ambiguity, and risk.
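
The feature list names Gaussian Process Regression for this scoring; below is a toy illustration with scikit-learn, where the features and targets are invented stand-ins for whatever representation the Actor uses internally:

# Confidence mean plus an uncertainty (noise) estimate via GPR.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

X = np.array([[0.1], [0.4], [0.5], [0.9]])  # toy utterance features
y = np.array([0.2, 0.6, 0.65, 0.95])        # toy extraction-quality targets

gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel())
gpr.fit(X, y)

mean, std = gpr.predict(np.array([[0.7]]), return_std=True)
print(f"confidence={mean[0]:.2f}, noise estimate={std[0]:.2f}")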

5. Speaker & Turn Awareness

Multi-speaker conversations, overrides, commands, questions, and recoveries are preserved as structured, turn-aware records instead of flattened text.

6. Zero Annotation Overhead

No predefined ontology, taxonomy, or labeling rules are required. R³-DE adapts to new domains and formats automatically.

7. Domain-Agnostic by Design

Works equally well for support chats, engineering logs, safety reports, meeting transcripts, system alerts, and operational dialogues.

8. Knowledge Graph Generation

Automatically produces event and knowledge graphs that capture relationships between speakers, actions, and outcomes.
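
A minimal sketch of the speaker → trigger graph with NetworkX (the records and edge label here are illustrative, not the Actor's exact node schema):

# Build a directed speaker -> trigger knowledge graph.
import networkx as nx

records = [
    {"speaker": "Alice", "trigger": "schedule"},
    {"speaker": "Bob", "trigger": "confirm"},
    {"speaker": "Alice", "trigger": "cancel"},
]

G = nx.DiGraph()
for r in records:
    G.add_edge(r["speaker"], r["trigger"], relation="uses_trigger")

print(list(G.edges(data=True)))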

9. Production & Safety Friendly

Designed for safety-critical and enterprise workflows where traceability, confidence, and explainability matter.

10. Apify-Native & Scalable

Runs as an Apify Actor, scales on demand, and produces datasets that integrate seamlessly with existing data pipelines.

R³-DE turns raw language into decision-ready intelligence.


Inference Contract Proof

Developer: Sakaeth Ram

This dataset is generated using a Small Language Model (SLM) operating strictly under an inference-only contract. The model performs no training, fine-tuning, or weight updates during generation.

The SLM is configured for deterministic, constrained semantic generation, ensuring that identical inputs always produce identical outputs. Inference is executed with zero temperature, greedy decoding, and bounded token limits to eliminate variability and prevent open-ended text completion.
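
For readers unfamiliar with those settings, they correspond to a configuration like the following Hugging Face sketch. The actual SLM, prompt format, and inference framework are not published, so the model name and prompt here are placeholders:

# Deterministic, bounded generation: greedy decoding with a hard token cap.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder SLM
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("frame: schedule meeting", return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=False,    # greedy decoding: no sampling, no temperature variability
    max_new_tokens=64,  # bounded token limit prevents open-ended completion
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))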

Inputs are provided as validated intent–state frames with predefined roles, ordered event timelines, and explicit timestamps. The SLM maps these inputs directly to structured semantic frames, emitting outputs exclusively in schema-locked CSV (or JSON) format.

All outputs are enforced at inference time to be:

  • Schema-compliant
  • Strictly typed
  • Complete (no missing fields)
  • Semantically consistent with declared intent and roles (see the validation sketch after this list)
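
One way to picture that enforcement is schema validation on every emitted row. The sketch below uses pydantic as an assumed choice (it is not a stated dependency); field names follow the example record earlier on this page:

# Validate each emitted record against a strict, typed schema before it is written.
from pydantic import BaseModel, Field

class Record(BaseModel):
    speaker: str
    entity: str
    state: str
    trigger: str
    intent: str
    sentiment: float
    confidence: float = Field(ge=0.0, le=1.0)  # typed and bounded
    original_text: str

row = {
    "speaker": "Alice", "entity": "meeting tomorrow",
    "state": "schedule meeting tomorrow at 3pm", "trigger": "schedule",
    "intent": "command", "sentiment": 0.0, "confidence": 0.92,
    "original_text": "Let's schedule a meeting tomorrow at 3pm...",
}
Record.model_validate(row)  # raises ValidationError on missing or mistyped fields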

The SLM functions as a deterministic semantic frame generator, not a creative language model, making the resulting dataset reproducible, auditable, and suitable for downstream machine learning and analytical workflows.