Streaming LLM Responses

The streaming endpoint delivers LLM-generated text in real time as it is produced, creating a more responsive user experience similar to ChatGPT or Claude.

NOTE: This is handled automatically by components in the Client SDK, should you wish to use that approach.

Streaming Updates Endpoint

POST /api/flowgraph/control/streaming

Key Features

When to use

Request Format

{
  "sessionStepID": string,
  "lastStreamIndex": number
}
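A client typically polls this endpoint in a loop, sending the index of the last chunk it received and appending whatever new text comes back. The sketch below illustrates that pattern; note that the response shape (`chunks`, `nextStreamIndex`, `done`) is an assumption for illustration, since this section only documents the request body.

```typescript
// Hypothetical response shape — not documented in this section.
interface StreamingResponse {
  chunks: string[];        // text produced since lastStreamIndex (assumed)
  nextStreamIndex: number; // index to send on the next poll (assumed)
  done: boolean;           // true once generation has finished (assumed)
}

// Build the JSON body described under "Request Format".
function buildRequest(sessionStepID: string, lastStreamIndex: number) {
  return { sessionStepID, lastStreamIndex };
}

// Fold one response into the accumulated text and advance the index.
function applyResponse(
  text: string,
  res: StreamingResponse
): { text: string; lastStreamIndex: number } {
  return {
    text: text + res.chunks.join(""),
    lastStreamIndex: res.nextStreamIndex,
  };
}

// Polling loop: POST the request, apply the response, repeat until done.
async function streamResponse(
  sessionStepID: string,
  onUpdate: (text: string) => void
): Promise<string> {
  let state = { text: "", lastStreamIndex: 0 };
  for (;;) {
    const res = await fetch("/api/flowgraph/control/streaming", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(buildRequest(sessionStepID, state.lastStreamIndex)),
    });
    const data = (await res.json()) as StreamingResponse;
    state = applyResponse(state.text, data);
    onUpdate(state.text); // render partial text as it arrives
    if (data.done) return state.text;
  }
}
```

In practice you would add a short delay between polls and abort handling; the Client SDK components mentioned above take care of this loop for you.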