Streaming vs. Non-streaming

This section demonstrates the difference between streaming and non-streaming modes in the Konko API.


Understanding the Modes


Streaming Mode

In streaming mode, the Konko API delivers the response in parts as the model generates them. Users see the text arrive chunk by chunk, much like the flow of a live conversation, instead of waiting for the full reply.

import konko

# Business query example: Asking about the return policy.
konko_completion = konko.chat.completions.create(
    model="mistralai/mistral-7b-instruct-v0.1",
    messages=[
        {"role": "system", "content": "You are an FAQ bot for an online store."},
        {"role": "user", "content": "What's your return policy?"}
    ],
    temperature=0.1,
    max_tokens=300,
    n=1,
    stream=True
)

# With stream=True the call returns an iterator; each item is one chunk of the reply.
for message_chunk in konko_completion:
    print(message_chunk)


Sample Output

ChatCompletionChunk(
  id='a91c0730-161f-4885-aa5f-076da3dfa8c8',
  choices=[
    Choice(
      delta=ChoiceDelta(
        content=' Our',
        function_call=None,
        role='assistant',
        tool_calls=None
      ),
      finish_reason=None,
      index=0,
      logprobs=None
    ),
    ...
    Choice(
      delta=ChoiceDelta(
        content=' website',
        function_call=None,
        role='assistant',
        tool_calls=None
      ),
      finish_reason=None,
      index=0,
      logprobs=None
    )
  ],
  created=1704899844,
  model='mistralai/mistral-7b-instruct-v0.1',
  object='chat.completion.chunk',
  system_fingerprint=None,
  token={
    'id': 4400,
    'text': ' website',
    'logprob': 0,
    'special': False
  },
  generated_text=None,
  details=None,
  stats=None,
  usage=None
)

Non-streaming Mode

Conversely, in non-streaming mode, the entire response is generated before it is returned. This mode is useful when your application needs the complete response before it can proceed.

import konko

# Business query example: Asking about the return policy.
konko_completion = konko.chat.completions.create(
    model="mistralai/mistral-7b-instruct-v0.1",
    messages=[
        {"role": "system", "content": "You are an FAQ bot for an online store."},
        {"role": "user", "content": "What's your return policy?"}
    ],
    temperature=0.1,
    max_tokens=300,
    n=1,
    stream=False
)

# The call returns only after the full response has been generated.
print(konko_completion)


Sample Output

ChatCompletion(
  id='bfe4f373-0e52-486e-8da8-a7e12e18ee11',
  choices=[
    Choice(
      finish_reason=None,
      index=0,
      logprobs=None,
      message=ChatCompletionMessage(
        content=' Our return policy varies depending on the product purchased. For most items, we offer a 30-day return period. Items must be returned in their original condition and packaging. For more detailed information, please refer to our Returns & Exchanges section on our website.',
        role='assistant',
        function_call=None,
        tool_calls=None
      )
    )
  ],
  created=1704902213,
  model='mistralai/mistral-7b-instruct-v0.1',
  object='chat.completion',
  system_fingerprint=None,
  usage=None
)
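
To use the reply rather than the whole response object, read the message content of the first choice. This is a small sketch that assumes the response has the shape shown in the sample output above (`choices[0].message.content`). The helper name `first_reply_text` and the stand-in object are hypothetical and are used here so the example runs without calling the API.

```python
from types import SimpleNamespace


def first_reply_text(completion):
    """Return the stripped text of the first choice, or "" if there is none."""
    if not completion.choices:
        return ""
    content = completion.choices[0].message.content
    # The sample output starts with a leading space, so strip surrounding whitespace.
    return (content or "").strip()


# Hypothetical stand-in that mimics only the fields read above.
demo_completion = SimpleNamespace(
    choices=[
        SimpleNamespace(
            index=0,
            message=SimpleNamespace(
                role="assistant",
                content=" For most items, we offer a 30-day return period.",
            ),
        )
    ]
)
print(first_reply_text(demo_completion))
```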

For more details on how to use this API, refer to our API Reference Page.


Choose between streaming and non-streaming based on how you want users to experience the response. Consider the nature of the query and the expected response time: streaming shows text as soon as it is generated, while non-streaming returns nothing until the full reply is ready.
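
One way to ground that decision is to measure how long the first chunk takes compared with the whole stream. The sketch below works on any iterable, so it is demonstrated with a hypothetical `_fake_stream` generator instead of a live API call. The helper name `time_stream` is also an assumption, not part of the Konko client.

```python
import time


def time_stream(chunks):
    """Return (seconds to first chunk, seconds to end of stream, chunk count)."""
    start = time.perf_counter()
    first = None
    count = 0
    for _ in chunks:
        if first is None:
            # Record the delay before anything could be shown to the user.
            first = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    return first, total, count


def _fake_stream(n, delay):
    # Hypothetical stand-in for a streamed response: pauses before each item.
    for i in range(n):
        time.sleep(delay)
        yield i


first, total, count = time_stream(_fake_stream(5, 0.01))
print(f"first chunk after {first:.3f}s, finished after {total:.3f}s, {count} chunks")
```

If the gap between the first chunk and the end of the stream is large, users are likely to notice the benefit of streaming. If it is small, the simpler non-streaming call may be enough.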

