Streaming vs. Non-streaming

In this section, we'll demonstrate the difference between the streaming and non-streaming modes of the Konko API.

Understanding the Modes

Streaming Mode

In streaming mode, the Konko API delivers the response incrementally, sending each part of the message as soon as it is generated. Users see the text arrive chunk by chunk, much like the flow of a live conversation.

import konko

# Business query example: Asking about the return policy.
konko_completion = konko.chat.completions.create(
    model="mistralai/mistral-7b-instruct-v0.1",
    messages=[
        {"role": "system", "content": "You are an FAQ bot for an online store."},
        {"role": "user", "content": "What's your return policy?"}
    ],
    temperature=0.1,
    max_tokens=300,
    n=1,
    stream=True
)

# With stream=True the call returns an iterator; each item is one chunk of the reply.
for message_chunk in konko_completion:
    print(message_chunk)

Sample Output

ChatCompletionChunk(
  id='a91c0730-161f-4885-aa5f-076da3dfa8c8',
  choices=[
    Choice(
      delta=ChoiceDelta(
        content=' Our',
        function_call=None,
        role='assistant',
        tool_calls=None
      ),
      finish_reason=None,
      index=0,
      logprobs=None
    ),
    ...
    Choice(
      delta=ChoiceDelta(
        content=' website',
        function_call=None,
        role='assistant',
        tool_calls=None
      ),
      finish_reason=None,
      index=0,
      logprobs=None
    )
  ],
  created=1704899844,
  model='mistralai/mistral-7b-instruct-v0.1',
  object='chat.completion.chunk',
  system_fingerprint=None,
  token={
    'id': 4400,
    'text': ' website',
    'logprob': 0,
    'special': False
  },
  generated_text=None,
  details=None,
  stats=None,
  usage=None
)
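
Printing each chunk shows the raw objects, but an application usually wants only the generated text. The following is a minimal sketch of how the text pieces could be joined, assuming every chunk exposes `choices[0].delta.content` as in the sample output above and that the content may be `None` on some chunks. `collect_stream_text` and `fake_chunk` are hypothetical names used for illustration, not part of the Konko SDK, and the stand-in chunks exist only so the example runs without an API call.

```python
from types import SimpleNamespace


def collect_stream_text(chunks, on_text=None):
    """Join the text deltas of a chat completion stream into one string.

    Hypothetical helper, not part of the Konko SDK. Assumes each chunk exposes
    choices[0].delta.content, as in the sample output above.
    """
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # skip chunks that carry no choices
        text = chunk.choices[0].delta.content
        if text:  # content may be None or empty on some chunks
            parts.append(text)
            if on_text is not None:
                on_text(text)  # hand each piece to the caller as it arrives
    return "".join(parts)


def fake_chunk(text):
    """Build a stand-in chunk with the same attribute layout (illustration only)."""
    delta = SimpleNamespace(content=text, role="assistant")
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta, index=0)])


stream = [fake_chunk(" Our"), fake_chunk(None), fake_chunk(" website")]
full_text = collect_stream_text(stream)
print(full_text)
```

With a real stream, the `konko_completion` iterator from the example above would take the place of `stream`, and a callback such as `on_text=lambda t: print(t, end="", flush=True)` would display the text as it arrives.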

Non-streaming Mode

Conversely, in non-streaming mode, the entire response is generated before it is returned. This mode is useful when your application needs the complete response before it can proceed, for example when the output has to be parsed or validated as a whole.

import konko

# Business query example: Asking about the return policy.
konko_completion = konko.chat.completions.create(
    model="mistralai/mistral-7b-instruct-v0.1",
    messages=[
        {"role": "system", "content": "You are an FAQ bot for an online store."},
        {"role": "user", "content": "What's your return policy?"}
    ],
    temperature=0.1,
    max_tokens=300,
    n=1,
    stream=False
)

# The full response is available as a single object once the call returns.
print(konko_completion)

Sample Output

ChatCompletion(
  id='bfe4f373-0e52-486e-8da8-a7e12e18ee11',
  choices=[
    Choice(
      finish_reason=None,
      index=0,
      logprobs=None,
      message=ChatCompletionMessage(
        content=' Our return policy varies depending on the product purchased. For most items, we offer a 30-day return period. Items must be returned in their original condition and packaging. For more detailed information, please refer to our Returns & Exchanges section on our website.',
        role='assistant',
        function_call=None,
        tool_calls=None
      )
    )
  ],
  created=1704902213,
  model='mistralai/mistral-7b-instruct-v0.1',
  object='chat.completion',
  system_fingerprint=None,
  usage=None
)
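
In most applications only the assistant's text is needed rather than the whole response object. Below is a minimal sketch of pulling it out, assuming the `ChatCompletion` layout shown in the sample output above (`choices[0].message.content`). `first_message_text` is a hypothetical helper name, not part of the Konko SDK, and the stand-in object only mirrors the sample so the example runs without an API call.

```python
from types import SimpleNamespace


def first_message_text(completion):
    """Return the assistant text of the first choice, without outer whitespace.

    Hypothetical helper; assumes the ChatCompletion layout shown above.
    """
    if not completion.choices:
        return ""
    content = completion.choices[0].message.content
    return (content or "").strip()  # content may be None; sample text starts with a space


# Stand-in object mirroring part of the sample output (illustration only).
sample = SimpleNamespace(
    choices=[
        SimpleNamespace(
            index=0,
            message=SimpleNamespace(
                role="assistant",
                content=" Our return policy varies depending on the product purchased.",
            ),
        )
    ]
)

print(first_message_text(sample))
```

With a real call, `konko_completion` from the non-streaming example would be passed in place of `sample`.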

For more details on how to use this API, refer to our API Reference Page.

Base your choice between streaming and non-streaming on how you want your users to perceive the response. Consider the nature of the user's query and the expected response time when making your decision.
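
One way to reason about perceived response time is to compare how long it takes for the first piece of text to arrive with how long the complete response takes. The sketch below illustrates the idea with a simulated stream; `simulated_stream` and `measure_latency` are hypothetical helpers, the delay is an arbitrary assumption, and no Konko API call is made.

```python
import time


def simulated_stream(pieces, delay=0.01):
    """Yield text pieces with a pause between them (stand-in for a model stream)."""
    for piece in pieces:
        time.sleep(delay)
        yield piece


def measure_latency(stream):
    """Return (seconds until first piece, seconds until last piece, full text)."""
    start = time.perf_counter()
    first = None
    parts = []
    for piece in stream:
        if first is None:
            first = time.perf_counter() - start  # what a streaming user waits for
        parts.append(piece)
    total = time.perf_counter() - start  # what a non-streaming user waits for
    return first, total, "".join(parts)


first, total, text = measure_latency(simulated_stream([" Our", " return", " policy"]))
print(f"first piece after {first:.3f}s, complete after {total:.3f}s")
```

The larger the gap between the two numbers, the more a user-facing interface is likely to benefit from streaming; when the two are close, or the output is consumed by code rather than read by a person, non-streaming keeps the handling simpler.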