Prompting Guidelines

Best practices for crafting prompts for various models.

This guide explains how to prompt each of the language models available through the Konko API. Below are instructions for each model, including example code snippets.

Models and Prompting Formats

Prompting Llama 2 Chat and CodeLlama Instruct


  • Konko Model IDs:
    meta-llama/llama-2-70b-chat
    meta-llama/llama-2-13b-chat
    codellama/codellama-34b-instruct
    codellama/codellama-13b-instruct
    codellama/codellama-7b-instruct
    konko/llama-2-7b-32k-instruct

  • Endpoint: ChatCompletion

  • Usage: When using these models, the prompt is structured with specific tags (<s>, [INST], <<SYS>>) to define the conversation flow. However, Konko's ChatCompletion endpoint automatically formats these prompts based on user input. Users provide their inputs in a conversational manner without needing to manually include these tags.

    Under the Hood: Prompt Formatting (FYI)

    For the Llama 2 Chat and CodeLlama Instruct models, the underlying prompt structure is as follows:

    <s>[INST] <<SYS>>
    system_prompt
    <</SYS>>

    user_prompt_1 [/INST] assistant_response_1 </s>
    <s>[INST] user_prompt_2 [/INST]
    
    • <s> and </s> mark the start and end of a message.
    • [INST] and [/INST] denote instructional blocks.
    • <<SYS>> and <</SYS>> enclose the system message.
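
    If you ever need to reproduce this formatting yourself (for example, against a raw Completion endpoint), a minimal sketch might look like the following; build_llama2_prompt is a hypothetical helper, not part of the Konko SDK:

    # Hypothetical helper: flatten a ChatCompletion-style message list
    # into the raw Llama 2 chat format shown above.
    def build_llama2_prompt(messages):
        system = next((m["content"] for m in messages if m["role"] == "system"), "")
        prompt = f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n"
        first_user_turn = True
        for m in messages:
            if m["role"] == "user":
                prefix = "" if first_user_turn else "<s>[INST] "
                prompt += f"{prefix}{m['content']} [/INST]"
                first_user_turn = False
            elif m["role"] == "assistant":
                prompt += f" {m['content']} </s>"
        return prompt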
  • Example Code:

    import konko
    
    konko_completion = konko.chat.completions.create(
        model="meta-llama/llama-2-70b-chat",
        messages=[
            {"role": "system", "content": "You are a summarizer"},
            {"role": "user", "content": "Your prompt here..."}
        ],
        temperature=0.1,
        max_tokens=300,
        n=2
    )
    

Prompting Mistral-Orca


  • Model ID:
    open-orca/mistral-7b-openorca

  • Endpoint: ChatCompletion

  • Usage: This model utilizes a conversational format. The ChatCompletion endpoint in Konko handles the required formatting, allowing users to input their messages directly.

    Under the Hood: Prompt Formatting (FYI)

    For Open-Orca/Mistral-7B-OpenOrca, the underlying prompt format is as follows:

    <|im_start|>system
    You are MistralOrca, a large language model trained by Alignment Lab AI. Write out your reasoning step-by-step to be sure you get the right answers!
    <|im_end|>
    <|im_start|>user
    How are you?<|im_end|>
    <|im_start|>assistant
    I am doing well!<|im_end|>
    <|im_start|>user
    Please tell me about how mistral winds have attracted super-orcas.<|im_end|>
    <|im_start|>assistant
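
    To reproduce this ChatML-style formatting manually (for example, against a raw Completion endpoint), a minimal sketch could look like this; build_chatml_prompt is a hypothetical helper, not part of the Konko SDK:

    # Hypothetical helper: render a message list in the ChatML-style
    # <|im_start|>/<|im_end|> format shown above, leaving the final
    # assistant turn open for the model to complete.
    def build_chatml_prompt(messages):
        parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages]
        parts.append("<|im_start|>assistant")
        return "\n".join(parts)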
    
  • Example Code:

    import konko
    
    konko_completion = konko.chat.completions.create(
        model="open-orca/mistral-7b-openorca",
        messages=[
            {"role": "system", "content": "You are MistralOrca, a large language model trained by Alignment Lab AI. Write out your reasoning step-by-step to be sure you get the right answers!"},
            {"role": "user", "content": "How are you?"},
            {"role": "assistant", "content": "I am doing well!"},
            {"role": "user", "content": "Please tell me about how mistral winds have attracted super-orcas."}
        ],
        temperature=0.1,
        max_tokens=300,
        n=2
    )
    

Prompting MistralAI: Mistral-7B-Instruct-v0.1


  • Model ID:
    mistralai/mistral-7b-instruct-v0.1

  • Endpoint: ChatCompletion

  • Usage: This model is designed for instruction-based interactions. Konko's ChatCompletion endpoint applies the required instruction formatting automatically, so users can send plain conversational messages without adding the special tokens themselves.

    Under the Hood: Prompt Formatting (FYI)

    The prompt structure for mistralai/Mistral-7B-Instruct-v0.1 incorporates special tokens to delineate instructions. Here's how it looks:

    system_prompt
    <s>[INST] user_prompt_1 [/INST]
    assistant_response</s>
    [INST] user_prompt_2 [/INST]
    
  • Example Code:

    import konko
    
    konko_instruct = konko.chat.completions.create(
        model="mistralai/mistral-7b-instruct-v0.1",
        messages=[
            {"role": "system", "content": "You are Mistral-7B-Instruct, an advanced language model from MistralAI, focusing on understanding and generating responses based on specific instructions. Ensure clarity and accuracy in your responses."},
            {"role": "user", "content": "What is your favourite condiment?"},
            {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
            {"role": "user", "content": "Do you have mayonnaise recipes?"}
        ],
        temperature=0.1,
        max_tokens=300,
        n=2
    )
    

Prompting NousResearch


  • Model IDs:
    nousresearch/nous-hermes-llama2-13b
    nousresearch/nous-hermes-llama-2-7b

  • Endpoint: ChatCompletion

  • Usage: The Nous-Hermes Llama 2 models, developed by NousResearch, are optimized for chat-based interactions that follow a specific prompt format. The ChatCompletion endpoint in Konko handles this format for you, facilitating clear and structured dialogues.

    Under the Hood: Prompt Formatting (FYI)

    The prompt structure for NousResearch/Nous-Hermes-Llama2-13B follows the Alpaca format, which includes clear instruction and response sections. Here's an example of how it looks:

    system_prompt
    ### Instruction:
    user_prompt

    ### Response:
    <leave a blank line for the model to respond>

    or

    system_prompt
    ### Instruction:
    user_prompt

    ### Input:
    additional_context

    ### Response:
    <leave a blank line for the model to respond>
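
    As a sketch of how these pieces fit together, the helper below (hypothetical, not part of the Konko SDK) assembles an Alpaca-style prompt with an optional Input section:

    # Hypothetical helper: compose an Alpaca-format prompt with an
    # optional "### Input:" context section.
    def build_alpaca_prompt(system, instruction, context=None):
        prompt = f"{system}\n### Instruction:\n{instruction}\n\n"
        if context:
            prompt += f"### Input:\n{context}\n\n"
        prompt += "### Response:\n"
        return prompt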
    
  • Example Code:

    import konko
    
    konko_hermes = konko.chat.completions.create(
        model="nousresearch/nous-hermes-llama2-13b",
        messages=[
            {"role": "system", "content": "You are a helpful assistant"},
            {"role": "user", "content": "Can you explain quantum computing?\n### Input:\nIncluding its implications for cryptography."}
        ],
        temperature=0.1,
        max_tokens=300,
        n=2
    )
    

Prompting LLaMA-2-7B-32K


  • Model ID:
    konko/llama-2-7b-32k

  • Endpoint: Completion

  • Usage: LLaMA-2-7B-32K is designed for a wide range of text completion tasks and does not require any specialized prompt formatting. This flexibility makes it ideal for diverse applications, from creative writing to information retrieval. The Completion endpoint in Konko supports the direct submission of text prompts, allowing the model to generate responses based on the provided context.

    Under the Hood: Prompt Formatting (FYI)

    Unlike models that need structured prompts, LLaMA-2-7B-32K works with straightforward text inputs. The model interprets and responds to the content of the prompt without the need for additional formatting or instructions. Here’s a general idea of how to use it:

    "Your unstructured text prompt here..."
    

    This approach enables the model to understand and respond to a wide variety of queries and instructions.

  • Example Code:

    import konko
    
    konko_completion = konko.completions.create(
        model="konko/llama-2-7b-32k",
        prompt="Your unstructured text prompt here...",
        temperature=0.1,
        max_tokens=300,
        n=1
    )
    

Prompting Llama 2 and CodeLlama


  • Konko Model IDs:
    codellama/codellama-34b
    codellama/codellama-34b-python
    meta-llama/llama-2-70b
    meta-llama/llama-2-13b

  • Endpoint: Completion

  • Usage: These models are versatile and do not require any specific formatting, making them user-friendly for a variety of tasks. Whether it's generating code, engaging in conversation, or providing information, these models can handle text inputs directly.

    Under the Hood: Prompt Formatting (FYI)

    These models operate effectively with straightforward text prompts. They interpret and respond to the content directly without the need for specialized formatting or structured instructions. Here's a basic idea of how to prompt them:

    "Your unstructured text prompt here..."
    

    This approach allows the models to process and respond to a wide range of queries, coding tasks, or conversational topics.

  • Example Code:

    import konko
    
    # Example for codellama/codellama-34b
    konko_completion_code34b = konko.completions.create(
        model="codellama/codellama-34b",
        prompt="Your code-related prompt here...",
        temperature=0.1,
        max_tokens=300,
        n=1
    )
    
    # Example for meta-llama/llama-2-70b
    konko_completion_llama270b = konko.completions.create(
        model="meta-llama/llama-2-70b",
        prompt="Your general inquiry or discussion topic here...",
        temperature=0.1,
        max_tokens=300,
        n=1
    )
    

Prompting Phind-CodeLlama-34B-v2


  • Model ID: phind/phind-codellama-34b-v2

  • Endpoint: Completion

  • Usage: Phind/Phind-CodeLlama-34B-v2 is specifically tailored for code generation and assistance. It utilizes the Alpaca/Vicuna instruction format, which is effective in guiding the model to understand and execute programming-related tasks. The Completion endpoint in Konko allows users to input these structured prompts directly.

    How to Prompt the Model

    The model responds best to prompts that include a clear system instruction followed by a user message. This format aids in contextualizing the task for the model. Here’s an example of how to format your prompt:

    ### System Prompt
    You are an intelligent programming assistant.
    
    ### User Message
    Implement a linked list in C++
    
    ### Assistant
    ...
    

    This structured approach helps the model to understand the nature of the coding task and respond with appropriate code or guidance.

  • Example Code:

    import konko
    
    konko_completion_phind = konko.completions.create(
        model="phind/phind-codellama-34b-v2",
        prompt="### System Prompt\nYou are an intelligent programming assistant.\n\n### User Message\nImplement a linked list in C++\n\n### Assistant\n",
        temperature=0.1,
        max_tokens=300,
        n=1
    )
    

Prompting Phind-CodeLlama-34B-Python-v1


  • Model ID: phind/phind-codellama-34b-python-v1

  • Endpoint: Completion

  • Usage: This model is specialized in Python coding tasks and is instruction-tuned but not chat-tuned. It's designed to understand and execute Python programming instructions effectively. The Completion endpoint in Konko supports this model by processing direct instruction-based prompts.

    How to Prompt the Model

    Unlike models that require chat markup or complex formats, Phind/Phind-CodeLlama-34B-Python-v1 works best with straightforward instructions. Simply state what you want the model to do and append "\n: " at the end of your task description. This method helps the model clearly identify and focus on the task at hand. Here's an example:

    ### Instruction:
    Write me a linked list implementation in Python:
    ### Response:
    

    This format is concise and directly communicates the programming task to the model, making it ideal for generating Python code solutions.

  • Example Code:

    import konko
    
    konko_completion_phind_python = konko.completions.create(
        model="phind/phind-codellama-34b-python-v1",
        prompt="### Instruction:\n{Write me a linked list implementation in Python:}\n### Response:\n",
        temperature=0.1,
        max_tokens=300,
        n=1
    )
    
    

Prompting nsql-llama-2-7B


  • Model ID: numbersstation/nsql-llama-2-7b

  • Endpoint: Completion

  • Usage: NumbersStation/nsql-llama-2-7B is expertly tailored for text-to-SQL generation tasks. It translates natural language prompts into SQL queries, especially focusing on SELECT queries based on provided table schemas. This model is particularly efficient for generating SQL queries from structured natural language questions.

    How to Prompt the Model

    To use this model effectively, provide a detailed table schema followed by a natural-language question to be answered with a SQL SELECT query. The model works best with prompts that clearly outline the database structure and the query requirement. Here are some examples of how to structure your prompts:

    Example 1:

    "CREATE TABLE stadium (...)
    
    -- Using valid SQLite, answer the following questions for the tables provided above.
    -- What is the maximum, the average, and the minimum capacity of stadiums ?
    SELECT"
    

    Example 2:

    "CREATE TABLE stadium (...)
    
    -- Using valid SQLite, answer the following questions for the tables provided above.
    -- How many stadiums in total?
    SELECT"
    

    Example 3:

    "CREATE TABLE work_orders (...)
    
    -- Using valid SQLite, answer the following questions for the tables provided above.
    -- How many work orders are open?
    SELECT"
    

    This format, which includes both the table schema and a specific SQL query question, guides the model to generate accurate and relevant SQL queries.
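
    Note that the newlines matter: SQL "--" comments run to the end of a line, so the question and the trailing SELECT must each sit on their own line. A minimal sketch of assembling such a prompt from the pieces above:

    # Assemble an nsql prompt: schema, commented question, then a bare
    # SELECT left open for the model to complete. Newlines are significant
    # because "--" comments extend to the end of the line.
    schema = "CREATE TABLE stadium (...)"  # your actual CREATE TABLE statements
    question = "How many stadiums in total?"

    prompt = (
        f"{schema}\n\n"
        "-- Using valid SQLite, answer the following questions for the tables provided above.\n"
        f"-- {question}\n"
        "SELECT"
    )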

  • Example Code:

    import konko
    
    konko_completion_nsql = konko.completions.create(
        model="numbersstation/nsql-llama-2-7b",
        prompt="CREATE TABLE stadium (...) -- Your SQL question here... SELECT",
        temperature=0.1,
        max_tokens=300,
        n=1
    )
    

Prompting Mistral-7B-v0.1


  • Model ID: mistralai/mistral-7b-v0.1

  • Endpoint: Completion

  • Usage: The Mistral-7B-v0.1 model from mistralai is a versatile tool designed for a broad spectrum of text-based tasks. It doesn't require any specific formatting for the prompts, making it highly accessible for various applications, including information retrieval, creative writing, and question-answering.

    Prompting the Model

    This model operates efficiently with straightforward text prompts. You can simply input your query, instruction, or topic without needing to follow any special structure or formatting. The model's design allows it to interpret and generate responses based on the content of these unstructured prompts. Here’s an example of how to use it:

    "Your unstructured text prompt here..."
    

    This approach lets the model process and respond to a wide array of inquiries, allowing for flexibility in its applications.

  • Example Code:

    import konko
    
    konko_completion_mistral = konko.completions.create(
        model="mistralai/mistral-7b-v0.1",
        prompt="Your unstructured text prompt here...",
        temperature=0.1,
        max_tokens=300,
        n=1
    )
    

Prompting SQLCoder


  • Model ID:
    defog/sqlcoder2

  • Endpoint: Completion

  • Usage: This model is designed for converting questions into SQL queries, typically requiring a specific format for the prompt. Although Konko's Completion endpoint allows users to send any format, models generally perform better when following a structured prompt template. The typical format for sqlcoder2 includes instructions, input, and a request for a SQL query response.

    Under the Hood: Prompt Formatting (FYI)

    The standard prompt format for defog/sqlcoder2 is:

    ### Instructions:
    Your task is to convert a question into a SQL query, given a database schema...
    
    ### Input:
    Generate a SQL query that answers the question "{question}"...
    
    ### Response:
    Here is the SQL query I have generated to answer the question "{question}":
    ```sql
    

    This format helps guide the model to understand the task and respond appropriately.
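
    For illustration, here is a sketch of filling in that template from a schema and a question; the helper is hypothetical, and the elided "..." portions of the template should be adapted to your use case:

    # Hypothetical helper: fill the sqlcoder2 template with a schema and a
    # question. The placement of the schema inside "### Input:" is an
    # assumption -- adapt the elided parts of the template as needed.
    def build_sqlcoder_prompt(schema, question):
        return (
            "### Instructions:\n"
            "Your task is to convert a question into a SQL query, "
            "given a database schema.\n\n"
            "### Input:\n"
            f'Generate a SQL query that answers the question "{question}".\n'
            f"{schema}\n\n"
            "### Response:\n"
            f'Here is the SQL query I have generated to answer the question "{question}":\n'
            "```sql\n"
        )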

  • Example Code:

    import konko
    
    konko_completion = konko.completions.create(
        model="defog/sqlcoder2",
        prompt="Your structured prompt here...",
        temperature=0.1,
        max_tokens=300,
        n=1
    )
    

Prompting Airoboros GPTQ


  • Model ID:
    TheBloke/Airoboros-L2-70b-2.2.1-GPTQ

  • Endpoint: Completion

  • Usage: Supports various formats for tasks like conversations, context-based Q&A, coding, and function calling.

    Under the Hood: Prompt Formatting (FYI)

    • Basic Chat & Stylized Responses:

      // Basic Chat
      A chat. USER: How does quantum computing work? ASSISTANT:
      
      // Stylized Response
      A chat between Einstein and Newton. Einstein: 'I believe time is relative.' Newton:
      
    • Context Obedient Question Answering:

      BEGININPUT
      BEGINCONTEXT
      topic: Quantum Mechanics
      ENDCONTEXT
      'Can you explain the uncertainty principle?'
      ENDINPUT
      BEGININSTRUCTION
      'Explain in simple terms.'
      ENDINSTRUCTION
      
    • Coding:

      Create a python application with the following requirements:
      - Asyncio FastAPI webserver
      - ping endpoint that returns the current date in JSON format
      - file upload endpoint, which calculates the file's sha256 checksum, and checks postgres to deduplicate
      
      
    • Function Calling:

      As an AI assistant, please select the most suitable function and parameters from the list of available functions below, based on the user's input. Provide your response in JSON format.
      
      Input: I want to know how many times 'Python' is mentioned in my text file.
      
      Available functions:
      file_analytics:
        description: This tool performs various operations on a text file.
        params:
          action: The operation we want to perform on the data, such as "count_occurrences", "find_line", etc.
          filters:
            keyword: The word or phrase we want to search for.
      
      
  • Example Code:

    import konko
    
    konko_completion = konko.completions.create(
        model="TheBloke/Airoboros-L2-70b-2.2.1-GPTQ",
        prompt="Your structured prompt here...",
        temperature=0.1,
        max_tokens=300,
        n=1
    )
    

Prompting Trelis Llama Function Calling


  • Model ID:
    Trelis/Llama-2-70b-chat-hf-function-calling-v2

  • Endpoint: Completion

  • Usage: This model specializes in function calling, handling inputs formatted as function metadata and user prompts. It returns responses in JSON format representing the function and its arguments.

    Under the Hood: Prompt Formatting (FYI)

    • Function Calling Format:
      <FUNCTIONS>
      {
          "function": "function_name",
          "description": "Function description",
          "arguments": [
              {
                  "name": "argument_name",
                  "type": "argument_type",
                  "description": "Argument description"
              }
          ]
      }
      </FUNCTIONS>
      [INST] User's query or instruction [/INST]
      
      The model uses this format to produce a JSON function call based on the provided metadata and user prompt.
  • Example Code:

    import konko
    import json
    
    # Define function metadata and user prompt
    function_metadata = {
        "function": "search_bing",
        "description": "Search the web for content on Bing.",
        "arguments": [{"name": "query", "type": "string", "description": "The search query string"}]
    }
    user_prompt = 'Search for the latest news on AI.'
    
    # Format the prompt
    function_list = json.dumps(function_metadata)
    prompt = f"<FUNCTIONS>{function_list}</FUNCTIONS>\n\n[INST] {user_prompt} [/INST]\n\n"
    
    konko_completion = konko.completions.create(
        model="Trelis/Llama-2-70b-chat-hf-function-calling-v2",
        prompt=prompt,
        temperature=0.1,
        max_tokens=300,
        n=1
    )
    

    The above code shows how to structure prompts for function calling tasks with the Trelis/Llama-2-70b-chat-hf-function-calling-v2 model in the Konko API.
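
    Since the model replies with JSON describing the function call, you will typically parse the completion text afterwards. A minimal sketch, assuming the generated text is exposed at choices[0].text (verify against your konko SDK version):

    # Parse the model's JSON function-call response. The attribute path on
    # the response object is an assumption -- check your SDK version.
    response_text = konko_completion.choices[0].text
    function_call = json.loads(response_text)
    print(function_call["function"], function_call.get("arguments"))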


Prompting Yarn-Mistral


  • Model ID:
    NousResearch/Yarn-Mistral-7b-128k

  • Endpoint: Completion

  • Usage: This model is designed for general-purpose language tasks and doesn't require a specific prompt template. Users can input their prompts directly.

    Under the Hood: Prompt Formatting (FYI)

    • General Prompt Format:
      Simply input the desired prompt. For example:
      {prompt}
      This format allows for a wide range of queries and instructions without the need for specialized tagging or structure.
  • Example Code:

    import konko
    
    konko_completion = konko.completions.create(
        model="NousResearch/Yarn-Mistral-7b-128k",
        prompt="Your prompt here...",
        temperature=0.1,
        max_tokens=300,
        n=1
    )