For foundational setup and basic integration with the Proxy Endpoint, please first refer to the guidelines in the Quickstart section. Below, we build upon this initial integration, introducing advanced features to enhance your use of the Proxy Endpoint.

We distinguish between two primary use cases that are currently supported:

  • RAG+Citation
  • Text Classification
Make sure to check out this Playground to experiment with the Proxy Endpoint.

RAG+C System Message Format

Proper formatting of the system message is essential for RAG-Buddy to function effectively. The system message for the RAG+C use case consists of an introduction, optional instructions, and a series of articles.

RAG+C Elements of the System Message

  1. System Introduction: Sets the context for the interaction.

    • Example: “You are a customer support agent of the e-banking application Piggy Bank Extraordinaire.”
  2. System Instructions (Optional): Use default instructions or provide custom ones.

    • Default: Guidelines for selecting and referencing articles (recommended).
    • Custom: Replace {system_instructions} in the template with your instructions.
  3. Articles: Each article should have a unique ID and relevant content.

    • Format: ## ID:<unique_identifier> [Article Title and Content]. Content can be multi-line.
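The article format above can be produced programmatically. The following is a minimal sketch, assuming each article is held as a dict with `id`, `title`, and `content` keys; the helper name `format_articles`, that dict layout, and the sample articles are illustrative assumptions, not part of RAG-Buddy.

```python
def format_articles(articles):
    """Render articles as '## ID:<id> <title>' followed by the content.

    `articles` is an assumed structure: a list of dicts with the keys
    'id', 'title' and 'content'.
    """
    seen_ids = set()
    blocks = []
    for article in articles:
        article_id = str(article["id"])
        # Each article needs a unique ID so a citation is unambiguous.
        if article_id in seen_ids:
            raise ValueError(f"duplicate article ID: {article_id}")
        seen_ids.add(article_id)
        header = f"## ID:{article_id} {article['title']}"
        blocks.append(f"{header}\n{article['content'].strip()}")
    # Separate articles with a blank line; content may span several lines.
    return "\n\n".join(blocks)


# Made-up sample data, for illustration only.
sample_articles = [
    {"id": "1", "title": "Opening hours", "content": "Branches open at 9:00."},
    {"id": "2", "title": "Card fees", "content": "Debit cards are free.\nCredit cards have a yearly fee."},
]
print(format_articles(sample_articles))
```

The resulting string can be used as the `articles` value when assembling the system message.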

Formatting the Message for RAG+C

Assemble the system message as in the Python example below. Each section after the introduction starts with an all-caps header, and sections are separated by double line breaks:

Construction of the system message
system_content = f"{system_intro}\n\nINSTRUCTIONS:\n{system_instructions}\n\nARTICLES:\n{articles}\n\n"

This structured format is crucial for the correct processing and response generation in RAG-Buddy.

Text-Classification System Message Format

The system message for the Text-Classification use case consists of system instructions, which are required.

Text-Classification Elements of the System Message

  1. System Instructions: Guidelines that tell the LLM to select the most relevant class and return its name.
    • Example: “You are an expert assistant in the field of customer service. Your task is to help workers in the customer service department of a company.\nYour task is to classify the customer’s question in order to help the customer service worker to answer the question. In order to help the worker, you MUST respond with the name of one of the following classes you know.\nIn case you reply with something else, you will be penalized.\nThe classes are the following:”

Formatting the Message for Text-Classification

Assemble the system message like the Python example below:

Construction of the system message
system_content = f"{system_instructions}"

This structured format is crucial for the correct processing and response generation in RAG-Buddy.
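The example instructions above end with "The classes are the following:", which suggests the class names are appended to the instructions. A minimal sketch of that step follows. The one-class-per-line layout with a leading dash, the helper name `build_classification_prompt`, and the sample class names are assumptions for illustration; this guide does not specify how the classes must be listed.

```python
def build_classification_prompt(instructions, classes):
    """Append the allowed class names to the system instructions.

    The '- name' line format is an assumption, not a documented
    requirement of RAG-Buddy.
    """
    if not classes:
        raise ValueError("at least one class is required")
    class_lines = "\n".join(f"- {name}" for name in classes)
    return f"{instructions.rstrip()}\n{class_lines}"


# Made-up class names, for illustration only.
system_instructions = build_classification_prompt(
    "You MUST respond with the name of one of the following classes.\n"
    "The classes are the following:",
    ["card_lost", "interest_rates", "transfer_failed"],
)
system_content = f"{system_instructions}"
print(system_content)
```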

Advanced Features

The features below apply to both the RAG+C and Text Classification use cases.

Cache-Control

The Helvia-RAG-Buddy-Cache-Control header is essential for managing how RAG-Buddy’s cache is used, influencing both reading and writing operations.

Options for Cache-Control

  1. no-cache: The cache will not be used for reading, but it will be updated with the new response.
  2. no-store: Responses will not be added to the cache.
  3. no-cache, no-store: Both reading from and writing to the cache are disabled.
  4. Omitting the header: This enables both reading from and writing to the cache.

Effects of Cache-Control Options

The table below summarizes the impact of each Cache-Control option on cache behavior:

Cache-Control Header Option | Read from Cache | Write to Cache
no-cache                    | No              | Yes
no-store                    | Yes             | No
no-cache, no-store          | No              | No
(Header Omitted)            | Yes             | Yes

Understanding and applying these options correctly can significantly impact the performance and efficiency of your integration with RAG-Buddy.
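The four combinations above can be derived from two booleans. The sketch below maps "read from cache" and "write to cache" flags to the matching header; the helper name `cache_control_headers` is hypothetical, while the header name and values are the ones documented in this section.

```python
CACHE_CONTROL_HEADER = "Helvia-RAG-Buddy-Cache-Control"


def cache_control_headers(read_cache=True, write_cache=True):
    """Return the request headers for the desired cache behavior."""
    directives = []
    if not read_cache:
        directives.append("no-cache")   # skip reading, still write
    if not write_cache:
        directives.append("no-store")   # do not add the response to the cache
    if not directives:
        # Omitting the header enables both reading and writing.
        return {}
    return {CACHE_CONTROL_HEADER: ", ".join(directives)}


print(cache_control_headers(read_cache=False, write_cache=True))
print(cache_control_headers(read_cache=False, write_cache=False))
```

The returned dict can be merged with the `Helvia-RAG-Buddy-Token` header before it is passed to the client.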

When using the no-cache option, if your cache already contains the same question, the response associated with that question will be overwritten.
When using the no-store option, the question/answer pair will not be stored in the cache, so later identical requests may not produce cache hits.
When setting a high temperature for creative or non-deterministic model outputs, it’s advised to also include headers to disable the cache, ensuring the uniqueness and variety of responses are preserved.
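One way to apply this advice is to disable the cache per request whenever the temperature is high. The sketch below is only an illustration: the helper name `request_options` and the threshold of 0.7 are arbitrary assumptions, since this guide does not define what counts as a high temperature. It relies on the `extra_headers` keyword argument that the OpenAI Python SDK (v1 and later) accepts on individual requests.

```python
CACHE_CONTROL_HEADER = "Helvia-RAG-Buddy-Cache-Control"


def request_options(temperature, creative_threshold=0.7):
    """Build keyword arguments for a chat completion request.

    `creative_threshold` is an assumed cut-off, not a documented value.
    """
    options = {"temperature": temperature}
    if temperature >= creative_threshold:
        # Disable both cache reads and writes for non-deterministic output.
        options["extra_headers"] = {CACHE_CONTROL_HEADER: "no-cache, no-store"}
    return options


print(request_options(0.0))
print(request_options(1.0))
```

The result can be unpacked into the call, for example `client.chat.completions.create(model=..., messages=..., **request_options(1.0))`.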

Reading Response Headers

  • Purpose: Understanding whether your request was served from the cache or fetched anew can be critical for debugging and performance optimization.
  • Implementation:
    • After each API call, check the response headers.
    • Look for the response header Helvia-RAG-Buddy-Cache-Status.
    • This header indicates whether the response was a cache hit. On a cache hit, the value is an integer referring to an internal database ID. On a cache miss, the header is absent from the response, or its value is None or an empty string.
When using the OpenAI Python client SDK, the response headers can only be read when using the completions.with_raw_response method. See the example below for more on that.
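The hit and miss rules described above can be wrapped in a small helper. This is a minimal sketch: the function name `cache_hit_id` is hypothetical, and treating any non-integer value as a miss is an assumption that goes slightly beyond the three miss cases listed above.

```python
CACHE_STATUS_HEADER = "Helvia-RAG-Buddy-Cache-Status"


def cache_hit_id(headers):
    """Return the cache entry ID (int) on a hit, or None on a miss.

    `headers` is any mapping with a .get() method, such as the headers
    object of a raw response or a plain dict.
    """
    value = headers.get(CACHE_STATUS_HEADER)
    if value is None:
        return None              # header absent: cache miss
    value = value.strip()
    if value in ("", "None"):
        return None              # empty or literal "None": cache miss
    try:
        return int(value)        # integer ID: cache hit
    except ValueError:
        return None              # assumption: unexpected values count as a miss


print(cache_hit_id({CACHE_STATUS_HEADER: "42"}))
print(cache_hit_id({}))
```

Note that a plain dict lookup is case-sensitive, so the header name must match exactly when the headers are not held in a case-insensitive mapping.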

These additional features give you greater control over, and insight into, how RAG-Buddy interacts with your requests. Using them effectively can improve your application’s performance and data relevance.

Comprehensive Example

In this example we implement the RAG+C use case. For the Text-Classification use case please refer to the Quickstart.

This code example illustrates using the Cache-Control header for cache management and reading the cache status header to determine whether the cache was used.

Here, the system is fed a set of pre-selected articles related to banking services. These articles provide the context the AI needs to understand user queries and answer them accurately.

RAG-Buddy Proxy Integration
import openai

# Your OpenAI API key
openai_api_key = "sk-abc123" # Replace with your actual API key

# Your RAG Buddy key
rag_buddy_key = "RAG_CA_abc123" # Replace with your actual RAG Buddy key

# Needed for RAG-Buddy integration
base_url = "https://api.ragbuddy.ai/proxy/ragc/v1"
headers = {"Helvia-RAG-Buddy-Token": rag_buddy_key,
            "Helvia-RAG-Buddy-Cache-Control": "no-cache"} # Replace with your desired Cache-Control option

# System messages
system_intro = "You are a customer support agent of the e-banking application called Piggy Bank Extraordinaire."
system_instructions = """Select the best article to reply to the question of the user below.
If you can find the answer in the articles listed below, then:
You MUST select exactly one article from the listed articles.
You MUST add the ID of the selected article at the start of your answer in the format "(ID:number) your answer". For example: (ID:1) your answer.
You MUST provide a short summarized answer.
If you cannot find the answer in the list of articles below, then:
You MUST say "(ID:None) I cannot answer this" and MUST say nothing more.
"""

# Articles
chosen_articles = """
## ID:123e4567-e89b-12d3-a456-426655440000    Interest Rates on Piggy Bank Extraordinaire's Savings Accounts
At Piggy Bank Extraordinaire, we understand the importance of saving for your future. That's why our Gold Plus Savings Account offers a competitive interest rate of 2.6% per annum, ensuring your savings grow steadily over time. For those seeking more flexibility, our Silver Flexi Savings Account provides an interest rate of 1.8% per annum, with the added benefit of no minimum balance requirement. Our Bronze Everyday Savings Account is perfect for daily transactions, offering a 1.2% interest rate per annum. With Piggy Bank Extraordinaire, you can choose the savings account that best suits your financial goals and lifestyle.

## ID:123e4567-e89b-12d3-a456-426655440001    Comparing Interest Rates: Piggy Bank Extraordinaire vs. Other Banks
When it comes to choosing a bank for your savings, interest rates play a crucial role. Piggy Bank Extraordinaire stands out with its competitive rates. Our Gold Plus Savings Account offers an interest rate of 2.6% per annum, significantly higher than the industry average of 2.0%. In comparison, Big Bank offers 2.2% on its equivalent account, and Global Trust offers 2.1%. Furthermore, Piggy Bank Extraordinaire's Silver Flexi and Bronze Everyday accounts also outperform their counterparts at other banks, offering higher returns on your deposits. With Piggy Bank Extraordinaire, you can be assured of getting one of the best rates in the market for your savings.

## ID:123e4567-e89b-12d3-a456-426655440002    Understanding Fixed Deposit Interest Rates at Piggy Bank Extraordinaire
Fixed Deposits at Piggy Bank Extraordinaire are an excellent way to earn higher interest on your savings. Our Fixed Deposit accounts offer various tenures ranging from 6 months to 5 years, with interest rates varying accordingly. For a 6-month deposit, enjoy an interest rate of 2.0% per annum. The rate increases to 2.5% for a 1-year term and peaks at 3.5% for a 5-year term. These rates are designed to reward longer commitments with higher returns. Our Fixed Deposits are perfect for customers who wish to lock in their savings for a fixed period to earn a guaranteed return without the risks associated with market fluctuations.
"""

# User query (can be replaced with any relevant question)
user_query = "What are the interest rates for the Gold Plus Savings Account?"

# Format the template
template = f"{system_intro}\n\nINSTRUCTIONS:\n{system_instructions}\n\nARTICLES:\n{chosen_articles}\n\n"

# Initialize OpenAI client
client = openai.OpenAI(
    api_key=openai_api_key,
    timeout=10,
    default_headers=headers,
    base_url=base_url,
)

# Prepare the messages
messages = [
    {
        "role": "system",
        "content": template,
    },
    {"role": "user", "content": user_query},
]

# Call OpenAI API
raw_response = client.chat.completions.with_raw_response.create( # Use with_raw_response to read response headers
    model="gpt-4o",  # Replace with the specific model name
    messages=messages,
)

completion = raw_response.parse()

# Read the response
response = completion.choices[0].message
# Print the response headers
print(raw_response.headers)
# Check the cache status header (an integer ID on a cache hit; missing, None, or empty on a miss)
print("Cache status: ", raw_response.headers.get("Helvia-RAG-Buddy-Cache-Status"))
# Print the answer
print("Answer: ", response.content)
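Because the instructions in this example ask the model to start its answer with "(ID:...)", the cited article can be separated from the answer text afterwards. The following is a minimal sketch under that assumption; the helper name `split_citation` and the regular expression are illustrative, and the sample answers are made up rather than real model output.

```python
import re

# Matches a leading "(ID:<something>)" followed by the answer text.
CITATION_PATTERN = re.compile(r"^\s*\(ID:\s*([^)]+?)\s*\)\s*(.*)$", re.DOTALL)


def split_citation(answer):
    """Return (article_id, text); article_id is None if nothing was cited."""
    match = CITATION_PATTERN.match(answer)
    if match is None:
        # The model did not follow the requested format.
        return None, answer.strip()
    article_id, text = match.groups()
    if article_id == "None":
        # "(ID:None)" signals that no article answers the question.
        article_id = None
    return article_id, text.strip()


# Made-up answers, for illustration only.
print(split_citation("(ID:123e4567-e89b-12d3-a456-426655440000) The rate is 2.6% per annum."))
print(split_citation("(ID:None) I cannot answer this"))
```

In the example above, it could be applied as `split_citation(response.content)`.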