Proxy Endpoint
The Proxy Endpoint serves as a straightforward way to enhance your existing LLM setup with RAG-Buddy capabilities.
We distinguish between two primary use cases that are currently supported:
- RAG+Citation
- Text Classification
RAG+C System Message Format
Proper formatting of the system message is essential for the effective functioning of RAG-Buddy. The system message for the RAG+C use case is comprised of an introduction, optional instructions, and a series of articles.
RAG+C Elements of the System Message
-
System Introduction: Sets the context for the interaction.
- Example: “You are a customer support agent of the e-banking application Piggy Bank Extraordinaire.”
-
System Instructions (Optional): Use default instructions or provide custom ones.
- Default: Guidelines for selecting and referencing articles (recommended).
- Custom: Replace
{system_instructions}
in the template with your instructions.
-
Articles: Each article should have a unique ID and relevant content.
- Format:
## ID:<unique_identifier> [Article Title and Content]
. Content can be multi-line.
- Format:
Formatting the Message for RAG+C
Assemble the system message like the Python example below, with each section with an all caps header and separated by double line breaks:
This structured format is crucial for the correct processing and response generation in RAG-Buddy.
Text-Classification System Message Format
The system message for the Text-Classification use case is comprised of system instructions (not optional).
Text-Classification Elements of the System Message
- System Instructions: Guidelines for the LLM to select the most relevant class and to return that.
- Example: “You are an expert assistant in the field of customer service. Your task is to help workers in the customer service department of a company.\nYour task is to classify the customer’s question in order to help the customer service worker to answer the question. In order to help the worker, you MUST respond with the name of one of the following classes you know.\nIn case you reply with something else, you will be penalized.\nThe classes are the following:“
Formatting the Message for Text-Classification
Assemble the system message like the Python example below:
This structured format is crucial for the correct processing and response generation in RAG-Buddy.
Advanced Features
The feature below are applicable to both RAG+C and Text Classification use cases.
Cache-Control
The Helvia-RAG-Buddy-Cache-Control
header is essential for managing how RAG-Buddy’s cache is used, influencing both reading and writing operations.
Options for Cache-Control
no-cache
: The cache will not be used for reading, but it will be updated with the new response.no-store
: Responses will not be added to the cache.no-cache, no-store
: Both reading from and writing to the cache are disabled.- Omitting the header: This enables both reading from and writing to the cache.
Effects of Cache-Control Options
The table below summarizes the impact of each Cache-Control option on cache behavior:
Cache-Control Header Option | Read from Cache | Write to Cache |
---|---|---|
no-cache | No | Yes |
no-store | Yes | No |
no-cache, no-store | No | No |
(Header Omitted) | Yes | Yes |
Understanding and applying these options correctly can significantly impact the performance and efficiency of your integration with RAG-Buddy.
no-cache
header, if your cache already contains the same question, the response associated to that question will be overwritten.no-store
header, the question/answer will not be stored in the cache which could lead to not receiving cache hits for your requests.Reading Response Headers
- Purpose: Understanding whether your request was served from the cache or fetched anew can be critical for debugging and performance optimization.
- Implementation:
- After each API call, check the response headers.
- Look for the response header
Helvia-RAG-Buddy-Cache-Status
. - This header will indicate whether the response was a cache hit or not. If there was a cache hit, the value will be an integer, referring to an internal database ID. If there was a cache miss, the header will not be part of the repsponse or will have value
None
or an empty string.
completions.with_raw_response
method. See the example below for more on that.These additional features provide you with greater control and insight into how RAG-Cache is interacting with your requests. Leveraging them effectively can optimize your application’s performance and data relevance.
Comprehensive Example
This code example illustrates the integration of Cache-Control for cache management and interpreting the cache status header to determine cache usage.
This time we will demonstrate the RAG+C use case, here the system is fed with a set of pre-selected articles related to banking services. These articles provide the necessary context for the AI to understand and respond accurately to user queries.