Correctness Test

Evaluate correctness of LLM output

POST https://api.punya.ai/v1/correctness

Use this endpoint to submit a message from your user together with the response your LLM-powered chatbot generated for it. The API evaluates how correct that response is and how relevant it is to the user's message.

The response contains three fields:
1) verdict: one of correct, incorrect, or partially_correct;
2) score: a number between 0 and 1;
3) explanation: a string explaining the reasoning behind the verdict.
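A minimal sketch of calling the endpoint from Python using only the standard library. The authentication scheme is not described on this page, so the "Authorization: Bearer <key>" header below is an assumption; check the Authentication page for the actual header. The helper names build_correctness_request and evaluate_correctness are hypothetical.

```python
import json
import urllib.request

API_URL = "https://api.punya.ai/v1/correctness"


def build_correctness_request(user_message: str, bot_response: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to the correctness endpoint."""
    body = json.dumps({"user_message": user_message, "bot_response": bot_response}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # ASSUMPTION: bearer-token auth; the real scheme may differ.
            "Authorization": f"Bearer {api_key}",
        },
    )


def evaluate_correctness(user_message: str, bot_response: str, api_key: str) -> dict:
    """Send the request and return the decoded JSON body (verdict, score, explanation)."""
    request = build_correctness_request(user_message, bot_response, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage: evaluate_correctness("What type of mammal lays the biggest eggs?", "Ostrich lays the biggest eggs.", api_key="YOUR_API_KEY") returns the parsed response as a dict. Building the request separately from sending it keeps the request shape easy to inspect without making a network call.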

Request Body

user_message* (String): The message your user sent.

bot_response* (String): The response your LLM-powered chatbot generated for that message.

Fields marked * are required.
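Because both fields are required strings, it can help to reject bad input before sending anything. A minimal client-side sketch; build_payload is a hypothetical helper, and treating an empty string as missing is an assumption rather than documented API behavior.

```python
import json

REQUIRED_FIELDS = ("user_message", "bot_response")


def build_payload(user_message, bot_response) -> str:
    """Return the JSON request body, rejecting missing or non-string fields client-side."""
    payload = {"user_message": user_message, "bot_response": bot_response}
    for name in REQUIRED_FIELDS:
        value = payload[name]
        if not isinstance(value, str):
            raise TypeError(f"{name} must be a string, got {type(value).__name__}")
        if not value.strip():
            # ASSUMPTION: an empty string is treated as missing; the API's own rule is not documented.
            raise ValueError(f"{name} is required and must not be empty")
    return json.dumps(payload)
```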

Sample response body:

{
    "verdict": "incorrect",
    "score": 0.0,
    "explanation": "Explanation"
}
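A minimal sketch of decoding the response and checking it against the documented constraints (verdict is one of three values, score lies between 0 and 1). The CorrectnessResult type and parse_correctness_response function are hypothetical names for illustration.

```python
import json
from dataclasses import dataclass

VALID_VERDICTS = {"correct", "incorrect", "partially_correct"}


@dataclass(frozen=True)
class CorrectnessResult:
    verdict: str
    score: float
    explanation: str


def parse_correctness_response(raw: str) -> CorrectnessResult:
    """Decode the JSON body and check it against the documented field constraints."""
    data = json.loads(raw)
    verdict = data["verdict"]
    score = float(data["score"])
    if verdict not in VALID_VERDICTS:
        raise ValueError(f"unexpected verdict: {verdict!r}")
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range [0, 1]: {score}")
    return CorrectnessResult(verdict=verdict, score=score, explanation=str(data["explanation"]))
```

Raising on an unknown verdict is a deliberately strict choice; a more lenient client might log and pass the value through instead.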

Example 1

Request

{
    "user_message": "What type of mammal lays the biggest eggs?",
    "bot_response": "Ostrich lays the biggest eggs."
}

Response

{
    "verdict": "partially_correct",
    "score": 0.5,
    "explanation": "The output is partially correct. While it correctly identifies that ostriches lay the biggest eggs among birds, it is incorrect to say that ostriches are mammals. Therefore, the output is only partially correct."
}
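One way an application might act on a result like Example 1 is to flag anything short of a confident "correct" for human review. This is a hypothetical policy, not part of the API: the needs_review function and the 0.8 threshold are illustrative assumptions.

```python
def needs_review(verdict: str, score: float, threshold: float = 0.8) -> bool:
    """Flag a bot response for human review.

    HYPOTHETICAL policy: any verdict other than "correct", or a score below
    `threshold`, is flagged. The default threshold is illustrative only.
    """
    return verdict != "correct" or score < threshold


# Example 1's response (partially_correct, score 0.5) is flagged.
print(needs_review("partially_correct", 0.5))  # True
# A correct verdict with a full score is not flagged.
print(needs_review("correct", 1.0))  # False
```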

Example 2

Request

{
    "user_message": "I weigh 150 pounds and want to lose 1 pound every week. How many weeks will it take for me to reach my goal weight of 130 pounds?",
    "bot_response": "It will take you 20 weeks to reach your goal weight. "
}

Response

{
    "verdict": "correct",
    "score": 1.0,
    "explanation": "The output is correct. It correctly calculates the number of weeks required to reach the goal weight by subtracting the current weight from the goal weight and dividing it by the desired weight loss per week.\n\nThe output is highly relevant to the ask in the input. It directly answers the question by providing the specific number of weeks (20 weeks) required to reach the goal weight of 130 pounds."
}
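For reference, the arithmetic behind Example 2: the weight to lose is the current weight minus the goal weight, divided by the planned loss per week. (The explanation string returned in the example states the subtraction in the opposite order, but the 20-week result is correct.)

```latex
\text{weeks} = \frac{w_{\text{current}} - w_{\text{goal}}}{\Delta w_{\text{per week}}}
             = \frac{150\ \text{lb} - 130\ \text{lb}}{1\ \text{lb/week}}
             = 20\ \text{weeks}
```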
