VivekSinghDS

The tests cover several aspects of LLM performance, including:

  1. Length: Measures the absolute difference in length between the ground truth and model prediction.

  2. Jaccard similarity: Calculates the Jaccard similarity score between the ground truth and model prediction sets.

  3. Dot product similarity: Measures the similarity between the ground truth and model prediction embeddings using dot product.

  4. ROUGE score: Computes the ROUGE-1 score between the ground truth and model prediction.

  5. Word overlap: Calculates the percentage of overlapping words between the ground truth and model prediction after removing stop words.

  6. Part-of-speech composition: Analyzes the percentage of verbs, adjectives, and nouns in the model prediction.
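
To make the metrics above concrete, here is a minimal sketch of how the text-only ones could be computed. It is not the code from this PR: the function names, the regex tokenizer, and the tiny stop-word list are all assumptions made for illustration, ROUGE-1 is shown as a unigram F1, and word overlap is assumed to be measured relative to the ground truth's non-stop words. The dot product takes embeddings as plain lists of floats, since the embedding model is not specified here. Part-of-speech composition is left out because it needs an external tagger.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stop-word list, not the one used in the PR.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "on"}


def tokenize(text):
    """Lowercase the text and split it into word tokens (hypothetical helper)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def length_difference(truth, prediction):
    """Absolute difference in character length."""
    return abs(len(truth) - len(prediction))


def jaccard_similarity(truth, prediction):
    """Size of the token-set intersection divided by the size of the union."""
    a, b = set(tokenize(truth)), set(tokenize(prediction))
    if not a and not b:
        return 1.0  # two empty texts are treated as identical
    return len(a & b) / len(a | b)


def dot_product(u, v):
    """Dot product of two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    return sum(x * y for x, y in zip(u, v))


def rouge1_f1(truth, prediction):
    """ROUGE-1 as the F1 of clipped unigram overlap."""
    ref, cand = Counter(tokenize(truth)), Counter(tokenize(prediction))
    overlap = sum((ref & cand).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def word_overlap_percent(truth, prediction):
    """Percentage of the ground truth's non-stop words found in the prediction."""
    a = set(tokenize(truth)) - STOP_WORDS
    b = set(tokenize(prediction)) - STOP_WORDS
    if not a:
        return 0.0
    return 100.0 * len(a & b) / len(a)
```

Each function takes the ground truth first and the model prediction second, for example `jaccard_similarity("The cat sat on the mat", "A cat sat on a mat")`.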

@RohitSaha RohitSaha merged commit 92eaa2d into georgian-io:feature-llm-qa Feb 21, 2024
@VivekSinghDS VivekSinghDS deleted the patch-1 branch February 23, 2024 05:34
@benjaminye benjaminye added the enhancement New feature or request label Feb 27, 2024