Advanced Tutorial
This section is for the ones who want to implement their own Tasks or want to use the framework APIs for other use cases.
Scores
Scores are the methods that we use to measure the goodness of the completion of our model, comparing to the expected answer.
Available Scores
Score Name | Description |
---|---|
Exact Match | 1 if the completion and target are equal, otherwise 0 . |
English Sentence Bleu | Bleu n-gram score with NLTK English word tokenizer. Between 0 to 1 . |
Persian Sentence Bleu | Bleu n-gram score with Hazm Persian word tokenizer. Between 0 to 1 . |
English Rouge | Rouge score with NLTK English word tokenizer. Between 0 to 1 . |
Persian Rouge | Rouge score with Hazm Persian word tokenizer. Between 0 to 1 . |
Make Your Score
You can write your score function anywhere and just wrap it using wrap_scorer
.
import random
from parsbench.scores.base import wrap_scorer
@wrap_scorer
def random_score(completion: str, target: str) -> float:
return random.random()
The function's name will be used as the name of your score in the benchmark result.
Data Loader
Data Loader class will load the dataset needed for evaluation.
JSONLine
JSONLineDataLoader
is based on the jsonlines
file format (.jsonl
). It can load jsonline files from web or local.
from parsbench.tasks.base import JSONLineDataLoader
data_loader = JSONLineDataLoader(data_path="dataset.jsonl")
data = data_loader.load()
CSV
CSVDataLoader
is based on the CSV
file format (.csv
). It can load CSV files from web or local.
from parsbench.tasks.base import CSVDataLoader
data_loader = CSVDataLoader(data_path="dataset.csv")
data = data_loader.load()
HuggingFace
HuggingFaceDataLoader
is based on the datasets
library of HuggingFace that loads datasets from local or downloads it from the HuggingFace Dataset Hub.
from parsbench.tasks.base import HuggingFaceDataLoader
data_loader = HuggingFaceDataLoader(
data_path="persiannlp/parsinlu_entailment"
split="validation"
)
data = data_loader.load()
Prompt Template
Using PromptTemplate
class, you can define the prompt template for different lanugages, shot templates, shot examples, prompt variables, etc.
With Shot Template
In this example, you can define a prompt template for sentiment analysis task with shot template.
Prompt and Shot Templates:
FA_TEMPLATE = """
جمله زیر نظر یک شخص است. این جمله به زبان فارسی است. بار یا احساس موجود در این جمله را شناسایی کن.
پاسخ های ممکن حالتهای روبرو هستند:
SAD
NETURAL
HAPPY
فقط کلمه مروبط به احساس نظر داده شده را خروجی بده.
{example_shots}
نظر: {review}
احساس:
"""
FA_SHOT_TEMPLATE = """
نظر: {review}
احساس: {label}
"""
And for the task prompt template:
from parsbench.tasks.base import PromptTemplate
prompt_template = PromptTemplate(
language_templates={"fa": FA_TEMPLATE},
prompt_shot_templates={"fa": FA_SHOT_TEMPLATE},
prompt_variables_mapping={"review": "review"},
target_variables_mapping={"label": "label"},
)
prompt = prompt_template.get_prompt(
prompt_lang="fa",
data={"review": "غذا خیلی بد بود", "label": "SAD"}
n_shots=3,
sample_data=[
{"review": "خوشمزه بود ممونم", "label": "HAPPY"},
{"review": "غذا خوب بود فقط کاش زودتر میرسید.", "label": "NETURAL"},
{"review": "نوشابه گرم بود. پیتزا هم خیلی بد مزه بود.", "label": "SAD"}
]
)
With Shot Examples
If the task is complicated and you want to use methods like CoT (Chain of Thought) prompting, You can use static shot examples.
from parsbench.tasks.base import PromptTemplate
prompt_template = PromptTemplate(
language_templates={"fa": FA_TEMPLATE},
prompt_shot_examples={"fa": {1: FA_1_SHOT, 3: FA_3_SHOT, 5: FA_5_SHOT}}
)
prompt = prompt_template.get_prompt(prompt_shot=5, ...)
Load Templates From File
You can also load prompts from text files using LazyLoadTemplates
.
from parsbench.tasks.base import PromptTemplate, LazyLoadTemplates
prompt_template = PromptTemplate(
language_templates=LazyLoadTemplates(
fa="fa_math.txt",
en="en_math.txt",
),
...
)
Constant Prompt Variable
Sometimes you may wanna fill some prompt variables with constant data and you don't wanna put it in the prompt template text. You can use ConstantPromptVariable
:
from parsbench.tasks.base import PromptTemplate, ConstantPromptVariable
prompt_template = PromptTemplate(
language_templates={"fa": FA_TEMPLATE},
prompt_shot_templates={"fa": FA_SHOT_TEMPLATE},
prompt_variables_mapping={
"input": "input",
"first_name": ConstantPromptVariable("شهریار")
},
target_variables_mapping={"label": "label"},
)
Tasks
The primary unit of the ParsBench framework is a task. Tasks are battery-included evaluators which do all the process from loading dataset to evaluating models and outputting the result.
Task Data Provider
Each task includes a dataset for evaluation. If you need to get the data of that task, you can use task.get_data
function.
from parsbench.tasks import ParsiNLUEntailment
with ParsiNLUEntailment() as task: # Open in context manager to load data.
data = task.get_data()
Task Match Generator
A TaskMatch
is an object which includes prompt, target answer, model completion and the score. Initially the matches doesn't have completion
and score
attributes (default to None
). But you can generate completions and score them using the task class itself.
To generate matches based on the prompt template, you can use following code:
from parsbench.tasks import ParsiNLUEntailment
with ParsiNLUEntailment() as task:
matches = task.generate_matches(prompt_lang="fa", n_shots=0, n_first=100)
Generate Completions and Score
After generating matches, now we should generate completions for each match prompt. Then we score them based on the defined scorer in the task.
from parsbench.tasks import ParsiNLUEntailment
with ParsiNLUEntailment() as task:
matches = task.generate_matches(prompt_lang="fa", n_shots=0, n_first=100)
model.generate_completions(matches) # Completions are generated by the model.
tasks.score_matches(matches)
Make Your Task
If you want to have a task with your own private dataset, prompts, setups, etc. You can inherit one of the existing tasks or create your own task from scratch.
We suggest to put your prompt templates in a text file and use LazyLoadTemplates
to load them for a better performance.
from parsbench.scores.base import Scorer, wrap_scorer
from parsbench.tasks.base import (
HuggingFaceDataLoader,
LazyLoadTemplates,
PromptTemplate,
Task,
TaskCategory,
TaskMatchGroup,
)
@wrap_scorer
def my_custom_score(completion: str, target: str) -> float:
return float(completion.strip() == target.strip())
class CustomTask(Task):
task_name: str = "Custom Task"
task_category: TaskCategory = TaskCategory.REASONING
data_loader: HuggingFaceDataLoader = HuggingFaceDataLoader(
data_path="org/custom_dataset",
split="test",
)
data_target_key: str = "target"
prompt_template: PromptTemplate = PromptTemplate(
language_templates=LazyLoadTemplates(
en="path/to/en_template.txt",
fa="path/to/fa_template.txt",
),
prompt_shot_templates=LazyLoadTemplates(
en="path/to/en_shot_template.txt",
fa="path/to/fa_shot_template.txt",
),
prompt_variables_mapping={"prompt_variable1": "variable1", "prompt_variable2": "variable2"},
target_variables_mapping={"prompt_target": "target"},
)
scorer: Scorer = my_custom_score
def score_matches(self, matches: TaskMatchGroup) -> TaskMatchGroup:
matches.format_completions(
lambda c: c.strip().strip("'").lower()
)
return super().score_matches(matches)
def get_overall_score(cls, matches: TaskMatchGroup) -> float:
return sum(match.score for match in matches) / len(matches)
You can use any data loader, prompt template and scorer you want.
Add Sub Tasks
Your task might have a couple of sub tasks. In that case you should specify sub_task_key
and sub_tasks
attributes in the task class.
class CustomTask(Task):
...
sub_task_key: str = "category" # Column in the dataset which specify the sub tasks.
sub_tasks: list[str] = ["math_and_logic", "common_knowledge", "literature"] # Expected sub tasks.
...
And for generating matches or evaluating the model, you can specify a subset of the sub tasks.
with CustomTask() as task:
results = task.evaluate(..., sub_tasks=["math_and_logic"])
# OR
benchmark = CustomBenchmark(
...,
tasks=[
PersianMath,
CustomTask.select_sub_tasks(["math_and_logic"]),
]
)