📎Finetuning (Phi-2)
This page details our finetuning approach for Phi-2.
As detailed in Challenges and Objectives, we find that:
Phi-2 is not as responsive to instructions, particularly regarding how to format the input
The model does not seem to make good use of its full context window; adding more chunks of context degrades performance.
We finetune the model to help overcome both of these challenges.
We use LoRA (https://arxiv.org/pdf/2106.09685) to keep the finetuning cheap (in terms of computational resources needed).
Training Data
We select 1400 of the 1461 questions provided in the training txt file and hold out the remaining 61 as an unseen validation set.
We use our RAG approach with a chunk size of 150 and k = 7 to generate static context, which we add to the training txt. The result can be seen in our repo: https://github.com/Alexgichamba/itu_qna_challenge/blob/main/data/qs_train_with_context.txt
We use this context for finetuning.
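As an illustration of this step only, the sketch below shows one way the static context could be generated and written alongside each question. The retrieve_chunks helper and the JSON-lines file layout are assumptions for the sketch, not the exact code or format in our repo.

import json

def retrieve_chunks(question, chunk_size=150, k=7):
    # Hypothetical stand-in for our RAG retriever: returns the top-k chunks
    # (chunk size 150) most relevant to the question.
    raise NotImplementedError

def add_static_context(in_path, out_path):
    # Attach retrieved context to every training question and write the result out.
    with open(in_path) as f_in, open(out_path, "w") as f_out:
        for line in f_in:
            example = json.loads(line)
            example["context"] = "\n".join(retrieve_chunks(example["question"]))
            f_out.write(json.dumps(example) + "\n")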
Prompt
The prompt includes:
instructions for the QA task, and the expected formatting
context
abbreviations
question
options
Objective
The objective is simple: given the instruction, context, abbreviations, question, and options, generate the correct option and an explanation.
def formatting_func(self, example, abbreviations):
    # Task instruction and the expected answer format.
    prompt = "Instruct: You will answer each question correctly by giving only the Option ID, the number that follows each Option.\n"
    prompt += "The output should be in the format: Option <Option id>\n"
    prompt += "Provide the answer to the following multiple choice question in the specified format.\n\n"
    # Static context generated with our RAG approach.
    prompt += f"Context: {example.context}\n\n"
    # Abbreviation glossary, one "ABBR: expansion" pair per line.
    abbreviations_text = "\n".join(
        f"{list(abbrev.keys())[0]}: {list(abbrev.values())[0]}" for abbrev in abbreviations
    )
    prompt += f"Abbreviations:\n{abbreviations_text}\n\n"
    # Question followed by its numbered options.
    prompt += f"Question: {example.question}\n"
    for i, option in enumerate(example.options, 1):
        prompt += f"Option {i}: {option}\n"
    prompt += "Answer: Option"
    # Target: the correct option ID followed by an explanation.
    target = f"{example.answer}\nExplanation: {example.explanation}"
    return prompt + target
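For illustration only, the call below shows how the function might be applied to one training example. The Example dataclass, the formatter instance holding formatting_func, and the sample question are hypothetical stand-ins, not our actual data structures.

from dataclasses import dataclass

@dataclass
class Example:
    # Hypothetical container mirroring the attributes accessed in formatting_func.
    context: str
    question: str
    options: list
    answer: str
    explanation: str

example = Example(
    context="<static RAG context>",
    question="Which 3GPP release first specified 5G NR?",
    options=["Release 14", "Release 15", "Release 16"],
    answer="2",
    explanation="5G NR was first specified in Release 15.",
)
abbreviations = [{"NR": "New Radio"}]
# formatter is an assumed instance of the class that defines formatting_func.
text = formatter.formatting_func(example, abbreviations)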
Training config
LoRA config
We encourage you to refer to the LoRA paper for further clarification.
no bias
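As a rough sketch only: with the peft library, a LoRA configuration of this kind could look like the snippet below. The rank, alpha, and dropout values are placeholders, since only the absence of bias is stated above.

from peft import LoraConfig

# Illustrative LoRA configuration; r, lora_alpha and lora_dropout are placeholder
# values, not our exact settings. Only bias="none" ("no bias") is stated above.
lora_config = LoraConfig(
    r=16,                  # placeholder rank
    lora_alpha=32,         # placeholder scaling factor
    lora_dropout=0.05,     # placeholder dropout
    bias="none",           # no bias, as noted above
    task_type="CAUSAL_LM",
)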
Training args
batch size = 1
gradient accumulation steps = 4
epochs = 2 (700 steps)
max learning rate =
lr scheduler: linear
warmup steps = 100
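For illustration, these hyperparameters map onto Hugging Face TrainingArguments roughly as shown below; the output directory is an arbitrary placeholder name, and the learning rate is left commented out because its value is not listed on this page.

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="phi2-qa-lora",        # arbitrary placeholder name
    per_device_train_batch_size=1,    # batch size = 1
    gradient_accumulation_steps=4,
    num_train_epochs=2,               # about 700 optimizer steps over 1400 examples
    # learning_rate=...,              # max learning rate not specified on this page
    lr_scheduler_type="linear",
    warmup_steps=100,
)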

We make our finetuned models available: