📎Finetuning (Phi-2)

This page details our finetuning approach for Phi-2.

As detailed in the Challenges and Objectives page, we find that:

  • Phi-2 is not very responsive to instructions, particularly those regarding how to format the input

  • The model does not seem to make good use of its full context window: adding more chunks of context degrades performance.

We finetune the model to help overcome both of these challenges.

We use LoRA (https://arxiv.org/pdf/2106.09685) to keep finetuning cheap in terms of the computational resources needed.

Training Data

We select 1400 questions from the 1461 provided in the training txt file and leave the remaining 61 unseen for validation purposes.
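
As a rough illustration of the split (the file name and JSON layout below are assumptions for illustration, not our exact loading code):

import json
import random

# Assumed layout: the training txt stores the questions as a JSON mapping
# (the file name here is a placeholder).
with open("data/qs_train.txt") as f:
    items = list(json.load(f).items())

# Hold out 61 of the 1461 questions as an unseen validation set;
# the remaining 1400 are used for finetuning.
random.seed(0)
random.shuffle(items)
val_set = dict(items[:61])
train_set = dict(items[61:])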

We use our RAG approach with chunk size 150 and k = 7 to generate static context, which we add to the training txt. The resulting file can be seen in our repo: https://github.com/Alexgichamba/itu_qna_challenge/blob/main/data/qs_train_with_context.txt

We use this context for finetuning.
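
A minimal sketch of this step, assuming a retrieve_fn that wraps our RAG retrieval step (a placeholder name; the file names and JSON layout are also assumptions):

import json

def add_static_context(questions, retrieve_fn, k=7):
    # retrieve_fn(query, k) stands in for our RAG retrieval step
    # (chunk size 150); it should return the k most relevant chunks.
    for q in questions.values():
        chunks = retrieve_fn(q["question"], k)
        q["context"] = "\n".join(chunks)
    return questions

# Assumed file names: augment the training questions once, save the result,
# and reuse the same static context for every finetuning run.
with open("data/qs_train.txt") as f:
    questions = json.load(f)
with open("data/qs_train_with_context.txt", "w") as f:
    json.dump(add_static_context(questions, retrieve_fn, k=7), f, indent=2)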

Prompt

The prompt includes:

  • instructions for the QA task, and the expected formatting

  • context

  • abbreviations

  • question

  • options

Objective

The objective is simple: given the instructions, context, abbreviations, question, and options, generate the correct option and an explanation.

def formatting_func(self, example, abbreviations):
    # Task instructions: answer with only the option id, in a fixed output format.
    prompt = "Instruct: You will answer each question correctly by giving only the Option ID, the number that follows each Option.\n"
    prompt += "The output should be in the format: Option <Option id>\n"
    prompt += "Provide the answer to the following multiple choice question in the specified format.\n\n"
    # Static context generated offline with our RAG approach.
    prompt += f"Context: {example.context}\n\n"
    # Each abbreviation is a single-entry dict, e.g. {"UE": "User Equipment"}.
    abbreviations_text = "\n".join([f"{list(abbrev.keys())[0]}: {list(abbrev.values())[0]}" for abbrev in abbreviations])
    prompt += f"Abbreviations:\n{abbreviations_text}\n\n"
    # Question followed by its numbered options.
    prompt += f"Question: {example.question}\n"
    for i, option in enumerate(example.options, 1):
        prompt += f"Option {i}: {option}\n"
    prompt += "Answer: Option"

    # Target: the correct option id followed by an explanation.
    target = f"{example.answer}\nExplanation: {example.explanation}"

    return prompt + target
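
For illustration, a single training sample can be rendered by calling formatting_func on a question record; the field values below are made up, and since self is unused in the body the function can be called directly:

from types import SimpleNamespace

# Hypothetical example record (all values are placeholders for illustration).
example = SimpleNamespace(
    context="Retrieved 3GPP excerpts about PDCP ...",
    question="Which service does the PDCP layer provide?",
    options=["Header compression", "Scheduling", "Paging"],
    answer="1",
    explanation="PDCP provides header compression and ciphering.",
)
abbreviations = [{"PDCP": "Packet Data Convergence Protocol"}]

# self is unused, so None can be passed in this standalone example.
sample = formatting_func(None, example, abbreviations)
print(sample)  # prompt ending in "Answer: Option", then the answer id and explanation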

Training config

LoRA config

We encourage you to refer to the LoRA paper for further details; a configuration sketch follows the list below.

  • α = 64

  • r = 32

  • no bias

  • dropout = 0.05
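
A minimal sketch of this configuration with the peft library (target_modules is an assumption for illustration, not necessarily our exact choice):

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# LoRA hyperparameters from the list above; target_modules is assumed
# (Phi-2 attention projection layers) for illustration.
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "dense"],
)

model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of the base weights are trained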

Training args

  • batch size = 1

  • gradient accumulation steps = 4

  • epochs = 2 (700 steps)

  • max learning rate = 5 × 10⁻⁵

  • lr scheduler: linear

  • warmup steps = 100
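
A sketch of how these values map onto Hugging Face TrainingArguments (options not listed above are left at their defaults; the output path is a placeholder):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="phi2-qna-lora",           # placeholder output path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    num_train_epochs=2,                   # ~700 optimizer steps on 1400 examples
    learning_rate=5e-5,                   # max learning rate
    lr_scheduler_type="linear",
    warmup_steps=100,
)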

Evaluation accuracy from training with k=3 and k=7

We make our finetuned models available:
