Finetuning (Phi-2)

This page details our finetuning approach for Phi-2.


As detailed in Challenges and Objectives, we find that:

  • Phi-2 is not very responsive to instructions, particularly those describing how the input is formatted

  • The model's full context window does not seem to be well utilized: adding chunks of context degrades performance

We finetune the model to help overcome both of these challenges.

We use LoRA (https://arxiv.org/pdf/2106.09685) to keep the finetuning cheap in terms of the computational resources needed.

Training Data

We select 1400 questions from the 1461 provided in the training txt file and hold out the remaining 61 as unseen questions for validation.

We use our RAG approach with chunk size 150 and k = 7 to generate static context, which we add to the training txt file. The result can be seen in our repo: https://github.com/Alexgichamba/itu_qna_challenge/blob/main/data/qs_train_with_context.txt

We use this context for finetuning.

Prompt

The prompt includes:

  • instructions for the QA task, and the expected formatting

  • context

  • abbreviations

  • question

  • options

Objective

The objective is simple: given the instruction, context, abbreviations, question, and options, generate the correct option and an explanation.

def formatting_func(self, example, abbreviations):
    prompt = f"Instruct: You will answer each question correctly by giving only the Option ID, the number that follows each Option.\n"
    prompt += f"The output should be in the format: Option <Option id>\n"
    prompt += f"Provide the answer to the following multiple choice question in the specified format.\n\n"
    prompt += f"Context: {example.context}\n\n"
    abbreviations_text = "\n".join([f"{list(abbrev.keys())[0]}: {list(abbrev.values())[0]}" for abbrev in abbreviations])
    f"Abbreviations:\n{abbreviations_text}\n\n"
    prompt += f"Question: {example.question}\n"
    for i, option in enumerate(example.options, 1):
        prompt += f"Option {i}: {option}\n"
    prompt += "Answer: Option"
    
    target = f"{example.answer}\nExplanation: {example.explanation}"
    
    return prompt + target
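
To make the formatting concrete, here is a minimal sketch of how the formatter can be exercised. The example contents and abbreviations below are placeholders, and we pass None for self since the method does not use it:

from types import SimpleNamespace

# Placeholder training example; the field names match those used in formatting_func
example = SimpleNamespace(
    context="Retrieved context chunks go here ...",
    question="Placeholder question text?",
    options=["first option", "second option", "third option", "fourth option"],
    answer=" 2",
    explanation="Placeholder explanation.",
)

# Abbreviations are passed as a list of single-entry dicts
abbreviations = [{"UE": "User Equipment"}, {"RAN": "Radio Access Network"}]

print(formatting_func(None, example, abbreviations))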

Training config

LoRA config

We encourage you to refer to the LoRA paper for further details.

  • α = 64

  • r = 32

  • no bias

  • dropout = 0.05
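
For reference, a sketch of this configuration using the peft library is shown below. The target_modules listed here are an assumption about which Phi-2 projection layers the adapters attach to, not taken verbatim from our training code:

from peft import LoraConfig

lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    # assumed Phi-2 attention/MLP projection names; check the training scripts for the exact set
    target_modules=["q_proj", "k_proj", "v_proj", "dense"],
)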

Training args

  • batch size = 1

  • gradient accumulation steps = 4

  • epochs = 2 (700 steps)

  • max learning rate = 5 × 10⁻⁵

  • lr scheduler: linear

  • warmup steps = 100
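
Expressed as Hugging Face TrainingArguments, the settings above correspond roughly to the following sketch (output_dir is a placeholder and unlisted options are left at their defaults):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="phi2-qa-lora",          # placeholder
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,      # effective batch size of 4
    num_train_epochs=2,                 # 1400 examples / 4 = 350 steps per epoch, 700 in total
    learning_rate=5e-5,                 # max learning rate
    lr_scheduler_type="linear",
    warmup_steps=100,
)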

We make available our finetuned models:

  • Best: https://huggingface.co/alexgichamba/phi-2-finetuned-qa-lora-r32-a16_longcontext

  • k = 3: https://huggingface.co/alexgichamba/phi-2-finetuned-qa-lora-r32-a16_ogcontext

  • no context: https://huggingface.co/alexgichamba/phi-2-finetuned-qa-lora-r32-a16_notag
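
A minimal sketch for loading one of these adapters on top of the base microsoft/phi-2 checkpoint with peft, assuming the adapters are applied in the usual way (dtype and device settings omitted):

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")

# Attach the best-performing LoRA adapter
model = PeftModel.from_pretrained(base, "alexgichamba/phi-2-finetuned-qa-lora-r32-a16_longcontext")
model.eval()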
Figure: Evaluation accuracy from training with k=3 and k=7