Zindi ITU Challenge Docs

Abbreviations

This page details how we build our abbreviations glossary and use it in the prompt.

As seen in Challenges and Objectives, the questions and options use many abbreviations of technical terms. To help the model give informed responses, we add the full forms of any abbreviations that appear in the question and options to the prompt.

Building the glossary

While going through the provided documents, we found that they include an abbreviations section listing the abbreviations and full forms used throughout the documents.

We search through the docs to collect these abbreviations.
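The search step could be sketched as follows. This is only an illustration under stated assumptions: the page does not describe the layout of the abbreviations sections, so the sketch assumes a short heading line ending in "Abbreviations" followed by entries of the form abbreviation, then a tab or at least two spaces, then the full form. The names `ENTRY_RE` and `extract_abbreviations` are hypothetical and do not come from the project code.

```python
import re

# ASSUMPTION: each entry is "ABBR<tab or 2+ spaces>Full form" on one line.
ENTRY_RE = re.compile(r"^\s*([A-Za-z0-9][\w\-/]*)(?:\t+| {2,})(\S.*?)\s*$")


def extract_abbreviations(text):
    """Return (abbreviation, full form) pairs found under an abbreviations heading."""
    pairs = []
    in_section = False
    for line in text.splitlines():
        stripped = line.strip()
        if not in_section:
            # ASSUMPTION: the heading is a short line ending with "abbreviations".
            if len(stripped) < 60 and re.search(r"abbreviations\s*$", stripped, re.IGNORECASE):
                in_section = True
            continue
        if not stripped:
            continue  # blank lines inside the section are skipped
        match = ENTRY_RE.match(line)
        if match:
            pairs.append((match.group(1), match.group(2)))
        elif pairs:
            # The first non-entry line after at least one entry closes the section.
            in_section = False
    return pairs


# Illustrative input, not taken from the challenge documents.
sample = (
    "3.2 Abbreviations\n"
    "For the purposes of this document, the following abbreviations apply:\n"
    "\n"
    "AMF\tAccess and Mobility Management Function\n"
    "UE  User Equipment\n"
    "\n"
    "4 General\n"
    "Some text here\n"
)
pairs = extract_abbreviations(sample)
```

Real documents will likely need a looser pattern, which is one reason the extracted pairs are noisy and need the clean-up described next.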

Several challenges are apparent:

  • duplicate abbreviations and full forms

  • different full forms of the abbreviation appearing in different texts

  • typos/inconsistent formatting

We apply some heuristics to help build a better glossary:

  • abbreviations shorter than 2 characters are not added to the glossary

  • if a new, different full form of an abbreviation is found, it must differ from the existing full form by an edit distance of at least 3 to be added to the glossary; smaller differences are treated as negligible

  • if a new full form of an abbreviation is a subset of the existing one, only the superset is retained

  • if a new, different full form of an abbreviation satisfies the two conditions above, it is added to the glossary as an alternative definition, joined to the existing one with the conjunction "or"
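The heuristics above can be sketched roughly as below. This is a minimal illustration, not the project's actual code. It assumes that "edit distance" means character-level Levenshtein distance, that "subset" means a case-insensitive substring, and that the superset rule applies in both directions. The names `edit_distance`, `add_entry`, and `render` are hypothetical, and the sample entries are illustrative only.

```python
def edit_distance(a, b):
    """Character-level Levenshtein distance (dynamic programming, two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def add_entry(glossary, abbrev, full_form, min_len=2, min_distance=3):
    """Apply the glossary heuristics to one (abbreviation, full form) pair."""
    abbrev, full_form = abbrev.strip(), full_form.strip()
    if len(abbrev) < min_len or not full_form:
        return  # too short to be a useful abbreviation
    forms = glossary.setdefault(abbrev, [])
    for i, existing in enumerate(forms):
        if full_form.lower() in existing.lower():
            return  # new form is contained in an existing one: keep the superset
        if existing.lower() in full_form.lower():
            forms[i] = full_form  # new form is the superset: replace the shorter one
            return
        if edit_distance(full_form.lower(), existing.lower()) < min_distance:
            return  # negligible difference, e.g. a typo
    forms.append(full_form)  # genuinely different: keep as an alternative


def render(glossary):
    """Join alternative full forms with the conjunction "or"."""
    return {abbrev: " or ".join(forms) for abbrev, forms in glossary.items()}


glossary = {}
add_entry(glossary, "A", "Alpha")                # dropped: shorter than 2 characters
add_entry(glossary, "AM", "Amplitude Modulation")
add_entry(glossary, "AM", "Amplitude Modulaton") # dropped: edit distance 1
add_entry(glossary, "AM", "Modulation")          # dropped: contained in existing form
add_entry(glossary, "AM", "Acknowledged Mode")   # kept as an alternative
rendered = render(glossary)
```

Lowercasing before comparison is a design choice of this sketch; it treats capitalization differences as negligible, which may or may not match the original implementation.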

Finally, we manually review the glossary and remove unsuitable duplicates to keep it as informative as possible. This reduced the number of duplicates from 700+ to a few tens.

Using the glossary

When any abbreviations from the glossary appear in the question or options, we add them to the context. We find this particularly useful for questions that ask for the full forms of abbreviations.

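def create_prompt(question, context, abbrevs):
    """Build the prompt from the context, the matched abbreviations, and the question.

    `abbrevs` is a list of single-entry dicts mapping an abbreviation to its full form.
    """
    context = context.strip()
    # Render one "ABBREVIATION: full form" line per matched glossary entry.
    lines = []
    for entry in abbrevs:
        abbrev, full_form = next(iter(entry.items()))
        lines.append(f"{abbrev}: {full_form}")
    abbreviations_text = "\n".join(lines)
    prompt = (
        f">>DOMAIN<<\n{context}\n"
        f"Abbreviations:\n{abbreviations_text}\n"
        f">>QUESTION<<{question}\n\n"
        f">>ANSWER<<"
    )
    return prompt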
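
The lookup that produces `abbrevs` is not shown on this page. A minimal sketch of one way to do it follows, assuming the glossary is a plain dict from abbreviation to full form and that matching is case-sensitive on whole tokens. The function name `find_abbreviations` and the sample glossary are hypothetical.

```python
import re


def find_abbreviations(question, options, glossary):
    """Return single-entry dicts for glossary abbreviations seen in the question or options."""
    text = " ".join([question, *options])
    found = []
    for abbrev, full_form in glossary.items():
        # Whole-token, case-sensitive match so that "UE" does not fire inside "QUEUE".
        pattern = r"(?<![A-Za-z0-9])" + re.escape(abbrev) + r"(?![A-Za-z0-9])"
        if re.search(pattern, text):
            found.append({abbrev: full_form})
    return found


# Illustrative glossary and question, not taken from the challenge data.
glossary = {
    "UE": "User Equipment",
    "AMF": "Access and Mobility Management Function",
    "RAN": "Radio Access Network",
}
abbrevs = find_abbreviations(
    "What does the UE send to the AMF?",
    ["A QUEUE request", "A registration request"],
    glossary,
)
```

The returned list has the shape that `create_prompt` expects: one single-entry dict per matched abbreviation.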