Abbreviations
This page details how we build our abbreviations glossary and how we use it in the prompt.
As seen in Challenges and Objectives, the questions and options use many abbreviations of technical terms. To help the model give informed responses, we add to the prompt the full form of every abbreviation that appears in the question or options.
Building the glossary
While going through the provided documents, we found that they include an abbreviations section listing the abbreviations and full forms used throughout.
We search the documents for these sections and collect the abbreviations they list.
Three challenges are apparent:
duplicate abbreviations and full forms
different full forms of the abbreviation appearing in different texts
typos/inconsistent formatting
We apply some heuristics to help build a better glossary:
an abbreviation must be at least 2 characters long to be added to the glossary
if a new, different full form of an abbreviation is found, it must differ from the existing one by an edit distance of at least 3 to be added to the glossary (otherwise the difference is treated as negligible)
if a new full form of an abbreviation is a subset of the existing one, or the other way round, only the superset is retained
if a new, different full form of an abbreviation passes both checks above (edit distance and subset), it is added to the glossary as an alternative definition, joined to the existing one with the conjunction "or"
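The heuristics above can be sketched roughly as follows. This is a minimal illustration, not our exact implementation: the names `add_entry`, `render`, `MIN_ABBR_LEN`, and `MIN_EDIT_DISTANCE` are hypothetical, "subset" is assumed to mean a case-insensitive substring, and the edit distance is assumed to be the Levenshtein distance on lowercased text.

```python
MIN_ABBR_LEN = 2       # abbreviations shorter than this are skipped
MIN_EDIT_DISTANCE = 3  # smaller differences are treated as negligible


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def add_entry(glossary: dict, abbr: str, full_form: str) -> None:
    """Add one (abbreviation, full form) pair, applying the heuristics."""
    abbr = abbr.strip()
    full_form = " ".join(full_form.split())  # normalise whitespace
    if len(abbr) < MIN_ABBR_LEN or not full_form:
        return
    forms = glossary.setdefault(abbr, [])
    new = full_form.lower()
    for i, existing in enumerate(forms):
        old = existing.lower()
        if edit_distance(new, old) < MIN_EDIT_DISTANCE:
            return  # negligible difference (assumed: typo or formatting)
        if new in old:
            return  # new form is contained in an existing one: keep superset
        if old in new:
            forms[i] = full_form  # existing form is contained: keep superset
            return
    forms.append(full_form)  # genuinely different: store as an alternative


def render(glossary: dict) -> dict:
    """Join alternative full forms with the conjunction 'or'."""
    return {abbr: " or ".join(forms) for abbr, forms in glossary.items()}


# Made-up entries, for illustration only.
glossary = {}
add_entry(glossary, "X", "Example")                    # too short, skipped
add_entry(glossary, "CPU", "Central Processing Unit")
add_entry(glossary, "CPU", "Central Processing Units")  # distance 1, skipped
add_entry(glossary, "AP", "Access Point")
add_entry(glossary, "AP", "Application Processor")      # kept as alternative
add_entry(glossary, "RAM", "Random Access Memory")
add_entry(glossary, "RAM", "Random Access Memory module")  # superset replaces
final_glossary = render(glossary)
```

In this sketch the checks run in the order the heuristics are listed, so a near-identical form is dropped before the subset check is reached. The manual review step still applies to whatever this produces.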
Finally, we manually review the abbreviations and remove unsuitable duplicates to keep the glossary as informative as possible. This reduced the number of duplicates from more than 700 to a few tens.
Using the glossary
When abbreviations from the glossary appear in the question or options, we add them and their full forms to the context. We find this particularly useful for questions that ask for the full form of an abbreviation.
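As a rough sketch of this lookup, the snippet below scans the question and options for glossary entries and prepends the matches to the prompt. The function name `build_prompt`, the prompt layout, and the case-sensitive whole-token matching are all assumptions made for illustration; the glossary entries are made up.

```python
import re


def build_prompt(question: str, options: list, glossary: dict) -> str:
    """Prepend glossary entries whose abbreviation appears in the text."""
    text = " ".join([question, *options])
    found = {
        abbr: full
        for abbr, full in glossary.items()
        # Match the abbreviation only as a whole token, not inside a word.
        if re.search(rf"(?<!\w){re.escape(abbr)}(?!\w)", text)
    }
    lines = []
    if found:
        lines.append("Abbreviations:")
        lines += [f"{abbr}: {full}" for abbr, full in found.items()]
        lines.append("")
    lines.append(f"Question: {question}")
    lines += [f"{i}. {opt}" for i, opt in enumerate(options, start=1)]
    return "\n".join(lines)


# Hypothetical glossary and question, for illustration only.
example_glossary = {
    "CPU": "Central Processing Unit",
    "AP": "Access Point or Application Processor",
    "RAM": "Random Access Memory",
}
prompt = build_prompt(
    "What does CPU stand for?",
    ["An AP variant", "A processing unit", "An APPLE product"],
    example_glossary,
)
```

Whole-token matching keeps a short abbreviation such as "AP" from firing on longer words that merely contain it, which matters once the glossary holds many two-letter entries.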