RAG
This page details the components of our RAG pipeline.
Key Components
Document Indexing: Uses llama-index and ragatouille libraries to process, chunk and index documents for efficient retrieval.
Abbreviation Handling: Incorporates abbreviation expansions into the prompt to enrich the context.
Answer Generation: Uses our fine-tuned model to generate answers based on a custom prompt.
Answer Parsing: Extracts and formats the model's response into a usable answer format.
Implementation Details
Document Indexing
The script uses a combination of llama-index and ragatouille to process and index documents. This process involves:
Loading the Word documents using llama-index's SimpleDirectoryReader.
Splitting documents into chunks of 150 tokens.
Vectorizing chunks using ColBERT embeddings through ragatouille.
Creating a FAISS-backed index of the vectorized chunks for efficient retrieval.
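The chunking step above can be sketched in plain Python. This is a minimal illustration of splitting a document into 150-token pieces; whitespace tokenization stands in for the model tokenizer, and the function name is ours, not part of the llama-index or ragatouille API.

```python
# Illustrative sketch of the chunking step: split a document into
# chunks of at most 150 tokens before they are embedded and indexed.
# Whitespace splitting stands in for the real tokenizer.

def chunk_document(text: str, chunk_size: int = 150) -> list[str]:
    tokens = text.split()
    return [
        " ".join(tokens[i:i + chunk_size])
        for i in range(0, len(tokens), chunk_size)
    ]
```

In practice the chunker, embedder, and FAISS index are wired together by llama-index and ragatouille rather than written by hand.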
Abbreviation Handling
As mentioned in Abbreviations, we use the abbreviation glossary to expand abbreviations in the prompt.
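The expansion step can be sketched as a glossary lookup. The dict-based glossary format and the function name below are assumptions for illustration; the real glossary is described in Abbreviations.

```python
# Sketch of abbreviation handling: collect expansions for any glossary
# abbreviation that appears in the question, to be appended to the prompt.
# The glossary shown here is a toy example.

GLOSSARY = {
    "RAG": "Retrieval-Augmented Generation",
    "LLM": "Large Language Model",
}

def relevant_expansions(question: str, glossary: dict[str, str]) -> str:
    found = [f"{abbr}: {full}" for abbr, full in glossary.items() if abbr in question]
    return "\n".join(found)
```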
Answer Generation
Context Retrieval
For each question, we retrieve relevant context using the created index. This step fetches the top 13 (Phi-2) or top 3 (Falcon-7B) most relevant chunks from our indexed documents, which are then concatenated to form the context for the question at hand.
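The retrieval-and-concatenation step can be sketched as follows. Token overlap stands in for the ColBERT similarity computed by the actual index, so the scoring here is illustrative only; k would be 13 for Phi-2 and 3 for Falcon-7B.

```python
# Sketch of context retrieval: rank chunks against the question and
# concatenate the top-k into a single context string.
# Token-overlap scoring is a stand-in for ColBERT similarity.

def retrieve_context(question: str, chunks: list[str], k: int) -> str:
    q_tokens = set(question.lower().split())
    ranked = sorted(
        chunks,
        key=lambda c: len(q_tokens & set(c.lower().split())),
        reverse=True,
    )
    return "\n\n".join(ranked[:k])
```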
Prompt Creation
The contents of the prompt vary depending on the model. For Phi-2 the prompt includes:
An instruction for the model
The retrieved context
Relevant abbreviations and their expansions
The question
The answer options
A prefix for the answer prompt
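Assembling these components can be sketched as simple string composition. The instruction wording, the answer prefix, and the option lettering below are placeholders, not the exact prompt we use for Phi-2.

```python
# Sketch of Phi-2 prompt assembly from the components listed above.
# Instruction text and answer prefix are placeholders.

def build_prompt(context: str, abbreviations: str, question: str, options: list[str]) -> str:
    option_lines = "\n".join(f"{chr(65 + i)}. {opt}" for i, opt in enumerate(options))
    return (
        "Answer the question using only the context below.\n\n"  # instruction (placeholder)
        f"Context:\n{context}\n\n"
        f"Abbreviations:\n{abbreviations}\n\n"
        f"Question: {question}\n"
        f"Options:\n{option_lines}\n"
        "Answer:"  # answer-prompt prefix (placeholder)
    )
```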
The structure of the prompt for Falcon-7B is described in Response Scoring (Falcon-7B).