Quickstart#

In this quickstart we’ll show you:

  • How to prepare a document database for your RQA system

  • How to use SimpleRQA to quickly configure and run an RQA system

As a reference, the full example code can be found in the demo.py script at the root of the repository.

Prepare Document#

LocalRQA integrates with frameworks such as LangChain and LlamaIndex to easily ingest text data in various formats: JSON, HTML, data from Google Drive, and more.

Note

The following step requires selenium to be installed. If you don’t have it set up correctly, you can skip this step: we have already prepared the databricks_web.pkl file for you in the example/demo folder.

For example, you could load data from a website using SeleniumURLLoader from langchain, then parse and save the pages as a collection of documents (docs):

from langchain_community.document_loaders import SeleniumURLLoader
from langchain.text_splitter import CharacterTextSplitter
from local_rqa.text_loaders.langchain_text_loader import LangChainTextLoader

# specify how to load the data and how to chunk them
loader_func, split_func = SeleniumURLLoader, CharacterTextSplitter
loader_parameters = {'urls': ["https://docs.databricks.com/en/dbfs/index.html"]}
splitter_parameters = {'chunk_size': 400, 'chunk_overlap': 50, 'separator': "\n\n"}
kwargs = {"loader_params": loader_parameters, "splitter_params": splitter_parameters}

# load the data, chunk them, and save them
docs = LangChainTextLoader(
    save_folder="example/demo",  # change this to your own folder
    save_filename="databricks_web.pkl",
    loader_func=loader_func,
    splitter_func=split_func
).load_data(**kwargs)
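
To sanity-check the result, you can inspect the returned list (a minimal sketch that only prints the objects, since the exact Document fields may vary across versions):

print(len(docs))  # number of chunks produced by the splitter
print(docs[0])    # inspect the first chunk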

This list of documents (docs) is now your document database, which will be used to create an embedding index for the RQA system.
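
The same loading-and-chunking pattern applies to other formats. As a minimal sketch, assuming a hypothetical local JSON file whose records have a "content" field, you could swap in LangChain's JSONLoader (which requires the jq package) and a different splitter:

from langchain_community.document_loaders import JSONLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from local_rqa.text_loaders.langchain_text_loader import LangChainTextLoader

# hypothetical JSON file shaped like [{"content": "..."}, ...]
json_docs = LangChainTextLoader(
    save_folder="example/demo",
    save_filename="my_json_docs.pkl",
    loader_func=JSONLoader,
    splitter_func=RecursiveCharacterTextSplitter
).load_data(
    loader_params={"file_path": "data/my_docs.json", "jq_schema": ".[].content"},
    splitter_params={"chunk_size": 400, "chunk_overlap": 50},
)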

Build an RQA System#

Given a path to a document database (see above), we can directly use SimpleRQA to 1) create and save an embedding index if the index folder (example/demo/index below) is empty, 2) plug in an embedding model and a generative model, and 3) run QA!

from local_rqa.pipelines.retrieval_qa import SimpleRQA
from local_rqa.schema.dialogue import DialogueSession

rqa = SimpleRQA.from_scratch(
    document_path="example/demo/databricks_web.pkl",
    index_path="example/demo/index",
    embedding_model_name_or_path="intfloat/e5-base-v2",  # embedding model
    qa_model_name_or_path="lmsys/vicuna-7b-v1.5"  # generative model
)
response = rqa.qa(
    batch_questions=['What is DBFS?'],
    batch_dialogue_session=[DialogueSession()],
)
print(response.batch_answers[0])
# DBFS stands for Databricks File System, which is a ...

where response is an RQAOutput object:

class RQAOutput:
    batch_answers: List[str]  # generated answer for each input question
    batch_source_documents: List[List[Document]]  # documents retrieved for each question
    batch_dialogue_session: List[DialogueSession]  # dialogue history updated with the new turn
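
Since RQAOutput also returns the retrieved documents and the updated dialogue history, a minimal sketch (reusing the rqa object and response from above; whether the session carries multi-turn context this way is an assumption) to inspect the sources and ask a follow-up question could be:

# documents retrieved to answer the first question
for doc in response.batch_source_documents[0]:
    print(doc)

# reuse the updated dialogue session so the follow-up question has conversational context
followup = rqa.qa(
    batch_questions=['How do I upload files to it?'],
    batch_dialogue_session=response.batch_dialogue_session,
)
print(followup.batch_answers[0])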

Next Steps#

Beyond this simple example, you can:

  • prepare your own RQA data for training, evaluation, and serving (Data)

  • train your own models using algorithms we implemented from the latest research (Training)

  • evaluate your RQA system with automatic metrics (Evaluation)

  • deploy your RQA system to interact with real users (Serving)