Serving#

LocalRQA provides methods to 1) deploy your RQA system with a simple model backend, and 2) launch an interactive UI for human evaluation or free chat. This can be used to showcase your RQA system to the public, or to collect human feedback to further improve your system using techniques such as RLHF.

Launchers#

Currently, we provide methods to launch your RQA system in two different ways.

Static evaluation: users directly evaluate the quality of pre-generated responses (e.g., computed from a test set). See Static Human Evaluation for more details.

[Figure: Static Evaluation UI]

Interactive chat: users can chat with a system and rate the correctness/helpfulness of each response. See Interactive Chat for more details.

[Figure: Interactive Chat UI]
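To make the feedback-collection idea concrete, here is a minimal sketch of how per-response ratings could be logged and summarized. It is not LocalRQA's actual storage format or API: the `FeedbackRecord` schema, the `append_feedback` and `helpful_rate` helpers, and the JSON Lines file layout are all hypothetical, chosen only for illustration.

```python
import json
from dataclasses import dataclass, asdict
from pathlib import Path


@dataclass
class FeedbackRecord:
    """One human rating of one system response (hypothetical schema)."""
    question: str
    response: str
    helpful: bool  # the rater's thumbs-up / thumbs-down


def append_feedback(path: Path, record: FeedbackRecord) -> None:
    """Append a record as one JSON line, so earlier ratings are never rewritten."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record)) + "\n")


def helpful_rate(path: Path) -> float:
    """Return the fraction of logged responses that raters marked helpful."""
    with path.open(encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]
    if not records:
        return 0.0
    return sum(r["helpful"] for r in records) / len(records)
```

An append-only log of this kind keeps each rating next to the question and response it refers to, which is the sort of data that feedback-based training methods such as RLHF would later draw on.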

Acceleration Frameworks#

We also integrate with several inference acceleration frameworks to speed up the model's retrieval and answer generation. Currently, we support the following frameworks:

To speed up retrieval:

To speed up text generation:

See Inference Acceleration Frameworks for more details on how to use them with the serving methods mentioned above.