[Image: Ancient Greeks consulting a Pythia computer]

May 31, 2023

OpenAssistant fine-tuned Pythia-12B: open-source ChatGPT alternative

Written By:

Steve Barlow


One of the most exciting open-source chatbot alternatives to ChatGPT, OpenAssistant's OASST1 fine-tuned Pythia-12B, is now available to run for free on Paperspace, powered by Graphcore IPUs. This truly open-source model can be used commercially without restrictions.

oasst-sft-4-pythia-12b is a variant of EleutherAI’s Pythia model family, fine-tuned using the Open Assistant Conversations (OASST1) dataset, a crowdsourced “human-generated, human-annotated assistant-style conversation corpus”.
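The checkpoint is published on the Hugging Face Hub as OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5, so it can also be loaded with the standard transformers API outside of the IPU notebook. Below is a minimal sketch, assuming the <|prompter|>/<|assistant|> prompt convention described on the model card (verify against the card before relying on it), and noting that a 12-billion-parameter model needs tens of gigabytes of host memory to load this way:

from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)  # 12B parameters: expect tens of GB

# The OpenAssistant SFT models are prompted with special role tokens
prompt = "<|prompter|>What is the Pythia model family?<|endoftext|><|assistant|>"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))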

The OASST1 dataset consists of 161,443 messages in 35 different languages, annotated with 461,292 quality ratings, resulting in over 10,000 fully annotated conversation trees. 
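If you want to inspect the corpus itself, OASST1 is also published on the Hugging Face Hub and can be browsed with the datasets library, independently of the IPU notebook. A minimal sketch follows; the lang, role and text field names reflect the dataset card at the time of writing, so verify them against the current schema:

from datasets import load_dataset

# Download the OASST1 message rows (train and validation splits)
ds = load_dataset("OpenAssistant/oasst1")
print(ds)

# Each row is a single message within a conversation tree
msg = ds["train"][0]
print(msg["lang"], msg["role"], msg["text"][:100])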

Running OASST1 Fine-tuned Pythia-12B inference on Paperspace 

OpenAssistant's fine-tuned Pythia can easily be run on Graphcore IPUs using a Paperspace Gradient notebook. New users can try out Pythia on an IPU-POD4 with Paperspace's six-hour free trial. For higher performance, you can scale up to an IPU-POD16.

The notebook guides you through creating and configuring an inference pipeline and running the pipeline to build a turn-by-turn conversation. 

Because the OpenAssistant model uses the same underlying Pythia-12B model as Dolly, we run it using the Dolly pipeline. 

Let's begin by loading the inference config. We use the same configuration file as Dolly and manually modify the vocabulary size, which is the only difference between the two model graphs. A configuration suitable for your instance will be selected automatically.


from utils.setup import dolly_config_setup

config_name = "dolly_pod4" if number_of_ipus == 4 else "dolly_pod16"
config, *_ = dolly_config_setup("config/inference.yml", "release", config_name)

# Update vocab size for oasst-sft-4-pythia-12b-epoch-3.5 - 50288 rather than 50280
config.model.embedding.vocab_size = 50288
config

Next, we want to create our inference pipeline. Here we define the maximum sequence length and maximum micro batch size. Before a model can be executed on IPUs, it must be compiled into an executable format; this happens when the pipeline is created. All input shapes must be known at compile time, so if the maximum sequence length or micro batch size is changed later, the pipeline will need to be recompiled.

Selecting a longer sequence length or larger batch size will use more IPU memory. This means that increasing one may require you to decrease the other.

This cell will take approximately 18 minutes to complete, which includes downloading the model weights.
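For reference, here is a sketch of roughly what that cell contains, based on the Dolly pipeline this notebook reuses. The api.DollyPipeline name and its keyword arguments are assumptions about the notebook's helper module rather than a documented public API, so treat this as illustrative:

import api  # helper module shipped alongside the notebook (assumed name)

sequence_length = 512   # maximum tokens per prompt plus generated reply
micro_batch_size = 1    # prompts processed in parallel per step

# Constructing the pipeline triggers compilation for the IPU. Input shapes are
# fixed at this point, so changing sequence_length or micro_batch_size later
# means the model must be recompiled.
pipeline = api.DollyPipeline(
    config,
    sequence_length=sequence_length,
    micro_batch_size=micro_batch_size,
    hf_dolly_checkpoint="OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5",
)

# A single conversational turn might then look like:
answer = pipeline("What is the OASST1 dataset?")

The exact call signature, including any sampling controls, is defined in the notebook's helper code, so check there before adapting this sketch.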