---
license: mit
language:
  - en
pipeline_tag: text-generation
library_name: transformers
tags:
  - tinyllama
  - fine-tuned
  - chat
  - conversational
  - rlaif
  - alignment
  - peft
  - lora
model-index:
  - name: TinyPi-1.1B-Chat-v1.5
    results:
      - task:
          type: text-generation
        metrics: []
---

# TinyPi-1.1B-Chat-v1.5

## Model Description

TinyPi-1.1B-Chat-v1.5 is an advanced, conversational language model that represents a significant evolution from its v1 predecessor. Starting with a base model fine-tuned on a large corpus of Discord chat data, this version has undergone a sophisticated second stage of alignment using Reinforcement Learning from AI Feedback (RLAIF).

The goal of this project was to cultivate an AI with a distinct, friendly, and engaging personality. While the v1 model successfully developed a unique "voice," it sometimes lacked factual depth and consistency. The v1.5 update addresses this directly by training the model on a high-quality dataset of corrections generated by a stronger teacher model (Google's Gemini 1.5 Flash).

This process has not only made TinyPi more knowledgeable and less prone to repetitive loops, but has also sharpened its persona, making it a more robust, reliable, and delightful conversational partner.

## How to Use

This is a merged, standalone model and can be used directly for text generation. For best results, format inputs with the model's chat template; a system prompt can be included to guide its persona.

### Installation

```bash
pip install transformers torch accelerate
```

### Inference with Python

```python
from transformers import pipeline
import torch

model_path = "Kittykat924/TinyPi-Chat-v1.5"
pipe = pipeline(
    "text-generation",
    model=model_path,
    torch_dtype=torch.float16,
    device_map="auto",
)

prompt = "What's a creative way to explain how a CPU works?"

# Format the conversation using the chat template.
# Optionally prepend a {"role": "system", ...} message to steer the persona.
messages = [
    {"role": "user", "content": prompt},
]
prompt_formatted = pipe.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Generate a response
outputs = pipe(
    prompt_formatted,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
)

# The pipeline returns the prompt plus the completion; keep only the text
# after the final <|assistant|> marker.
response = outputs[0]["generated_text"]
assistant_response = response.split("<|assistant|>")[-1].strip()
print(assistant_response)
```

## Training Procedure

This model was developed in a two-stage fine-tuning process.

### Stage 1: Initial Persona Fine-tuning (Creation of v1)

- **Base Model:** TinyLlama/TinyLlama-1.1B-Chat-v1.0
- **Dataset:** A large, private dataset of over 2 million general-purpose Discord chat messages.
- **Method:** LoRA fine-tuning using the peft library.
- **Result:** A model with a strong, emergent personality but with some factual inconsistencies and conversational weaknesses (e.g., repetitiveness).

### Stage 2: RLAIF Alignment (Creation of v1.5)

This stage used an automated, AI-driven data-generation loop to correct the flaws of the v1 model.

- **"Student" Model:** The merged v1 model from Stage 1.
- **"Teacher" (Evaluator) AI:** gemini-1.5-flash.
- **"Chat Partner" AI:** gemini-1.5-flash.
- **Workflow:**
  1. A conversation was initiated between the "Chat Partner" and "TinyPi" (v1).
  2. For each of TinyPi's responses, the "Evaluator" AI judged its quality, accuracy, and adherence to the target persona.
  3. If a response was flawed, the Evaluator generated a high-quality, corrected version.
  4. Only these (instruction, corrected_output) pairs were saved, creating a dataset focused exclusively on fixing the model's mistakes.
- **Dataset:** [Customize] Approximately [e.g., 1,200] high-quality, corrected examples generated by this RLAIF process.
- **Continual Learning:** To prevent catastrophic forgetting, the RLAIF dataset was combined with a small "replay" sample (~20,000 examples) of the original Discord data.
- **Final Fine-tune:** A new LoRA adapter was trained on this combined dataset, starting from the v1 model, and then merged to create the final v1.5 model.
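The four-step workflow above amounts to a simple collection loop. Here is a minimal sketch, where `chat_partner`, `student`, and `evaluate` are hypothetical stand-ins for the Gemini 1.5 Flash and TinyPi v1 calls (the real API interfaces are not documented here):

```python
# Hedged sketch of the Stage 2 correction loop; the three callables are
# hypothetical placeholders for the Gemini and TinyPi model calls.
def collect_corrections(n_turns, chat_partner, student, evaluate):
    """Keep only (instruction, corrected_output) pairs where the student failed."""
    dataset = []
    history = []
    for _ in range(n_turns):
        instruction = chat_partner(history)    # Gemini plays the user
        draft = student(history, instruction)  # TinyPi v1 answers
        verdict = evaluate(instruction, draft) # Gemini judges the answer
        if not verdict["acceptable"]:
            # Save only the fixes, so training targets the model's mistakes.
            dataset.append({"instruction": instruction,
                            "output": verdict["correction"]})
        history.append((instruction, draft))
    return dataset
```

Because only flawed turns produce training pairs, the resulting dataset concentrates entirely on the model's failure modes; mixing in the ~20,000-example replay sample then guards against forgetting the original persona.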

## Model Capabilities and Limitations

**Capabilities:**

- Maintains a consistent, friendly, and humorous persona.
- Engages in coherent, multi-turn conversations on a wide variety of topics.
- Improved factual accuracy and reasoning on subjects covered during the RLAIF process.
- Less prone to generic refusals and repetitive loops than v1.

**Limitations:**

- This model is designed for conversational and entertainment purposes. It is not a substitute for expert advice and may still produce factual inaccuracies.
- Its personality is a core feature; it may not suit tasks requiring a purely neutral or formal tone.
- The model inherits biases from its training data, which includes a large corpus of internet chat logs and AI-generated text. User discretion is advised.

-Kittykat924