README.md · Kittykat924/TinyPi-Chat-v1.5 at main

Kittykat924

fef6c70 verified 4 days ago

	---
	license: mit
	language:
	- en
	pipeline_tag: text-generation
	library_name: transformers

	tags:
	- tinyllama
	- fine-tuned
	- chat
	- conversational
	- rlaif
	- alignment
	- peft
	- lora

	model-index:
	- name: TinyPi-1.1B-Chat-v1.5
	  results:
	  - task:
	      type: text-generation
	    metrics: []
	---

	# TinyPi-1.1B-Chat-v1.5

	## Model Description

	TinyPi-1.1B-Chat-v1.5 is a conversational language model and the successor to TinyPi v1. It starts from the v1 model, which was fine-tuned on a large corpus of Discord chat data, and adds a second alignment stage using Reinforcement Learning from AI Feedback (RLAIF).

	The goal of this project was to build an AI with a distinct, friendly, and engaging personality. The v1 model developed a unique "voice" but sometimes lacked factual depth and consistency. The v1.5 update addresses this directly by training the model on a dataset of corrections generated by a stronger model (Google's Gemini 1.5 Flash).

	As a result, TinyPi is more knowledgeable and less prone to repetitive loops than v1, and its persona is more consistent, which makes it a more reliable conversational partner.

	## How to Use

	This is a merged, standalone model and can be used directly for text generation. For best results, use the chat template, which includes a system prompt to guide its persona.

	### Installation

	```bash
	pip install transformers torch accelerate
	```

	### Inference with Python

	```python
	from transformers import pipeline
	import torch

	model_path = "Kittykat924/TinyPi-Chat-v1.5"
	pipe = pipeline(
	    "text-generation",
	    model=model_path,
	    torch_dtype=torch.float16,
	    device_map="auto",
	)

	prompt = "What's a creative way to explain how a CPU works?"

	# Format the conversation using the chat template
	messages = [
	    {"role": "user", "content": prompt},
	]
	prompt_formatted = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

	# Generate a response
	outputs = pipe(
	    prompt_formatted,
	    max_new_tokens=256,
	    do_sample=True,
	    temperature=0.7,
	    top_k=50,
	    top_p=0.95,
	)

	# Extract and print the assistant's response (the text after the last assistant marker)
	response = outputs[0]["generated_text"]
	assistant_response = response.split("<|assistant|>")[-1].strip()
	print(assistant_response)
	```
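
The text above mentions a system prompt, while the example passes only a user message. The following is a minimal sketch of two small helpers: one builds a message list with an optional system turn and earlier turns, the other pulls the final assistant reply out of the generated text. The names `build_messages` and `extract_reply` and the sample system prompt are hypothetical and not part of this repository; the `<|assistant|>` marker is taken from the example above.

```python
from typing import Dict, List, Optional, Tuple

ASSISTANT_MARKER = "<|assistant|>"  # marker used by the split in the example above


def build_messages(
    user_prompt: str,
    system_prompt: Optional[str] = None,
    history: Optional[List[Tuple[str, str]]] = None,
) -> List[Dict[str, str]]:
    """Build a chat message list: optional system turn, prior turns, new user turn."""
    messages: List[Dict[str, str]] = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    # Each history entry is a (user_text, assistant_text) pair from an earlier turn.
    for user_text, assistant_text in history or []:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": user_prompt})
    return messages


def extract_reply(generated_text: str) -> str:
    """Return the text after the last assistant marker (whole text if no marker)."""
    return generated_text.rsplit(ASSISTANT_MARKER, 1)[-1].strip()


# Hypothetical system prompt, for illustration only.
messages = build_messages(
    "What's a creative way to explain how a CPU works?",
    system_prompt="You are TinyPi, a friendly and playful assistant.",
)
```

The resulting `messages` list can be passed to `pipe.tokenizer.apply_chat_template(...)` exactly as in the example above, and `extract_reply(outputs[0]["generated_text"])` replaces the manual split.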

	## Training Procedure

	This model was developed in a two-stage fine-tuning process.

	### Stage 1: Initial Persona Fine-tuning (Creation of v1)

	* Base Model: `TinyLlama/TinyLlama-1.1B-Chat-v1.0`
	* Dataset: A large, private dataset of over 2 million general-purpose Discord chat messages.
	* Method: LoRA fine-tuning using the `peft` library.
	* Result: A model with a strong, emergent personality but with some factual inconsistencies and conversational weaknesses (e.g., repetitiveness).
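
For readers new to LoRA, the general formulation is sketched below. This is the standard LoRA update, not a detail confirmed for this project: the rank $r$ and scaling factor $\alpha$ used for TinyPi are not stated here.

$$
W' = W_0 + \frac{\alpha}{r}\, B A, \qquad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k},\quad r \ll \min(d, k)
$$

During fine-tuning the original weight matrix $W_0$ stays frozen and only the small matrices $A$ and $B$ are trained. Merging an adapter means adding the product $\frac{\alpha}{r} B A$ into $W_0$ once, so the result loads as an ordinary model without `peft`.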

	### Stage 2: RLAIF Alignment (Creation of v1.5)

	This stage used an automated, AI-driven data generation loop to correct the flaws of the v1 model.

	* "Student" Model: The merged `v1` model from Stage 1.
	* "Teacher" (Evaluator) AI: `gemini-1.5-flash`.
	* "Chat Partner" AI: `gemini-1.5-flash`.
	* Workflow:
	    1. A conversation was initiated between the "Chat Partner" and "TinyPi" (v1).
	    2. For each of TinyPi's responses, the "Evaluator" AI judged its quality, accuracy, and adherence to the target persona.
	    3. If a response was flawed, the Evaluator generated a high-quality, corrected version.
	    4. Only these `(instruction, corrected_output)` pairs were saved, creating a dataset focused exclusively on fixing the model's mistakes.
	* Dataset: [Customize] Approximately [e.g., `1,200`] high-quality, corrected examples generated by this RLAIF process.
	* Continual Learning: To prevent catastrophic forgetting, the RLAIF dataset was combined with a small "replay" sample (~20,000 examples) of the original Discord data.
	* Final Fine-tune: A new LoRA adapter was trained on this combined dataset, starting from the v1 model. This new adapter was then merged to create the final v1.5 model.
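
The workflow and replay mixing described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the author's actual pipeline: the three models are represented by plain callables, the function names (`collect_corrections`, `mix_with_replay`) and the `instruction`/`output` keys are hypothetical, and carrying the student's original reply forward in the transcript is an assumption. Toy stand-ins replace the real models so the sketch runs without any API calls.

```python
import random
from typing import Callable, Dict, List, Optional

Pair = Dict[str, str]


def collect_corrections(
    partner: Callable[[List[str]], str],
    student: Callable[[str], str],
    evaluator: Callable[[str, str], Optional[str]],
    turns: int,
) -> List[Pair]:
    """Run one partner/student conversation and keep only the corrected replies.

    `evaluator` returns None for an acceptable reply, otherwise a corrected one.
    """
    transcript: List[str] = []
    pairs: List[Pair] = []
    for _ in range(turns):
        instruction = partner(transcript)          # step 1: chat partner speaks
        reply = student(instruction)               # the v1 student answers
        corrected = evaluator(instruction, reply)  # steps 2-3: judge, maybe correct
        if corrected is not None:                  # step 4: save only the fixes
            pairs.append({"instruction": instruction, "output": corrected})
        transcript += [instruction, reply]         # assumption: keep the original reply
    return pairs


def mix_with_replay(
    corrections: List[Pair], original: List[Pair], replay_size: int, seed: int = 0
) -> List[Pair]:
    """Combine corrected pairs with a random replay sample of the original data."""
    rng = random.Random(seed)
    replay = rng.sample(original, min(replay_size, len(original)))
    combined = list(corrections) + replay
    rng.shuffle(combined)
    return combined


# Toy stand-ins (hypothetical) so the sketch runs without any model.
def toy_partner(transcript: List[str]) -> str:
    return f"Question {len(transcript) // 2 + 1}?"


def toy_student(instruction: str) -> str:
    return "idk" if "2" in instruction else "A fine answer."


def toy_evaluator(instruction: str, reply: str) -> Optional[str]:
    return "A corrected answer." if reply == "idk" else None


corrections = collect_corrections(toy_partner, toy_student, toy_evaluator, turns=3)
original = [{"instruction": f"msg {i}", "output": f"reply {i}"} for i in range(10)]
dataset = mix_with_replay(corrections, original, replay_size=4)
```

In a real run the callables would wrap the v1 model and the Gemini API, and the replay sample would be drawn from the Discord data; the combined list is what the final LoRA fine-tune would train on.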

	## Model Capabilities and Limitations

	Capabilities:
	* Maintains a consistent, friendly, and humorous persona.
	* Engages in coherent, multi-turn conversations on a wide variety of topics.
	* Improved factual accuracy and reasoning ability on subjects covered during the RLAIF process.
	* Less prone to generic refusals and repetitive loops compared to v1.

	Limitations:
	* This model is designed for conversational and entertainment purposes. It is not a substitute for expert advice and may still produce factual inaccuracies.
	* Its personality is a core feature. It may not be suitable for tasks requiring a purely neutral or formal tone.
	* The model inherits biases from its training data, which includes a large corpus of internet chat logs and AI-generated text. User discretion is advised.

	-Kittykat924