Spaces:

pabloce
/

exllama

Running on Zero

App Files Files Community

exllama / README.md

Pablo Carrera (パブロ)

Update README.md 5.29.0

c1cd445 unverified 12 days ago

preview code

raw

history blame contribute delete

2.28 kB

	---
	title: Exllama
	emoji: 😽
	colorFrom: purple
	colorTo: indigo
	sdk: gradio
	sdk_version: 5.29.0
	app_file: app.py
	pinned: false
	header: mini
	fullWidth: true
	license: apache-2.0
	short_description: 'Chat: exllama v2'
	---

	# Exllama Chat 😽

	[![Open In Spaces](https://img.shields.io/badge/🤗-Open%20in%20Spaces-blue.svg)](https://huggingface.co/spaces/pabloce/exllama)
	[![Apache 2.0](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)

	A Gradio-based chat interface for ExLlamaV2, featuring Mistral-7B-Instruct-v0.3 and Llama-3-70B-Instruct models. Experience high-performance inference on consumer GPUs with Flash Attention support.

	## 🌟 Features

	- 🚀 Powered by ExLlamaV2 inference library
	- 💨 Flash Attention support for optimized performance
	- 🎯 Supports multiple instruction-tuned models:
	- Mistral-7B-Instruct v0.3
	- Meta's Llama-3-70B-Instruct
	- ⚡ Dynamic text generation with adjustable parameters
	- 🎨 Clean, modern UI with dark mode support

	## 🎮 Parameters

	Customize your chat experience with these adjustable parameters:

	- System Message: Set the AI assistant's behavior and context
	- Max Tokens: Control response length (1-4096)
	- Temperature: Adjust response creativity (0.1-4.0)
	- Top-p: Fine-tune response diversity (0.1-1.0)
	- Top-k: Control vocabulary sampling (0-100)
	- Repetition Penalty: Prevent repetitive text (0.0-2.0)

	## 🛠️ Technical Details

	- Framework: Gradio 5.5.0
	- Models: ExLlamaV2-compatible models
	- UI: Custom-themed interface with Gradio's Soft theme
	- Optimization: Flash Attention for improved performance

	## 🔗 Links

	- [Try it on Hugging Face Spaces](https://huggingface.co/spaces/pabloce/exllama)
	- [ExLlamaV2 GitHub Repository](https://github.com/turboderp/exllamav2)
	- [Join our Discord](https://discord.gg/gmVgCk6X2x)

	## 📝 License

	This project is licensed under the Apache 2.0 License - see the [LICENSE](LICENSE) file for details.

	## 🙏 Acknowledgments

	- [ExLlamaV2](https://github.com/turboderp/exllamav2) for the core inference library
	- [Hugging Face](https://huggingface.co/) for hosting and model distribution
	- [Gradio](https://gradio.app/) for the web interface framework

	---

	Made with ❤️ using ExLlamaV2 and Gradio