Cloudflare AI

Meta/LLaMA-2-7B-chat-int8

A 7 billion parameter version of the LLaMA 2 language model, optimized for chat and quantized to 8-bit for efficient inference.

7B Parameters | 8-bit Quantized | Chat Optimized | Context: 4096 tokens

Hello! I'm LLaMA 2, a large language model running on Cloudflare's infrastructure. How can I assist you today?

LLaMA 2 may produce inaccurate information. Verify anything important against an independent source.

API Usage

curl -X POST \
  https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/@cf/meta/llama-2-7b-chat-int8 \
  -H "Authorization: Bearer {api_token}" \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Explain quantum computing"}]}'
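The same request can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper names `build_request` and `run_chat` are hypothetical, and the response is simply returned as parsed JSON because the page above does not document its shape. The URL, headers, and body mirror the curl command.

```python
import json
import urllib.request

# Endpoint pieces taken from the curl command above.
API_BASE = "https://api.cloudflare.com/client/v4/accounts"
MODEL = "@cf/meta/llama-2-7b-chat-int8"


def build_request(account_id: str, api_token: str, prompt: str) -> urllib.request.Request:
    """Build the POST request equivalent to the curl command (hypothetical helper)."""
    url = f"{API_BASE}/{account_id}/ai/run/{MODEL}"
    # Single-turn conversation: one user message, as in the curl example.
    body = json.dumps(
        {"messages": [{"role": "user", "content": prompt}]}
    ).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def run_chat(account_id: str, api_token: str, prompt: str, timeout: float = 30.0) -> dict:
    """Send the request and return the parsed JSON response (hypothetical helper).

    Raises urllib.error.HTTPError on a non-2xx status, e.g. a bad token.
    """
    req = build_request(account_id, api_token, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `run_chat(account_id, api_token, "Explain quantum computing")` with a real account ID and API token would send the same payload as the curl command. Reading the token from an environment variable rather than hard-coding it is the safer choice.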

View API documentation
