A 7-billion-parameter version of the Llama 2 language model, optimized for chat and quantized to 8-bit integers (int8) for efficient inference.
curl -X POST \
  "https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/@cf/meta/llama-2-7b-chat-int8" \
  -H "Authorization: Bearer {api_token}" \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Explain quantum computing"}]}'
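
The request above can also be wrapped in a small script so the account ID and token are not pasted into the command each time. The following is a minimal sketch, not an official client: the function names (build_url, build_payload, run_model) and the environment variable names CLOUDFLARE_ACCOUNT_ID and CLOUDFLARE_API_TOKEN are assumptions chosen for illustration. Only the endpoint, headers, and message shape come from the command above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around the curl call shown above.
# Assumed environment variables: CLOUDFLARE_ACCOUNT_ID, CLOUDFLARE_API_TOKEN.

MODEL="@cf/meta/llama-2-7b-chat-int8"

# Build the endpoint URL for a given account ID.
build_url() {
  printf 'https://api.cloudflare.com/client/v4/accounts/%s/ai/run/%s' "$1" "$MODEL"
}

# Build a single-message request body.
# Naive on purpose: the prompt must not contain double quotes or
# backslashes, because no JSON escaping is performed here.
build_payload() {
  printf '{"messages":[{"role":"user","content":"%s"}]}' "$1"
}

# Send the prompt to the model and print the raw response body.
run_model() {
  curl -sS -X POST "$(build_url "$CLOUDFLARE_ACCOUNT_ID")" \
    -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
    -H "Content-Type: application/json" \
    -d "$(build_payload "$1")"
}
```

After exporting the two environment variables and sourcing the script, a call such as run_model "Explain quantum computing" sends the same request as the command above. For prompts that may contain quotes, a JSON-aware tool is a safer way to build the body than printf.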