Messages API
Core language-model endpoints. Except where an endpoint is explicitly marked CUSTOM (such as the batch endpoint), these are the only calls currently implemented as NATIVE in the LiteLLM Gateway.
POST /v1/messages
Sends a message to the model and returns the generated response.
Headers
| Header | Value | Required |
|---|---|---|
| `Authorization` | `Bearer <token>` | Yes, unless `x-api-key` is sent |
| `x-api-key` | `<key>` | Yes, unless `Authorization` is sent |
| `anthropic-version` | `2023-06-01` | Yes |
| `anthropic-beta` | `managed-agents-2026-04-01`, `files-api-2025-04-14`, `skills-2025-10-02`, `user-profiles-2026-03-24`, `structured-outputs-2025-12-15`, `token-counting-2024-11-01`, `ccr-byoc-2025-07-29`, `oauth-2025-04-20` | No |
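A minimal sketch of assembling these headers in TypeScript. `buildHeaders` and `AuthMode` are hypothetical names. Header names are written in lowercase because HTTP header names are case-insensitive, and joining several beta flags into one comma-separated `anthropic-beta` value follows the upstream Anthropic API convention; treat that as an assumption for this gateway.

```typescript
// Hypothetical helper: assembles the request headers from the table above.
type AuthMode = "bearer" | "x-api-key";

function buildHeaders(
  key: string,
  auth: AuthMode = "x-api-key",
  betas: string[] = [],
): Record<string, string> {
  const headers: Record<string, string> = {
    "content-type": "application/json",
    "anthropic-version": "2023-06-01",
  };
  // Send the credential in exactly one of the two accepted forms.
  if (auth === "bearer") {
    headers["authorization"] = `Bearer ${key}`;
  } else {
    headers["x-api-key"] = key;
  }
  // Assumption: several beta features are sent as one comma-separated header value.
  if (betas.length > 0) {
    headers["anthropic-beta"] = betas.join(",");
  }
  return headers;
}

// Usage: x-api-key auth with two beta features enabled.
const exampleHeaders = buildHeaders("sk-example", "x-api-key", [
  "files-api-2025-04-14",
  "token-counting-2024-11-01",
]);
console.log(exampleHeaders["anthropic-beta"]); // files-api-2025-04-14,token-counting-2024-11-01
```

Sending only one credential header avoids ambiguity about which key the gateway should honor.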
Request Body
```typescript
{
  model: string;                        // Model ID
  messages: Array<{
    role: "user" | "assistant";
    content: string | Array<ContentBlock>;
  }>;
  max_tokens: number;                   // Maximum tokens in the response (required by the Anthropic API)
  stream?: boolean;                     // Enable SSE streaming
  thinking?: {
    type: "enabled";
    budget_tokens: number;              // 0-2047=low, 2048-8191=medium, >=8192=high; upstream requires >= 1024 and < max_tokens
  };
  temperature?: number;                 // 0-1, default 1.0
  top_p?: number;                       // Nucleus sampling
  top_k?: number;                       // Top-k sampling
  stop_sequences?: string[];            // Custom stop sequences
  system?: string | Array<SystemBlock>; // System prompt
  metadata?: Record<string, any>;       // Request metadata
  tools?: Array<ToolDefinition>;        // Available tools
  tool_choice?: ToolChoice;             // Controls how the model selects tools
}
```
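A sketch of two checks for `thinking.budget_tokens`. `thinkingLevel` mirrors the low/medium/high ranges stated in the comment above; `validateThinkingBudget` applies the upstream Anthropic constraint for standard (non-interleaved) thinking, `1024 <= budget_tokens < max_tokens`, which this document does not state for the gateway itself. Both function names are hypothetical.

```typescript
// Hypothetical helpers for the `thinking.budget_tokens` field.
type ThinkingLevel = "low" | "medium" | "high";

// Mirrors the documented ranges: 0-2047 low, 2048-8191 medium, >= 8192 high.
function thinkingLevel(budgetTokens: number): ThinkingLevel {
  // Rejects negatives, fractions, NaN and Infinity (the last two fail the % 1 check).
  if (budgetTokens < 0 || budgetTokens % 1 !== 0) {
    throw new RangeError(`budget_tokens must be a non-negative integer, got ${budgetTokens}`);
  }
  if (budgetTokens < 2048) return "low";
  if (budgetTokens < 8192) return "medium";
  return "high";
}

// Upstream Anthropic constraint (standard, non-interleaved thinking):
// 1024 <= budget_tokens < max_tokens.
function validateThinkingBudget(budgetTokens: number, maxTokens: number): void {
  if (budgetTokens < 1024 || budgetTokens >= maxTokens) {
    throw new RangeError(`budget_tokens must be in [1024, ${maxTokens}), got ${budgetTokens}`);
  }
}

validateThinkingBudget(4096, 16000); // passes without throwing
console.log(thinkingLevel(4096)); // medium
```

Running the upstream check client-side turns a would-be 400 response into an immediate local error.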
Response (non-streaming)
```typescript
{
  id: string;                              // "msg_..."
  type: "message";
  role: "assistant";
  content: Array<ContentBlock>;            // TextBlock | ToolUseBlock | ThinkingBlock
  model: string;
  stop_reason: "end_turn" | "max_tokens" | "stop_sequence" | "tool_use" | null;
  stop_sequence: string | null;
  usage: {
    input_tokens: number;
    output_tokens: number;
    cache_creation_input_tokens?: number;
    cache_read_input_tokens?: number;
    server_tool_use?: {
      web_search_requests?: number;
      web_fetch_requests?: number;
    };
  };
}
```
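A sketch of consuming the non-streaming response. The per-block fields (`text`, `id`/`name`/`input`, `thinking`) follow the upstream Anthropic API and are an assumption here, since this document only names the block types. Summing `input_tokens` with the two cache fields gives the total prompt size, because upstream `input_tokens` counts only uncached input. All type and function names are hypothetical.

```typescript
// Minimal subset of the response shape above; per-block fields follow the upstream Anthropic API.
type ResponseContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown }
  | { type: "thinking"; thinking: string };

interface Usage {
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens?: number;
  cache_read_input_tokens?: number;
}

interface MessageResponse {
  content: ResponseContentBlock[];
  stop_reason: "end_turn" | "max_tokens" | "stop_sequence" | "tool_use" | null;
  usage: Usage;
}

// Concatenates all text blocks, skipping tool_use and thinking blocks.
function responseText(msg: MessageResponse): string {
  return msg.content
    .filter((b): b is Extract<ResponseContentBlock, { type: "text" }> => b.type === "text")
    .map((b) => b.text)
    .join("");
}

// Total prompt tokens: uncached input plus cache writes and cache reads.
function totalInputTokens(u: Usage): number {
  return u.input_tokens + (u.cache_creation_input_tokens ?? 0) + (u.cache_read_input_tokens ?? 0);
}

// True when the reply was cut off by max_tokens and may be incomplete.
function wasTruncated(msg: MessageResponse): boolean {
  return msg.stop_reason === "max_tokens";
}

const exampleResponse: MessageResponse = {
  content: [
    { type: "thinking", thinking: "..." },
    { type: "text", text: "Hello" },
    { type: "text", text: ", world" },
  ],
  stop_reason: "end_turn",
  usage: { input_tokens: 10, output_tokens: 5, cache_read_input_tokens: 100 },
};
console.log(responseText(exampleResponse)); // Hello, world
console.log(totalInputTokens(exampleResponse.usage)); // 110
```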
curl example
```bash
curl -X POST http://localhost:4000/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Hello, Claude!"}
    ]
  }'
```
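The same call from TypeScript, as a sketch. `buildMessageRequest`, `sendMessage`, `SimpleMessageRequest` and `FetchLike` are hypothetical names. The HTTP client is injected as `fetchImpl` so the request can be built and inspected without network access; the global `fetch` in Node 18+ and browsers fits the `FetchLike` shape. The base URL `http://localhost:4000` comes from the curl example.

```typescript
// Minimal structural type so the sketch needs no DOM/Node typings.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

interface SimpleMessageRequest {
  model: string;
  max_tokens: number;
  messages: Array<{ role: "user" | "assistant"; content: string }>;
}

// Builds the same request the curl example sends.
function buildMessageRequest(baseUrl: string, apiKey: string, body: SimpleMessageRequest) {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/v1/messages`, // strip trailing slashes before joining
    init: {
      method: "POST",
      headers: {
        "content-type": "application/json",
        "x-api-key": apiKey,
        "anthropic-version": "2023-06-01",
      },
      body: JSON.stringify(body),
    },
  };
}

// Sends the request and turns any non-2xx status into an error.
async function sendMessage(
  fetchImpl: FetchLike,
  baseUrl: string,
  apiKey: string,
  body: SimpleMessageRequest,
): Promise<unknown> {
  const { url, init } = buildMessageRequest(baseUrl, apiKey, body);
  const res = await fetchImpl(url, init);
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return res.json();
}
```

Usage: `await sendMessage(fetch, "http://localhost:4000", key, { model: "claude-sonnet-4-20250514", max_tokens: 1024, messages: [{ role: "user", content: "Hello, Claude!" }] })`.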
Error Codes
| Code | Meaning |
|---|---|
| 400 | Bad request (invalid schema) |
| 401 | Unauthorized (invalid API key) |
| 403 | Forbidden (no access to the model) |
| 404 | Model not found |
| 413 | Payload too large |
| 429 | Rate limit exceeded |
| 500 | Internal server error |
| 529 | Server temporarily overloaded |
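A sketch of a client-side retry policy over these codes. Treating 429, 500 and 529 as transient and everything else as a problem with the request is an assumption, not something this document specifies; `isRetryable`, `backoffMs` and their default delays are hypothetical.

```typescript
// Assumption: 429 (rate limit), 500 (internal error) and 529 (overloaded) are transient;
// the other codes in the table point at the request itself and should not be retried.
function isRetryable(status: number): boolean {
  return status === 429 || status === 500 || status === 529;
}

// Hypothetical exponential backoff: baseMs * 2^attempt, capped at maxMs (attempt starts at 0).
function backoffMs(attempt: number, baseMs = 500, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

console.log([400, 401, 413, 429, 500, 529].filter(isRetryable)); // [ 429, 500, 529 ]
console.log(backoffMs(0), backoffMs(3), backoffMs(10)); // 500 4000 30000
```

Capping the delay keeps a long run of 529 responses from stalling a client indefinitely.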
POST /v1/messages (Streaming)
POST /v1/messages stream: true
Same endpoint, with `stream: true` in the body. Returns an SSE (Server-Sent Events) stream containing the following events:
| Event | Data |
|---|---|
| `message_start` | `{ type: "message_start", message: Message }` |
| `content_block_start` | `{ type: "content_block_start", index, content_block }` |
| `content_block_delta` | `{ type: "content_block_delta", index, delta }` |
| `content_block_stop` | `{ type: "content_block_stop", index }` |
| `message_delta` | `{ type: "message_delta", delta, usage }` |
| `message_stop` | `{ type: "message_stop" }` |
| `ping` | `{ type: "ping" }` |
curl example (streaming)
```bash
# -N (--no-buffer) prints each SSE event as soon as it arrives
curl -N -X POST http://localhost:4000/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 1024,
    "stream": true,
    "messages": [
      {"role": "user", "content": "Tell me a story"}
    ]
  }'
```
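A sketch of folding a complete SSE body from the curl call above into the final text and stats, following the event sequence in the table. Two details come from the upstream Anthropic API rather than this document and are assumptions here: text deltas have the shape `{ type: "text_delta", text }`, and an `error` event may appear mid-stream. `parseMessageStream` and `StreamResult` are hypothetical names.

```typescript
// Hypothetical result type for a fully consumed stream.
interface StreamResult {
  text: string;
  stopReason: string | null;
  outputTokens: number;
}

// Parses a complete SSE body into accumulated text, stop reason and output token count.
function parseMessageStream(raw: string): StreamResult {
  const result: StreamResult = { text: "", stopReason: null, outputTokens: 0 };
  // Events are separated by a blank line; normalise CRLF line endings first.
  for (const chunk of raw.replace(/\r\n/g, "\n").split("\n\n")) {
    const data = chunk
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).replace(/^ /, "")) // drop the single optional space after "data:"
      .join("\n");
    if (data === "") continue;
    const event = JSON.parse(data);
    switch (event.type) {
      case "content_block_delta":
        // Assumption (upstream shape): text arrives as { type: "text_delta", text }.
        if (event.delta?.type === "text_delta") result.text += event.delta.text;
        break;
      case "message_delta":
        // usage in message_delta is cumulative, so keep the latest value.
        result.stopReason = event.delta?.stop_reason ?? result.stopReason;
        result.outputTokens = event.usage?.output_tokens ?? result.outputTokens;
        break;
      case "error":
        throw new Error(`stream error: ${JSON.stringify(event.error)}`);
      default:
        // message_start, content_block_start/stop, message_stop and ping carry no text.
        break;
    }
  }
  return result;
}
```

A live client would apply the same per-event logic incrementally as chunks arrive instead of waiting for the full body.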
POST /v1/messages/count_tokens
Counts tokens without generating a response. Requires the beta header `token-counting-2024-11-01`.
Request Body
```typescript
{
  model: string;
  messages: Array<{ role: "user" | "assistant"; content: string | Array<ContentBlock> }>;
  system?: string | Array<SystemBlock>;
  tools?: Array<ToolDefinition>;
}
```
Response
```typescript
{
  input_tokens: number;
  output_tokens?: number;
}
```
curl example
```bash
curl -X POST http://localhost:4000/v1/messages/count_tokens \
  -H "Content-Type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "anthropic-beta: token-counting-2024-11-01" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'
```
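A common pattern is to count tokens for a request before sending it to `/v1/messages`. This sketch derives a count_tokens body from a full messages request by keeping only the fields listed in the count_tokens request body above (`model`, `messages`, `system`, `tools`). The `token-counting-2024-11-01` beta header still has to be sent separately. `toCountTokensRequest`, `MessagesRequest` and `CountTokensRequest` are hypothetical names, and content blocks, system blocks and tools are left opaque.

```typescript
// Minimal shapes; content blocks, system blocks and tools are left opaque here.
interface MessagesRequest {
  model: string;
  messages: Array<{ role: "user" | "assistant"; content: unknown }>;
  max_tokens: number;
  system?: unknown;
  tools?: unknown[];
  stream?: boolean;
  temperature?: number;
}

interface CountTokensRequest {
  model: string;
  messages: MessagesRequest["messages"];
  system?: unknown;
  tools?: unknown[];
}

// Keeps only the fields listed for /v1/messages/count_tokens; generation
// settings such as max_tokens, stream and temperature are dropped.
function toCountTokensRequest(req: MessagesRequest): CountTokensRequest {
  const out: CountTokensRequest = { model: req.model, messages: req.messages };
  if (req.system !== undefined) out.system = req.system;
  if (req.tools !== undefined) out.tools = req.tools;
  return out;
}
```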
POST /v1/messages/batches
Creates a batch of messages for asynchronous processing. Requires the beta header `managed-agents-2026-04-01`.
Implementation: CUSTOM (requires a custom hook in LiteLLM)
Request Body
```typescript
{
  // Fields under NDA (not documented here)
  // Similar to POST /v1/messages, but with multiple requests
}
```
Response
```typescript
{
  id: string; // "batch_..."
  // status fields
}
```
GET /v1/messages/batches/{batch_id}
Retrieves the status of a batch.
Path Parameters
| Parameter | Type | Description |
|---|---|---|
| `batch_id` | string | Batch ID (prefix `batch_`) |
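Because the status fields of a batch are not specified here, this sketch only builds the request URL. It rejects IDs that lack the documented `batch_` prefix and percent-encodes the ID before placing it in the path. `batchStatusUrl` is a hypothetical helper, and sending the same headers as the other endpoints is an assumption.

```typescript
// Hypothetical helper: builds the GET URL for a batch, validating the documented "batch_" prefix.
function batchStatusUrl(baseUrl: string, batchId: string): string {
  if (!batchId.startsWith("batch_")) {
    throw new Error(`expected a batch ID starting with "batch_", got "${batchId}"`);
  }
  const base = baseUrl.replace(/\/+$/, ""); // strip trailing slashes before joining
  return `${base}/v1/messages/batches/${encodeURIComponent(batchId)}`;
}

console.log(batchStatusUrl("http://localhost:4000", "batch_abc123"));
// http://localhost:4000/v1/messages/batches/batch_abc123
```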