Messages API

Core language-model endpoints. Unless marked CUSTOM, these are the only calls the LiteLLM Gateway currently implements as NATIVE.


POST /v1/messages

Sends a message to the model and receives a generated response.

Headers

| Header | Value | Required |
|---|---|---|
| Authorization or x-api-key | `Bearer <token>` (Authorization) or `<key>` (x-api-key) | Yes, one of the two |
| anthropic-version | 2023-06-01 | Yes |
| anthropic-beta | One or more, comma-separated: managed-agents-2026-04-01, files-api-2025-04-14, skills-2025-10-02, user-profiles-2026-03-24, structured-outputs-2025-12-15, token-counting-2024-11-01, ccr-byoc-2025-07-29, oauth-2025-04-20 | Optional |
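For illustration, here is a minimal TypeScript sketch that assembles these headers. `HeaderOptions` and `buildHeaders` are hypothetical names. The sketch assumes that several beta features are sent as a single comma-separated anthropic-beta value. The Anthropic API accepts that form, and the sketch assumes the gateway forwards it the same way.

```typescript
// Hypothetical option shape: set apiKey or bearerToken.
interface HeaderOptions {
  apiKey?: string; // sent as x-api-key
  bearerToken?: string; // sent as Authorization: Bearer <token>
  betas?: string[]; // values from the anthropic-beta row above
}

// Builds the request headers for a gateway call (hypothetical helper).
function buildHeaders(opts: HeaderOptions): Record<string, string> {
  const headers: Record<string, string> = {
    "content-type": "application/json",
    "anthropic-version": "2023-06-01", // required on every call
  };
  // Prefer x-api-key when both credentials are supplied.
  if (opts.apiKey) {
    headers["x-api-key"] = opts.apiKey;
  } else if (opts.bearerToken) {
    headers["authorization"] = `Bearer ${opts.bearerToken}`;
  } else {
    throw new Error("Missing credentials: set apiKey or bearerToken");
  }
  // Several beta flags share one header as a comma-separated list.
  if (opts.betas !== undefined && opts.betas.length > 0) {
    headers["anthropic-beta"] = opts.betas.join(",");
  }
  return headers;
}
```

For example, `buildHeaders({ apiKey: "my-key", betas: ["token-counting-2024-11-01"] })` produces the headers that the count_tokens endpoint expects. Header names are lowercased here because HTTP treats them case-insensitively.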

Request Body

{
model: string; // Model ID
messages: Array<{
role: "user" | "assistant";
content: string | Array<ContentBlock>;
}>;
max_tokens?: number; // Maximum number of tokens in the response
stream?: boolean; // Enable SSE streaming
thinking?: {
type: "enabled";
budget_tokens: number; // 0-2047=low, 2048-8191=medium, >=8192=high
};
temperature?: number; // 0-1, default 1.0
top_p?: number; // Nucleus sampling
top_k?: number; // Top-k sampling
stop_sequences?: string[]; // Custom sequences that stop generation
system?: string | Array<SystemBlock>; // System prompt
metadata?: Record<string, any>; // Request metadata
tools?: Array<ToolDefinition>; // Available tools
tool_choice?: ToolChoice; // Controls how the model selects a tool
}
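The budget_tokens comment defines three effort tiers. The sketch below maps a budget to its tier, which can help when logging or validating requests on the client side, and then shows an illustrative request body. `thinkingLevel` is a hypothetical helper. Rejecting negative or fractional budgets is an assumption of the sketch, not documented gateway behavior.

```typescript
type ThinkingLevel = "low" | "medium" | "high";

// Mirrors the tiers documented for thinking.budget_tokens:
// 0-2047 = low, 2048-8191 = medium, >= 8192 = high.
function thinkingLevel(budgetTokens: number): ThinkingLevel {
  // Rejecting negative or fractional budgets is an assumption of this sketch.
  if (!Number.isInteger(budgetTokens) || budgetTokens < 0) {
    throw new RangeError(`budget_tokens must be a non-negative integer, got ${budgetTokens}`);
  }
  if (budgetTokens < 2048) return "low";
  if (budgetTokens < 8192) return "medium";
  return "high";
}

// Illustrative request body with extended thinking enabled.
const request = {
  model: "claude-sonnet-4-20250514",
  max_tokens: 16000,
  thinking: { type: "enabled" as const, budget_tokens: 10000 },
  messages: [{ role: "user" as const, content: "Plan a three-day trip to Lisbon." }],
};

const level = thinkingLevel(request.thinking.budget_tokens); // "high"
```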

Response (non-streaming)

{
id: string; // "msg_..."
type: "message";
role: "assistant";
content: Array<ContentBlock>; // TextBlock | ToolUseBlock | ThinkingBlock
model: string;
stop_reason: "end_turn" | "max_tokens" | "stop_sequence" | "tool_use" | null;
stop_sequence: string | null;
usage: {
input_tokens: number;
output_tokens: number;
cache_creation_input_tokens?: number;
cache_read_input_tokens?: number;
server_tool_use?: {
web_search_requests?: number;
web_fetch_requests?: number;
};
};
}
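Because content is an array of typed blocks, clients usually branch on each block's type. The sketch below uses the hypothetical helpers `collectText`, `totalInputTokens`, and `wasTruncated`. It covers only the three block types named in the response comment. It also assumes, as in the Anthropic API, that input_tokens does not already include the cache_creation and cache_read counts.

```typescript
type ResponseBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: Record<string, unknown> }
  | { type: "thinking"; thinking: string; signature?: string };

interface Usage {
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens?: number;
  cache_read_input_tokens?: number;
}

interface MessageResponse {
  id: string;
  content: ResponseBlock[];
  stop_reason: "end_turn" | "max_tokens" | "stop_sequence" | "tool_use" | null;
  usage: Usage;
}

// Joins the text blocks in order, skipping thinking and tool_use blocks.
function collectText(res: MessageResponse): string {
  let out = "";
  for (const block of res.content) {
    if (block.type === "text") out += block.text;
  }
  return out;
}

// Input tokens plus cache writes and reads (assumed to be reported separately).
function totalInputTokens(u: Usage): number {
  return u.input_tokens + (u.cache_creation_input_tokens ?? 0) + (u.cache_read_input_tokens ?? 0);
}

// True when generation stopped because max_tokens was reached.
function wasTruncated(res: MessageResponse): boolean {
  return res.stop_reason === "max_tokens";
}
```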

curl example

curl -X POST http://localhost:4000/v1/messages \
-H "Content-Type: application/json" \
-H "x-api-key: $ANTHROPIC_API_KEY" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "claude-sonnet-4-20250514",
"max_tokens": 1024,
"messages": [
{"role": "user", "content": "Hello, Claude!"}
]
}'

Error Codes

| Code | Meaning |
|---|---|
| 400 | Bad request (invalid schema) |
| 401 | Unauthorized (invalid API key) |
| 403 | Forbidden (no access to the model) |
| 404 | Model not found |
| 413 | Payload too large |
| 429 | Rate limit exceeded |
| 500 | Internal server error |
| 529 | Server temporarily overloaded |
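Several of these codes describe transient conditions. The sketch below shows one possible retry policy. Treating only 429, 500, and 529 as retryable, the limit of four retries, and the capped exponential backoff are all assumptions of this example, not documented gateway behavior. `shouldRetry` and `backoffMs` are hypothetical names.

```typescript
// Codes from the table treated as transient (an assumption, not documented gateway policy).
const RETRYABLE_STATUS = new Set<number>([429, 500, 529]);

// attempt is 0-based: the number of retries already made.
function shouldRetry(status: number, attempt: number, maxRetries = 4): boolean {
  return RETRYABLE_STATUS.has(status) && attempt < maxRetries;
}

// Exponential backoff capped at capMs: 500, 1000, 2000, ... ms.
function backoffMs(attempt: number, baseMs = 500, capMs = 30000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}
```

A production client would typically add random jitter to the delay. When a response carries a retry-after header, it would wait at least that long.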

POST /v1/messages (Streaming)

POST /v1/messages stream: true

Same endpoint, with stream: true in the body. Returns an SSE (Server-Sent Events) stream containing the following events:

| Event | Data |
|---|---|
| message_start | { type: "message_start", message: Message } |
| content_block_start | { type: "content_block_start", index, content_block } |
| content_block_delta | { type: "content_block_delta", index, delta } |
| content_block_stop | { type: "content_block_stop", index } |
| message_delta | { type: "message_delta", delta, usage } |
| message_stop | { type: "message_stop" } |
| ping | { type: "ping" } |
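To rebuild the generated text from a stream, a client reads each SSE frame and concatenates the deltas by content block index. The sketch below works on a payload string that has already been received in full. It assumes standard SSE framing: event: and data: lines, with frames separated by a blank line. It also assumes, as in the Anthropic API, that text deltas look like `{ type: "text_delta", text }`, which the table above does not spell out. Each data payload repeats the event name in its type field, so the sketch ignores the event: line. `parseSse` and `accumulateText` are hypothetical helpers.

```typescript
interface StreamEvent {
  type: string;
  index?: number;
  delta?: { type?: string; text?: string };
}

// Parses a complete SSE payload into JSON events, one per frame.
function parseSse(raw: string): StreamEvent[] {
  const events: StreamEvent[] = [];
  for (const frame of raw.replace(/\r\n/g, "\n").split("\n\n")) {
    const data = frame
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).replace(/^ /, "")) // drop the optional space after "data:"
      .join("\n");
    if (data === "") continue; // empty frame, e.g. after the final blank line
    events.push(JSON.parse(data) as StreamEvent);
  }
  return events;
}

// Concatenates text_delta fragments per content block index.
function accumulateText(events: StreamEvent[]): Map<number, string> {
  const blocks = new Map<number, string>();
  for (const ev of events) {
    if (ev.type !== "content_block_delta" || ev.index === undefined) continue;
    const d = ev.delta;
    if (d && d.type === "text_delta" && d.text !== undefined) {
      blocks.set(ev.index, (blocks.get(ev.index) ?? "") + d.text);
    }
  }
  return blocks;
}
```

Over a live connection a chunk can end mid-frame. An incremental client would therefore buffer the input and parse only up to the last blank line.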

curl example (streaming)

curl -X POST http://localhost:4000/v1/messages \
-H "Content-Type: application/json" \
-H "x-api-key: $ANTHROPIC_API_KEY" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "claude-sonnet-4-20250514",
"max_tokens": 1024,
"stream": true,
"messages": [
{"role": "user", "content": "Tell me a story"}
]
}'

POST /v1/messages/count_tokens

Counts the tokens in a request without generating a response. Requires the beta header token-counting-2024-11-01.

Request Body

{
model: string;
messages: Array<{ role: "user" | "assistant"; content: string | Array<ContentBlock> }>;
system?: string | Array<SystemBlock>;
tools?: Array<ToolDefinition>;
}

Response

{
input_tokens: number;
output_tokens?: number;
}
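count_tokens takes only model, messages, system, and tools. One way to estimate a request before sending it is therefore to copy just those fields out of a full POST /v1/messages body. `toCountTokensBody` and both interfaces in the sketch are hypothetical, trimmed to the fields named in the two request schemas.

```typescript
interface CountTokensBody {
  model: string;
  messages: Array<{ role: "user" | "assistant"; content: string | unknown[] }>;
  system?: string | unknown[];
  tools?: unknown[];
}

// Subset of the POST /v1/messages body used in this sketch.
interface MessagesBody extends CountTokensBody {
  max_tokens?: number;
  stream?: boolean;
  temperature?: number;
}

// Keeps only model, messages, system and tools; drops generation-only fields.
function toCountTokensBody(req: MessagesBody): CountTokensBody {
  const body: CountTokensBody = { model: req.model, messages: req.messages };
  if (req.system !== undefined) body.system = req.system;
  if (req.tools !== undefined) body.tools = req.tools;
  return body;
}
```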

curl example

curl -X POST http://localhost:4000/v1/messages/count_tokens \
-H "Content-Type: application/json" \
-H "x-api-key: $ANTHROPIC_API_KEY" \
-H "anthropic-version: 2023-06-01" \
-H "anthropic-beta: token-counting-2024-11-01" \
-d '{
"model": "claude-sonnet-4-20250514",
"messages": [
{"role": "user", "content": "Hello"}
]
}'

POST /v1/messages/batches

Creates a batch of messages for asynchronous processing. Requires the beta header managed-agents-2026-04-01.

Implementation: CUSTOM (requires a custom hook in LiteLLM)

Request Body

{
// NDA fields
// Similar to POST /v1/messages, but with multiple requests
}

Response

{
id: string; // "batch_..."
// status fields
}

GET /v1/messages/batches/{batch_id}

Retrieves the status of a batch.

Path Parameters

| Parameter | Type | Description |
|---|---|---|
| `batch_id` | string | Batch ID (prefix `batch_`) |

Response

{
id: string;
status: "in_progress" | "completed" | "failed" | "cancelled";
// ... other fields
}
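Clients usually poll this endpoint until the batch leaves in_progress. The sketch below assumes that completed, failed, and cancelled are all final states. `pollBatch` is a hypothetical helper. The HTTP call and the sleep function are injected, so the sketch does not depend on a particular HTTP client. The 5-second interval and 120-poll limit are arbitrary defaults.

```typescript
type BatchStatus = "in_progress" | "completed" | "failed" | "cancelled";

interface BatchInfo {
  id: string;
  status: BatchStatus;
}

// Every documented status except in_progress is treated as final.
function isTerminal(status: BatchStatus): boolean {
  return status !== "in_progress";
}

// Polls until the batch is final or maxPolls is exhausted.
async function pollBatch(
  batchId: string,
  fetchStatus: (id: string) => Promise<BatchInfo>, // e.g. wraps GET /v1/messages/batches/{batch_id}
  sleep: (ms: number) => Promise<void>,
  intervalMs = 5000,
  maxPolls = 120,
): Promise<BatchInfo> {
  for (let poll = 1; poll <= maxPolls; poll++) {
    const info = await fetchStatus(batchId);
    if (isTerminal(info.status)) return info;
    if (poll < maxPolls) await sleep(intervalMs); // no wait after the last poll
  }
  throw new Error(`Batch ${batchId} still in_progress after ${maxPolls} polls`);
}
```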

POST /v1/messages/batches/{batch_id}/cancel

Cancels a batch that is in progress.

Path Parameters

| Parameter | Type | Description |
|---|---|---|
| `batch_id` | string | Batch ID |

Response

{
id: string;
status: "cancelled";
}
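Both batch operations address the batch through the batch_id path parameter. The small sketch below builds the two paths and URL-encodes the ID. `batchPath` and `batchMethod` are hypothetical names. Rejecting IDs without the batch_ prefix, which comes from the status endpoint's parameter table, is an assumption of the sketch.

```typescript
type BatchAction = "status" | "cancel";

// Returns the path for GET .../batches/{batch_id} or POST .../batches/{batch_id}/cancel.
function batchPath(batchId: string, action: BatchAction): string {
  // Prefix check is an assumption based on the documented batch_ prefix.
  if (!batchId.startsWith("batch_")) {
    throw new Error(`Unexpected batch ID: ${batchId}`);
  }
  const base = `/v1/messages/batches/${encodeURIComponent(batchId)}`;
  return action === "cancel" ? `${base}/cancel` : base;
}

// HTTP method for each action, per the endpoint definitions.
const batchMethod: Record<BatchAction, "GET" | "POST"> = {
  status: "GET",
  cancel: "POST",
};
```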

Shared Schemas

ContentBlock

type ContentBlock = TextBlock | ToolUseBlock | ToolResultBlock | ThinkingBlock;

interface TextBlock {
type: "text";
text: string;
}

interface ToolUseBlock {
type: "tool_use";
id: string; // "tu_..."
name: string;
input: Record<string, any>;
}

interface ToolResultBlock {
type: "tool_result";
tool_use_id: string;
content: string | Array<ContentBlock>;
is_error?: boolean;
}

interface ThinkingBlock {
type: "thinking";
thinking: string;
signature?: string;
}
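When a response stops with stop_reason "tool_use", the client runs each requested tool. It then replies in a user message with one tool_result block per tool_use id. The sketch below shows that step. The `handlers` map and the synchronous `ToolHandler` signature are assumptions of this example, as is reporting unknown tools or thrown errors through is_error. The sketch also restricts tool_result content to a string.

```typescript
interface ToolUseBlock {
  type: "tool_use";
  id: string;
  name: string;
  input: Record<string, unknown>;
}

interface ToolResultBlock {
  type: "tool_result";
  tool_use_id: string;
  content: string;
  is_error?: boolean;
}

// Hypothetical handler signature: takes the tool input, returns text.
type ToolHandler = (input: Record<string, unknown>) => string;

// Answers every tool_use block of an assistant turn with a tool_result block.
function answerToolCalls(
  content: Array<{ type: string }>,
  handlers: Record<string, ToolHandler>,
): { role: "user"; content: ToolResultBlock[] } {
  const results: ToolResultBlock[] = [];
  for (const block of content) {
    if (block.type !== "tool_use") continue;
    const call = block as ToolUseBlock;
    // Own-property lookup so names like "toString" are not resolved via the prototype.
    const handler = Object.prototype.hasOwnProperty.call(handlers, call.name) ? handlers[call.name] : undefined;
    if (handler === undefined) {
      results.push({ type: "tool_result", tool_use_id: call.id, content: `Unknown tool: ${call.name}`, is_error: true });
      continue;
    }
    try {
      results.push({ type: "tool_result", tool_use_id: call.id, content: handler(call.input) });
    } catch (err) {
      // Report handler failures to the model instead of aborting the loop.
      results.push({ type: "tool_result", tool_use_id: call.id, content: String(err), is_error: true });
    }
  }
  return { role: "user", content: results };
}
```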

SystemBlock

type SystemBlock = {
type: "text";
text: string;
} | {
type: "document";
source: { type: "text"; media_type: "text/plain"; data: string; };
title?: string;
context?: string;
citations?: { enabled: boolean };
};
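To illustrate the two SystemBlock variants, the sketch below builds a system array with one instruction and one plain-text document that has citations enabled. `textDocument` is a hypothetical helper, and the sample strings are invented.

```typescript
type SystemBlock =
  | { type: "text"; text: string }
  | {
      type: "document";
      source: { type: "text"; media_type: "text/plain"; data: string };
      title?: string;
      context?: string;
      citations?: { enabled: boolean };
    };

type DocumentBlock = Extract<SystemBlock, { type: "document" }>;

// Wraps plain text as a document block; title is only set when provided.
function textDocument(data: string, title?: string, citations = true): DocumentBlock {
  const doc: DocumentBlock = {
    type: "document",
    source: { type: "text", media_type: "text/plain", data },
    citations: { enabled: citations },
  };
  if (title !== undefined) doc.title = title;
  return doc;
}

// A system prompt mixing an instruction and a reference document (sample text).
const system: SystemBlock[] = [
  { type: "text", text: "Answer using only the attached policy." },
  textDocument("Refunds are accepted within 30 days of purchase.", "Refund policy"),
];
```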

ToolDefinition

interface ToolDefinition {
name: string;
description?: string;
input_schema: {
type: "object";
properties?: Record<string, any>;
required?: string[];
};
type?: "custom" | "computer_20240619" | "bash_20241022" | "text_editor_20241022"
| "web_search" | "web_fetch" | "web_browser" | "mcp";
browser?: { hub_url?: string };
server_tool_use?: {
type: "web_search" | "web_fetch" | string;
};
}

interface ToolChoice {
type: "auto" | "any" | "required" | "tool";
name?: string;
disable_parallel_tool_use?: boolean;
}
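Here is a minimal sketch of a custom tool plus a tool_choice that forces the model to call it. The get_weather tool, its schema, and `forceTool` are hypothetical. The sketch assumes, as in the Anthropic API, that type "tool" must be paired with name. Setting disable_parallel_tool_use is an optional choice of the sketch, and the interfaces are trimmed to the fields used here.

```typescript
interface ToolDefinition {
  name: string;
  description?: string;
  input_schema: {
    type: "object";
    properties?: Record<string, unknown>;
    required?: string[];
  };
  type?: "custom";
}

interface ToolChoice {
  type: "auto" | "any" | "required" | "tool";
  name?: string;
  disable_parallel_tool_use?: boolean;
}

// Hypothetical tool used only for illustration.
const getWeather: ToolDefinition = {
  name: "get_weather",
  description: "Returns the current weather for a city.",
  input_schema: {
    type: "object",
    properties: { city: { type: "string", description: "City name" } },
    required: ["city"],
  },
};

// Forces a call to one named tool, which must exist in the tools list.
function forceTool(tools: ToolDefinition[], name: string): ToolChoice {
  if (!tools.some((t) => t.name === name)) {
    throw new Error(`No tool named ${name}`);
  }
  return { type: "tool", name, disable_parallel_tool_use: true };
}
```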