Prompt Caching Cuts API Input Costs by up to 90%

What is prompt caching?

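Prompt caching stores the system prompt on the API server. Subsequent requests reuse the cached copy, and the cached input tokens are billed at a 90% discount. For example, a 2,000-token system prompt sent with 100 requests/day works out to an estimated saving of $5.67/day; the exact amount depends on the model's per-token price.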
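The arithmetic behind an estimate like this can be sketched in a few lines. The function name is hypothetical, and the price of $3.00 per million input tokens is an assumed placeholder rather than a quoted rate, so the output will only match the figures in this article if the same rates are plugged in. The sketch also ignores the first (uncached) request and any cache expiry.

```python
def daily_system_prompt_cost(prompt_tokens, requests_per_day,
                             price_per_mtok, cached_discount=0.0):
    """Daily cost in dollars of sending the system prompt with every request."""
    tokens_per_day = prompt_tokens * requests_per_day
    full_price = tokens_per_day / 1_000_000 * price_per_mtok
    # cached_discount=0.90 means cached tokens cost 10% of the base price.
    return full_price * (1.0 - cached_discount)


# Assumption: placeholder price per million input tokens, not a quoted rate.
ASSUMED_PRICE_PER_MTOK = 3.00

without_cache = daily_system_prompt_cost(2_000, 100, ASSUMED_PRICE_PER_MTOK)
with_cache = daily_system_prompt_cost(2_000, 100, ASSUMED_PRICE_PER_MTOK,
                                      cached_discount=0.90)

print(f"without cache: ${without_cache:.2f}/day")
print(f"with cache:    ${with_cache:.2f}/day")
print(f"saved:         ${without_cache - with_cache:.2f}/day")
```

Keeping the price as a parameter makes it easy to rerun the estimate for a different model or provider.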

How it works

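First request: full price or slightly above (Anthropic's own rate card bills cache writes at 1.25× the base input price); the prompt is cached server-side
Subsequent requests (within 5 minutes): 90% off the cached portion; only the user message is billed at the base rate
After 5 minutes idle: the cache expires, and the next request recreates it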
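The lifecycle above can be modelled locally to see which requests would be served from cache. This is a simplified sketch, not the server's actual logic: it assumes every request (hit or miss) restarts the 5-minute idle timer, and the function name and timestamps are made up for illustration.

```python
CACHE_TTL_SECONDS = 5 * 60  # idle window described above


def classify_requests(timestamps):
    """Label each request time (in seconds) as a cache 'miss' or 'hit'.

    A request counts as a hit if it arrives less than the TTL after the
    previous request; otherwise the cache is assumed to have expired.
    """
    labels = []
    last_seen = None
    for t in sorted(timestamps):
        if last_seen is not None and t - last_seen < CACHE_TTL_SECONDS:
            labels.append("hit")
        else:
            labels.append("miss")
        last_seen = t  # every request restarts the idle timer
    return labels


# Hypothetical request times: three close together, then a long pause.
labels = classify_requests([0, 60, 200, 600, 650])
print(labels)
```

In this model the request at 600 seconds is a miss because 400 seconds have passed since the previous one, and the request at 650 seconds hits the freshly recreated cache.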

Python implementation

Python

import anthropic

client = anthropic.Anthropic(
    api_key="izzi-YOUR_KEY_HERE",
    base_url="https://api.izziapi.com/anthropic"
)

SYSTEM_PROMPT = """You are a Python code review expert..."""

def review_code(code: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=3000,
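        # Pass the system prompt as a content block so it can carry cache_control.
        system=[{
            "type": "text",
            "text": SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"}  # marks the end of the cacheable prefix
        }],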
        messages=[{"role": "user", "content": f"Review code:\n{code}"}]
    )
    
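    # Tokens served from the cache; the field may be missing or None, so default to 0.
    cached = getattr(response.usage, "cache_read_input_tokens", 0) or 0
    if cached > 0:
        print(f"💰 Cache HIT: {cached} tokens read from cache at the discounted rate")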
    
    return response.content[0].text
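To see how much of each request was actually served from cache, the usage fields can be summarized with a small helper. This is a sketch: `summarize_cache_usage` is a hypothetical name, and it assumes the endpoint returns Anthropic's `input_tokens`, `cache_creation_input_tokens`, and `cache_read_input_tokens` fields. The numbers in the stand-in object are invented for illustration.

```python
from types import SimpleNamespace


def summarize_cache_usage(usage):
    """Return cache-related token counts from a response's usage object."""
    # Each field may be absent or None, so fall back to 0.
    read = getattr(usage, "cache_read_input_tokens", 0) or 0
    written = getattr(usage, "cache_creation_input_tokens", 0) or 0
    uncached = getattr(usage, "input_tokens", 0) or 0
    total = read + written + uncached
    return {
        "read_from_cache": read,
        "written_to_cache": written,
        "uncached_input": uncached,
        "hit_ratio": read / total if total else 0.0,
    }


# Stand-in for response.usage with made-up numbers.
fake_usage = SimpleNamespace(
    input_tokens=50,
    cache_creation_input_tokens=0,
    cache_read_input_tokens=1950,
)
stats = summarize_cache_usage(fake_usage)
print(stats)
```

In real use, `summarize_cache_usage(response.usage)` could be called inside `review_code` and logged, which makes it easier to spot a cache that never warms up.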

Cost comparison

Metric	Without cache	With cache
System prompt cost/day	$2.10	$0.21
Total/month	$69.30	$27.72
Savings/month	n/a	$41.58 (60%)
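The savings row follows directly from the two monthly totals, as the short check below shows. The overall saving (60%) is lower than the 90% discount on the system prompt line, presumably because the monthly total also includes tokens that are not discounted, such as user messages and output. The function name is hypothetical.

```python
def monthly_savings(total_without_cache, total_with_cache):
    """Return (dollars saved, percent saved) for two monthly totals."""
    saved = total_without_cache - total_with_cache
    percent = saved / total_without_cache * 100
    return round(saved, 2), round(percent, 1)


# Monthly totals taken from the table above.
saved, pct = monthly_savings(69.30, 27.72)
print(f"saved ${saved:.2f}/month ({pct:.0f}%)")
```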

Best practices

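Put static content first (system prompt, examples)
Keep the cache warm: while in active use, send a request at least once every 5 minutes
The cached prompt needs at least 1,024 tokens for caching to take effect (the minimum for Claude Sonnet models; other models may differ)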
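The "static content first" rule can be illustrated with a small builder for the `system` parameter. This is a minimal sketch under stated assumptions: `build_system_blocks` is a hypothetical helper, and the example texts are placeholders. It puts the cache breakpoint on the last static block, so everything up to that point forms the cacheable prefix and any per-request text comes after it.

```python
def build_system_blocks(static_parts, dynamic_part=None):
    """Build a `system` list with static text first and one cache breakpoint."""
    blocks = [{"type": "text", "text": part} for part in static_parts]
    if blocks:
        # The breakpoint goes on the last static block: the prefix up to and
        # including this block is what gets cached.
        blocks[-1]["cache_control"] = {"type": "ephemeral"}
    if dynamic_part:
        # Text that changes per request goes after the cached prefix.
        blocks.append({"type": "text", "text": dynamic_part})
    return blocks


# Placeholder content for illustration only.
blocks = build_system_blocks(
    ["You are a Python code review expert...", "Example reviews: ..."],
    dynamic_part="Today's focus: security issues.",
)
for block in blocks:
    print(block)
```

The result can be passed as `system=blocks` to `client.messages.create`. If dynamic text were placed before the static blocks instead, the prefix would change on every request and the cache would not be reused.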