Project Brief: LLM API Benchmark Tool

1. Core Goal

Develop a command-line tool in Go to benchmark the performance of Large Language Model (LLM) APIs.

2. Key Features

  • Send configurable concurrent requests (streaming and non-streaming) to a specified LLM API endpoint.
  • Measure and report key performance metrics:
    • Request Success/Failure Rate
    • Queries Per Second (QPS)
    • Latency (min, max, average, percentiles)
    • Time To First Token (TTFT) for streaming and non-streaming (min, max, average, percentiles)
    • Token generation rate (tokens/second, average)
  • Support configurable parameters via a YAML file (API endpoint, key, model, concurrency, duration, rate limit, prompt size, streaming mode, timeouts, etc.).
  • Generate a summary report (initially console output, potentially HTML later).

3. Scope

  • Focus on HTTP-based LLM APIs (e.g., OpenAI-compatible).
  • Initial implementation targets core metrics.
  • Extensibility for different client implementations or reporting formats is desirable.