# Project Brief: LLM API Benchmark Tool
## 1. Core Goal
Develop a command-line tool in Go to benchmark the performance of Large Language Model (LLM) APIs.
## 2. Key Features
- Send configurable concurrent requests (streaming and non-streaming) to a specified LLM API endpoint.
- Measure and report key performance metrics:
  - Request success/failure rate
  - Queries per second (QPS)
  - Latency (min, max, average, percentiles)
  - Time to first token (TTFT) for streaming and non-streaming requests (min, max, average, percentiles)
  - Token generation rate (average tokens/second)
- Support configurable parameters via a YAML file (API endpoint, key, model, concurrency, duration, rate limit, prompt size, streaming mode, timeouts, etc.).
- Generate a summary report (initially console output, potentially HTML later).
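The configurable parameters above could map to a YAML file roughly like the following sketch (field names and values are illustrative only, not a fixed schema):

```yaml
# Illustrative benchmark configuration; every field name here is an example.
endpoint: https://api.example.com/v1/chat/completions
api_key: ${API_KEY}     # read from the environment rather than stored on disk
model: example-model
concurrency: 8          # number of concurrent workers
duration: 60s           # total benchmark run time
rate_limit: 20          # maximum requests per second across all workers
prompt_tokens: 256      # approximate prompt size
stream: true            # streaming vs. non-streaming mode
request_timeout: 30s
```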
## 3. Scope
- Focus on HTTP-based LLM APIs (e.g., OpenAI compatible).
- Initial implementation targets core metrics.
- Extensibility for different client implementations or reporting formats is desirable.