# Project Brief: LLM API Benchmark Tool

## 1. Core Goal

Develop a command-line tool in Go to benchmark the performance of Large Language Model (LLM) APIs.

## 2. Key Features

- Send configurable concurrent requests (streaming and non-streaming) to a specified LLM API endpoint.
- Measure and report key performance metrics:
  - Request success/failure rate
  - Queries per second (QPS)
  - Latency (min, max, average, percentiles)
  - Time to first token (TTFT) for streaming and non-streaming requests (min, max, average, percentiles)
  - Token generation rate (average tokens/second)
- Support configurable parameters via a YAML file (API endpoint, key, model, concurrency, duration, rate limit, prompt size, streaming mode, timeouts, etc.).
- Generate a summary report (initially console output, potentially HTML later).

## 3. Scope

- Focus on HTTP-based LLM APIs (e.g., OpenAI-compatible).
- Initial implementation targets the core metrics above.
- Extensibility toward different client implementations and reporting formats is desirable.
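The YAML configuration mentioned under Key Features might look like the following sketch. All field names and values here are illustrative assumptions, not a fixed schema:

```yaml
# Illustrative config sketch; field names are assumptions, not a final schema.
endpoint: "https://api.openai.com/v1/chat/completions"
api_key: "sk-..."        # better: read from an environment variable
model: "gpt-4o-mini"     # any model name the target API accepts
concurrency: 8           # number of concurrent workers
duration: "60s"          # total benchmark duration
rate_limit: 20           # max requests per second; 0 = unlimited
prompt_tokens: 512       # approximate prompt size
stream: true             # use streaming responses so TTFT can be measured
request_timeout: "120s"
```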