Progress (As of 2025-04-23 ~08:04 UTC+8)

1. What Works

Core Benchmark Execution: Runs concurrent requests for a specified duration, in either streaming or non-streaming mode (selected via config).
Configuration Loading: Reads parameters from config.yaml.
Concurrency Control: Limits active workers based on the concurrency setting.
HTTP Client: FastHTTPClient handles both request types:
- Non-streaming (Do) uses fasthttp.
- Streaming (Stream) uses net/http internally and processes SSE events.
Tokenizer Integration: Generates prompts and counts response tokens.
Basic Stats Collection:
- Records individual RequestResult (IsSuccess, Latency, TimeToFirstToken, TotalTokens) correctly for both streaming and non-streaming requests.
- Calculates and displays aggregate stats upon completion: Total Requests, Success/Fail counts, Success Rate, Avg QPS, Latency (Avg/Min/Max/Percentiles), TTFT (Avg/Min/Max/Percentiles), Avg Tokens/Second.

2. What's Left

Error Reporting: Currently only a success/failure boolean is tracked. Specific error messages still need to be captured (RequestResult.Error field) and possibly included in the report.
Token Statistics: While individual token counts are recorded, aggregate token stats (e.g., average tokens per successful request, total tokens generated) are not yet calculated or displayed in the final report.
Non-Streaming TTFT: TTFT for non-streaming requests is currently approximated as the total latency. This may need refinement or a clearer definition.
Streaming Timeout: The timeout for the entire streaming request is hardcoded in client.Stream. Consider making this configurable.
Report Generation: Final report is currently basic console output. Planning to implement HTML report generation (possibly using go-echarts).
Code Refinement: runWorker in pkg/concurrency/manager.go could be refactored for better readability and maintainability.
Testing: Comprehensive integration tests are needed, covering a range of config scenarios (especially different prompt sizes, rate limits, etc.). Existing unit tests cover the client and possibly other modules.