Progress (As of 2025-04-23 ~08:04 UTC+8)

1. What Works

  • Core Benchmark Execution: Runs concurrent requests for a specified duration; streaming or non-streaming mode is selected via config.
  • Configuration Loading: Reads parameters from config.yaml.
  • Concurrency Control: Limits active workers based on the concurrency setting.
  • HTTP Client: FastHTTPClient handles both request types:
    • Non-streaming (Do) uses fasthttp.
    • Streaming (Stream) uses net/http internally and processes SSE events.
  • Tokenizer Integration: Generates prompts and counts response tokens.
  • Basic Stats Collection:
    • Records individual RequestResult (IsSuccess, Latency, TimeToFirstToken, TotalTokens) correctly for both streaming and non-streaming requests.
    • Calculates and displays aggregate stats upon completion: Total Requests, Success/Fail counts, Success Rate, Avg QPS, Latency (Avg/Min/Max/Percentiles), TTFT (Avg/Min/Max/Percentiles), Avg Tokens/Second.

2. What's Left / Needs Refinement

  • Error Reporting: Currently only a success/failure boolean is tracked. Specific error messages need to be captured (RequestResult.Error field) and potentially reported.
  • Token Statistics: While individual token counts are recorded, aggregate token stats (e.g., average tokens per successful request, total tokens generated) are not yet calculated or displayed in the final report.
  • Non-Streaming TTFT: TTFT for non-streaming requests is currently approximated as the total latency. This may need refinement or a clearer definition.
  • Streaming Timeout: The timeout for the entire streaming request is hardcoded in client.Stream. Consider making this configurable.
  • Report Generation: Final report is currently basic console output. Planning to implement HTML report generation (possibly using go-echarts).
  • Code Refinement: pkg/concurrency/manager.go's runWorker could be refactored for better readability/maintainability.
  • Testing: Comprehensive integration tests are needed, covering various config scenarios (especially different prompt sizes, rate limits, etc.). Existing unit tests cover the client and potentially other modules.

3. Current Status

  • The main bug related to recording streaming results has been fixed.
  • The client initialization logic is corrected.
  • The tool is functional for basic benchmarking scenarios, especially streaming.
  • Ready to proceed with further feature enhancements or refinements.

4. Known Issues

  • None critical at the moment. Previous lint warnings have been addressed.