
Evaluations metrics dashboard

Metrics from evaluation suite runs are collected in reports that can be viewed in the AIP Evals metrics dashboard. Here, you can view charts and statistics or compare aggregate results from evaluation functions and/or results from individual test cases. Note that metric objectives are not supported in the dashboard view.

The aggregate metrics view in the evaluations metrics dashboard

To access the dashboard, select View metrics dashboard in the run results view on the Logic sidebar or on the Run tests tab of the evaluation suite page.

Access the metrics dashboard

For deeper analysis and debugging, you can access the LLM trace viewer. Navigate to the View tests tab and double-click a test case to open the trace viewer. Here, you can view execution information outlining how the function result was computed. If you are using a custom LLM-as-a-judge evaluator, the LLM trace viewer also includes information about the decision-making process of the LLM judge.

Navigate from the metrics dashboard to the LLM trace viewer

