Instrumentation and telemetry in models
You can emit certain types of telemetry from your models to monitor and debug inference workflows in production.
The capabilities described on this page apply to core machine learning models. For language models, see the language model adapters reference.
To learn how to view telemetry emitted by your models, see our AIP observability documentation.
Supported telemetry types
Models support custom logs and custom spans. Metrics are not user-defined; Foundry records the total execution duration for all inference calls to a model.
Every model invocation automatically creates one span covering the total execution duration of the inference call, along with one request log.
Environment dependencies
The OpenTelemetry libraries must be added explicitly to your model's environment. Add the following packages:
- opentelemetry-api and opentelemetry-sdk: Required to emit custom logs and custom spans.
- opentelemetry-instrumentation: Required to automatically instrument outbound network requests made from your model.
Logs
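As a quick sanity check, the following sketch uses Python's standard importlib.metadata module to report which of the packages listed above are missing from the current environment. The helper name missing_packages is hypothetical and is not part of any Foundry or OpenTelemetry API; the sketch assumes the packages are installed under the distribution names given above.

```python
from importlib import metadata

# Distribution names taken from the dependency list above.
REQUIRED_PACKAGES = [
    "opentelemetry-api",
    "opentelemetry-sdk",
    "opentelemetry-instrumentation",
]


def missing_packages(required):
    """Return the distributions from `required` that are not installed."""
    missing = []
    for name in required:
        try:
            # Raises PackageNotFoundError when the distribution is absent.
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_packages(REQUIRED_PACKAGES)
    if absent:
        print("Missing telemetry packages:", ", ".join(absent))
    else:
        print("All telemetry packages are installed.")
```

Running a check like this in the model's environment before publishing can surface a missing dependency earlier than a failed import at inference time.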
You can emit custom logs from your model and view them retroactively. The following example demonstrates how to emit logs from a model using the standard Python logging module.
Foundry will set up the global logger provider for the OpenTelemetry SDK, so logs written through Python's logging module are routed to Foundry automatically. If you want to use third-party libraries for logging, you must configure them to emit logs through a logger obtained from the global logger provider.
import logging

import palantir_models as pm

logger = logging.getLogger(__name__)


class ExampleModelAdapter(pm.ModelAdapter):
    ...

    def predict(self, df_in):
        # %-style arguments are formatted only if the record is actually emitted.
        logger.info("Running inference on %d rows.", len(df_in))
        return self.model.predict(df_in)
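The same logging pattern extends to failures and finer-grained levels. The sketch below is a minimal illustration that uses only the standard logging module; StubModel and predict_with_logging are hypothetical stand-ins (not part of palantir_models) so that the example runs on its own.

```python
import logging

logger = logging.getLogger(__name__)


class StubModel:
    """Hypothetical stand-in for a trained model; not part of palantir_models."""

    def predict(self, rows):
        if not rows:
            raise ValueError("no rows to score")
        return [2 * value for value in rows]


def predict_with_logging(model, rows):
    """Run inference, logging the request size and any failure."""
    logger.info("Running inference on %d rows.", len(rows))
    try:
        predictions = model.predict(rows)
    except Exception:
        # logger.exception logs at ERROR level and attaches the traceback.
        logger.exception("Inference failed for a batch of %d rows.", len(rows))
        raise
    logger.debug("Produced %d predictions.", len(predictions))
    return predictions
```

Re-raising after logging keeps the failure visible to the caller while still leaving a record with the traceback in the logs.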
Spans
You can also create custom spans in your model to track the duration of specific operations within a single inference call. The following example demonstrates how to create separate custom spans around a preprocessing step and the inference step.
Foundry will set up the global tracer provider for the OpenTelemetry SDK, and you will be able to retrieve a tracer from it. If you want to use third-party libraries for tracing, you must configure them to emit traces through a tracer obtained from the global tracer provider.
import palantir_models as pm
from opentelemetry import trace

tracer = trace.get_tracer(__name__)


class ExampleModelAdapter(pm.ModelAdapter):
    ...

    def predict(self, df_in):
        # Time preprocessing separately from inference.
        with tracer.start_as_current_span("preprocess"):
            features = self._preprocess(df_in)
        with tracer.start_as_current_span("inference"):
            return self.model.predict(features)