Troubleshooting automation performance
This guide helps you identify the root cause of unexpected automation performance issues and resolve them. It covers common performance spike patterns, a systematic diagnostic process, and immediate mitigation steps to reduce resource consumption.
:::callout{theme="info"} For proactive guidance on designing efficient automations, see Performance best practices. :::
Common performance spike patterns
Below are common patterns to help identify and diagnose performance issues with automations.
Pattern 1: Active automation on frequently updating objects
Symptoms: The automation ran many times in a short period, generating unexpected resource consumption. For example, an automation on an object type that receives a large number of updates can run many times, with each run potentially triggering downstream effects.
Root cause: An object type updates very frequently, and the automation is configured with an "on object update" condition without a time-based cap. Often this happens when an automation that was previously paused gets unpaused.
How to fix:
- Immediately pause the automation to stop further executions.
- Add a time-based condition with an appropriate interval to cap execution frequency.
- Verify that Single execution mode is enabled.
- Resume the automation and monitor execution counts closely for the next day.
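To see why a time-based cap matters, the sketch below compares run counts with and without one. It is a plain Python simulation, not the Automate API: `count_runs` and `min_interval` are hypothetical names, and the rule it applies (at most one run per interval, with intermediate updates picked up by the next run) is an assumption about how a time-based condition behaves.

```python
from datetime import datetime, timedelta
from typing import Optional


def count_runs(update_times: list[datetime],
               min_interval: Optional[timedelta] = None) -> int:
    """Count automation runs for a stream of object updates.

    Without min_interval, every update triggers a run. With it, an update
    only triggers a run if at least min_interval has passed since the last
    run; updates in between are assumed to be handled by the next run.
    """
    runs = 0
    last_run: Optional[datetime] = None
    for ts in sorted(update_times):
        if min_interval is None or last_run is None or ts - last_run >= min_interval:
            runs += 1
            last_run = ts
    return runs


# Hypothetical load: one object update every 10 seconds for an hour.
start = datetime(2024, 1, 1)
updates = [start + timedelta(seconds=10 * i) for i in range(360)]

uncapped = count_runs(updates)
capped = count_runs(updates, min_interval=timedelta(minutes=15))
print(uncapped, capped)
```

Under these assumptions, the uncapped automation runs 360 times in the hour and the capped one runs 4 times. The right interval depends on how fresh the downstream results need to be.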
Pattern 2: Chained automations snowball
Symptoms: Multiple automations are running in sequence, with execution counts growing exponentially.
Root cause: Automations form a chain where each automation edits objects, triggering the next automation in the sequence:
- Automation A edits objects
- These edits trigger Automation B, which processes each object separately
- Automation B edits more objects, triggering Automation C
- The multiplicative effect can turn initial updates into exponentially more executions
How to fix:
- Pause all downstream automations in the chain.
- Evaluate whether you can consolidate the logic into fewer automations.
- Ensure that any remaining automations use bulk processing (ObjectSet inputs and Single execution mode).
- Re-enable automations one at a time, monitoring execution counts after each.
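The multiplicative effect can be estimated with simple arithmetic. The sketch below is illustrative only: `chain_executions` and its `fan_out` parameter are hypothetical, and the bulk case assumes each automation runs exactly once over the whole set of edited objects, which is a simplification.

```python
def chain_executions(initial_updates: int, fan_out: list[int],
                     bulk: bool = False) -> list[int]:
    """Return the execution count for each automation in a chain.

    fan_out[i] is the number of objects automation i edits for every object
    it processes. In per-object mode, each edited object triggers one run of
    the next automation. In bulk mode, each stage is assumed to run once.
    """
    executions = []
    objects = initial_updates
    for factor in fan_out:
        executions.append(1 if bulk else objects)
        objects *= factor  # edits produced by this stage feed the next one
    return executions


# Hypothetical chain A -> B -> C, each editing 10 objects per processed object.
per_object_runs = chain_executions(100, [10, 10, 10])
bulk_runs = chain_executions(100, [10, 10, 10], bulk=True)
print(per_object_runs, sum(per_object_runs))
print(bulk_runs, sum(bulk_runs))
```

With these made-up numbers, 100 initial updates become 11,100 executions across the chain in per-object mode, compared with 3 when every stage processes its inputs in bulk.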
Pattern 3: Inefficient function operations
Symptoms: Function execution time is high, and resource consumption scales poorly with object count.
Root cause: Functions contain loops that query the Ontology once per iteration instead of processing objects in bulk.
How to fix:
- Review the function code for loops with Ontology queries inside.
- Refactor to load all required objects upfront or use backend aggregations.
- Change action inputs to accept ObjectSets instead of individual objects when possible.
For comprehensive function optimization guidance, see Optimize function performance.
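As a rough illustration of the refactor described above, the sketch below contrasts a query-per-iteration loop with a single upfront load. `FakeOntology` is a hypothetical in-memory stand-in that only counts queries; it is not the Ontology API, and real function code would use the object and ObjectSet types of your functions environment.

```python
class FakeOntology:
    """In-memory stand-in for an object store that counts queries (hypothetical)."""

    def __init__(self, orders_by_customer: dict[str, list[float]]):
        self._orders = orders_by_customer
        self.query_count = 0

    def orders_for(self, customer_id: str) -> list[float]:
        self.query_count += 1  # one round trip per customer
        return self._orders.get(customer_id, [])

    def orders_for_many(self, customer_ids: list[str]) -> dict[str, list[float]]:
        self.query_count += 1  # one round trip for the whole set
        return {cid: self._orders.get(cid, []) for cid in customer_ids}


def totals_per_iteration(store: FakeOntology, customer_ids: list[str]) -> dict[str, float]:
    # Anti-pattern: one query inside every loop iteration.
    return {cid: sum(store.orders_for(cid)) for cid in customer_ids}


def totals_bulk(store: FakeOntology, customer_ids: list[str]) -> dict[str, float]:
    # Load all required objects once, then compute in memory.
    orders = store.orders_for_many(customer_ids)
    return {cid: sum(orders[cid]) for cid in customer_ids}


data = {f"c{i}": [10.0, 5.0] for i in range(500)}
ids = list(data)

slow_store = FakeOntology(data)
fast_store = FakeOntology(data)
slow = totals_per_iteration(slow_store, ids)
fast = totals_bulk(fast_store, ids)
print(slow_store.query_count, fast_store.query_count)
```

Both versions return the same totals, but the looped version issues 500 queries where the bulk version issues 1. The same shape applies when an action accepts an ObjectSet instead of being called once per object.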
Pattern 4: Function self-calling loop
Symptoms: A single function is being called many times, often recursively.
Root cause: A function edits objects, and those edits trigger the same automation that calls the function, creating a loop. Without guards in place, this can continue until manually stopped.
How to fix:
- Add a status flag or timestamp to objects to prevent re-processing.
- Add conditional logic to check whether processing is needed before making edits.
- Consider whether the logic should be moved outside of Automate entirely.
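A status-flag guard might look like the following minimal sketch. The `Ticket` type, its `status` property, and `run_until_quiet` are all hypothetical; the loop simply simulates an "on object update" automation firing again after every edit.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    """Hypothetical object type with a guard property."""
    ticket_id: str
    priority: int
    status: str = "NEW"  # guard flag: "NEW" or "PROCESSED"


def process(ticket: Ticket) -> bool:
    """Edit the ticket only if it still needs processing.

    Returns True if an edit was made. An edit is what would re-trigger the
    automation, so returning False without editing breaks the loop.
    """
    if ticket.status == "PROCESSED":
        return False  # already handled: make no edit, trigger nothing
    ticket.priority += 1
    ticket.status = "PROCESSED"
    return True


def run_until_quiet(ticket: Ticket, max_runs: int = 1000) -> int:
    """Simulate the automation re-firing after every edit; return the run count."""
    runs = 0
    while runs < max_runs:
        runs += 1
        if not process(ticket):
            break
    return runs


ticket = Ticket("T-1", priority=1)
runs = run_until_quiet(ticket)
print(runs, ticket.priority)
```

In this simulation the function runs twice: once to make the edit and once more to find nothing left to do. Without the guard it would keep editing until `max_runs`. Filtering on the flag in the automation's condition, where that is possible, would avoid even the second run.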
Diagnostic process
To investigate a performance spike, follow these steps:
- Check automation execution history: Open the automation and review the execution history. Key questions to consider:
  - How many times did it run in the last day?
  - When did the spike in executions begin?
  - Is the frequency increasing over time?
- Identify condition frequency: Examine the condition configuration and object update patterns. Key questions to consider:
  - How often are objects being updated that meet the conditions?
  - Is a time-based condition configured? If so, what is the interval?
  - Are updates happening more frequently than expected?
- Trace automation chains: Use Autopilot or Workflow Lineage to understand dependencies. Key questions to consider:
  - Which automations trigger other automations?
  - What is the full chain of effects?
  - Are there potential snowball effects where one execution multiplies?
- Review function implementation: Examine the functions being called by the automations. Key topics to investigate:
  - Do functions contain loops with Ontology queries?
  - Are bulk processing patterns being used correctly?
  - Check function execution times and external call counts.
- Look for recursive conditions: Determine if automations are triggering themselves. Key questions to consider:
  - Does the function edit objects that cause the same automation's conditions to be met again?
  - Are there status flags or guards to prevent recursive processing?
  - Does the execution history show rapid repeated calls?
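If you have the execution timestamps outside the UI, the first step's questions can be answered with a small script. The sketch below assumes you already hold the run start times as a list of `datetime` values; how you obtain them is not covered here, and `runs_per_hour`, `spike_start`, and the 5x threshold are illustrative choices rather than platform features.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Optional


def runs_per_hour(run_times: list[datetime]) -> dict[datetime, int]:
    """Bucket execution timestamps by the hour they started in.

    Hours with no runs do not appear in the result.
    """
    counts = Counter(t.replace(minute=0, second=0, microsecond=0) for t in run_times)
    return dict(sorted(counts.items()))


def spike_start(hourly: dict[datetime, int], factor: float = 5.0) -> Optional[datetime]:
    """Return the first hour whose count is at least `factor` times the
    count of the previous hour that had any runs, or None if there is none."""
    hours = list(hourly)
    for prev, cur in zip(hours, hours[1:]):
        if hourly[cur] >= factor * max(hourly[prev], 1):
            return cur
    return None


# Made-up history: 2 runs per hour from 08:00, then 40 runs starting at 11:00.
base = datetime(2024, 1, 1, 8)
history = [base + timedelta(hours=h, minutes=m) for h in range(3) for m in (0, 30)]
history += [base + timedelta(hours=3, minutes=m) for m in range(40)]

hourly = runs_per_hour(history)
print(list(hourly.values()))
print(spike_start(hourly))
```

For this sample history the hourly counts are 2, 2, 2, and 40, and the spike is reported as starting in the 11:00 hour. Comparing that hour against recent configuration changes, such as an automation being unpaused, is usually the next step.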
Immediate mitigation steps
When a performance or resource consumption spike is found, take these actions in priority order:
- Stop the bleeding
  - Immediately pause the automation that is causing the spike. This prevents further resource consumption during investigation.
- Assess impact
  - Check Resource Management to understand the total resource impact.
  - Identify any downstream automations that may also need to be paused.
  - Determine how far back the issue extends.
- Apply quick fix
  - Add a time-based condition if one is missing.
  - Change execution mode to Single execution if it is set to multiple.
  - Add conditional logic to the function to skip unnecessary processing.
- Monitor recovery
  - Resume the automation with reduced frequency or limited scope.
  - Watch execution counts closely for the next day.
  - Verify that resource consumption returns to expected levels.
Tools and resources
Below are several resources for diagnostic information.
Execution history and cost breakdowns
- Automate execution history: Shows run counts, timing, and success or failure status
- Function logs: Contains error messages and execution timing details
- Ontology Manager: Shows query costs, though this is less useful for immediate troubleshooting
- Resource Management: Provides overall cost breakdown by service and resource type
Automation workflow overview
- Autopilot: Control center for managing and monitoring automation workflows at scale
- Workflow Lineage: Visualizes automation dependencies and chains