Ontology design: Anti-patterns¶
Even experienced Ontology designers can fall into common design traps that seem reasonable initially but create significant problems as the Ontology grows. This section identifies recurring anti-patterns, explains why they occur, and provides concrete guidance for avoiding or resolving them.
Avoiding these anti-patterns will help you build an Ontology that accurately represents your business domain, reduces maintenance overhead, and enables powerful cross-functional workflows.
| Anti-pattern | Description | Solution |
|---|---|---|
| System Silos | Creating separate object types for each source system. | Merge data in pipelines; create unified object types. |
| The Kitchen Sink | Including unnecessary technical columns as properties. | Curate properties intentionally; exclude ETL metadata. |
| Department Silos | Each department creates their own version of shared entities. | Create shared object types; use properties and links for department-specific data. |
| The God Object | One object type represents multiple distinct entities. | Create distinct object types; use interfaces for shared characteristics. |
| The Golden Hammer | Relying too heavily on a single tool (action types, pipelines, or functions, for example) for every problem instead of choosing the right capability. | Match the tool to the job: batch or streaming pipelines for data processing, actions for human decisions, automations for event-driven reactions, functions for complex real-time logic. |
| Action Sprawl | Creating many single-property actions instead of cohesive business operations. | Design actions around business operations that bundle related changes into meaningful workflows. |
| The Time Machine | Modeling historical versions as separate objects or object types. | Use a single object per entity with linked history/amendment objects and time series properties. |
| The Misnomer | Using vague, generic, or misleading names for Ontology elements. | Use specific, descriptive names; qualify ambiguous properties; name links by relationship. |
Anti-pattern: System Silos¶
System Silos occur when you create separate object types for the same real-world entity based on the source system the data originates from, rather than modeling the entity itself.
Common causes¶
- Different teams own different source systems and build independently
- Uncertainty about how to merge data from multiple sources
- Desire to preserve system-specific fields without deciding what's essential
Example¶
Your organization has employee data in three systems: an HR system, a badge access system, and a project management tool. Instead of creating a single Employee object type, you create:
- HR System Employee
- Badge System Employee
- Project Management Employee
Problems¶
| Problem | Impact |
|---|---|
| Fragmented view of reality | End users cannot see a unified view of an employee; they must navigate multiple object types to understand the full picture. |
| Duplicated effort | Action types, link types, and applications must be built multiple times for what is conceptually the same entity. |
| Inconsistent data | The same employee may have conflicting information across object types with no clear source of truth. |
| Complex maintenance | Changes to business logic must be replicated across all system-specific object types. |
Solution¶
Create a single object type representing the real-world entity and use data pipelines to merge information from multiple source systems into a unified backing dataset.
✗ Avoid                          ✓ Prefer
─────────────────────────────    ─────────────────────────────
HR System Employee               Employee
Badge System Employee        →   (backed by merged dataset
Project Management Employee      from all three systems)
To implement this:
- Identify the primary key that uniquely identifies the entity across systems (for example, employee ID).
- Build a transform that joins data from all source systems.
- Define clear precedence rules for conflicting values (for example, HR system is authoritative for job title).
- Create a single object type backed by the merged dataset.
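The merge step above can be sketched in plain Python. This is a minimal illustration, not a platform API: the row dictionaries, the field names, and the `merge_employees` helper are all hypothetical, and it simplifies precedence to a single source ordering (HR wins every conflict) rather than per-field rules.

```python
from __future__ import annotations

from typing import Any

# Hypothetical extracts from the three source systems, keyed by a shared employee ID.
hr_rows = [{"employee_id": "E1", "name": "Ada Lovelace", "job_title": "Engineer"}]
badge_rows = [{"employee_id": "E1", "job_title": "Contractor", "badge_number": "B-77"}]
project_rows = [{"employee_id": "E1", "current_project": "Apollo"}]

# Sources ordered from lowest to highest precedence: later sources
# overwrite earlier ones, so the HR system wins any conflict.
SOURCES_BY_PRECEDENCE = [project_rows, badge_rows, hr_rows]


def merge_employees(sources: list[list[dict[str, Any]]]) -> dict[str, dict[str, Any]]:
    """Merge rows from every source into one record per employee ID."""
    merged: dict[str, dict[str, Any]] = {}
    for rows in sources:
        for row in rows:
            record = merged.setdefault(row["employee_id"], {})
            # Skip missing values so a blank field never erases real data.
            record.update({k: v for k, v in row.items() if v is not None})
    return merged


employees = merge_employees(SOURCES_BY_PRECEDENCE)
```

In this sketch the badge system's conflicting job title is overridden by the HR value, while fields that exist in only one system (badge number, current project) are carried through to the unified record.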
Anti-pattern: The Kitchen Sink¶
This anti-pattern (also known as "everything but the kitchen sink") occurs when object types include unnecessary columns from external systems that have no business relevance in the Ontology context, cluttering the data model with technical artifacts.
Common causes¶
- "Just in case" mentality (keeping fields that might be useful later)
- Lack of clarity on what fields are meaningful
- Direct mapping from source systems without curation
- Fear of losing data by excluding columns
Example¶
When creating a Customer object type from a CRM system integration, you include all available columns:
- customer_id ✓
- customer_name ✓
- email ✓
- _crm_extracted_at ✗
- _crm_received_at ✗
- _crm_batched_at ✗
- _crm_sequence ✗
- _crm_table_version ✗
- _crm_internal_record_id ✗
- last_etl_update_timestamp ✗
Problems¶
| Problem | Impact |
|---|---|
| Confusion | End users see irrelevant technical fields alongside business data. |
| Performance degradation | Unnecessary properties increase data scale, compute cost, and index size, and slow down searches. |
| Obscured insights | Important business properties are buried among system metadata. |
Solution¶
Curate properties intentionally. Only include columns that have clear business meaning and will be useful for workflows.
Use these guidelines when deciding which properties to include:
| Include | Exclude |
|---|---|
| Business identifiers (customer ID, order number) | Pipeline metadata |
| Human-readable attributes (name, description) | Internal system IDs with no business meaning |
| Dates relevant to business processes | Timestamps only relevant to data engineering |
| Status fields needed for filtering or actions | Audit columns for pipeline debugging |
To implement this:
- Review each column and ask: "Would someone ever need to see, search, or filter by this?"
- Keep technical metadata in the backing dataset for debugging, but do not expose it as properties.
- Use property visibility settings to hide any borderline properties that must exist but are rarely needed.
- Document why each property exists and who uses it.
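As a rough sketch of the curation step, the filter below drops technical columns before they are mapped to properties. The `_crm_` prefix rule and the explicit exclusion set are assumptions drawn from the example column list above; a real source system may need different rules.

```python
from __future__ import annotations

# Assumed convention: technical columns either start with "_crm_"
# or appear in an explicit exclusion set.
TECHNICAL_PREFIXES = ("_crm_",)
TECHNICAL_COLUMNS = {"last_etl_update_timestamp"}


def is_business_column(column: str) -> bool:
    """Return True when a column should be exposed as a property."""
    return not column.startswith(TECHNICAL_PREFIXES) and column not in TECHNICAL_COLUMNS


def curate_properties(columns: list[str]) -> list[str]:
    """Keep only business-relevant columns, preserving their order."""
    return [column for column in columns if is_business_column(column)]


source_columns = [
    "customer_id",
    "customer_name",
    "email",
    "_crm_extracted_at",
    "_crm_received_at",
    "_crm_batched_at",
    "_crm_sequence",
    "_crm_table_version",
    "_crm_internal_record_id",
    "last_etl_update_timestamp",
]
properties = curate_properties(source_columns)
```

The excluded columns stay in the backing dataset for debugging; only the curated list is surfaced as properties.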
Anti-pattern: Department Silos¶
Department Silos occur when different departments create their own versions of the same object type, leading to a fragmented Ontology that mirrors organizational structure rather than business reality.
Common causes¶
- Departments work in isolation without cross-functional coordination
- Each team believes their view of the customer is unique
- Lack of governance or central Ontology design authority
- Teams want autonomy and control over "their" data
Example¶
Multiple departments need to work with customer data, and each creates their own object type:
- Sales team creates Sales Customer
- Support team creates Support Customer
- Finance team creates Billing Customer
- Marketing team creates Marketing Contact
All four object types represent the same real-world entity: a customer.
Problems¶
| Problem | Impact |
|---|---|
| No single source of truth | Different departments have conflicting information about the same customer. |
| Impossible cross-functional workflows | Cannot easily answer questions like "Show me all interactions with this customer across sales, support, and billing". |
| Duplicated development | Each department builds redundant actions, links, and applications. |
| Governance nightmare | Data quality issues multiply; fixes in one object type do not propagate to others. |
Solution¶
Create shared object types that serve multiple departments, using properties and links to capture department-specific information where needed.
✗ Avoid                          ✓ Prefer
─────────────────────────────    ─────────────────────────────
Sales Customer                   Customer
Support Customer             →   ├── salesStatus (property)
Billing Customer                 ├── supportTier (property)
Marketing Contact                ├── billingAccountId (property)
                                 └── Links to:
                                     ├── Sales Opportunities
                                     ├── Support Tickets
                                     └── Invoices
To implement this:
- Identify entities that exist across departmental boundaries.
- Establish a cross-functional working group to define shared object types.
- Use properties to capture department-specific attributes on shared objects.
- Use link types to connect shared objects to department-specific objects (such as Customer → Support Ticket).
- Leverage object views or curated Workshop and OSDK applications if departments need different "views" of the same underlying entity.
- Use restricted views if specific properties should only be accessible to a specific team.
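The shared-object shape can be illustrated with plain Python dataclasses. This is only a conceptual sketch: the class names, field names, and the `interaction_summary` helper are hypothetical stand-ins for object types, properties, and link types, not Ontology definitions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SupportTicket:
    ticket_id: str
    subject: str


@dataclass
class Invoice:
    invoice_id: str
    amount: float


@dataclass
class Customer:
    """One shared type; department-specific data lives in properties and links."""

    customer_id: str
    name: str
    sales_status: Optional[str] = None        # maintained by Sales
    support_tier: Optional[str] = None        # maintained by Support
    billing_account_id: Optional[str] = None  # maintained by Finance
    support_tickets: list[SupportTicket] = field(default_factory=list)
    invoices: list[Invoice] = field(default_factory=list)


def interaction_summary(customer: Customer) -> dict[str, int]:
    """Answer a cross-functional question from a single customer object."""
    return {
        "support_tickets": len(customer.support_tickets),
        "invoices": len(customer.invoices),
    }


acme = Customer("C-1", "Acme Corp", sales_status="Active", support_tier="Gold")
acme.support_tickets.append(SupportTicket("T-9", "Login issue"))
acme.invoices.append(Invoice("INV-3", 1200.0))
```

Because every department links to the same customer, a question that spans support and billing is a single traversal rather than a reconciliation across four object types.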
Anti-pattern: The God Object¶
The God Object anti-pattern occurs when a single object type is overloaded to represent multiple distinct real-world entities, resulting in a bloated, confusing, and unmaintainable object type.
Common causes¶
- Over-abstraction driven by superficial similarities ("they are all assets")
- Desire to minimize the number of object types
- Lack of clear entity definitions before building
- Scope creep as more use cases are added to an existing object type
Indicators¶
- An object type has many properties that are frequently null
- Property meanings change based on another property's value (such as type or category)
- You find yourself asking "What kind of [Object] is this?" when viewing an object
- Business rules and validations require extensive conditional logic based on object "type"
Example¶
You create an Asset object type intended to represent "anything valuable," which ends up including:
- Physical equipment (trucks, machinery)
- Software licenses
- Real estate properties
- Financial instruments
- Employees (as "human assets")
The object type has 150+ properties, most of which are null for any given object, and the meaning of properties like value, location, and status varies completely depending on what kind of "asset" the object represents.
Problems¶
| Problem | Impact |
|---|---|
| Semantic confusion | End users cannot understand what an Asset actually represents. |
| Sparse data | Most properties are null for most objects, making the data hard to interpret. |
| Impossible validation | Cannot enforce business rules because rules differ by entity type. |
| Poor search experience | Searching for Assets returns a mix of unrelated things. |
| Action type complexity | Actions must handle wildly different entity types with complex conditional logic. |
Solution¶
Create distinct object types for distinct real-world entities. Use interfaces to model shared characteristics when entities genuinely share common properties or behaviors.
✗ Avoid                          ✓ Prefer
─────────────────────────────    ─────────────────────────────
Asset                            Equipment
  - assetType                    Vehicle
  - assetSubtype                 Software License
  - value                    →   Property (Real Estate)
  - location                     Financial Instrument
  - status
  - 145 more properties...       Interface: Depreciable Asset
                                   - purchaseDate
                                   - purchaseValue
                                   - depreciationSchedule
To implement this:
- List the distinct real-world entities currently represented by the object type.
- Create separate object types for each distinct entity.
- Identify genuinely shared properties and behaviors.
- Use interfaces to model shared characteristics across object types.
- Migrate existing objects to appropriate new object types.
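To make the interface idea concrete, the sketch below uses a Python `Protocol` as an analogy for an Ontology interface: distinct types keep their own fields, and shared logic is written once against the common shape. The class names, the `useful_life_years` field (a simplified stand-in for a depreciation schedule), and the straight-line formula are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Protocol


class DepreciableAsset(Protocol):
    """Shared shape; any type with these attributes satisfies it."""

    purchase_date: date
    purchase_value: float
    useful_life_years: int  # simplified stand-in for a depreciation schedule


@dataclass
class Vehicle:
    vin: str
    purchase_date: date
    purchase_value: float
    useful_life_years: int


@dataclass
class SoftwareLicense:
    license_key: str
    seats: int
    purchase_date: date
    purchase_value: float
    useful_life_years: int


def book_value(asset: DepreciableAsset, as_of: date) -> float:
    """Straight-line depreciation, floored at zero."""
    years_held = (as_of - asset.purchase_date).days / 365.25
    remaining_fraction = max(0.0, 1.0 - years_held / asset.useful_life_years)
    return asset.purchase_value * remaining_fraction


truck = Vehicle("VIN123", date(2020, 1, 1), 10000.0, 10)
license_ = SoftwareLicense("KEY-1", 25, date(2020, 1, 1), 5000.0, 5)
```

`book_value` works for both types without an `assetType` switch, while type-specific properties such as `vin` or `seats` never appear as nulls on unrelated objects.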
Anti-pattern: The Golden Hammer¶
The Golden Hammer anti-pattern occurs when you rely too heavily on a single tool to solve every problem, even when other approaches are more appropriate. The name comes from the saying: "If all you have is a hammer, everything looks like a nail."
This anti-pattern shows up as overusing action types for work better suited to pipelines, building pipelines for logic that should be event-driven automations, or writing functions for calculations that are better pre-computed in a transform.
Common causes¶
- Overreliance on a tool due to familiarity and visibility within the team
- Desire to give end users "control" over when computations happen
- Lack of familiarity with the full platform (including pipelines, automations, functions, and scheduled builds)
- Thinking exclusively in one layer (Ontology-first, pipeline-first, or code-first) without considering the full toolkit
Examples¶
Overreliance on action types:
You need to calculate aggregate metrics for a dashboard showing total sales by region. Instead of using a data pipeline to pre-compute these metrics, you create an action type called Calculate Regional Sales Totals that end users must manually trigger. Results are written back to objects via the action.
Overreliance on pipelines:
An alert object is created by a pipeline when sensor readings exceed a threshold. You want to automatically assign that alert to the on-call engineer and send a notification. Instead of using an automation that reacts to the new object, you build additional pipeline logic that tries to resolve the assignee and write the assignment into the backing dataset, mixing operational workflow logic into data integration.
Overreliance on functions:
You implement a simple property derivation like fullName = firstName + lastName as a function-backed column, adding runtime overhead and a code repository to maintain, when a single pipeline concat expression would suffice.
Problems¶
| Problem | Impact |
|---|---|
| Scalability limits | Each tool has different execution limits; using the wrong one hits ceilings early. |
| Unnecessary complexity | Maintaining logic in the wrong layer increases the number of moving parts. |
| User burden | End users must perform steps that the platform could handle automatically. |
| Performance issues | Real-time calculations via actions or functions are slower than pre-computed pipeline results. Conversely, scheduled pipelines are too slow for event-driven reactions. |
| Difficult debugging | When logic lives in the wrong layer, failures are harder to diagnose and resolve. |
Solution¶
Choose the right tool for the job based on your use case:
| Tool | Best for | Not ideal for |
|---|---|---|
| Action types | Human decisions, user-initiated edits to one or a few objects, input-driven changes that should apply immediately. | Batch calculations, scheduled updates, event-driven reactions with no human involvement. |
| Pipelines (batch) | Batch data processing, aggregations, cleansing, enrichment, pre-computing derived values on a schedule or on data arrival. | Real-time reactions to individual object changes, logic that requires human input. |
| Pipelines (streaming) | Continuous, low-latency data processing where results must stay current as source data arrives (real-time dashboards, live status tracking, continuous enrichment). | Infrequent updates where batch is sufficient, logic that requires human input, reacting to Ontology-level events (use automations). |
| Automations | Event-driven reactions to Ontology changes (object created, property updated, schedule triggered), orchestrating actions or notifications without user involvement. | Heavy data processing, complex multi-dataset joins, logic that requires human judgment. |
| Functions | Complex real-time computations across multiple objects, validation logic, derived values that depend on live Ontology state and cannot be pre-computed. | Simple derivations computable in a pipeline, batch processing of large datasets. |
| Schedules | Recurring pipeline builds, time-based or event-based orchestration of data refresh. | Reacting to individual object-level changes in real time. |
Examples of applying this guidance:
| ✗ Avoid | ✓ Prefer |
|---|---|
| Action: "Calculate Regional Sales" | Pipeline that aggregates sales data daily into a "Regional Sales Summary" object type. |
| Action: "Standardize Address Format" | Pipeline that cleanses addresses on ingestion. |
| Action: "Update Inventory Status" (based on quantity thresholds) | Pipeline that sets status based on quantity thresholds during each sync. |
| Action: "Assign Risk Score" (using a formula) | Pipeline or model that calculates risk scores and writes to the backing dataset. |
| Pipeline that assigns alerts to on-call engineers by writing to the backing dataset | Automation that triggers an "Assign Alert" action when a new "Alert" object is created. |
| Pipeline that sends a notification when an object meets a condition | Automation that monitors for the condition and sends a notification or triggers an action. |
| Batch pipeline polling every minute for new IoT sensor readings | Streaming pipeline that continuously processes sensor data as it arrives. |
| Function-backed column for fullName = firstName + " " + lastName | Pipeline that computes fullName = firstName + " " + lastName in the backing dataset. |
| Scheduled pipeline running every minute to check for objects needing follow-up | Automation that reacts to the specific object change and triggers the follow-up immediately. |
To implement this:
- Before creating an action type, ask: "Does this require human judgment or user input?" If not, it likely belongs in a pipeline or automation.
- Before adding logic to a pipeline, ask: "Is this a data transformation, or is it an operational workflow?" Data cleansing, aggregation, and enrichment belong in pipelines. Assigning work, sending notifications, and reacting to individual changes belong in automations.
- Before writing a function, ask: "Can this be pre-computed in the backing pipeline?" If the result only depends on source data columns and does not need live Ontology traversal, compute it upstream.
- Before building a polling pipeline (running every N minutes to detect changes), ask: "Can an automation react to this event directly?" Automations respond to Ontology changes in near-real-time without the overhead of scheduled builds. If the need is for continuous data processing from a source system, consider a streaming pipeline instead.
- Before defaulting to a batch pipeline, ask: "Does this data need to be continuously current?" If consumers depend on low-latency freshness, a streaming pipeline avoids the compromise of a batch schedule.
- Use automations to bridge the gap between "something changed" and "something should happen", without requiring a user to click a button or poll a pipeline.
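As one illustration of moving work out of a user-triggered action, the regional sales totals from the earlier example can be pre-computed as an ordinary batch aggregation. The sketch uses plain Python rather than any pipeline framework, and the row shape and the `summarize_regional_sales` name are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical sales rows as they might arrive from an upstream dataset.
sales = [
    {"region": "EMEA", "amount": 120.0},
    {"region": "EMEA", "amount": 80.0},
    {"region": "APAC", "amount": 50.0},
]


def summarize_regional_sales(rows: list[dict]) -> list[dict]:
    """Aggregate sales by region; each output row would back one summary object."""
    totals: dict[str, float] = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    # Sort for a stable output order across runs.
    return [{"region": region, "total_sales": total} for region, total in sorted(totals.items())]


regional_sales_summary = summarize_regional_sales(sales)
```

Run on a schedule or on data arrival, a step like this keeps the dashboard current without asking anyone to trigger a calculation.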
Anti-pattern: Action Sprawl¶
Action Sprawl occurs when you create many narrowly scoped action types that each modify a single property, rather than designing cohesive actions that represent meaningful business operations.
Common causes¶
- Thinking of actions as database column updates rather than business operations
- Building actions incrementally without considering the overall user experience
- Lack of understanding of how actions can bundle multiple property changes
- Mimicking CRUD operations from traditional application development
Indicators¶
- More than 10 action types for a single object type
- Multiple actions that are always performed in sequence
- Action names that read like Set [Property] or Update [Property]
- End users complaining about too many steps to complete a task
Example¶
For an Employee object type, instead of creating meaningful business actions, you create:
- Update Employee First Name
- Update Employee Last Name
- Update Employee Email
- Update Employee Phone
- Update Employee Department
- Update Employee Manager
- ...and 20 more single-property actions
Problems¶
| Problem | Impact |
|---|---|
| Overwhelming experience | End users face a long, cluttered list of actions and struggle to find the right one. |
| Fragmented workflows | Simple updates require multiple action submissions to complete a single business task. |
| No cohesive business representation | Actions do not map to real-world processes, making the Ontology unintuitive. |
| Fragmented audit trails | History of changes is scattered across many small actions, making it difficult to understand what happened and why. |
Solution¶
Design action types around business operations, not database updates. Create actions that bundle related changes into meaningful workflows.
✗ Avoid                                     ✓ Prefer
────────────────────────────────────────    ────────────────────────────────────────
Update Employee First Name                  Update Employee Contact Information
Update Employee Last Name               →     - firstName
Update Employee Email                         - lastName
Update Employee Phone                         - email
                                              - phone

Update Employee Department                  Transfer Employee to New Department
Update Employee Manager                 →     - newDepartment
Update Employee Location                      - newManager
                                              - newLocation
                                              - effectiveDate

Create Employee Record                      Onboard New Employee
Set Employee Start Date                 →     - All required fields for a new hire
Assign Employee Badge                         - Triggers downstream workflows
Assign Employee Equipment                       (badge assignment, equipment request)
To implement this:
- Map out the real business processes that involve changing object data.
- Group related property changes into single actions that represent those processes.
- Use action parameters to allow optional fields within a cohesive action.
- Name actions after the business operation: Transfer Employee, Approve Purchase Order, Escalate Support Ticket.
- Use action rules and validation logic to enforce business constraints within the action.
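A cohesive action can be pictured as one function that validates its inputs and applies all related changes together. The sketch below models a "Transfer Employee" operation in plain Python; the `Employee` fields, the `transfer_employee` signature, and both validation rules are assumptions for illustration, not a real action type definition.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from datetime import date


@dataclass(frozen=True)
class Employee:
    employee_id: str
    department: str
    manager_id: str
    location: str


def transfer_employee(
    employee: Employee,
    new_department: str,
    new_manager_id: str,
    new_location: str,
    effective_date: date,
    today: date,
) -> Employee:
    """Apply a department transfer as one validated operation."""
    # Validation lives inside the operation, not in each caller.
    if effective_date < today:
        raise ValueError("effective_date cannot be in the past")
    if new_manager_id == employee.employee_id:
        raise ValueError("an employee cannot be their own manager")
    # All three properties change together, or not at all.
    return replace(
        employee,
        department=new_department,
        manager_id=new_manager_id,
        location=new_location,
    )


before = Employee("E1", "Sales", "E7", "London")
after = transfer_employee(
    before, "Support", "E9", "Berlin", date(2025, 3, 1), today=date(2025, 2, 1)
)
```

One submission yields one audit entry that describes a business event, instead of three unrelated property edits.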
Anti-pattern: The Time Machine¶
The Time Machine anti-pattern occurs when you model historical versions of an entity as separate objects or object types rather than using time series data, snapshots, or proper versioning strategies.
Common causes¶
- Desire to preserve a complete history of every change
- Misunderstanding of how to model temporal data in the Ontology
- Applying file-versioning mental models (v1, v2, v3) to object design
- Lack of awareness of time series properties or linked history patterns
Indicators¶
- Object type contains multiple objects representing the same real-world entity at different points in time
- Properties like version, revision, or isCurrent exist to distinguish copies
- Object counts grow proportionally with the number of changes rather than the number of entities
- End users are confused about which object to reference or link to
Example¶
To track changes to a Contract, you create:
- Contract v1, Contract v2, Contract v3 as separate objects within the same object type
- Or worse: Contract 2023, Contract 2024, Contract 2025 as separate object types for each year
Each "version" is a full copy of the contract with slightly different property values, and links to other objects (such as Vendor or Department) are duplicated across all versions.
Problems¶
| Problem | Impact |
|---|---|
| Object count explosion | Every change creates a new object, rapidly inflating the Ontology with redundant data. |
| Ambiguous current state | It is difficult to identify which version is the "current" or authoritative version. |
| Ambiguous links | Links to contracts become unclear; which version should a Vendor or Department link to? |
| Complex reporting | Reporting across time periods requires filtering and deduplication logic that is error-prone. |
Solution¶
Use a single object per entity with properties for current state. Store historical changes in a separate linked object type, enable edits history, or leverage time series properties.
✗ Avoid                                     ✓ Prefer
────────────────────────────────────────    ────────────────────────────────────────
Contract v1 (object)                        Contract (single object per contract)
Contract v2 (object)                    →     - currentValue
Contract v3 (object)                          - currentStatus
                                              - effectiveDate
OR                                            - Links to:
                                                └── Contract Amendments
Contract 2023 (object type)                         - amendmentDate
Contract 2024 (object type)                         - previousValue
Contract 2025 (object type)                         - newValue
                                                    - changeReason
To implement this:
- Use a single object per real-world entity with properties reflecting the current state.
- Create a separate linked object type (such as Contract Amendment or Contract History) to capture historical changes.
- Leverage time series properties for values that change frequently and need temporal tracking.
- Use the backing dataset or edits history to maintain full historical records for audit trails if needed.
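The single-object-plus-history shape can be sketched with two small Python classes. This is a conceptual model only: the class names, the `amend_value` and `value_as_of` helpers, and the assumption that amendments are recorded in chronological order are all illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass
class ContractAmendment:
    amendment_date: date
    previous_value: float
    new_value: float
    change_reason: str


@dataclass
class Contract:
    """One object per real-world contract; history lives in linked amendments."""

    contract_id: str
    current_value: float
    current_status: str
    amendments: list[ContractAmendment] = field(default_factory=list)

    def amend_value(self, new_value: float, amendment_date: date, reason: str) -> None:
        """Record the change as an amendment, then update the current state."""
        self.amendments.append(
            ContractAmendment(amendment_date, self.current_value, new_value, reason)
        )
        self.current_value = new_value

    def value_as_of(self, as_of: date) -> float:
        """Reconstruct a past value; assumes amendments are in chronological order."""
        if not self.amendments:
            return self.current_value
        value = self.amendments[0].previous_value
        for amendment in self.amendments:
            if amendment.amendment_date <= as_of:
                value = amendment.new_value
        return value


contract = Contract("CT-1", current_value=100.0, current_status="Active")
contract.amend_value(150.0, date(2024, 3, 1), "Scope increase")
contract.amend_value(175.0, date(2025, 1, 15), "Annual uplift")
```

Links from a Vendor or Department always point at the one `Contract`, the current state is unambiguous, and past values remain recoverable from the amendment records.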
Anti-pattern: The Misnomer¶
The Misnomer anti-pattern occurs when you use vague, generic, or misleading names for object types, properties, and link types that do not clearly communicate their meaning, leading to confusion and misinterpretation across the Ontology.
Common causes¶
- Using shorthand names that make sense to you but not to others
- Names are carried over directly from source system column names without translation
- Desire for brevity over clarity
- Lack of naming conventions or governance standards
- Assumption that context will make meaning obvious
Indicators¶
- End users frequently ask "What does this property mean?" or "What kind of [Object] is this?"
- The same name could reasonably refer to multiple different concepts
- Property names are single generic words like value, type, status, date, or name without qualification
- Link types use generic labels like "related to" without specifying the nature of the relationship
Example¶
You create the following Ontology elements with ambiguous names:
- Object type: Item (What kind of item? Product? Line item? Inventory item?)
- Property: value (Monetary value? Quantity? Score? Rating?)
- Property: type (Type of what? What are valid values?)
- Property: date (Created date? Modified date? Due date? Effective date?)
- Link type: Item → Related Item (How are they related? Parent-child? Substitute? Accessory?)
End users encountering these names must guess at their meaning or dig into documentation to understand what the data actually represents.
Problems¶
| Problem | Impact |
|---|---|
| Misinterpretation | End users cannot understand the Ontology without additional context, leading to incorrect analysis and decisions. |
| Steep learning curve | New team members must spend significant time learning what vague names actually mean. |
| Documentation dependency | Documentation becomes essential rather than supplementary, and falls out of date quickly. |
| Cross-team confusion | Different teams interpret the same vague names differently, leading to inconsistent usage. |
Solution¶
Use specific, descriptive names for all Ontology elements. Names should be self-documenting so that anyone can understand meaning without additional context.
✗ Avoid                                     ✓ Prefer
────────────────────────────────────────    ────────────────────────────────────────
Object type: Item                       →   Object type: Product
                                            Object type: Sales Order Line Item
                                            Object type: Warehouse Inventory Record
Property: value                         →   Property: monetaryValue
                                            Property: quantityOnHand
                                            Property: riskScore
Property: type                          →   Property: productCategory
                                            Property: serviceTier
Property: date                          →   Property: orderPlacedDate
                                            Property: contractEffectiveDate
Link: Item → Related Item               →   Link: Product → Purchasing Customer
                                            Link: Employee → Supervisor
                                            Link: Equipment → Manufacturing Facility
To implement this:
- Establish naming conventions before building and enforce them through governance reviews.
- Use specific, descriptive names: Product, Sales Order Line Item, Warehouse Inventory Record.
- Qualify ambiguous properties: monetaryValue, quantityOnHand, riskScore.
- Name links explaining the relationship: Purchasing Customers, Manufacturing Facility, Supervisor.
- Add descriptions to all Ontology elements explaining their meaning and valid values.
- Review names with end users to ensure they are intuitive and unambiguous.
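A naming convention is easier to enforce when part of it can be checked mechanically. The sketch below flags property names that consist of a single generic word; the word list comes from the indicators above, while the `find_unqualified_names` helper is a hypothetical review aid rather than a platform feature.

```python
from __future__ import annotations

# Generic single-word names listed in the indicators for this anti-pattern.
GENERIC_NAMES = {"value", "type", "status", "date", "name"}


def find_unqualified_names(property_names: list[str]) -> list[str]:
    """Return property names that are a bare generic word and need a qualifier."""
    return [name for name in property_names if name.lower() in GENERIC_NAMES]


flagged = find_unqualified_names(
    ["value", "monetaryValue", "date", "orderPlacedDate", "Status"]
)
```

A check like this only catches the most obvious cases; whether a qualified name is actually clear still needs review with end users.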
Building a successful Ontology¶
The anti-patterns described in this guide are common but avoidable. By focusing on the fundamental best practices (modeling reality rather than systems, curating properties intentionally, collaborating across teams, and choosing the right tools for each task), you can build an Ontology that scales with your organization's needs.
Remember that effective Ontology design is iterative. Start with clear entity definitions, involve stakeholders early, and refine your model as you learn what works. When you encounter challenges, revisit the principles in this guide to identify whether an anti-pattern may be emerging and course-correct before it becomes difficult to change.
中文翻译¶
本体论设计:反模式¶
即使是经验丰富的本体论(Ontology)设计者,也可能陷入常见的设计陷阱。这些陷阱最初看似合理,但随着本体论的扩展会引发严重问题。本节将识别反复出现的反模式,解释其成因,并提供具体的规避或解决方案。
避免这些反模式将帮助您构建一个准确反映业务领域、降低维护成本并支持强大跨职能工作流的本体论。
| 反模式 | 描述 | 解决方案 |
|---|---|---|
| 系统孤岛(System Silos) | 为每个源系统创建独立的对象类型。 | 在数据管道(Pipeline)中合并数据;创建统一的对象类型。 |
| 大杂烩(The Kitchen Sink) | 将不必要的技术列作为属性包含进来。 | 有目的地筛选属性;排除ETL元数据。 |
| 部门孤岛(Department Silos) | 每个部门都创建自己版本的共享实体。 | 创建共享对象类型;使用属性和链接存储部门特定数据。 |
| 上帝对象(The God Object) | 一个对象类型代表多个不同的实体。 | 创建不同的对象类型;使用接口(Interface)表示共享特征。 |
| 金锤子(The Golden Hammer) | 过度依赖单一工具(如操作类型、数据管道或函数)解决所有问题,而非选择合适的能力。 | 为任务匹配工具:批处理或流式管道用于数据处理,操作(Action)用于人工决策,自动化(Automation)用于事件驱动响应,函数(Function)用于复杂实时逻辑。 |
| 操作泛滥(Action Sprawl) | 创建大量单属性操作,而非设计内聚的业务操作。 | 围绕业务操作设计操作,将相关变更捆绑为有意义的工作流。 |
| 时间机器(The Time Machine) | 将历史版本建模为独立的对象或对象类型。 | 每个实体使用单一对象,并链接历史/修订对象及时间序列属性。 |
| 命名不当(The Misnomer) | 为本体论元素使用模糊、通用或误导性的名称。 | 使用具体、描述性的名称;限定模糊属性;按关系命名链接。 |
反模式:系统孤岛¶
当您根据数据来源的源系统而非实体本身,为同一现实世界实体创建独立的对象类型时,就会出现系统孤岛。
常见原因¶
- 不同团队拥有不同的源系统并独立构建
- 不确定如何合并来自多个源的数据
- 希望保留系统特定字段,而不决定哪些是必要的
示例¶
您的组织在三个系统中拥有员工数据:HR系统、门禁系统和项目管理工具。您没有创建单一的Employee对象类型,而是创建了:
HR System EmployeeBadge System EmployeeProject Management Employee
问题¶
| 问题 | 影响 |
|---|---|
| 现实视图碎片化 | 最终用户无法看到员工的统一视图;他们必须浏览多个对象类型才能了解全貌。 |
| 重复工作 | 操作类型、链接类型和应用程序必须为概念上相同的实体多次构建。 |
| 数据不一致 | 同一员工在不同对象类型中可能有冲突信息,且没有明确的数据源。 |
| 维护复杂 | 业务逻辑的变更必须在所有系统特定对象类型中重复实施。 |
解决方案¶
创建一个代表现实世界实体的单一对象类型,并使用数据管道将来自多个源系统的信息合并到统一的后台数据集(Backing Dataset)中。
✗ 避免 ✓ 推荐
───────────────────────────── ─────────────────────────────
HR System Employee Employee
Badge System Employee → (由来自三个系统的合并数据集支持)
Project Management Employee
实施步骤:
- 识别跨系统唯一标识实体的主键(例如员工ID)。
- 构建一个转换(Transform),连接所有源系统的数据。
- 为冲突值定义明确的优先级规则(例如,HR系统对职位名称具有权威性)。
- 创建一个由合并数据集支持的单一对象类型。
反模式:大杂烩¶
这种反模式(也称为"除了厨房水槽什么都往里放")发生在对象类型包含来自外部系统的不必要列时,这些列在本体论上下文中没有业务相关性,从而用技术产物污染了数据模型。
常见原因¶
- "以防万一"的心态(保留将来可能用到的字段)
- 不清楚哪些字段有意义
- 直接映射源系统而未进行筛选
- 担心排除列会丢失数据
示例¶
从CRM系统集成创建Customer对象类型时,您包含了所有可用列:
- customer_id ✓
- customer_name ✓
- email ✓
- _crm_extracted_at ✗
- _crm_received_at ✗
- _crm_batched_at ✗
- _crm_sequence ✗
- _crm_table_version ✗
- _crm_internal_record_id ✗
- last_etl_update_timestamp ✗
问题¶
| 问题 | 影响 |
|---|---|
| 混淆 | 最终用户看到不相关的技术字段与业务数据混杂在一起。 |
| 性能下降 | 不必要的属性增加了数据规模、计算量、索引大小,并拖慢搜索速度。 |
| 洞察被掩盖 | 重要的业务属性被埋没在系统元数据中。 |
解决方案¶
有目的地筛选属性。只包含具有明确业务含义且对工作流有用的列。
使用以下指南决定包含哪些属性:
| 包含 | 排除 |
|---|---|
| 业务标识符(客户ID、订单号) | 管道元数据 |
| 人类可读属性(名称、描述) | 无业务含义的内部系统ID |
| 与业务流程相关的日期 | 仅与数据工程相关的时间戳 |
| 过滤或操作所需的状态字段 | 用于管道调试的审计列 |
实施步骤:
- 检查每一列并问:"是否有人需要查看、搜索或按此过滤?"
- 将技术元数据保留在后台数据集中用于调试,但不要将其暴露为属性。
- 使用属性可见性设置隐藏那些必须存在但很少需要的边缘属性。
- 记录每个属性存在的原因及其使用者。
反模式:部门孤岛¶
当不同部门创建同一对象类型的各自版本时,就会出现部门孤岛,导致本体论碎片化,反映的是组织结构而非业务现实。
常见原因¶
- 部门各自为政,缺乏跨职能协调
- 每个团队认为他们对客户的看法是独特的
- 缺乏治理或中心化的本体论设计权威
- 团队希望拥有对"他们的"数据的自主权和控制权
示例¶
多个部门需要使用客户数据,每个部门都创建了自己的对象类型:
- 销售团队创建
Sales Customer - 支持团队创建
Support Customer - 财务团队创建
Billing Customer - 市场团队创建
Marketing Contact
所有四个对象类型代表同一个现实世界实体:客户。
问题¶
| 问题 | 影响 |
|---|---|
| 没有单一数据源 | 不同部门对同一客户有冲突的信息。 |
| 无法实现跨职能工作流 | 无法轻松回答诸如"显示此客户在销售、支持和账单方面的所有交互"之类的问题。 |
| 重复开发 | 每个部门构建冗余的操作、链接和应用程序。 |
| 治理噩梦 | 数据质量问题成倍增加;一个对象类型的修复不会传播到其他类型。 |
解决方案¶
创建服务于多个部门的共享对象类型,在需要时使用属性和链接捕获部门特定信息。
✗ 避免 ✓ 推荐
───────────────────────────── ─────────────────────────────
Sales Customer Customer
Support Customer → ├── salesStatus (属性)
Billing Customer ├── supportTier (属性)
Marketing Contact ├── billingAccountId (属性)
└── 链接到:
├── Sales Opportunities
├── Support Tickets
└── Invoices
实施步骤:
- 识别跨部门边界存在的实体。
- 建立跨职能工作组来定义共享对象类型。
- 使用属性在共享对象上捕获部门特定属性。
- 使用链接类型将共享对象连接到部门特定对象(如
Customer→Support Ticket)。 - 如果部门需要同一底层实体的不同"视图",利用对象视图或精心设计的Workshop和OSDK应用程序。
- 如果特定属性只能由特定团队访问,使用受限视图(Restricted View)。
反模式:上帝对象¶
上帝对象反模式发生在单个对象类型被过度加载以代表多个不同的现实世界实体时,导致对象类型臃肿、混乱且难以维护。
常见原因¶
- 由表面相似性驱动的过度抽象("它们都是资产")
- 希望最小化对象类型的数量
- 在构建前缺乏清晰的实体定义
- 随着更多用例添加到现有对象类型而出现的范围蔓延
指标¶
- 对象类型有许多经常为空的属性
- 属性含义根据另一个属性的值而变化(如类型或类别)
- 查看对象时,您会问自己"这是哪种
[对象]?" - 业务规则和验证需要基于对象"类型"的大量条件逻辑
示例¶
您创建了一个旨在代表"任何有价值的东西"的Asset对象类型,最终包括:
- 物理设备(卡车、机械)
- 软件许可证
- 房地产物业
- 金融工具
- 员工(作为"人力资产")
该对象类型有150多个属性,其中大多数对任何给定对象都为空,并且像value、location和status这样的属性含义完全取决于对象代表的是哪种"资产"。
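Problems¶
| Problem | Impact |
|---|---|
| Semantic confusion | End users cannot tell what an Asset actually represents. |
| Sparse data | Most properties are null for most objects, which makes the data hard to interpret. |
| No validation | Business rules cannot be enforced because they differ by entity type. |
| Poor search experience | Searching Assets returns a mix of unrelated things. |
| Complex action types | Actions must handle very different entity types and need complicated conditional logic. |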
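Solution¶
Create distinct object types for distinct real-world entities. Where entities genuinely share properties or behavior, model the shared characteristics with an interface.
✗ Avoid                          ✓ Recommended
─────────────────────────────    ─────────────────────────────
Asset                            Equipment
  - assetType                    Vehicle
  - assetSubtype                 Software License
  - value                 →      Property (Real Estate)
  - location                     Financial Instrument
  - status
  - 145 more properties...       Interface: Depreciable Asset
                                   - purchaseDate
                                   - purchaseValue
                                   - depreciationSchedule
Implementation steps:
- List the distinct real-world entities that the object type currently represents.
- Create a separate object type for each distinct entity.
- Identify the properties and behavior that are genuinely shared.
- Use interfaces to model shared characteristics across object types.
- Migrate existing objects to the appropriate new object types.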
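The relationship between distinct object types and a shared interface can be sketched with Python's structural typing. This is an analogy, not the platform's interface mechanism: `DepreciableAsset` mirrors the interface in the diagram above, while `depreciation_years` (a simplified stand-in for depreciationSchedule), the entity fields, and the straight-line calculation are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Protocol


class DepreciableAsset(Protocol):
    """Shared characteristics only; each entity keeps its own specific fields."""
    purchase_date: date
    purchase_value: float
    depreciation_years: int  # hypothetical simplification of depreciationSchedule


@dataclass
class Vehicle:
    vin: str
    purchase_date: date
    purchase_value: float
    depreciation_years: int


@dataclass
class SoftwareLicense:
    license_key: str
    seats: int
    purchase_date: date
    purchase_value: float
    depreciation_years: int


def straight_line_book_value(asset: DepreciableAsset, as_of: date) -> float:
    """Straight-line book value for any type that satisfies the interface."""
    years_held = (as_of - asset.purchase_date).days / 365.25
    remaining = max(0.0, 1.0 - years_held / asset.depreciation_years)
    return round(asset.purchase_value * remaining, 2)


truck = Vehicle("VIN-001", date(2020, 1, 1), 10000.0, 5)
license_ = SoftwareLicense("KEY-1", 25, date(2020, 1, 1), 2000.0, 2)
for asset in (truck, license_):
    print(type(asset).__name__, straight_line_book_value(asset, date(2021, 1, 1)))
```

Logic written against the interface works across every implementing type, while type-specific fields such as `vin` or `seats` never appear as null columns on unrelated entities.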
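Anti-pattern: The Golden Hammer¶
The Golden Hammer anti-pattern occurs when you rely too heavily on a single tool to solve every problem, even when other approaches are a better fit. The name comes from the proverb: "If all you have is a hammer, everything looks like a nail."
The anti-pattern shows up as overusing action types where pipelines are a better fit, building pipelines for logic that should be an event-driven automation, or writing functions for computations that are better precomputed in a transform.
Common causes¶
- Over-reliance on one tool because of team familiarity and visibility
- Wanting end users to "control" when computations happen
- Unfamiliarity with the full platform, including pipelines, automations, functions, and scheduled builds
- Thinking in a single layer only (Ontology-first, pipeline-first, or code-first) without considering the full toolset
Example¶
Over-reliance on action types:
You need aggregate metrics for a dashboard that shows total sales by region. Instead of precomputing these metrics in a data pipeline, you create an action type named Calculate Regional Sales Totals that end users must trigger manually. Results are written back to objects through the action.
Over-reliance on pipelines:
A pipeline creates an alert object when a sensor reading exceeds a threshold. You want to assign that alert to the on-call engineer automatically and send a notification. Instead of using an automation that reacts to new objects, you build additional pipeline logic that tries to resolve the assignee and writes the assignment to the backing dataset, mixing operational workflow logic into data integration.
Over-reliance on functions:
You implement a simple property derivation such as fullName = firstName + lastName as a function-backed column, adding runtime overhead and a code repository to maintain when a single pipeline concat expression would suffice.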
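Problems¶
| Problem | Impact |
|---|---|
| Scalability limits | Each tool has different execution limits; using the wrong tool hits those ceilings early. |
| Unnecessary complexity | Maintaining logic in the wrong layer increases the number of moving parts. |
| User burden | End users must perform steps the platform could handle automatically. |
| Performance issues | Real-time computation through actions or functions is slower than precomputed pipeline results. Conversely, scheduled pipelines are too slow for event-driven responses. |
| Hard to debug | When logic lives in the wrong layer, failures are harder to diagnose and resolve. |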
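Solution¶
Choose the right tool for your use case:
| Tool | Best for | Not suited for |
|---|---|---|
| Action types | Human decisions, user-initiated edits to one or a few objects, and input-driven changes that should be available immediately. | Bulk computation, scheduled updates, and event-driven responses that need no human involvement. |
| Pipelines (batch) | Bulk data processing, aggregation, cleaning, enrichment, and precomputing derived values on a schedule or when data arrives. | Real-time responses to individual object changes, and logic that requires human input. |
| Pipelines (streaming) | Continuous, low-latency data processing where results must stay current as source data arrives (live dashboards, real-time status tracking, continuous enrichment). | Low-frequency updates where batch is sufficient, logic that requires human input, and reacting to Ontology-level events (use automations). |
| Automations | Event-driven responses to Ontology changes (object creation, property updates, scheduled triggers) that orchestrate actions or notifications without user involvement. | Heavy data processing, complex multi-dataset joins, and logic that requires human judgment. |
| Functions | Complex real-time computation across multiple objects, validation logic, and derived values that depend on live Ontology state and cannot be precomputed. | Simple derivations that can be computed in a pipeline, and batch processing of large datasets. |
| Schedules | Recurring pipeline builds and time-based or event-based orchestration of data refreshes. | Real-time responses to changes at the level of individual objects. |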
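Examples of applying this guidance:
| ✗ Avoid | ✓ Recommended |
|---|---|
| Action: "Calculate Regional Sales" | Pipeline: aggregate sales data daily into a "Regional Sales Summary" object type. |
| Action: "Standardize Address Format" | Pipeline: clean addresses at data ingestion. |
| Action: "Update Inventory Status" (based on quantity thresholds) | Pipeline: set status from quantity thresholds on every sync. |
| Action: "Assign Risk Score" (using a formula) | Pipeline or model: compute the risk score and write it to the backing dataset. |
| Pipeline: assign alerts to the on-call engineer by writing to the backing dataset | Automation: trigger an "Assign Alert" action when a new "Alert" object is created. |
| Pipeline: send notifications when objects meet a condition | Automation: monitor the condition and send notifications or trigger actions. |
| Batch pipeline: poll for new IoT sensor readings every minute | Streaming pipeline: continuously process sensor data as it arrives. |
| Function-backed column: fullName = firstName + " " + lastName | Pipeline: compute fullName = firstName + " " + lastName in the backing dataset. |
| Scheduled pipeline: run every minute to check for objects needing follow-up | Automation: react to specific object changes and trigger follow-up immediately. |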
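The first recommendation above, precomputing regional sales in a pipeline rather than behind a user-triggered action, amounts to an ordinary batch aggregation. The sketch below shows that step in plain Python; the row layout, the `summarize_sales_by_region` name, and the output fields are assumptions, and an actual pipeline would express the same logic in its own transform tooling.

```python
from collections import defaultdict


def summarize_sales_by_region(sales_rows):
    """Batch step: aggregate raw sales rows into one summary row per region."""
    totals = defaultdict(float)
    for row in sales_rows:
        totals[row["region"]] += row["amount"]
    # Each dict stands in for one "Regional Sales Summary" object.
    return [
        {"region": region, "total_sales": total}
        for region, total in sorted(totals.items())
    ]


sales_rows = [
    {"region": "EMEA", "amount": 100.0},
    {"region": "APAC", "amount": 25.0},
    {"region": "EMEA", "amount": 50.0},
]
summary = summarize_sales_by_region(sales_rows)
print(summary)
```

Because the totals are computed on a schedule, the dashboard reads ready-made values and no end user has to remember to trigger a recalculation.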
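Implementation steps:
- Before creating an action type, ask: "Does this need human judgment or user input?" If not, it most likely belongs in a pipeline or an automation.
- Before adding logic to a pipeline, ask: "Is this a data transformation or an operational workflow?" Data cleaning, aggregation, and enrichment belong in pipelines. Assigning work, sending notifications, and reacting to individual changes belong in automations.
- Before writing a function, ask: "Can this be precomputed in the backing pipeline?" If the result depends only on source data columns and needs no real-time Ontology traversal, compute it upstream.
- Before building a polling pipeline (one that runs every N minutes to detect changes), ask: "Can an automation react to this event directly?" Automations respond to Ontology changes in near real time without the overhead of scheduled builds. If you need continuous processing of data from source systems, consider a streaming pipeline.
- Before defaulting to a batch pipeline, ask: "Does this data need to stay continuously current?" If consumers depend on low-latency freshness, a streaming pipeline avoids the compromises of a batch schedule.
- Use automations to bridge the gap between "something changed" and "something should happen," without requiring a user to click a button or a pipeline to poll.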
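The questions in the steps above can be read as a small decision procedure. The sketch below encodes one possible ordering of them; the function name, the flag names, and the precedence between questions are assumptions made for illustration, and real cases often need more nuance than four booleans.

```python
def suggest_tool(
    *,
    needs_human_input=False,
    reacts_to_ontology_change=False,
    needs_live_ontology_state=False,
    needs_continuous_freshness=False,
):
    """Return a starting-point suggestion based on the checklist questions."""
    if needs_human_input:
        return "action type"
    if reacts_to_ontology_change:
        return "automation"
    if needs_live_ontology_state:
        return "function"
    if needs_continuous_freshness:
        return "streaming pipeline"
    return "batch pipeline"


# Assigning a newly created alert to the on-call engineer:
print(suggest_tool(reacts_to_ontology_change=True))
# A daily regional sales summary:
print(suggest_tool())
# Sensor data that must stay current as it arrives:
print(suggest_tool(needs_continuous_freshness=True))
```

The value of writing the checklist down this way is less the code itself than the forced ordering: human input is checked first, so automation or pipeline logic is only considered once a person is clearly not required.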
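Anti-pattern: Action Sprawl¶
Action sprawl occurs when you create many narrowly scoped action types that each modify a single property, instead of designing cohesive actions that represent meaningful business operations.
Common causes¶
- Treating actions as database column updates rather than business operations
- Building actions incrementally without considering the overall user experience
- Not knowing that an action can bundle multiple property changes
- Mimicking CRUD operations from traditional application development
Indicators¶
- A single object type has more than 10 action types
- Several actions are always executed in sequence
- Action names look like Set [property] or Update [property]
- End users complain that completing a task takes too many steps
Example¶
For the Employee object type, instead of creating meaningful business operations, you create:
- Update Employee First Name
- Update Employee Last Name
- Update Employee Email
- Update Employee Phone
- Update Employee Department
- Update Employee Manager
- ...and 20 more single-property actions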
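Problems¶
| Problem | Impact |
|---|---|
| Overwhelming experience | End users face a long, cluttered list of actions and struggle to find the right one. |
| Fragmented workflows | Simple updates require multiple action submissions to complete a single business task. |
| No cohesive business representation | Actions do not map to real-world processes, which makes the Ontology unintuitive. |
| Fragmented audit trail | Change history is scattered across many small actions, making it hard to understand what happened and why. |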
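Solution¶
Design action types around business operations rather than database updates. Create actions that bundle related changes into meaningful workflows.
✗ Avoid                                     ✓ Recommended
────────────────────────────────────────    ────────────────────────────────────────
Update Employee First Name                  Update Employee Contact Information
Update Employee Last Name            →        - firstName
Update Employee Email                         - lastName
Update Employee Phone                         - email
                                              - phone
Update Employee Department                  Transfer Employee to New Department
Update Employee Manager              →        - newDepartment
Update Employee Location                      - newManager
                                              - newLocation
                                              - effectiveDate
Create Employee Record                      Onboard New Employee
Set Employee Start Date              →        - All required fields for a new hire
Assign Employee Badge                         - Triggers downstream workflows
Assign Employee Equipment                       (badge assignment, equipment request)
Implementation steps:
- Map out the real business processes that involve changing object data.
- Group related property changes into single actions that represent those processes.
- Use action parameters to allow optional fields within a cohesive action.
- Name actions after business operations: Transfer Employee, Approve Purchase Order, Escalate Support Ticket.
- Use action rules and validation logic to enforce business constraints within the action.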
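A cohesive action can be pictured as one function that validates its inputs and then applies every related change together. The sketch below models a "Transfer Employee" operation in plain Python; it is not the platform's action API, and the `Employee` fields, the two validation rules, and the shape of the audit entry are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Employee:
    employee_id: str
    department: str
    manager_id: str
    location: str


def transfer_employee(employee, new_department, new_manager_id,
                      new_location, effective_date, today):
    """One business operation: validate, then apply all related changes together."""
    # Hypothetical business constraints enforced inside the action.
    if effective_date < today:
        raise ValueError("effective_date cannot be in the past")
    if new_manager_id == employee.employee_id:
        raise ValueError("an employee cannot be their own manager")

    previous = (employee.department, employee.manager_id, employee.location)
    employee.department = new_department
    employee.manager_id = new_manager_id
    employee.location = new_location

    # A single audit entry describes the whole transfer.
    return {
        "operation": "Transfer Employee",
        "employee_id": employee.employee_id,
        "previous": previous,
        "effective_date": effective_date,
    }


emp = Employee("E-7", "Sales", "E-2", "Berlin")
entry = transfer_employee(emp, "Support", "E-3", "Lisbon",
                          effective_date=date(2025, 3, 1), today=date(2025, 2, 1))
print(emp)
print(entry["operation"], entry["previous"])
```

Compared with three separate single-property updates, the transfer either succeeds as a whole or fails validation as a whole, and the history records one meaningful event.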
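Anti-pattern: The Time Machine¶
The Time Machine anti-pattern occurs when you model historical versions of an entity as separate objects or object types, instead of using time series data, snapshots, or an appropriate versioning strategy.
Common causes¶
- Wanting to preserve a complete history of every change
- Misunderstanding how to model temporal data in the Ontology
- Applying a file-versioning mental model (v1, v2, v3) to object design
- Not knowing about time series properties or linked-history patterns
Indicators¶
- An object type contains multiple objects that represent the same real-world entity at different points in time
- Properties such as version, revision, or isCurrent exist to distinguish the copies
- The object count grows in proportion to the number of changes rather than the number of entities
- End users are unsure which object to reference or link to
Example¶
To track changes to a Contract, you create:
- Contract v1, Contract v2, Contract v3 as separate objects in the same object type
- Or worse: Contract 2023, Contract 2024, Contract 2025 as a separate object type for each year
Each "version" is a complete copy of the contract with slightly different property values, and links to other objects (such as Vendor or Department) are duplicated across every version.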
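Problems¶
| Problem | Impact |
|---|---|
| Object count explosion | Every change creates a new object, quickly bloating the Ontology with redundant data. |
| Ambiguous current state | It is hard to identify which version is the "current" or authoritative one. |
| Ambiguous links | Links to the contract become unclear: which version should a Vendor or Department link to? |
| Complex reporting | Reporting across time periods requires filtering and deduplication logic that is error-prone. |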
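Solution¶
Use a single object per entity, with properties that reflect the current state. Store historical changes in a separate linked object type, enable edit history, or use time series properties.
✗ Avoid                                     ✓ Recommended
────────────────────────────────────────    ────────────────────────────────────────
Contract v1 (object)                        Contract (one object per contract)
Contract v2 (object)                 →        - currentValue
Contract v3 (object)                          - currentStatus
                                              - effectiveDate
-- or --                                      - Links to:
                                                  └── Contract Amendments
Contract 2023 (object type)                           - amendmentDate
Contract 2024 (object type)                           - previousValue
Contract 2025 (object type)                           - newValue
                                                      - changeReason
Implementation steps:
- Use a single object per real-world entity, with properties that reflect the current state.
- Create a separate linked object type (such as Contract Amendment or Contract History) to capture historical changes.
- Use time series properties for values that change frequently and need to be tracked over time.
- If you need an audit trail, use the backing dataset or edit history to maintain a complete record of changes.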
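The single-object-plus-linked-history pattern can be sketched as follows. The field names follow the diagram above in snake_case; the `amend` and `value_as_of` helpers are assumptions added to show that past state remains recoverable without duplicating the contract itself.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ContractAmendment:
    """One linked history object per change."""
    amendment_date: date
    previous_value: float
    new_value: float
    change_reason: str


@dataclass
class Contract:
    """Exactly one object per real-world contract; properties hold current state."""
    contract_id: str
    current_value: float
    current_status: str
    amendments: list = field(default_factory=list)

    def amend(self, new_value, change_reason, amendment_date):
        """Record the change as a linked amendment, then update current state."""
        self.amendments.append(
            ContractAmendment(amendment_date, self.current_value, new_value, change_reason)
        )
        self.current_value = new_value

    def value_as_of(self, when):
        """Roll back amendments made after `when` to recover the past value."""
        value = self.current_value
        newest_first = sorted(self.amendments, key=lambda a: a.amendment_date, reverse=True)
        for amendment in newest_first:
            if amendment.amendment_date > when:
                value = amendment.previous_value
        return value


contract = Contract("CT-1", 100.0, "active")
contract.amend(120.0, "scope increase", date(2024, 1, 1))
contract.amend(150.0, "renewal uplift", date(2025, 1, 1))
print(contract.current_value, len(contract.amendments))
print(contract.value_as_of(date(2024, 6, 1)))
```

Other objects link to the one Contract, so there is never a question of which version a Vendor or Department should point to, while the amendments preserve what changed, when, and why.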
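Anti-pattern: The Misnomer¶
The Misnomer anti-pattern occurs when you give object types, properties, and link types vague, generic, or misleading names that fail to convey their meaning, causing confusion and misinterpretation across the Ontology.
Common causes¶
- Using shorthand names that make sense to you but not to others
- Inheriting names directly from source system column names without translating them
- Favoring brevity over clarity
- Lacking naming conventions or governance standards
- Assuming that context will make the meaning obvious
Indicators¶
- End users frequently ask "what does this property mean?" or "what kind of [object] is this?"
- The same name could reasonably refer to several different concepts
- Property names are single generic words such as value, type, status, date, or name, with no qualifier
- Link types use generic labels such as "related to" without specifying the nature of the relationship
Example¶
You create the following Ontology elements with ambiguous names:
- Object type: Item (what kind of item? A product? An order line item? An inventory item?)
- Property: value (monetary value? Quantity? Score? Rating?)
- Property: type (type of what? What are the valid values?)
- Property: date (creation date? Modification date? Expiration date? Effective date?)
- Link type: Item → Related Item (how are they related? Parent and child? Substitute? Accessory?)
End users who encounter these names must guess their meaning or dig through documentation to understand what the data actually represents.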
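Problems¶
| Problem | Impact |
|---|---|
| Misinterpretation | End users cannot understand the Ontology without extra context, which leads to incorrect analysis and decisions. |
| Steep learning curve | New team members must spend significant time learning what ambiguous names actually mean. |
| Documentation dependency | Documentation becomes essential rather than supplementary, and it goes stale quickly. |
| Cross-team confusion | Different teams interpret the same ambiguous name differently, leading to inconsistent usage. |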
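Solution¶
Use specific, descriptive names for all Ontology elements. Names should be self-documenting so that anyone can understand them without extra context.
✗ Avoid                                     ✓ Recommended
────────────────────────────────────────    ────────────────────────────────────────
Object type: Item                    →      Object type: Product
                                            Object type: Sales Order Line Item
                                            Object type: Warehouse Inventory Record
Property: value                      →      Property: monetaryValue
                                            Property: quantityOnHand
                                            Property: riskScore
Property: type                       →      Property: productCategory
                                            Property: serviceTier
Property: date                       →      Property: orderPlacedDate
                                            Property: contractEffectiveDate
Link: Item → Related Item            →      Link: Product → Purchasing Customer
                                            Link: Employee → Supervisor
                                            Link: Equipment → Manufacturing Facility
Implementation steps:
- Establish naming conventions before building, and enforce them through governance reviews.
- Use specific, descriptive names: Product, Sales Order Line Item, Warehouse Inventory Record.
- Qualify ambiguous properties: monetaryValue, quantityOnHand, riskScore.
- Name links so that they explain the relationship: Purchasing Customers, Manufacturing Facility, Supervisor.
- Add descriptions to all Ontology elements that explain their meaning and valid values.
- Review names with end users to confirm that they are intuitive and unambiguous.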
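A governance review can be supported by a simple check that flags unqualified generic property names. The sketch below uses the generic words listed under the indicators for this anti-pattern; the `find_ambiguous_properties` helper is hypothetical and only catches exact single-word matches, so it complements a human review rather than replacing it.

```python
# Generic single-word names called out in the indicators for this anti-pattern.
GENERIC_NAMES = {"value", "type", "status", "date", "name"}


def find_ambiguous_properties(property_names):
    """Return the property names that are a bare generic word with no qualifier."""
    return [name for name in property_names if name.lower() in GENERIC_NAMES]


properties = [
    "value",
    "monetaryValue",
    "type",
    "productCategory",
    "orderPlacedDate",
    "date",
]
flagged = find_ambiguous_properties(properties)
print("Needs a qualifier:", flagged)
```

Qualified names such as monetaryValue or orderPlacedDate pass the check because the generic word is combined with the context that makes it unambiguous.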
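Building a successful Ontology¶
The anti-patterns described in this guide are common but avoidable. By focusing on fundamental best practices (modeling reality rather than systems, curating properties intentionally, collaborating across teams, and choosing the right tool for each task), you can build an Ontology that scales with your organization's needs.
Remember that effective Ontology design is iterative. Start with clear entity definitions, involve stakeholders early, and refine your model as you learn. When you run into challenges, revisit the principles in this guide to check whether an anti-pattern has emerged, and correct it before it becomes hard to change.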