Data Lineage questions¶

The following are some frequently asked questions about Data Lineage.

For general information, view our Data Lineage documentation.

How can I see the backing and writeback datasets for my object type in Data Lineage?
What datasets in my pipeline also have a specific column?
Who was the last person to modify a resource on this pipeline?
How can I find which of my datasets have open transactions?
Where are most of the datasets used in the pipeline stored?
How can I share my unsaved Data Lineage graph?
Why is my dataset not up-to-date?

How can I see the backing and writeback datasets for my object type in Data Lineage?¶

First, add your object to the Data Lineage graph by searching for it in the right panel (the tab with a magnifying glass icon). Select Object types to filter your search, then enter the name of the object for which you want to view the backing and writeback datasets.
Next, select the arrow on the left side of your object type to show its ancestors. This should produce one ancestor node if your object type is read-only and two ancestor nodes if it has writeback enabled. Make sure Resource overview is selected in the Node color options dropdown to see your writeback dataset colored according to the legend in the top right. Backing schema dataset colors depend on the transform type used.
Your writeback and backing datasets for an object type will also have a small globe icon in the top right.

What datasets in my pipeline also have a specific column?¶

First, ensure all desired datasets in your pipeline have been added to the Data Lineage graph.
Next, select desired datasets by using the Select mode in the Tools toggle in the upper left corner of the canvas.
Then open the Histogram of selection properties from the right side panel. Under the section titled Frequent columns, you will see the most frequent columns by column name in your selection.

Selecting one of these columns will highlight the datasets in your selection that contain this column.
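
Conceptually, the Frequent columns histogram counts how many of the selected datasets contain each column name. The sketch below illustrates that idea in plain Python. It is not how Data Lineage is implemented, and the dataset names, schemas, and helper functions are hypothetical.

```python
from collections import Counter

# Hypothetical selection: dataset name -> column names (illustrative only).
selection = {
    "raw_orders": ["order_id", "customer_id", "amount"],
    "clean_orders": ["order_id", "customer_id", "amount", "currency"],
    "customers": ["customer_id", "name"],
}


def frequent_columns(schemas):
    """Count how many datasets in the selection contain each column name."""
    counts = Counter()
    for columns in schemas.values():
        counts.update(set(columns))  # count each column once per dataset
    return counts.most_common()


def datasets_with_column(schemas, column):
    """Return the datasets that would be highlighted for a given column."""
    return sorted(name for name, columns in schemas.items() if column in columns)


print(frequent_columns(selection)[0])             # ('customer_id', 3)
print(datasets_with_column(selection, "amount"))  # ['clean_orders', 'raw_orders']
```

Here `customer_id` is the most frequent column because it appears in all three hypothetical datasets, and choosing `amount` would highlight the two order datasets.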

Who was the last person to modify a resource on this pipeline?¶

First, ensure that all datasets of interest in your pipeline have been added to the Data Lineage graph.
Next, select datasets by using Select mode from the Tools toggle in the upper left corner of your screen. Then, open the Histogram of selection properties from the right side panel.
Under the Last Modified section, you will see the last user(s) to modify datasets in your selection. Selecting a username will highlight the datasets that user last modified within the graph.

How can I find which of my datasets have open transactions?¶

In the Node color options dropdown menu in the top right, select Build status. The graph then shows whether any dataset is currently running. Any such dataset has an open transaction.

Where are most of the datasets used in the pipeline stored?¶

First, ensure that all datasets of interest in your pipeline have been added to the Data Lineage graph.
Next, select all datasets of interest with the Select mode from the Tools toggle in the upper left corner of your screen. Then, open the Histogram of selection properties from the right side panel.
Under the section titled Frequent folder paths, you will see the most common folder paths for resources in your selection.

Selecting a folder path will highlight the resources in this path on the graph. Hovering over a folder path will show you the full path.

You can select multiple properties in the Histogram of selection properties panel such that the graph highlights all resources that satisfy your selection.

How can I share my unsaved Data Lineage graph?¶

To share your unsaved Data Lineage graph, select the arrow in the top right corner near Save. You will then see a quick share link.

Why is my dataset not up-to-date?¶

Consider the following reasons why your dataset may not be up-to-date:

Is your dataset build failing?
Is there an upstream dataset that has not built and is not up-to-date?
Have you received up-to-date data from the source?

You can easily answer these questions in Data Lineage:

First, verify the status of each of the resources in your pipeline by opening up the dataset of interest in Data Lineage and then right-clicking on the node.
Then, select Expand node.... You can view all ancestor nodes for that dataset by selecting the double left arrow above Expand parents....
Next, select the Build status option in the Node color options dropdown menu in the top right to view the build status of every resource in your pipeline. This view of your pipeline will make it easier to diagnose stale datasets.
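
The three questions above amount to walking upstream from the stale dataset and checking the status of each ancestor. A minimal sketch of that reasoning in plain Python follows. The graph, dataset names, and status labels are hypothetical and do not come from any Data Lineage API.

```python
from collections import deque

# Hypothetical lineage: dataset -> its direct parents (upstream datasets).
parents = {
    "report": ["joined"],
    "joined": ["clean_orders", "customers"],
    "clean_orders": ["raw_orders"],
    "customers": [],
    "raw_orders": [],
}

# Hypothetical build statuses, as one might read them off the colored graph.
status = {
    "report": "stale",
    "joined": "stale",
    "clean_orders": "failed",
    "customers": "up-to-date",
    "raw_orders": "up-to-date",
}


def upstream_problems(dataset):
    """Breadth-first walk over all ancestors, collecting those not up-to-date."""
    seen, queue, problems = {dataset}, deque([dataset]), []
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, []):
            if parent in seen:
                continue
            seen.add(parent)
            queue.append(parent)
            if status.get(parent) != "up-to-date":
                problems.append((parent, status.get(parent)))
    return problems


print(upstream_problems("report"))
```

In this made-up pipeline the walk reports `joined` as stale and `clean_orders` as failed. Since everything upstream of `clean_orders` is up-to-date, its failed build is the likely root cause.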
