Data Lineage questions

The following are some frequently asked questions about Data Lineage.

For general information, view our Data Lineage documentation.


How can I see the backing and writeback datasets for my object type in Data Lineage?

  • First, add your object to the Data Lineage graph by searching for it in the right panel (the tab with a magnifying glass icon). Select Object types to filter your search, then enter the name of the object for which you want to view the backing and writeback datasets.

  • Next, select the arrow on the left side of your object type to show its ancestors. This should produce one ancestor node if your object type is read-only and two ancestor nodes if it has writeback enabled. Make sure Resource overview is selected in the Node color options dropdown so that your writeback dataset is colored according to the legend in the top right. Backing dataset colors depend on the transform type used.

  • Your writeback and backing datasets for an object type will also have a small globe icon in the top right.


What datasets in my pipeline also have a specific column?

  1. First, ensure all desired datasets in your pipeline have been added to the Data Lineage graph.
  2. Next, select desired datasets by using the Select mode in the Tools toggle in the upper left corner of the canvas.
  3. Then open the Histogram of selection properties from the right side panel. Under the section titled Frequent columns, you will see the most frequent columns by column name in your selection.

Selecting one of these columns will highlight the datasets in your selection that contain this column.
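
The Frequent columns view can be thought of as a count of how many selected datasets contain each column name. The following is a minimal sketch of that idea, not how Data Lineage is implemented; the dataset names, column names, and both helper functions are hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical selection: dataset name -> column names (illustrative data only).
selection = {
    "raw_orders": ["order_id", "customer_id", "amount"],
    "clean_orders": ["order_id", "customer_id", "amount", "order_date"],
    "customers": ["customer_id", "name"],
}


def frequent_columns(datasets):
    """Count how many datasets in the selection contain each column name."""
    counts = Counter()
    for columns in datasets.values():
        counts.update(set(columns))  # set() so each dataset counts once per column
    return counts


def datasets_with_column(datasets, column):
    """Return the sorted names of datasets that contain the given column."""
    return sorted(name for name, columns in datasets.items() if column in columns)


counts = frequent_columns(selection)
print(counts.most_common(3))
print(datasets_with_column(selection, "order_id"))
```

Calling datasets_with_column mirrors selecting a column in the panel: it returns only the datasets in the selection that contain that column.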


Who was the last person to modify a resource on this pipeline?

  • First, ensure that all datasets of interest in your pipeline have been added to the Data Lineage graph.
  • Next, select datasets by using Select mode from the Tools toggle in the upper left corner of your screen. Then, open the Histogram of selection properties from the right side panel.
  • Under the Last Modified section, you will see the last user(s) to modify datasets in your selection. Selecting a username will highlight the datasets that user last modified within the graph.


How can I find which of my datasets have open transactions?

In the Node color options dropdown menu in the top right, select Build status. The graph then shows whether any dataset is currently running. Any such dataset has an open transaction.


Where are most of the datasets used in the pipeline stored?

  • First, ensure that all datasets of interest in your pipeline have been added to the Data Lineage graph.
  • Next, select all datasets of interest with the Select mode from the Tools toggle in the upper left corner of your screen. Then, open the Histogram of selection properties from the right side panel.
  • Under the section titled Frequent folder paths, you will see the most common folder paths for resources in your selection.

Selecting a folder path will highlight the resources in this path on the graph. Hovering over a folder path will show you the full path.

You can select multiple properties in the Histogram of selection properties panel such that the graph highlights all resources that satisfy your selection.


How can I share my unsaved Data Lineage graph?

To share your unsaved Data Lineage graph, select the arrow in the top right corner near Save. A quick share link appears there.


Why is my dataset not up-to-date?

Your dataset may be out of date for several reasons. Consider the following questions:

  • Is your dataset build failing?
  • Is there an upstream dataset that has not built and is not up-to-date?
  • Have you received up-to-date data from the source?

You can easily answer these questions in Data Lineage:

  1. First, to verify the status of each resource in your pipeline, open the dataset of interest in Data Lineage and right-click its node.

  2. Then, select Expand node.... You can view all ancestor nodes for that dataset by selecting the double left arrow above Expand parents....

  3. Next, select the Build status option in the Node color options dropdown menu in the top right to view the build status of every resource in your pipeline. This view of your pipeline will make it easier to diagnose stale datasets.
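
Conceptually, the steps above amount to walking upstream from the dataset of interest and checking the build status of each ancestor. The sketch below illustrates that reasoning on a small in-memory graph. It does not call any Data Lineage API; the dataset names, the status strings, and both helper functions are assumptions made up for the example.

```python
from collections import deque

# Hypothetical lineage: dataset -> its direct parents (illustrative only).
parents = {
    "report": ["joined"],
    "joined": ["clean_orders", "customers"],
    "clean_orders": ["raw_orders"],
    "customers": [],
    "raw_orders": [],
}

# Hypothetical status labels standing in for the Build status node coloring.
build_status = {
    "report": "succeeded",
    "joined": "succeeded",
    "clean_orders": "failed",
    "customers": "succeeded",
    "raw_orders": "stale",
}


def ancestors(graph, node):
    """Return every upstream dataset of `node`, nearest first (breadth-first)."""
    seen, order, queue = set(), [], deque(graph.get(node, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # a dataset reachable by two paths is reported once
        seen.add(current)
        order.append(current)
        queue.extend(graph.get(current, []))
    return order


def problem_ancestors(graph, statuses, node):
    """List upstream datasets whose status is anything other than 'succeeded'."""
    return [
        (name, statuses.get(name, "unknown"))
        for name in ancestors(graph, node)
        if statuses.get(name, "unknown") != "succeeded"
    ]


print(problem_ancestors(parents, build_status, "report"))
```

In this example the report dataset itself built successfully, yet two of its ancestors did not, which is the kind of situation the Build status coloring is meant to surface at a glance.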


