Extract document metadata

Supported in: Batch, Faster

Extracts metadata fields from a document.

Expression categories: Media

Declared arguments

  • Media reference: The column containing media references to PDF files in a media set.
    Expression\
  • Metadata to include: Select the metadata fields to include in the output.
    Set<Enum<Bytes, Document author, Document title, Page count>>

Output type: Struct

Examples

Example 1: Base case

Argument values:

  • Media reference: Media Reference
  • Metadata to include: [Document Author, Page Count, Document Title]

Media Reference:
{"mimeType":"application/pdf","reference":{"type":"mediaSetItem","mediaSetItem":{"mediaSetRid":"ri.mio.test.media-set.1","mediaItemRid":"ri.mio.test.media-item.1"}}}

Output:
{
  author: Jane Doe,
  page_count: 23,
  title: Document Title,
}
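
The Media Reference value in these examples is a JSON string. As a minimal sketch of how its fields could be read outside the product, assuming the structure shown in Example 1 holds for other media set items; the function name `describe_media_reference` and its output keys are illustrative and not part of the expression itself:

```python
import json

# The media reference string exactly as it appears in the examples.
MEDIA_REFERENCE = (
    '{"mimeType":"application/pdf","reference":{"type":"mediaSetItem",'
    '"mediaSetItem":{"mediaSetRid":"ri.mio.test.media-set.1",'
    '"mediaItemRid":"ri.mio.test.media-item.1"}}}'
)


def describe_media_reference(raw: str) -> dict:
    """Pull the MIME type and the two RIDs out of a media reference string.

    Hypothetical helper: assumes the reference type is "mediaSetItem",
    which is the only type shown in the examples.
    """
    parsed = json.loads(raw)
    item = parsed["reference"]["mediaSetItem"]
    return {
        "mime_type": parsed["mimeType"],
        "media_set_rid": item["mediaSetRid"],
        "media_item_rid": item["mediaItemRid"],
    }


if __name__ == "__main__":
    print(describe_media_reference(MEDIA_REFERENCE))
```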

Example 2: Base case

Argument values:

  • Media reference: Media Reference
  • Metadata to include: [Document Title]

Media Reference:
{"mimeType":"application/pdf","reference":{"type":"mediaSetItem","mediaSetItem":{"mediaSetRid":"ri.mio.test.media-set.1","mediaItemRid":"ri.mio.test.media-item.1"}}}

Output:
{
  title: Who Framed Roger Rabbit - Final Script,
}

Example 3: Base case

Argument values:

  • Media reference: Media Reference
  • Metadata to include: [Document Author, Page Count]

Media Reference:
{"mimeType":"application/pdf","reference":{"type":"mediaSetItem","mediaSetItem":{"mediaSetRid":"ri.mio.test.media-set.1","mediaItemRid":"ri.mio.test.media-item.1"}}}

Output:
{
  author: John Smith,
  page_count: 78,
}

Example 4: Null case

Argument values:

  • Media reference: Media Reference
  • Metadata to include: []

Media Reference:
{"mimeType":"application/pdf","reference":{"type":"mediaSetItem","mediaSetItem":{"mediaSetRid":"ri.mio.test.media-set.1","mediaItemRid":"ri.mio.test.media-item.1"}}}

Output:
null
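
Taken together, the examples suggest simple selection semantics: the output struct contains one field per selected option, and an empty selection yields null. The Python sketch below models that behavior outside the product, under stated assumptions: `select_metadata` and `FIELD_KEYS` are hypothetical names, the mapping from option labels to struct keys is inferred from the example outputs, and the document's metadata is assumed to be already available as a dict.

```python
from typing import Iterable, Optional

# Option label -> output struct key, inferred from the examples (an assumption).
FIELD_KEYS = {
    "Document Author": "author",
    "Page Count": "page_count",
    "Document Title": "title",
}


def select_metadata(document_metadata: dict, include: Iterable[str]) -> Optional[dict]:
    """Return only the selected metadata fields, or None for an empty selection."""
    selected = set(include)
    if not selected:
        # Mirrors Example 4: nothing selected, so the output is null.
        return None
    # Iterate FIELD_KEYS so the key order is stable regardless of set ordering.
    return {
        key: document_metadata.get(key)
        for label, key in FIELD_KEYS.items()
        if label in selected
    }


if __name__ == "__main__":
    pdf = {"author": "John Smith", "page_count": 78, "title": "Untitled"}
    print(select_metadata(pdf, {"Document Author", "Page Count"}))
    print(select_metadata(pdf, set()))
```

The sketch returns a missing field as `None` rather than raising; how the real expression treats a PDF that lacks a requested field is not stated in this page.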

