Extract document metadata¶
Supported in: Batch, Faster
Extracts metadata fields from a document.
Expression categories: Media
Declared arguments¶
- Media reference: The column containing media references to PDF files in a media set.
  Expression
- Metadata to include: Select the metadata columns to include in the output.
  Set\<Enum\<Bytes, Document author, Document title, Page count>>
Output type: Struct
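The behavior described above can be approximated in plain Python. This is a minimal sketch, not the expression's implementation: `MetadataField`, `extract_document_metadata`, and the `pdf_metadata` dictionary are hypothetical names, the output keys `author`, `title`, and `page_count` are taken from the examples on this page, and the `bytes` key is an assumption because no example shows it.

```python
from enum import Enum
from typing import Any, Optional


class MetadataField(Enum):
    """Hypothetical enum; each value is the output key used in this sketch."""
    BYTES = "bytes"  # assumed key name; not shown in the examples
    DOCUMENT_AUTHOR = "author"
    DOCUMENT_TITLE = "title"
    PAGE_COUNT = "page_count"


def extract_document_metadata(
    document: dict[str, Any],
    include: set[MetadataField],
) -> Optional[dict[str, Any]]:
    """Return a struct-like dict holding only the requested fields.

    An empty selection yields None, mirroring the null case in the examples.
    """
    if not include:
        return None
    # Sort by key so the output order is deterministic.
    keys = sorted(field.value for field in include)
    return {key: document.get(key) for key in keys}


# Stand-in for metadata already read from a PDF (values from Example 1).
pdf_metadata = {"author": "Jane Doe", "page_count": 23, "title": "Document Title"}

selected = extract_document_metadata(
    pdf_metadata,
    {MetadataField.DOCUMENT_AUTHOR, MetadataField.PAGE_COUNT},
)
print(selected)
print(extract_document_metadata(pdf_metadata, set()))
```

Modeling the selection as a set of enum members keeps the sketch close to the declared `Set<Enum<...>>` argument type: duplicates are impossible and the order of selection does not matter.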
Examples¶
Example 1: Base case¶
Argument values:
- Media reference: Media Reference
- Metadata to include: [Document Author, Page Count, Document Title]
| Media Reference | Output |
|---|---|
| {"mimeType":"application/pdf","reference":{"type":"mediaSetItem","mediaSetItem":{"mediaSetRid":"ri.mio.test.media-set.1","mediaItemRid":"ri.mio.test.media-item.1"}}} | { author: Jane Doe, page_count: 23, title: Document Title, } |
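The media reference in the table above is a JSON object. As an illustration of its shape only, the following sketch parses the value from this example with Python's standard `json` module and reads the fields it contains. The variable names are hypothetical, and nothing here reflects how the expression itself resolves the reference.

```python
import json

# Media reference value copied from the example table.
media_reference = (
    '{"mimeType":"application/pdf","reference":{"type":"mediaSetItem",'
    '"mediaSetItem":{"mediaSetRid":"ri.mio.test.media-set.1",'
    '"mediaItemRid":"ri.mio.test.media-item.1"}}}'
)

parsed = json.loads(media_reference)
item = parsed["reference"]["mediaSetItem"]

# The argument description calls for PDF files, so check the MIME type first.
is_pdf = parsed["mimeType"] == "application/pdf"
media_set_rid = item["mediaSetRid"]
media_item_rid = item["mediaItemRid"]

print(is_pdf, media_set_rid, media_item_rid)
```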
Example 2: Base case¶
Argument values:
- Media reference: Media Reference
- Metadata to include: [Document Title]
| Media Reference | Output |
|---|---|
| {"mimeType":"application/pdf","reference":{"type":"mediaSetItem","mediaSetItem":{"mediaSetRid":"ri.mio.test.media-set.1","mediaItemRid":"ri.mio.test.media-item.1"}}} | { title: Who Framed Roger Rabbit - Final Script, } |
Example 3: Base case¶
Argument values:
- Media reference: Media Reference
- Metadata to include: [Document Author, Page Count]
| Media Reference | Output |
|---|---|
| {"mimeType":"application/pdf","reference":{"type":"mediaSetItem","mediaSetItem":{"mediaSetRid":"ri.mio.test.media-set.1","mediaItemRid":"ri.mio.test.media-item.1"}}} | { author: John Smith, page_count: 78, } |
Example 4: Null case¶
Argument values:
- Media reference: Media Reference
- Metadata to include: []
| Media Reference | Output |
|---|---|
| {"mimeType":"application/pdf","reference":{"type":"mediaSetItem","mediaSetItem":{"mediaSetRid":"ri.mio.test.media-set.1","mediaItemRid":"ri.mio.test.media-item.1"}}} | null |
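Because the output is null when no metadata fields are selected, logic that reads fields from the resulting struct may need a guard. The sketch below shows one way to do that in Python, treating the struct as a dictionary that may be `None`. The helper `read_field` is hypothetical and is not part of the expression.

```python
from typing import Any, Optional


def read_field(struct: Optional[dict[str, Any]], key: str, default: Any = None) -> Any:
    """Read one field from a struct-like dict that may be None."""
    if struct is None:
        return default
    return struct.get(key, default)


# Struct from Example 3, then the null output from Example 4.
print(read_field({"author": "John Smith", "page_count": 78}, "page_count"))
print(read_field(None, "page_count", default=0))
```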