# Media

Functions enable you to access and modify media in TypeScript v2, Python, and TypeScript v1. TypeScript v2 and Python use the `Media` type to read, upload, and transform media, and support media uploads through Ontology edits and the OSDK. TypeScript v1 functions provide a `MediaItem` type with built-in operations for working with different kinds of media without external libraries.

If you need any operations that don't currently exist out-of-the-box, you will likely need to use external libraries or write your own custom code. Learn more about adding dependencies to functions repositories.

## TypeScript v2 and Python

Use Ontology edit functions to upload media and create objects in the Ontology. Once uploaded, you can read and download media files from objects for use in your application. Learn more about media sets in Foundry.

You can construct Ontology edits in TypeScript v2 and Python functions by uploading media to the Ontology to obtain a `Media` instance. The `Media` type wraps a `MediaReference` and exposes higher-level operations for fetching contents, fetching metadata, and attaching media to objects. You can use the `Media` to construct an Ontology edit, or pass existing media into the function as a parameter.

### Use as a function input or output type

Functions can take in a `Media` as an input, create temporary media by uploading data with `uploadMedia`, or retrieve `Media` from a media reference property on an object. Functions can also return a `Media`, whether it was temporarily uploaded or came from an object's media reference property. In a function, you can fetch the byte contents of the `Media`, fetch its metadata, or attach it to an Ontology object via Ontology edits. In Python, you can also fetch the full per-variant metadata; in TypeScript v2, `fetchMetadata` currently exposes only the high-level fields (`mediaType`, `sizeBytes`, `path`).

```typescript tab="TypeScript v2"
import type { Media } from "@osdk/client";

export default async function echoMedia(media: Media): Promise<Media> {
    return media;
}
```

```python tab="Python"
from functions.api import function, Media
# The Media type may also be imported from foundry_sdk_runtime
# from foundry_sdk_runtime.media import Media

@function
def echo_media(media: Media) -> Media:
    return media
```

### Upload media

Use the Ontology SDK `uploadMedia` (TypeScript v2) and `client.ontology.media.upload_media` (Python) helpers to upload raw bytes within a function. Both return a `Media`, which you can then set on an Ontology object's media property through an Ontology edit or return from the function.

```typescript tab="TypeScript v2"
import type { Client, Media } from "@osdk/client";
import { uploadMedia } from "@osdk/functions";

export default async function uploadMediaItem(
    client: Client,
    body: string,
    fileName: string,
): Promise<Media> {
    const blob = new Blob([body], { type: "text/plain" });
    const media: Media = await uploadMedia(
        client,
        { data: blob, fileName }
    );
    return media;
}
```

```python tab="Python"
from ontology_sdk import FoundryClient
from foundry_sdk_runtime.media import Media
from functions.api import function

@function(beta=True)
def upload_media(body: str, media_set_filename: str) -> Media:
    client = FoundryClient()
    media: Media = client.ontology.media.upload_media(
        body=body.encode("utf8"),
        filename=media_set_filename,
    )
    return media
```

```python tab="Python (async)"
from ontology_sdk import FoundryClient
from foundry_sdk_runtime.media import Media
from functions.api import function

@function(beta=True)
async def upload_media(body: str, media_set_filename: str) -> Media:
    client = FoundryClient()
    media_coroutine = client.ontology.media.async_upload_media(
        body=body.encode("utf8"),
        filename=media_set_filename,
    )
    # media_coroutine is awaitable.
    return await media_coroutine
```

:::callout{theme="info"}
Uploaded media is temporary unless it is set on an Ontology object's media reference property. When the Ontology edits are applied, the media is persisted on that property.
:::

### Upload media in Ontology edit functions

Whether you uploaded media within a function or received a `Media` as an input to the function, you can update media properties on existing Ontology objects or create new Ontology objects with `Media` parameters.

```typescript tab="TypeScript v2"
// Ensure you are using TypeScript OSDK 2.16 or greater

import type { Client, Media } from "@osdk/client";
import { Aircraft } from "@ontology-sdk/sdk";
import type { Edits } from "@osdk/functions";
import { createEditBatch, uploadMedia } from "@osdk/functions";

async function uploadTextToNewPlane(client: Client): Promise<Edits.Object<Aircraft>[]> {
    const batch = createEditBatch<Edits.Object<Aircraft>>(client);
    const blob = new Blob(["Hello, world"], { type: "text/plain" });
    const media: Media = await uploadMedia(
        client,
        { data: blob, fileName: "/planes/aircraft.txt" }
    );
    batch.create(Aircraft, { myMediaProperty: media, /* ... */ });
    return batch.getEdits();
}

export default uploadTextToNewPlane;
```

```python tab="Python"
# Ensure you are using Python OSDK 2.198 or greater

from ontology_sdk import FoundryClient
from ontology_sdk.ontology.objects import Aircraft
from functions.api import function, OntologyEdit
from foundry_sdk_runtime.media import Media

@function(beta=True, edits=[Aircraft])
def upload_text_to_new_plane() -> list[OntologyEdit]:
    client = FoundryClient()
    edits = client.ontology.edits()
    media: Media = client.ontology.media.upload_media(
        body="Hello, world".encode("utf8"),
        filename="/planes/aircraft.txt",
    )
    edits.objects.Aircraft.create(
        pk="primary_key",
        my_media_property=media,
        # ...
    )
    return edits.get_edits()
```

:::callout{theme="info"}
In TypeScript OSDK generator versions before 2.20, `uploadMedia` returned a `MediaReference`. Starting in version 2.20, `uploadMedia` returns a `Media`, which wraps the underlying `MediaReference` and exposes higher-level operations such as `fetchContents`, `fetchMetadata`, and `getMediaReference()`. You can pass a `Media` directly into `createEditBatch` operations.
:::

### Passing a media reference parameter on action type

Action parameters of type media reference can be passed to the function as a parameter.

The screenshot below shows an action passing a media parameter to its backing function.

<img src="./media/media-tutorial-media-action-parameter.png" alt = "action" width="600"/>

### Media Ontology SDK operations

:::callout{theme="info"}
The methods below work on any `Media` instance, including those returned from `upload_media` and those exposed as `Media` properties on object types.
:::

#### Retrieve media bytes data

You can access the raw data stored on the `Media`. The signature for the method is as follows:

```typescript tab="TypeScript v2"
fetchContents(): Promise<Response>;

// "Response" is a standard interface on the JavaScript Fetch API
// https://developer.mozilla.org/en-US/docs/Web/API/Response
const mediaContents: Response = await myAircraft.myMediaProperty.fetchContents();

if (mediaContents.ok) {
    const mediaMimeType = mediaContents.headers.get("Content-Type");

    // Blob is a standard JavaScript type, representing a file-like object of immutable, raw data.
    // https://developer.mozilla.org/en-US/docs/Web/API/Blob
    // https://developer.mozilla.org/en-US/docs/Web/API/Response/blob
    const mediaBlob: Blob = await mediaContents.blob();
}
```

```python tab="Python"
get_media_content(self) -> BytesIO: ...

from io import BytesIO

# https://docs.python.org/3/library/io.html#io.BytesIO
raw_data: BytesIO = my_aircraft.my_media_property.get_media_content()
```
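As a small illustration of working with the returned `BytesIO`, the sketch below decodes the contents as text. `read_media_text` is a hypothetical helper, not part of the SDK, and the in-memory buffer stands in for a real `get_media_content()` result so the snippet runs without a Foundry client. It assumes the media holds UTF-8 text.

```python
from io import BytesIO


def read_media_text(raw_data: BytesIO, encoding: str = "utf-8") -> str:
    """Hypothetical helper: read a BytesIO from the start and decode it as text."""
    raw_data.seek(0)  # rewind in case the buffer was already partially read
    return raw_data.read().decode(encoding)


# Stand-in for my_aircraft.my_media_property.get_media_content()
stand_in = BytesIO("Hello, world".encode("utf8"))
print(read_media_text(stand_in))
```

For binary media such as images or PDFs, skip the decode step and work with `raw_data.read()` directly.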

### Get media metadata

You can retrieve the metadata of the `Media`:

```typescript tab="TypeScript v2"
fetchMetadata(): Promise<MediaMetadata>;

// Example usage:
const mediaMetadata = await myAircraft.myMediaProperty.fetchMetadata();
const sizeBytes = mediaMetadata.sizeBytes;
const mediaType = mediaMetadata.mediaType;
```

```python tab="Python"
from foundry_sdk_runtime.media import MediaMetadata

# Example usage:
media_metadata: MediaMetadata = my_aircraft.my_media_property.get_media_metadata()
path = media_metadata.path
size_bytes = media_metadata.size_bytes
media_type = media_metadata.media_type
```

In Python, `get_media_full_metadata()` returns a `MediaFullMetadata` whose `item_metadata` is a discriminated union over the media type. Narrow on the variant class (or check `item_metadata.type`) to access type-specific fields:

```python tab="Python"
get_media_full_metadata(self) -> MediaFullMetadata: ...

# Narrow on the variant class (or check item_metadata.type) to access type-specific fields.
# Other variants include AudioMediaItemMetadata, VideoMediaItemMetadata,
# SpreadsheetMediaItemMetadata, Model3dMediaItemMetadata, DicomMediaItemMetadata,
# EmailMediaItemMetadata, and UntypedMediaItemMetadata. See the full schema:
# https://github.com/palantir/foundry-platform-python/blob/develop/docs/v2/MediaSets/models/MediaItemMetadata.md

from foundry_sdk.v2.media_sets.models import (
    DocumentMediaItemMetadata,
    ImageryMediaItemMetadata,
)
from foundry_sdk_runtime.media import MediaFullMetadata

full_metadata: MediaFullMetadata = my_aircraft.my_media_property.get_media_full_metadata()
item = full_metadata.item_metadata

if isinstance(item, DocumentMediaItemMetadata):
    page_count = item.pages
    title = item.title
elif isinstance(item, ImageryMediaItemMetadata):
    dimensions = item.dimensions
    bands = item.bands
```

### Transform media

:::callout{theme="warning"}
Media transformations are in the beta stage of development. Functionality may change during active development.
:::

You can transform media items (for example, rotating, resizing, or re-encoding images, slicing or rendering PDF pages, or running OCR) and wait for the result. The transformation job is submitted, polled to completion, and the transformed content is returned.

In TypeScript v2, transformations are exposed through `@osdk/api/unstable` as an experimental helper. In Python, call `client.ontology.media.transform_and_wait` on a generated `FoundryClient`. The async variant `async_transform_and_wait` takes the same arguments and can be awaited.
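The submit-then-poll behavior can be pictured with a small, self-contained sketch. It does not call the Foundry SDK: `wait_for_completion`, `check_status`, and the `PENDING` and `SUCCESSFUL` status strings are hypothetical names chosen for illustration. Only the two timing parameters mirror the `poll_interval_seconds` and `poll_timeout_seconds` arguments that `transform_and_wait` accepts.

```python
import time
from typing import Callable


def wait_for_completion(
    check_status: Callable[[], str],
    poll_interval_seconds: float = 3.0,
    poll_timeout_seconds: float = 30.0,
) -> None:
    """Poll check_status until it reports SUCCESSFUL; raise on FAILED or timeout.

    Hypothetical helper: the status strings are assumptions made for this sketch.
    """
    deadline = time.monotonic() + poll_timeout_seconds
    while True:
        status = check_status()
        if status == "SUCCESSFUL":
            return
        if status == "FAILED":
            raise RuntimeError("transformation job reported FAILED")
        if time.monotonic() >= deadline:
            raise TimeoutError("poll timeout elapsed before the job completed")
        time.sleep(poll_interval_seconds)


# Stand-in for a job that finishes on the third status check.
statuses = iter(["PENDING", "PENDING", "SUCCESSFUL"])
wait_for_completion(lambda: next(statuses), poll_interval_seconds=0.01)
```

A shorter interval notices completion sooner at the cost of more status requests, while the timeout bounds how long a function can block on a slow job.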

```typescript tab="TypeScript v2"
// Ensure you are using @osdk/api 2.8.0 or greater for transformAndWait.
// "MediaTransformation" is a discriminated union:
// each variant ($image, $video, $audio, $documentToText, $documentToImage, $documentToDocument, $audioToText, etc.)
// selects a transformation kind, with its own encoding and operation fields.
// See the "MediaTransformation" type definition for a full set of variants and operations:
// https://github.com/palantir/osdk-ts/blob/main/packages/api/src/experimental/MediaTransformation.ts

import {
    __EXPERIMENTAL__NOT_SUPPORTED_YET__transformAndWait,
    type MediaTransformation,
} from "@osdk/api/unstable";
import type { Client, Media } from "@osdk/client";
import { uploadMedia } from "@osdk/functions";

export default async function rotateImage(
    client: Client,
    media: Media,
): Promise<Media> {
    const transformation: MediaTransformation = {
        $image: {
            $encoding: "jpg",
            $operations: [{ $rotate: { $angle: "DEGREE_180" } }],
        },
    };

    const result: Response = await client(
        __EXPERIMENTAL__NOT_SUPPORTED_YET__transformAndWait,
    ).transformAndWait({
        mediaReference: media.getMediaReference(),
        transformation,
        options: { pollIntervalMs: 3000, pollTimeoutMs: 30000 },
    });

    if (!result.ok) {
        // The transformation failed; inspect result.status / result.text() for details.
        throw new Error(`Transformation failed with status ${result.status}`);
    }

    // Re-upload the transformed bytes so the function returns a Media.
    return uploadMedia(client, { data: await result.blob(), fileName: "rotated.jpg" });
}
```

```python tab="Python"
from foundry_sdk.v2.media_sets.models import (
    ImageTransformation,
    JpgFormat,
    RotateImageOperation,
)
from foundry_sdk_runtime.errors import (
    MediaTransformationFailedError,
    MediaTransformationTimeoutError,
)
from foundry_sdk_runtime.media import Media
from functions.api import function
from ontology_sdk import FoundryClient

@function(beta=True)
def image_transform(document: Media) -> Media:
    client = FoundryClient()
    transformation = ImageTransformation(
        encoding=JpgFormat(),
        operations=[RotateImageOperation(angle="DEGREE_180")],
    )
    try:
        transformed_bytes: bytes = client.ontology.media.transform_and_wait(
            media_reference=document.get_media_reference(),
            transformation=transformation,
            poll_interval_seconds=3.0,
            poll_timeout_seconds=30.0,
        )
    except MediaTransformationFailedError:
        # The transformation job reported FAILED status.
        raise
    except MediaTransformationTimeoutError:
        # poll_timeout_seconds elapsed before the job completed.
        raise
    # Re-upload the transformed bytes so the function returns a Media.
    return client.ontology.media.upload_media(body=transformed_bytes, filename="rotated.jpg")
```

#### Example: Run page-by-page OCR on a PDF with bounding box output

This workflow takes a PDF (uploaded to a media set or attached to an object) and runs OCR on every page, requesting hOCR output. hOCR is HTML with `bbox` attributes on every detected word and line, so you can extract both the recognized text and its bounding box coordinates from the same response. Each `transform_and_wait` call returns the bytes for one page; iterate to cover the whole document.

```typescript tab="TypeScript v2"
import {
    __EXPERIMENTAL__NOT_SUPPORTED_YET__transformAndWait,
    type MediaTransformation,
} from "@osdk/api/unstable";
import type { Client, Media } from "@osdk/client";
import type { Integer } from "@osdk/functions";

export default async function ocrPdfPages(
    client: Client,
    media: Media,
    pageCount: Integer,
): Promise<string[]> {
    const transformAndWait = client(
        __EXPERIMENTAL__NOT_SUPPORTED_YET__transformAndWait,
    ).transformAndWait;
    const mediaReference = media.getMediaReference();

    const pageResults: string[] = [];
    for (let pageNumber = 0; pageNumber < pageCount; pageNumber++) {
        const transformation: MediaTransformation = {
            $documentToText: {
                $operation: {
                    $ocrOnPage: {
                        $pageNumber: pageNumber,
                        $parameters: {
                            $outputFormat: { $hocr: {} },
                            $languages: [{ $language: "ENG" }],
                        },
                    },
                },
            },
        };

        const result = await transformAndWait({
            mediaReference,
            transformation,
            options: { pollTimeoutMs: 120_000 },
        });
        if (!result.ok) {
            throw new Error(`OCR failed on page ${pageNumber}: ${result.status}`);
        }
        pageResults.push(await result.text());
    }
    return pageResults;
}
```

```python tab="Python"
from foundry_sdk.v2.media_sets.models import (
    DocumentMediaItemMetadata,
    DocumentToTextTransformation,
    OcrHocrOutputFormat,
    OcrLanguageWrapper,
    OcrOnPageOperation,
    OcrParameters,
)
from foundry_sdk_runtime.media import Media
from functions.api import function
from ontology_sdk import FoundryClient

@function(beta=True)
def ocr_pdf_pages(document: Media) -> list[bytes]:
    """Run OCR on every page of a PDF and return the hOCR bytes per page.

    Each hOCR document includes bbox attributes on detected words, lines, and
    paragraphs; parse with any HTML parser to recover both text and bounding
    boxes in a single pass.
    """
    client = FoundryClient()
    metadata = document.get_media_full_metadata().item_metadata
    if not isinstance(metadata, DocumentMediaItemMetadata) or metadata.pages is None:
        raise ValueError("Expected a PDF document with a known page count")

    media_reference = document.get_media_reference()
    page_results: list[bytes] = []
    for page_number in range(metadata.pages):
        transformation = DocumentToTextTransformation(
            operation=OcrOnPageOperation(
                page_number=page_number,
                parameters=OcrParameters(
                    output_format=OcrHocrOutputFormat(),
                    languages=[OcrLanguageWrapper(language="ENG")],
                ),
            ),
        )
        hocr_bytes: bytes = client.ontology.media.transform_and_wait(
            media_reference=media_reference,
            transformation=transformation,
            poll_timeout_seconds=120.0,
        )
        page_results.append(hocr_bytes)
    return page_results
```

Dense pages can push OCR runtime well past the default function timeout. See Manage published functions to configure function execution timeouts.

#### Example: Render PDF pages as images and slice ranges

For workflows that need the visual rendering of each page (for downstream image annotation, embedding, or display), use `$documentToImage` with `$renderPage` to get a PNG/JPG image of a specific page. To extract a sub-range of the PDF as its own PDF document, use `$documentToDocument` with `$slicePdfRange`. Each function below re-uploads the transformed bytes so it can return a `Media`. Each function is its own module; a registered function is the module's `export default`.

Render a single page as a PNG image:

```typescript tab="TypeScript v2"
import {
    __EXPERIMENTAL__NOT_SUPPORTED_YET__transformAndWait,
    type MediaTransformation,
} from "@osdk/api/unstable";
import type { Client, Media } from "@osdk/client";
import { uploadMedia } from "@osdk/functions";

export default async function renderFirstPageAsPng(
    client: Client,
    media: Media,
): Promise<Media> {
    const transformation: MediaTransformation = {
        $documentToImage: {
            $encoding: "png",
            $operation: { $renderPage: { $pageNumber: 0, $width: 1200 } },
        },
    };
    const result = await client(
        __EXPERIMENTAL__NOT_SUPPORTED_YET__transformAndWait,
    ).transformAndWait({ mediaReference: media.getMediaReference(), transformation });
    if (!result.ok) {
        throw new Error(`Render failed: ${result.status}`);
    }
    // Re-upload the rendered page so the function returns a Media.
    return uploadMedia(client, { data: await result.blob(), fileName: "page.png" });
}
```

```python tab="Python"
from foundry_sdk.v2.media_sets.models import (
    DocumentToImageTransformation,
    PngFormat,
    RenderPageOperation,
)
from foundry_sdk_runtime.media import Media
from functions.api import function
from ontology_sdk import FoundryClient

@function(beta=True)
def render_first_page_as_png(document: Media) -> Media:
    """Render page 0 of a PDF at 1200px wide as a PNG and return it as a Media."""
    client = FoundryClient()
    transformation = DocumentToImageTransformation(
        encoding=PngFormat(),
        operation=RenderPageOperation(page_number=0, width=1200),
    )
    rendered_png: bytes = client.ontology.media.transform_and_wait(
        media_reference=document.get_media_reference(),
        transformation=transformation,
    )
    # Re-upload the rendered page so the function returns a Media.
    return client.ontology.media.upload_media(body=rendered_png, filename="page.png")
```

Slice a page range into a new PDF document:

```typescript tab="TypeScript v2"
import {
    __EXPERIMENTAL__NOT_SUPPORTED_YET__transformAndWait,
    type MediaTransformation,
} from "@osdk/api/unstable";
import type { Client, Media } from "@osdk/client";
import { uploadMedia } from "@osdk/functions";

export default async function sliceFirstTenPages(
    client: Client,
    media: Media,
): Promise<Media> {
    const transformation: MediaTransformation = {
        $documentToDocument: {
            $encoding: "pdf",
            $operation: {
                $slicePdfRange: {
                    $startPageInclusive: 0,
                    $endPageExclusive: 10,
                    $strictlyEnforceEndPage: false,
                },
            },
        },
    };
    const result = await client(
        __EXPERIMENTAL__NOT_SUPPORTED_YET__transformAndWait,
    ).transformAndWait({ mediaReference: media.getMediaReference(), transformation });
    if (!result.ok) {
        throw new Error(`Slice failed: ${result.status}`);
    }
    // Re-upload the sliced PDF so the function returns a Media.
    return uploadMedia(client, { data: await result.blob(), fileName: "slice.pdf" });
}
```

```python tab="Python"
from foundry_sdk.v2.media_sets.models import (
    DocumentToDocumentTransformation,
    PdfFormat,
    SlicePdfRangeOperation,
)
from foundry_sdk_runtime.media import Media
from functions.api import function
from ontology_sdk import FoundryClient

@function(beta=True)
def slice_first_ten_pages(document: Media) -> Media:
    """Return a new PDF containing pages 0-9 of the input PDF as a Media."""
    client = FoundryClient()
    transformation = DocumentToDocumentTransformation(
        encoding=PdfFormat(),
        operation=SlicePdfRangeOperation(
            start_page_inclusive=0,
            end_page_exclusive=10,
            strictly_enforce_end_page=False,  # tolerate documents shorter than 10 pages
        ),
    )
    sliced_pdf: bytes = client.ontology.media.transform_and_wait(
        media_reference=document.get_media_reference(),
        transformation=transformation,
    )
    # Re-upload the sliced PDF so the function returns a Media.
    return client.ontology.media.upload_media(body=sliced_pdf, filename="slice.pdf")
```

#### Example: Annotate every page with detected bounding boxes

To produce a visual debugging output (each PDF page rendered with its OCR-detected bounding boxes drawn on top), chain three transformations for every page: render the page as an image, OCR the same page to recover word and line bounding boxes, then re-upload the rendered image and annotate it with `$image.$annotate`. The page count comes from `get_media_full_metadata()`, which is currently available in Python only. Each step calls `transform_and_wait`; the rendered bytes are passed to the annotation step as a fresh upload, and each annotated page is re-uploaded so the function returns one `Media` per page.

```python tab="Python"
from foundry_sdk.v2.media_sets.models import (
    AnnotateImageOperation,
    Annotation,
    BoundingBox,
    BoundingBoxGeometry,
    DocumentMediaItemMetadata,
    DocumentToImageTransformation,
    DocumentToTextTransformation,
    ImageTransformation,
    OcrHocrOutputFormat,
    OcrLanguageWrapper,
    OcrOnPageOperation,
    OcrParameters,
    PngFormat,
    RenderPageOperation,
)
from foundry_sdk_runtime.media import Media
from functions.api import function
from ontology_sdk import FoundryClient

@function(beta=True)
def annotate_pdf_with_ocr_boxes(document: Media) -> list[Media]:
    """Render every page of a PDF, OCR each page to find text bounding boxes,
    draw them on the rendered image, and return one annotated Media per page."""
    client = FoundryClient()
    media_reference = document.get_media_reference()

    # Use the full metadata (Python only) to discover the page count.
    metadata = document.get_media_full_metadata().item_metadata
    if not isinstance(metadata, DocumentMediaItemMetadata) or metadata.pages is None:
        raise ValueError("Expected a PDF document with a known page count")

    annotated_pages: list[Media] = []
    for page_number in range(metadata.pages):
        # 1. Render the page as a PNG.
        rendered_png: bytes = client.ontology.media.transform_and_wait(
            media_reference=media_reference,
            transformation=DocumentToImageTransformation(
                encoding=PngFormat(),
                operation=RenderPageOperation(page_number=page_number, width=1200),
            ),
        )

        # 2. OCR the same page in hOCR mode to get word-level bounding boxes.
        hocr_bytes: bytes = client.ontology.media.transform_and_wait(
            media_reference=media_reference,
            transformation=DocumentToTextTransformation(
                operation=OcrOnPageOperation(
                    page_number=page_number,
                    parameters=OcrParameters(
                        output_format=OcrHocrOutputFormat(),
                        languages=[OcrLanguageWrapper(language="ENG")],
                    ),
                ),
            ),
            poll_timeout_seconds=120.0,
        )

        # 3. Parse hOCR for bounding boxes in image pixels.
        # The parse_hocr_bounding_boxes helper is omitted here; see the note below the example.
        boxes: list[tuple[str, BoundingBox]] = parse_hocr_bounding_boxes(hocr_bytes)

        # 4. Re-upload the rendered PNG as a temporary media item.
        rendered_media = client.ontology.media.upload_media(
            body=rendered_png, filename=f"page-{page_number}.png"
        )

        # 5. Annotate the rendered page with a Media transformation.
        annotated_bytes: bytes = client.ontology.media.transform_and_wait(
            media_reference=rendered_media.get_media_reference(),
            transformation=ImageTransformation(
                encoding=PngFormat(),
                operations=[
                    AnnotateImageOperation(
                        annotations=[
                            Annotation(
                                geometry=BoundingBoxGeometry(bounding_box=box),
                                label=label,
                            )
                            for label, box in boxes
                        ],
                    ),
                ],
            ),
        )

        # 6. Re-upload the annotated page so the function returns a Media.
        annotated_pages.append(
            client.ontology.media.upload_media(
                body=annotated_bytes, filename=f"page-{page_number}-annotated.png"
            )
        )

    return annotated_pages
```

```python tab="Python (async)"
import asyncio

from foundry_sdk.v2.media_sets.models import (
    AnnotateImageOperation,
    Annotation,
    BoundingBox,
    BoundingBoxGeometry,
    DocumentMediaItemMetadata,
    DocumentToImageTransformation,
    DocumentToTextTransformation,
    ImageTransformation,
    OcrHocrOutputFormat,
    OcrLanguageWrapper,
    OcrOnPageOperation,
    OcrParameters,
    PngFormat,
    RenderPageOperation,
)
from foundry_sdk_runtime.media import Media
from functions.api import function
from ontology_sdk import FoundryClient

@function(beta=True)
async def annotate_pdf_with_ocr_boxes(document: Media) -> list[Media]:
    """Render every page of a PDF, OCR each page, annotate it, and return one Media per page."""
    client = FoundryClient()
    media_reference = document.get_media_reference()

    metadata = document.get_media_full_metadata().item_metadata
    if not isinstance(metadata, DocumentMediaItemMetadata) or metadata.pages is None:
        raise ValueError("Expected a PDF document with a known page count")

    async def annotate_page(page_number: int) -> Media:
        # Render the page as a PNG and OCR the same page concurrently.
        # Both transformations read from the same source document and are independent,
        # so asyncio.gather lets them poll in parallel instead of one after the other.
        rendered_png, hocr_bytes = await asyncio.gather(
            client.ontology.media.async_transform_and_wait(
                media_reference=media_reference,
                transformation=DocumentToImageTransformation(
                    encoding=PngFormat(),
                    operation=RenderPageOperation(page_number=page_number, width=1200),
                ),
            ),
            client.ontology.media.async_transform_and_wait(
                media_reference=media_reference,
                transformation=DocumentToTextTransformation(
                    operation=OcrOnPageOperation(
                        page_number=page_number,
                        parameters=OcrParameters(
                            output_format=OcrHocrOutputFormat(),
                            languages=[OcrLanguageWrapper(language="ENG")],
                        ),
                    ),
                ),
                poll_timeout_seconds=120.0,
            ),
        )

        # Parse hOCR for bounding boxes (see the sync example) and re-upload the
        # rendered PNG as a temporary media item, both concurrently.
        boxes, rendered_media = await asyncio.gather(
            async_parse_hocr_bounding_boxes(hocr_bytes),
            client.ontology.media.async_upload_media(
                body=rendered_png,
                filename=f"page-{page_number}.png",
            ),
        )

        # Annotate the rendered page with a Media transformation.
        annotated_bytes: bytes = await client.ontology.media.async_transform_and_wait(
            media_reference=rendered_media.get_media_reference(),
            transformation=ImageTransformation(
                encoding=PngFormat(),
                operations=[
                    AnnotateImageOperation(
                        annotations=[
                            Annotation(
                                geometry=BoundingBoxGeometry(bounding_box=box),
                                label=label,
                            )
                            for label, box in boxes
                        ],
                    ),
                ],
            ),
        )

        # Re-upload the annotated page so the function returns a Media.
        return await client.ontology.media.async_upload_media(
            body=annotated_bytes,
            filename=f"page-{page_number}-annotated.png",
        )

    # Process every page concurrently.
    return list(await asyncio.gather(*(annotate_page(p) for p in range(metadata.pages))))
```

The `parse_hocr_bounding_boxes` helper is omitted here. Any HTML parser (such as `lxml` or `BeautifulSoup`) can extract `class="ocrx_word"` elements and their `title="bbox X1 Y1 X2 Y2 ..."` attributes, which you convert into `BoundingBox(left=X1, top=Y1, width=X2-X1, height=Y2-Y1)`.
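One possible shape for such a helper is sketched below using only the Python standard library. This is an illustration, not the implementation used by the examples: `HocrWordParser` and `parse_hocr_word_boxes` are hypothetical names, the sample hOCR fragment is made up, and the function returns plain `(left, top, width, height)` tuples so that it runs without the SDK. It also assumes integer `bbox` coordinates and UTF-8 encoded hOCR.

```python
import re
from html.parser import HTMLParser
from typing import Optional

Box = tuple[int, int, int, int]  # (left, top, width, height) in image pixels

# Matches the "bbox X1 Y1 X2 Y2" portion of an hOCR title attribute.
BBOX_PATTERN = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrWordParser(HTMLParser):
    """Hypothetical parser collecting (text, box) for each ocrx_word element."""

    def __init__(self) -> None:
        super().__init__()
        self.words: list[tuple[str, Box]] = []
        self._current_box: Optional[Box] = None
        self._current_text: list[str] = []
        self._depth = 0  # open tags inside the current word, including the word itself

    def handle_starttag(self, tag, attrs):
        if self._current_box is not None:
            self._depth += 1  # nested markup inside the current word
            return
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        match = BBOX_PATTERN.search(attributes.get("title") or "")
        if "ocrx_word" in classes and match:
            x1, y1, x2, y2 = (int(value) for value in match.groups())
            self._current_box = (x1, y1, x2 - x1, y2 - y1)
            self._current_text = []
            self._depth = 1

    def handle_data(self, data):
        if self._current_box is not None:
            self._current_text.append(data)

    def handle_endtag(self, tag):
        if self._current_box is None:
            return
        self._depth -= 1
        if self._depth == 0:
            text = "".join(self._current_text).strip()
            self.words.append((text, self._current_box))
            self._current_box = None


def parse_hocr_word_boxes(hocr_bytes: bytes) -> list[tuple[str, Box]]:
    """Return (word text, (left, top, width, height)) for every ocrx_word element."""
    parser = HocrWordParser()
    parser.feed(hocr_bytes.decode("utf-8"))
    parser.close()
    return parser.words


# Made-up hOCR fragment for illustration.
sample = (
    b"<div class='ocr_page'>"
    b"<span class='ocrx_word' title='bbox 10 20 110 50; x_wconf 96'>Hello</span> "
    b"<span class='ocrx_word' title='bbox 120 20 200 50; x_wconf 93'>world</span>"
    b"</div>"
)
print(parse_hocr_word_boxes(sample))
```

In a function, each tuple would be passed to `BoundingBox(left=..., top=..., width=..., height=...)` to produce the `(label, BoundingBox)` pairs the annotation step expects.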

## TypeScript v1

:::callout{theme="warning"}
Foundry enacts strict memory limits when executing TypeScript v1 functions. To ensure you do not exceed those memory limits, you should only interact with media files under 20MB.
:::

:::callout{theme="warning"}
Uploading media within a function is not supported in TypeScript v1. The examples below cover passing existing media into Ontology edits and operating on media properties of object types.
:::

### Setting existing media on an object

Use Ontology edit functions to attach existing media items to objects:

```typescript tab="TypeScript v1"
import { OntologyEditFunction, MediaItem } from "@foundry/functions-api";
import { Aircraft } from "@foundry/ontology-api";

export class MyFunctions {
    @OntologyEditFunction()
    public async setExistingMediaToObject(
        aircraft: Aircraft,
        mediaItem: MediaItem
    ): Promise<void> {
        // Ontology Edits with passed in MediaItems are supported
        aircraft.myMediaProperty = mediaItem;
    }
}
```

### Media item parameter on object types

The following example shows the `isAudio` media operation on a media reference property of an object type:

```typescript tab="TypeScript v1"
MediaItem.isAudio(objectType.mediaReferenceProperty)
```

### Read raw media data

You can access a media item by selecting the media reference property on the object. The signature for the method is as follows:

```typescript tab="TypeScript v1"
// Blob is a standard JavaScript type, representing a file-like object of immutable, raw data.
// https://developer.mozilla.org/en-US/docs/Web/API/Blob
readAsync(): Promise<Blob>;
```

### Get media metadata

You can access a media item's metadata. The signature for the method is as follows:

```typescript tab="TypeScript v1"
getMetadataAsync(): Promise<IMediaMetadata>;
```

### Type guards

Type guards in TypeScript v1 allow you to access functionality that is specific to certain media types. The following type guards can be used on media item metadata:

* `isAudioMetadata()`
* `isDicomMetadata()`
* `isDocumentMetadata()`
* `isImageryMetadata()`
* `isSpreadsheetMetadata()`
* `isUntypedMetadata()`
* `isVideoMetadata()`

As an example, you could use the imagery type guard to pull out image-specific metadata fields:

```typescript tab="TypeScript v1"
const metadata = await myObject.mediaReference?.getMetadataAsync();
if (isImageryMetadata(metadata)) {
    const imageWidth = metadata.dimensions?.width;
    ...
}
```

You can also use type guards on the media item namespace, which then gives you access to more methods on the type-specific media item. The type guards you can use here are:

* `MediaItem.isAudio()`
* `MediaItem.isDicom()`
* `MediaItem.isDocument()`
* `MediaItem.isImagery()`
* `MediaItem.isSpreadsheet()`
* `MediaItem.isVideo()`

### Document-specific operations

#### Text extraction

To extract text from a document, you can either use optical character recognition (OCR) or extract embedded text on the media item.

For machine-generated PDFs, it may be faster and/or more accurate to extract text embedded digitally in the PDF rather than using optical character recognition (OCR). Below is an example of text extraction usage:

```typescript tab="TypeScript v1"
extractTextAsync(options: IDocumentExtractTextOptions): Promise<string[]>;
```

When using TypeScript v1, the following can optionally be provided as an object:

* `startPage`: The zero-indexed start page (inclusive, can be empty).
* `endPage`: The zero-indexed end page (exclusive, can be empty).

If both the `startPage` and `endPage` are left empty, the text for all pages in the document will be returned.
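The inclusive-start, exclusive-end convention can be illustrated with a short, language-neutral sketch (written in Python here). `resolve_page_range` is a hypothetical function, not part of any Foundry API. Treating a missing start as page 0 and a missing end as the last page is an assumption extrapolated from the all-pages default described above.

```python
from typing import Optional


def resolve_page_range(
    page_count: int,
    start_page: Optional[int] = None,
    end_page: Optional[int] = None,
) -> range:
    """Return the zero-indexed pages selected by an inclusive start and exclusive end."""
    start = 0 if start_page is None else start_page
    end = page_count if end_page is None else end_page
    return range(start, end)


# For a five-page document:
print(list(resolve_page_range(5)))                # [0, 1, 2, 3, 4]
print(list(resolve_page_range(5, end_page=1)))    # [0]
print(list(resolve_page_range(5, start_page=2)))  # [2, 3, 4]
```

Under this convention, `endPage: 1` with no `startPage` selects only the first page.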

For non-machine-generated PDFs, it would be best to use the OCR method for extracting text.

```typescript tab="TypeScript v1"
ocrAsync(options: IDocumentOcrOptions): Promise<string[]>;
```

The following can optionally be provided as a TypeScript object:

* `startPage`: The zero-indexed start page (inclusive).
* `endPage`: The zero-indexed end page (exclusive).
* `languages`: A list of languages to recognize (can be empty).
* `scripts`: A list of scripts to recognize (can be empty).
* `outputType`: Specifies the output type as `text` or `hocr`.

Remember that you need to use type guards in order to access media-type specific operations. Here's an example of using the `isDocument()` type guard to then perform OCR text extraction:

```typescript tab="TypeScript v1"
import { Function, MediaItem } from "@foundry/functions-api";
import { ArxivPaper } from "@foundry/ontology-api";

export class MyFunctions {
    @Function()
    public async firstPageText(paper: ArxivPaper): Promise<string | undefined> {
        // The type guard narrows the media reference to a document media item.
        if (MediaItem.isDocument(paper.mediaReference)) {
            // endPage is exclusive, so this runs OCR on the first page only.
            const text = (await paper.mediaReference.ocrAsync({ endPage: 1, languages: [], scripts: [], outputType: 'text' }))[0];
            return text;
        }

        return undefined;
    }
}
```

### Audio-specific operations

#### Transcription

Audio media items support transcription using the `transcribeAsync` method. The signature is as follows:

```typescript tab="TypeScript v1"
transcribeAsync(options: IAudioTranscriptionOptions): Promise<string>;
```

The following can optionally be passed in to specify how the transcription should run:

* `language`: The language to transcribe, passed using the `TranscriptionLanguage` enum.
* `performanceMode`: Runs transcriptions in `More Economical` or `More Performant` mode, passed using the `TranscriptionPerformanceMode` enum.
* `outputFormat`: Specifies the output format by passing an object whose `type` is either `plainTextNoSegmentData` (plain text) or `pttml`, a [TTML-like](https://en.wikipedia.org/wiki/Timed_Text_Markup_Language) format. If the type is `plainTextNoSegmentData`, the object also takes a Boolean `addTimestamps` parameter.

An example of providing options for transcription:

```typescript tab="TypeScript v1"
import { Function, MediaItem, TranscriptionLanguage, TranscriptionPerformanceMode } from "@foundry/functions-api";
import { AudioFile } from "@foundry/ontology-api";

export class MyFunctions {
    @Function()
    public async transcribeAudioFile(file: AudioFile): Promise<string | undefined> {
        // The type guard narrows the media reference to an audio media item.
        if (MediaItem.isAudio(file.mediaReference)) {
            return await file.mediaReference.transcribeAsync({
                language: TranscriptionLanguage.ENGLISH,
                performanceMode: TranscriptionPerformanceMode.MORE_ECONOMICAL,
                outputFormat: { type: "plainTextNoSegmentData", addTimestamps: true }
            });
        }

        return undefined;
    }
}
```


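Media

Functions enable you to access and modify media in TypeScript v2, Python, and TypeScript v1. TypeScript v2 and Python use the `Media` type to read, upload, and transform media, and support media uploads through Ontology edits and the OSDK. TypeScript v1 functions provide a `MediaItem` type with built-in operations for working with different kinds of media without external libraries.

If you need any operations that don't currently exist out-of-the-box, you will likely need to use external libraries or write your own custom code. Learn more about adding dependencies to functions repositories.

## TypeScript v2 and Python

Use Ontology edit functions to upload media and create objects in the Ontology. Once uploaded, you can read and download media files from objects for use in your application. Learn more about media sets in Foundry.

You can construct Ontology edits in TypeScript v2 and Python functions by uploading media to the Ontology to obtain a `Media` instance. The `Media` type wraps a `MediaReference` and exposes higher-level operations for fetching contents, fetching metadata, and attaching media to objects. You can use the `Media` to construct an Ontology edit, or pass existing media into the function as a parameter.

### Use as a function input or output type

Functions can take in a `Media` as an input, create temporary media by uploading data with `uploadMedia`, or retrieve `Media` from a media reference property on an object. Functions can return a `Media` type as well, whether it has been temporarily uploaded, or if it came from an object's media reference property. In a function, you can fetch the byte contents of the `Media`, fetch its metadata, or attach it to an Ontology object via Ontology edits. In Python, you can also fetch the full per-variant metadata; in TypeScript v2, `fetchMetadata` currently exposes only the high-level fields (`mediaType`, `sizeBytes`, `path`).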

```typescript tab="TypeScript v2"
import type { Media } from "@osdk/client";

export default async function echoMedia(media: Media): Promise<Media> {
    return media;
}
```

```python tab="Python"
from functions.api import function, Media
# The Media type can also be imported from foundry_sdk_runtime
# from foundry_sdk_runtime.media import Media

@function
def echo_media(media: Media) -> Media:
    return media
```

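### Upload media

Use the Ontology SDK's `uploadMedia` (TypeScript v2) and `client.ontology.media.upload_media` (Python) helpers to upload raw bytes from within a function. Both return a `Media`, which you can then use in an Ontology edit to set a media property on an Ontology object, or return from the function.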

```typescript tab="TypeScript v2"
import type { Client, Media } from "@osdk/client";
import { uploadMedia } from "@osdk/functions";

export default async function uploadMediaItem(
    client: Client,
    body: string,
    fileName: string,
): Promise<Media> {
    const blob = new Blob([body], { type: "text/plain" });
    const media: Media = await uploadMedia(
        client,
        { data: blob, fileName }
    );
    return media;
}
```

```python tab="Python"
from ontology_sdk import FoundryClient
from foundry_sdk_runtime.media import Media
from functions.api import function

@function(beta=True)
def upload_media(body: str, media_set_filename: str) -> Media:
    client = FoundryClient()
    media: Media = client.ontology.media.upload_media(
        body=body.encode("utf8"),
        filename=media_set_filename,
    )
    return media
```

```python tab="Python (async)"
from ontology_sdk import FoundryClient
from foundry_sdk_runtime.media import Media
from functions.api import function

@function(beta=True)
async def upload_media(body: str, media_set_filename: str) -> Media:
    client = FoundryClient()
    media_coroutine = client.ontology.media.async_upload_media(
        body=body.encode("utf8"),
        filename=media_set_filename,
    )
    # media_coroutine is awaitable.
    return await media_coroutine
```

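:::callout{theme="info"}
Uploaded media is temporary unless it is set on a media reference property of an Ontology object. The media is then persisted on the Ontology object property when the Ontology edits are applied.
:::

### Upload media in Ontology edit functions

Whether you uploaded media within the function or received a `Media` as a function input, you can update a media property on an existing Ontology object or create a new Ontology object with the `Media` parameter.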

```typescript tab="TypeScript v2"
// Make sure you are using TypeScript OSDK 2.16 or later

import type { Client, Media } from "@osdk/client";
import { Aircraft } from "@ontology-sdk/sdk";
import type { Edits } from "@osdk/functions";
import { createEditBatch, uploadMedia } from "@osdk/functions";

async function uploadTextToNewPlane(client: Client): Promise<Edits.Object<Aircraft>[]> {
    const batch = createEditBatch<Edits.Object<Aircraft>>(client);
    const blob = new Blob(["Hello, world"], { type: "text/plain" });
    const media: Media = await uploadMedia(
        client,
        { data: blob, fileName: "/planes/aircraft.txt" }
    );
    batch.create(Aircraft, { myMediaProperty: media, /* ... */ });
    return batch.getEdits();
}

export default uploadTextToNewPlane;
```

```python tab="Python"
# Make sure you are using Python OSDK 2.198 or later

from ontology_sdk import FoundryClient
from ontology_sdk.ontology.objects import Aircraft
from functions.api import function, OntologyEdit
from foundry_sdk_runtime.media import Media

@function(beta=True, edits=[Aircraft])
def upload_text_to_new_plane() -> list[OntologyEdit]:
    client = FoundryClient()
    edits = client.ontology.edits()
    media: Media = client.ontology.media.upload_media(
        body="Hello, world".encode("utf8"),
        filename="/planes/aircraft.txt",
    )
    edits.objects.Aircraft.create(
        pk="primary_key",
        my_media_property=media,
        # ...
    )
    return edits.get_edits()
```

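:::callout{theme="info"}
In TypeScript OSDK generator versions earlier than 2.20, `uploadMedia` returned a `MediaReference`. Starting with version 2.20, `uploadMedia` returns a `Media`, which wraps the underlying `MediaReference` and exposes higher-level operations such as `fetchContents`, `fetchMetadata`, and `getMediaReference()`. You can pass the `Media` directly into `createEditBatch` operations.
:::

### Pass a media reference parameter on an action type

Action parameters of the media reference type can be passed to functions as arguments.

The screenshot below shows an action passing its media parameter to its backing function.

<img src="./media/media-tutorial-media-action-parameter.png" alt = "action" width="600"/>

### Media Ontology SDK operations

:::callout{theme="info"}
The following methods work on any `Media` instance, including instances returned from `upload_media` and instances exposed as `Media` properties on object types.
:::

#### Retrieve media bytes data

You can access the raw data stored on a `Media`. The method signature is as follows: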

```typescript tab="TypeScript v2"
fetchContents(): Promise<Response>;

// "Response" is a standard interface of the JavaScript Fetch API
// https://developer.mozilla.org/en-US/docs/Web/API/Response
const mediaContents: Response = await myAircraft.myMediaProperty.fetchContents();

if (mediaContents.ok) {
    const mediaMimeType = mediaContents.headers.get("Content-Type");

    // Blob is a standard JavaScript type representing a file-like object of immutable, raw data.
    // https://developer.mozilla.org/en-US/docs/Web/API/Blob
    // https://developer.mozilla.org/en-US/docs/Web/API/Response/blob
    const mediaBlob: Blob = await mediaContents.blob();
}
```

```python tab="Python"
get_media_content(self) -> BytesIO: ...

from io import BytesIO
# https://docs.python.org/3/library/io.html#io.BytesIO

raw_data: BytesIO = my_aircraft.my_media_property.get_media_content()
```
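
As a rough sketch of what you might do with the returned `BytesIO`, the snippet below inspects the leading bytes to guess the file type. It uses a stand-in `BytesIO` rather than a real media property, and `sniff_media_type` is a hypothetical helper that is not part of the OSDK. The signature table is illustrative, not exhaustive.

```python
from io import BytesIO

# Well-known file signatures ("magic bytes"). Illustrative only.
_SIGNATURES = {
    b"%PDF-": "application/pdf",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
}


def sniff_media_type(raw_data: BytesIO) -> str:
    """Guess a MIME type from the first bytes without consuming the stream."""
    position = raw_data.tell()
    header = raw_data.read(8)
    raw_data.seek(position)  # restore the read position for later consumers
    for signature, media_type in _SIGNATURES.items():
        if header.startswith(signature):
            return media_type
    return "application/octet-stream"


# Stand-in for my_aircraft.my_media_property.get_media_content()
raw_data = BytesIO(b"%PDF-1.7\n...")
guessed_type = sniff_media_type(raw_data)
```

In a real function, prefer the `media_type` reported by the media metadata when it is available; byte sniffing is only a fallback.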

### Get media metadata

You can retrieve the metadata of a `Media`:

```typescript tab="TypeScript v2"
fetchMetadata(): Promise<MediaMetadata>;

// Example usage:
const mediaMetadata = await myAircraft.myMediaProperty.fetchMetadata();
const sizeBytes = mediaMetadata.sizeBytes;
const mediaType = mediaMetadata.mediaType;
```

```python tab="Python"
from foundry_sdk_runtime.media import MediaMetadata

# Example usage:
media_metadata: MediaMetadata = my_aircraft.my_media_property.get_media_metadata()
path = media_metadata.path
size_bytes = media_metadata.size_bytes
media_type = media_metadata.media_type
```

In Python, `get_media_full_metadata()` returns a `MediaFullMetadata` whose `item_metadata` is a discriminated union over the media type. Narrow to a variant class (or check `item_metadata.type`) to access type-specific fields:

```python tab="Python"
get_media_full_metadata(self) -> MediaFullMetadata: ...

# Narrow to a variant class (or check item_metadata.type) to access type-specific fields.
# Other variants include AudioMediaItemMetadata, VideoMediaItemMetadata,
# SpreadsheetMediaItemMetadata, Model3dMediaItemMetadata, DicomMediaItemMetadata,
# EmailMediaItemMetadata, and UntypedMediaItemMetadata. See the full schema:
# https://github.com/palantir/foundry-platform-python/blob/develop/docs/v2/MediaSets/models/MediaItemMetadata.md

from foundry_sdk.v2.media_sets.models import (
    DocumentMediaItemMetadata,
    ImageryMediaItemMetadata,
)
from foundry_sdk_runtime.media import MediaFullMetadata

full_metadata: MediaFullMetadata = my_aircraft.my_media_property.get_media_full_metadata()
item = full_metadata.item_metadata

if isinstance(item, DocumentMediaItemMetadata):
    page_count = item.pages
    title = item.title
elif isinstance(item, ImageryMediaItemMetadata):
    dimensions = item.dimensions
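    bands = item.bands
```

### Transform media

:::callout{theme="warning"}
Media transformations are in beta. Functionality may change during active development.
:::

You can transform media items (for example, rotating, resizing, or re-encoding images, slicing or rendering PDF pages, or running OCR) and wait for the result. The transformation job is submitted, polled until completion, and the transformed contents are returned.

In TypeScript v2, transformations are exposed as experimental helpers through `@osdk/api/unstable`. In Python, call `client.ontology.media.transform_and_wait` on the generated `FoundryClient`. The async variant, `async_transform_and_wait`, accepts the same arguments and can be awaited.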

```typescript tab="TypeScript v2"
// Make sure you are using @osdk/api 2.8.0 or later to use transformAndWait.
// "MediaTransformation" is a discriminated union:
// each variant (`$image`, `$video`, `$audio`, `$documentToText`, `$documentToImage`,
// `$documentToDocument`, `$audioToText`, and so on)
// selects one kind of transformation, with its own encoding and operation fields.
// See the "MediaTransformation" type definition for the full set of variants and operations:
// https://github.com/palantir/osdk-ts/blob/main/packages/api/src/experimental/MediaTransformation.ts

import {
    __EXPERIMENTAL__NOT_SUPPORTED_YET__transformAndWait,
    type MediaTransformation,
} from "@osdk/api/unstable";
import type { Client, Media } from "@osdk/client";
import { uploadMedia } from "@osdk/functions";

export default async function rotateImage(
    client: Client,
    media: Media,
): Promise<Media> {
    const transformation: MediaTransformation = {
        $image: {
            $encoding: "jpg",
            $operations: [{ $rotate: { $angle: "DEGREE_180" } }],
        },
    };

    const result: Response = await client(
        __EXPERIMENTAL__NOT_SUPPORTED_YET__transformAndWait,
    ).transformAndWait({
        mediaReference: media.getMediaReference(),
        transformation,
        options: { pollIntervalMs: 3000, pollTimeoutMs: 30000 },
    });

    if (!result.ok) {
        // The transformation failed; inspect result.status / result.text() for details.
        throw new Error(`Transformation failed with status ${result.status}`);
    }

    // Re-upload the transformed bytes so the function returns a Media.
    return uploadMedia(client, { data: await result.blob(), fileName: "rotated.jpg" });
}
```

```python tab="Python"
from foundry_sdk.v2.media_sets.models import (
    ImageTransformation,
    JpgFormat,
    RotateImageOperation,
)
from foundry_sdk_runtime.errors import (
    MediaTransformationFailedError,
    MediaTransformationTimeoutError,
)
from foundry_sdk_runtime.media import Media
from functions.api import function
from ontology_sdk import FoundryClient

@function(beta=True)
def image_transform(document: Media) -> Media:
    client = FoundryClient()
    transformation = ImageTransformation(
        encoding=JpgFormat(),
        operations=[RotateImageOperation(angle="DEGREE_180")],
    )
    try:
        transformed_bytes: bytes = client.ontology.media.transform_and_wait(
            media_reference=document.get_media_reference(),
            transformation=transformation,
            poll_interval_seconds=3.0,
            poll_timeout_seconds=30.0,
        )
    except MediaTransformationFailedError:
        # The transformation job reported a FAILED status.
        raise
    except MediaTransformationTimeoutError:
        # poll_timeout_seconds elapsed before the job completed.
        raise
    # Re-upload the transformed bytes so the function returns a Media.
    return client.ontology.media.upload_media(body=transformed_bytes, filename="rotated.jpg")
```
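
To make the submit-then-poll behavior more concrete, here is a minimal sketch of how a poll interval and a poll timeout typically interact. It is not the SDK's implementation: `poll_until_done`, `get_status`, and the `"SUCCESSFUL"` and `"PENDING"` status strings are hypothetical stand-ins, and only the parameter names mirror `poll_interval_seconds` and `poll_timeout_seconds`.

```python
import time
from typing import Callable


def poll_until_done(
    get_status: Callable[[], str],
    poll_interval_seconds: float = 3.0,
    poll_timeout_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status until it reports a terminal state or the timeout elapses."""
    deadline = clock() + poll_timeout_seconds
    while True:
        status = get_status()
        if status in ("SUCCESSFUL", "FAILED"):  # assumed terminal states
            return status
        # Give up if the next poll would land after the deadline.
        if clock() + poll_interval_seconds > deadline:
            raise TimeoutError(f"job still {status} after {poll_timeout_seconds}s")
        sleep(poll_interval_seconds)


# Simulated job that finishes on the third poll; sleep is stubbed out to run instantly.
statuses = iter(["PENDING", "PENDING", "SUCCESSFUL"])
final_status = poll_until_done(lambda: next(statuses), sleep=lambda seconds: None)
```

A shorter interval detects completion sooner at the cost of more status requests, while the timeout bounds the total wait regardless of the interval.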

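#### Example: Run OCR on a PDF page by page and output bounding boxes

This workflow takes a PDF (uploaded to a media set or attached to an object) and runs OCR on each page, requesting hOCR output. hOCR is HTML with `bbox` attributes on every detected word and line, so you can extract both the recognized text and its bounding box coordinates from the same response. Each `transform_and_wait` call returns the bytes for one page; iterate to cover the whole document.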

```typescript tab="TypeScript v2"
import {
    __EXPERIMENTAL__NOT_SUPPORTED_YET__transformAndWait,
    type MediaTransformation,
} from "@osdk/api/unstable";
import type { Client, Media } from "@osdk/client";
import type { Integer } from "@osdk/functions";

export default async function ocrPdfPages(
    client: Client,
    media: Media,
    pageCount: Integer,
): Promise<string[]> {
    const transformAndWait = client(
        __EXPERIMENTAL__NOT_SUPPORTED_YET__transformAndWait,
    ).transformAndWait;
    const mediaReference = media.getMediaReference();

    const pageResults: string[] = [];
    for (let pageNumber = 0; pageNumber < pageCount; pageNumber++) {
        const transformation: MediaTransformation = {
            $documentToText: {
                $operation: {
                    $ocrOnPage: {
                        $pageNumber: pageNumber,
                        $parameters: {
                            $outputFormat: { $hocr: {} },
                            $languages: [{ $language: "ENG" }],
                        },
                    },
                },
            },
        };

        const result = await transformAndWait({
            mediaReference,
            transformation,
            options: { pollTimeoutMs: 120_000 },
        });
        if (!result.ok) {
            throw new Error(`OCR failed on page ${pageNumber}: ${result.status}`);
        }
        pageResults.push(await result.text());
    }
    return pageResults;
}
```

```python tab="Python"
from foundry_sdk.v2.media_sets.models import (
    DocumentMediaItemMetadata,
    DocumentToTextTransformation,
    OcrHocrOutputFormat,
    OcrLanguageWrapper,
    OcrOnPageOperation,
    OcrParameters,
)
from foundry_sdk_runtime.media import Media
from functions.api import function
from ontology_sdk import FoundryClient

@function(beta=True)
def ocr_pdf_pages(document: Media) -> list[bytes]:
    """Run OCR on every page of a PDF and return the hOCR bytes for each page.

    Each hOCR document carries bbox attributes on detected words, lines, and paragraphs;
    use any HTML parser to recover text and bounding boxes in a single pass.
    """
    client = FoundryClient()
    metadata = document.get_media_full_metadata().item_metadata
    if not isinstance(metadata, DocumentMediaItemMetadata) or metadata.pages is None:
        raise ValueError("Expected a PDF document with a known page count")

    media_reference = document.get_media_reference()
    page_results: list[bytes] = []
    for page_number in range(metadata.pages):
        transformation = DocumentToTextTransformation(
            operation=OcrOnPageOperation(
                page_number=page_number,
                parameters=OcrParameters(
                    output_format=OcrHocrOutputFormat(),
                    languages=[OcrLanguageWrapper(language="ENG")],
                ),
            ),
        )
        hocr_bytes: bytes = client.ontology.media.transform_and_wait(
            media_reference=media_reference,
            transformation=transformation,
            poll_timeout_seconds=120.0,
        )
        page_results.append(hocr_bytes)
    return page_results
```

Dense pages can push OCR runtime well beyond the default function timeout. See managing published functions to configure the function execution timeout.

#### Example: Render PDF pages as images and slice ranges

For workflows that need a visual rendering of each page (for downstream image annotation, embedding, or display), use `$documentToImage` with `$renderPage` to get a PNG/JPG image of a specific page. To extract a subrange of a PDF as a separate PDF document, use `$documentToDocument` with `$slicePdfRange`. Each function below re-uploads the transformed bytes so that it can return a `Media`. Each function is its own module; the registered function is the module's `export default`.

Render a single page as a PNG image:

```typescript tab="TypeScript v2"
import {
    __EXPERIMENTAL__NOT_SUPPORTED_YET__transformAndWait,
    type MediaTransformation,
} from "@osdk/api/unstable";
import type { Client, Media } from "@osdk/client";
import { uploadMedia } from "@osdk/functions";

export default async function renderFirstPageAsPng(
    client: Client,
    media: Media,
): Promise<Media> {
    const transformation: MediaTransformation = {
        $documentToImage: {
            $encoding: "png",
            $operation: { $renderPage: { $pageNumber: 0, $width: 1200 } },
        },
    };
    const result = await client(
        __EXPERIMENTAL__NOT_SUPPORTED_YET__transformAndWait,
    ).transformAndWait({ mediaReference: media.getMediaReference(), transformation });
    if (!result.ok) {
        throw new Error(`Render failed: ${result.status}`);
    }
    // Re-upload the rendered page so the function returns a Media.
    return uploadMedia(client, { data: await result.blob(), fileName: "page.png" });
}
```

```python tab="Python"
from foundry_sdk.v2.media_sets.models import (
    DocumentToImageTransformation,
    PngFormat,
    RenderPageOperation,
)
from foundry_sdk_runtime.media import Media
from functions.api import function
from ontology_sdk import FoundryClient

@function(beta=True)
def render_first_page_as_png(document: Media) -> Media:
    """Render page 0 of a PDF as a 1200px-wide PNG and return it as a Media."""
    client = FoundryClient()
    transformation = DocumentToImageTransformation(
        encoding=PngFormat(),
        operation=RenderPageOperation(page_number=0, width=1200),
    )
    rendered_png: bytes = client.ontology.media.transform_and_wait(
        media_reference=document.get_media_reference(),
        transformation=transformation,
    )
    # Re-upload the rendered page so the function returns a Media.
    return client.ontology.media.upload_media(body=rendered_png, filename="page.png")
```

Slice a page range into a new PDF document:

```typescript tab="TypeScript v2"
import {
    __EXPERIMENTAL__NOT_SUPPORTED_YET__transformAndWait,
    type MediaTransformation,
} from "@osdk/api/unstable";
import type { Client, Media } from "@osdk/client";
import { uploadMedia } from "@osdk/functions";

export default async function sliceFirstTenPages(
    client: Client,
    media: Media,
): Promise<Media> {
    const transformation: MediaTransformation = {
        $documentToDocument: {
            $encoding: "pdf",
            $operation: {
                $slicePdfRange: {
                    $startPageInclusive: 0,
                    $endPageExclusive: 10,
                    $strictlyEnforceEndPage: false,
                },
            },
        },
    };
    const result = await client(
        __EXPERIMENTAL__NOT_SUPPORTED_YET__transformAndWait,
    ).transformAndWait({ mediaReference: media.getMediaReference(), transformation });
    if (!result.ok) {
        throw new Error(`Slice failed: ${result.status}`);
    }
    // Re-upload the sliced PDF so the function returns a Media.
    return uploadMedia(client, { data: await result.blob(), fileName: "slice.pdf" });
}
```

```python tab="Python"
from foundry_sdk.v2.media_sets.models import (
    DocumentToDocumentTransformation,
    PdfFormat,
    SlicePdfRangeOperation,
)
from foundry_sdk_runtime.media import Media
from functions.api import function
from ontology_sdk import FoundryClient

@function(beta=True)
def slice_first_ten_pages(document: Media) -> Media:
    """Return a new PDF containing pages 0-9 of the input PDF, as a Media."""
    client = FoundryClient()
    transformation = DocumentToDocumentTransformation(
        encoding=PdfFormat(),
        operation=SlicePdfRangeOperation(
            start_page_inclusive=0,
            end_page_exclusive=10,
            strictly_enforce_end_page=False,  # tolerate documents with fewer than 10 pages
        ),
    )
    sliced_pdf: bytes = client.ontology.media.transform_and_wait(
        media_reference=document.get_media_reference(),
        transformation=transformation,
    )
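    # Re-upload the sliced PDF so the function returns a Media.
    return client.ontology.media.upload_media(body=sliced_pdf, filename="slice.pdf")
```

#### Example: Annotate each page with detected bounding boxes

To produce visual debugging output (each PDF page drawn with its OCR-detected bounding boxes), chain three transformations for each page. For each page, render the page as an image, OCR the same page to recover word/line bounding boxes, then re-upload the rendered image and annotate it with `$image.$annotate`. The page count comes from `get_media_full_metadata()`, which is currently only available in Python. Each step calls `transform_and_wait` and feeds the previous step's bytes into the next step as a new upload, and each annotated page is re-uploaded so the function returns one `Media` per page.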

```python tab="Python"
from foundry_sdk.v2.media_sets.models import (
    AnnotateImageOperation,
    Annotation,
    BoundingBox,
    BoundingBoxGeometry,
    DocumentMediaItemMetadata,
    DocumentToImageTransformation,
    DocumentToTextTransformation,
    ImageTransformation,
    OcrHocrOutputFormat,
    OcrLanguageWrapper,
    OcrOnPageOperation,
    OcrParameters,
    PngFormat,
    RenderPageOperation,
)
from foundry_sdk_runtime.media import Media
from functions.api import function
from ontology_sdk import FoundryClient

@function(beta=True)
def annotate_pdf_with_ocr_boxes(document: Media) -> list[Media]:
    """Render each page of a PDF, OCR each page to find text bounding boxes,
    draw them on the rendered image, and return one annotated Media per page."""
    client = FoundryClient()
    media_reference = document.get_media_reference()

    # Use the full metadata (Python only) to discover the page count.
    metadata = document.get_media_full_metadata().item_metadata
    if not isinstance(metadata, DocumentMediaItemMetadata) or metadata.pages is None:
        raise ValueError("Expected a PDF document with a known page count")

    annotated_pages: list[Media] = []
    for page_number in range(metadata.pages):
        # 1. Render the page as a PNG.
        rendered_png: bytes = client.ontology.media.transform_and_wait(
            media_reference=media_reference,
            transformation=DocumentToImageTransformation(
                encoding=PngFormat(),
                operation=RenderPageOperation(page_number=page_number, width=1200),
            ),
        )

        # 2. OCR the same page in hOCR mode to get word-level bounding boxes.
        hocr_bytes: bytes = client.ontology.media.transform_and_wait(
            media_reference=media_reference,
            transformation=DocumentToTextTransformation(
                operation=OcrOnPageOperation(
                    page_number=page_number,
                    parameters=OcrParameters(
                        output_format=OcrHocrOutputFormat(),
                        languages=[OcrLanguageWrapper(language="ENG")],
                    ),
                ),
            ),
            poll_timeout_seconds=120.0,
        )

        # 3. Parse the hOCR to get bounding boxes in image pixels.
        # The parse_hocr_bounding_boxes helper is omitted here; see the note below the example.
        boxes: list[tuple[str, BoundingBox]] = parse_hocr_bounding_boxes(hocr_bytes)

        # 4. Re-upload the rendered PNG as a temporary media item.
        rendered_media = client.ontology.media.upload_media(
            body=rendered_png, filename=f"page-{page_number}.png"
        )

        # 5. Annotate the rendered page using a Media transformation.
        annotated_bytes: bytes = client.ontology.media.transform_and_wait(
            media_reference=rendered_media.get_media_reference(),
            transformation=ImageTransformation(
                encoding=PngFormat(),
                operations=[
                    AnnotateImageOperation(
                        annotations=[
                            Annotation(
                                geometry=BoundingBoxGeometry(bounding_box=box),
                                label=label,
                            )
                            for label, box in boxes
                        ],
                    ),
                ],
            ),
        )

        # 6. Re-upload the annotated page so the function returns a Media.
        annotated_pages.append(
            client.ontology.media.upload_media(
                body=annotated_bytes, filename=f"page-{page_number}-annotated.png"
            )
        )

    return annotated_pages
```

```python tab="Python (async)"
import asyncio

from foundry_sdk.v2.media_sets.models import (
    AnnotateImageOperation,
    Annotation,
    BoundingBox,
    BoundingBoxGeometry,
    DocumentMediaItemMetadata,
    DocumentToImageTransformation,
    DocumentToTextTransformation,
    ImageTransformation,
    OcrHocrOutputFormat,
    OcrLanguageWrapper,
    OcrOnPageOperation,
    OcrParameters,
    PngFormat,
    RenderPageOperation,
)
from foundry_sdk_runtime.media import Media
from functions.api import function
from ontology_sdk import FoundryClient

@function(beta=True)
async def annotate_pdf_with_ocr_boxes(document: Media) -> list[Media]:
    """Render each page of a PDF, OCR each page, annotate it, and return one Media per page."""
    client = FoundryClient()
    media_reference = document.get_media_reference()
    metadata = document.get_media_full_metadata().item_metadata
    if not isinstance(metadata, DocumentMediaItemMetadata) or metadata.pages is None:
        raise ValueError("Expected a PDF document with a known page count")

    async def annotate_page(page_number: int) -> Media:
        # Render the page as a PNG and OCR the same page concurrently.
        # Both transformations read from the same source document and are independent,
        # so asyncio.gather lets them poll in parallel instead of one after the other.
        rendered_png, hocr_bytes = await asyncio.gather(
            client.ontology.media.async_transform_and_wait(
                media_reference=media_reference,
                transformation=DocumentToImageTransformation(
                    encoding=PngFormat(),
                    operation=RenderPageOperation(page_number=page_number, width=1200),
                ),
            ),
            client.ontology.media.async_transform_and_wait(
                media_reference=media_reference,
                transformation=DocumentToTextTransformation(
                    operation=OcrOnPageOperation(
                        page_number=page_number,
                        parameters=OcrParameters(
                            output_format=OcrHocrOutputFormat(),
                            languages=[OcrLanguageWrapper(language="ENG")],
                        ),
                    ),
                ),
                poll_timeout_seconds=120.0,
            ),
        )

        # Parse the hOCR to get bounding boxes (see the sync example) and re-upload
        # the rendered PNG as a temporary media item, both concurrently.
        boxes, rendered_media = await asyncio.gather(
            async_parse_hocr_bounding_boxes(hocr_bytes),
            client.ontology.media.async_upload_media(
                body=rendered_png,
                filename=f"page-{page_number}.png",
            ),
        )

        # Annotate the rendered page using a Media transformation.
        annotated_bytes: bytes = await client.ontology.media.async_transform_and_wait(
            media_reference=rendered_media.get_media_reference(),
            transformation=ImageTransformation(
                encoding=PngFormat(),
                operations=[
                    AnnotateImageOperation(
                        annotations=[
                            Annotation(
                                geometry=BoundingBoxGeometry(bounding_box=box),
                                label=label,
                            )
                            for label, box in boxes
                        ],
                    ),
                ],
            ),
        )

        # Re-upload the annotated page so the function returns a Media.
        return await client.ontology.media.async_upload_media(
            body=annotated_bytes,
            filename=f"page-{page_number}-annotated.png",
        )

    # Process every page concurrently.
    return list(await asyncio.gather(*(annotate_page(p) for p in range(metadata.pages))))
```

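The `parse_hocr_bounding_boxes` helper is omitted here. Any HTML parser (for example, `lxml` or `BeautifulSoup`) can extract the `class="ocrx_word"` elements and their `title="bbox X1 Y1 X2 Y2 ..."` attributes, which you can convert to `BoundingBox(left=X1, top=Y1, width=X2-X1, height=Y2-Y1)`.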
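
One possible shape for such a helper is sketched below using only the standard library's `html.parser`. It is an illustration under assumptions, not the helper used by the examples. `WordBox` and `parse_hocr_words` are hypothetical names, and the sketch returns plain tuples with `left`, `top`, `width`, and `height` fields rather than SDK `BoundingBox` objects. It also assumes word elements contain no unclosed void tags such as `<br>`.

```python
import re
from html.parser import HTMLParser
from typing import NamedTuple, Optional

_BBOX_PATTERN = re.compile(r"bbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")


class WordBox(NamedTuple):
    label: str
    left: int
    top: int
    width: int
    height: int


class _HocrWordParser(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.words: list[WordBox] = []
        self._bbox: Optional[tuple[int, ...]] = None  # bbox of the word being read
        self._text: list[str] = []
        self._depth = 0  # nesting depth of tags inside the current word

    def handle_starttag(self, tag, attrs):
        if self._bbox is not None:
            self._depth += 1  # nested markup such as <strong> inside a word
            return
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        match = _BBOX_PATTERN.search(attributes.get("title") or "")
        if "ocrx_word" in classes and match:
            self._bbox = tuple(int(value) for value in match.groups())
            self._text = []
            self._depth = 0

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._bbox is None:
            return
        if self._depth > 0:
            self._depth -= 1
            return
        # Closing tag of the word element: convert corner coordinates to left/top/width/height.
        x1, y1, x2, y2 = self._bbox
        self.words.append(WordBox("".join(self._text).strip(), x1, y1, x2 - x1, y2 - y1))
        self._bbox = None


def parse_hocr_words(hocr_bytes: bytes) -> list[WordBox]:
    parser = _HocrWordParser()
    parser.feed(hocr_bytes.decode("utf-8"))
    parser.close()
    return parser.words


sample = (
    b"<div class='ocr_page'><span class='ocr_line' title='bbox 10 20 200 40'>"
    b"<span class='ocrx_word' title='bbox 10 20 60 40; x_wconf 96'>Hello</span> "
    b"<span class='ocrx_word' title='bbox 70 20 130 40; x_wconf 93'><strong>world</strong></span>"
    b"</span></div>"
)
words = parse_hocr_words(sample)
```

Each `WordBox` can then be mapped to a `(label, BoundingBox(left=..., top=..., width=..., height=...))` pair in the shape the annotation step expects.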

## TypeScript v1

:::callout{theme="warning"}
Foundry enforces strict memory limits when executing TypeScript v1 functions. To ensure you do not exceed these memory limits, you should only interact with media files under 20MB.
:::

:::callout{theme="warning"}
TypeScript v1 does not support uploading media within functions. The examples below cover passing existing media into Ontology edits and operating on media properties of object types.
:::

### Setting existing media on an object

Use Ontology edit functions to attach an existing media item to an object:

```typescript tab="TypeScript v1"
import { OntologyEditFunction, MediaItem } from "@foundry/functions-api";
import { Aircraft } from "@foundry/ontology-api";

export class MyFunctions {
    @OntologyEditFunction()
    public async setExistingMediaToObject(
        aircraft: Aircraft,
        mediaItem: MediaItem
    ): Promise<void> {
        // Ontology edits are supported with passed-in MediaItems
        aircraft.myMediaProperty = mediaItem;
    }
}
```

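### Media item parameter on object types

The following example shows the `isAudio` media operation on a media reference property of an object type:

```typescript tab="TypeScript v1"
MediaItem.isAudio(objectType.mediaReferenceProperty)
```

### Read raw media data

You can access a media item by selecting the media reference property on an object. The method signature is as follows:

```typescript tab="TypeScript v1"
// Blob is a standard JavaScript type representing a file-like object of immutable, raw data.
// https://developer.mozilla.org/en-US/docs/Web/API/Blob
readAsync(): Promise<Blob>;
```

### Get media metadata

You can access the metadata of a media item. The method signature is as follows:

```typescript tab="TypeScript v1"
getMetadataAsync(): Promise<IMediaMetadata>;
```

### Type guards

Type guards in TypeScript v1 allow you to access functionality specific to certain media types. The following type guards are available for media item metadata: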

  • isAudioMetadata()
  • isDicomMetadata()
  • isDocumentMetadata()
  • isImageryMetadata()
  • isSpreadsheetMetadata()
  • isUntypedMetadata()
  • isVideoMetadata()

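As an example, you could use the imagery type guard to pull out image-specific metadata fields:

```typescript tab="TypeScript v1"
const metadata = await myObject.mediaReference?.getMetadataAsync();
if (isImageryMetadata(metadata)) {
    const imageWidth = metadata.dimensions?.width;
    // ...
}
```

You can also use type guards on the media item namespace, which then gives you access to more methods on the type-specific media item. The type guards you can use here are: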

  • MediaItem.isAudio()
  • MediaItem.isDicom()
  • MediaItem.isDocument()
  • MediaItem.isImagery()
  • MediaItem.isSpreadsheet()
  • MediaItem.isVideo()

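### Document-specific operations

#### Text extraction

To extract text from a document, you can either use optical character recognition (OCR) or extract embedded text on the media item.

For machine-generated PDFs, extracting the text digitally embedded in the PDF may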