How to Use GPT-Image-1 on Azure OpenAI from Python's openai Library [Caveats and Token-Consumption Measurements]

Introduction

GPT-Image-1 is the latest image generation model provided by OpenAI.

This article walks through using GPT-Image-1 from Python's openai library, covering setup, implementation, pricing, and caveats.

Overview of GPT-Image-1

GPT-Image-1 is an image generation model that can generate images from text and edit existing images.
It provides the following two features.

Image Generation
Image Edit

OpenAI official: GPT-Image-1

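Image Generation

An API that generates an image from the text prompt you provide.
gpt-image-1 supports streaming; when streaming is enabled, it can return intermediate (partial) images while generation is still in progress.

Intermediate image

Final image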

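Image Edit

Edits existing images and generates a new image from them.

Edit an image according to the prompt (e.g., "make the background red and add a person")
Composite multiple input images into a new image (e.g., "combine the two images")
Specify a mask image along with the input image so that only the area indicated by the mask is edited (e.g., "keep the background and change the man to a woman")
The mask image is a copy of the input image with part of it made transparent; only the transparent area is edited

The following shows an example of editing an existing image with Image Edit and a mask image.

Original image

Mask image

Edited image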
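
As a rough illustration of the mask described above, the sketch below builds an RGBA PNG that is fully opaque except for one transparent rectangle, using only the standard library. The function name `build_mask_png`, the black fill, and the box coordinates are illustrative assumptions, and the sketch assumes that only the alpha channel of the mask matters. In practice you would more likely make part of the real input image transparent in an image editor.

```python
import struct
import zlib


def _png_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC32 over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def build_mask_png(width: int, height: int, box: tuple[int, int, int, int]) -> bytes:
    """Return an RGBA PNG that is opaque black except for a transparent box.

    box is (left, top, right, bottom); right and bottom are exclusive.
    """
    left, top, right, bottom = box
    if not (0 <= left <= right <= width and 0 <= top <= bottom <= height):
        raise ValueError("box must lie inside the image")

    opaque = b"\x00\x00\x00\xff"       # black, alpha 255 (kept as-is)
    transparent = b"\x00\x00\x00\x00"  # alpha 0 (area to be edited)

    # Each scanline starts with filter byte 0 (no filtering).
    plain_row = b"\x00" + opaque * width
    masked_row = (
        b"\x00"
        + opaque * left
        + transparent * (right - left)
        + opaque * (width - right)
    )
    rows = b"".join(masked_row if top <= y < bottom else plain_row for y in range(height))

    # IHDR: width, height, bit depth 8, color type 6 (RGBA), default compression/filter/interlace
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 6, 0, 0, 0)
    return (
        b"\x89PNG\r\n\x1a\n"
        + _png_chunk(b"IHDR", ihdr)
        + _png_chunk(b"IDAT", zlib.compress(rows))
        + _png_chunk(b"IEND", b"")
    )
```

For a 1024x1024 input image, something like `build_mask_png(1024, 1024, (256, 256, 768, 768))` written to `mask.png` would mark the central square as the editable area; the mask is assumed to need the same dimensions as the input image.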

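Pricing

Azure OpenAI offers the model at the following prices.
Compared with GPT-5, the input text token price is also 4 times higher.
*Prices are for GPT-Image-1 Global.

Model	Version	Status	Input text price [$/1M tokens]	Input image price [$/1M tokens]	Output price [$/1M tokens]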
GPT-Image-1	gpt-image-1	GA	5	10	40
GPT-5	gpt-5-2025-08-07	GA	1.250	-	2.000
GPT-5-mini	gpt-5-mini-2025-08-07	GA	0.025	-	2.000
GPT-5-nano	gpt-5-nano-2025-08-07	GA	0.005	-	0.400

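Estimated price per output image by quality

low (assuming 20 input text tokens and 280 image output tokens)
medium (assuming 20 input text tokens and 1080 image output tokens)
high (assuming 20 input text tokens and 4180 image output tokens)

Quality	Input tokens	Output tokens	Input price ($)	Output price ($)	Total price ($)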
low	20	280	0.0001	0.0112	0.0113
medium	20	1080	0.0001	0.0432	0.0433
high	20	4180	0.0001	0.1672	0.1673

The estimated cost per image is about $0.01 to $0.17 (roughly 1.5 to 25 yen at 150 yen to the dollar)
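
The per-image estimates above can be reproduced with a few lines of arithmetic. This is a minimal sketch: the prices are copied from the GPT-Image-1 row of the pricing table, and the function name `estimate_cost_usd` is only illustrative.

```python
# Prices in USD per 1M tokens, taken from the GPT-Image-1 (Global) row of the pricing table.
INPUT_TEXT_PRICE = 5.0
INPUT_IMAGE_PRICE = 10.0
OUTPUT_PRICE = 40.0


def estimate_cost_usd(text_tokens: int, output_tokens: int, image_tokens: int = 0) -> float:
    """Estimate the cost of one request in USD from its token counts."""
    return (
        text_tokens * INPUT_TEXT_PRICE
        + image_tokens * INPUT_IMAGE_PRICE
        + output_tokens * OUTPUT_PRICE
    ) / 1_000_000


# Same assumptions as the table: 20 input text tokens, no input image.
for quality, output_tokens in {"low": 280, "medium": 1080, "high": 4180}.items():
    print(f"{quality}: ${estimate_cost_usd(20, output_tokens):.4f}")
```

The optional `image_tokens` argument is there for Image Edit requests, where the input image is billed at the input image rate.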

Azure OpenAI pricing

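Main configurable parameters

Parameter	Label	Type / example	Description	Notes
`prompt`	Prompt	`str`	Text description of the image to generate	Required
`background`	Background	`"transparent"` / `"opaque"` / `"auto"`	Background transparency (transparent / opaque / automatic)
`moderation`	Moderation	`"low"` / `"auto"`	Strictness of the content filter	Cannot be turned off. Requests may be rejected by the safety system
`n`	Count	`int` (1-10)	Number of images to generate	Up to 10
`output_compression`	Output compression	`int` (0-100)	Compression level of the image (jpeg output only)
`output_format`	Output format	`"png"` / `"jpeg"`	File format of the output image
`quality`	Quality	`"low"` / `"medium"` / `"high"` / `"auto"`	Image quality: trades compute usage against rendering detail
`input_fidelity`	Input fidelity	`"low"` / `"high"`	How closely to preserve facial features, textures, and similar details of the existing image
`size`	Resolution	`'1024x1024'`, `'1024x1536'`, `'1536x1024'`, etc.	Image resolution
`stream`	Stream	`bool` (`True` / `False`)	Whether to receive the image via streaming	Partial images can be sent when streaming
`partial_images`	Partial images	`int` (0-3)	Number of partial images to send when streaming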
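
Because invalid values only surface as a 400 error after the request is sent, it can help to check them locally first. The following is a minimal sketch based on the table above; `validate_image_params` is a hypothetical helper, not part of the openai library, and it deliberately leaves out `size` and `input_fidelity`. The rule that `n` must be 1 when streaming follows the comments in this article's script.

```python
ALLOWED_VALUES = {
    "background": {"transparent", "opaque", "auto"},
    "moderation": {"low", "auto"},
    "output_format": {"png", "jpeg"},
    "quality": {"low", "medium", "high", "auto"},
}
ALLOWED_RANGES = {
    "n": (1, 10),
    "output_compression": (0, 100),
    "partial_images": (0, 3),
}


def validate_image_params(params: dict) -> list[str]:
    """Return a list of problems found; an empty list means the values look valid."""
    problems = []
    for name, value in params.items():
        if name in ALLOWED_VALUES and value not in ALLOWED_VALUES[name]:
            problems.append(f"{name}={value!r} is not one of {sorted(ALLOWED_VALUES[name])}")
        elif name in ALLOWED_RANGES:
            low, high = ALLOWED_RANGES[name]
            if not isinstance(value, int) or not low <= value <= high:
                problems.append(f"{name}={value!r} is outside {low}-{high}")
    # The script in this article always sends n=1 when streaming.
    if params.get("stream") and params.get("n", 1) != 1:
        problems.append("n must be 1 when stream is True")
    return problems


print(validate_image_params({"quality": "ultra", "n": 20}))
```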

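Caveats

The default token quota for GPT-Image-1 on Azure OpenAI is capped at 20K, so heavy use quickly hits the quota limit
- Token consumption measured in actual use was as follows
- quality low: about 300 tokens per image
- quality medium: about 1,100 tokens per image
- quality high: about 4,200 tokens per image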
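
To get a feel for how quickly the quota runs out, the measured figures above can be turned into a rough images-per-quota estimate. This is a sketch under simple assumptions: the 20K quota and the per-image token counts are the approximate values quoted above, and `images_within_quota` is an illustrative name.

```python
TOKEN_QUOTA = 20_000  # default quota quoted in this article
TOKENS_PER_IMAGE = {"low": 300, "medium": 1100, "high": 4200}  # approximate measured values


def images_within_quota(quality: str, quota: int = TOKEN_QUOTA) -> int:
    """Return how many images of the given quality fit into the token quota."""
    return quota // TOKENS_PER_IMAGE[quality]


for quality in TOKENS_PER_IMAGE:
    print(f"{quality}: about {images_within_quota(quality)} images")
```

With these numbers, only a handful of high-quality images fit into the default quota, which is why testing at low quality first is attractive.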
When using the Edit API, the deployment name must be set to the same value as the model name, gpt-image-1; otherwise the API fails with the following error

openai.BadRequestError: Error code: 400 - {'error': {'message': "Model not supported with Responses API. Supported models are: ['gpt-image-1', 'gpt-image-0721-mini-alpha']", 'type': 'invalid_request_error', 'param': None, 'code': None}}

response_format cannot be specified; images are returned only as base64 (URL responses, as with DALL-E, are not supported)

unknown parameter: 'response_format'.", 'type': 'invalid_request_error', 'param': 'response_format', 'code': 'unknown_parameter'
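
Since the image always arrives as a base64 string, the client has to decode it before saving. A minimal sketch of that step is shown below; `save_b64_image` is an illustrative helper, and the payload in the demo is dummy data rather than a real API response.

```python
import base64
import os
import tempfile
from pathlib import Path


def save_b64_image(b64_json: str, path: str) -> int:
    """Decode a base64-encoded image, write it to path, and return the byte count."""
    image_bytes = base64.b64decode(b64_json)
    Path(path).write_bytes(image_bytes)
    return len(image_bytes)


# Demo with a dummy payload standing in for the b64_json field of a response.
dummy_payload = base64.b64encode(b"not a real image").decode("ascii")
with tempfile.TemporaryDirectory() as tmp_dir:
    size = save_b64_image(dummy_payload, os.path.join(tmp_dir, "image_0.png"))
    print(f"saved {size} bytes")
```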

moderation is configured separately from Azure's content filter and, apparently for safety reasons, cannot be turned off; requests judged inappropriate may be rejected

### Without streaming

openai.BadRequestError: Error code: 400 - {'error': {'message': 'Your request was rejected by the safety system. If you believe this is an error, contact us at oai-support@microsoft.com and include the request ID xxxxxx. safety_violations=[illicit].', 'type': 'image_generation_user_error', 'param': None, 'code': 'moderation_blocked'}} 

### With streaming
openai.APIError: Your request was rejected by the safety system. If you believe this is an error, contact us at oai-support@microsoft.com and include the request ID xxxxxx."
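
The two rejection messages differ: the streaming one does not include the `moderation_blocked` code. A check that works on the error text for both cases might look like the sketch below. `is_safety_rejection` is a hypothetical helper that takes the exception converted to a string, for example `str(e)`.

```python
SAFETY_MESSAGE = "Your request was rejected by the safety system"


def is_safety_rejection(error_text: str) -> bool:
    """Return True if the error text looks like a moderation rejection."""
    return "moderation_blocked" in error_text or SAFETY_MESSAGE in error_text


streaming_error = "Your request was rejected by the safety system. If you believe this is an error, ..."
print(is_safety_rejection(streaming_error))
print(is_safety_rejection("unknown parameter: 'response_format'."))
```

Matching on message text is brittle if the wording changes, so treat this as a convenience for logging rather than a guarantee.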

Calling the API from Python code

Prepare the following two files.

requirements.txt

azure-identity
aiohttp
openai

main.py

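The script prompts interactively for the parameters needed for image generation and then generates the image. You can choose between two modes: generate (image generation) and edit (image editing). Replace the TODO values for the OpenAI resource name and API key with the values for your own environment.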

import asyncio
import base64
import traceback
import time
import os
from typing import Union
from openai import (
    AsyncAzureOpenAI,
    BadRequestError,
    APIError,
)
from azure.identity.aio import (
    DefaultAzureCredential,
    get_bearer_token_provider
)

SCOPES = "https://cognitiveservices.azure.com/.default"
OPENAI_RESOURCE_NAME = "TODO"
ENDPOINT = f"https://{OPENAI_RESOURCE_NAME}.openai.azure.com/"
MODEL_DEPLOYMENT_NAME = "gpt-image-1"
API_VERSION = "2025-04-01-preview"
API_KEY = "TODO"
OUTPUT = "./output"
BASENAME = "image"
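# Default prompts (written in Japanese; the English meaning is given in the comments)
# "Create an icon for a generative AI chatbot"
PROMPT = "生成AIチャットボットのアイコンを作成せよ"
# "Combine the two images and make the background gray"
EDIT_PROMPT = "2つの画像を組み合わせて、背景の色を灰色にして"
# "A smiling dog"
MASK_PROMPT = "笑顔の犬"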

async def generate_image(
        user_prompt: str,
        background: str = "auto",
        moderation: str = "low",
        output_compression: int = 100,
        output_format: str = "png",
        output_file_basename: str = "image",
        quality: str = "low",
        timeout: int = 600,
        n: int = 1,
        size: str = "1024x1024",
        stream_flag: bool = False,
        partial_images: int = 0
    ):
    """Generate images.
    Args:
        user_prompt (str): User prompt.
        background (str, optional): transparent, opaque, or auto. Defaults to "auto".
        moderation (str, optional): Strictness of moderation for generated images. Defaults to "low".
        output_compression (int, optional): 0-100, effective only for jpeg. Defaults to 100.
        output_format (str, optional): png or jpeg. Defaults to "png".
        output_file_basename (str, optional): Base name for the generated image files. Defaults to "image".
        quality (str, optional): low, medium, high, auto. Defaults to "low".
        timeout (int, optional): Timeout in seconds. Defaults to 600.
        n (int, optional): Number of images to generate: 1 - 10. Defaults to 1.
        size (str, optional): auto, 1024x1024, 1024x1536, 1536x1024. Defaults to "1024x1024".
        stream_flag (bool, optional): Streaming flag. Defaults to False.
        partial_images (int, optional): Number of partial images to generate, <=3 (streaming only). Defaults to 0.
    Returns:
        list: File names of the generated images
    """
    print("Begin to generate image")

    image_file_names = []

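    # Credential and token provider for Microsoft Entra ID (keyless) authentication.
    # They are not used below because the client authenticates with API_KEY; to use them,
    # pass azure_ad_token_provider=token_provider to AsyncAzureOpenAI instead of api_key.
    credential = DefaultAzureCredential()
    # noinspection PyTypeChecker
    token_provider = get_bearer_token_provider(
        credential,
        SCOPES
    )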
    client = AsyncAzureOpenAI(
        azure_endpoint=ENDPOINT,
        api_key=API_KEY,
        api_version=API_VERSION
    )

    # Start measuring execution time
    begin_time = time.time()
    total_tokens = 0

    try:
        if stream_flag:
            print("Streaming mode")
            file_basename = f"{output_file_basename}_streaming_{quality}"
            # noinspection PyTypeChecker
            response = await client.images.generate(
                model=MODEL_DEPLOYMENT_NAME,
                prompt=user_prompt,
                background=background,
                moderation=moderation,
                output_compression=output_compression,
                output_format=output_format,
                quality=quality,
                timeout=timeout,
                size=size,
                stream=stream_flag,
                n=n, # number of images to generate: must be 1 when streaming
                partial_images=partial_images # number of partial images (<=3), streaming only
            )
            print(f"response [{type(response)}], attributes: [{response.__dict__.keys()}]")
            image_size = 0
            # openai.AsyncStream
            async for event in response:
                created_at = event.created_at
                print(f"[{type(event)}]: event_type:[{event.type}]")

                # openai.types.image_gen_partial_image_event.ImageGenPartialImageEvent
                if event.type == "image_generation.partial_image":
                    idx = event.partial_image_index
                    image_base64 = event.b64_json
                    image_bytes = base64.b64decode(image_base64)
                    image_size = len(image_bytes)
                    file_name = f"{file_basename}_{idx}.{output_format}"
                    with open(file_name, "wb") as f:
                        f.write(image_bytes)
                        print(f"No [{idx}] image saved as [{file_name}], size: [{image_size}] bytes")
                    image_file_names.append(file_name)

                # openai.types.image_gen_completed_event.ImageGenCompletedEvent
                elif event.type == "image_generation.completed":
                    image_base64 = event.b64_json
                    image_bytes = base64.b64decode(image_base64)
                    image_size = len(image_bytes)
                    file_name = f"{file_basename}.{output_format}"
                    with open(file_name, "wb") as f:
                        f.write(image_bytes)
                        print(f"Last image saved as [{file_name}], size: [{image_size}] bytes")
                    image_file_names.append(file_name)

                    # Token usage
                    input_token = event.usage.input_tokens
                    input_image_token = event.usage.input_tokens_details.image_tokens
                    input_text_token = event.usage.input_tokens_details.text_tokens
                    output_token = event.usage.output_tokens
                    total_tokens = input_token + output_token
                    print(
                        f"[{created_at}], quality: [{quality}], size: [{size}], compression: [{output_compression}], input_token: [{input_token}], input_image_token: [{input_image_token}], input_text_token: [{input_text_token}], output_token: [{output_token}], total_tokens: [{total_tokens}], image_size: [{image_size}] bytes")
        else:
            print("Non streaming mode")
            file_basename = f"{output_file_basename}_non_streaming_{quality}"
            # openai.types.images_response.ImagesResponse
            # noinspection PyTypeChecker
            response = await client.images.generate(
                model=MODEL_DEPLOYMENT_NAME,
                prompt=user_prompt,
                background=background,
                moderation=moderation,
                output_compression=output_compression,
                output_format=output_format,
                quality=quality,
                timeout=timeout,
                n=n,
                size=size,
                stream=stream_flag
            )
            print(f"response [{type(response)}], attributes: [{response.__dict__.keys()}]")
            created = response.created
            input_token = response.usage.input_tokens
            input_image_token = response.usage.input_tokens_details.image_tokens
            input_text_token = response.usage.input_tokens_details.text_tokens
            output_token = response.usage.output_tokens
            total_tokens = input_token + output_token

            image_data_list = response.data
            for idx, image in enumerate(image_data_list):
                image_base64 = image.b64_json
                image_bytes = base64.b64decode(image_base64)
                image_size = len(image_bytes)
                file_name = f"{file_basename}_{idx}.{output_format}"
                with open(file_name, "wb") as f:
                    f.write(image_bytes)
                    print(f"Image [{idx}] saved as [{file_name}]")
                print(f"[{created}], No: [{idx}], quality: [{response.quality}], size: [{response.size}], compression: [{output_compression}], input_token: [{input_token}], input_image_token: [{input_image_token}], input_text_token: [{input_text_token}], output_token: [{output_token}], total_tokens: [{total_tokens}], image_size: [{image_size}] bytes")
                image_file_names.append(file_name)

    except Exception as e:
        stack_traces = list(traceback.TracebackException.from_exception(e).format())
        exception_message = stack_traces[-1].replace("\n", "")

        if isinstance(e, (BadRequestError, APIError)):
            # The request was rejected by the safety system (moderation)
            if "Your request was rejected by the safety system" in exception_message:
                print(f"Warning: Your request was rejected by the safety system. <{user_prompt}>")
            else:
                print(f"API error: [{exception_message}] <{stack_traces}>")
        else:
            print(f"Error: [{exception_message}] <{stack_traces}>")

    finally:
        # Stop measuring execution time
        end_time = time.time()
        execution_time = end_time - begin_time
        await client.close()
        await credential.close()
        print("Client closed")
        print(f"End to generate image. execution time: [{execution_time}] seconds, list [{image_file_names}], total_tokens: [{total_tokens}]")

    # Return the saved file names, as documented in the docstring
    return image_file_names

# noinspection PyTypeChecker
async def edit_image(
        image: list,
        user_prompt: str,
        background: str = "auto",
        output_compression: int = 100,
        output_format: str = "png",
        output_file_basename: str = "image",
        input_fidelity: str ="low",
        quality: str = "low",
        timeout: int = 600,
        n: int = 1,
        size: str = "1024x1024",
        stream_flag: bool = False,
        partial_images: int = 0,
        mask: Union[tuple, str, bytes, bytearray, None] = None
):
    """Edit images.
    Args:
        image: Union[FileTypes, SequenceNotStr[FileTypes]]: Image file or list of image files.
        user_prompt (str): User prompt.
        background (str, optional): transparent, opaque, or auto. Defaults to "auto".
        output_compression (int, optional): 0-100, effective only for jpeg. Defaults to 100.
        output_format (str, optional): png or jpeg. Defaults to "png".
        output_file_basename (str, optional): Base name for the generated image files. Defaults to "image".
        input_fidelity (str, optional): Fidelity to the input image: low, high. Defaults to "low".
        quality (str, optional): low, medium, high, auto. Defaults to "low".
        timeout (int, optional): Timeout in seconds. Defaults to 600.
        n (int, optional): Number of images to generate: 1 - 10. Defaults to 1.
        size (str, optional): auto, 1024x1024, 1024x1536, 1536x1024. Defaults to "1024x1024".
        stream_flag (bool, optional): Streaming flag. Defaults to False.
        partial_images (int, optional): Number of partial images to generate, <=3 (streaming only). Defaults to 0.
        mask (FileTypes | None, optional): Mask image file. Defaults to None.
    Returns:
        list: File names of the generated images
    """
    print("Begin to edit image")

    image_file_names = []

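    # Credential and token provider for Microsoft Entra ID (keyless) authentication.
    # They are not used below because the client authenticates with API_KEY; to use them,
    # pass azure_ad_token_provider=token_provider to AsyncAzureOpenAI instead of api_key.
    credential = DefaultAzureCredential()
    # noinspection PyTypeChecker
    token_provider = get_bearer_token_provider(
        credential,
        SCOPES
    )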
    client = AsyncAzureOpenAI(
        azure_endpoint=ENDPOINT,
        api_key=API_KEY,
        api_version=API_VERSION
    )

    # Start measuring execution time
    begin_time = time.time()
    total_tokens = 0

    try:
        if stream_flag:
            print("Streaming mode")
            file_basename = f"{output_file_basename}_streaming_{quality}_{input_fidelity}"
            print(f"file_basename: {file_basename}, mask: {mask is not None}")
            if mask is None:
                # noinspection PyTypeChecker
                response = await client.images.edit(
                    model=MODEL_DEPLOYMENT_NAME,
                    prompt=user_prompt,
                    background=background,
                    image=image,
                    input_fidelity=input_fidelity,
                    output_compression=output_compression,
                    output_format=output_format,
                    quality=quality,
                    timeout=timeout,
                    size=size,
                    stream=stream_flag,
                    n=n, # number of images to generate: must be 1 when streaming
                    partial_images=partial_images # number of partial images (<=3), streaming only
                )
            else:
                # noinspection PyTypeChecker
                response = await client.images.edit(
                    model=MODEL_DEPLOYMENT_NAME,
                    prompt=user_prompt,
                    background=background,
                    image=image,
                    input_fidelity=input_fidelity,
                    output_compression=output_compression,
                    output_format=output_format,
                    quality=quality,
                    timeout=timeout,
                    size=size,
                    stream=stream_flag,
                    n=n, # number of images to generate: must be 1 when streaming
                    partial_images=partial_images, # number of partial images (<=3), streaming only
                    mask=mask
                )
            print(f"response [{type(response)}], attributes: [{response.__dict__.keys()}]")
            image_size = 0
            # openai.AsyncStream
            async for event in response:
                created_at = event.created_at
                print(f"[{type(event)}]: event_type:[{event.type}]")

                # openai.types.image_edit_partial_image_event.ImageEditPartialImageEvent
                if event.type == "image_edit.partial_image":
                    idx = event.partial_image_index
                    image_base64 = event.b64_json
                    image_bytes = base64.b64decode(image_base64)
                    image_size = len(image_bytes)
                    file_name = f"{file_basename}_{idx}.{output_format}"
                    with open(file_name, "wb") as f:
                        f.write(image_bytes)
                        print(f"No [{idx}] image saved as [{file_name}], size: [{image_size}] bytes")
                    image_file_names.append(file_name)

                # openai.types.image_edit_completed_event.ImageEditCompletedEvent
                elif event.type == "image_edit.completed":
                    image_base64 = event.b64_json
                    image_bytes = base64.b64decode(image_base64)
                    image_size = len(image_bytes)
                    file_name = f"{file_basename}.{output_format}"
                    with open(file_name, "wb") as f:
                        f.write(image_bytes)
                        print(f"Last image saved as [{file_name}], size: [{image_size}] bytes")
                    image_file_names.append(file_name)

                    # Token usage
                    input_token = event.usage.input_tokens
                    input_image_token = event.usage.input_tokens_details.image_tokens
                    input_text_token = event.usage.input_tokens_details.text_tokens
                    output_token = event.usage.output_tokens
                    total_tokens = input_token + output_token
                    print(
                        f"[{created_at}], quality: [{quality}], size: [{size}], compression: [{output_compression}], input_token: [{input_token}], input_image_token: [{input_image_token}], input_text_token: [{input_text_token}], output_token: [{output_token}], total_tokens: [{total_tokens}], image_size: [{image_size}] bytes")
        else:
            print("Non streaming mode")
            file_basename = f"{output_file_basename}_non_streaming_{quality}_{input_fidelity}"
            print(f"file_basename: {file_basename}, mask: {mask is not None}")
            # openai.types.images_response.ImagesResponse
            # noinspection PyTypeChecker
            if mask is None:
                response = await client.images.edit(
                    model=MODEL_DEPLOYMENT_NAME,
                    prompt=user_prompt,
                    background=background,
                    image=image,
                    input_fidelity=input_fidelity,
                    output_compression=output_compression,
                    output_format=output_format,
                    quality=quality,
                    timeout=timeout,
                    size=size,
                    stream=stream_flag,
                    n=n # number of images to generate (1-10 without streaming)
                )
            else:
                response = await client.images.edit(
                    model=MODEL_DEPLOYMENT_NAME,
                    prompt=user_prompt,
                    background=background,
                    image=image,
                    input_fidelity=input_fidelity,
                    output_compression=output_compression,
                    output_format=output_format,
                    quality=quality,
                    timeout=timeout,
                    size=size,
                    stream=stream_flag,
                    n=n, # number of images to generate (1-10 without streaming)
                    mask=mask
                )
            print(f"response [{type(response)}], attributes: [{response.__dict__.keys()}]")
            created = response.created
            input_token = response.usage.input_tokens
            input_image_token = response.usage.input_tokens_details.image_tokens
            input_text_token = response.usage.input_tokens_details.text_tokens
            output_token = response.usage.output_tokens
            total_tokens = input_token + output_token

            image_data_list = response.data
            for idx, image in enumerate(image_data_list):
                image_base64 = image.b64_json
                image_bytes = base64.b64decode(image_base64)
                image_size = len(image_bytes)
                file_name = f"{file_basename}_edit_{quality}_{idx}.{output_format}"
                with open(file_name, "wb") as f:
                    f.write(image_bytes)
                    print(f"Image [{idx}] saved as [{file_name}]")
                print(f"[{created}], No: [{idx}], quality: [{response.quality}], size: [{response.size}], compression: [{output_compression}], input_token: [{input_token}], input_image_token: [{input_image_token}], input_text_token: [{input_text_token}], output_token: [{output_token}], total_tokens: [{total_tokens}], image_size: [{image_size}] bytes")
                image_file_names.append(file_name)

    except Exception as e:
        stack_traces = list(traceback.TracebackException.from_exception(e).format())
        exception_message = stack_traces[-1].replace("\n", "")

        if isinstance(e, (BadRequestError, APIError)):
            # The request was rejected by the safety system (moderation)
            if "Your request was rejected by the safety system" in exception_message:
                print(f"Warning: Your request was rejected by the safety system. <{user_prompt}>")
            else:
                print(f"API error: [{exception_message}] <{stack_traces}>")
        else:
            print(f"Error: [{exception_message}] <{stack_traces}>")

    finally:
        # Stop measuring execution time
        end_time = time.time()
        execution_time = end_time - begin_time
        await client.close()
        await credential.close()
        print("Client closed")
        print(f"End to edit image. execution time: [{execution_time}] seconds, list [{image_file_names}], total_tokens: [{total_tokens}]")

    # Return the saved file names, as documented in the docstring
    return image_file_names


if __name__ == "__main__":
    user_prompt = PROMPT
    images = []
    input_fidelity = "low"
    output_image_file_base = f"{BASENAME}"
    mask_file_type = None

    if not os.path.exists(OUTPUT):
        os.makedirs(OUTPUT)
        print(f"Directory {OUTPUT} created.")

    # User input
    prompt_type = input("please input prompt type [generate, edit] (default: generate): ") or "generate"
    if prompt_type == "generate":
        user_prompt = input(f"please input prompt (default: {PROMPT}): ") or PROMPT
    else:
        input_fidelity = input("please input input fidelity [low, high] (default: low): ") or "low"
        if prompt_type == "edit":
            user_prompt = input(f"please input edit prompt (default: {EDIT_PROMPT}): ") or EDIT_PROMPT

            # Default image paths
            default_path = "./image_0.png"
            default_mask_path = "./mask.png"

            # Read the input image file paths from text input
            continue_response = "y"
            while continue_response.lower() == 'y':
                exist_file_path = input(f"Please input file path (default: {default_path}): ") or default_path
                with open(exist_file_path, "rb") as image_file:
                    image_data = image_file.read()
                    file_type = (os.path.basename(exist_file_path), image_data, "image/png")
                    images.append(file_type)
                continue_response = input("Do you want to add another image? [y/N]: ") or "N"

            # Mask image path (optional; press Enter to skip)
            mask_file_path = input(f"Please input mask file path (default: None): ") or None
            if mask_file_path is not None:
                with open(mask_file_path, "rb") as mask_file:
                    mask_data = mask_file.read()
                    mask_file_type = (os.path.basename(mask_file_path), mask_data, "image/png")

    stream_flag_str = input(f"please input stream flag [True, False] (default: False): ") or "False"
    stream_flag = bool(stream_flag_str == "True")
    if stream_flag:
        n = 1
        partial_images = int(input(f"please input number of partial images to generate (1-3) (streaming only) (default: 2): ") or 2)
    else:
        n = int(input(f"please input number of images to generate (1-10) (default: 1): ") or 1)
        partial_images = 0
    quality = input(f"please input quality [low, medium, high, auto] (default: low): ") or "low"
    background = input(f"please input background [transparent, opaque, auto] (default: auto): ") or "auto" # transparent, opaque, or auto
    moderation = input(f"please input moderation [low, auto] (default: low): ") or "low" # strictness of moderation for generated images
    output_format = input(f"please input output format [png, jpeg] (default: png): ") or "png"
    if output_format == "jpeg":
        output_compression = int(input(f"please input output compression [0-100] (default: 100): ") or 100)
    else:
        output_compression = 100
    output_image_file_base = input(f"please input output image file base name (default: {BASENAME}): ") or BASENAME
    output_image_file_base = f"{OUTPUT}/{output_image_file_base}"
    size = input(f"please input image size [auto, 1024x1024, 1024x1536, 1536x1024] (default: 1024x1024): ") or "1024x1024"
    timeout = int(input(f"please input timeout in seconds (default: 600): ") or 600)

    # Image generation
    if prompt_type == "generate":
        asyncio.run(generate_image(
            user_prompt=user_prompt,
            background=background,
            moderation=moderation,
            output_compression=output_compression,
            output_format=output_format,
            output_file_basename=output_image_file_base,
            quality=quality,
            timeout=timeout,
            n=n,
            size=size,
            stream_flag=stream_flag,
            partial_images=partial_images
        ))

    # Edit mode ----------------------------------------------------------------------------
    # e.g., combine two images
    elif prompt_type == "edit":
        asyncio.run(edit_image(
            image=images,
            user_prompt=user_prompt,
            background=background,
            output_compression=output_compression,
            output_format=output_format,
            output_file_basename=output_image_file_base,
            input_fidelity=input_fidelity,
            quality=quality,
            timeout=timeout,
            n=n,
            size=size,
            stream_flag=stream_flag,
            partial_images=partial_images,
            mask=mask_file_type
        ))

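Measured token consumption and performance

The following are the measured changes in token counts when each GPT-Image-1 parameter was varied on Azure OpenAI.

Image generation (Generate)

Changing the quality

Results when the quality is changed from low → medium → high.

Input tokens: no change
Output tokens: large increase
Execution time: increase (mainly at high)

API	Streaming	Images	Format	Quality	Fidelity	Resolution	Compression	Input tokens	Output tokens	Image size (bytes)	Execution time (s)
Generate	No	1	png	low	-	1024x1024	100	20	272	1,104,884	23
Generate	No	1	png	medium	-	1024x1024	100	20	1056	1,255,448	24
Generate	No	1	png	high	-	1024x1024	100	20	4160	1,339,727	39

Raising the quality sharply increases the number of output tokens
Execution time also tends to grow with quality: low → medium is nearly unchanged (23 → 24 s), while high takes longer (39 s)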

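Changing the resolution

Results when the resolution is raised from 1024x1024 → 1536x1024.

Input tokens: no change
Output tokens: increase
Execution time: no change

API	Streaming	Images	Format	Quality	Fidelity	Resolution	Compression	Input tokens	Output tokens	Image size (bytes)	Execution time (s)
Generate	No	1	png	low	-	1024x1024	100	20	272	1,104,884	23
Generate	No	1	png	low	-	1536x1024	100	20	400	1,317,319	23

Raising the resolution from 1024x1024 → 1536x1024 increases token consumption (272 → 400 output tokens); the change in execution time is negligible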

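Changing the output image format

Results when the image format is changed from PNG to JPEG.

Input tokens: no change
Output tokens: no change
Execution time: decrease

API	Streaming	Images	Format	Quality	Fidelity	Resolution	Compression	Input tokens	Output tokens	Image size (bytes)	Execution time (s)
Generate	No	1	png	low	-	1024x1024	100	20	272	1,104,884	23
Generate	No	1	jpeg	low	-	1024x1024	100	20	272	62,478	13

Even with JPEG, which has lower image quality than PNG, the token count is the same; execution time was shorter with JPEG (23 → 13 s)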

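Changing the compression level

Results when the compression level is changed to 10%.

Input tokens: no change
Output tokens: no change
Execution time: almost no change

API	Streaming	Images	Format	Quality	Fidelity	Resolution	Compression	Input tokens	Output tokens	Image size (bytes)	Execution time (s)
Generate	No	1	jpeg	low	-	1024x1024	100	20	272	62,478	13
Generate	No	1	jpeg	low	-	1024x1024	10	20	272	28,813	14

Lowering the compression level causes no significant change in token count or execution time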

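Changing the number of output images

Results when the number of output images is changed.

Input tokens: increase
Output tokens: increase
Execution time: increase (clearly visible at 10 images)

API	Streaming	Images	Format	Quality	Fidelity	Resolution	Compression	Input tokens	Output tokens	Image size (bytes)	Execution time (s)
Generate	No	1	png	low	-	1024x1024	100	20	272	1,104,884	23
Generate	No	2	png	low	-	1024x1024	100	40	584	745,768 to 950,497	21
Generate	No	10	png	low	-	1024x1024	100	200	2720	918,774 to 1,119,246	40

Input tokens increase in proportion to the number of output images (the input prompt does not appear to be reused and is consumed for every image generated)
Output tokens also increase roughly in proportion to the number of output images, and execution time grows as well (40 s for 10 images)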

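With streaming enabled

Results when streaming is enabled and the number of intermediate images is varied.

Input tokens: no change
Output tokens: increase
Execution time: almost no change

API	Streaming	Images	Format	Quality	Fidelity	Resolution	Compression	Input tokens	Output tokens	Image size (bytes)	Execution time (s)
Generate	No	1	png	low	-	1024x1024	100	20	272	1,104,884	23
Generate	Yes	1	png	low	-	1024x1024	100	20	372	1,273,421	21
Generate	Yes	2	png	low	-	1024x1024	100	20	472	1,205,977	22
Generate	Yes	3	png	low	-	1024x1024	100	20	472	1,345,869	20

Output tokens are higher with streaming than without; execution time is almost the same
Raising the number of intermediate images from 1 to 2 increases output tokens (372 → 472), while going to 3 showed no further increase in this measurement (472); execution time is almost the same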

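Image editing (Edit)

Changing the fidelity

Results when the input fidelity is changed while editing an existing image.
*The existing image used is a PNG file of about 1.2 MB.

Input tokens: large increase
Output tokens: no change
Execution time: slight increase

API	Streaming	Images	Format	Quality	Fidelity	Resolution	Compression	Input tokens	Output tokens	Image size (bytes)	Execution time (s)
Generate	No	1	png	low	-	1024x1024	100	20	272	1,104,884	23
Edit	No	1	png	low	low	1024x1024	100	444	272	1,136,540	31
Edit	No	1	png	low	high	1024x1024	100	8708	272	1,228,977	37

Image Edit consumes more input tokens than text-only Image Generation because the input image is counted as well
Raising the fidelity from low → high makes input tokens jump (444 → 8,708); execution time does not change much (31 → 37 s)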

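Summary

With GPT-Image-1 on Azure OpenAI, you can implement image generation and editing directly from Python.
Token usage varies greatly with the resolution and quality settings, which in turn changes cost and processing time significantly, so it is best to test first at low resolution and low quality and then adjust to your requirements.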