Friday, April 25, 2025

How to Generate and Edit Images Using OpenAI gpt-image-1 API

The last time OpenAI’s ChatGPT introduced an image generation model, it quickly went viral across the internet. People were captivated by the ability to create Ghibli-style portraits of themselves, turning personal memories into animated artwork. Now, ChatGPT is taking things a step further with a new natively multimodal model, “gpt-image-1”, which powers image generation directly within ChatGPT and is now available via API. In this article, we will explore the key features of OpenAI’s gpt-image-1 model and how to use it for image generation and editing.

What is gpt-image-1?

gpt-image-1 is the latest and most advanced multimodal language model from OpenAI. It stands out for its ability to generate high-quality images while incorporating real-world knowledge into the visual content. While gpt-image-1 is recommended for its robust performance, the Image API also supports other specialized models like DALL·E 2 and DALL·E 3.

OpenAI gpt-image-1
Source: OpenAI

The Image API provides three key endpoints, each designed for specific tasks:

  • Generations: Create images from scratch using a text prompt.
  • Edits: Modify existing images using a new prompt, either partially or entirely.
  • Variations: Generate variations of an existing image (available with DALL·E 2 only).

OpenAI gpt-image-1 API - endpoints
Source: OpenAI

Key Features of gpt-image-1

gpt-image-1 offers several key features:

  • High-fidelity images: Produces detailed and accurate visuals.
  • Diverse visual styles: Supports a wide range of aesthetics, from photorealistic to abstract.
  • Precise image editing: Enables targeted modifications to generated images.
  • Rich world knowledge: Understands complex prompts with contextual accuracy.
  • Consistent text rendering: Renders text within images reliably.

Availability

The OpenAI API lets users generate and edit images from text prompts using the GPT Image or DALL·E models. At present, image generation is available only through the Image API, though support for the Responses API is actively being developed.

To learn more about gpt-image-1, see OpenAI’s official documentation.

gpt-image-1 Pricing

Before diving into how to use and deploy the model, it’s important to understand the pricing so you can use it effectively and within budget.

The gpt-image-1 model is priced per token, with different rates for text and image tokens:

  • Text input tokens (prompts): $5 per 1M tokens
  • Image input tokens (uploaded images): $10 per 1M tokens
  • Image output tokens (generated images): $40 per 1M tokens

In practical terms, this roughly equates to:

  • ~$0.02 for a low-quality square image
  • ~$0.07 for a medium-quality square image
  • ~$0.19 for a high-quality square image

For more detailed pricing by image quality and resolution, refer to the official OpenAI pricing page.

OpenAI gpt-image-1 API - image sizes and pricing
Source: OpenAI

Note: This model generates images by first creating specialized image tokens. Therefore, both latency and overall cost depend on the number of tokens used. Larger image dimensions and higher quality settings require more tokens, increasing both time and cost.
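To make the per-token arithmetic concrete, here is a minimal sketch that turns token counts into an estimated cost using the three rates listed above. The function name and the token counts in the example call are hypothetical, chosen only to illustrate the calculation; actual token usage depends on the prompt, image size, and quality setting.

```python
# Rates quoted in this article, in USD per 1M tokens.
TEXT_INPUT_RATE = 5.00
IMAGE_INPUT_RATE = 10.00
IMAGE_OUTPUT_RATE = 40.00


def estimate_cost(text_in: int = 0, image_in: int = 0, image_out: int = 0) -> float:
    """Return the estimated USD cost of one request from its token counts."""
    total = (
        text_in * TEXT_INPUT_RATE
        + image_in * IMAGE_INPUT_RATE
        + image_out * IMAGE_OUTPUT_RATE
    )
    return total / 1_000_000


# Hypothetical token counts, used only to show the arithmetic.
print(f"${estimate_cost(text_in=100, image_out=4000):.4f}")
```

Because output tokens carry the highest rate, the size and quality of the generated image will usually dominate the bill, not the length of the prompt.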

How to Access gpt-image-1?

To generate the API key for gpt-image-1:

  1. Sign in to the OpenAI platform
  2. Go to Project > API Keys
  3. Verify your account

For this, first go to https://platform.openai.com/settings/organization/general. Then click “Verify Organization” to start the verification process. It is quite similar to any KYC verification: depending on the country, you will be asked to upload a photo ID and then verify it with a selfie.

You can follow the documentation provided by OpenAI to better understand the verification process.

gpt-image-1: Hands-on Application

Finally, it’s time to see how we can generate images using the gpt-image-1 API.

We will be using the image generation endpoint to create images based on text prompts. By default, the API returns a single image, but we can set the n parameter to generate multiple images at once in a single request.

Before running our main code, we first need to install the package and set up the environment.

    # Notebook syntax; in a terminal, run "pip install openai" instead
    !pip install openai

    import os

    # Paste your API key between the quotes
    os.environ['OPENAI_API_KEY'] = ""

Generating Images Using gpt-image-1

Now, let’s try generating an image using this new model.

Input Code:

    from openai import OpenAI
    import base64

    client = OpenAI()

    prompt = """
    A serene, peaceful park scene where humans and friendly robots are enjoying the day together - some are walking, others are playing games or sitting on benches under trees. The atmosphere is warm and harmonious, with soft sunlight filtering through the leaves.
    """

    result = client.images.generate(
        model="gpt-image-1",
        prompt=prompt
    )

    # The API returns the image as a base64-encoded string
    image_base64 = result.data[0].b64_json
    image_bytes = base64.b64decode(image_base64)

    # Save the image to a file
    with open("utter_bliss.png", "wb") as f:
        f.write(image_bytes)

Output:

image generated using OpenAI gpt-image-1 API
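To request several images in one call, the n parameter mentioned earlier can be passed to the same endpoint. The sketch below is one way to do that; the helper names generate_images and save_images and the output file names are hypothetical, and the client is passed in as an argument so the helpers are easy to try out.

```python
import base64


def generate_images(client, prompt: str, n: int = 2) -> list[bytes]:
    """Request n images in one call and return them as decoded bytes."""
    result = client.images.generate(model="gpt-image-1", prompt=prompt, n=n)
    return [base64.b64decode(item.b64_json) for item in result.data]


def save_images(images: list[bytes], prefix: str = "image") -> list[str]:
    """Write each image to <prefix>_<index>.png and return the file names."""
    paths = []
    for index, image_bytes in enumerate(images):
        path = f"{prefix}_{index}.png"
        with open(path, "wb") as f:
            f.write(image_bytes)
        paths.append(path)
    return paths


# Usage (needs the openai package and OPENAI_API_KEY to be set):
#   from openai import OpenAI
#   save_images(generate_images(OpenAI(), "A robot reading in a park", n=2))
```

Each additional image adds its own output tokens, so cost and latency grow with n.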

Editing Images Using gpt-image-1

gpt-image-1 offers several image editing options. The image edits endpoint lets us:

  • Edit existing images
  • Generate new images using other images as a reference
  • Edit parts of an image by uploading an image and a mask indicating which areas should be replaced (a process known as inpainting)
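The second option in the list above, using other images as a reference, can be sketched as follows. This assumes the edits endpoint accepts a list of open files for its image argument, as OpenAI’s documentation describes for gpt-image-1. The helper name edit_with_references and any file paths you pass to it are hypothetical.

```python
import base64
from contextlib import ExitStack


def edit_with_references(client, image_paths, prompt: str) -> bytes:
    """Send several reference images to the edits endpoint; return image bytes."""
    with ExitStack() as stack:
        # Open every file and make sure each one is closed after the call.
        images = [stack.enter_context(open(p, "rb")) for p in image_paths]
        result = client.images.edit(model="gpt-image-1", image=images, prompt=prompt)
    return base64.b64decode(result.data[0].b64_json)


# Usage (needs the openai package and OPENAI_API_KEY to be set):
#   from openai import OpenAI
#   data = edit_with_references(OpenAI(), ["logo.png", "product.png"],
#                               "A gift basket containing both items")
```

Opening the files through ExitStack avoids leaving file handles open, which a bare open() call inside the request would do.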

Editing an Image Using a Mask

Let’s try editing an image using a mask. We’ll upload an image and provide a mask to specify which parts of it should be edited.

input image for editing

The transparent areas of the mask will be replaced based on the prompt, while the coloured areas will remain unchanged.

Now, let me ask the model to add Elon Musk to my uploaded image.

Input Code:

    from openai import OpenAI
    import base64  # needed to decode the returned image

    client = OpenAI()

    result = client.images.edit(
        model="gpt-image-1",
        image=open("/content/analytics_vidhya_1024.png", "rb"),
        mask=open("/content/mask_alpha_1024.png", "rb"),
        prompt="Elon Musk standing in front of Company Logo"
    )

    image_base64 = result.data[0].b64_json
    image_bytes = base64.b64decode(image_base64)

    # Save the image to a file
    with open("Elon_AV.png", "wb") as f:
        f.write(image_bytes)

Output:

edited image

Points to note while editing an image using gpt-image-1:

  • The image you want to edit and the corresponding mask must be in the same format and dimensions, and each should be less than 25MB in size.
  • The prompt you give can be used to describe the entire new image, not just the portion being edited.
  • If you supply multiple input images, the mask will be applied only to the first image.
  • The mask image must include an alpha channel. If you’re using an image editing tool to create the mask, ensure it’s saved with an alpha channel enabled.
  • If you have a black-and-white image, you can use a program to add an alpha channel and convert it into a valid mask, as shown below:

    from PIL import Image
    from io import BytesIO

    # 1. Load your black & white mask as a grayscale image
    mask = Image.open("/content/analytics_vidhya_masked.jpeg").convert("L")

    # 2. Convert it to RGBA so it has space for an alpha channel
    mask_rgba = mask.convert("RGBA")

    # 3. Then use the mask itself to fill that alpha channel
    #    (black pixels become fully transparent, white pixels stay opaque)
    mask_rgba.putalpha(mask)

    # 4. Convert the mask into bytes
    buf = BytesIO()
    mask_rgba.save(buf, format="PNG")
    mask_bytes = buf.getvalue()

    # 5. Save the resulting file
    img_path_mask_alpha = "mask_alpha.png"
    with open(img_path_mask_alpha, "wb") as f:
        f.write(mask_bytes)
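Before sending an edit request, it can help to check locally that the image and mask satisfy the points above. The following is a minimal sketch using only the standard library. It assumes both files are PNGs and reads the width, height, and color type from the PNG header. The helper names png_info and check_mask are hypothetical, and the check does not detect palette-based transparency or the 25MB size limit.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_info(header: bytes) -> tuple[int, int, bool]:
    """Parse the first 26 bytes of a PNG: (width, height, has_alpha_channel)."""
    if len(header) < 26 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height, _bit_depth, color_type = struct.unpack(">IIBB", header[16:26])
    # Color types 4 (grayscale + alpha) and 6 (RGBA) carry an alpha channel.
    return width, height, color_type in (4, 6)


def check_mask(image_path: str, mask_path: str) -> None:
    """Raise ValueError if the mask cannot be used with the image."""
    with open(image_path, "rb") as f:
        image_w, image_h, _ = png_info(f.read(26))
    with open(mask_path, "rb") as f:
        mask_w, mask_h, mask_alpha = png_info(f.read(26))
    if (image_w, image_h) != (mask_w, mask_h):
        raise ValueError("image and mask dimensions differ")
    if not mask_alpha:
        raise ValueError("mask has no alpha channel")
```

Calling check_mask("/content/analytics_vidhya_1024.png", "/content/mask_alpha_1024.png") before the edit request would catch a mismatched or opaque mask without spending any tokens.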

Best Practices for Using the Model

Here are some tips and best practices to follow while using gpt-image-1 for generating or editing images.

  1. You can customize how your image looks by setting options like size, quality, file format, compression level, and whether the background is transparent or not. These settings help you control the final output to match your specific needs.
  2. For faster results, go with square images (1024×1024) and standard quality. You can also choose landscape (1536×1024) or portrait (1024×1536) formats. Quality can be set to low, medium, or high, and both size and quality default to auto if not specified.
  3. Note that the Image API returns the base64-encoded image data. The default format is png, but we can also request it in jpeg or webp.
  4. If you are using jpeg or webp, then you can also specify the output_compression parameter to control the compression level (0-100%). For example, output_compression=50 will compress the image by 50%.
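The settings described above can be collected into one place before calling the API. The sketch below validates them and returns keyword arguments for the generations endpoint. The helper name build_image_options is hypothetical, and the validation rules reflect the constraints described in this article plus the fact that jpeg cannot store transparency; the API itself remains the authority on what it accepts.

```python
from typing import Optional

SIZES = {"auto", "1024x1024", "1536x1024", "1024x1536"}
QUALITIES = {"auto", "low", "medium", "high"}
FORMATS = {"png", "jpeg", "webp"}


def build_image_options(
    size: str = "auto",
    quality: str = "auto",
    output_format: str = "png",
    output_compression: Optional[int] = None,
    transparent: bool = False,
) -> dict:
    """Validate the settings and return keyword arguments for images.generate."""
    if size not in SIZES:
        raise ValueError(f"unsupported size: {size}")
    if quality not in QUALITIES:
        raise ValueError(f"unsupported quality: {quality}")
    if output_format not in FORMATS:
        raise ValueError(f"unsupported format: {output_format}")

    options = {"size": size, "quality": quality, "output_format": output_format}
    if output_compression is not None:
        if output_format == "png":
            raise ValueError("output_compression applies only to jpeg and webp")
        if not 0 <= output_compression <= 100:
            raise ValueError("output_compression must be between 0 and 100")
        options["output_compression"] = output_compression
    if transparent:
        if output_format == "jpeg":
            raise ValueError("jpeg cannot store transparency")
        options["background"] = "transparent"
    return options


# Usage (needs the openai package and OPENAI_API_KEY to be set):
#   from openai import OpenAI
#   options = build_image_options(quality="low", output_format="webp",
#                                 output_compression=50)
#   OpenAI().images.generate(model="gpt-image-1", prompt="A red kite", **options)
```

Catching an invalid combination locally is cheaper than discovering it through a failed request.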

Applications of gpt-image-1

From creative design and e-commerce to education, enterprise software, and gaming, gpt-image-1 has a wide range of applications.

  • Gaming: content creation, sprite masks, dynamic backgrounds, character generation, concept art
  • Creative Tools: artwork generation, style transfer, design prototyping, visual storytelling
  • Education: visual aids, historical recreations, interactive learning content, concept visualization
  • Enterprise Software: slide visuals, report illustrations, data-to-image generation, branding assets
  • Advertising & Marketing: campaign visuals, social media graphics, localized content creation
  • Healthcare: medical illustration, patient scan visuals, synthetic image data for model training
  • Architecture & Real Estate: interior mockups, exterior renderings, layout previews, renovation ideas
  • Entertainment & Media: scene concepts, promotional material, digital doubles

Limitations of gpt-image-1

The gpt-image-1 model is a powerful and versatile tool for image generation, but there are still a few limitations to keep in mind:

  • Latency: More complex prompts can take up to 2 minutes to process.
  • Text Rendering: While significantly better than the DALL·E models, the model may still face challenges with precise text alignment and clarity.
  • Consistency: Although it can generate visually consistent images, the model may occasionally struggle to maintain uniformity for recurring characters or brand elements across multiple images.
  • Composition Control: Even with improved instruction-following capabilities, the model may not always place elements precisely in structured or layout-sensitive designs.

Model Comparison

Here’s how OpenAI’s gpt-image-1 compares with the popular DALL·E models:

Model | Endpoints | Features
DALL·E 2 | Generations, Edits, Variations | Lower cost, supports concurrent requests, includes inpainting capability
DALL·E 3 | Generations only | Higher resolution and better image quality than DALL·E 2
gpt-image-1 | Generations, Edits (Responses API coming soon) | Excellent instruction-following, detailed edits, real-world awareness

Conclusion

OpenAI’s gpt-image-1 showcases powerful image generation capabilities, with support for creating and editing images from simple text prompts. While generating images may take some time, the quality and control it offers make it highly practical and rewarding overall.

Image generation models like this enable faster content creation, personalization, and prototyping. With built-in customization options for size, quality, format, and more, plus inpainting capabilities, gpt-image-1 gives developers full and transparent control over the desired output.

While some might worry that this technology could replace human creativity, it’s important to note that such tools aim to enhance human creativity and be helpful tools for artists. While we should definitely respect originality, we must also embrace the convenience that this technology brings. We must find the right balance where such tools help us innovate without taking away the value of authentic, human-made work.

GenAI Intern @ Analytics Vidhya | Final Year @ VIT Chennai
Passionate about AI and machine learning, I'm eager to dive into roles as an AI/ML Engineer or Data Scientist where I can make a real impact. With a knack for quick learning and a love for teamwork, I'm excited to bring innovative solutions and cutting-edge advancements to the table. My curiosity drives me to explore AI across various fields and take the initiative to delve into data engineering, ensuring I stay ahead and deliver impactful projects.
