Understanding KV-Cache - The Core Acceleration Technology for LLM Inference

As large language models (LLMs) continue to grow in scale, the cost of inference has skyrocketed. To let models respond to user requests faster and more economically, a variety of optimization techniques have emerged. Among them, KV-Cache (Key-Value Cache) stands out as one of the most critical and impactful inference acceleration mechanisms, adopted by virtually every major inference framework (e.g., vLLM, TensorRT-LLM, llama.cpp, llm-d, OpenAI Triton Transformer Engine). This article provides a comprehensive introduction to what KV-Cache is, how it works, why it significantly improves inference efficiency, its impact on the industry, and best practices for using it....

November 18, 2025 · 6 min

Easily Generate Videos with Sora 2 from Azure AI Foundry

With Azure AI Foundry now supporting Sora 2 (OpenAI’s generative video model), developers can access top-tier video generation capabilities in an enterprise-grade, compliant, and controllable environment. This tutorial takes you from zero to production, showing how to call Sora 2 via the Playground and the Python SDK to complete a “text-to-video” workflow. Prerequisites Before starting, you need an Azure subscription. If you’re unsure how to get one, refer to the subscription registration section in my earlier article....

November 10, 2025 · 5 min

Building Your Own ChatGPT on Azure Without Writing Any Code

Using ChatGPT to help solve problems in work and daily life has become a habit for many of us. However, heavy use of the official GPT-4o can run into temporary quota limits. Today, we will show you how to easily build your own personalized ChatGPT application using the Azure OpenAI service. Prerequisites Before we begin, make sure you have an Azure global subscription. If you don’t have one yet, you can easily start an Azure subscription through Pay-as-you-go:...

June 25, 2024 · 5 min