Welcome!

Cloud Native, Enterprise Application, Open Source and AI!

Optimizing Inference with Parameter/Data (P/D) Separation in vLLM Framework

Large language models often encounter GPU memory bottlenecks during inference deployment: Model parameters (P) can reach hundreds of GB and must remain resident in GPU memory. Input/output data (D) changes dynamically with each request but is often coupled with parameters on the same device, leading to imbalanced memory usage and limited scalability. To solve this problem, we can leverage the vLLM framework to implement Parameter/Data (P/D) Separation, improving the flexibility and throughput of inference systems....

September 29, 2025 · 5 min

Getting Started with Microsoft’s Latest Open-Source Long-Form Speech Model VibeVoice

What is VibeVoice? VibeVoice is a research framework released by Microsoft Research for long-form, multi-speaker, conversational speech synthesis. Target scenarios include entire podcast episodes, audio dramas, and interviews: it can maintain speaker consistency within a single generation and handle natural turn-taking. The model family comes in multiple sizes (e.g., 1.5B and 7B), and the 1.5B model is available on Hugging Face as microsoft/VibeVoice-1.5B, along with a model card, weights, installation guide, and responsible-use notes....

September 18, 2025 · 4 min

A Beginner’s Guide to Inference with the SGLang Framework

As large language models (LLMs) grow in popularity, the focus for enterprises and individuals has shifted from training to inference (in other words, from building models to putting them to practical use). In the inference space, the two most popular frameworks are vLLM and SGLang, and SGLang, the newer of the two, has been drawing growing attention. Today, we'll explore SGLang through a beginner-friendly tutorial to help more people understand both LLM inference and the SGLang framework....

July 10, 2025 · 5 min

Easily Deploy and Use DeepSeek-R1 with Azure AI Foundry

The popularity of DeepSeek has once again showcased the appeal of AI. However, it has not reduced the demand for computing power. Instead, by making low-cost, user-friendly AI practical for more business scenarios, it has triggered another wave of demand for compute. Today, we will take a quick hands-on look at DeepSeek through Azure AI Foundry (formerly Azure AI Studio). Prerequisites: First, you need an Azure subscription....

February 10, 2025 · 7 min

Building Your Own ChatGPT on Azure Without Writing Any Code

Using ChatGPT to help solve problems at work and in daily life has become a habit. However, heavy use of GPT-4o in the official service can run into temporary quota limits. Today, we will show you how to easily build your own personalized ChatGPT application using Azure OpenAI Service. Prerequisites: Before we begin, make sure you have an Azure global subscription. If you don't have one yet, you can easily start an Azure subscription through Pay-as-you-go:...

June 25, 2024 · 5 min

Azure 101 Series: Microsoft Azure Overview

Azure is a cloud computing platform and service provided by Microsoft. It offers a range of infrastructure as a service (IaaS), platform as a service (PaaS), and software as a service (SaaS) solutions for building, deploying, and managing various types of applications and services. Overview: Azure provides a wide range of features and services, including virtual machines, storage, databases, artificial intelligence, machine learning, blockchain, Internet of Things (IoT), containers, and serverless computing....

June 19, 2024 · 13 min