Phi-4-Reasoning-Vision Code Walkthrough: The TextIteratorStreamer Streaming Wrapper Explained

张开发
2026/4/6 6:13:42 · 15 min read


## 1. Project Overview

Phi-4-Reasoning-Vision is a high-performance inference tool built on Microsoft's Phi-4-reasoning-vision-15B multimodal large model, optimized for a dual RTX 4090 setup. It strictly follows the official SYSTEM PROMPT specification and supports THINK/NOTHINK dual reasoning modes, combined image-and-text input, streaming output, and a collapsible view of the model's thinking process.

### 1.1 Core Features

- **Dual-GPU parallelism**: automatically shards the 15B model across two RTX 4090 cards
- **Precise prompt adaptation**: strictly follows the official Phi-4 SYSTEM PROMPT requirements
- **Smart streaming output**: token-by-token output parsing built on TextIteratorStreamer
- **Multimodal input**: handles an image upload and a text question together
- **Professional UI**: a wide-layout interactive interface built with Streamlit

## 2. Environment Setup and Deployment

### 2.1 Hardware Requirements

- Two NVIDIA RTX 4090 GPUs
- At least 64 GB of system RAM
- CUDA 11.7 or later

### 2.2 Installing Software Dependencies

```bash
pip install torch==2.0.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117
pip install transformers==4.31.0 streamlit==1.25.0
```

### 2.3 Model Download and Configuration

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "microsoft/phi-4-reasoning-vision-15B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",          # shard the model across both GPUs
    torch_dtype=torch.bfloat16,
)
```

## 3. Implementing the TextIteratorStreamer Streaming Wrapper

### 3.1 Basic Streaming Output

```python
from threading import Thread

from transformers import TextIteratorStreamer

def generate_stream_response(prompt, max_length=512):
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    # skip_prompt=True keeps the echoed prompt out of the stream
    streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    generation_kwargs = dict(
        **inputs,
        streamer=streamer,
        max_new_tokens=max_length,
    )
    # generate() blocks, so it runs in a background thread
    # while we consume decoded text from the streamer
    thread = Thread(target=model.generate, kwargs=generation_kwargs)
    thread.start()
    for new_text in streamer:
        yield new_text
```

### 3.2 THINK/NOTHINK Mode Handling

```python
import re

def process_think_mode(response):
    # Split on the <think>/</think> delimiters; odd-indexed
    # segments are the content between the tags
    think_blocks = re.split(r"</?think>", response)
    result = {"thinking": [], "final_answer": ""}
    for i, block in enumerate(think_blocks):
        if i % 2 == 1:
            # content between <think> tags is the thinking process
            result["thinking"].append(block.strip())
        elif i == len(think_blocks) - 1:
            # the trailing segment is the final answer
            result["final_answer"] = block.strip()
    return result
```

### 3.3 Multimodal Input Handling

```python
import base64
from io import BytesIO

from PIL import Image

def process_image_upload(uploaded_file):
    image = Image.open(uploaded_file)
    if image.mode != "RGB":
        image = image.convert("RGB")  # JPEG cannot store an alpha channel
    buffered = BytesIO()
    image.save(buffered, format="JPEG")
    img_str = base64.b64encode(buffered.getvalue()).decode()
    return f"data:image/jpeg;base64,{img_str}"
```
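As a quick standalone check, `process_think_mode` can be exercised on a hand-written response string; the sample text below is made up for illustration, and the function is repeated so the snippet runs on its own:

```python
import re

def process_think_mode(response):
    """Split a model response into thinking blocks and the final answer."""
    think_blocks = re.split(r"</?think>", response)
    result = {"thinking": [], "final_answer": ""}
    for i, block in enumerate(think_blocks):
        if i % 2 == 1:                         # between <think> tags
            result["thinking"].append(block.strip())
        elif i == len(think_blocks) - 1:       # trailing segment
            result["final_answer"] = block.strip()
    return result

demo = "<think>The image shows a cat on a sofa.</think>It is a cat."
parsed = process_think_mode(demo)
print(parsed["thinking"])      # ['The image shows a cat on a sofa.']
print(parsed["final_answer"])  # 'It is a cat.'
```

Note that a response with no `<think>` block degrades gracefully: the whole string lands in `final_answer`, which is exactly the NOTHINK-mode behavior we want.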
## 4. Building the Streamlit Interface

### 4.1 Interface Layout

```python
import streamlit as st

def setup_interface():
    st.set_page_config(layout="wide")
    st.title("Phi-4-Reasoning-Vision Multimodal Inference Tool")
    col1, col2 = st.columns([1, 2])
    with col1:
        st.subheader("Configuration")
        uploaded_file = st.file_uploader("Upload an image to analyze", type=["jpg", "png"])
        question = st.text_area("Ask your question", height=100)
        if st.button("Start inference"):
            if uploaded_file is None:
                st.error("Please upload an image first")
            else:
                with st.spinner("Waking up the dual-GPU compute..."):
                    process_inference(uploaded_file, question)
    with col2:
        st.subheader("Results")
        if "result" in st.session_state:
            display_result(st.session_state["result"])
```

### 4.2 Displaying Inference Results

```python
def display_result(result):
    with st.expander("Thinking process", expanded=False):
        for i, thought in enumerate(result["thinking"], 1):
            st.markdown(f"**Thinking step {i}**")
            st.write(thought)
    st.markdown("## Final answer")
    st.write(result["final_answer"])
    if "image" in st.session_state:
        st.image(st.session_state["image"], caption="Analyzed image")
```

## 5. The Complete Inference Pipeline

### 5.1 Main Inference Function

```python
def process_inference(uploaded_file, question):
    try:
        # Encode the uploaded image
        image_data = process_image_upload(uploaded_file)
        st.session_state["image"] = uploaded_file

        # Build the multimodal prompt
        prompt = build_multimodal_prompt(image_data, question)

        # Stream the generated response into the page
        full_response = ""
        response_container = st.empty()
        for chunk in generate_stream_response(prompt):
            full_response += chunk
            response_container.markdown(full_response)

        # Separate the thinking process from the final answer
        st.session_state["result"] = process_think_mode(full_response)
    except RuntimeError as e:
        if "CUDA out of memory" in str(e):
            st.error("Out of GPU memory; try a smaller input or close other GPU programs")
        else:
            st.error(f"Inference error: {e}")
```

### 5.2 Multimodal Prompt Construction

```python
def build_multimodal_prompt(image_data, question):
    system_prompt = (
        "You are Phi-4-reasoning-vision, a multimodal AI assistant. "
        "When asked to analyze an image, please follow these steps: "
        "1. First describe the image content in detail "
        "2. Then analyze any hidden clues or subtle elements "
        "3. Finally provide a comprehensive answer to the question "
        "Use <think></think> to separate your thinking process from the final answer."
    )
    return (
        f"SYSTEM: {system_prompt}\n"
        f"USER: [IMAGE]{image_data}[/IMAGE]\n"
        f"QUESTION: {question}\n"
        "ASSISTANT: "
    )
```
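The prompt layout can be sanity-checked without loading the model by calling `build_multimodal_prompt` with a dummy payload; the short base64 string below is a placeholder, not a real image, and the function is repeated here (with a trimmed system prompt) so the snippet runs on its own:

```python
def build_multimodal_prompt(image_data, question):
    # Trimmed system prompt; the full version also lists the three analysis steps
    system_prompt = (
        "You are Phi-4-reasoning-vision, a multimodal AI assistant. "
        "Use <think></think> to separate your thinking process from the final answer."
    )
    return (
        f"SYSTEM: {system_prompt}\n"
        f"USER: [IMAGE]{image_data}[/IMAGE]\n"
        f"QUESTION: {question}\n"
        "ASSISTANT: "
    )

prompt = build_multimodal_prompt("data:image/jpeg;base64,AAAA", "What is shown here?")
print(prompt)
```

The prompt deliberately ends with `"ASSISTANT: "` so that generation continues directly from the assistant turn, and the `<think>` instruction in the system prompt is what `process_think_mode` later relies on to split the output.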
## 6. Summary

### 6.1 Key Techniques Recap

- **Dual-GPU parallelism**: `device_map="auto"` shards the model automatically
- **Streaming output**: TextIteratorStreamer delivers token-by-token output
- **Thinking-process parsing**: the `<think>` delimiter cleanly separates reasoning from the conclusion
- **Multimodal input wrapping**: the image upload and text question are combined correctly

### 6.2 Performance Tips

- Use `torch.bfloat16` to reduce VRAM usage
- Set `max_new_tokens` sensibly to bound generation length
- Make sure your CUDA version matches your PyTorch build
- Clear the GPU cache periodically to avoid memory leaks

### 6.3 Further Applications

- Complex image content analysis
- Visual reasoning Q&A
- Multimodal data understanding
- Domain-specific image interpretation

### Get More AI Images

To explore more AI images and application scenarios, visit the CSDN 星图镜像广场 (Star Atlas image plaza), which offers a rich set of prebuilt images covering LLM inference, image generation, video generation, model fine-tuning, and more, with one-click deployment.
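A back-of-the-envelope calculation shows why the dual-GPU split described in this article is needed at all: in bfloat16, the 15B weights alone already exceed a single RTX 4090's 24 GiB. The figures below count weights only; activations and the KV cache need extra headroom on top:

```python
# Rough VRAM estimate for a 15B-parameter model in bfloat16 (weights only)
params = 15e9
bytes_per_param = 2                       # bfloat16 uses 2 bytes per weight
total_gib = params * bytes_per_param / 1024**3
per_card_gib = total_gib / 2              # device_map="auto" splits across two GPUs

print(f"total weights: {total_gib:.1f} GiB")     # ~27.9 GiB, over one card's 24 GiB
print(f"per RTX 4090:  {per_card_gib:.1f} GiB")  # ~14.0 GiB, comfortably under 24 GiB
```

The ~10 GiB left free on each card is what absorbs activations and the KV cache during generation, which is why the tips above still recommend bounding `max_new_tokens`.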
