Spaces:

stepfun-ai
/

Step3

Build error

App Files Files

Zenith Wang commited on 8 days ago

Commit

139c357

1 Parent(s): d2001c1

支持CoT推理展示，优化界面布局，简化说明文档

Browse files

Files changed (2) hide show

README.md +15 -51
app.py +113 -121

README.md CHANGED Viewed

@@ -1,5 +1,5 @@
 ---
-title: Step-3 图像理解助手
 emoji: 🤖
 colorFrom: purple
 colorTo: blue
@@ -10,65 +10,29 @@ pinned: false
 license: mit
 ---
-# Step-3 图像理解助手 🤖
-基于阶跃星辰 Step-3 模型的智能图像理解和分析工具。
-## 功能特点
-- 🖼️ **图像理解**：上传图片，AI 自动分析图像内容
-- 💬 **自然语言交互**：使用中文自然语言描述你的需求
-- 🔄 **实时流式输出**：支持流式响应，实时查看生成结果
-- 🧠 **深度推理**：Step-3 模型具备强大的推理能力
-## 如何使用
-### 在 Hugging Face Spaces 中使用
-1. **配置 API 密钥**（重要！）
-   - 进入 Space 的 Settings 页面
-   - 在 "Repository secrets" 部分添加：
-     - Name: `STEP_API_KEY`
-     - Value: 你的阶跃星辰 API 密钥
-2. **使用应用**
-   - 上传一张图片
-   - 输入提示词（例如："这是什么？请详细描述"）
-   - 点击"开始分析"
-   - 等待 AI 返回结果
-### 获取 API 密钥
-1. 访问 [阶跃星辰官网](https://www.stepfun.com/)
-2. 注册/登录账号
-3. 在控制台创建 API 密钥
-## 示例提示词
-- "这张图片中有什么内容？请详细描述。"
-- "帮我看看这是什么菜，如何制作？"
-- "分析这张图片的构图和色彩运用。"
-- "这张图片可能是在什么地方拍摄的？"
-- "图片中的人物在做什么？他们的表情如何？"
 ## 技术栈
 - **模型**: Step-3 / Step-r1-v-mini
 - **框架**: Gradio 4.19.2
-- **API**: OpenAI Python SDK (兼容 Step API)
-## 注意事项
-- 请确保图片清晰度足够
-- 提示词越具体，分析结果越准确
-- API 密钥请妥善保管，不要公开分享
-## 许可证
-MIT License
-## 致谢
-- [阶跃星辰](https://www.stepfun.com/) - 提供强大的 AI 模型
-- [Gradio](https://gradio.app/) - 提供优秀的 Web UI 框架
-- [Hugging Face](https://huggingface.co/) - 提供免费的部署平台

 ---
+title: Step-3
 emoji: 🤖
 colorFrom: purple
 colorTo: blue
 license: mit
 ---
+# Step-3 🤖
+智能图像理解和分析工具，支持 Chain of Thought (CoT) 推理展示。
+## 主要特性
+- 🧠 **CoT 推理展示**：实时显示模型的思考过程
+- 🔄 **流式输出**：推理过程和最终答案分开展示
+- 🖼️ **图像分析**：支持多种图片格式
+- 📝 **双模型支持**：Step-3 和 Step-r1-v-mini
+## 如何配置
+在 Hugging Face Space 的 Settings → Repository secrets 中添加：
+- **Name**: `STEP_API_KEY`
+- **Value**: 你的 Step API 密钥
+## 获取 API 密钥
+访问 [阶跃星辰官网](https://www.stepfun.com/) 注册并获取 API 密钥。
 ## 技术栈
 - **模型**: Step-3 / Step-r1-v-mini
 - **框架**: Gradio 4.19.2
+- **API**: OpenAI Python SDK (兼容)

app.py CHANGED Viewed

@@ -8,7 +8,7 @@ from PIL import Image
 # 配置
 BASE_URL = "https://api.stepfun.com/v1"
-# 从环境变量获取API密钥（Hugging Face Spaces 推荐方式）
 STEP_API_KEY = os.environ.get("STEP_API_KEY", "")
 # 可选模型
@@ -19,7 +19,6 @@ def image_to_base64(image):
     if image is None:
         return None
-    # 如果是PIL图像，直接处理
     if isinstance(image, Image.Image):
         buffered = BytesIO()
         image.save(buffered, format="PNG")
@@ -28,25 +27,30 @@ def image_to_base64(image):
     return None
-def call_step_api(image, prompt, model, temperature=0.7, max_tokens=2000, stream_output=True):
-    """调用Step API进行图像分析和文本生成"""
     if image is None:
-        return "❌ 请先上传一张图片"
     if not prompt:
-        return "❌ 请输入提示词"
     if not STEP_API_KEY:
-        return "❌ API密钥未配置。请在 Hugging Face Space 的 Settings 中添加 STEP_API_KEY 环境变量。"
     # 转换图像为base64
     try:
         base64_image = image_to_base64(image)
         if base64_image is None:
-            return "❌ 图片处理失败"
     except Exception as e:
-        return f"❌ 图片处理错误: {str(e)}"
     # 构造消息
     messages = [
@@ -72,94 +76,90 @@ def call_step_api(image, prompt, model, temperature=0.7, max_tokens=2000, stream
     try:
         client = OpenAI(api_key=STEP_API_KEY, base_url=BASE_URL)
     except Exception as e:
-        return f"❌ 客户端初始化失败: {str(e)}"
     try:
         # 记录开始时间
         start_time = time.time()
-        if stream_output:
-            # 流式输出
-            response = client.chat.completions.create(
-                model=model,
-                messages=messages,
-                temperature=temperature,
-                max_tokens=max_tokens,
-                stream=True
-            )
-            full_response = ""
-            for chunk in response:
-                if chunk.choices and chunk.choices[0].delta:
-                    delta = chunk.choices[0].delta
-                    # 检查是否有内容
-                    if hasattr(delta, 'content') and delta.content:
-                        content = delta.content
-                        full_response += content
-                        yield content
-            # 显示生成时间
-            elapsed_time = time.time() - start_time
-            yield f"\n\n⏱️ 生成用时: {elapsed_time:.2f}秒"
-        else:
-            # 非流式输出
-            response = client.chat.completions.create(
-                model=model,
-                messages=messages,
-                temperature=temperature,
-                max_tokens=max_tokens,
-                stream=False
-            )
-            if response.choices and response.choices[0].message:
-                full_response = response.choices[0].message.content
-                elapsed_time = time.time() - start_time
-                yield f"{full_response}\n\n⏱️ 生成用时: {elapsed_time:.2f}秒"
-            else:
-                yield "❌ API返回空响应"
     except Exception as e:
         error_msg = str(e)
         if "api_key" in error_msg.lower():
-            yield "❌ API密钥错误：请检查密钥是否有效"
         elif "network" in error_msg.lower() or "connection" in error_msg.lower():
-            yield "❌ 网络连接错误：请检查网络连接"
         else:
-            yield f"❌ API调用错误: {error_msg[:200]}"
-def process_image_and_prompt(image, prompt, model, temperature, max_tokens, stream_output):
-    """处理图像和提示词的主函数"""
-    output = ""
-    for chunk in call_step_api(image, prompt, model, temperature, max_tokens, stream_output):
-        output = chunk
-        yield output
 # 创建Gradio界面
-with gr.Blocks(title="Step-3 图像理解助手", theme=gr.themes.Soft()) as demo:
     gr.Markdown("""
-    # 🤖 Step-3 图像理解助手
-    基于阶跃星辰 Step-3 模型的图像理解和分析工具。上传图片并输入提示词，让AI帮你分析图像内容。
-    ### 功能特点：
-    - 🖼️ 支持多种图片格式上传
-    - 💬 自然语言交互
-    - 🔄 实时流式输出
-    - 🧠 深度推理能力
     """)
-    # API密钥状态提示
-    if not STEP_API_KEY:
-        gr.Markdown("""
-        ⚠️ **注意：API密钥未配置**
-        请在 Hugging Face Space 的 Settings 中添加 Secret：
-        - Name: `STEP_API_KEY`
-        - Value: 你的阶跃星辰 API 密钥
-        """)
     with gr.Row():
         with gr.Column(scale=1):
             # 输入区域
@@ -171,9 +171,9 @@ with gr.Blocks(title="Step-3 图像理解助手", theme=gr.themes.Soft()) as dem
             prompt_input = gr.Textbox(
                 label="提示词",
-                placeholder="例如：帮我看看这是什么菜，如何制作？",
                 lines=3,
-                value="帮我详细描述这张图片的内容。"
             )
             with gr.Accordion("高级设置", open=False):
@@ -188,7 +188,7 @@ with gr.Blocks(title="Step-3 图像理解助手", theme=gr.themes.Soft()) as dem
                     maximum=1,
                     value=0.7,
                     step=0.1,
-                    label="Temperature (创造性)"
                 )
                 max_tokens_slider = gr.Slider(
@@ -198,76 +198,68 @@ with gr.Blocks(title="Step-3 图像理解助手", theme=gr.themes.Soft()) as dem
                     step=100,
                     label="最大输出长度"
                 )
-                stream_checkbox = gr.Checkbox(
-                    value=True,
-                    label="流式输出"
-                )
             submit_btn = gr.Button("🚀 开始分析", variant="primary")
             clear_btn = gr.Button("🗑️ 清空", variant="secondary")
         with gr.Column(scale=1):
-            # 输出区域
-            output_text = gr.Textbox(
-                label="分析结果",
-                lines=20,
-                max_lines=30,
-                show_copy_button=True
             )
-    # 示例（仅提供提示词示例）
     gr.Examples(
         examples=[
-            ["这张图片中有什么内容？请详细描述。", "step-3"],
-            ["帮我看看这是什么菜，如何制作？", "step-3"],
-            ["分析这张图片的构图和色彩运用。", "step-3"],
-            ["这张图片可能是在什么地方拍摄的？", "step-3"],
-            ["图片中的人物在做什么？他们的表情如何？", "step-3"],
-            ["这个产品的设计有什么特点？", "step-3"],
         ],
         inputs=[prompt_input, model_select],
-        label="提示词示例（请先上传图片）"
     )
-    # 事件处理
     submit_btn.click(
-        fn=process_image_and_prompt,
         inputs=[
             image_input,
             prompt_input,
             model_select,
             temperature_slider,
-            max_tokens_slider,
-            stream_checkbox
         ],
-        outputs=output_text,
         show_progress=True
     )
     clear_btn.click(
-        fn=lambda: (None, "", ""),
         inputs=[],
-        outputs=[image_input, prompt_input, output_text]
     )
     # 页脚
     gr.Markdown("""
     ---
-    ### 使用说明：
-    1. 上传一张图片（支持 JPG、PNG 等格式）
-    2. 输入你的问题或分析需求
-    3. 点击"开始分析"按钮
-    4. 等待AI返回分析结果
-    ### 注意事项：
-    - 请确保图片清晰度足够
-    - 提示词越具体，分析结果越准确
-    - 可以在高级设置中调整模型参数
-    Powered by [阶跃星辰 Step-3](https://www.stepfun.com/)
     """)
-# 启动应用 - Hugging Face Spaces 会自动调用
 if __name__ == "__main__":
     demo.launch()

 # 配置
 BASE_URL = "https://api.stepfun.com/v1"
+# 从环境变量获取API密钥
 STEP_API_KEY = os.environ.get("STEP_API_KEY", "")
 # 可选模型
     if image is None:
         return None
     if isinstance(image, Image.Image):
         buffered = BytesIO()
         image.save(buffered, format="PNG")
     return None
+def call_step_api(image, prompt, model, temperature=0.7, max_tokens=2000):
+    """调用Step API进行图像分析和文本生成，支持CoT推理展示"""
     if image is None:
+        yield "❌ 请先上传一张图片", ""
+        return
     if not prompt:
+        yield "❌ 请输入提示词", ""
+        return
     if not STEP_API_KEY:
+        yield "❌ API密钥未配置。请在 Hugging Face Space 的 Settings 中添加 STEP_API_KEY 环境变量。", ""
+        return
     # 转换图像为base64
     try:
         base64_image = image_to_base64(image)
         if base64_image is None:
+            yield "❌ 图片处理失败", ""
+            return
     except Exception as e:
+        yield f"❌ 图片处理错误: {str(e)}", ""
+        return
     # 构造消息
     messages = [
     try:
         client = OpenAI(api_key=STEP_API_KEY, base_url=BASE_URL)
     except Exception as e:
+        yield f"❌ 客户端初始化失败: {str(e)}", ""
+        return
     try:
         # 记录开始时间
         start_time = time.time()
+        # 流式输出
+        response = client.chat.completions.create(
+            model=model,
+            messages=messages,
+            temperature=temperature,
+            max_tokens=max_tokens,
+            stream=True
+        )
+        full_response = ""
+        reasoning_content = ""
+        final_answer = ""
+        is_reasoning = False
+        reasoning_started = False
+        for chunk in response:
+            if chunk.choices and chunk.choices[0].delta:
+                delta = chunk.choices[0].delta
+                if hasattr(delta, 'content') and delta.content:
+                    content = delta.content
+                    full_response += content
+                    # 检测reasoning标记
+                    if "<reasoning>" in content:
+                        is_reasoning = True
+                        reasoning_started = True
+                        # 提取<reasoning>之前的内容添加到final_answer
+                        before_reasoning = content.split("<reasoning>")[0]
+                        if before_reasoning:
+                            final_answer += before_reasoning
+                        # 提取<reasoning>之后的内容开始reasoning
+                        after_tag = content.split("<reasoning>")[1] if len(content.split("<reasoning>")) > 1 else ""
+                        reasoning_content += after_tag
+                    elif "</reasoning>" in content:
+                        # 提取</reasoning>之前的内容添加到reasoning
+                        before_tag = content.split("</reasoning>")[0]
+                        reasoning_content += before_tag
+                        is_reasoning = False
+                        # 提取</reasoning>之后的内容添加到final_answer
+                        after_reasoning = content.split("</reasoning>")[1] if len(content.split("</reasoning>")) > 1 else ""
+                        final_answer += after_reasoning
+                    elif is_reasoning:
+                        reasoning_content += content
+                    else:
+                        final_answer += content
+                    # 实时输出
+                    if reasoning_started:
+                        yield reasoning_content, final_answer
+                    else:
+                        yield "", final_answer
+        # 添加生成时间
+        elapsed_time = time.time() - start_time
+        time_info = f"\n\n⏱️ 生成用时: {elapsed_time:.2f}秒"
+        final_answer += time_info
+        yield reasoning_content, final_answer
     except Exception as e:
         error_msg = str(e)
         if "api_key" in error_msg.lower():
+            yield "", "❌ API密钥错误：请检查密钥是否有效"
         elif "network" in error_msg.lower() or "connection" in error_msg.lower():
+            yield "", "❌ 网络连接错误：请检查网络连接"
         else:
+            yield "", f"❌ API调用错误: {error_msg[:200]}"
 # 创建Gradio界面
+with gr.Blocks(title="Step-3", theme=gr.themes.Soft()) as demo:
     gr.Markdown("""
+    # 🤖 Step-3
+    上传图片并输入提示词，让 Step-3 分析图像内容。
     """)
     with gr.Row():
         with gr.Column(scale=1):
             # 输入区域
             prompt_input = gr.Textbox(
                 label="提示词",
+                placeholder="例如：这是什么？请详细描述",
                 lines=3,
+                value="请详细描述这张图片的内容。"
             )
             with gr.Accordion("高级设置", open=False):
                     maximum=1,
                     value=0.7,
                     step=0.1,
+                    label="Temperature"
                 )
                 max_tokens_slider = gr.Slider(
                     step=100,
                     label="最大输出长度"
                 )
             submit_btn = gr.Button("🚀 开始分析", variant="primary")
             clear_btn = gr.Button("🗑️ 清空", variant="secondary")
         with gr.Column(scale=1):
+            # 推理过程展示
+            with gr.Accordion("💭 推理过程 (CoT)", open=True):
+                reasoning_output = gr.Textbox(
+                    label="思考过程",
+                    lines=10,
+                    max_lines=15,
+                    show_copy_button=True,
+                    interactive=False
+                )
+            # 最终答案展示
+            answer_output = gr.Textbox(
+                label="📝 分析结果",
+                lines=15,
+                max_lines=25,
+                show_copy_button=True,
+                interactive=False
             )
+    # 示例
     gr.Examples(
         examples=[
+            ["这张图片中有什么？", "step-3"],
+            ["详细描述图片内容", "step-3"],
+            ["这是什么物体？有什么特征？", "step-3"],
+            ["分析图片的主要元素", "step-3"],
         ],
         inputs=[prompt_input, model_select],
+        label="示例提示词"
     )
+    # 事件处理 - 流式输出到两个文本框
     submit_btn.click(
+        fn=call_step_api,
         inputs=[
             image_input,
             prompt_input,
             model_select,
             temperature_slider,
+            max_tokens_slider
         ],
+        outputs=[reasoning_output, answer_output],
         show_progress=True
     )
     clear_btn.click(
+        fn=lambda: (None, "", "", ""),
         inputs=[],
+        outputs=[image_input, prompt_input, reasoning_output, answer_output]
     )
     # 页脚
     gr.Markdown("""
     ---
+    Powered by [Step-3](https://www.stepfun.com/)
     """)
+# 启动应用
 if __name__ == "__main__":
     demo.launch()