Cannot retrieve streamed reasoning output from openai/gpt-oss-20b with vLLM and LangChain

#149
by JSchaffauser

Hi everyone,

I'm running the openai/gpt-oss-20b model locally in a Docker container (vllm/vllm-openai image) and trying to stream the reasoning output through LangChain's ChatOpenAI.

My goal is to receive intermediate reasoning steps in the following format during streaming:

{'type': 'reasoning', 'text': 'the chunk of content generated by the reasoning'}

However, the only thing I get during the reasoning phase is this:

{'type': 'reasoning', 'status': 'in_progress'}

Then, after a pause (presumably while the model is reasoning), the stream continues, but only with 'type': 'text' chunks, like so:

[{'type': 'text', 'text': '":', 'index': 1}]
[{'type': 'text', 'text': ' "', 'index': 1}]
[{'type': 'text', 'text': '6', 'index': 1}]
[{'type': 'text', 'text': '",\n', 'index': 1}]
...

I have checked the LangChain, OpenAI, and vLLM docs for a missing flag or configuration option, but found no answer.

Here is the code snippet:

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model_name="gpt-oss-20b",
    output_version="responses/v1",  # emit Responses-API-style content blocks
    base_url="http://localhost:8000/v1",  # local vLLM OpenAI-compatible endpoint
    api_key="sk-no-key",  # vLLM does not validate the key
    temperature=0,
    request_timeout=360,
    max_retries=0,
    reasoning_effort="medium",
    streaming=True,
)

# Minimal prompt template using the same variables passed to stream() below.
prompt_template = ChatPromptTemplate.from_template(
    "Context:\n{context}\n\nQuestion: {question}"
)

chain = prompt_template | llm

# context_text and prompt_question are defined earlier (omitted here).
stream_iter = chain.stream({"context": context_text, "question": prompt_question})

for chunk in stream_iter:
    print(chunk.content if hasattr(chunk, "content") else chunk, flush=True)
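
To double-check what actually arrives, I have also iterated over the content blocks directly. A minimal sketch (assuming chunk.content is a list of the block dicts shown above; in my runs the reasoning blocks only ever carry a 'status', never a 'text' key):

for chunk in chain.stream({"context": context_text, "question": prompt_question}):
    blocks = chunk.content if isinstance(chunk.content, list) else []
    for block in blocks:
        if block.get("type") == "reasoning":
            # Falls back to the status; no reasoning text is ever present
            print("reasoning:", block.get("text", block.get("status")))
        elif block.get("type") == "text":
            print("text:", block.get("text"))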

Setup

  • Model: openai/gpt-oss-20b (downloaded locally)
  • Docker image: vllm/vllm-openai:v0.10.1
  • langchain>=0.3.27
  • langchain-openai>=0.3.33

Thank you in advance for any help!

In stream(), stream_mode may need to be "messages":
https://github.com/langchain-ai/langgraph/discussions/3215
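
For example, a minimal sketch (create_react_agent with no tools is just an illustrative way to wrap the same llm in a compiled graph; it is not part of the original snippet):

from langgraph.prebuilt import create_react_agent

graph = create_react_agent(llm, tools=[])

# stream_mode="messages" yields (message_chunk, metadata) tuples per token,
# so intermediate message chunks surface as they are generated.
for message_chunk, metadata in graph.stream(
    {"messages": [("user", prompt_question)]},
    stream_mode="messages",
):
    print(message_chunk.content, flush=True)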
