Cannot retrieve streamed reasoning output from openai/gpt-oss-20b with vLLM and LangChain
#149 · opened by JSchaffauser
Hi everyone,
I'm running the openai/gpt-oss-20b model locally in a Docker container using vllm-openai, and I'm trying to stream the reasoning output through LangChain's ChatOpenAI.
My goal is to receive intermediate reasoning steps in the following format during streaming:
{'type': 'reasoning', 'text': 'the chunk of content generated by the reasoning'}
However, the only thing I get during the reasoning phase is this:
{'type': 'reasoning', 'status': 'in_progress'}
Then, after a pause (presumably while the model is reasoning), the stream continues, but only with 'type': 'text' tokens, like so:
[{'type': 'text', 'text': '":', 'index': 1}]
[{'type': 'text', 'text': ' "', 'index': 1}]
[{'type': 'text', 'text': '6', 'index': 1}]
[{'type': 'text', 'text': '",\n', 'index': 1}]
...
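To make the goal concrete, this is the kind of handler I'd like to be able to write against those chunks — a sketch that assumes the block shapes shown above, i.e. dicts with a "type" key:

```python
# Sketch: route streamed content blocks by their "type" key, assuming the
# chunk shapes shown above (dicts with "type" of "reasoning" or "text").
def print_block(block: dict) -> None:
    if block.get("type") == "reasoning":
        # Intermediate reasoning tokens, as they arrive
        print(f"[reasoning] {block.get('text', block.get('status', ''))}")
    elif block.get("type") == "text":
        # Final answer tokens
        print(block.get("text", ""), end="", flush=True)
```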
I have checked the LangChain, OpenAI, and vLLM docs for a missing flag or configuration option, but found no answer.
Here is the code snippet:
```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-oss-20b",
    output_version="responses/v1",
    base_url="http://localhost:8000/v1",
    api_key="sk-no-key",
    temperature=0,
    request_timeout=360,
    max_retries=0,
    reasoning_effort="medium",
    streaming=True,
)

# Placeholder template; the real one is defined elsewhere in my script
prompt_template = ChatPromptTemplate.from_template(
    "Context: {context}\n\nQuestion: {question}"
)

chain = prompt_template | llm
# context_text and prompt_question are defined earlier in my script
stream_iter = chain.stream({"context": context_text, "question": prompt_question})
for chunk in stream_iter:
    print(chunk.content if hasattr(chunk, "content") else chunk, flush=True)
```
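For what it's worth, here is a sketch of a sanity check that bypasses LangChain and queries the vLLM server directly with the plain `openai` client, to see whether reasoning deltas are emitted at all. The `reasoning_content` delta field is an assumption based on what vLLM's reasoning parsers populate; the exact field may depend on the vLLM version and server configuration:

```python
# Sketch: query the vLLM OpenAI-compatible endpoint directly, without
# LangChain, to check whether reasoning deltas are streamed at all.
# `reasoning_content` is an assumption (the field vLLM's reasoning
# parsers populate); it may vary by version.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="sk-no-key")

stream = client.chat.completions.create(
    model="gpt-oss-20b",
    messages=[{"role": "user", "content": "What is 2 + 2?"}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta
    # Reasoning tokens, if the server emits them
    reasoning = getattr(delta, "reasoning_content", None)
    if reasoning:
        print(f"[reasoning] {reasoning}", flush=True)
    # Final answer tokens
    if delta.content:
        print(f"[text] {delta.content}", flush=True)
```

If this prints no `[reasoning]` lines either, the problem is on the vLLM side rather than in LangChain.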
Setup
- Model: `openai/gpt-oss-20b` (downloaded locally)
- Docker image: `vllm/vllm-openai:v0.10.1`
- `langchain>=0.3.27`
- `langchain-openai>=0.3.33`
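One thing I'm unsure about is the server launch itself. vLLM has a `--reasoning-parser` flag that controls whether reasoning is split out from the final answer; whether gpt-oss needs it, and the exact parser name (`openai_gptoss` below is an assumption), may depend on the vLLM version:

```bash
# Sketch of a launch command with an explicit reasoning parser.
# --reasoning-parser is a real vLLM flag; the parser name for gpt-oss
# ("openai_gptoss" below) is an assumption -- check `vllm serve --help`.
docker run --gpus all -p 8000:8000 \
    vllm/vllm-openai:v0.10.1 \
    --model openai/gpt-oss-20b \
    --reasoning-parser openai_gptoss
```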
Thank you in advance for any help!
In `stream()`, `stream_mode` may need to be `"messages"`; see
https://github.com/langchain-ai/langgraph/discussions/3215
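Note that `stream_mode` is a parameter of LangGraph's `graph.stream()`, not of a plain LCEL `chain.stream()`. A minimal sketch of that suggestion on a compiled LangGraph graph (the graph construction here is a hypothetical one-node example, not from the original post):

```python
# Sketch: per-token streaming from a LangGraph graph with
# stream_mode="messages". The graph is a hypothetical minimal example;
# the stream_mode usage is the point here.
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, MessagesState, START, END

llm = ChatOpenAI(
    model="gpt-oss-20b",
    base_url="http://localhost:8000/v1",
    api_key="sk-no-key",
    streaming=True,
)

def call_model(state: MessagesState):
    return {"messages": [llm.invoke(state["messages"])]}

builder = StateGraph(MessagesState)
builder.add_node("model", call_model)
builder.add_edge(START, "model")
builder.add_edge("model", END)
graph = builder.compile()

# stream_mode="messages" yields (message_chunk, metadata) tuples token by token
for message_chunk, metadata in graph.stream(
    {"messages": [("user", "What is 2 + 2?")]},
    stream_mode="messages",
):
    print(message_chunk.content, end="", flush=True)
```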