I’m running into an issue with the new GPT-5 model. I have a detailed prompt that needs to be followed to the letter, so I set reasoning effort to high. The output I want is short (around 100 words), so I initially set the max response size to 4000 tokens. This worked fine at first, but then some runs started returning empty strings.
Looking at the profiler logs, I saw the model generating all 4000 tokens for reasoning and leaving nothing for the actual output.
So I tried bumping the max response size to 20000 tokens, thinking that would be plenty. That fixed the empty output problem, but now almost all 20000 tokens go into reasoning, which really drives up the cost.
Ideally, I’d like a setup where I can cap the total at something like 5000 tokens, with ~90% going to reasoning and the rest reserved for the actual output. Claude 3.7 Sonnet lets you set both a max response size and a separate max reasoning size, but GPT-5 doesn’t seem to have that option.
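To make the target concrete, here is a minimal sketch of the split I have in mind. It is plain arithmetic, not an API call; `split_budget` is a hypothetical helper name used only for illustration.

```python
def split_budget(total_tokens: int, reasoning_share: float) -> tuple[int, int]:
    """Split a total token cap into (reasoning_cap, output_cap)."""
    if not 0.0 <= reasoning_share <= 1.0:
        raise ValueError("reasoning_share must be between 0 and 1")
    # Round to the nearest whole token; the remainder is left for the answer.
    reasoning_cap = round(total_tokens * reasoning_share)
    return reasoning_cap, total_tokens - reasoning_cap


# The numbers from this post: 5000 total tokens, ~90% for reasoning.
reasoning_cap, output_cap = split_budget(5000, 0.9)
print(reasoning_cap, output_cap)  # 4500 500
```

That would leave about 500 tokens for the visible answer, which should comfortably cover a response of around 100 words.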
Any suggestions on how to best handle this?