Balancing Reasoning vs Output in GPT-5

I’m running into an issue with the new GPT-5 model. I’ve got a detailed prompt that needs to be followed to the letter, so I set reasoning effort to high. The output I want is short (around 100 words), so I initially set the max response size to 4000 tokens. This worked fine at first… but then some runs started returning empty strings.

Looking at the profiler logs, I saw the model generating all 4000 tokens for reasoning and leaving nothing for the actual output.
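For reference, this is roughly the check I use to spot the problem. It's a minimal sketch that assumes the usage data has the shape I believe the OpenAI Responses API reports (`output_tokens` plus `output_tokens_details.reasoning_tokens`); the helper names are my own, and the dict stands in for whatever your client or profiler gives you.

```python
def split_output_tokens(usage: dict) -> tuple[int, int]:
    """Return (reasoning_tokens, visible_tokens) from a usage dict.

    Assumed shape (not verified against every client version):
        {"output_tokens": int,
         "output_tokens_details": {"reasoning_tokens": int}}
    """
    total = usage["output_tokens"]
    reasoning = usage.get("output_tokens_details", {}).get("reasoning_tokens", 0)
    return reasoning, total - reasoning


def output_was_starved(usage: dict, max_output_tokens: int) -> bool:
    """True when the cap was hit and nothing was left for visible text."""
    _, visible = split_output_tokens(usage)
    return usage["output_tokens"] >= max_output_tokens and visible == 0


# Illustrative numbers matching the failing runs: all 4000 tokens went to reasoning.
failing_run = {
    "output_tokens": 4000,
    "output_tokens_details": {"reasoning_tokens": 4000},
}
print(split_output_tokens(failing_run))
print(output_was_starved(failing_run, max_output_tokens=4000))
```

A check like this at least lets me retry or flag the run instead of silently passing an empty string downstream.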

So I tried bumping the max response size to 20000, thinking that would be plenty. That fixed the empty output problem, but now almost all 20000 tokens are going into reasoning, which really drives up the cost.

Ideally, I’d like a setup where I can have something like 5000 total tokens, with ~90% going to reasoning. Claude 3.7 Sonnet lets you set both a max response size and a separate max reasoning size, but GPT-5 doesn’t seem to have that option.
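To make the target concrete, here is a minimal sketch of the split I have in mind, along with how I understand it maps onto Claude's extended thinking parameters (`max_tokens` as the overall cap and `thinking.budget_tokens` as the reasoning budget, which has to be smaller than `max_tokens`). The function names are hypothetical, and the code only builds the parameter dict; it doesn't call any API.

```python
def split_budget(total_tokens: int, reasoning_percent: int) -> tuple[int, int]:
    """Split a total token budget into (reasoning, visible output) parts."""
    if not 0 <= reasoning_percent < 100:
        raise ValueError("reasoning_percent must be in [0, 100)")
    reasoning = total_tokens * reasoning_percent // 100
    return reasoning, total_tokens - reasoning


def claude_style_params(total_tokens: int, reasoning_percent: int) -> dict:
    """Build Claude-style request parameters for the given split.

    Assumption: max_tokens covers thinking plus visible output, and
    budget_tokens caps only the thinking part.
    """
    reasoning, _ = split_budget(total_tokens, reasoning_percent)
    return {
        "max_tokens": total_tokens,
        "thinking": {"type": "enabled", "budget_tokens": reasoning},
    }


reasoning, visible = split_budget(5000, 90)
print(reasoning, visible)  # 4500 500
print(claude_style_params(5000, 90))
```

So what I'm after is the GPT-5 equivalent of that second dict: a 5000-token cap with about 4500 reserved for reasoning and 500 left for the roughly 100-word answer.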

Any suggestions on how to best handle this?