Balancing Reasoning vs Output in GPT-5

bkermen · August 10, 2025, 7:56pm

I’m running into an issue with the new GPT-5 model. I’ve got a detailed prompt that needs to be followed to the letter, so I set reasoning effort to high. The output I want is short (around 100 words), so I initially set the max response size to 4000. That worked for a while, but then some runs started returning empty strings.

Looking at the profiler logs, I saw the model generating all 4000 tokens for reasoning and leaving nothing for the actual output.
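
For illustration, here is a minimal sketch of how that failure could be detected from per-run token counts. It assumes the usage data exposes a total output count that includes reasoning tokens, plus a separate reasoning count (as OpenAI's usage reporting does); the `Usage` class and both function names are hypothetical, and the profiler may label these fields differently.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Hypothetical container for one run's token counts."""
    output_tokens: int     # assumed: reasoning + visible tokens combined
    reasoning_tokens: int  # assumed: the hidden reasoning portion only


def visible_tokens(usage: Usage) -> int:
    """Tokens left over for the actual answer after reasoning."""
    return max(usage.output_tokens - usage.reasoning_tokens, 0)


def reasoning_exhausted_budget(usage: Usage, max_response_size: int) -> bool:
    """True when the run hit the cap and produced no visible output."""
    hit_cap = usage.output_tokens >= max_response_size
    return hit_cap and visible_tokens(usage) == 0


# A run like the failing ones: the whole 4000-token cap went to reasoning.
failed = Usage(output_tokens=4000, reasoning_tokens=4000)
# A healthy run: about 100 tokens of visible answer.
ok = Usage(output_tokens=3200, reasoning_tokens=3100)

print(reasoning_exhausted_budget(failed, 4000))
print(reasoning_exhausted_budget(ok, 4000))
```

A check like this would at least allow retrying only the runs that came back empty, rather than raising the cap for every run.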

So I tried bumping the max response size to 20000, thinking that would be plenty. That fixed the empty output problem, but now almost all 20000 tokens are going into reasoning, which really drives up the cost.

Ideally, I’d like a setup where I can have something like 5000 total tokens, with ~90% going to reasoning. Claude 3.7 Sonnet lets you set both a max response size and a separate max reasoning size, but GPT-5 doesn’t seem to have that option.
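
To make the target concrete, the split works out to 4500 reasoning tokens and 500 output tokens. The sketch below does that arithmetic and shows how it would map onto a two-limit setup. The `split_budget` and `two_limit_params` helpers are hypothetical; the `max_tokens` and `thinking.budget_tokens` names are assumed from Anthropic's Messages API and may not match what this platform exposes.

```python
def split_budget(total_tokens: int, reasoning_percent: int) -> tuple[int, int]:
    """Split a total token budget into (reasoning, output) parts."""
    if not 0 <= reasoning_percent <= 100:
        raise ValueError("reasoning_percent must be between 0 and 100")
    # Integer arithmetic avoids floating-point rounding surprises.
    reasoning = total_tokens * reasoning_percent // 100
    return reasoning, total_tokens - reasoning


def two_limit_params(total_tokens: int, reasoning_percent: int) -> dict:
    """Build request parameters for an API with separate limits.

    Field names are an assumption based on Anthropic's Messages API,
    where the overall cap must exceed the thinking budget.
    """
    reasoning, _output = split_budget(total_tokens, reasoning_percent)
    return {
        "max_tokens": total_tokens,
        "thinking": {"type": "enabled", "budget_tokens": reasoning},
    }


reasoning, output = split_budget(5000, 90)
print(reasoning, output)  # 4500 500
print(two_limit_params(5000, 90))
```

The 500 tokens left for output should be comfortable for a ~100-word answer, which is why a hard cap on reasoning alone would solve the problem.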

Any suggestions on how to best handle this?
