Remote fetch latency from LFS

  1. What happens:
    Two identical agent runs on the same event page using the same model (llama-3.1-8b-instant-groq) returned the same output, but with a massive performance difference.

Run ID: d00e8da7 took ~35 seconds to return the first token.
Run ID: a0386e03 completed in ~1.2 seconds.

  2. Intended behavior:
    Consistent response latency across equivalent inputs. Remote file fetch or model queueing shouldn’t introduce ~30 seconds of variability.

  3. Agent ID:
    d0f9126b-d4bb-4356-8283-50ae866c9ee6

  4. Attachment:
    See screenshot attached.

Thanks for getting in touch. Could you please also send this through to support@mindstudio.ai so we can take a deeper look into this?

Thank you!

Sent, thanks!

Hi there, I took a deeper look and it appears this was a service interruption caused by Groq—it looks like they had a network issue from which it took a moment to recover. You can see in the attached logs screenshot (these are logs I pulled directly from Groq, not logs from MindStudio) that the first request fails entirely, then is automatically retried and takes a while to complete.
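As a side note on why one failed request can balloon into tens of seconds: a typical retry loop waits out the first attempt’s failure, backs off, then starts over. Here is a generic Python sketch of that pattern; the `send_request` callable, attempt counts, and delays are all illustrative assumptions, not Groq’s or MindStudio’s actual retry logic:

```python
import random
import time

def call_with_retries(send_request, max_attempts=3):
    """Generic retry-with-backoff loop. If the first attempt hangs until a
    timeout before failing, the caller's perceived time-to-first-token is
    that whole timeout, plus the backoff delay, plus the retry's own TTFT."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send_request()  # hypothetical callable hitting the provider
        except (ConnectionError, TimeoutError):
            if attempt == max_attempts:
                raise  # out of attempts; surface the failure
            # Exponential backoff with jitter: ~2s, ~4s, ... between attempts
            time.sleep(2 ** attempt + random.uniform(0, 1))
```

A first attempt that hangs for 20–30 seconds before failing, plus a backoff delay, plus a normal retry would be consistent with the ~35-second first token seen in run d00e8da7.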

Unfortunately, while we try our best to deliver a consistent experience, the model providers are dealing with a lot of demand and we tend to see these sorts of issues from time to time (e.g., take a look at the “API” section on Anthropic’s status page to see how frequently they have outages: https://status.anthropic.com/).

Please let me know if that answers your query or if there is anything else I can do to help! Thanks!

Thanks Sean, totally understand. To clarify, when I see “Received first response token” in the debugger, is this always referencing the model provider?

Correct! You can also see this in the Groq screenshot above as “TTFT” (time to first token). The latency column is then TTFT plus however long it takes to generate the full result. A high TTFT usually means there are availability or other network issues affecting the provider, i.e., the model provider is struggling to get the request started; low TTFT but high latency just means the model is taking a long time to do its work.

“Running model” = “MindStudio has sent the request to the model provider and the model provider has acknowledged that they have received the request”
“Received first response token” = “We’ve started getting result data back from the model provider”
“Received full result…” = “Model provider has finished responding and MindStudio is now processing the result”
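To make those stages concrete, here is a rough Python sketch that times all three for a streaming model call. `stream_chunks` is a hypothetical iterable standing in for whatever streaming client you use (e.g., an OpenAI-compatible SDK pointed at Groq); nothing here is MindStudio’s internal instrumentation:

```python
import time

def measure_stages(stream_chunks):
    """Time the three debugger stages around a streaming model response."""
    sent_at = time.monotonic()  # "Running model": request sent to the provider
    first_token_at = None
    pieces = []
    for chunk in stream_chunks:
        if first_token_at is None:
            first_token_at = time.monotonic()  # "Received first response token"
        pieces.append(chunk)
    done_at = time.monotonic()  # "Received full result...": stream finished

    if first_token_at is None:
        raise RuntimeError("stream produced no chunks")
    ttft = first_token_at - sent_at
    total = done_at - sent_at
    print(f"TTFT: {ttft:.2f}s | total latency: {total:.2f}s | "
          f"generation: {total - ttft:.2f}s")
    return "".join(pieces)
```

High TTFT with a normal generation time points at provider availability or network trouble; low TTFT with high total latency just means a long generation.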
