Scaling Stress Test: Stalled Runs During High-Concurrency Rollout

I have an agent in active rollout that frequently sees multiple runs initiated in a short period of time. The agent does significant processing, so a single run typically takes 4–6 minutes (sometimes more) to complete.

Yesterday, 8 runs were initiated within roughly 10 minutes, resulting in several running concurrently. 3 of those runs stalled with the “Running…” indicator frozen — not progressing to the next block. After about 20 minutes I cancelled them, replayed the runs from the point they stalled, and everything completed successfully.

My question: is there a concurrency limit or breakpoint when running multiple instances of the same workflow simultaneously? Should we be throttling the trigger cadence between runs, or is concurrent execution something the platform is designed to handle at this scale?

Worth noting — this is the first time I’ve seen this behaviour, though it also coincides with the first couple of weeks of stress-testing the agent at this volume.