I have a workflow that fetches information from a data source (a single 400-page PDF), analyzes it, and returns results based on that data source. It has given me good results, especially since I followed your suggestion: it now splits the question into three sub-queries and queries the data source three times in parallel to produce a more accurate answer. It is working and giving very good answers.
However, it is not consistent. Sometimes the answers are very good, but other times it responds that it has no knowledge of the information and returns inconsistent (i.e., empty) results.
This is the workflow:
Could this have to do with some update that needs to be made at the data source level, or is there a way to ensure consistency without having to worry about whether or not it retrieved the respective chunks?
Could you please share the Debugger logs for the run that returned no chunks? That will let me take a closer look to understand what might have caused it.
To do that, open the Debugger, select the run, click Share, and reply with the URL:
Could you let me know how often you see runs where no chunks are returned? Could you also share a Loom showing the steps you’re taking when the Query Data Source blocks don’t return any chunks?
Thanks for following up. Unfortunately, it’s not feasible to constantly monitor the agent. However, whenever I notice that there are no responses, I check it, and it’s always due to the data source not returning results. This can happen multiple times a day, as it has already occurred several times this week.
Could you clarify what you mean by recording a video in Loom? Recording what, exactly?
I’ve been trying to reproduce this on our end, which is why I asked for a Loom. I’d like to see the exact steps you’re taking in case I’m testing it a different way.
How often do you see the Query Data Source block return no results? I can see there was a temporary disruption on January 12 to 13, but I’m trying to understand whether this could be a more consistent issue.
If you’re seeing this regularly, could you share the debugger log links for those runs? That should help us narrow down what’s causing it.
I’m trying to figure out whether this was a one-off glitch or something that’s happening more regularly.
Do you have runs where the Query Data Source blocks don’t return any results on a recurring basis? If so, could you share the debugger logs for any runs that happened after January 13? That would really help us pinpoint the cause of this behavior.
In fact, after January 13, all logs returned results, but I still see inconsistencies: when I ask the same question, I get different results. Could it be that it blocks the data source, or do I need to go in and "force" the data source every day?
Today, the same question was asked twice again by the same user, a few minutes apart. The first time there was no correct response (empty), and the second time the response was correct:
Glad to hear Query Data Source is pulling chunks consistently again. If you see any “empty” outputs from the Query Data Source blocks again, please let us know so we can take a closer look.
Now, onto your question:
Your Agent is set up like this:
Generate Text: creates 3 queries from the user’s question
3 Query Data Source blocks: retrieve the closest chunks
Generate Text: creates the final answer from those chunks
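For reference, the fan-out/fan-in pattern those steps describe can be sketched in plain Python. The function names and the in-memory "data source" below are hypothetical stand-ins for illustration, not MindStudio APIs; a real Query Data Source block performs semantic (embedding) retrieval rather than the keyword match used here:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the indexed PDF chunks.
CHUNKS = [
    "Any user can replace a training need they contributed.",
    "Replacement requires the Search Course functionality.",
    "Budget is checked when replacing a need in budgeted task plans.",
]

def query_data_source(query: str) -> list[str]:
    # Toy retrieval: return chunks sharing at least one word with the query.
    words = set(query.lower().split())
    return [c for c in CHUNKS if words & set(c.lower().split())]

def gather_context(question: str) -> list[str]:
    # Step 1 (Generate Text): derive three sub-queries from the question.
    # Faked here with simple rephrasings.
    sub_queries = [question, f"how to {question}", f"{question} process"]
    # Step 2: run the three lookups in parallel, like the 3 blocks.
    with ThreadPoolExecutor(max_workers=3) as pool:
        results = pool.map(query_data_source, sub_queries)
    # Step 3: deduplicate chunks before the final Generate Text step.
    seen: dict[str, None] = {}
    for chunks in results:
        for chunk in chunks:
            seen.setdefault(chunk)
    return list(seen)

chunks = gather_context("replace a need")
```

The deduplication step matters: because the three sub-queries overlap, the same chunk often comes back more than once, and repeating it in the final prompt wastes context window.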
Let’s compare both runs:
Refine Query step:
- "empty" result run: 0: "replace a ned" 1: "what does replace a ned mean" 2: "ned replacement process"
- correct response run: 0: "who can replace a need" 1: "how to replace a need" 2: "need replacement process"

Final answer step:
- "empty" result run: "I don't have specific information on that topic. For assistance, please email our support team with your detailed question. They'll provide you with the information you need."
- correct response run: "Any user can replace a training need that was contributed by them until the need is approved or denied by an approver. Permission depends on your approval workflow: 1. When Workflow Hierarchical Approval is enabled – Users from higher levels can always replace contributions from lower levels. 2. When Workflow Hierarchical Approval is disabled – You can only replace a need if you were the contributor or if you did an override of needs from contributors at levels lower than your own, during their contribution period. Important restriction: If the configuration 'Managers prevented to Replace or Delete end-user requested needs' is active in Task Plans Global Configurations, the replace button will not be available to managerial contributors (all levels) for needs assigned by end-users. When replacing a need, you must select a course using the Search Course Functionality and press the Replace Assignment button to complete the action. In task plans with budget configuration active, the budget is also checked when replacing a need."

Comment: Claude 4.5 Haiku generated different outputs from similar inputs.
Based on this, it looks like the inconsistency is coming from Claude 4.5 Haiku in the final Generate Text step.
I can’t say exactly why the model responded differently, but in practice, this can happen when you’re sending a lot of text into a model designed for speed/cost. It can sometimes miss what it needs and fall back to a generic answer, even when the relevant chunks are present. While the model itself is pretty good, it doesn’t have reasoning capabilities, which seem to be required for your use case.
Here’s what I’d recommend:
Test models with reasoning capabilities for the final Generate Text block. The Profiler feature should help with the side-by-side comparison
Lower the temperature for more consistent responses
Reinforce and/or shorten the prompt to minimize any ambiguity
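To make the temperature suggestion concrete, here is a minimal sketch of the kind of generation settings involved. The request shape and model id are generic illustrations, not the exact MindStudio or provider payload:

```python
def build_generation_config(deterministic: bool) -> dict:
    # temperature=0 makes sampling near-greedy, so repeated runs over the
    # same prompt and chunks are far more likely to yield the same answer.
    # Higher values add variety, at the cost of run-to-run inconsistency.
    return {
        "model": "example-model-id",  # hypothetical placeholder
        "temperature": 0.0 if deterministic else 0.7,
        "max_tokens": 1024,
    }

config = build_generation_config(deterministic=True)
```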
Any clue on how we can improve the response performance of the bot? It currently has an average response latency of 20 seconds. Do you think we can somehow reduce this to 5–6 seconds?
Reasoning models do tend to take longer to respond. I’d recommend testing a few different models to find the best balance between speed and output quality for your use case.
I ask this because every time I test it, it doesn't give me an answer connected to the agent context; it loses the agent context.
What is the best way, inside MindStudio, to find the right balance between speed and output quality while keeping the right context, so I can tell in real time whether the answers are accurate?
You can paste the exact prompt that was sent to the Generate Text block when it returned an incomplete response.
Profiler doesn’t have access to Data Sources on its own, just like Generate Text blocks don’t until you add Query Data Source blocks and reference those variables in the prompt.
When a prompt is sent to the model, all variables are resolved and the full text is passed to the selected LLM. You can replicate that in Profiler by copying the prompt with resolved variables from the Debugger and pasting it there:
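As an illustration of that variable-resolution step, here is a minimal sketch. The `{{variable}}` syntax and the variable names are assumptions for illustration; the point is only that every placeholder is replaced with its value before the full text reaches the model:

```python
import re

def resolve_variables(template: str, variables: dict[str, str]) -> str:
    # Replace each {{name}} placeholder with its value, mirroring how a
    # prompt template is expanded before being sent to the LLM. Unknown
    # placeholders are left untouched so they are easy to spot.
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: variables.get(m.group(1), m.group(0)),
        template,
    )

prompt = resolve_variables(
    "Answer using only these chunks:\n{{chunks}}\n\nQuestion: {{question}}",
    {"chunks": "chunk text here", "question": "who can replace a need?"},
)
```

Pasting the already-resolved prompt from the Debugger into Profiler reproduces exactly what the model saw, which is why the comparison there is faithful.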