Hi MindStudio Experts,
TL;DR
My RAG implementation with the Query Data Source block isn’t finding the right content for user questions, even though I include document details such as file names in the query. I need help creating an effective query template.
Setup
- Data source “Training Plans Guides and Documentation” with 11 documents
- Each document has a proper name, summary, and access snippet configured
- Example document:
  - Name: Training_Plans_Comprehensive_User_and_Administration_Guide.pdf
  - Access snippet: {{dataSource "Training Plans Guides and Documentation" "Training_Plans_Comprehensive_User_and_Administration_Guide.pdf"}}
What I’ve Tried
- Including document names in the query:
  Find information in Training_Plans_Comprehensive_User_and_Administration_Guide.pdf about "{{ISSUE}}"
- Referencing specific sections:
  Find content in section 3.2.1 about "{{ISSUE}}"
- Using detailed document references in the query template
Still getting irrelevant chunks for questions like “Where can I see if a user has completed a training?” when the answer clearly exists in the knowledge base.
Questions
- What’s the most effective way to reference specific documents in the query template?
- Can access snippet variables be used effectively within query templates?
- Is there a specific syntax or approach that consistently produces more accurate results?
Looking for proven query template strategies that effectively use document details to improve retrieval accuracy!
Thanks in advance!
fernando
Hi @Alex_MindStudio, do you think you can help me here?
Thanks a lot!
fernando
Hi @Fjmtrigo,
Thanks for the detailed post!
Querying data sources works as a semantic search across your uploaded content. Instead of looking for exact matches like filenames or section numbers, it searches for text with a similar meaning to the user’s query and then returns the most relevant chunks.
If you’d like to ask the AI specific questions in Chat, it might be best to provide it with the full text of the files in your Data Sources. You can do this by referencing Access Snippets in the System Prompt of your Workflow. Keep in mind that this will significantly increase token consumption since the entire file text will be included in the prompt.
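For example, a System Prompt that pulls in the full text of the guide from your setup might look roughly like this (the surrounding instructions are just an illustration; only the Access Snippet itself comes from your post):

```
You are a support assistant for the Training Plans portal.
Answer admin users' questions using only the documentation below.

Documentation:
{{dataSource "Training Plans Guides and Documentation" "Training_Plans_Comprehensive_User_and_Administration_Guide.pdf"}}
```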
Alternatively, you can use a Logic block: provide it with Index Snippets (file summaries) so the AI can decide which document fits best, then branch out to Query Data Source blocks to retrieve the most relevant chunks from different Data Sources, or to Generate Text blocks if you want it to use the full document text.
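As a rough sketch of that routing step (the wording and option names below are made up for illustration, and your exact Logic block configuration may differ), the Logic prompt could list the Index Snippets and ask the model to pick one file:

```
Pick the single most relevant document for the user question.

File summaries (Index Snippets):
- Training_Plans_Comprehensive_User_and_Administration_Guide.pdf: <summary>
- <other file name>: <summary>

User question: {{ISSUE}}

Answer with exactly one file name from the list above.
```

Each possible answer then maps to a branch: a Query Data Source block for that file’s Data Source, or a Generate Text block that works with the full document text.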
Hi @Alex_MindStudio,
Thank you for your detailed response!
These options will certainly help the AI answer more accurately.
That said, while including the full file text or using Logic blocks with Index Snippets would improve accuracy, I’m concerned about scaling this solution. Our users can ask hundreds of very specific questions in Chat, and it would be challenging to anticipate and build specific implementations for all possible scenarios.
Is there perhaps a more scalable approach that could maintain accuracy while handling a wide variety of unpredictable user questions? Maybe a way to optimize the semantic search itself or a hybrid approach that balances token consumption with retrieval quality?
Thanks again for your insights.
fernando
Hi @Fjmtrigo,
In this case, using Dynamic Tools in the Chat block could help you build a more scalable setup. You can read more about how it works here:
Hi @Alex_MindStudio,
The issue here is that I can’t do that in the Chat block, since I need an End block at the end of the workflow because I’m connecting to this agent through the API.
This is the Workflow:
This is the user interface:
Thanks
fernando
Hi @Fjmtrigo,
Thanks for clarifying this!
In that case, you can use a Generate Text block to refine the query or create multiple queries, then pass those to one or more Query Data Source blocks to retrieve all relevant chunks from your Data Sources.
Thanks @Alex_MindStudio,
I don’t see how to do that!
I converted all the files into a single 360-page PDF file with lots of text and some images, since they are manuals/technical guides that explain how to set up and configure all the features of a portal/application that we have and sell to our customers.
The goal is to have this chatbot that will help answer questions from admin users of that portal/application.
In other words, my data source now has a single file with 360 pages and 141 chunks, to see whether that gives me more accurate answers.
I set the maximum number of chunk results to 5, and the prompt I entered in the query template was this:
Search for the most relevant content from any document in the "ISQe Training Plans Guides and Documentation" data source for information about the user question "{{issue}}", ensure the retrieved information directly answers the question.
To answer this in the most accurate way, you will search all the files inside the data source "ISQe Training Plans Guides and Documentation" and return the best and accurate data you will find about "{{issue}}".
Do you think I could get better results with a different file type (other than PDF), a different query prompt, and/or a different maximum number of chunks?
Thanks
fernando
Hi @Fjmtrigo,
Let’s go over this in more detail.
1. How does the Query Data Source block work?
When you upload a file to Data Sources, MindStudio automatically converts the text into numerical representations called vectors. This allows the system to perform semantic search, so it looks for text with a similar meaning to your query rather than following instructions like a chat model.
When your Agent uses a Query Data Source block, it’s not reading or following your instructions like a normal AI model. It’s searching your uploaded data for the most semantically related chunks. So, if your query includes step-by-step instructions, they may confuse the search and lead to poor matches.
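To make that concrete, here is a tiny conceptual sketch in Python (this is not MindStudio’s implementation, just an illustration of the idea): both the query and every chunk are turned into vectors, and the chunks whose vectors are closest in meaning to the query are returned. Instruction-style text in the query only adds words that no documentation chunk contains, so it tends to pull the match in the wrong direction.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder for an embedding model that turns text into a vector.
    MindStudio handles this automatically when you upload files."""
    raise NotImplementedError

def top_chunks(query: str, chunks: list[str], k: int = 5) -> list[str]:
    """Return the k chunks whose embeddings are most similar to the query."""
    q = embed(query)
    scored = []
    for chunk in chunks:
        c = embed(chunk)
        # Cosine similarity: how closely the two vectors point in the same
        # direction in "meaning space".
        similarity = float(np.dot(q, c) / (np.linalg.norm(q) * np.linalg.norm(c)))
        scored.append((similarity, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]

# A plain, content-like query such as "view user training completion status"
# lands near the chunks that describe that screen, while a query full of
# instructions ("search all files and return the best data about ...")
# mostly matches nothing that actually appears in the documentation.
```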
2. Refining queries
If passing the user’s raw query isn’t returning what you expect, and you’re calling the Agent via API, try this approach:
- Use a Generate Text block to turn the user’s message into one or more retrieval-friendly queries
- Send those queries to one or more Query Data Source blocks to pull the most relevant chunks
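For example (just to illustrate the idea; the exact wording is up to you), the Generate Text block prompt could be something like:

```
Rewrite the user's question as 2-3 short, keyword-rich search queries that
match the wording of product documentation. Output one query per line.

User question: {{issue}}
```

For a question like “Where can I see if a user has completed a training?”, this might produce queries such as “view user training completion status” or “training completion report”, which are much closer to the wording of the documentation chunks and therefore retrieve better matches.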
Here’s a remix link to a sample agent:
https://app.mindstudio.ai/agents/fjmtrigo–refining-queries-fe9b353d/remix