I have an odd problem with Data Sources. I uploaded a full PDF technical manual, and it seems it can't focus on the query and gives irrelevant responses. I even tried a small one-page Word document, but when I asked about “Print ribbon problem” (Error 109), I got the entire page back. See image below. Data source id - 9707ce80-17af-4e6b-b283-892abffaf9ed
How Data Sources work:
When you upload a file to the Data Sources, the text gets extracted and split into chunks. Short documents can end up as a single chunk, so when your query matches it, the whole page comes back.
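To illustrate why a one-page document can come back whole, here is a minimal sketch of fixed-size chunking. The chunk size, overlap, and the `split_into_chunks` helper are assumptions made for illustration; the platform's actual splitting rules are not described in this thread.

```python
def split_into_chunks(text: str, max_chars: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into overlapping character windows (hypothetical sizes)."""
    if max_chars <= overlap:
        raise ValueError("max_chars must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # the window already reached the end of the text
        start += max_chars - overlap
    return chunks


# Made-up sample text standing in for a one-page document and a long manual.
one_page = "Error 109: Print ribbon problem. Check the ribbon cartridge. " * 10
manual = "Section text. " * 500

print(len(split_into_chunks(one_page)))  # the short page fits in a single chunk
print(len(split_into_chunks(manual)))    # the long manual is split into several
```

With a single chunk, any query that matches the document can only return that one chunk, which is the whole page.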
The Query Data Source block only retrieves chunks; it doesn’t generate an answer on its own. It’s best to pair it with a Generate Text block and prompt the AI to answer the user’s specific question using only the retrieved content. Your workflow would then look like this:
User enters a query
Query Data Source block retrieves the most relevant chunks
Generate Text block uses those chunks as context to answer the user’s question
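The three steps above can be sketched roughly as follows. This is not the platform's API: `retrieve` and `generate` are hypothetical stand-ins for the Query Data Source and Generate Text blocks, and the stub data is invented for the example.

```python
from typing import Callable


def build_prompt(question: str, chunks: list[str]) -> str:
    """Combine retrieved chunks and the user's question into one prompt."""
    context = "\n\n".join(f"[chunk {i + 1}]\n{c}" for i, c in enumerate(chunks))
    return (
        "Answer the user's question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


def answer(
    question: str,
    retrieve: Callable[[str], list[str]],
    generate: Callable[[str], str],
) -> str:
    chunks = retrieve(question)                       # step 2: retrieve relevant chunks
    return generate(build_prompt(question, chunks))   # step 3: answer from those chunks


# Stubs so the sketch runs without any external service (assumed, made-up data).
def fake_retrieve(question: str) -> list[str]:
    return ["Error 109: Print ribbon problem. Check the ribbon cartridge."]


def fake_generate(prompt: str) -> str:
    return prompt  # a real model would return an answer; this stub echoes the prompt


print(answer("How do I fix error 109?", fake_retrieve, fake_generate))
```

The instruction at the top of the prompt is what narrows the reply to the user's question instead of returning the raw chunk.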
The idea behind Data Sources is to find the most relevant pieces of text from your own files so the LLM replies are based on your data, rather than its own knowledge. You can learn more about Data Sources here:
Data Sources only extract text from uploaded files, so figures and images in your manuals won’t be included in the results.
The Query Data Source block uses semantic search, which means it looks for content that’s similar in meaning to your query, rather than scanning for every mention of a specific word or phrase. Asking for “all error messages” won’t pull every instance the way a keyword search would.
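A toy comparison may make the difference clearer. It assumes the retrieval step returns only the top `k` chunks by similarity. The word-count `embed` function is a crude stand-in for a real embedding model, and the error codes other than 109 are invented for the example.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts. A real system would use a learned model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def keyword_search(chunks: list[str], term: str) -> list[str]:
    """Return every chunk that contains the term."""
    return [c for c in chunks if term.lower() in c.lower()]


def top_k_search(chunks: list[str], query: str, k: int = 2) -> list[str]:
    """Rank chunks by similarity to the query and keep only the best k."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


chunks = [
    "error 101 paper jam",
    "error 109 print ribbon problem",
    "error 115 cover open",
    "error 120 low battery",
]

print(len(keyword_search(chunks, "error")))             # 4: every chunk mentioning "error"
print(len(top_k_search(chunks, "all error messages")))  # 2: only the k best-ranked chunks
```

Because the ranked search stops at `k` results, a request like "all error messages" gets a handful of similar chunks, not an exhaustive list.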
Could you share a bit more about what a typical user query would look like? Would a user enter an error code to have the AI walk them through troubleshooting steps from the manual?
Following your reply, is there a way to get images as well?
Actually, I am just experimenting, and right now Gemini NotebookLM solves that issue perfectly: when I ask it to list all codes, it does. I don’t know if that is a relevant question. For now I am going to stick with Gemini.