Using/architecting data sources

I’m building a chat bot to assist users with navigating a large database of primarily product documentation. The underlying data itself is a mix of PDFs and webpages.

I’ve been able to set the PDFs as data sources easily enough, and I’ve got a scrape URL block, but I’m not sure of the best way to scale this up, especially to a lot of URLs. If it’s helpful, most documents are very topic-based (any query with ATA100 very clearly goes to one set of documents, whereas ATA200 definitely goes somewhere else).

Thank you!

Hi @rdrake,

Here’s a setup that works well for topic-based documentation:

  1. The user enters their query
  2. A Router block matches it to the closest Data Source for that topic
  3. A Generate Text block analyzes the query and produces three similar variations to pull in more related content
  4. The original query and its variations get passed to the Query Data Source block for that topic
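
If it helps to see the flow outside the builder, here is a rough Python sketch of the same four steps. It is only an illustration of the logic, not MindStudio code: the `DATA_SOURCES` contents, the function names, and the keyword matching are all made up for the example, and `expand` uses fixed templates where the Generate Text block would call a model.

```python
import re

# Hypothetical topic -> documents mapping. In the real agent each key
# would be its own Data Source holding the PDFs and scraped pages.
DATA_SOURCES = {
    "ATA100": ["ATA100 installation guide", "ATA100 troubleshooting notes"],
    "ATA200": ["ATA200 wiring manual", "ATA200 firmware release notes"],
}

# Topic codes such as ATA100 or ATA200, matched case-insensitively.
TOPIC_PATTERN = re.compile(r"\bATA\d+\b", re.IGNORECASE)


def route(query):
    """Step 2: return the topic whose code appears in the query, else None."""
    match = TOPIC_PATTERN.search(query)
    if match is None:
        return None
    topic = match.group(0).upper()
    return topic if topic in DATA_SOURCES else None


def expand(query):
    """Step 3: stand-in for the Generate Text block.

    Returns the original query plus three templated variations.
    """
    return [
        query,
        f"how to {query}",
        f"{query} guide",
        f"{query} troubleshooting",
    ]


def search(topic, queries):
    """Step 4: naive keyword match against the documents of one topic."""
    query_words = set()
    for q in queries:
        query_words.update(q.lower().split())
    # The topic code appears in every document of the source, so ignore it.
    query_words.discard(topic.lower())
    return [
        doc for doc in DATA_SOURCES[topic]
        if query_words & set(doc.lower().split())
    ]


def answer(query):
    """Steps 1-4 chained: route, expand, then query the chosen source."""
    topic = route(query)
    if topic is None:
        return []
    return search(topic, expand(query))


if __name__ == "__main__":
    print(answer("ATA200 firmware"))
```

The point of routing first is that each query only searches the documents for its own topic, so adding more URLs to one topic does not add noise to the others.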

We also have a template for this setup:
https://app.mindstudio.ai/agents/sample-data-sources-agent-38f31065/remix

Let me know what you think!