I’m building a chatbot to help users navigate a large database that consists mostly of product documentation. The underlying data is a mix of PDFs and webpages.
I’ve been able to set the PDFs as data sources easily enough, and I’ve got a scrape URL block, but I’m not sure of the best way to scale this up, especially to a large number of URLs. If it’s helpful, most documents are very topic-based (any query mentioning ATA100 clearly goes to one set of documents, whereas ATA200 goes somewhere else).
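In case it makes the question clearer, here is a rough sketch of the topic-based routing I have in mind. It isn’t tied to any particular platform: the URLs, the `TOPIC_SOURCES` table, and the `route_query` helper are all made-up placeholders, and I’m assuming topic codes always look like `ATA` followed by digits.

```python
import re

# Hypothetical mapping from topic code to the URLs that cover that topic.
# In practice this table would be generated, not hand-written.
TOPIC_SOURCES = {
    "ATA100": [
        "https://example.com/docs/ata100/overview",
        "https://example.com/docs/ata100/setup",
    ],
    "ATA200": [
        "https://example.com/docs/ata200/overview",
    ],
}

# Assumption: a topic code is "ATA" followed by one or more digits.
TOPIC_PATTERN = re.compile(r"\bATA\d+\b", re.IGNORECASE)


def route_query(query: str) -> list[str]:
    """Return the URLs for every known topic code mentioned in the query."""
    # dict.fromkeys removes repeated codes while keeping first-seen order.
    codes = dict.fromkeys(m.upper() for m in TOPIC_PATTERN.findall(query))
    urls = []
    for code in codes:
        urls.extend(TOPIC_SOURCES.get(code, []))
    return urls
```

The idea is that only the URLs returned for a given query would need to be scraped or searched, rather than every URL on every request. A query with no recognizable topic code returns an empty list here, so it would need some fallback.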
Thank you!