Presently, I have only been building RAG databases from PDF files. Now I want to try building a chatbot that uses a company's website content as the database, instead of asking the company to give me their PDFs to build the database from. Since there doesn't appear to be a specific block for this, I was thinking of using a scrape block, capturing the output as one huge PDF, and then setting up the RAG database manually myself. Is there a smarter way of going about this? It would be neat if we had the ability to store whatever we scrape as persistent database information… or is that already possible?
Hmm, it sounds like this would be best split into two agents: one that creates the database PDF from the scraped data, and another in MS that has that PDF as a Data Source and uses the Retrieve Data Source block.
That said, when it comes to scraping tons of stuff, maybe something like https://www.hyperbrowser.ai/ might be useful here.
Alternatively, have Claude create a web scraper in Python that scrapes exactly what you need from the site. Claude will guide you through it, so it'll be a walk in the park even if you don't know a line of Python.
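To give a feel for what Claude would produce, here's a minimal stdlib-only sketch of that kind of scraper: it strips tags (and `<script>`/`<style>` contents) from a page's HTML to get plain text you could feed into a RAG database. The `TextExtractor` name, the sample HTML, and the placeholder URL are all just illustrative assumptions, not anything from a specific tool.

```python
# Minimal sketch of a stdlib-only scraper for building a RAG text corpus.
# Names and URLs below are placeholders; a real scraper would also handle
# crawling multiple pages, robots.txt, rate limiting, etc.
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)

    def text(self):
        # Collapse runs of whitespace so the output chunks cleanly later
        return " ".join(" ".join(self.parts).split())


def extract_text(html: str) -> str:
    """Return the visible text of an HTML document as one clean string."""
    parser = TextExtractor()
    parser.feed(html)
    return parser.text()


def scrape_page(url: str) -> str:
    """Fetch a single page and return its cleaned text."""
    with urlopen(url, timeout=10) as resp:
        return extract_text(resp.read().decode("utf-8", errors="replace"))


if __name__ == "__main__":
    # Offline demo on a sample snippet; swap in scrape_page("https://…")
    # for the site you actually want to index.
    sample = "<body><style>p{}</style><p>Pricing</p><p>Contact  us</p></body>"
    print(extract_text(sample))  # → Pricing Contact us
```

From there you'd chunk the extracted text and load it into whatever data source your platform expects, rather than going through a PDF at all.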