Extract Information Block: URLs are being auto-validated and break the flow

AliAlbarran · April 13, 2025, 4:29am

I’m building a flow in which the user uploads 2 or 3 PDF files and asks a question that should be answered by analysing and linking the content of all the documents.
Everything works well except that the `Extract Information from Document` block automatically detects and attempts to validate URLs or DOIs found in the text, which either breaks the flow or prevents the output from generating correctly.
Even when the prompt explicitly instructs the model not to follow or use links, the system still tries to process them.

How can this be avoided?

Is there a way to disable the automatic URL/DOI validation?
Is this behaviour coming from the Extract Information block or the base model?
Thanks for any help you can offer.

jerry-mindstudio · April 14, 2025, 1:22am

Can you post screenshots of the debugger where it breaks? Could you also post the files you are uploading, along with a public use link? This will help us test.

Thanks,

AliAlbarran · April 14, 2025, 2:25am

Thanks for the reply, Jerry. Here are the screenshots. I can only post one image at a time, and I also have a video of the agent working and then crashing. The materials are on my computer. May I send them to you by email?

jerry-mindstudio · April 14, 2025, 3:40pm

Are you referring to the Extract Text from File block? Can you post a PDF that errors out for you? I can test if so.

AliAlbarran · April 14, 2025, 4:27pm

Yes, and as the images show, the extract block doesn’t read the content as plain text; it converts everything into links…
Here’s one PDF: supportingCognition.pdf - Google Drive

jerry-mindstudio · April 14, 2025, 8:17pm

I think I know what’s going on. For file uploads, you can set the input to either extract the text OR return the URL of the file. If you set the input to extract text (which it looks like you have), you don’t need the ‘Extract Text from File’ block, because you already have the text. Most likely you are extracting the text at upload and then passing that text (not a URL) into ‘Extract Text from File’, which is why it treats every row of text as an invalid URL.
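
To make the mismatch concrete, here is a rough Python sketch of the check involved. It is not MindStudio code or its API: the function names `looks_like_url` and `needs_extraction` are hypothetical, and the rule used (a single http/https URL with no whitespace) is an assumption chosen for illustration. It only shows why already-extracted text fails when a step expects a file URL.

```python
from urllib.parse import urlparse


def looks_like_url(value: str) -> bool:
    """Return True if value is a single http(s) URL rather than document text.

    Hypothetical helper for illustration; not part of MindStudio.
    """
    candidate = value.strip()
    # Extracted document text normally contains spaces or line breaks,
    # while a file URL is one unbroken token.
    if not candidate or any(ch.isspace() for ch in candidate):
        return False
    try:
        parsed = urlparse(candidate)
    except ValueError:
        # urlparse rejects some malformed inputs (e.g. bad IPv6 brackets).
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def needs_extraction(upload_value: str) -> bool:
    """Only a file URL still needs a text-extraction step.

    If the upload input already returned text, running extraction again
    would treat that text as an (invalid) URL.
    """
    return looks_like_url(upload_value)


file_url = "https://example.com/uploads/supportingCognition.pdf"  # placeholder URL
extracted_text = "Abstract\nThis paper examines how tools support cognition."

print(needs_extraction(file_url))        # upload input set to return the URL
print(needs_extraction(extracted_text))  # upload input set to extract text
```

In other words, when the upload input is set to extract text, the variable already holds the document text and can go straight into the prompt; only the "return URL" setting leaves anything for an extraction step to fetch.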

AliAlbarran · April 14, 2025, 9:54pm

I’ll try that, thanks a lot.
