Extract Information Block: URLs are being auto-validated and break the flow

I’m building a flow in which the user uploads 2 or 3 PDF files and asks a question that should be answered by analysing and linking the content of all the documents.
Everything works well except that the Extract Information from Document block automatically detects and attempts to validate URLs or DOIs found in the text, which either breaks the flow or prevents the output from generating correctly.
Even when the prompt explicitly instructs the model not to follow or use links, the system still tries to process them.

How can this be avoided?

  • Is there a way to disable the automatic URL/DOI validation?
  • Is this behaviour coming from the Extract Information block or the base model?

Thanks for any help you can offer.

Can you post screens of the debugger where it breaks? Can you post the files you are uploading along with a public use link? This will help us test.

Thanks,

Thanks for the reply, Jerry. Here are the screenshots. I can only post one image at a time, and I have a video of the agent working and then crashing. The materials are on my computer. May I send them to you by email?

Are you referring to the Extract Text from File block? Can you post a PDF that errors out for you? I can test if so.

Yes, and as the image shows, the extract block doesn’t read it as plain text; it converts everything into links…
Here’s one PDF: supportingCognition.pdf - Google Drive

I think I know what’s going on. For file uploads, you can set the input either to extract text or to return the URL of the file. If you set the input to extract text (which it looks like you have), then you don’t need the ‘Extract Text from File’ block, because you already have the text. You are most likely extracting the text at upload and then passing that text (not a URL) into ‘Extract Text from File’, which is why it treats every row of text as an invalid URL.
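
To illustrate the mismatch, here is a minimal Python sketch of the check involved. It is not the platform's actual code; the function names and the two step labels are hypothetical. It only shows why text that has already been extracted should skip a step that expects a file URL.

```python
from urllib.parse import urlparse


def looks_like_url(value: str) -> bool:
    """Return True only if the whole value is a single http(s) URL."""
    candidate = value.strip()
    # Extracted document text contains spaces and line breaks; a file URL does not.
    if not candidate or any(ch.isspace() for ch in candidate):
        return False
    parsed = urlparse(candidate)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def route_upload(value: str) -> str:
    """Pick the next step for an upload input (step labels are hypothetical)."""
    if looks_like_url(value):
        # The input returned a file URL, so a text-extraction step still has to run.
        return "extract_text_from_file"
    # The input already returned plain text, so use it as-is.
    return "use_text_directly"
```

With a file URL as input, `route_upload` picks the extraction step; with multi-line document text (even text that contains URLs or DOIs) it returns `"use_text_directly"`, which corresponds to removing the extra block from the flow.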

I’ll try, thanks a lot.