OCR Block Flexibility & Enhancement

We need more options when using OCR within a workflow, in particular the ability to choose which OCR model is used.

We have a client who is a staffing firm based in California. They often receive timesheets as pictures of handwritten documents. We want to create multiple paths that each use a different OCR model, compare the results, and send a timesheet for human-in-the-loop approval only when the OCR results don’t agree with each other.

Since this is for payroll, it is imperative that the results are accurate.
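
To make the compare-and-escalate idea concrete, here is a rough Python sketch of the comparison step. It is not MindStudio code: the function names, the field names ("employee", "hours"), and the assumption that each OCR path returns a flat dictionary of strings are all hypothetical, chosen only for illustration.

```python
from typing import Dict, List


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting
    differences are not counted as a disagreement."""
    return " ".join(value.lower().split())


def find_disagreements(results: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Return every field whose normalized value differs between OCR outputs.

    `results` holds one dict per OCR model (hypothetical shape: field -> text).
    A field missing from one output is treated as an empty string, so it
    also counts as a disagreement.
    """
    fields = set()
    for result in results:
        fields.update(result)

    disagreements: Dict[str, List[str]] = {}
    for field in sorted(fields):
        values = [normalize(result.get(field, "")) for result in results]
        if len(set(values)) > 1:
            disagreements[field] = values
    return disagreements


def needs_human_review(results: List[Dict[str, str]]) -> bool:
    """Escalate to a human only when at least one field disagrees."""
    return bool(find_disagreements(results))


# Hypothetical outputs from two OCR paths for the same timesheet.
model_a = {"employee": "Jane Doe", "hours": "40"}
model_b = {"employee": "jane  doe", "hours": "46"}

print(find_disagreements([model_a, model_b]))
print(needs_human_review([model_a, model_b]))
```

In this sketch the employee name matches after normalization, but the hours differ, so the timesheet would go to a human reviewer. Exact string matching is deliberately strict, which seems like the safer default for payroll.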

I would love to be able to ground the model with a prompt in the block and specify which OCR model to use in the OCR block.

However, from looking in the debugger, it appears that the OCR block is a packaged workflow.

You can use the Analyze Image block to perform OCR with whichever models you would like. The OCR block is simply a handy packaged workflow that uses Analyze Image + Generate Text (to clean up formatting/inconsistencies and validate the output).

Thanks Sean, this is a game changer for our agent design for this client!!

It would be great to get a copy of that workflow to see how you experts cleaned up inconsistencies and created validations.

That way we can duplicate your best practices!

Nothing complex! It’s OCRImage.flow in here: https://app.mindstudio.ai/agents/packaged-function-standard-library-f4ff0e9a/remix
