XML metadata file extraction along with scanned document

  • Last Post 12 January 2021
Viktoras Šiurgotas posted this 07 January 2021


We have a situation where scanned documents arrive in a watch folder from another system. We receive two files: the original scanned document and an XML file containing metadata about that scanned document. The metadata contains information such as user name, user email, department, etc. The XML metadata file structure:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<scan version="1.0">
  <accountName>Finance</accountName>
  <date>2018-09-15 09:12:16</date>
  <deviceName>device\Sharp_room01</deviceName>
  <fields>
    <field>
      <label>To</label>
      <value>finance@example.org</value>
    </field>
    <field>
      <label>Subject</label>
      <value>Your scan (Scan to my email)</value>
    </field>
    <field>
      <label>Filename</label>
      <value>scan_t_2018-09-15</value>
    </field>
  </fields>
  <files>
    <file>scan_t_2018-09-15_1.pdf</file>
    <file>scan_t_2018-09-15_2.pdf</file>
  </files>
  <jobId>d5763b8a-3639-409b-99ca-c6e7f701b77e</jobId>
  <name>Scan to my email</name>
  <settings>
    <fileType>DOCX</fileType>
    <ocrEnabled>true</ocrEnabled>
  </settings>
  <type>email</type>
  <user>
    <department>Development</department>
    <office>Arizona</office>
    <email>joe@example.org</email>
    <groups>
      <group>Finance users</group>
    </groups>
    <name>joe_downey</name>
  </user>
</scan>


Can you please advise how to pull metadata such as the user name and user email from this XML file and reuse it later in the Scanshare workflow as variables? Both the original scanned document and the XML metadata file arrive in the watch folder with the same file name. I am attaching an example XML file. Thank you in advance for any advice.


luca.scarpati posted this 12 January 2021

Hi Viktoras,


Everything is feasible in Scanshare: you simply need to create a script that reads the XML file (using the variable originalfilenamewithouextension of the input PDF and changing the extension to xml) when the input PDF is processed.

Once all the variables that interest you have been created and the workflow has finished, you can delete the XML file.

So your workflow will look like this: WatchFolder -> Script Connector -> WFS (for example)
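
As a rough illustration of what the Script Connector step has to do, here is a minimal standalone sketch in Python. It is not Scanshare's scripting API: the function name read_scan_user and the dictionary keys are hypothetical, and the element paths (user/name, user/email, user/department) are taken from the sample XML posted above. It assumes the XML sits in the same folder as the PDF and has the same base name.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def read_scan_user(pdf_path):
    """Return user metadata from the XML file stored next to pdf_path.

    Hypothetical helper: assumes the XML has the same base name as the PDF
    and follows the <scan><user>...</user></scan> layout of the sample.
    """
    # Same folder, same base name, only the extension changes to .xml.
    xml_path = Path(pdf_path).with_suffix(".xml")
    root = ET.parse(xml_path).getroot()  # root element is <scan>

    def text(path):
        # findtext returns None when the element is missing.
        value = root.findtext(path)
        return value.strip() if value else ""

    return {
        "user_name": text("user/name"),
        "user_email": text("user/email"),
        "department": text("user/department"),
        "xml_path": str(xml_path),
    }
```

In a real workflow the returned values would be copied into Scanshare variables by whatever mechanism the script connector provides, and the file at xml_path could be removed once the workflow has finished.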


To read the variables, you can see this post in our Sample & Materials section: Read variables from a source XML
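
Independently of that sample (which is not reproduced here), the label/value pairs under the fields element of the XML can be turned into name/value variables in a generic way. The following is only a sketch in Python under that assumption: the function name fields_to_variables and the SAMPLE constant are hypothetical, and the snippet is a shortened copy of the XML from the question.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample XML from the question (only the <fields> part).
SAMPLE = """<scan version="1.0">
  <fields>
    <field><label>To</label><value>finance@example.org</value></field>
    <field><label>Subject</label><value>Your scan (Scan to my email)</value></field>
    <field><label>Filename</label><value>scan_t_2018-09-15</value></field>
  </fields>
</scan>"""


def fields_to_variables(xml_text):
    """Map each <field>'s <label> to its <value> (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    variables = {}
    for field in root.findall("fields/field"):
        label = field.findtext("label", default="").strip()
        if label:  # skip fields without a usable label
            variables[label] = field.findtext("value", default="").strip()
    return variables
```

Calling fields_to_variables(SAMPLE) gives a dictionary keyed by the labels To, Subject and Filename, which could then be mapped one by one onto workflow variables.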


Just for information, we always suggest using our embedded clients (if possible), so that all this information (in your case, the contents of the XML) is automatically populated directly in the Scanshare variables.


Best regards,