I wanted to share a solution for a situation where I need to OCR many thousands of multi-page PDF documents and store them in SharePoint, with the storage location defined by subfolders based on variables captured during the OCR process.
I welcome any discussion of better ways to handle this issue. I should point out that I am a beginner with this system, so I may have overlooked something completely obvious!
My problem was with the way the Workflow process handles a watch folder containing multiple files. Specifically, the 'first workflow' locks out any other workflows from polling until it has completed however many files appeared in the watch folder.
My staff scan PDFs to the watch folder with a multifunction centre, but if I have dropped 1000 files into the watch folder from my backlog archive, a user's PDF will (potentially) not be collected and processed for days.
My solution is to use a simple external PowerShell script on the server that hosts ScanShare to ‘drip feed’ historical files into the watch folder.
The script checks the contents of the ScanShare watch folder. If there are no files in it, the script takes just 1 file from the pool of 20000 waiting elsewhere and moves it in.
If a user scans a new deal in the meantime, it is added as a second file in the drop location and is dealt with much sooner than it would be behind the whole backlog.
The drip feeder will not add anything else into the watch folder until it sees that the watch folder is empty again.
This ensures the server always has something to work on, but prioritises any user-submitted PDFs over the history!
The PowerShell (.ps1) code below is set to run every minute via the Windows Task Scheduler.
# DripFeeder
# This script will ensure the destination watch folder always has at least 1 file if available in the backlog
# Author: Cam Titley 14/07/2022
$backlog_folder = "C:\ScanShare Working Folder\Sort Wait\"
$watch_folder = "C:\ScanShare Working Folder\Sort Drop\"
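# Pick one file from the backlog (subfolders included); this is $null if the backlog is empty
$nextfile = Get-ChildItem -Path $backlog_folder -Force -Recurse -File | Select-Object -First 1 -ExpandProperty FullName
# Only continue if the watch folder actually exists
if (Test-Path -Path $watch_folder) {
    $watch_folder_count = ( Get-ChildItem -Path $watch_folder | Measure-Object ).Count
    "Watchfolder Count: $watch_folder_count"
    "Next File: $nextfile"
    # Move a file in only when the watch folder is empty and the backlog still has a file
    if ($watch_folder_count -lt 1 -and $nextfile) {
        # -LiteralPath stops characters such as [ ] in a file name being treated as wildcards
        Move-Item -LiteralPath $nextfile -Destination $watch_folder
    }
}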
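
For anyone who would rather set up the every-minute schedule from PowerShell than click through the Task Scheduler GUI, here is a rough sketch of how that could look. The script path and task name are placeholders I made up for illustration, not values from my setup, so adjust them to suit. It needs to be run from an elevated PowerShell prompt.

```powershell
# Sketch only: register the DripFeeder script to run every minute.
# ASSUMPTION: both values below are hypothetical placeholders.
$script_path = "C:\Scripts\DripFeeder.ps1"   # wherever the .ps1 file is saved
$task_name   = "DripFeeder"                  # any name you like

# Run the script with Windows PowerShell, without loading a user profile
$action = New-ScheduledTaskAction -Execute "powershell.exe" `
    -Argument "-NoProfile -ExecutionPolicy Bypass -File `"$script_path`""

# Start now and repeat every minute
$trigger = New-ScheduledTaskTrigger -Once -At (Get-Date) `
    -RepetitionInterval (New-TimeSpan -Minutes 1)

Register-ScheduledTask -TaskName $task_name -Action $action -Trigger $trigger
```

As written, this registers the task under the account that runs the command, so it may only fire while that account is logged on. On a server you would probably want to register it under a service account instead. Older Windows versions may also insist on a -RepetitionDuration value alongside -RepetitionInterval.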