While working at my job, I also created an Export Sorter. When LAW (document processing software/DB) exported a dataset, it saved the output to LOADFILES, IMAGES, NATIVES, and OCR folders in the main VOLUME folder of the case. For example, let’s assume the case we are currently working on is named “Vanguard Industries.” The client needs us to process the data and make it available in a Relativity DB for review and ease of access.
Processing Documents
After determining the dataset, we would use LAW to process the data. This includes ingesting the data into LAW, which extracts all the document metadata and saves it to a LAW DB, usually named after the case. After ingesting, we would “tiff” the documents, which means creating images (.tif/.jpg) for each document. These are the images exported into the IMAGES folder.
We would then OCR the documents using the images we created. OCR stands for Optical Character Recognition, which reads the characters in each image to determine what the document actually says and creates a text representation of each document in .txt format. These files are generally exported into a folder named TEXT or OCR.
NATIVES are essentially copies of each document in its original format (.xls, .doc, .pdf, etc.) and are exported into the NATIVES folder. The TEXT, IMAGES and NATIVES are linked by the DOCID we assign to each document, which acts as a primary key. A .dat file and an .opt file are also exported with the data. The .dat file generally contains the metadata, text location, and native location of each document. The .opt file contains the image location for each document. Both files are used to load the data into a Relativity database.
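To make the .opt file a little more concrete, here is a minimal Python sketch that reads one line of it. Python is used only for illustration; the original tool was PowerShell and HTA. The sketch assumes the common Opticon layout of seven comma-separated fields (image key, volume, image path, document break, folder break, box break, page count). The sample line and the names `OptRecord` and `parse_opt_line` are hypothetical.

```python
from typing import NamedTuple


class OptRecord(NamedTuple):
    image_key: str    # page-level identifier tied to the DOCID
    volume: str
    image_path: str   # where the image file lives
    doc_break: bool   # True when this page starts a new document
    page_count: str   # kept as text; often filled only on a document's first page


def parse_opt_line(line: str) -> OptRecord:
    # Assumed layout: key,volume,path,doc_break,folder_break,box_break,page_count
    fields = line.rstrip("\r\n").split(",")
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    key, volume, path, doc_break, _folder, _box, pages = fields
    return OptRecord(key, volume, path, doc_break.upper() == "Y", pages)


# Hypothetical sample line, not taken from a real export
record = parse_opt_line(
    r"VAN000001,VOL001,D:\VANGUARD\VOL001\IMAGES\VAN000001.tif,Y,,,3"
)
print(record.image_path)
```

A plain `split(",")` breaks if an image path itself contains a comma, so a real tool would need to handle that case.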
A final LAW export looks like this if we were exporting to a volume we named VOL001 in LAW:
VANGUARD\
    VOL001
        NATIVES
        IMAGES
        OCR
        VOL001.dat
        VOL001.opt
However, the Relativity DB workspaces were laid out as follows on the backend, and all the data from LAW needs to be placed in the appropriate folder for Relativity to load it correctly:
VANGUARD\
    LOADFILES FOLDER
        VOL001
    NATIVES FOLDER
        VOL001
    IMAGES FOLDER
        VOL001
    OCR FOLDER
        VOL001
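The mapping between the two layouts can be sketched in a few lines of Python. This is an illustration, not the original PowerShell/HTA code. The function name `plan_moves`, the `D:\VANGUARD` paths, and the choice to pass the export folder and the case root separately are all assumptions.

```python
from pathlib import PureWindowsPath

# Folder names taken from the two layouts above
DATA_FOLDERS = ("NATIVES", "IMAGES", "OCR")


def plan_moves(export_dir: str, case_root: str) -> dict[str, str]:
    """Map each item in the LAW export to its Relativity-side destination."""
    export = PureWindowsPath(export_dir)
    root = PureWindowsPath(case_root)
    volume = export.name  # e.g. VOL001
    moves = {}
    for folder in DATA_FOLDERS:
        # <export>\NATIVES -> <case>\NATIVES\VOL001, and so on
        moves[str(export / folder)] = str(root / folder / volume)
    for ext in (".dat", ".opt"):
        load_file = volume + ext
        moves[str(export / load_file)] = str(root / "LOADFILES" / volume / load_file)
    return moves


# Hypothetical paths for the Vanguard example
moves = plan_moves(r"D:\VANGUARD\VOL001", r"D:\VANGUARD")
for source, destination in moves.items():
    print(source, "->", destination)
```

The sketch only builds the plan. Carrying it out would be a matter of passing each pair to something like `shutil.move`, after creating the destination parent folders.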
The Problem
Since LAW exported everything into the VOL001 folder, all the other folders exported for the volume would need to be moved into their corresponding Relativity folder (NATIVES, IMAGES, OCR) before importing. This meant manually restructuring the LAW Export to be compatible with a Relativity Import, which was fairly time-consuming.
The Solution
The .HTA application I created took everything exported into the LOADFILES folder of the current case and moved each folder to its respective NATIVES, IMAGES, or OCR folder. The application started as a PowerShell script (.ps1 file) that could do everything, but since I wanted others to be able to use it, I built a very easy-to-use GUI. Because I was limited to the languages available on my work computer, I researched ways to create a simple GUI. I could have used a PowerShell GUI, but I decided to go the .HTA route. Having designed multiple websites, I found this option more appealing, and it gave me an opportunity to learn more about .HTA (HTML Applications).
As I learned, it’s a fairly old technique. Microsoft’s definition of an .HTA file is:
HTML Applications (HTAs) are full-fledged applications. These applications are trusted and display only the menus, icons, toolbars, and title information that the Web developer creates. In short, HTAs pack all the power of Windows Internet Explorer—its object model, performance, rendering power, protocol support, and channel–download technology—without enforcing the strict security model and user interface of the browser. HTAs can be created using the HTML and Dynamic HTML (DHTML) that you already know.
In the HTA GUI, the user would enter the path of the LAW export, including the volume name, and select “Parse Path”. Parse Path would split the path provided and display the location of the workspace, the volume name, and the current locations of the .dat and .opt files that it would have to modify. After verifying that all the locations are correct (and they should be, if the exported files were not changed in any way after the LAW export), the user can select the “Run Script” button.
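As a rough idea of what a “Parse Path” step does, here is a small Python sketch (an illustration only, not the original HTA code). It assumes the workspace is the parent of the volume folder and that the load files are named after the volume. The function name `parse_export_path` and the sample path are hypothetical.

```python
from pathlib import PureWindowsPath


def parse_export_path(export_path: str) -> dict[str, str]:
    """Split a LAW export path into the pieces shown to the user."""
    path = PureWindowsPath(export_path)
    volume = path.name  # last path component, e.g. VOL001
    return {
        "workspace": str(path.parent),       # assumed: parent of the volume folder
        "volume": volume,
        "dat": str(path / f"{volume}.dat"),  # assumed: load files named after the volume
        "opt": str(path / f"{volume}.opt"),
    }


# Hypothetical path for the Vanguard example
info = parse_export_path(r"D:\VANGUARD\VOL001")
for label, value in info.items():
    print(f"{label}: {value}")
```

Showing these values before anything is moved gives the user a chance to catch a mistyped path.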
The LAW Export Sorter application would take care of all the user steps and leave the data load-ready for Relativity. Afterwards, it generated a status report that included all locations, file counts for natives, text, and images, and folder sizes, which was useful when completing the task and filling out a task-completion worksheet. The user no longer had to go to each individual folder, right-click, and select Properties to get all the info, or open up LAW reports, both of which take time with large amounts of EDD data.
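The counting part of such a report is straightforward to sketch in Python. This is an illustration, not the original code; the names `folder_stats` and `build_report`, the report format, and the sample locations are assumptions.

```python
import os


def folder_stats(folder: str) -> tuple[int, int]:
    """Return (file count, total size in bytes) for everything under folder."""
    count = 0
    size = 0
    for dirpath, _dirnames, filenames in os.walk(folder):
        for name in filenames:
            count += 1
            size += os.path.getsize(os.path.join(dirpath, name))
    return count, size


def build_report(locations: dict[str, str]) -> str:
    """Build one report line per labeled folder."""
    lines = []
    for label, folder in locations.items():
        count, size = folder_stats(folder)
        lines.append(f"{label}: {folder} | {count} files | {size / 1024**2:.2f} MB")
    return "\n".join(lines)


# Hypothetical locations; folders that do not exist simply report 0 files
print(build_report({
    "NATIVES": r"D:\VANGUARD\NATIVES\VOL001",
    "IMAGES": r"D:\VANGUARD\IMAGES\VOL001",
    "OCR": r"D:\VANGUARD\OCR\VOL001",
}))
```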
IMAGES contained .tif and .jpg files -> LAW Export Sorter would move all images under the IMAGES\VOL001\ folder and update the .opt with their new absolute locations on the HDD or network.
NATIVES contained the actual files extracted from the emails (PDF files, Excel files, Word files, etc.) -> LAW Export Sorter would move all the natives to the NATIVES\VOL001 folder and update the .dat with their new location.
OCR contained the actual text from the extracted files -> LAW Export Sorter would move all the text into the OCR\VOL001 folder and update the .dat with its new location as well.
In short, LAW Export Sorter also opened the .dat and .opt files and rewrote the paths in them to reflect the new locations. In the end, everything had been moved from the Vanguard\Loadfiles\VOL001 folder to exactly where it needed to be for Relativity.
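The path rewriting for the .opt file can be sketched as a prefix swap on each line. The Python below is an illustration rather than the original PowerShell. It assumes the image path is the third comma-separated field of each .opt line (the usual Opticon layout), and the function name `retarget_opt_line` and the sample paths are hypothetical. The .dat file uses its own delimiters and is not covered here.

```python
def retarget_opt_line(line: str, old_prefix: str, new_prefix: str) -> str:
    """Point the image path of one .opt line at its new folder."""
    fields = line.split(",")
    # Assumed layout: the image path is the third comma-separated field.
    # Windows paths are case-insensitive, so compare in lower case.
    if len(fields) > 2 and fields[2].lower().startswith(old_prefix.lower()):
        fields[2] = new_prefix + fields[2][len(old_prefix):]
    return ",".join(fields)


# Hypothetical line and folders for the Vanguard example
old_line = r"VAN000001,VOL001,D:\VANGUARD\VOL001\IMAGES\VAN000001.tif,Y,,,3"
updated = retarget_opt_line(
    old_line,
    r"D:\VANGUARD\VOL001\IMAGES",
    r"D:\VANGUARD\IMAGES\VOL001",
)
print(updated)
```

Lines whose path does not start with the old prefix are returned unchanged, so running the function twice on the same file does no harm.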
Conclusion
Take the example case “Vanguard.” When run, LAW exported all the folders (Natives, Images, OCR, and Loadfiles) under the “Vanguard/LOADFILES” folder. Originally it was up to the user to move each folder to its destination and then edit the .opt and .dat files manually. These steps can be very time-consuming and tedious, so I created the LAW Export Sorter to take care of all of them. Automating the process saved my co-workers a lot of time and freed them up to work on other jobs.
I will update this with the code and explanation of how the code worked.