UNITIZATION: The Process of Separating Logical Boundaries from Physical Boundaries

In the previous issue, I wrote about scanning, because most cases still include a paper component, and in some cases a great deal of it. Because scanning and unitization are so closely intertwined, it would be a good idea to file these unitization guidelines with last issue's article on scanning for future use.

The purpose of unitization (logical document determination, or LDD) is to locate and identify where each logical, self-standing document starts and ends. Self-standing documents include letters, memos, reports, books, photographs, drawings, graphs, charts, and other compilations. Whether something is a self-standing document is determined by examining its format and its bibliographic information. Bibliographic information refers to items that reside outside the text (main body) of a document, and includes one or more of the following:

Author:       Person and/or organization that produced or approved the document

Recipient:    Person and/or organization that received the document

Copyee:       Person and/or organization that received a copy of the document

Title:        Heading, subject, or RE line of the document

Date:         Date the document was originally produced

In a typical paper collection, documents are found stapled, clipped, or otherwise bound together. For example, a fax cover, a letter, and a report may be stapled together, but each is clearly an individual document, because each has its own format and associated bibliographic data. The same principles apply to a group of scanned images that has not yet been split into individual documents. Determining the logical documents correctly matters because they are later used in databases, in depositions, and at trial.

Each paper or imaged collection is reviewed to find the beginning and ending point of each individual document. A letter begins with the salutation and ends with the signature. A report begins at the title page and includes every page that follows in consecutive order, including indexes, through the end of the report or the last page present.
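The boundary rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production tool: the `Page` record and the `unitize` function are hypothetical, and the sketch assumes each page's bibliographic fields have already been captured by a coder, with continuation pages carrying none. A page whose bibliographic data differs from the current document's starts a new logical document; a page with no bibliographic data continues the current one.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Page:
    """One scanned page (hypothetical record). Bibliographic fields are assumed
    to be captured already and are None on continuation pages."""
    image_id: str
    author: Optional[str] = None
    recipient: Optional[str] = None
    date: Optional[str] = None
    title: Optional[str] = None

    def biblio(self) -> Tuple[Optional[str], ...]:
        return (self.author, self.recipient, self.date, self.title)

    def has_biblio(self) -> bool:
        return any(field is not None for field in self.biblio())


def unitize(pages: List[Page]) -> List[List[str]]:
    """Split one physical group of pages into logical documents (image IDs)."""
    documents: List[List[str]] = []
    current_biblio: Optional[Tuple[Optional[str], ...]] = None
    for page in pages:
        # New bibliographic data signals the start of a new logical document.
        starts_new = page.has_biblio() and page.biblio() != current_biblio
        if starts_new or not documents:
            documents.append([])
            if page.has_biblio():
                current_biblio = page.biblio()
        documents[-1].append(page.image_id)
    return documents


# Hypothetical stapled group: fax cover, two-page letter, three-page report.
group = [
    Page("IMG001", author="Dick", recipient="Jane", title="Fax cover"),
    Page("IMG002", author="Jack", recipient="Jill", title="Letter"),
    Page("IMG003"),
    Page("IMG004", author="Research Dept.", title="Falling down the hill"),
    Page("IMG005"),
    Page("IMG006"),
]
print(unitize(group))  # [['IMG001'], ['IMG002', 'IMG003'], ['IMG004', 'IMG005', 'IMG006']]
```

Because the sketch compares only the captured fields, two back-to-back copies with identical bibliographic data would be merged into one document, so a real workflow still needs a human check for the duplicate and exception rules in this article.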

Physical Attachments:

A physical attachment should be treated as a document separate from its parent (lead) document if the attachment can stand on its own. Consider a fax cover stapled to a letter and a report: each can stand on its own, because each has its own bibliographic data, which may differ from that of the others. The result is three distinct logical documents for later coding.

Suppose that stapled group is coded as a single document. You may capture only the information from the first document; if you instead code the information from all three into one record, the record becomes confusing and searching becomes laborious. A vendor will code only the information from the first document, because they cannot code all three for the price of one. In this example, if the fax is from Dick to Jane, the letter is from Jack to Jill, and the report is about falling down the hill, the information from the letter and the report is lost and will not be found in a search.

EXCEPTIONS: Attachments to a contract, especially exhibits, are usually not separated, and appendices should be kept with the contract as well. Exhibits are typically kept together even though they may include more than one document, and data capture is typically performed only on the first document. Addenda can be coded separately or together, depending on the needs of the legal team.

Multiple Documents on One Page:

Some pages contain more than one document. For example, a photocopy may show three different checks on one page; unless otherwise specified, each check is considered a separate document. The checks may share the same author, but they likely have different dates and payees (recipients).

Multiple Documents as One Document:

Consecutive page numbers are an excellent indication that all of the pages belong to the same document, even if it contains embedded documents that would otherwise stand alone based on their bibliographic information. This is particularly true of faxes, where the fax header line clearly shows the page count. Be careful, though: not all of the pages may be present.
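A pagination check like the one described above can be sketched as follows. This is a hypothetical helper, not part of any review platform: it assumes the page number printed on each page has already been read (by a coder or OCR) and listed in physical order, and that the fax header's stated page count, if any, is passed in separately. It reports whether the run is consecutive and which pages appear to be missing.

```python
from typing import Dict, List, Optional


def check_pagination(page_numbers: List[int],
                     stated_total: Optional[int] = None) -> Dict[str, object]:
    """Check whether a run of pages looks like one complete document.

    page_numbers: page numbers read from each page, in physical order.
    stated_total: page count shown on a fax header line, if available.
    """
    # Each page should be exactly one more than the page before it.
    consecutive = all(b == a + 1 for a, b in zip(page_numbers, page_numbers[1:]))
    # Without a stated total, the highest page number seen is the best guess.
    expected_last = stated_total if stated_total is not None else max(page_numbers, default=0)
    missing = sorted(set(range(1, expected_last + 1)) - set(page_numbers))
    return {"consecutive": consecutive, "missing": missing}


print(check_pagination([1, 2, 3, 5], stated_total=5))  # {'consecutive': False, 'missing': [4]}
print(check_pagination([1, 2, 3], stated_total=5))     # {'consecutive': True, 'missing': [4, 5]}
```

The second call shows why the warning matters: the pages present are perfectly consecutive, yet the fax header reveals that the last two pages never made it into the collection.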

Duplicate Documents:

Duplicate documents are treated as separate documents even when they follow one another in a group of pages, because documents that look alike may contain minor differences that affect the review or coding process. Today, exact duplicates of electronic documents can be identified, and removed, by matching their hash values. Paper documents, on the other hand, can only be flagged as potential duplicates, either from coded information or by software that compares each document's format fingerprint. Such software reports a percentage of confidence that one document duplicates another, and the pair must then be compared by hand; many times these turn out to be near duplicates. Nonetheless, the technique can help the reviewer determine whether scanned images match.
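The two approaches described above can be sketched with Python's standard library. Exact duplicates are grouped by hashing each file's bytes; this sketch uses SHA-256 from `hashlib`, though any hash algorithm the team agrees on works the same way. For the near-duplicate side, `difflib.SequenceMatcher` is swapped in as a simple stand-in to produce a confidence percentage from OCR text; it is not how commercial format-fingerprint tools work. The function names and sample files are hypothetical.

```python
import hashlib
from collections import defaultdict
from difflib import SequenceMatcher
from typing import Dict, List


def exact_duplicate_groups(files: Dict[str, bytes]) -> List[List[str]]:
    """Group file names whose contents hash identically (exact duplicates)."""
    by_hash: Dict[str, List[str]] = defaultdict(list)
    for name, content in files.items():
        by_hash[hashlib.sha256(content).hexdigest()].append(name)
    return [sorted(names) for names in by_hash.values() if len(names) > 1]


def similarity_percent(text_a: str, text_b: str) -> float:
    """Rough 0-100 confidence that two OCR'd pages match (stand-in technique)."""
    return round(SequenceMatcher(None, text_a, text_b).ratio() * 100, 1)


files = {
    "memo_v1.doc": b"Meeting moved to Friday.",
    "memo_copy.doc": b"Meeting moved to Friday.",
    "memo_v2.doc": b"Meeting moved to Monday.",
}
print(exact_duplicate_groups(files))  # [['memo_copy.doc', 'memo_v1.doc']]
print(similarity_percent("Meeting moved to Friday.", "Meeting moved to Monday."))  # 87.5
```

Note that `memo_v2.doc` differs by a single word, so hashing does not flag it at all, while the similarity score marks it as a likely near duplicate that still has to be compared by hand.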

Distribution Lists:

Distribution lists should not be separated from the document they distribute.

Table of Contents:

A table of contents, combined with the sections that follow it, makes up one document. The sections or documents that follow the table should be examined to determine whether they are actually the items the table lists. If the pages are consecutively numbered, that is an excellent indication of one document. If they are not, examine the contents of the trailing documents to see whether they logically match the subjects or topics listed in the table; if they do, treat the whole as one document.
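The decision order above (pagination first, then topic matching) can be sketched as a small helper that flags candidates for a reviewer. This is a hypothetical illustration: it assumes the table's entries and each trailing section's heading have already been transcribed, and it uses a crude case- and whitespace-insensitive string match as a stand-in for the reviewer's judgment about whether the contents "logically match."

```python
from typing import List


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so minor formatting differences don't matter."""
    return " ".join(text.lower().split())


def belongs_with_toc(toc_entries: List[str], section_headings: List[str],
                     consecutively_numbered: bool) -> bool:
    """Suggest whether the sections after a table of contents form one document with it."""
    if consecutively_numbered:
        return True  # consecutive pagination is the strongest indicator
    listed = {normalize(entry) for entry in toc_entries}
    # Otherwise, every trailing section must match a topic listed in the table.
    return bool(section_headings) and all(normalize(h) in listed for h in section_headings)


toc = ["Introduction", "Site Survey", "Findings"]
print(belongs_with_toc(toc, ["INTRODUCTION", "Site  survey"], consecutively_numbered=False))  # True
print(belongs_with_toc(toc, ["Introduction", "Invoice 4471"], consecutively_numbered=False))  # False
```

A `False` here only means the trailing pages need a closer look; an unlisted section such as the invoice in the second call is likely a separate document, but the final call remains the reviewer's.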


An appendix or an index is not a table of contents. In most cases, a list of documents to follow is not, by itself, enough to group all of the listed documents as one document.

A Word of Caution: Take care when reviewing reports, patents, and other documents whose pages typically move from text to drawings, pictures, or other graphic representations and then back to text. These documents are not split up just because the format changes. Pagination is a good way to determine where such documents begin and end.

