Dark Data Task Force Report: Identification and Remediation of Dark Data in Law Firms
This report was created for law firm information governance (IG) management and law firm executives who are
beginning to recognize, or are in fact currently dealing with, the problems associated with dark data. The report
logically begins with an industry (and Symposium) definition of dark data and explains where dark data can be
found in law firms.
The report then moves into a detailed explanation of the challenges law firms face with the
unmanaged accumulation of dark data, including rising storage costs, reduced employee productivity and increasing
risk around client data leakage.
The report then explores the various tools that can help bring this dark data phenomenon under control. The
tools discussed include the creation of policies and workflows to address dark data now and into the future, as well
as emerging technologies such as File Analysis Software (FAS) and Data Loss Prevention (DLP) software that can
automate the identification, classification, management and disposition of dark data.
Finally, this report offers a three-pronged approach for managing dark data in law firms.
“Olly olly oxen free! Come out, come out wherever you are.” Finding dark data is often like playing the childhood
game of hide and seek. The information governance (IG) professional, playing the role of the “seeker” or “it,” is on
a mission to uncover data that has been hiding in various firm repositories for many years. The seeker “tags” objects
as they are found and continues with this strategy until all dark data has been flushed from its hiding places.
“Ten, nine, eight, seven, six, five, four, three, two, one! Ready or not, here I come!” cries the IG professional as she
or he begins the hunt for dark data. This dark data report provides practical guidance for law firms who may be
searching for dark data, now or in the near future. The report is a compilation of information obtained through a
survey of Symposium participants, industry research and group member work experience. A framework is provided
for evaluating the law firm dark data posture. In addition, recommended strategies are offered as a way to justify an
evaluation and eventual analysis of dark data. The strategies considered include uncovering dark data, continuing
management and cost-benefit and risk assessment. Law firm IG professionals may find this information useful as
they explore and settle on an appropriate methodology for their firms.
What Is Dark Data?
There are many characterizations of dark data. In July 2014, the LegalTech® West Coast panel “Recover or Delete
Dark Data” defined dark data as enterprise data that is predominately uncategorized, has limited visibility to the
organization (if not completely obscured) and, because of its obscurity, serves no apparent business purpose.
Law firms are experiencing growing volumes of dark data across their technology platform(s). In fact, dark data
is lurking in many data and content systems including mobile devices, local computer drives, email, network
file shares, legacy paper files, cloud file sharing services and even structured databases, such as the document
management system. Dark data is largely unstructured, such as real-time communications and documents, but can
also be semi-structured, for example XML code, or structured, as in a database. Few firms today have insight into
their dark data and many are fearful of exposing content that could be highly confidential or contain information
subject to legal discovery.
The Dark Data Task Force surveyed a small sample of law firm records managers and IG professionals. The Dark Data Survey results showed that most firms report they are either currently addressing dark data or plan to do so within
the next year. Firms with no plans to address this topic are in the minority.
The survey also indicated that the majority of firms are either in the process of developing policies on dark data or already have these policies in place.
Dark Data Vs. Big Data
We cannot adequately discuss dark data without also mentioning big data. Big data refers to large pools of data
against which analytics are run. Big data has, in recent years, compelled law firms to consider new technologies
such as enterprise search and predictive coding to help make content more accessible and meaningful. Many
organizations, including law firms, find themselves becoming paralyzed by the growing mountain of big data, of
which the vast majority is considered dark.
Why Focus On Dark Data?
Dark data is costly to analyze because of data quality issues and a lack of resources to perform analysis and cleanup,
and in many organizations it is simply not yet recognized as a problem. Many law firms take the stance of “what
you don’t know won’t hurt you.” If this is true, then shouldn’t dark data be left alone? We know that enterprise data
capacity is growing at a rate of up to 60 percent on average annually, making the ability to manage the growth of big
data even more challenging and potentially prohibitive. The key challenge with dark data is in determining if there is
any real value to justify the management of it. The main concern for most is that managing dark data would be more
expensive than any realized value gained from its management.
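To put the growth figure above in perspective, the compounding effect can be worked through with a short calculation. The following is a minimal sketch, assuming a hypothetical starting volume of 100 TB and treating the 60 percent annual rate as an upper bound. The function name and the starting figure are illustrative only and do not come from the survey.

```python
from typing import List


def project_storage(start_tb: float, annual_growth: float, years: int) -> List[float]:
    """Return the projected volume (in TB) at the end of each year under compound growth."""
    volumes = []
    current = start_tb
    for _ in range(years):
        current *= 1 + annual_growth  # each year grows on top of the previous year's total
        volumes.append(round(current, 1))
    return volumes


if __name__ == "__main__":
    # Hypothetical inputs: 100 TB today, 60 percent growth per year, five-year horizon.
    for year, tb in enumerate(project_storage(100.0, 0.60, 5), start=1):
        print(f"Year {year}: {tb} TB")
```

Under these assumptions, volume roughly doubles every 18 months and grows more than tenfold within five years, which is why unmanaged accumulation becomes harder to address the longer it is deferred.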
To determine if dark data is even worth further analysis, law firms need a means of cost-effectively finding,
reviewing, organizing and visualizing dark data on an ongoing basis. Some law firms are beginning to use file analysis
software to spot trends in legal project costs or client behaviors for business development. This information can
expose patterns that can improve law firm business planning. Firms should also employ file analysis software to
archive valuable business data as well as defensibly destroy valueless dark data. Enterprise search and workflow
applications can be leveraged to migrate high-value content to a structured repository based on end-user activity.
Firms should work towards prohibiting the use of offending storage locations and provide functionally equivalent
solutions for day-forward activities (e.g., file system emulation while cleanup activities are ongoing).
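As a rough illustration of what a first-pass scan for dark data might involve, the sketch below walks a file share and reports files that have not been modified for a given number of years, totaled by file extension. It is an assumption-laden example and not a description of any commercial file analysis product: last-modified time is only a crude proxy for "dark," the five-year threshold is arbitrary, and the function names are hypothetical.

```python
import os
import time
from collections import Counter
from pathlib import Path
from typing import Iterable, Iterator, List, Optional, Tuple

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def find_stale_files(
    root: str, max_age_years: float = 5.0, now: Optional[float] = None
) -> Iterator[Tuple[Path, int, float]]:
    """Yield (path, size_bytes, age_years) for files not modified within max_age_years."""
    now = time.time() if now is None else now
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                info = path.stat()
            except OSError:
                continue  # unreadable or vanished file: skip it rather than abort the scan
            # Assumption: time since last modification stands in for "unused".
            age_years = (now - info.st_mtime) / SECONDS_PER_YEAR
            if age_years > max_age_years:
                yield path, info.st_size, age_years


def summarize_by_extension(stale: Iterable[Tuple[Path, int, float]]) -> List[Tuple[str, int]]:
    """Total stale bytes per file extension, largest total first."""
    totals: Counter = Counter()
    for path, size, _age in stale:
        totals[path.suffix.lower() or "(none)"] += size
    return totals.most_common()
```

A scan of this kind only produces candidates for review. Decisions about archiving or defensible destruction still depend on firm policy, legal holds and client directives.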
Landmines: Where Is Dark Data Hiding?
Dark data is alive and well in law firms. It lives in both paper and electronic format, evidenced by the number of
unmanaged paper records located in abandoned practice group work rooms, storage areas, attorney offices,
abandoned file cabinets, offsite storage and attorney home offices. Some users (you know who they are!) have
secret file locations. Others refuse to close a file, saving space for the next big case. And then there are the secretive
individuals who simply reply, “Only I need to know where those files are located,” when asked about files.
Valuable information is often lost in unmanaged and ungoverned repositories. This dark data lives in dormant
servers, legacy applications, unclassified email messages, departed attorney mailboxes and network share drives,
as well as countless other repositories. The issue spans the entire firm and has no boundaries. It even expands
to third-party vendors who are housing data on behalf of the firm in databases, such as systems that house
administrative HR and payroll data, as well as third-party vendors that host discovery data for litigation cases
and corporate deal rooms.
Generally speaking, most users want to be efficient and will follow processes that make sense. The firm’s information
governance team can help by seeking input from subject matter experts and designing workflows that make
sense, save time, eliminate duplicate data and facilitate collaboration among teams, while saving server space
and better managing the firm budget. However, each organization is unique. This report will attempt to give you
some suggested places to begin your search for dark data and provide methods for bringing it under control on an
ongoing basis. See Appendix B for a sample checklist for the collection of dark data.
In order to manage the madness, start with the creation of a data map by asking questions such as:
- Where are people storing documents?
- What types of documents are stored there?
- What are the date ranges?
- What business units are affected?
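The questions above translate naturally into a simple record structure. The following is a minimal sketch of how a data map entry might be captured, assuming one entry per storage location. The field names, the approved flag, the review date and the example locations are all hypothetical and would need to be adapted to each firm.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class DataMapEntry:
    """One row of a data map; field names are illustrative assumptions."""
    location: str                 # where people are storing documents
    document_types: List[str]     # what types of documents are stored there
    earliest: date                # start of the date range
    latest: date                  # end of the date range
    business_units: List[str]     # which business units are affected
    approved: bool = False        # whether this is an approved firm repository
    last_reviewed: Optional[date] = None  # when this entry was last confirmed


def unapproved_locations(entries: List[DataMapEntry]) -> List[str]:
    """Locations in use that are not approved repositories."""
    return [e.location for e in entries if not e.approved]


def due_for_review(
    entries: List[DataMapEntry], today: date, max_age_days: int = 365
) -> List[DataMapEntry]:
    """Entries never reviewed, or last reviewed more than max_age_days ago."""
    return [
        e for e in entries
        if e.last_reviewed is None or (today - e.last_reviewed).days > max_age_days
    ]


# Hypothetical example entries.
data_map = [
    DataMapEntry("DMS", ["pleadings", "correspondence"], date(2005, 1, 1),
                 date(2015, 6, 30), ["Litigation"], approved=True,
                 last_reviewed=date(2015, 1, 15)),
    DataMapEntry("S:/shared/misc", ["spreadsheets", "scans"], date(1999, 3, 1),
                 date(2012, 12, 31), ["Corporate", "HR"]),
]
```

Keeping a review date on each entry is one way to treat the data map as a living document: entries that have never been reviewed, or were last reviewed long ago, surface automatically.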
Be sure to evaluate approved repositories for the firm by administrative department(s) and practice group(s). Keep in
mind that document management needs will vary by group. Also be aware that a data map is a living document that
should be updated on a regular basis. You will need to map the various ways that data comes into the organization.
Established firm policy should be considered when mapping out standard data flows because revisions or updates
may be necessary in terms of how users are taking on risk for the organization. Successful data flows are created
in a collaborative environment and endorsed by senior management. Understand how dark data is created so it
can be identified and managed. For example, some users may maintain their own “just in case” copy of a data set
on the shared drive, or other users may fail to use established client matter numbers to store data in the document
management system.
The Sample Data Flow Diagram in Appendix D tracks the movement of data into and out of the organization and
defines the path for mitigating the addition of content that doesn’t belong. Failure to establish these boundaries
adds risk to the organization and makes it difficult to comply with litigation holds, court ordered destruction, client
directives via outside counsel guidelines (OCG) and routine disposition as defined in the firm’s retention policy.
Firms should determine which functional area within the organization will lead the charge in managing dark data. Most often this responsibility falls to the Records Management Group.
Another consideration is the frequency at which dark data will be evaluated. For electronically stored information (ESI), the most common frequency is annually or as part of a system or server migration. This approach will limit effectiveness unless some form of automation or system design can keep up with the growth of data.
Client Data Considerations
The bulk of a firm’s dark data is old client data. For long-term clients, this data is primarily paper, and as you progress in time, the data begins to transition to more electronic information. This can be a difficult problem to tackle because there is no obvious methodology to round up dark data that is right for every firm. However, there are several basic tenets to consider.
Risk-based decisions will need to be made, which will require that you have conversations with your General Counsel. While dark data is a hot topic in the industry and an issue that almost every firm is dealing with, there is no bright-line rule or prevailing case law that provides specific guidance as to how to proceed. Most firms are forced to make risk-based decisions derived from the general topic surrounding their duty to safeguard client files. While the lawyer is clearly under an obligation to preserve and protect client records, it remains unclear for how long said records need to be retained (if no client directive exists). The conversation then turns into a question of: At what cost must the firm continue to safeguard client information and attempt to locate the client in order to obtain consent to destroy?
The life-to-date offsite storage cost of records relating to a particular matter can potentially exceed the revenue generated by that matter. Additional consideration should be given to potential costs that would be incurred if boxes were to be retrieved and examined. Typically, temporary staff needs to be brought in to facilitate the effort and firms report an 8-10 year return on this investment when this is done. There is a trend among firms subscribing to the position that these measures are cost prohibitive and create an undue hardship on the financial position of the firm at a time when there is significant pressure to provide the highest quality legal service at the lowest possible cost. These firms accept the risks associated with accelerated disposition and minimal client consent as a means of remaining competitive (and in business). Similar arguments are being made to dispose of legacy ESI, especially when
the client matter (number) to which it belongs has not been linked to the record. Manual review and analysis of this data is equally burdensome and this problem is growing faster than ever.
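The cost comparison described above can be made concrete with simple arithmetic. The sketch below is illustrative only: the box counts, storage rate, retrieval fee, review time, hourly rate and matter revenue are all hypothetical figures chosen for the example, and the function names are assumptions rather than an established method.

```python
def storage_cost_to_date(boxes: int, monthly_rate_per_box: float, months_stored: int) -> float:
    """Life-to-date offsite storage cost for the boxes tied to one matter."""
    return boxes * monthly_rate_per_box * months_stored


def retrieval_and_review_cost(
    boxes: int, retrieval_fee_per_box: float, review_hours_per_box: float, hourly_rate: float
) -> float:
    """Cost of pulling boxes back from storage and having staff examine them."""
    return boxes * (retrieval_fee_per_box + review_hours_per_box * hourly_rate)


def storage_exceeds_revenue(storage_cost: float, matter_revenue: float) -> bool:
    """True when storing the records has cost more than the matter brought in."""
    return storage_cost > matter_revenue


if __name__ == "__main__":
    # Hypothetical matter: 40 boxes stored for 20 years at $0.50 per box per month.
    stored = storage_cost_to_date(boxes=40, monthly_rate_per_box=0.50, months_stored=240)
    # Hypothetical review: $5 retrieval fee and 1.5 hours at $30 per hour for each box.
    review = retrieval_and_review_cost(40, 5.0, 1.5, 30.0)
    print(f"Storage to date: ${stored:,.2f}")
    print(f"Retrieve and review: ${review:,.2f}")
    print("Storage exceeds revenue:", storage_exceeds_revenue(stored, matter_revenue=3500.0))
```

Running the same calculation across all inactive matters gives the firm a documented basis for the risk-based conversation with its General Counsel.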
If your firm decides to proceed along these lines, you must be able to document your process. That doesn’t mean you must have a retention policy; many firms don’t and can still safely destroy dark data. But you will need to demonstrate that you have consistently defined the scope of the destruction and that you have applied destruction consistently within that scope. Having a well-documented process that is consistently applied will help the firm defend against potential claims of spoliation in the future.
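One way to show that a destruction scope was defined once and applied consistently is to run every record through the same rule and log the outcome for each, including the records that were retained. The following is a minimal sketch under stated assumptions: the scope rule used here (matter closed before a cutoff date and not under a legal hold) is purely illustrative, and the record fields and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class Record:
    record_id: str
    matter_closed: date
    on_legal_hold: bool


@dataclass(frozen=True)
class LogEntry:
    record_id: str
    action: str       # "destroy" or "retain"
    reason: str
    decided_on: date


def apply_disposition(records: List[Record], cutoff: date, today: date) -> List[LogEntry]:
    """Apply one scope rule to every record and log each decision."""
    log = []
    for r in records:
        if r.on_legal_hold:
            # A hold always takes precedence over the destruction scope.
            log.append(LogEntry(r.record_id, "retain", "legal hold", today))
        elif r.matter_closed < cutoff:
            reason = f"matter closed before {cutoff.isoformat()}"
            log.append(LogEntry(r.record_id, "destroy", reason, today))
        else:
            log.append(LogEntry(r.record_id, "retain", "outside destruction scope", today))
    return log
```

Because every record receives a log entry with a stated reason, the resulting log documents both the scope and its uniform application.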