Dark data task force report: identification and remediation of dark data in law firms


Download this report from Iron Mountain that deals with managing the accumulation of data that either isn’t appropriately tagged or easily identifiable.

August 14, 2015 | 12 mins

This report was created for law firm information governance (IG) management and law firm executives who are beginning to recognize, or are already dealing with, the problems associated with dark data. The report begins with an industry (and Symposium) definition of dark data and explains where dark data can be found in law firms.

The report then explains in detail the challenges law firms face with the unmanaged accumulation of dark data, including rising storage costs, reduced employee productivity and increasing risk around client data leakage.

The report then explores the various tools that can help bring this dark data phenomenon under control. The tools discussed include the creation of policies and workflows to address dark data now and into the future, as well as emerging technologies such as File Analysis Software (FAS) and Data Loss Prevention (DLP) software that can automate the identification, classification, management and disposition of dark data.

Finally, this report offers a three-pronged approach for managing dark data in law firms.

“Olly olly oxen free! Come out, come out wherever you are.” Finding dark data is often like playing the childhood game of hide and seek. The information governance (IG) professional, playing the role of the “seeker” or “it,” is on a mission to uncover data that has been hiding in various firm repositories for many years. The seeker “tags” objects as they are found and continues with this strategy until all dark data has been flushed from its hiding places.

“Ten, nine, eight, seven, six, five, four, three, two, one! Ready or not, here I come!” cries the IG professional as she or he begins the hunt for dark data. This dark data report provides practical guidance for law firms that may be searching for dark data, now or in the near future. The report is a compilation of information obtained through a survey of Symposium participants, industry research and group member work experience. A framework is provided for evaluating the law firm dark data posture. In addition, recommended strategies are offered as a way to justify an evaluation and eventual analysis of dark data. The strategies considered include uncovering dark data, continuing management and cost-benefit and risk assessment. Law firm IG professionals may find this information useful as they explore and settle on an appropriate methodology for their firms.

What is dark data?

There are many characterizations of dark data. In July 2014, the LegalTech® West Coast panel “Recover or Delete Dark Data” defined dark data as enterprise data that is predominantly uncategorized, has limited visibility to the organization (if not completely obscured) and, because of its obscurity, serves no apparent business purpose. Law firms are experiencing growing volumes of dark data across their technology platforms. In fact, dark data is lurking in many data and content systems including mobile devices, local computer drives, email, network file shares, legacy paper files, cloud file sharing services and even structured databases, such as the document management system. Dark data is largely unstructured, such as real-time communications and documents, but can also be semi-structured, for example XML code, or structured, as in a database. Few firms today have insight into their dark data and many are fearful of exposing content that could be highly confidential or contain information subject to legal discovery.

The Dark Data Task Force surveyed a small sample of law firm records managers and IG professionals. The Dark Data Survey results showed that most firms report they are either currently addressing dark data or plan to do so within the next year. Firms with no plans to address this topic are in the minority.

The survey also indicated that the majority of firms are either in the process of developing policies on dark data or already have these policies in place.

Dark data vs. big data

We cannot adequately discuss dark data without also mentioning big data. Big data refers to large pools of data against which analytics are run. Big data has, in recent years, compelled law firms to consider new technologies such as enterprise search and predictive coding to help make content more accessible and meaningful. Many organizations, including law firms, find themselves paralyzed by the growing mountain of big data, the vast majority of which is considered dark.

Why focus on dark data?

Dark data is costly to analyze due to data quality issues, lack of resources to perform analysis and cleanup or because it is just not yet recognized as a problem by the organization. Many law firms take the stance of “what you don’t know won’t hurt you.” If this is true, then shouldn’t dark data be left alone? We know that enterprise data capacity is growing at an average rate of up to 60 percent annually, making the ability to manage the growth of big data even more challenging and potentially prohibitive. The key challenge with dark data is in determining if there is any real value to justify the management of it. The main concern for most is that managing dark data would cost more than any value realized from managing it.
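To put that growth rate in perspective, the compounding can be worked through in a few lines. This is only an illustrative sketch: the 100 TB starting volume and the five-year horizon are hypothetical figures, not numbers from the survey; only the 60 percent upper-bound rate comes from the paragraph above.

```python
def project_storage(start_tb: float, annual_growth: float, years: int) -> list[float]:
    """Return the projected volume for year 0 through `years` under compound growth."""
    return [start_tb * (1 + annual_growth) ** y for y in range(years + 1)]


# Hypothetical firm: 100 TB today, growing at the 60 percent upper bound.
projection = project_storage(start_tb=100.0, annual_growth=0.60, years=5)
for year, volume in enumerate(projection):
    print(f"Year {year}: {volume:,.1f} TB")
```

At a sustained 60 percent, volume roughly doubles every year and a half and grows more than tenfold in five years, which is why deferring the problem tends to make it more expensive rather than less.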

To determine if dark data is even worth further analysis, law firms need a means of cost-effectively finding, reviewing, organizing and visualizing dark data on an ongoing basis. Some law firms are beginning to use file analysis software to spot trends in legal project costs or client behaviors for business development. This information can expose patterns that can improve law firm business planning. Firms should also employ file analysis software to archive valuable business data as well as defensibly destroy valueless dark data. Enterprise search and workflow applications can be leveraged to migrate high-value content to a structured repository based on end-user activity. Firms should work towards prohibiting the use of offending storage locations and provide functionally equivalent solutions for day-forward activities (i.e., file system emulation while cleanup activities are ongoing).
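As a rough illustration of what the "finding" step involves, the sketch below walks a file share and reports files that have not been modified for a long time. It is not a substitute for commercial file analysis software and does not represent how any particular product works; it uses only the Python standard library, and the seven-year threshold and the function names are assumptions made for the example.

```python
import os
import time
from collections import defaultdict


def find_stale_files(root: str, min_age_days: int = 7 * 365):
    """Yield (path, size_bytes, age_days) for files not modified in min_age_days."""
    now = time.time()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # unreadable or vanished file; skip it
            age_days = (now - info.st_mtime) / 86400
            if age_days >= min_age_days:
                yield path, info.st_size, age_days


def summarize_by_extension(stale_files):
    """Total the stale bytes per file extension to show where the volume sits."""
    totals = defaultdict(int)
    for path, size, _age in stale_files:
        ext = os.path.splitext(path)[1].lower() or "(none)"
        totals[ext] += size
    return dict(totals)
```

A call such as `summarize_by_extension(find_stale_files("/path/to/share"))` gives a first picture of how much old content exists and in what form. Last-modified time is only a proxy for value; any archive or destruction decision still has to follow the firm's policies and legal review.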

Landmines: where is dark data hiding?

Dark data is alive and well in law firms. It lives in both paper and electronic format, evidenced by the number of unmanaged paper records located in abandoned practice group work rooms, storage areas, attorney offices, abandoned file cabinets, offsite storage and attorney home offices. Some users (you know who they are!) have secret file locations. Others refuse to close a file, saving space for the next big case. And then there are the secretive individuals who simply reply, “Only I need to know where those files are located,” when asked about files.

Valuable information is often lost in unmanaged and ungoverned repositories. This dark data lives in dormant servers, legacy applications, unclassified email messages, departed attorney mailboxes and network share drives, as well as countless other repositories. The issue spans the entire firm and has no boundaries. It even expands to third-party vendors who are housing data on behalf of the firm in databases, such as systems that house administrative HR and payroll data, as well as third-party vendors that host discovery data for litigation cases and corporate deal rooms.

Generally speaking, most users want to be efficient and will follow processes that make sense. The firm’s information governance team can help by seeking input from subject matter experts and designing workflows that make sense, save time, eliminate duplicate data and facilitate collaboration among teams, while saving server space and better managing the firm budget. However, each organization is unique. This report will attempt to give you some suggested places to begin your search for dark data and provide methods for bringing it under control on an ongoing basis. See Appendix B for a sample checklist for the collection of dark data.

In order to manage the madness, start with the creation of a data map by asking questions such as:

  • Where are people storing documents?
  • What types of documents are stored there?
  • What are the date ranges?
  • What business units are affected?
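One way to record the answers to these questions is as a simple structured entry per repository. The sketch below is a hypothetical layout, not a format prescribed by the report: the field names, the `approved` flag and the sample entries are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Repository:
    name: str                  # where people are storing documents
    location: str
    document_types: list[str]  # what types of documents are stored there
    earliest: date             # date range of the content
    latest: date
    business_units: list[str]  # which business units are affected
    approved: bool = False     # whether the firm treats this as an approved repository


def unapproved(data_map: list[Repository]) -> list[Repository]:
    """Return the repositories that are candidates for a dark data review."""
    return [repo for repo in data_map if not repo.approved]


# Hypothetical entries for illustration only.
data_map = [
    Repository("Document management system", "on-premises", ["pleadings", "contracts"],
               date(2005, 1, 1), date(2015, 8, 1), ["Litigation", "Corporate"], approved=True),
    Repository("Personal network share", "file server", ["drafts", "email exports"],
               date(1999, 1, 1), date(2015, 8, 1), ["Litigation"]),
]

for repo in unapproved(data_map):
    print(f"Review: {repo.name} ({repo.earliest.year}-{repo.latest.year})")
```

Keeping the map in a structured form like this makes it easier to update as repositories are added, retired or brought under governance.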

Be sure to evaluate approved repositories for the firm by administrative department(s) and practice group(s). Keep in mind that document management needs will vary by group. Also be aware that a data map is a living document that should be updated on a regular basis. You will need to map the various ways that data comes into the organization. Established firm policy should be considered when mapping out standard data flows because revisions or updates may be necessary in terms of how users are taking on risk for the organization. Successful data flows are created in a collaborative environment and endorsed by senior management. Understand how dark data is created so it can be identified and managed. For example, some users may maintain their own “just in case” copy of a data set on the shared drive, while other users may fail to use established client matter numbers to store data in the document management system.

The Sample Data Flow Diagram in Appendix D tracks the movement of data into and out of the organization and defines the path for mitigating the addition of content that doesn’t belong. Failure to establish these boundaries adds risk to the organization and makes it difficult to comply with litigation holds, court-ordered destruction, client directives via outside counsel guidelines (OCG) and routine disposition as defined in the firm’s retention policy.


Firms should determine which functional area within the organization will lead the charge in managing dark data. Most often this responsibility falls to the Records Management Group.

Another consideration is the frequency at which dark data will be evaluated. For electronically stored information (ESI), the most common frequency is annually or as part of a system or server migration. This approach will limit effectiveness unless some form of automation or system design can keep up with the growth of data.

Client data considerations

The bulk of a firm’s dark data is old client data. For long-term clients, the oldest data is primarily paper, and more recent data is increasingly electronic. This can be a difficult problem to tackle because there is no obvious methodology to round up dark data that is right for every firm. However, there are several basic tenets to consider.

Risk-based decisions will need to be made, which will require conversations with your General Counsel. While dark data is a hot topic in the industry and an issue that almost every firm is dealing with, there is no bright-line rule or prevailing case law that provides specific guidance as to how to proceed. Most firms are forced to make risk-based decisions derived from their general duty to safeguard client files. While the lawyer is clearly under an obligation to preserve and protect client records, it remains unclear for how long said records need to be retained (if no client directive exists). The conversation then turns into a question of: At what cost must the firm continue to safeguard client information and attempt to locate the client in order to obtain consent to destroy?

The life-to-date offsite storage cost of records relating to a particular matter can potentially exceed the revenue generated by that matter. Additional consideration should be given to potential costs that would be incurred if boxes were to be retrieved and examined. Typically, temporary staff needs to be brought in to facilitate the effort, and firms report an 8- to 10-year return on this investment when this is done. A growing number of firms take the position that these measures are cost prohibitive and create an undue hardship on the financial position of the firm at a time when there is significant pressure to provide the highest quality legal service at the lowest possible cost. These firms accept the risks associated with accelerated disposition and minimal client consent as a means of remaining competitive (and in business). Similar arguments are being made to dispose of legacy ESI, especially when the client matter (number) to which it belongs has not been linked to the record. Manual review and analysis of this data is equally burdensome and this problem is growing faster than ever.
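The comparison between storage cost and matter revenue is simple arithmetic, sketched below. Every figure is hypothetical and chosen only to illustrate the calculation: the box count, monthly rate, storage period, matter revenue and review cost are not taken from the survey, and the sketch assumes a flat rate with no retrieval fees or rate increases.

```python
def life_to_date_storage_cost(boxes: int, monthly_rate_per_box: float, months_stored: int) -> float:
    """Cumulative offsite storage cost for one matter at a flat monthly rate."""
    return boxes * monthly_rate_per_box * months_stored


def payback_years(one_time_review_cost: float, annual_storage_savings: float) -> float:
    """Years of avoided storage fees needed to recover a one-time review cost."""
    return one_time_review_cost / annual_storage_savings


# Hypothetical matter: 40 boxes stored for 25 years at $0.50 per box per month.
cost = life_to_date_storage_cost(boxes=40, monthly_rate_per_box=0.50, months_stored=25 * 12)
matter_revenue = 5000.00  # hypothetical fees billed on the matter
print(f"Storage to date: ${cost:,.2f}; revenue: ${matter_revenue:,.2f}")
print("Storage exceeds revenue" if cost > matter_revenue else "Revenue exceeds storage")

# Hypothetical review: $2,160 of temporary staff time to clear the same 40 boxes.
annual_savings = 40 * 0.50 * 12
print(f"Payback period: {payback_years(2160.00, annual_savings):.1f} years")
```

Running the same calculation across all closed matters is one way to give the General Counsel a concrete basis for the risk-based decision described above.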

If your firm decides to proceed along these lines, you must be able to document your process. That doesn’t mean you must have a retention policy; many firms don’t and can still safely destroy dark data. But you will need to demonstrate that you have consistently defined the scope of the destruction and that you have applied destruction consistently within that scope. Having a well-documented process that is consistently applied will help the firm defend against potential claims of spoliation in the future.
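A documented, consistently applied process can be as simple as a written scope plus a log showing how that scope was applied to every record considered. The sketch below is one hypothetical way to structure such a log; the class names, the cutoff date and the sample records are assumptions for illustration, and the actual scope criteria are a matter for the firm and its General Counsel.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DestructionScope:
    description: str
    closed_before: date  # only matters closed before this date are in scope


@dataclass(frozen=True)
class Record:
    record_id: str
    matter_closed: date
    on_legal_hold: bool = False


def apply_scope(scope: DestructionScope, records: list[Record]) -> list[tuple[str, str, str]]:
    """Return one (record_id, decision, reason) log entry per record."""
    log = []
    for rec in records:
        if rec.on_legal_hold:
            log.append((rec.record_id, "retain", "legal hold"))
        elif rec.matter_closed < scope.closed_before:
            log.append((rec.record_id, "destroy", scope.description))
        else:
            log.append((rec.record_id, "retain", "outside scope"))
    return log


# Hypothetical scope and records.
scope = DestructionScope("Matters closed before 2000", closed_before=date(2000, 1, 1))
records = [
    Record("BOX-001", date(1995, 6, 30)),
    Record("BOX-002", date(1998, 3, 15), on_legal_hold=True),
    Record("BOX-003", date(2008, 11, 1)),
]
decision_log = apply_scope(scope, records)
for entry in decision_log:
    print(entry)
```

Because every record receives a decision and a reason under the same rule, the log itself becomes evidence that the scope was defined once and applied consistently.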