e-Discovery, or e-Disclosure, is an essential phase not just of litigation but also of investigations, regulatory enquiries, compliance assessments and, increasingly, arbitrations. Data is the lifeblood of most modern-day organisations and, although it is not the only source of information relevant to an investigation, it can provide an unbiased, unaltered and accurate reflection of historical events in a way that other sources cannot. Data can be more reliable than the human mind, especially where the events in dispute took place some time ago, and it tends to be more pervasive and persistent than paper documents.


Because technology is used throughout the workplace and beyond, data exists in many different forms, but it can broadly be grouped into four categories: unstructured, structured, semi-structured and social.

Unstructured data refers to information whose content does not follow a predefined form; it is generally text-heavy and typically comprises emails, documents, spreadsheets and presentations.

Structured data is the opposite of unstructured data: its content does follow a predefined form and is generally held in ‘databases’, for example financial and accounting systems and customer relationship management (CRM) systems.

A hybrid of the two, referred to as semi-structured data, can also be prevalent within an organisation. Here the content itself tends to be unstructured, but it is wrapped in a more rigid structure (such as the sender, recipients and time of sending). A typical example is chat or instant messenger messages, which are becoming more widely used and more pertinent in certain industries and therefore should not be overlooked.
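To illustrate the distinction, the sketch below models a chat message as a record whose metadata fields are structured while the message body is free text. It is a minimal illustration only: the `ChatMessage` class and its field names are hypothetical assumptions, and real messaging platforms export their own, differing schemas.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ChatMessage:
    """Hypothetical chat record: structured fields wrapping free-text content."""
    sender: str            # structured: a predictable, queryable field
    recipients: list[str]  # structured
    sent_at: datetime      # structured
    body: str              # unstructured: free text of any length or form


messages = [
    ChatMessage("alice", ["bob"], datetime(2020, 2, 28, 9, 0), "Can you resend the invoice?"),
    ChatMessage("bob", ["alice"], datetime(2020, 3, 2, 10, 30), "Invoice attached, see revised totals."),
    ChatMessage("carol", ["alice", "bob"], datetime(2020, 3, 5, 16, 45), "Team call moved to 5pm."),
]

# The structured fields can be filtered precisely, much like database columns...
in_range = [m for m in messages if datetime(2020, 3, 1) <= m.sent_at < datetime(2020, 4, 1)]

# ...whereas the body needs text search (or analytics) to find relevant content.
hits = [m for m in in_range if "invoice" in m.body.lower()]
print([m.sender for m in hits])  # → ['bob']
```

The date filter works on the structured wrapper alone; only the final step has to look inside the unstructured content.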

Social data refers to data that is shared publicly or within a more restricted context, e.g. within an organisation or a circle of ‘friends’. Social data is stored within a central repository and includes not only the content itself but also information linked to that content, such as ‘shares’, ‘likes’, location and time posted. Although the most recognisable sources are external to an organisation (e.g. Facebook and LinkedIn), organisations are introducing these technologies internally, so they need to be considered appropriately.

Managing data 

When dealing with data in a dispute, exactly how it is managed will vary from case to case; however, various models set out the key stages of such exercises. The most widely used and referenced is the Electronic Discovery Reference Model (EDRM).

Although this model was designed to meet the requirements of discovery in US litigation, it is equally applicable in the UK and globally.

e-Discovery remains as interesting a field as ever, as new technologies try to keep pace with the challenges posed by the prolific growth of data and our dependency on it. Not only are data volumes increasing, but so is the range and diversity of the software and applications used to create data, especially in the current working environment, with so many people working from home or remotely. This has complicated matters from an e-Discovery perspective because more systems now need to be considered; for example, the use of collaborative tools that facilitate file sharing and instant messaging, such as Microsoft Teams, Zoom and Google Hangouts, has increased dramatically. These may not be relevant in every case, but they need to be considered when mapping out the IT landscape and deciding what data to collect, or not, and why.

But while technology creates these challenges, it also continues to provide solutions that can be used throughout the e-Discovery process. Remote imaging solutions and software, for example, are enabling data to be captured successfully without an on-site visit, which the present social-distancing rules often rule out. Data reduction processes such as email threading, near-deduplication and the clustering of conceptually similar documents remain in continued use. There is also increased use of advanced analytics and assisted review technology, such as Continuous Active Learning (CAL), which, going beyond the traditional assisted review model, can “learn” from coding decisions in real time and use those insights to promote the documents most likely to be relevant to the top of the review queue.
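Near-deduplication, one of the data reduction processes mentioned above, can be illustrated with a minimal sketch. It assumes that each document is reduced to a set of overlapping three-word “shingles” and that two documents count as near-duplicates when their Jaccard similarity (shared shingles divided by all distinct shingles) reaches a threshold. The function names, the shingle size and the 0.8 threshold are illustrative assumptions, not the method of any particular review platform.

```python
import re
from itertools import combinations


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (overlapping word sequences) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicate_pairs(docs: dict[str, str], threshold: float = 0.8, k: int = 3):
    """Yield (id_a, id_b, score) for each document pair at or above threshold."""
    sh = {doc_id: shingles(text, k) for doc_id, text in docs.items()}
    for a, b in combinations(sorted(sh), 2):
        score = jaccard(sh[a], sh[b])
        if score >= threshold:
            yield a, b, score


docs = {
    "DOC-001": "Please find attached the Q3 trading report for review before Friday.",
    "DOC-002": "Please find attached the Q3 trading report for review before Friday. Thanks",
    "DOC-003": "Lunch on Thursday?",
}
pairs = list(near_duplicate_pairs(docs))
print(pairs)  # → [('DOC-001', 'DOC-002', 0.9)]
```

This sketch compares every pair of documents, which does not scale to large collections; production tools typically rely on techniques such as MinHash to avoid exhaustive comparison.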

None of these techniques is particularly new, but their use and legal recognition continue to grow in response to the challenges discussed above. Equally, none of them is a panacea: it is the intelligent application of these and other, more traditional techniques that helps to reduce and prioritise the volume of documents to be reviewed, provide data-led insights into a case and enhance quality control procedures.
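The prioritisation loop behind Continuous Active Learning, as described above, can be sketched as follows: after every coding decision the model is updated and the remaining documents are re-ranked, so that those most likely to be relevant rise to the top of the queue. This is a minimal illustration under stated assumptions: a simple naive Bayes scorer stands in for the classifiers that commercial platforms actually use, a simulated reviewer stands in for a human, and the names `RelevanceModel` and `cal_review` are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Callable


def tokens(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


class RelevanceModel:
    """Hypothetical naive Bayes scorer, updated incrementally after each decision."""

    def __init__(self) -> None:
        self.counts = {True: Counter(), False: Counter()}  # word counts per label
        self.docs = {True: 0, False: 0}                    # documents coded per label

    def learn(self, text: str, relevant: bool) -> None:
        self.counts[relevant].update(tokens(text))
        self.docs[relevant] += 1

    def score(self, text: str) -> float:
        """Log-odds that text is relevant, using add-one smoothing."""
        vocab = len(set(self.counts[True]) | set(self.counts[False])) + 1
        totals = {label: sum(c.values()) for label, c in self.counts.items()}
        s = math.log((self.docs[True] + 1) / (self.docs[False] + 1))
        for t in tokens(text):
            p_rel = (self.counts[True][t] + 1) / (totals[True] + vocab)
            p_irr = (self.counts[False][t] + 1) / (totals[False] + vocab)
            s += math.log(p_rel / p_irr)
        return s


def cal_review(docs: dict[str, str], seed: dict[str, bool],
               reviewer: Callable[[str, str], bool], budget: int) -> list[tuple[str, bool]]:
    """Review up to `budget` documents, always taking the current top-ranked one."""
    model = RelevanceModel()
    for doc_id, relevant in seed.items():
        model.learn(docs[doc_id], relevant)
    queue = {d for d in docs if d not in seed}
    decisions = []
    for _ in range(min(budget, len(queue))):
        # Re-rank the remaining documents with the current model; ties broken by ID.
        top = max(sorted(queue), key=lambda d: model.score(docs[d]))
        relevant = reviewer(top, docs[top])
        model.learn(docs[top], relevant)  # learn from this decision immediately
        decisions.append((top, relevant))
        queue.remove(top)
    return decisions


docs = {
    "seed-1": "invoice inflated price fixing cartel",
    "seed-2": "office party catering menu",
    "d1": "cartel meeting to agree price increases",
    "d2": "catering order for friday lunch",
    "d3": "price fixing discussion with competitor",
    "d4": "holiday rota for december",
}
seed = {"seed-1": True, "seed-2": False}
# Simulated reviewer (an assumption for this example): relevant if it mentions price or cartel.
decisions = cal_review(docs, seed, lambda _id, text: "price" in text or "cartel" in text, budget=2)
print(decisions)  # → [('d3', True), ('d1', True)]
```

Because the model is updated after every decision rather than trained once on a fixed seed set, the review queue adapts continuously to what reviewers are actually finding.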

Managing some of the risks 

The greater range and diversity of software and applications in use is also increasing the risk of data leakage and breaches, whether through human error or a lack of technical security measures. The varied sources involved range from employees saving documents to cloud-based storage systems, through communications with colleagues and clients channelled via internal instant messaging platforms as well as external applications such as WhatsApp, to complex trading data that must be modelled and compared against an entity’s historical trading records and market patterns. Therefore, when scoping an e-Discovery project, these varied data sources need to be fully considered and incorporated into the process where proportionate and appropriate.

Within the UK, the law is also changing in respect of data. For example, the GDPR is now well embedded in organisations, or should be, and will continue to be a factor as data breaches continue to occur and enforcement activity ramps up. The GDPR and related laws are clearly relevant to e-Discovery matters, because processing personal data is almost unavoidable. Their requirements must therefore be considered and the decisions made fully documented, covering international transfers and the seven principles for processing personal data: lawfulness, fairness and transparency; purpose limitation; data minimisation; accuracy; storage limitation; integrity and confidentiality; and accountability, all of which should be built in through data protection by design.

In a similar vein to the GDPR, a growing number of countries have laws restricting the cross-border transfer of data unless the recipient country offers similar protection in its laws or additional measures are put in place. With the increasing popularity of cloud-based hosting and data services, this has led to the offering of “data residency as a service” and the growth of local or regional data hosting options.

Disclosure Pilot Scheme 

The way data is managed in the courts of England and Wales has also continued to change as working practices and judgments reflect Practice Direction 51U, which came into effect on 1 January 2019. This Practice Direction is intended to reform various aspects of the document disclosure process in the Business and Property Courts of England and Wales. The two-year Disclosure Pilot Scheme (DPS) redefines disclosure duties and introduces five extended disclosure models.

At its core, the DPS aims to:

• Modernise disclosure by incorporating technological advances to make the process of disclosure more efficient and limit the handling of irrelevant or duplicative material.
• Address the growing data volumes generated by diverse electronic sources (including social media content, instant messages, cloud storage, accounting systems, other business systems, servers, emails, mobile devices and external storage devices) through a range of disclosure models aimed at managing the scope of disclosure.
• Manage the increasing costs of litigation.
• Be applied proportionately across cases of all sizes, values and levels of complexity.

The DPS is now well into its second year. Across the legal community, practitioners have gained experience in applying its rules, and key considerations are being addressed much earlier in the litigation process. However, ambiguities and misunderstandings remain.

In February this year, in McParland & Partners Limited and Fairstone Financial Management Limited v Stuart William Whitehead, Sir Geoffrey Vos, Chancellor of the High Court, addressed from the judiciary’s perspective the misunderstandings that had arisen over the DPS. These concerned the identification of issues for disclosure, the choice between disclosure models and cooperation between the parties.

Choosing between disclosure models may not be as clear-cut as the DPS intends, which can lead to real uncertainty. Under the existing model structure, parties may opt for different models to address the same issue, and there may be instances where an issue does not fit neatly into any one model. Even once a model is selected, significant developments can occur as the matter progresses, and having to keep adhering to the requirements of a model chosen early on, based on the documents that were then thought likely to be held, can lead to inefficiencies.

Communication and cooperation between all parties, and with the court, are key to agreeing the issues for disclosure, setting the parameters of disclosure and completing the Disclosure Review Document (DRD), all in keeping with the intended spirit of the DPS.

The additional rules on data preservation require that both current documents and “documents which might otherwise be deleted or destroyed” be preserved. This extends to data under the client’s direct control and to data held by third parties. While this may seem to widen the scope of data to be considered, choosing the right disclosure model and combining it with appropriate technology can keep the scope of disclosure manageable.

Involving technologists earlier has helped parties to identify potential present and historical data sources thoroughly, including data held by third parties; to understand company data retention policies; to estimate data handling costs more accurately; and to translate complex technology-related concepts for the purposes of a DRD.

The DPS aims to manage the rising costs of litigation. Because more time is now spent on scoping, identifying the issues and choosing an appropriate disclosure model, the present trend is for initial costs to increase, with the intended result of savings later, as the scope of review is more proportionately defined. While this approach, with its “more defined” scope of disclosure, may work for higher-value claims, some may question whether it will also deliver cost savings in lower-value claims.

Equally, handling matters that do not fit neatly into a specific model, or having to adhere continually to the requirements of a chosen model, has led to unnecessary and disproportionate costs at later stages, as disclosure processes for matters with many issues can become overly complicated.

All in all, it appears that the success of the pilot will depend on whether parties are able to cooperate, communicate effectively and use the various models and technological tools available in an appropriate, proportionate manner.

Regardless of how the DPS is used going forward, one shift is clear. Before the pilot, parties wanting to use advanced analytics, predictive coding, assisted review and other technologies had to convince the court that they were needed. Now the tables appear to have turned, and parties must justify why they have decided not to use technology. This has greatly encouraged the adoption of appropriate technology within the disclosure process.

The DPS is expected to be extended to the end of 2021, as its working group may consider it premature to end the pilot at the end of this year as originally planned.

This extension is not surprising. One of the earliest judgments handed down under the new rules, UTB LLC v Sheffield United Ltd in April 2019, showed that the DPS was being taken seriously. It confirmed that the DPS applied to cases in which standard disclosure had been given under Part 31 of the Civil Procedure Rules (Part 31), and that subsequent applications concerning disclosure should follow the DPS rules, thus endorsing the ongoing drive to encourage cooperation so that disclosure is "reliable, efficient and cost-effective".

What is coming down the road? 

• Ever-growing and varied data volumes – Just think of the Internet of Things, autonomous vehicles and the increased use of social data, never mind the ever-increasing volumes generated by business-as-usual activity today. Data volumes will inevitably increase and data types will evolve, although the way data is stored, and the technologies and methodologies available to analyse it, will continue to adapt to help mitigate the effect of larger and more varied data loads.

• Ever-increasing use of cloud and collaborative/communication technologies – We are seeing not only the continued trend of companies moving their data to the cloud, where there are sensible business advantages to be had, but also the expansion of collaborative and communication tools, which have received a huge boost from Covid-19. As a result, data has become simultaneously more centralised (through the cloud) and more dispersed, as people communicate through collaborative tools as well as email, chat and text messages.

• Intuitive technology – The first generation of what is widely termed artificial intelligence is now embedded within the e-Discovery market, and the sophistication and capability of these tools will only increase. Technology will continue to become more intuitive, drawing on multiple sources of data to enrich existing data and learning continually from previous decisions in a more granular and intelligent way. In certain industries these technologies are already used proactively, for example to flag potentially fraudulent transactions or to monitor employees’ communications for changes in sentiment or behaviour. They will become increasingly embedded and trusted in the e-Discovery and legal markets and could, in the long term, be applied at the point of creation. This would enable documents, emails and other data to be categorised and assessed automatically, with appropriate processes routing them on to the relevant legal teams. Think of it as information governance applied automatically at source, although this is certainly not on the near horizon.

• Ethical, privacy and data protection concerns – These will continue to clash with the desire to use ever more data in an increasingly automated and insightful manner. We are seeing the first signs of this now, for example where prejudice becomes built into a machine-learning platform, which then reinforces and strengthens it. Evolving data protection and privacy concerns and laws, such as the GDPR, could also limit how technology is implemented; while lawyers will be key to determining how that is managed, technologists will continue to devise processes and methodologies to put those decisions into practice.