e-Discovery, or e-Disclosure, is an essential phase not just of litigation but also of investigations, regulatory enquiries, compliance assessments and, increasingly, arbitrations. Data is the lifeblood of most modern organisations and, although it is not the only source of information relevant to an investigation, it can provide an unbiased, unaltered and accurate reflection of historic events in a way that other sources cannot. Data can be more reliable than human memory, especially where a dispute has a long history, and it tends to be more pervasive and persistent than paper documents.
Given the use of technology throughout a workplace and beyond, data exists in many different forms but can be grouped into four categories: unstructured, structured, semi-structured and social.
Unstructured data refers to information where the content does not exist within a pre-defined form, is generally text-heavy and typically comprises e-mails, documents, spreadsheets and presentations.
Structured data is the opposite of unstructured data, in that it refers to information where the content does have a pre-defined form and is generally in the form of ‘databases’, for example financial & accounting systems and customer relationship management systems.
A hybrid of structured and unstructured data, referred to as semi-structured data, can also be prevalent within an organisation. This is where the content tends to be unstructured, but it is bound by a more solid structure. A typical example of this would be chat or instant messenger messages – which are becoming more widely used and pertinent in certain industries, and therefore should not be overlooked.
Social data refers to data that is shared publicly or shared within a more restricted context, e.g. within an organisation or a circle of ‘friends’. Social data is stored within a central repository and includes not only the content but also information that is linked to this content, such as ‘shares’, ‘likes’, location, time posted, etc. Although the most recognisable sources will be external to an organisation (e.g. Facebook, LinkedIn, etc.), organisations are introducing these technologies internally and thus they need to be appropriately considered.
When dealing with data in respect of a dispute, the exact way that it is managed and implemented will vary from case to case. However, there are various models available which set out some of the key stages of such exercises. The most widely used, and referenced, is the Electronic Discovery Reference Model.
Although this model was designed to meet the requirements of legal discovery under US litigation, it has equal applicability in the UK and globally.
e-Discovery is as interesting a place as ever, as new technologies try to keep up with the challenges posed by the prolific growth of, and dependency on, data. The global data pool is ever expanding, with over 90 percent of all data in the world created within the last two years, and there are no signs of this expansion rate slowing down. Not only are data volumes increasing, but the range and diversity of software and applications used to create data are also increasing, leading to other issues such as data leakage and breaches due to human error or a lack of technical security measures. Examples range from employees saving documents to cloud-based storage systems; to communications with colleagues and clients being channelled through internal instant messaging platforms as well as external applications such as WhatsApp; to the need to model complex trading data and compare an entity’s trading records with historic and market patterns. Therefore, when planning an e-Discovery project, these varied data sources need to be fully considered and incorporated into the process where proportionate and appropriate.
But as technology creates these challenges, it also continues to provide solutions that can be used throughout the e-Discovery process. For example: email threading, an essential process that groups all emails within a conversation together; near de-duplication, which extends beyond exact duplicates to look at textual similarities between documents; clustering, which groups like documents together based on textual content; and predictive coding, which uses machine learning to automatically code or prioritise a dataset, either through traditional training sets or through continuous active learning. None of these is particularly new, but their usage and legal recognition continue to grow with the challenges discussed above. Equally, none of them is a panacea: it is through the intelligent application of these, and other more traditional techniques, that they can help reduce and prioritise the volume of documents to be reviewed, provide data-led insights into a case, and enhance quality control procedures.
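As an illustration of the near de-duplication idea described above, the following is a minimal sketch that compares documents by their overlapping word sequences ("shingles") and scores each pair with Jaccard similarity. It is one common way of measuring textual similarity, not necessarily what any particular e-Discovery platform does. The function names, the sample documents, the shingle size and the 0.4 threshold are all assumptions chosen for the example.

```python
import re
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word sequences in the text (lower-cased, punctuation ignored)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Very short texts become a single shingle (or nothing if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity: size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicate_pairs(docs, threshold=0.4, k=3):
    """Return (id_a, id_b, score) for every pair scoring at or above the threshold.

    Compares every pair, which is fine for a sketch but does not scale to
    large review populations.
    """
    shingle_sets = {doc_id: shingles(text, k) for doc_id, text in docs.items()}
    pairs = []
    for (id_a, set_a), (id_b, set_b) in combinations(shingle_sets.items(), 2):
        score = jaccard(set_a, set_b)
        if score >= threshold:
            pairs.append((id_a, id_b, round(score, 2)))
    return pairs


# Hypothetical documents, invented for illustration only.
docs = {
    "DOC-001": "Please find attached the draft supply agreement for your review.",
    "DOC-002": "Please find attached the final supply agreement for your review.",
    "DOC-003": "Lunch on Friday? The new place near the office looks good.",
}

print(near_duplicate_pairs(docs))
```

Here the two agreement emails differ by a single word and are flagged as near duplicates, while the unrelated message is not. In practice the threshold would need to be tuned to the dataset, and a reviewer would still decide how flagged groups are treated.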
The law relating to data more generally is also changing. For example, the GDPR is now well embedded in organisations, or should be, and will remain a factor as data breaches continue to occur and enforcement activity ramps up. The GDPR and related laws are clearly relevant to e-Discovery matters, so their requirements must be considered and the decisions made fully documented. This covers international transfers and the seven principles of processing personal data (processing which is almost unavoidable in e-Discovery): lawfulness, fairness and transparency; purpose limitation; data minimisation; accuracy; storage limitation; integrity & confidentiality; and accountability, all of which should embody data protection by design.
The way data is being managed in UK courts is also changing. From January 1st, 2019, a new Practice Direction (Practice Direction 51U), intended to reform various aspects of the document disclosure process, will run in the Business and Property Courts of England and Wales. This two-year Disclosure Pilot Scheme (DPS) redefines disclosure duties and introduces five extended disclosure models.
While legal and case-specific considerations will shape decision making under the DPS, it is also essential to consider the technology dimension and how different methodologies and processes can influence the decisions that need to be made.
The DPS strengthens duties around:
• Legal hold scoping and planning: for example, the rules regarding data preservation demand that both present and future “documents which might otherwise be deleted or destroyed” be preserved in a manner that keeps all associated metadata accurate.
• Data collection: there are explicit requirements to consider social media data, in addition to the need to capture metadata and deleted files where legal hold may not be possible, e.g. on mobile devices.
• Disclosure Review Document: the DRD is intended to identify the issues requiring disclosure and agree on the exchange proposals for extended disclosure. A costs estimate is also required, in addition to a good understanding of where the data is stored and how it should be searched and reviewed. This goes beyond the previous Electronic Disclosure Questionnaire (EDQ) requirements, and early engagement with technologists will help with identifying potential data sources, accurately estimating data handling costs and translating complex technology-related concepts for the purposes of a DRD.
• Use of technology: this is encouraged throughout, and the use of advanced technologies to address the challenges of disclosure is, and will be, heavily encouraged.
In addition, the DPS provides for a two-stage process of disclosure:
• Initial disclosure, which takes place at the stage that a party files and serves its Particulars of Claim or Defence. It has a cap of 1,000 pages or 200 documents (whichever is the larger) and is intended to include key documents relied upon, and the key documents required to enable the other party to understand the claim or defence.
• Extended disclosure, which is required if it is appropriate to fairly resolve an issue in the case. It should be noted that different issues can require different models to be used depending on the nature of the issue itself. The five models are:
o Model A: Disclosure of known adverse documents only.
o Model B: Limited disclosure. This uses the same test as Initial Disclosure but without any limits, and also includes known adverse documents.
o Model C: Request-led, search-based disclosure. Disclosure of specific documents or categories of documents covering defined issues based on specific requests.
o Model D: Narrow search-based disclosure. Each party is required to undertake a reasonable and proportionate search. This closely resembles the previous Standard Disclosure.
o Model E: Wide search-based disclosure. Each party must disclose documents which are likely to support or adversely affect its claim or defence, or that of another party, in relation to one or more Issues for Disclosure, or which may lead to a train of enquiry which may result in the identification of other documents for disclosure.
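Because different issues can attract different models, it can help on the technology side to record the model chosen for each issue in a structured form, so that search scoping follows directly from it. The following is a minimal sketch of that idea only: the issue names are hypothetical, and treating Models C, D and E as the "search-based" group simply follows the descriptions in the list above.

```python
from enum import Enum


class DisclosureModel(Enum):
    """The five extended disclosure models, labelled as in the list above."""
    A = "Disclosure of known adverse documents only"
    B = "Limited disclosure"
    C = "Request-led, search-based disclosure"
    D = "Narrow search-based disclosure"
    E = "Wide search-based disclosure"


# Models described in the text as search-based.
SEARCH_BASED = {DisclosureModel.C, DisclosureModel.D, DisclosureModel.E}

# Hypothetical issues and model choices, invented purely for illustration.
issues = {
    "Issue 1: formation of the contract": DisclosureModel.B,
    "Issue 2: alleged misrepresentation": DisclosureModel.D,
    "Issue 3: quantum of loss": DisclosureModel.C,
}


def issues_requiring_search(issue_models):
    """Return the issues whose chosen model involves a search exercise."""
    return [issue for issue, model in issue_models.items() if model in SEARCH_BASED]


for issue, model in issues.items():
    print(f"{issue} -> Model {model.name} ({model.value})")

print(issues_requiring_search(issues))
```

A structure like this is only a planning aid; the choice of model for each issue remains a legal decision for the parties and the court.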
How Practice Direction 51U will play out will become apparent over time, as more judgments are handed down under the new rules. But early indications, for example UTB LLC v Sheffield United Limited  EWHC 914 (Ch), show that it is being taken seriously and that there will be an ongoing drive towards encouraging cooperation to ensure that disclosure is "...reliable, efficient and cost-effective..."
What is coming down the road?
• Ever-growing and varied data volumes – just think of the Internet of Things, autonomous vehicles and the increased use of social data, never mind the ever-increasing data volumes already associated with business-as-usual activity. Data volumes will inevitably increase and data types will evolve, although the way data is stored, and the technologies and methodologies available to analyse it, will continue to adapt to help offset the effect of larger and more varied data loads.
• Intuitive technology – we are seeing the first generation of what is widely termed artificial intelligence embedded within the e-Discovery market now. The sophistication and ability of these tools will only increase, although I do not see them ever totally replacing humans in this respect, at least during my professional career. Technology will continue to become more intuitive, utilising multiple sources of data to enrich existing data and continually learning from previous decisions in a more granular and intelligent way. These technologies are now being used proactively in certain industries, for example to flag potentially fraudulent transactions or to monitor an employee’s communications for changes in sentiment or behaviour. They will continue to become embedded and trusted in the e-Discovery and legal markets and, potentially, in the long term, will be used at the point of creation. This would enable documents, emails and data to be categorised and assessed automatically, with appropriate processes routing them onwards to the relevant legal teams. Think of it as an automated application of information governance at the source – although this is certainly not in the near future.
• Ethical, privacy and data protection concerns will continue to clash with the desire to use more and more data in an increasingly automated and insightful manner. We are seeing the first signs of this now, for example where prejudice becomes built into a machine-learning platform, thus reinforcing and strengthening the prejudice. The development of data protection and privacy concerns and laws, such as the GDPR, could also limit the way technology is implemented, and while lawyers will be key to determining how that is managed, technologists will continue to devise processes and methodologies to operationalise those decisions.