CASE STUDY

Infotel UK Consulting

Advancing GDPR compliance through Natural Language Processing (NLP)

The company

Infotel UK Consulting specialises in creating and implementing IT solutions, driving digital transformation, and managing data across various sectors.  

Established in 2015 and headquartered in Newcastle upon Tyne, Infotel UK is part of The Infotel Conseil Group, providing expertise in software design, systems migration, and GDPR compliance. The company's rapid growth in the UK led to the establishment of a purpose-built innovation lab, further enhancing its ability to support businesses with complex IT requirements. 

The work produced by their Data Science team went on to be applied to Infotel’s GDPR compliance tool, Deepeo, a software product that provides a user-friendly, configurable system to manage data.

Passionate about advancing the AI space in a productive, innovative way to support compliance, Infotel’s collaboration with NICD focussed on improving processes and staff upskilling. Ensuring that the team had the support, guidance and resources needed to fully achieve their research goals was at the heart of Infotel UK’s approach to the collaboration. 

The problem 

GDPR compliance challenges in detecting PII within multilingual legal datasets

Infotel UK Consulting developed an internal tool named Deepeo designed primarily for GDPR compliance, with a strong emphasis on Personally Identifiable Information (PII) detection. Deepeo's core function is to assist clients in ensuring that their data is compliant with GDPR regulations, particularly by identifying and addressing any PII within their datasets.

Sean Bayly is a Data Scientist at Infotel: “PII detection is one of Deepeo’s fastest functions, but it is more generally a GDPR compliance tool. Simply, if a customer is not sure if the data they are holding is compliant with GDPR, Deepeo can connect and quite simply figure out and address any articles that need to be deleted or checked over.”

Deepeo was already a successful tool. However, with advances in Natural Language Processing (NLP), the team saw an opportunity to leverage Artificial Intelligence (AI) to automatically detect PII with greater accuracy and efficiency. This led them to consider how AI could improve the tool’s ability to process large and complex datasets, specifically those containing multilingual legal transcripts from the European Union.

The dataset in question, known as the EUR-Lex dataset, comprises a vast collection of sentences from legal transcripts across the European Union in 26 different languages. These sentences are labelled at the word or token level, identifying whether each entity is a person, location, vehicle, bank account, or other personal information. The complexity and multilingual nature of this dataset presented a significant challenge in achieving high classification accuracy.
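To make word- or token-level labelling concrete, the sketch below groups per-token tags into entity spans. It assumes a BIO tagging scheme and the label names PERSON and LOCATION purely for illustration; the case study does not describe the dataset's actual tag format, and the example sentence is invented.

```python
def bio_to_spans(tokens, tags):
    """Group BIO-tagged tokens into (label, text) entity spans.

    Assumes the BIO convention: "B-X" opens an entity of type X,
    "I-X" continues it, and "O" marks a token outside any entity.
    """
    spans = []
    current_label, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; close any entity that was still open.
            if current_label is not None:
                spans.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # Continuation of the currently open entity.
            current_tokens.append(token)
        else:
            # "O" tag, or an "I-" tag that does not match the open entity
            # (such a stray tag is dropped): close any open entity.
            if current_label is not None:
                spans.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = None, []
    if current_label is not None:
        spans.append((current_label, " ".join(current_tokens)))
    return spans


# Invented example sentence with hypothetical label names.
tokens = ["Maria", "Rossi", "lives", "in", "Turin", "."]
tags = ["B-PERSON", "I-PERSON", "O", "O", "B-LOCATION", "O"]
print(bio_to_spans(tokens, tags))
# [('PERSON', 'Maria Rossi'), ('LOCATION', 'Turin')]
```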

Sean described the motivation for collaborating with NICD: "We have our GDPR compliance tool that focuses on PII, so we thought, why not explore using AI to automatically detect PII? That was the inspiration behind the project and why we engaged with NICD in this way."


Sean Bayly, Data Scientist, Infotel. Image credit: Infotel UK Consulting.

The goal

Enhancing PII detection accuracy using AI and team upskilling

The primary goal was to achieve the highest accuracy in detecting PII within the EUR-Lex dataset using advanced technology. As Sean explained, "Our main goal was to get the best classification score we could."

In addition to improving classification accuracy, a secondary but important goal of the project was to upskill the Infotel team.

By collaborating with the National Innovation Centre for Data (NICD), the team aimed to enhance their expertise in AI and machine learning, ensuring they could independently manage and further develop the tool in the future.

Simon Horrocks, who contributed to the software development of the project, emphasised the benefit of upskilling in data science: "As a software developer, I've had the chance to work with the Data Science team on this project. Through the NICD sessions, I've picked up new skills that have benefited my professional development. I see myself as now sitting somewhere between data science and software development."


Left - Simon Horrocks, Junior Software Developer. Right - Danny Glover, Junior Data Scientist.
Image credit: Infotel UK Consulting.

The solution

To enhance PII detection, the project team, led by NICD’s former Data Scientist Dr. Mac Misiura, guided Infotel through AI model implementation. Supported by Data Scientist Akash Kumar, and with contributions from the broader Infotel team, the team mastered the complex frameworks and methodologies needed to build towards a solution. Contributors included Senior Java Developer Don Horrell, who brought his extensive experience to the project before his recent retirement, and Junior Data Scientist Danny Glover.

The team began by implementing Bidirectional Encoder Representations from Transformers (BERT) style models for classification tasks. These models are known for their effectiveness in understanding the context within sentences, making them well-suited for identifying specific entities like PII. This initial approach allowed the team to consume sentences and generate labels indicating the presence of personal information.
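One practical detail of token classification with BERT-style models is that the tokenizer splits words into sub-word pieces, so word-level labels have to be mapped onto tokens before training. The sketch below shows one common way to do that, not necessarily the team's own preprocessing. The word_ids list mirrors what Hugging Face fast tokenizers return from word_ids(), and the label ids and example split are hypothetical.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def align_labels(word_ids, word_labels):
    """Map word-level label ids onto sub-word tokens.

    word_ids holds, for each token, the index of the word it came from,
    or None for special tokens. Only the first sub-word of each word keeps
    the word's label; every other position is set to IGNORE_INDEX.
    """
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            aligned.append(IGNORE_INDEX)  # special token such as [CLS] or [SEP]
        elif word_id != previous:
            aligned.append(word_labels[word_id])  # first sub-word of a word
        else:
            aligned.append(IGNORE_INDEX)  # later sub-words of the same word
        previous = word_id
    return aligned


# Hypothetical example: "Maria Rossi lives", where "Rossi" splits into two sub-words.
word_ids = [None, 0, 1, 1, 2, None]
word_labels = [1, 2, 0]  # hypothetical ids: 1 = B-PERSON, 2 = I-PERSON, 0 = O
print(align_labels(word_ids, word_labels))
# [-100, 1, 2, -100, 0, -100]
```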

As the project progressed, the team shifted towards more advanced, decoder-style models, which are often used for generating text. This phase of the project was influenced by research from the GPT-NER paper, which demonstrates the effectiveness of using models like ChatGPT for labelling sentences based on whether they contain personal information. By adopting this method, the team could leverage AI not just to classify, but also to assist in generating and refining labels, providing a dual approach to PII detection.
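The generation-based approach can be illustrated with a small sketch: the model is asked to rewrite a sentence with entities wrapped in marker tokens, and the markers are then parsed back into labels. The @@ and ## markers follow the format described in the GPT-NER paper as far as I am aware; the prompt wording, function names, and example sentences are hypothetical, and the model reply is hand-written rather than generated.

```python
import re


def build_prompt(sentence, entity_type, examples):
    """Build a few-shot prompt asking a decoder model to mark entities."""
    lines = [
        f"Mark every {entity_type} entity in the sentence "
        "by wrapping it in @@ and ##."
    ]
    for source, marked in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {marked}")
    lines.append(f"Input: {sentence}")
    lines.append("Output:")
    return "\n".join(lines)


def parse_marked(marked_text):
    """Extract the entity strings wrapped between @@ and ## markers."""
    return re.findall(r"@@(.+?)##", marked_text)


examples = [
    ("Anna Schmidt signed the contract.", "@@Anna Schmidt## signed the contract."),
]
prompt = build_prompt("The appeal was lodged by Jean Dupont.", "person", examples)

# A decoder model would complete the prompt; this reply is hand-written for illustration.
reply = "The appeal was lodged by @@Jean Dupont##."
print(parse_marked(reply))
# ['Jean Dupont']
```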

To support these efforts, the team primarily utilised the Hugging Face platform. The Hugging Face Transformers API and Pipelines API were crucial in implementing and managing the AI models, enabling the team to streamline the process and efficiently handle the large and complex EUR-Lex dataset.
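As a rough sketch of what working with the Pipelines API can look like, the code below wraps the Transformers token-classification pipeline and filters its output by confidence. The model name is left as a parameter because the case study does not say which checkpoints were used. The helper names, the threshold, and the sample output are assumptions for illustration; the sample mimics the dictionary format the pipeline returns when aggregation_strategy is set to "simple".

```python
def build_pii_pipeline(model_name):
    """Create a token-classification pipeline (downloads the model on first use)."""
    # Imported lazily so the filtering helper below can run without the library.
    from transformers import pipeline

    return pipeline(
        "token-classification",
        model=model_name,
        aggregation_strategy="simple",  # merge sub-word pieces into whole entities
    )


def keep_confident(entities, threshold=0.9):
    """Keep (label, text) pairs whose score meets the threshold."""
    return [
        (entity["entity_group"], entity["word"])
        for entity in entities
        if entity["score"] >= threshold
    ]


# Hand-written stand-in for pipeline output on "Maria Rossi lives in Turin".
sample = [
    {"entity_group": "PER", "score": 0.98, "word": "Maria Rossi", "start": 0, "end": 11},
    {"entity_group": "LOC", "score": 0.55, "word": "Turin", "start": 21, "end": 26},
]
print(keep_confident(sample))
# [('PER', 'Maria Rossi')]
```

With a real checkpoint, the same filter would be applied to the result of calling the pipeline object on a sentence. A threshold like this trades recall for precision, which matters in a compliance setting where missed PII and false alarms carry different costs.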


The result

The project successfully achieved its intended outcomes, delivering significant improvements and insights that extended beyond the initial objectives. Sean reflected on the project's success, noting, "Everything we did, we were able to get good scores out." The enhanced accuracy in PII detection demonstrated the effectiveness of the AI models and approaches implemented.

The collaboration provided valuable exposure to various NLP techniques. Post-collaboration, Infotel continued exploring NLP innovations, with plans to integrate findings into Deepeo’s future features.

The initiative also had a broader impact on the professional growth of the team members. Sean highlighted how the project's focus on transformers, an innovative technology in AI, significantly influenced his approach to both professional work and academic research. The insights gained from this project made his ongoing PhD research more relevant to the current state of the art in the field.

The project met its goals of improving PII classification and provided a foundation for future research and development in Natural Language Processing (NLP) within Infotel, giving the team an edge in the space of GDPR compliance and AI-driven data solutions.

 

“Our CEO, Mundeep, is a strong advocate for this transition into the AI space. The project has definitely reignited our passion for innovation.”

Sean Bayly, Data Scientist, Infotel

Business impact

The project significantly impacted Infotel's strategy and operations, positioning them as a leader in AI-driven data compliance. Upskilling with NICD has fuelled new AI offerings and academic contributions, further strengthening their market position.

One of the most notable outcomes is the dedicated support from Infotel’s CEO, Mundeep Nayyar, who is backing the company's transition into the AI space. This has led to the development of new AI-based offerings for clients, aimed at driving further business growth. The project has also spurred discussions with Newcastle University about potential industrial placements within Infotel’s innovation lab, reflecting the company’s commitment to fostering talent and innovation in AI.

In terms of tangible outcomes, the project has enabled Sean and the team to create valuable content and case studies that highlight their expertise in AI and data science, positioning them as thought leaders in Natural Language Processing and Machine Learning.

Additionally, the team has been involved in authoring an academic paper that is on the verge of publication, marking a significant achievement for the business and further establishing their expertise in the field. The team is also considering naming Dr Mac Misiura as a co-author on this paper, recognising his significant contribution to the project.

The success of this project has also led Infotel to focus on expanding their data science team and AI capabilities, indicating a long-term commitment to integrating AI into their core business strategy. The project's impact is evident both in the immediate enhancements to Deepeo and in its alignment with the company's strategic ambitions.

The future 

The project's success has propelled Infotel’s AI and machine learning advancements, leading to applications for further funding to continue their innovation journey.

Looking ahead, Infotel plans to deepen their expertise in deploying AI models at scale. Sean commented, "Now that we are comfortable with the theory, new libraries, and frameworks, the next challenge is mastering how to architect and scale these models for broader applications."

Infotel is also keen to explore emerging technologies, such as AI agents and the intersection of NLP with machine vision. Sean highlighted the potential for future projects, saying, "There could be some quite exciting opportunities at the crossroads of natural language and machine vision."

Future projects may include further collaboration with NICD as Infotel continues to integrate AI and data science into their products and operations.

 

“I highly recommend the opportunity to collaborate with NICD to other businesses. Being able to engage directly with an expert who is authoritative on the subject, delivers confident and clear answers, and communicates in an accessible manner has a great impact on your work.”

Sean Bayly, Data Scientist, Infotel

 

 


To discover more about Infotel UK Consulting, visit their website. 

You can read more of our case studies and sign up to our newsletter to keep up to date with our latest news, events and developments.

 


Our Discovery workshop

Our Discovery workshops enable you to explore the potential of your data and understand the benefit you could gain before committing to a full-scale project.