Generative AI: Data privacy, backup and compliance
Generative or conversational artificial intelligence (AI) tools have attracted a lot of attention, as well as some controversy, as applications such as OpenAI’s ChatGPT and Google’s Bard create human-like responses to queries or prompts.
These apps draw on large databases of content and raise questions around intellectual property, privacy and security. In this article, we look at how chatbots work, the risks posed to data privacy and compliance, and where generated content stands with regard to backup.
These tools – more accurately termed “generative AI” – draw on large language models to create human-like responses (see box). OpenAI’s large language model is the Generative Pre-trained Transformer (or GPT); Google Bard uses Language Model for Dialogue Applications (LaMDA).
However, the rapid growth of these services has caused concern among IT professionals. Mathieu Gorge, founder of VigiTrust, says all 15 chief information security officers he interviewed for a recent research project mentioned generative AI as a worry.
“The most serious concerns are IP leakage and confidentiality when using generative AI,” says Gorge, adding that the ease of use of web- or app-based AI tools risks creating another form of shadow IT.
As online services, generative AI apps transmit and process data over the internet. The main services do not detail where they physically store data.
“Every one of these services has different terms and conditions, and you need to read these very carefully,” says Tony Lock at Freeform Dynamics. “Are they using your inputs, so next time you log on they know who you are and how you like to phrase your queries? They are probably saving some of that information. A lot depends on the systems, because some use old data [to answer queries] and others go out and look at everything they can find.”
Chatbots and data privacy
These services do have data privacy policies, however. ChatGPT, for example, allows users to delete conversations one at a time (within a 30-day limit), to delete all their data, or to delete their entire account.
And the service monitors queries to prevent abuse. ChatGPT retains user data to improve its services, but users can opt out. Google, meanwhile, states that Bard collects “conversations, your location, your feedback and usage information” to improve the service and to improve Google’s machine learning services. It does not, despite online rumour, access personal information in Gmail or other Google service accounts.
Despite these safeguards, chatbot services pose a number of challenges for enterprises. They use public data for their models, but unlike enterprise-based machine learning and AI, firms have no control over or visibility into the training data. Nor is there any automated way to stop an employee sharing intellectual property or personally identifiable data, such as health or financial records, with Bard or ChatGPT.
“You need to have a policy and rules for where and when you use it,” says Gorge. Using a generative AI tool to create marketing materials is acceptable, he suggests, but such tools should not be used for sensitive and critical documents such as contracts.
Also, you need to define where data will be held and what will be used in the model, says Richard Watson-Bruhn, data security expert at PA Consulting.
“You may be using chat-like content in the model or you might be holding it separately for records,” he says. “ChatGPT, for example, records previous chats and typically uses them to improve model outcomes. There might, however, also be important compliance reasons to hold chats on a temporary basis even if they aren’t incorporated into the model.”
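A usage policy of the kind Gorge describes could be backed by a basic screening step before text is sent to a public chatbot. The sketch below is an illustration only, not a feature of any chatbot service: the function name `screen_prompt` and the two patterns are assumptions, and simple pattern matching catches only the most obvious identifiers. It would not, on its own, stop an employee sharing sensitive material.

```python
import re

# Hypothetical patterns for illustration; real personal data takes many more forms.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # 13 to 16 digits, optionally separated by single spaces or hyphens.
    "CARD": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
}


def screen_prompt(text):
    """Mask obvious identifiers and report which kinds were found."""
    found = []
    for label, pattern in PATTERNS.items():
        text, count = pattern.subn(f"[{label} REDACTED]", text)
        if count:
            found.append(label)
    return text, found
```

A firm could log the `found` list, or block the request outright, depending on what its policy says about the type of document being drafted.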
Chatbots and compliance
The use of public chatbot services also raises a number of compliance questions. If firms want to use customer data with generative AI, they will need to ensure data processing complies with GDPR. For internally operated systems, it is possible to obtain the necessary consents.
For public chatbots, this is almost certainly impossible, which has prompted experts to advise against sharing personal data with them and has even led to state bans.
Examples include the Italian data protection authority’s temporary ban for GDPR non-compliance (now lifted) and incidents such as the security breaches Samsung suffered when using ChatGPT. Heads of security and privacy are being drawn into questions about the business use, risks and compliance requirements of AI.
There is a further compliance issue if enterprises use generative AI to make decisions that affect customers. Regulators are looking more closely at decisions made by AI or machine learning systems, and they will want to see these are made on reasonable grounds and free of bias and discrimination.
For in-house technology, keeping the records of decisions should be straightforward, and firms should also record details of the data used to train models. None of this is possible with public chatbots. Moreover, it is possible a generative AI system will make different decisions based on seemingly similar queries – the language models can interpret different words or phrases in different ways to a human analyst – and if training data or the large language model changes, this will also affect results.
This makes it hard for firms to explain decisions made by generative AI systems and to justify them.
“One of the issues is repeatability, or lack of repeatability,” says Patrick Smith, field chief technology officer for Europe at Pure Storage. “If you put the same queries into one of these AI tools, will you get the same response? I suspect you won’t if they are constantly updating their training data. If you look at the tools you can put into your own systems, then you can clearly lock down the training data at any given point.”
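For in-house systems, where the model and its training data can be locked down, the record-keeping described above might look something like the following. This is a minimal sketch under stated assumptions: the field names, the `model_version` label and the `training_data_ref` identifier are hypothetical, and a real audit trail would need to meet whatever retention and access rules the regulator sets.

```python
import hashlib
import json
from datetime import datetime, timezone


def make_decision_record(prompt, response, model_version, training_data_ref):
    """Build one audit entry for an AI-assisted decision."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,          # hypothetical label for the deployed model
        "training_data_ref": training_data_ref,  # hypothetical ID of the locked-down data snapshot
        "prompt": prompt,
        "response": response,
    }
    # Hash the entry so later changes to the stored record can be detected.
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    record["sha256"] = hashlib.sha256(payload).hexdigest()
    return record


def append_record(path, record):
    """Append the record as one JSON line to an audit log file."""
    with open(path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record, sort_keys=True) + "\n")
```

Storing the model and data identifiers alongside each query is what allows a firm to say which version of the system produced a given decision. With a public chatbot, those two fields are exactly the ones that cannot be filled in.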
Chatbots and backup
This then raises the question of how organisations back up chatbot data, or whether that is possible at all. Services such as ChatGPT save queries for 30 days, and it is possible to export queries and responses. Once again, though, it is down to the individual using the service to do this. There are as yet no enterprise-level automated backup and compliance tools for what are largely experimental services, and there is no way to capture a snapshot of training data for any one query (see box).
This suggests that, while CIOs and chief data officers will want to experiment with generative AI, the technology still has some way to go before it is mature enough for mainstream enterprise use.