ChatGPT has taken everyone by storm, but not without causing alarm. From concerns about replacing human jobs to its rapid adoption without consideration for data privacy and security, there is a lot to learn about how businesses and individuals can use this technology while keeping their confidential and personal information out of the grasp of the robots and safe from unintended disclosure.
What is ChatGPT?
ChatGPT is an artificial intelligence chatbot developed by OpenAI and launched in November 2022. It is powered by a large language model (LLM), which requires an enormous amount of data to function and to improve its ability to respond to user prompts.
Why is the Rapid Adoption of ChatGPT Without Restriction Concerning?
According to CNN, ChatGPT soared to 100 million active users within two months of its launch, without significant data privacy protections in place.
With no built-in restrictions on the use of personal information or other sensitive business information, such as trade secrets, the system sent a shockwave through the privacy community. ChatGPT’s quick release and public adoption failed to account for significant data privacy risks, and it is apparent that the large language model was not built around the privacy by design and by default principles mandated under GDPR Article 25, Data Protection by Design and by Default.
Data Privacy and Security Questions that Surround AI
Naturally, questions are being raised about the security of these systems and of the personal information processed within them. Information submitted to ChatGPT can also be used to train the chatbot. A few concerns include the following:
Questionable sources
There is little control over where the AI tool obtains its data. Some of the information used to train ChatGPT was scraped from posts on platforms such as Facebook, Twitter, and LinkedIn, without any regard for accuracy or for the personal information that has found its way into the system. Information is also scraped from sources such as Wikipedia, which relies on unaffiliated individuals to contribute content without verifying its accuracy.
Lack of consent
There is no consent to the processing or use of personal information obtained by ChatGPT, whether it is collected from public sources or entered into the chatbot directly. As a result, the system can produce personal information sufficient to identify individuals by name, health conditions, and the locations where they live, work, and shop, among other details. Moreover, there is no opportunity for an individual to opt out of the processing. Significant data privacy concerns, particularly under the EU’s GDPR, have been raised as a result.
Risk of exposure
Unfortunately, the important questions about the security of ChatGPT’s processing of personal information, as well as confidential business information, have already been answered in part by a data breach suffered by ChatGPT earlier this year. In March, a bug in ChatGPT “exposed payment-related and other personal information” to other users, in addition to other users’ conversation histories.
AI: It’s a “little bit” scary
On Twitter, OpenAI CEO Sam Altman responded to ChatGPT’s data breach, stating: “We feel awful about this.” The company quickly implemented a patch after the breach was reported, but that did little to quell broader concerns over the system’s data privacy and security. Shortly thereafter, Altman admitted that he personally is a “little bit” scared of AI tools such as ChatGPT, both in how they use information for training purposes and in their potential to replace human workers.
Shortly after ChatGPT’s data breach, on March 20, 2023, the FTC published a blog post warning companies:
“If you decide to make or offer a [generative AI] product . . . take all reasonable precautions before it hits the market. The FTC has sued businesses that disseminated potentially harmful technologies without taking reasonable measures to prevent consumer injury.” — Michael Atleson, Attorney, FTC Division of Advertising Practices
While the warning addresses the intentional misuse of large language models by “bad actors” to create deepfakes or to “create or spread deception” (e.g., synthetic posts that mimic the style and vocabulary of famous individuals in order to deceive the public), it is equally well-taken when it comes to the use of AI tools to create legitimate content.
Further efforts by the Biden Administration were announced earlier this month: “The U.S. Commerce Department on Tuesday [April 11] said it will spend the next 60 days fielding opinions on the possibility of AI audits, risk assessments and other measures that could ease consumer concerns about these new systems.”
The Administration is hoping for self-regulatory efforts by companies like OpenAI, as well as efforts to prevent the ingestion of personal information into large language models.
At this time, there is simply too little security around AI chatbots’ use of submitted content for training purposes, and there are too few safeguards against inputting personal information into these systems, to justify adopting AI tools such as ChatGPT.
With ChatGPT, Security is Murky
The general security of the system has yet to be fully understood because large language models use the data supplied to them to train and improve the AI. Even though ChatGPT’s terms prohibit submitting personal information to the model, a significant percentage of employees do just that.
Robert Lemos has reported that “[m]ore than 4% of employees have put sensitive corporate data into [ChatGPT], and such data could be retrieved if the proper data security isn’t in place.” Lemos further notes that “companies and security professionals have begun to worry that sensitive data ingested as training data into the models could resurface when prompted by the right queries.”
A system built without safeguards
Imagine asking ChatGPT to provide a list of Social Security numbers or financial account numbers. We are now learning that this could happen unless safeguards are built into AI systems to avoid ingesting or using such information for training purposes.
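Building that kind of safeguard is, at bottom, a data-loss-prevention problem: mask obvious identifiers before a prompt ever leaves the organization. The sketch below is a minimal, hypothetical Python illustration of the idea, assuming simple regular-expression patterns for U.S. Social Security numbers, payment card numbers, and email addresses; the patterns and function names are ours for illustration, are not part of OpenAI’s tooling, and are no substitute for a vetted data-loss-prevention solution.

```python
import re

# Illustrative patterns only; a real deployment would rely on a vetted
# DLP/classification tool and jurisdiction-specific identifiers.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),          # e.g., 123-45-6789
    "CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),         # rough payment-card shape
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.\w{2,}\b"),
}

def redact_pii(prompt: str) -> str:
    """Replace likely personal identifiers with placeholders before the
    prompt is sent to any third-party chatbot."""
    for label, pattern in PII_PATTERNS.items():
        prompt = pattern.sub(f"[REDACTED {label}]", prompt)
    return prompt

if __name__ == "__main__":
    risky = "Customer Jane Doe, SSN 123-45-6789, email jane@example.com, disputed a charge."
    print(redact_pii(risky))
    # Customer Jane Doe, SSN [REDACTED SSN], email [REDACTED EMAIL], disputed a charge.
```

The design point is simply that redaction happens on the company’s side of the boundary, so nothing sensitive is available to be ingested for training in the first place.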
Danger in international waters
Privacy regulators are beginning to take notice of the dangers of ChatGPT and similar large language models. The first to take action was Italy’s data protection authority, the Garante, which, on the heels of ChatGPT’s data breach, temporarily banned the use of ChatGPT, noting that there “appears to be no legal basis underpinning the massive collection and processing of personal data to ‘train’ the algorithms on which the platform relies.”
The Garante further criticized ChatGPT for its lack of age restrictions and for its ability to generate factually erroneous content. The criticisms were based on an apparent failure to build privacy protections into the design of the system, as well as a lack of safeguards against falsely impersonating others.
While the European Union is developing affirmative regulations around the use of AI, the United Kingdom is taking a somewhat slower approach, announcing its intention to begin regulating the use of AI and, for now, looking to apply existing regulations to AI platforms. Of course, that means the privacy by design and by default rules are already in effect.
On the heels of ChatGPT’s data breach, the United Kingdom may begin paying closer attention to Part 3, Chapter 4, Section 56 (General Obligations) of the U.K. Data Protection Act 2018 and could take affirmative action against the use of AI models until more is understood and privacy is built into the design of these systems.
What are some organizations doing about AI?
Because of the data privacy and confidentiality risks posed by the use of AI models, some large organizations have already cracked down on the use of AI chatbots and large language models within their walls.
Walmart, Amazon, and Microsoft have warned employees not to share customer information with ChatGPT or other AI bots due to security and data privacy risks. JPMorgan and Verizon have likewise restricted or banned the use of ChatGPT over compliance concerns that personal information, including account information, could be input into the large language models and then reproduced in response to a simple question.
While the leakage of customer information is a key concern, issues also exist over AI models’ unintended sharing of source code, other trade secrets, and highly confidential information. Until privacy by design is built into ChatGPT and similar large language models, they may have already seen the peak of their lawful use, given that regulators are fast on the trail of AI models for privacy rights violations.
What’s in the Future for ChatGPT?
Advocacy groups are already involved in efforts to stop the use of ChatGPT and similar models. One advocacy group filed a complaint at the beginning of April seeking to halt the use of ChatGPT, citing bias, deception, and privacy and public safety risks, among other ills. One assertion is that OpenAI’s attempt to hide behind its usage and privacy policies to shield itself from liability is “unconscionable.” This is the first action of its kind in the United States, but it is not likely to be the last targeting OpenAI and its wildly popular ChatGPT model.
Where does this leave companies seeking to take advantage of the benefits of large language models? Companies are advised to:
- Tread lightly or hold off until privacy by design practices are built into the models and can filter out personal information, particularly sensitive information.
- If you proceed, ensure that privacy policies and consent tools clearly disclose the use of an individual’s information for training purposes.
- If you proceed, give data subjects a clear and easy way to opt out of, or opt in to (depending on the jurisdiction), the processing of their personal information with the AI system (a minimal illustration follows this list).
- Prepare a Privacy Impact Assessment covering the use of ChatGPT, which is now required in the majority of states that have enacted privacy laws, as well as in the EU and UK.
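To make the opt-in/opt-out recommendation concrete, here is a minimal, hypothetical sketch of a consent gate a company might place in front of any AI processing. The jurisdiction list, record structure, and function names are assumptions for illustration only; they are not legal advice and do not reflect any particular statute’s test.

```python
from dataclasses import dataclass

# Jurisdictions treated here, purely for illustration, as requiring
# affirmative opt-in consent before personal information reaches an AI system.
OPT_IN_JURISDICTIONS = {"EU", "UK"}

@dataclass
class DataSubject:
    jurisdiction: str          # e.g., "EU", "UK", "US-CA"
    opted_in: bool = False     # affirmative consent on record
    opted_out: bool = False    # opt-out request on record

def may_process_with_ai(subject: DataSubject) -> bool:
    """Require opt-in where the jurisdiction demands it; honor opt-outs elsewhere."""
    if subject.jurisdiction in OPT_IN_JURISDICTIONS:
        return subject.opted_in
    return not subject.opted_out

if __name__ == "__main__":
    print(may_process_with_ai(DataSubject("EU")))                     # False: no opt-in on record
    print(may_process_with_ai(DataSubject("US-CA", opted_out=True)))  # False: opt-out honored
```

The point is not the handful of lines of code but where they sit: the consent check runs before any personal information reaches the model, and its outcome should be logged so it can be evidenced in a Privacy Impact Assessment.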
modCounsel: Get Proactive With Your Company’s Data Privacy
modCounsel provides privacy and risk counseling for new product launches, incorporation of third-party software into systems, and general privacy compliance.
Our team of experienced legal professionals is currently assisting clients with the development of Privacy Impact Assessments triggered by GDPR, UK, and U.S. state privacy law requirements in response to changes in product offerings.
If you would like to discuss your company’s use of a large language model, privacy laws, or Privacy Impact Assessments, please reach out to us today.