MIT Technology Review

How LLMs could supercharge mass surveillance in the US

The technology could make commercially available bulk datasets even more of a privacy concern.

There are pieces of your life scattered all over the internet, and some of them are for sale. Data brokers amass web searches, financial records, and location data from millions of individuals and sell them to various clients, including the US government. Information on your recent online purchases or the route that you take to work could be sitting on hard drives around the world, waiting to be used.

While reassembling those pieces isn’t trivial, there is early evidence that LLMs might make it far easier. LLM agents could potentially do the work of intelligence analysts in a fraction of the time and for a fraction of the cost, which would enable the state to aim its all-seeing eye toward anyone, not just its highest-priority targets.

“A lot of what we think of as privacy protection isn’t so much like something that’s written in the law,” says Karen Levy, a professor of information science at Cornell University. “It just has to do with how hard or how expensive it is to learn stuff about people.” When mobile phones became widespread, gathering data about people got much cheaper, but making use of that data remained difficult. Powerful LLMs could change that.

Worries over how LLMs could facilitate mass surveillance recently made headlines around the world. According to reporting from the New York Times and the Atlantic, contract negotiations between Anthropic and the US Department of Defense fell apart in late February because Anthropic balked when the DOD demanded leeway to use the company’s models to analyze commercially available data on US citizens. When Anthropic’s rival OpenAI agreed to a DOD deal mere hours later, OpenAI faced an immediate wave of public backlash for apparently swanning past Anthropic’s red lines. Under pressure, OpenAI and the DOD later revised the contract terms.

For avid followers of Anthropic CEO Dario Amodei, the company’s firm stance probably didn’t come as a surprise. In a lengthy essay published to his personal website in January, Amodei had argued that AI-enabled mass surveillance could constitute a crime against humanity. The core concern underlying his dispute with the DOD was that the government might use LLM-based systems such as Claude to analyze reams of data obtained from brokers and build detailed profiles of individual Americans at scale.

There’s plenty of precedent for AI being used for mass surveillance: Most notably, governments worldwide use facial recognition to track citizens and noncitizens alike, and recent reporting indicates that US Immigration and Customs Enforcement (ICE) agents have leaned heavily on facial recognition apps in order to carry out the Trump administration’s mass deportation campaign. While there’s not yet any smoking-gun evidence that the US government (or anyone else) is using LLMs to conduct surveillance in the way that Amodei warns about, there’s a clear appetite for such capabilities.

Artificial analysts

The sort of surveillance against which Amodei cautions is only possible in the United States because of a legal loophole. If the police suspect you of a crime and want to peruse the location data stored on your phone to see if you were present at the scene, they need a signed warrant from a judge. That’s because the Fourth Amendment to the Constitution protects anyone in the United States from “unreasonable searches” by the government.

But when the government buys bulk data from brokers, it isn’t itself searching—it’s taking advantage of searches conducted by the people who collected and compiled the data. That creates a paradox: The government can’t look at the location information on your phone without a warrant, but if a dataset that the government has purchased contains your phone’s location data, and the government is able to link it to you, then it can effectively perform an end run around the Fourth Amendment.

The good news is that finding your information in these databases probably isn’t as easy as just searching for your name. The data that brokers sell is often stripped of obvious identifiers—it might contain, for example, location traces from millions of cell phones but not the corresponding phone numbers. But that’s not an insurmountable obstacle. A 2019 New York Times investigation of bulk cell phone location data found that it was often possible to identify the owners of individual phones by making note of their apparent work and home locations.

Even though deanonymizing data does take some effort, it’s safely within the skill set of intelligence analysts, or indeed any competent internet user, and law enforcement has used ostensibly anonymized location data to tie people to specific crimes. But those are focused searches. While the government, or other organizations, might be able to access data that describes the locations of tens or hundreds of millions of Americans, it would take an utterly impractical number of human analysts to tie all of that data to specific individuals. AI agents could potentially do it faster and more cheaply.

“One way to look at the kerfuffle with Anthropic is that the DOD wants to be able to exploit this [commercial data] loophole to the max,” says Greg Nojeim, director of the security and surveillance project at the Center for Democracy & Technology.

There’s evidence indicating that LLM agents are up to the job. At the start of this year, Northeastern University professor Tianshi Li shared a particularly ironic example. Using an LLM agent, Li analyzed a publicly available Anthropic dataset that consisted of interviews with several scientists about how they use AI. Anthropic had redacted some personally identifying information in the scientists’ responses, but the agent that Li used was able to connect descriptions of some of the subjects’ research with specific studies that they had authored. Though the agent only managed to identify a fraction of the interviewed scientists, when it did succeed it was fast and cheap: Each attempt took about four minutes and cost less than fifty cents. Anthropic did not respond to a request for comment for this story.

Other studies have shown that LLMs can match pseudonymous forum accounts to LinkedIn profiles; identify writers’ native languages; isolate potentially identifying information from a user’s online post history; and infer social media users’ psychological traits, locations, incomes, sexes, and ages, among other attributes. While some of these tasks could be completed by the average person if given adequate time, others, such as native-language identification, would be challenging for anyone but an expert. “In practice, they’re doing what a competent investigator would do,” wrote Nico Dekens, senior vice president of engineering at the intelligence software company ShadowDragon, in an email to MIT Technology Review.

All of these results suggest that agents could give an unskilled worker the capabilities of a team of highly trained intelligence analysts. “[An agent] can gather information on its own and it can make plans, so it’s not like a static search query,” Li says. “It both lowers the barrier to entry and maybe pushes the limits of surveillance even farther.”

If political leaders wanted to quash dissent or punish opponents at scale, the combination of bulk data and countless virtual analysts could enable them to do so. A team of LLM agents might be able to identify the real people behind social media accounts that express negative views about the government or leverage a location dataset to compile a list of people present at a protest. Those in power could then make their lives difficult. “If government agents want to harass people, there are many opportunities to do so,” says Darrell West, a senior fellow at the Brookings Institution.

In the United States, such harassment might take subtle forms—being pulled aside for excessive screenings at the airport, for example. Elsewhere, the consequences could be more drastic. Members of China’s Uighur ethnic minority, for example, have been extensively surveilled by their government for years, and police may choose to investigate individuals based on surveillance data. For Uighurs, such investigations can result in internment and forced labor. And China appears to be interested in integrating LLMs into its surveillance system: An unsecured dataset discovered last year on a Baidu server indicates that Chinese companies are using LLMs to flag online posts for the purpose of “public opinion monitoring,” which is a priority for the Chinese government.

All of this is made worse by the fact that LLMs make mistakes. The advantage of using LLMs for mass surveillance is that they can do far more work than human analysts far more quickly, but that also makes thoroughly checking their work impossible. And because mass surveillance is, by its very nature, secretive, some who fall victim to such errors may not have any recourse.

Privacy on a precipice

For now, these threats are theoretical. It’s almost impossible to determine how the US intelligence community is using LLMs in any detail: While most government agencies are required to report how they use AI, intelligence agencies are exempt. And the companies that provide tools that the government might use to conduct surveillance are cagey about the details of their tech, at least in public materials.

“There are legitimate reasons for secrecy about how informational assets are being obtained and used for intelligence or defense purposes,” Nojeim says. “But the amount of secrecy that surrounds this use is particularly troubling because the tech is so powerful and so new and so difficult for Congress to oversee.”

That said, there are some suggestive signs about how the government might be using LLMs. For example, government agencies, including ICE and the Drug Enforcement Administration, hold subscriptions to ShadowDragon’s software, and, according to Dekens, the company is currently working on incorporating LLMs into the tools it offers. “LLM agents are already very good at the mechanical side of analysis,” he wrote. “Right now, the most effective way to use them is as a copilot and workflow layer.”

And the government could certainly take advantage of those capabilities, as it has access not only to commercially available bulk data but also to proprietary datasets compiled by individual agencies. Historically, those datasets had been siloed in different agencies—the Internal Revenue Service had your tax data and the Centers for Medicare & Medicaid Services had your health records, and they didn’t share. Last year, however, the Elon Musk–led Department of Government Efficiency reportedly mounted a crusade to centralize all that data. With the data in one place and powerful AI tools at their fingertips, members of government agencies can, in principle, construct detailed profiles of anyone.

DOGE’s data-centralization efforts and the LLM-enabled acceleration of analyst work are two sides of the same coin. In principle, both changes could help the government operate more effectively and economically. But there’s a cost. “I think there’s sometimes an assumption that inefficiency is always bad,” says Levy. “But in privacy, you actually really want things to be hard.”

Few organizations would choose inefficient procedures of their own volition, but Congress could force the government down that path. Shortly after the Anthropic debacle, a bipartisan group of senators and representatives introduced a bill that would require the government to obtain a warrant before purchasing data from data brokers. Public outcry, too, seems to have had an effect: After OpenAI was overwhelmed by opprobrium for accepting DOD contract terms that Anthropic had rejected, the company and the Pentagon modified the contract to include additional surveillance protections.

But government surveillance is not the only concern. Private companies could just as easily purchase bulk data and analyze it with LLM agents, and they are less subject to legal constraints and public opposition, especially if they aren’t household names. If it is indeed possible for LLM agents to build detailed profiles of large numbers of individuals using bulk data, companies could use those capabilities to investigate job applicants or determine whether someone is insurable. “It is very, very hard to hold to account companies that are doing whatever they want to with our data,” Levy says. “It’s hard to even know what’s happening.”

In the absence of legislation preventing such uses, we might need to rethink how we understand our own privacy. It has always been possible that someone online might unearth your address or connect you with your pseudonymous accounts, but given the effort that would take, it was easy to feel safe. Even in the wake of Edward Snowden’s 2013 revelations about the National Security Agency’s extensive surveillance of US citizens, many people reassured themselves that their privacy was still intact because the government had no reason to look into their lives.

That kind of privacy depends entirely on friction: the time and effort required to link a secret social media account with its real-life owner, or the skill and resources needed to analyze bulk datasets. Stay under the radar, and no one will care enough to overcome that friction. But LLM agents could lessen that effort, or remove it entirely. If the government and other organizations can construct detailed profiles of millions of people at the drop of a hat, no one is beneath their notice.
