
Here’s what’s really going on inside an LLM’s neural network



With most computer programs—even complex ones—you can meticulously trace through the code and memory usage to figure out why that program generates any specific behavior or output. That’s generally not true in the field of generative AI, where the non-interpretable neural networks underlying these models make it hard for even experts to figure out precisely why they often confabulate information, for instance.

Now, research from Anthropic offers a new window into what’s going on inside the Claude LLM’s “black box.” The company’s new paper on “Extracting Interpretable Features from Claude 3 Sonnet” describes a powerful method for at least partially explaining how the model’s millions of artificial neurons fire to create surprisingly lifelike responses to general queries.

Opening the hood

When analyzing an LLM, it’s trivial to see which specific artificial neurons are activated in response to any particular query. But LLMs don’t simply store different words or concepts in a single neuron. Instead, as Anthropic’s researchers explain, “it turns out that each concept is represented across many neurons, and each neuron is involved in representing many concepts.”

To untangle this one-to-many and many-to-one mess, the researchers use a system of sparse autoencoders to run a “dictionary learning” algorithm across the model. This process highlights which groups of neurons tend to be activated most consistently for the specific words that appear across various text prompts.
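That dictionary-learning setup can be illustrated with a toy sketch. Nothing here matches Anthropic's actual architecture or scale; the dimensions, weights, and penalty coefficient are all invented for illustration. The key idea is that a wide, mostly-zero feature layer is trained to reconstruct the model's internal activations, with an L1 penalty encouraging most features to stay at zero:

```python
import numpy as np

# Toy sparse-autoencoder sketch (hypothetical dimensions and weights).
# Real dictionary learning trains these matrices; here they are random.
rng = np.random.default_rng(0)
d_model, d_features = 16, 64          # real models use vastly larger sizes
W_enc = rng.normal(0, 0.1, (d_model, d_features))
W_dec = rng.normal(0, 0.1, (d_features, d_model))
b_enc = np.zeros(d_features)

def encode(x):
    # ReLU keeps feature activations non-negative and mostly zero
    return np.maximum(x @ W_enc + b_enc, 0.0)

def decode(f):
    # Reconstruct the original activations from the sparse features
    return f @ W_dec

x = rng.normal(size=(8, d_model))     # stand-in for real neuron activations
features = encode(x)
reconstruction = decode(features)

# Training would minimize reconstruction error plus an L1 sparsity penalty:
l1_coeff = 1e-3
loss = np.mean((x - reconstruction) ** 2) + l1_coeff * np.abs(features).sum()
```

After training, each row of the (hypothetical) decoder matrix `W_dec` becomes a candidate "feature" direction: a pattern of neurons that tends to fire together.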

The same internal LLM “feature” describes the Golden Gate Bridge in multiple languages and modes.

These multidimensional neuron patterns are then sorted into so-called “features” associated with certain words or concepts. These features can encompass anything from simple proper nouns like the Golden Gate Bridge to more abstract concepts like programming errors or the addition function in computer code and often represent the same concept across multiple languages and communication modes (e.g., text and images).

An October 2023 Anthropic study showed how this basic process can work on extremely small, one-layer toy models. The company’s new paper scales that up immensely, identifying tens of millions of features that are active in its mid-sized Claude 3 Sonnet model. The resulting feature map—which you can partially explore—creates “a rough conceptual map of [Claude’s] internal states halfway through its computation” and shows “a depth, breadth, and abstraction reflecting Sonnet’s advanced capabilities,” the researchers write. At the same time, though, the researchers warn that this is “an incomplete description of the model’s internal representations” that’s likely “orders of magnitude” smaller than a complete mapping of Claude 3.

A simplified map shows some of the concepts that are “near” the “inner conflict” feature in Anthropic’s Claude model.

Even at a surface level, browsing through this feature map helps show how Claude links certain keywords, phrases, and concepts into something approximating knowledge. A feature labeled as “Capitals,” for instance, tends to activate strongly on the phrase “capital city” but also on specific place names like Riga, Berlin, Azerbaijan, Islamabad, and Montpelier, Vermont, to name just a few.

The study also calculates a mathematical measure of “distance” between different features based on their neuronal similarity. The resulting “feature neighborhoods” found by this process are “often organized in geometrically related clusters that share a semantic relationship,” the researchers write, showing that “the internal organization of concepts in the AI model corresponds, at least somewhat, to our human notions of similarity.” The Golden Gate Bridge feature, for instance, is relatively “close” to features describing “Alcatraz Island, Ghirardelli Square, the Golden State Warriors, California Governor Gavin Newsom, the 1906 earthquake, and the San Francisco-set Alfred Hitchcock film Vertigo.”
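One simple way such a distance could be computed, assuming each feature is represented as a direction vector in neuron space, is cosine distance between those vectors. The three-dimensional vectors below are invented for illustration and are not taken from the paper:

```python
import numpy as np

def cosine_distance(u, v):
    """1 minus cosine similarity: 0 for identical directions, larger when dissimilar."""
    return 1.0 - (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Hypothetical feature direction vectors (purely illustrative values)
golden_gate = np.array([0.9, 0.1, 0.3])
alcatraz    = np.array([0.8, 0.2, 0.35])   # another San Francisco landmark
py_error    = np.array([-0.2, 0.9, -0.5])  # an unrelated programming concept

# Semantically related features end up geometrically closer
assert cosine_distance(golden_gate, alcatraz) < cosine_distance(golden_gate, py_error)
```

Clustering features by a distance like this is what would produce the "feature neighborhoods" the researchers describe.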

Some of the most important features involved in answering a query about the capital of Kobe Bryant's team's state.
Enlarge / Some of the most important features involved in answering a query about the capital of Kobe Bryant’s team’s state.

Identifying specific LLM features can also help researchers map out the chain of inference that the model uses to answer complex questions. A prompt about “The capital of the state where Kobe Bryant played basketball,” for instance, shows activity in a chain of features related to “Kobe Bryant,” “Los Angeles Lakers,” “California,” “Capitals,” and “Sacramento,” among those calculated to have the highest effect on the result.
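To get a feel for how a "highest effect" ranking might be derived, here is a toy ablation sketch using a linear stand-in model. The feature names echo the article's example, but the activations, weights, and scoring function are all hypothetical; Anthropic's actual attribution method is more involved:

```python
import numpy as np

# Toy sketch (not Anthropic's method): estimate each feature's effect on an
# output score by zeroing it out and measuring how much the score drops.
rng = np.random.default_rng(1)
feature_names = ["Kobe Bryant", "Los Angeles Lakers", "California",
                 "Capitals", "Sacramento", "unrelated"]
features = rng.uniform(0, 1, len(feature_names))     # hypothetical activations
readout = np.array([0.8, 0.7, 0.9, 0.6, 1.0, 0.01])  # hypothetical output weights

def score(f):
    return f @ readout

effects = []
for i in range(len(feature_names)):
    ablated = features.copy()
    ablated[i] = 0.0                      # knock out one feature at a time
    effects.append(score(features) - score(ablated))

# Sort features by how much removing them hurts the output
ranking = sorted(zip(feature_names, effects), key=lambda p: -p[1])
```

In this linear toy, each feature's effect is just its activation times its output weight; in a real multi-layer model, the same knock-out idea requires propagating the change through every subsequent layer.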

