Tech

MIT researchers advance automated interpretability in AI models

Last updated: 2024/07/30 at 8:19 PM

As artificial intelligence models become increasingly prevalent and are integrated into diverse sectors such as health care, finance, education, transportation, and entertainment, understanding how they work under the hood is critical. Interpreting the mechanisms underlying AI models enables us to audit them for safety and biases, with the potential to deepen our understanding of the science behind intelligence itself.

Contents
  • Neuron by neuron
  • Peeking inside neural networks

Imagine if we could directly investigate the human brain by manipulating each of its individual neurons to examine their roles in perceiving a particular object. While such an experiment would be prohibitively invasive in the human brain, it is more feasible in another type of neural network: one that is artificial. However, somewhat similar to the human brain, artificial models containing millions of neurons are too large and complex to study by hand, making interpretability at scale a very challenging task.

To address this, MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) researchers decided to take an automated approach to interpreting artificial vision models that evaluate different properties of images. They developed “MAIA” (Multimodal Automated Interpretability Agent), a system that automates a variety of neural network interpretability tasks using a vision-language model backbone equipped with tools for experimenting on other AI systems.
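The paper's code is not reproduced here, but the architecture it describes, a vision-language backbone that calls interpretability tools in a loop, can be sketched roughly. The outline below is a hypothetical Python sketch: the `backbone` function and the `tools` dictionary are stand-ins, not the actual MAIA implementation.

```python
# Hypothetical sketch of a MAIA-style agent loop: a vision-language model
# backbone chooses a tool, observes the result, and iterates until it can
# answer the user's interpretability query. All names are illustrative.

from dataclasses import dataclass
from typing import Any, Callable, Dict, List

@dataclass
class ToolResult:
    name: str
    output: Any

def run_agent(query: str,
              backbone: Callable[[str, List[ToolResult]], Dict[str, Any]],
              tools: Dict[str, Callable[..., Any]],
              max_steps: int = 10) -> str:
    """Let the backbone pick a tool, run it, feed the result back, and repeat."""
    history: List[ToolResult] = []
    for _ in range(max_steps):
        # The backbone sees the user query plus all prior experimental results
        # and returns either a tool call or a final answer.
        decision = backbone(query, history)
        if "final_answer" in decision:
            return decision["final_answer"]
        tool_name = decision["tool"]
        output = tools[tool_name](**decision.get("args", {}))
        history.append(ToolResult(tool_name, output))
    return "No conclusive answer within the step budget."
```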

“Our goal is to create an AI researcher that can conduct interpretability experiments autonomously. Existing automated interpretability methods merely label or visualize data in a one-shot process. On the other hand, MAIA can generate hypotheses, design experiments to test them, and refine its understanding through iterative analysis,” says Tamar Rott Shaham, an MIT electrical engineering and computer science (EECS) postdoc at CSAIL and co-author on a new paper about the research. “By combining a pre-trained vision-language model with a library of interpretability tools, our multimodal method can respond to user queries by composing and running targeted experiments on specific models, continuously refining its approach until it can provide a comprehensive answer.”

The automated agent tackles three key tasks: it labels individual components inside vision models and describes the visual concepts that activate them, it cleans up image classifiers by removing irrelevant features to make them more robust to new situations, and it hunts for hidden biases in AI systems to help uncover potential fairness issues in their outputs. “But a key advantage of a system like MAIA is its flexibility,” says Sarah Schwettmann PhD ’21, a research scientist at CSAIL and co-lead of the research. “We demonstrated MAIA’s usefulness on a few specific tasks, but given that the system is built from a foundation model with broad reasoning capabilities, it can answer many different types of interpretability queries from users, and design experiments on the fly to investigate them.”

Neuron by neuron

In one example task, a human user asks MAIA to describe the concepts that a particular neuron inside a vision model is responsible for detecting. To investigate this question, MAIA first uses a tool that retrieves “dataset exemplars” from the ImageNet dataset, which maximally activate the neuron. For this example neuron, those images show people in formal attire, and closeups of their chins and necks. MAIA makes various hypotheses for what drives the neuron’s activity: facial expressions, chins, or neckties. MAIA then uses its tools to design experiments to test each hypothesis individually by generating and editing synthetic images — in one experiment, adding a bow tie to an image of a human face increases the neuron’s response. “This approach allows us to determine the specific cause of the neuron’s activity, much like a real scientific experiment,” says Rott Shaham.
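The two experimental steps in this example, retrieving maximally activating "dataset exemplars" and measuring how an image edit changes a unit's response, can be approximated with a short PyTorch sketch. Everything below (the layer/unit handling and the `edit_image` stand-in for MAIA's image-editing tool) is illustrative rather than taken from the paper's code.

```python
# Minimal sketch: (1) rank images by how strongly they activate one unit,
# and (2) compare activations before and after an image edit to test a
# hypothesis (e.g., "does adding a bow tie raise the response?").

import torch

def unit_activation(model, images, layer_name, unit_idx):
    """Mean activation of one unit (channel) for a batch of images."""
    captured = {}
    def hook(_module, _inputs, output):
        captured["acts"] = output.detach()
    handle = dict(model.named_modules())[layer_name].register_forward_hook(hook)
    with torch.no_grad():
        model(images)
    handle.remove()
    acts = captured["acts"][:, unit_idx]            # (batch, ...) spatial dims, if any
    return acts.reshape(acts.size(0), -1).mean(dim=1)

def top_exemplars(model, batches, layer_name, unit_idx, k=15):
    """Return the k images that most strongly activate the unit ("dataset exemplars")."""
    all_scores, all_images = [], []
    for images in batches:
        all_scores.append(unit_activation(model, images, layer_name, unit_idx))
        all_images.append(images)
    scores, images = torch.cat(all_scores), torch.cat(all_images)
    top = scores.topk(min(k, scores.numel())).indices
    return images[top], scores[top]

# Hypothesis test (edit_image stands in for MAIA's image-editing tool):
# delta = unit_activation(model, edit_image(face, "add a bow tie"), layer, unit) \
#       - unit_activation(model, face, layer, unit)
```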

MAIA’s explanations of neuron behaviors are evaluated in two key ways. First, synthetic systems with known ground-truth behaviors are used to assess the accuracy of MAIA’s interpretations. Second, for “real” neurons inside trained AI systems with no ground-truth descriptions, the authors design a new automated evaluation protocol that measures how well MAIA’s descriptions predict neuron behavior on unseen data.
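As a rough illustration of the second idea, "descriptions that predict behavior on unseen data," one could flag held-out images that match a description and check whether those images really activate the unit more than the rest. The sketch below is not the paper's protocol; it reuses the `unit_activation` helper from the earlier snippet and assumes a hypothetical `matches_description` judge.

```python
# Illustrative predictiveness metric: the selectivity gap between images the
# description says should activate the unit and images it says should not.
# Assumes both groups are non-empty.

import torch

def predictive_score(model, held_out_images, layer_name, unit_idx,
                     description, matches_description):
    acts = unit_activation(model, held_out_images, layer_name, unit_idx)
    predicted = torch.tensor(
        [bool(matches_description(img, description)) for img in held_out_images],
        dtype=torch.bool,
    )
    return acts[predicted].mean() - acts[~predicted].mean()
```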

The CSAIL-led method outperformed baseline methods at describing individual neurons in a variety of vision models such as ResNet, CLIP, and the vision transformer DINO. MAIA also performed well on the new dataset of synthetic neurons with known ground-truth descriptions. For both real and synthetic systems, its descriptions were often on par with those written by human experts.

How are descriptions of AI system components, like individual neurons, useful? “Understanding and localizing behaviors inside large AI systems is a key part of auditing these systems for safety before they’re deployed — in some of our experiments, we show how MAIA can be used to find neurons with unwanted behaviors and remove these behaviors from a model,” says Schwettmann. “We’re building toward a more resilient AI ecosystem where tools for understanding and monitoring AI systems keep pace with system scaling, enabling us to investigate and hopefully understand unforeseen challenges introduced by new models.”

Peeking inside neural networks

The nascent field of interpretability is maturing into a distinct research area alongside the rise of “black box” machine learning models. How can researchers crack open these models and understand how they work?

Current methods for peeking inside tend to be limited either in scale or in the precision of the explanations they can produce. Moreover, existing methods tend to fit a particular model and a specific task. This led the researchers to ask: How can we build a generic system that helps users answer interpretability questions about AI models while combining the flexibility of human experimentation with the scalability of automated techniques?

One critical area they wanted this system to address was bias. To determine whether image classifiers displayed bias against particular subcategories of images, the team looked at the final layer of the classification stream (in a system designed to sort or label items, much like a machine that identifies whether a photo is of a dog, cat, or bird) and the probability scores of input images (confidence levels that the machine assigns to its guesses). To understand potential biases in image classification, MAIA was asked to find a subset of images in specific classes (for example “labrador retriever”) that were likely to be incorrectly labeled by the system. In this example, MAIA found that images of black labradors were likely to be misclassified, suggesting a bias in the model toward yellow-furred retrievers.
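As a rough, hypothetical sketch of this kind of probe (not the actual procedure MAIA follows), one could rank a class's images by the classifier's confidence in the correct label and inspect the lowest-confidence examples for shared attributes such as fur color:

```python
# Flag images within one class that the classifier is least confident about;
# the flagged subset can then be inspected for a common attribute (e.g. black
# vs. yellow fur). Names and indices are illustrative.

import torch
import torch.nn.functional as F

def likely_misclassified(model, class_images, class_idx, k=20):
    """Return the k images of one class with the lowest correct-class probability."""
    with torch.no_grad():
        probs = F.softmax(model(class_images), dim=1)[:, class_idx]
    worst = probs.topk(min(k, probs.numel()), largest=False).indices
    return class_images[worst], probs[worst]

# e.g. flagged, confidences = likely_misclassified(classifier, labrador_images,
#                                                  class_idx=LABRADOR_IDX)
```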

Since MAIA relies on external tools to design experiments, its performance is limited by the quality of those tools. But as tools like image synthesis models improve, so will MAIA. MAIA also shows confirmation bias at times, incorrectly confirming its initial hypothesis. To mitigate this, the researchers built an image-to-text tool, which uses a different instance of the language model to summarize experimental results. Another failure mode is overfitting to a particular experiment, where the model sometimes draws premature conclusions from minimal evidence.

“I think a natural next step for our lab is to move beyond artificial systems and apply similar experiments to human perception,” says Rott Shaham. “Testing this has traditionally required manually designing and testing stimuli, which is labor-intensive. With our agent, we can scale up this process, designing and testing numerous stimuli simultaneously. This might also allow us to compare human visual perception with artificial systems.”

“Understanding neural networks is difficult for humans because they have hundreds of thousands of neurons, each with complex behavior patterns. MAIA helps to bridge this by developing AI agents that can automatically analyze these neurons and report distilled findings back to humans in a digestible way,” says Jacob Steinhardt, assistant professor at the University of California at Berkeley, who wasn’t involved in the research. “Scaling these methods up could be one of the most important routes to understanding and safely overseeing AI systems.”

Rott Shaham and Schwettmann are joined by five fellow CSAIL affiliates on the paper: undergraduate student Franklin Wang; incoming MIT student Achyuta Rajaram; EECS PhD student Evan Hernandez SM ’22; and EECS professors Jacob Andreas and Antonio Torralba. Their work was supported, in part, by the MIT-IBM Watson AI Lab, Open Philanthropy, Hyundai Motor Co., the Army Research Laboratory, Intel, the National Science Foundation, the Zuckerman STEM Leadership Program, and the Viterbi Fellowship. The researchers’ findings will be presented at the International Conference on Machine Learning this week.
