<aside> 👉 The Neuron Explanation project aims to understand what Large Language Models learn during training by analyzing which concepts individual neurons recognize (e.g., cat, building, etc.). Specifically, we associate a logical rule with each neuron that expresses the context in which those concepts are recognized (e.g., (Cat OR Dog) AND NOT Person). The rules are typically extracted with search algorithms and statistical analysis of neuron activations.
</aside>
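To make the idea above concrete, here is a minimal sketch of how a logical rule can be scored against a single neuron: the neuron's activations are binarized with a threshold and compared (via intersection-over-union) to the tokens where a candidate formula over concept annotations holds. All names, thresholds, and data below are illustrative, not taken from the reference repository.

```python
# Minimal sketch: score one candidate rule for one neuron.
# Assumes per-token binary concept annotations and per-token neuron activations.
import numpy as np

def iou(neuron_mask: np.ndarray, formula_mask: np.ndarray) -> float:
    """Intersection-over-union between where the neuron fires and where the formula holds."""
    intersection = np.logical_and(neuron_mask, formula_mask).sum()
    union = np.logical_or(neuron_mask, formula_mask).sum()
    return float(intersection) / union if union > 0 else 0.0

# Toy data: 8 tokens, one neuron's activations, and three concept masks.
activations = np.array([0.9, 0.1, 0.8, 0.0, 0.7, 0.2, 0.0, 0.6])
cat    = np.array([1, 0, 1, 0, 0, 0, 0, 1], dtype=bool)
dog    = np.array([0, 0, 0, 0, 1, 0, 0, 0], dtype=bool)
person = np.array([0, 1, 0, 0, 0, 1, 0, 0], dtype=bool)

neuron_mask = activations > 0.5        # binarize the neuron with a threshold
formula     = (cat | dog) & ~person    # candidate rule: (Cat OR Dog) AND NOT Person

print("IoU of '(Cat OR Dog) AND NOT Person':", iou(neuron_mask, formula))
# A search algorithm would enumerate or beam-search such formulas and keep the best-scoring one.
```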
The Fall 2025 Auditor Project for Neuron Explanations will be to compute, analyze, and compare explanations for neurons in large language models across layers of your choice.
Week 1: AIEA Lab Onboarding
Week 2: Project Onboarding (get the reference code repository running locally).
Week 3: Nautilus
Week 4: Explore open-source large language model tutorials for sentence classification or question answering.
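As a starting point for week 4, the Hugging Face `transformers` pipeline API gives a one-line way to try an open model on sentence classification. The checkpoint name below is only an example; use whichever open model the project settles on.

```python
# A minimal sketch of trying an open-source model on sentence classification.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",  # example checkpoint
)

print(classifier("The onboarding documentation was surprisingly easy to follow."))
# [{'label': 'POSITIVE', 'score': ...}]
```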
Week 5: Get the reference code repository running on Nautilus in a headless environment (you will need to be well acquainted with the terminal), and document the process.
Week 6: Fine-tune your LLM on NLI (natural language inference) and compare the results.
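One possible shape for the week 6 fine-tuning, using `transformers` and `datasets`. The base model, the SNLI dataset, and all hyperparameters here are placeholders, not the project's required configuration.

```python
# A hedged sketch of fine-tuning a sequence-classification head on NLI.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "distilbert-base-uncased"   # example base model, not a requirement
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

# SNLI: premise/hypothesis pairs labeled entailment / neutral / contradiction.
dataset = load_dataset("snli").filter(lambda ex: ex["label"] != -1)  # drop unlabeled pairs

def tokenize(batch):
    return tokenizer(batch["premise"], batch["hypothesis"],
                     truncation=True, padding="max_length", max_length=128)

tokenized = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="nli-finetune",
                           per_device_train_batch_size=16,
                           num_train_epochs=1),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
)
trainer.train()
print(trainer.evaluate())   # compare against the non-fine-tuned baseline
```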
Week 7: Read the paper and write a small report summarizing it in your own words.
Week 8: Hook your LLM's activations.
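"Hooking" here means registering a PyTorch forward hook on a layer so its per-token neuron activations can be saved for analysis. The sketch below uses GPT-2 only because it is small and its module names (`transformer.h[i].mlp.act`) are easy to point at; other architectures name these modules differently.

```python
# Minimal sketch: capture the post-nonlinearity MLP activations of one layer.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

captured = {}

def hook(module, inputs, output):
    # output: (batch, seq_len, hidden_neurons) activations after the MLP nonlinearity
    captured["acts"] = output.detach()

# GPT-2's block-5 MLP activation function; pick the layer you want to study.
handle = model.transformer.h[5].mlp.act.register_forward_hook(hook)

with torch.no_grad():
    inputs = tokenizer("A cat and a dog sat together.", return_tensors="pt")
    model(**inputs)

handle.remove()
print(captured["acts"].shape)   # e.g., torch.Size([1, 8, 3072])
```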
Week 9: Explain 50 neurons from a layer of your choice in your LLM.
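The week 9 deliverable could take roughly the following shape: loop over 50 neurons in the chosen layer, search a space of candidate concept formulas, and record the best-scoring explanation per neuron. The activations and concept masks below are synthetic placeholders; in practice they come from the hooked model and an annotated dataset, and a real search would cover a much larger formula space.

```python
# A hedged sketch of producing per-neuron explanations for one layer.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n_tokens, n_neurons = 2000, 50
acts = rng.random((n_tokens, n_neurons))              # stand-in for hooked activations
concepts = {name: rng.random(n_tokens) > 0.8          # stand-in binary concept annotations
            for name in ["cat", "dog", "person", "building"]}

def iou(a, b):
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

report = []
for j in range(n_neurons):
    neuron_mask = acts[:, j] > np.quantile(acts[:, j], 0.95)  # top-5% activations "fire"
    # Tiny candidate space: single concepts and pairwise ORs.
    candidates = dict(concepts)
    for (n1, m1), (n2, m2) in combinations(concepts.items(), 2):
        candidates[f"({n1} OR {n2})"] = m1 | m2
    best_name, best_mask = max(candidates.items(), key=lambda kv: iou(neuron_mask, kv[1]))
    report.append((j, best_name, round(iou(neuron_mask, best_mask), 3)))

for row in report[:5]:
    print(row)   # (neuron index, best formula, IoU score)
```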
Week 10: Auditor Offboarding + final report writing (5 pages max)