<aside> 👉 The Neuron Explanation project aims to understand what Large Language Models learn during training by analyzing which concepts individual neurons recognize (e.g., cat, building, etc.). Specifically, we associate a logical rule with each neuron that expresses the context in which those concepts are recognized (e.g., (Cat OR Dog) AND NOT Person). The rules are typically extracted with search algorithms and statistical analysis of neuron activations.
</aside>
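To make the idea above concrete, here is a minimal sketch of how a logical rule can be scored against a single neuron: the neuron's activations are binarized with a threshold and compared (via intersection-over-union) to the tokens where a candidate formula over concept annotations holds. All names, thresholds, and data below are illustrative, not taken from the reference repository.

```python
# Minimal sketch: score one candidate rule for one neuron.
# Assumes per-token binary concept annotations and per-token neuron activations.
import numpy as np

def iou(neuron_mask: np.ndarray, formula_mask: np.ndarray) -> float:
    """Intersection-over-union between where the neuron fires and where the formula holds."""
    intersection = np.logical_and(neuron_mask, formula_mask).sum()
    union = np.logical_or(neuron_mask, formula_mask).sum()
    return float(intersection) / union if union > 0 else 0.0

# Toy data: 8 tokens, one neuron's activations, and three concept masks.
activations = np.array([0.9, 0.1, 0.8, 0.0, 0.7, 0.2, 0.0, 0.6])
cat    = np.array([1, 0, 1, 0, 0, 0, 0, 1], dtype=bool)
dog    = np.array([0, 0, 0, 0, 1, 0, 0, 0], dtype=bool)
person = np.array([0, 1, 0, 0, 0, 1, 0, 0], dtype=bool)

neuron_mask = activations > 0.5        # binarize the neuron with a threshold
formula     = (cat | dog) & ~person    # candidate rule: (Cat OR Dog) AND NOT Person

print("IoU of '(Cat OR Dog) AND NOT Person':", iou(neuron_mask, formula))
# A search algorithm would enumerate or beam-search such formulas and keep the best-scoring one.
```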
The Fall 2025 Auditor Project for Neuron Explanations will be to compute, analyze, and compare explanations for neurons in large language models across layers of your choice.
Week 1: AIEA Lab Onboarding
Week 2: Project Onboarding (get the reference code repository running locally).
Week 3: Nautilus
Week 4: Explore open-source large language model tutorials for sentence classification or question answering.
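As a starting point for week 4, the Hugging Face `transformers` pipeline API gives a one-line way to try an open model on sentence classification. The checkpoint name below is only an example; use whichever open model the project settles on.

```python
# A minimal sketch of trying an open-source model on sentence classification.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",  # example checkpoint
)

print(classifier("The onboarding documentation was surprisingly easy to follow."))
# [{'label': 'POSITIVE', 'score': ...}]
```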
Week 5: Get the reference code repository running on Nautilus in a headless environment (you will need to be well acquainted with the terminal), and document the process.
Week 6: Fine-tune your LLM on NLI (natural language inference) and compare the results.
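One possible shape for the week 6 fine-tuning, using `transformers` and `datasets`. The base model, the SNLI dataset, and all hyperparameters here are placeholders, not the project's required configuration.

```python
# A hedged sketch of fine-tuning a sequence-classification head on NLI.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "distilbert-base-uncased"   # example base model, not a requirement
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

# SNLI: premise/hypothesis pairs labeled entailment / neutral / contradiction.
dataset = load_dataset("snli").filter(lambda ex: ex["label"] != -1)  # drop unlabeled pairs

def tokenize(batch):
    return tokenizer(batch["premise"], batch["hypothesis"],
                     truncation=True, padding="max_length", max_length=128)

tokenized = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="nli-finetune",
                           per_device_train_batch_size=16,
                           num_train_epochs=1),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
)
trainer.train()
print(trainer.evaluate())   # compare against the non-fine-tuned baseline
```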
Week 7: Read the paper and write a small report summarizing it in your own words.
Week 8: Hook your LLM's activations.
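"Hooking" here means registering a PyTorch forward hook on a layer so its per-token neuron activations can be saved for analysis. The sketch below uses GPT-2 only because it is small and its module names (`transformer.h[i].mlp.act`) are easy to point at; other architectures name these modules differently.

```python
# Minimal sketch: capture the post-nonlinearity MLP activations of one layer.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

captured = {}

def hook(module, inputs, output):
    # output: (batch, seq_len, hidden_neurons) activations after the MLP nonlinearity
    captured["acts"] = output.detach()

# GPT-2's block-5 MLP activation function; pick the layer you want to study.
handle = model.transformer.h[5].mlp.act.register_forward_hook(hook)

with torch.no_grad():
    inputs = tokenizer("A cat and a dog sat together.", return_tensors="pt")
    model(**inputs)

handle.remove()
print(captured["acts"].shape)   # e.g., torch.Size([1, 8, 3072])
```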
Week 9: Explain 50 neurons from a layer of your choice in your LLM.
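The week 9 deliverable could take roughly the following shape: loop over 50 neurons in the chosen layer, search a space of candidate concept formulas, and record the best-scoring explanation per neuron. The activations and concept masks below are synthetic placeholders; in practice they come from the hooked model and an annotated dataset, and a real search would cover a much larger formula space.

```python
# A hedged sketch of producing per-neuron explanations for one layer.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n_tokens, n_neurons = 2000, 50
acts = rng.random((n_tokens, n_neurons))              # stand-in for hooked activations
concepts = {name: rng.random(n_tokens) > 0.8          # stand-in binary concept annotations
            for name in ["cat", "dog", "person", "building"]}

def iou(a, b):
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

report = []
for j in range(n_neurons):
    neuron_mask = acts[:, j] > np.quantile(acts[:, j], 0.95)  # top-5% activations "fire"
    # Tiny candidate space: single concepts and pairwise ORs.
    candidates = dict(concepts)
    for (n1, m1), (n2, m2) in combinations(concepts.items(), 2):
        candidates[f"({n1} OR {n2})"] = m1 | m2
    best_name, best_mask = max(candidates.items(), key=lambda kv: iou(neuron_mask, kv[1]))
    report.append((j, best_name, round(iou(neuron_mask, best_mask), 3)))

for row in report[:5]:
    print(row)   # (neuron index, best formula, IoU score)
```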
Week 10: Auditor Offboarding + final report writing (5 pages max)