Inside the Black Box
Large Language Models (LLMs) often hallucinate. To trust their outputs, we need a way to see what is happening inside the network.
The Visualization Pipeline
- Extract Activations: Hook into the PyTorch model layers.
- Reduce Dimensionality: Use t-SNE or UMAP to project 1024 dimensions to 3D.
- Render: WebGL scatter plot.
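The first step above, hooking into the model, can be sketched with PyTorch's forward-hook API. The toy two-layer model below is illustrative, not the model from this post; the idea is just to capture a layer's output as it flows through:

```python
import torch
import torch.nn as nn

# Toy stand-in model: an 8 -> 1024 -> 4 MLP (illustrative only).
model = nn.Sequential(nn.Linear(8, 1024), nn.ReLU(), nn.Linear(1024, 4))

captured = {}

def hook(module, inputs, output):
    # Store a detached copy so downstream visualization code
    # can't accidentally touch the autograd graph.
    captured["activations"] = output.detach()

# Attach the hook to the 1024-dimensional hidden layer (index 0 here).
handle = model[0].register_forward_hook(hook)

with torch.no_grad():
    model(torch.randn(16, 8))

handle.remove()  # always detach hooks when done
print(captured["activations"].shape)  # torch.Size([16, 1024])
```

Removing the hook afterwards matters in practice: stale hooks keep firing on every forward pass and leak memory.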
[Figure: Neural Map — 3D scatter plot of the projected activations]
Python Snippet
```python
import torch
from sklearn.manifold import TSNE

# Get hidden states from the final layer
with torch.no_grad():
    outputs = model(input_ids, output_hidden_states=True)
hidden_states = outputs.hidden_states[-1]  # (batch, seq_len, hidden_dim)

# t-SNE expects a 2-D array: flatten (batch, seq, hidden) to (batch * seq, hidden)
flat = hidden_states.reshape(-1, hidden_states.shape[-1]).cpu().numpy()

# Project to 2D (use n_components=3 for the 3D WebGL view)
tsne = TSNE(n_components=2)
projected = tsne.fit_transform(flat)
```
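For large activation sets, t-SNE can be slow; a linear projection is a cheap baseline for the 3D view. Here is a minimal from-scratch PCA sketch in NumPy (the random array is a toy stand-in for real hidden states, not output from the model above):

```python
import numpy as np

# Toy stand-in for hidden states: 256 tokens x 1024 dimensions.
rng = np.random.default_rng(0)
activations = rng.normal(size=(256, 1024))

# Center the data, then take the top 3 principal directions via SVD.
centered = activations - activations.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)

# Project onto the first 3 components: one (x, y, z) point per token,
# ready to hand off to the WebGL scatter plot.
projected = centered @ vt[:3].T
print(projected.shape)  # (256, 3)
```

PCA preserves global variance rather than local neighborhoods, so clusters look different from t-SNE/UMAP output, but it is deterministic and fast enough to rerun on every layer.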