How exactly do Language Models perceive time?
This is one of the best papers I've read this year (from Kai Nylund, @ssgrn, @nlpnoah), and here's what it suggests (IMO) 👇
First off, the paper is a joy to read. Lots of dense insights, honest write-ups of paths that led to negative results, well-linked citations, and repeated explanations of definitions.
Somehow it does this while being concise!
All right let's get into it
The core concept is simple. They:
- take Twitter and news data and segment by year and month
- finetune an LLM on each month/year to get new weights
- subtract the original weights from the finetuned weights to get 'time vectors'
This delta can be an interesting proxy for exploration into what the model learned for this time period.
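Roughly, in code (my own minimal sketch with HuggingFace-style checkpoints, not the paper's actual release - the finetuned checkpoint path is made up):

```python
from transformers import T5ForConditionalGeneration

# Pretrained base model and a copy finetuned on one time slice (path is hypothetical).
base = T5ForConditionalGeneration.from_pretrained("t5-small")
finetuned = T5ForConditionalGeneration.from_pretrained("path/to/t5-small-news-2017")

base_state = base.state_dict()
ft_state = finetuned.state_dict()

# The "time vector" is just the per-parameter difference: finetuned minus pretrained.
time_vector = {name: ft_state[name] - base_state[name] for name in base_state}

# Adding it back onto the base weights (optionally scaled) steers the model toward that period.
alpha = 1.0
base.load_state_dict({name: base_state[name] + alpha * time_vector[name] for name in base_state})
```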
Now what can we do with this vector?
First, we can check that the finetuning works - it does. Perplexity and F1 show that models perform best when the inputs come from the time period they were finetuned on.
Also, shoutout to them for checking for contamination between the training and test sets!
What's also interesting is that performance degrades somewhat linearly as you move away in time from the training data.
This holds for months and years, but models trained on a particular month also do relatively well on the same month in other years (the diagonal stripes in their heatmaps).
I wonder if this is due to semantic similarities (same month name) as opposed to deeper understanding in the model. It would be useful to look at the strength of the delta across different layers to see how deep this is.
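Something like this is what I have in mind (not from the paper, just a rough sketch of the probe, assuming you have both checkpoints loaded as above):

```python
def layerwise_delta_strength(base_state, finetuned_state):
    """Return {parameter name: relative L2 norm of (finetuned - base)}."""
    strengths = {}
    for name, base_param in base_state.items():
        delta = finetuned_state[name] - base_param
        # Normalize by the base weight norm so layers of different sizes are comparable.
        strengths[name] = (delta.norm() / (base_param.norm() + 1e-8)).item()
    return strengths

# strengths = layerwise_delta_strength(base.state_dict(), finetuned.state_dict())
# for name, s in sorted(strengths.items(), key=lambda kv: -kv[1])[:10]:
#     print(f"{name}: {s:.4f}")  # which layers carry most of the "time" signal?
```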
The organization of the vectors themselves is also interesting.
To me, the organization suggests a model of internal time, which is pretty amazing. We still have no idea how time works in the brain, but if we are language-driven learners (like LLMs), and consciousness is a function of a looping inner monologue, there might be similarities.
This is where it gets even more interesting: once you have the vectors, you can interpolate between them to get better performance on years you didn't finetune on!
Interpolation between TVector 1 and TVector 2 is simple arithmetic: a weighted sum of the two.
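In code it's something like this (again my sketch, reusing the time-vector dicts from above; the year names are just examples):

```python
def interpolate_time_vectors(tv_a, tv_b, alpha):
    """alpha = 0 gives tv_a, alpha = 1 gives tv_b, values in between blend them."""
    return {name: (1 - alpha) * tv_a[name] + alpha * tv_b[name] for name in tv_a}

def apply_time_vector(model, time_vector, scale=1.0):
    """Add a (scaled) time vector onto a model's weights."""
    state = model.state_dict()
    model.load_state_dict({name: p + scale * time_vector[name] for name, p in state.items()})

# e.g. approximate a year you never finetuned on (say 2017) from its neighbors:
# tv_2017_approx = interpolate_time_vectors(tv_2016, tv_2018, alpha=0.5)
# apply_time_vector(base, tv_2017_approx)
```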
Like task vectors before it, this might be our next stop before we figure out true transfer learning. If we can interpolate between finetunes, we get fine-grained (and cheap) control of model outputs, without the cost and time of a new finetune.
Conjecture time:
1. This work is done on standard pretrained models (three sizes of T5). If we start training models with constraints that force better clustering of concepts and time in the latent space, this approach will likely lead to stronger results.
2. Another interesting avenue of exploration (which I've seen a little of in the Llama-2 and Anthropic research) is looking at model activations as a way to understand which time period a model is being constrained to.
If I could, I'd bet that the time vectors here correlate with those activations.
3. Currently we rely on models to "figure out" time and concept from the prompt and activate the right parts. More structure might be useful here, even something like an MoE-style router trained to route between different finetunes of the same model, token by token (rough sketch below).
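To make that concrete, here's a very hand-wavy sketch of what I mean (entirely my conjecture, not anything from the paper): a tiny gate that mixes the per-token logits of several finetunes of the same base model, rather than swapping weights. All the names here are hypothetical, and each expert is assumed to be a causal LM that returns `.logits`.

```python
import torch
import torch.nn as nn

class FinetuneRouter(nn.Module):
    """Mixes per-token logits from K finetuned variants of the same base model."""
    def __init__(self, experts, hidden_size):
        super().__init__()
        self.experts = nn.ModuleList(experts)             # K finetunes, typically frozen
        self.gate = nn.Linear(hidden_size, len(experts))  # learned router

    def forward(self, input_ids, hidden_state):
        # hidden_state: (batch, seq, hidden) from a shared base representation
        gate_weights = torch.softmax(self.gate(hidden_state), dim=-1)          # (B, S, K)
        expert_logits = torch.stack(
            [expert(input_ids).logits for expert in self.experts], dim=-1      # (B, S, V, K)
        )
        # Per-token weighted mixture of the experts' predictions.
        return (expert_logits * gate_weights.unsqueeze(2)).sum(dim=-1)         # (B, S, V)
```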
This is a really interesting segment. I'm not sure if they mean that they swap the weights out for just the deltas - it would be amazing if that worked.
Swapping individual layers or classes of weights is also something I'd love to see more data on.
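Here's roughly what that experiment could look like (my sketch, not the paper's setup; the parameter-name filter is illustrative and depends on the checkpoint):

```python
def apply_partial_time_vector(model, time_vector, name_filter):
    """Apply the time-vector delta only to parameters whose names pass name_filter."""
    state = model.state_dict()
    patched = {
        name: p + time_vector[name] if name_filter(name) else p
        for name, p in state.items()
    }
    model.load_state_dict(patched)

# e.g. only shift the decoder (or a single block, or just attention weights) toward a year:
# apply_partial_time_vector(base, tv_2020, lambda n: n.startswith("decoder."))
```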
Ever since the Llama-2 paper I've been heavily interested in understanding how language models perceive time - and how we can use this to better control outputs, or even understand our own brains.
For the first time ever we have a general intelligence that we can look inside of.
New understandings about intelligence (at least language-based intelligence) will come from our ability to live-edit and play with the weights of these brains - and I strongly suspect (or hope) that some of it will help us understand the human brain.
What a time to be alive 🫨