Exploring DeepSeek's new approach to compressing context using vision, and how it connects with Claude Sonnet 4.5's memory management capabilities.
An innovative approach
This is a bit geeky, but I want to share the paper behind the new DeepSeek-OCR model.
The most interesting thing is not the OCR itself, but the new approach it proposes for storing and compressing context optically.
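DeepSeek introduces what they call “contexts optical compression”: converting text into visual representations (vision tokens). The paper reports roughly 97% decoding precision at compression ratios below 10×, dropping to around 60% at 20×, so the loss is small at the lower end of the 10–20× range and substantial at the upper end.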
The key idea
Using vision as a means of context compression: instead of storing millions of text tokens, render them into images that a model can decode later.
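To make the ratio concrete, here is a minimal arithmetic sketch. The function name and the token counts are made up for illustration; they are not taken from the paper or from DeepSeek's code.

```python
def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """Return how many text tokens each vision token stands in for."""
    if vision_tokens <= 0:
        raise ValueError("vision_tokens must be positive")
    return text_tokens / vision_tokens


# Hypothetical page: 1,000 text tokens rendered into an image
# that the encoder turns into 100 vision tokens.
ratio = compression_ratio(text_tokens=1000, vision_tokens=100)
print(f"{ratio:.0f}x")  # 10x
```

Read this way, a 10× ratio means the same context budget holds about ten times as much rendered text, provided the model can still decode it accurately.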
Hierarchical memory for models
This opens the door to something very powerful: hierarchical memory for models, where the most recent context is kept sharp and the oldest progressively “blurs”, simulating natural human forgetting.
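One way to picture the “blurring” is as a vision-token budget that shrinks with age. The sketch below is only an illustration of that idea: the halving schedule, the default numbers, and the function names are all assumptions, not the scheme from the paper.

```python
def vision_token_budget(age: int, full_budget: int = 256,
                        half_life: int = 4, floor: int = 16) -> int:
    """Budget for a context chunk that is `age` turns old.

    Assumed schedule: the budget halves every `half_life` turns and
    never drops below `floor`, so old context gets blurry, not erased.
    """
    if age < 0:
        raise ValueError("age must be non-negative")
    budget = full_budget // (2 ** (age // half_life))
    return max(floor, budget)


def plan_memory(num_turns: int) -> list[int]:
    """Budgets for turns ordered from newest (age 0) to oldest."""
    return [vision_token_budget(age) for age in range(num_turns)]


# The newest turns stay sharp; older ones get fewer vision tokens.
print(plan_memory(10))
```

A lower budget would correspond to rendering that chunk at a lower resolution, which is where the analogy with human forgetting comes from.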
Connection with Claude Sonnet 4.5
And interestingly, this connects with what Anthropic is doing with Claude Sonnet 4.5:
They’ve launched context editing and the memory tool, two capabilities that let AI agents manage their context dynamically.
- Context editing: Automatically clears stale content from the context (like old tool calls and results) as the conversation approaches its token limit.
- Memory tool: Allows storing information outside the context, in persistent files, thus creating long-term memory controlled by the developer.
Common goal
Both ideas, DeepSeek’s optical compression and Claude’s dynamic context management, point toward the same goal:
Models capable of reasoning for longer, without losing information or saturating their context limits.
The future of memory in AI
If we can combine these approaches, we could be witnessing the beginning of AI systems with self-organized visual memory, where an agent’s complete history is condensed, remembered, and forgotten intelligently.
References
