
Commit 8ad8bbc

Merge pull request #16 from ngocbh/master
add mop and hbct
2 parents 5efe4ad + 7023676 commit 8ad8bbc

File tree

13 files changed: +124 -0 lines changed


app/projects/hbct/assets/bct.png

225 KB

app/projects/hbct/assets/hbct.png

2.9 MB

app/projects/hbct/assets/tab1.png

684 KB

app/projects/hbct/assets/vis.png

2.89 MB

app/projects/hbct/page.mdx

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
import { Authors, Badges } from '@/components/utils'

# Learning Along the Arrow of Time: Hyperbolic Geometry for Backward-Compatible Representation Learning
<Authors
  authors="Ngoc Bui, Yale University; Menglin Yang, Yale University; Runjin Chen, The University of Texas at Austin; Leonardo Neves, Snap Inc.; Mingxuan Ju, Snap Inc.; Rex Ying, Yale University; Neil Shah, Snap Inc.; Tong Zhao, Snap Inc."
/>

<Badges
  venue="ICML 2025"
  github="https://github.com/snap-research/hyperbolic_bct"
  arxiv="https://arxiv.org/abs/2506.05826"
  pdf="https://arxiv.org/pdf/2506.05826"
/>
## Introduction

Modern applications like search, recommendation, and retrieval-augmented generation rely heavily on pretrained embedding models. These models map raw data into vector representations, which are often stored in large vector databases for fast retrieval. However, when a model is updated, its embeddings usually change—breaking compatibility with existing systems unless everything is reprocessed. This is costly and risky, especially when reprocessing involves sensitive data.
**Backward-compatible representation learning** addresses this by aligning the representations of the updated model with those of its predecessor. However, existing compatibility methods align old and new representations in Euclidean space, forcing the new model to match the old model's outdated representations even when they are suboptimal, which hinders the new model's learning.

![A typical setting of the backward-compatible training problem. A large gallery set is embedded and indexed into a vector database using the old model. Updating the model may require re-indexing (backfilling) the entire vector database.|scale=0.4](./assets/bct.png)
## Method

We propose a new perspective: treat time as a natural axis of model evolution, and embed representations in hyperbolic space, which naturally handles uncertainty and structure growth.

We introduce Hyperbolic Backward-Compatible Training (HBCT). The idea is simple:
* Lift old and new embeddings into hyperbolic space.
* Keep new embeddings inside the entailment cone of the old ones (preserving compatibility).
* Use a contrastive alignment loss that adjusts based on how uncertain the old embedding is.

![Simulating the model evolution in hyperbolic space with the entailment cone.|scale=0.3](./assets/hbct.png)
This allows the new model to evolve freely while staying aligned with the old representations, with the strength of alignment adapting to their quality.
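To make these ingredients concrete, below is a minimal PyTorch sketch, assuming the Poincaré ball model with curvature -1, entailment-cone half-aperture and exterior-angle expressions in the style of Ganea et al. (2018), and a norm-based proxy for the uncertainty of the old embedding. The constant `CONE_K`, the loss weighting, and all function names are illustrative assumptions rather than the exact HBCT formulation.

```python
import torch

EPS = 1e-6
CONE_K = 0.1  # half-aperture constant of the entailment cone (assumed value)


def expmap0(v):
    # Exponential map at the origin of the Poincare ball (curvature -1):
    # lifts a Euclidean embedding onto the ball. In practice embeddings are
    # usually rescaled first so they land well inside the ball.
    norm = v.norm(dim=-1, keepdim=True).clamp_min(EPS)
    return torch.tanh(norm) * v / norm


def poincare_dist(x, y):
    # Geodesic distance between points of the Poincare ball.
    sq = (x - y).pow(2).sum(-1)
    den = ((1 - x.pow(2).sum(-1)) * (1 - y.pow(2).sum(-1))).clamp_min(EPS)
    return torch.acosh(1 + 2 * sq / den)


def cone_violation(root, child):
    # Penalty when `child` falls outside the entailment cone rooted at `root`.
    nr = root.norm(dim=-1).clamp_min(EPS)
    nc = child.norm(dim=-1)
    dot = (root * child).sum(-1)
    diff = (root - child).norm(dim=-1).clamp_min(EPS)
    aperture = torch.asin((CONE_K * (1 - nr ** 2) / nr).clamp(-1 + EPS, 1 - EPS))
    cos_angle = (dot * (1 + nr ** 2) - nr ** 2 * (1 + nc ** 2)) / (
        nr * diff * (1 + nr ** 2 * nc ** 2 - 2 * dot).clamp_min(EPS).sqrt()
    )
    angle = torch.acos(cos_angle.clamp(-1 + EPS, 1 - EPS))
    return torch.relu(angle - aperture)


def hbct_losses(z_new, z_old, temperature=0.1):
    # z_new, z_old: Euclidean embeddings of the same batch from the new
    # model and the frozen old model.
    h_new, h_old = expmap0(z_new), expmap0(z_old)
    # Assumed uncertainty proxy: old embeddings near the origin are treated
    # as uncertain and contribute less to the alignment term.
    confidence = h_old.norm(dim=-1)
    # Contrastive alignment with negative hyperbolic distance as similarity.
    sim = -poincare_dist(h_new.unsqueeze(1), h_old.unsqueeze(0)) / temperature
    targets = torch.arange(z_new.size(0))
    per_sample = torch.nn.functional.cross_entropy(sim, targets, reduction="none")
    align = (confidence * per_sample).mean()
    # Keep each new embedding inside the cone rooted at its old counterpart.
    cone = cone_violation(h_old, h_new).mean()
    return align, cone


# Toy usage with small-norm features so the lifted points stay inside the ball.
z_old = 0.1 * torch.randn(32, 128)                         # old model, frozen
z_new = (0.1 * torch.randn(32, 128)).requires_grad_(True)  # new model
align, cone = hbct_losses(z_new, z_old)
(align + cone).backward()
```

In practice, the alignment and cone terms would be added to the new model's task loss with tunable weights.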
## Experiments

We conduct experiments on two datasets, CIFAR100 and TinyImageNet, under the following scenarios:
- **Extended class**: The old model is trained on 50% of the classes in the training dataset, while the new model is trained on the full training dataset. The architecture is kept the same (ResNet18).
- **New architecture**: Both the old and new models are trained on the full training data, and the architecture is updated from ResNet18 to ViT-B-16.
![Comparison of different backward-compatibility methods. The columns **self** and **cross** refer to the original new-to-new and new-to-old retrieval performance (CMC@1 or mAP), respectively. Gray rows are base models without compatibility, and pink rows indicate the proposed method. *Bold* indicates the best performance and underline the second-best method.|scale=0.5](./assets/tab1.png)
As shown in the table above, HBCT consistently outperforms Euclidean baselines, achieving significantly higher compatibility scores across most settings. On average, it boosts CMC@1 by 21.4% and mAP by 44.8% over the strongest Euclidean competitor. Importantly, these gains in compatibility come with minimal impact on the performance of the updated model.
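For reference, here is a rough sketch of the self/cross retrieval protocol behind these numbers, assuming cosine-similarity nearest-neighbour search and CMC@1; all tensor names and sizes below are placeholders rather than the paper's evaluation code.

```python
import torch

def cmc_at_1(query_emb, gallery_emb, query_labels, gallery_labels):
    # CMC@1: fraction of queries whose nearest gallery item shares their label.
    q = torch.nn.functional.normalize(query_emb, dim=-1)
    g = torch.nn.functional.normalize(gallery_emb, dim=-1)
    top1 = (q @ g.T).argmax(dim=1)
    return (gallery_labels[top1] == query_labels).float().mean().item()

# Toy stand-ins for embeddings produced by the two models on the same data.
new_query, new_gallery = torch.randn(100, 128), torch.randn(1000, 128)
old_gallery = torch.randn(1000, 128)
q_labels = torch.randint(0, 10, (100,))
g_labels = torch.randint(0, 10, (1000,))

# "self": query and gallery both embedded by the new model.
self_score = cmc_at_1(new_query, new_gallery, q_labels, g_labels)
# "cross": new-model queries retrieved against the old, non-backfilled gallery.
cross_score = cmc_at_1(new_query, old_gallery, q_labels, g_labels)
```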
![Visualization of old and new gallery embeddings in CIFAR100. We compress 128-dimensional embeddings into a 2-dimensional hyperboloid using UMAP and visualize them in the tangent space. The top histogram shows the distribution of the uncertainty estimates for these embeddings.|scale=0.4](./assets/vis.png)
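The projection described in the caption can be approximated as below, assuming the umap-learn package and its hyperboloid output metric; the parameters and the tangent-space log map are illustrative choices, not necessarily the exact settings used for the figure.

```python
import numpy as np
import umap  # umap-learn

emb = np.random.randn(5000, 128).astype(np.float32)  # stand-in for 128-d gallery embeddings

# Embed onto a 2-D hyperboloid; UMAP returns the two space-like coordinates.
xy = umap.UMAP(n_components=2, output_metric="hyperboloid").fit_transform(emb)

# Recover the time-like coordinate, then map points to the tangent plane at
# the origin (log map) for plotting.
z = np.sqrt(1.0 + np.sum(xy ** 2, axis=1))
tangent = xy * (np.arccosh(z) / np.sqrt(np.maximum(z ** 2 - 1.0, 1e-12)))[:, None]
```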

app/projects/mop/assets/mop.png

747 KB

app/projects/mop/assets/prompt.png

212 KB
