import { Authors, Badges } from '@/components/utils'

# HEART: Learning Better Representation of EHR Data with a Heterogeneous Relation-Aware Transformer

<Authors
  authors="Tinglin Huang, Yale University; Syed Asad Rizvi, Yale University; Rohan Krishna Thakur, Yale University; Vimig Socrates, Yale University; Meili Gupta, Yale University; David van Dijk, Yale University; R. Andrew Taylor, Yale University; Rex Ying, Yale University"
/>

<Badges
  venue="Journal of Biomedical Informatics 159 (2024): 104741"
  github="https://github.com/Graph-and-Geometric-Learning/HEART"
  paper="https://www.sciencedirect.com/science/article/abs/pii/S153204642400159X"
/>

## Introduction
Electronic health records (EHRs) are tabular data that digitize the medical information of an encounter, such as demographics, diagnoses, medications, lab results, and procedures, as shown in Figure 1:



Much research has focused on distilling meaningful clinical information from cohorts with **foundation models**. Specifically, such models treat medical entities in EHRs as tokens and organize the entities included in each encounter as a sentence. These “sentences” can then be encoded by a transformer, allowing the entities to be represented in an embedding space, as shown in Figure 2(a):



However, we argue that the heterogeneous correlations between medical entities are critical for representation learning but have largely been overlooked. For example, understanding the relationship between "Antibiotics" (medication) and both "Fever" (diagnosis) and "Antibody Tests: Positive" (lab test) enables the model to recommend more clinically plausible drugs.

Motivated by this, we propose **HEART**, a Heterogeneous Relation-Aware Transformer for EHR data, which explicitly parameterizes pairwise representations between entities heterogeneously. Additionally, we introduce a multi-level attention mechanism to mitigate the computational cost associated with multiple visits, as demonstrated in Figure 2(b). Finally, two dedicated objectives are applied to enhance the model during pretraining.

## Method

### Heterogeneous Relation Embedding & Multi-level Attention Scheme

Given a patient, we flatten the corresponding historical visits into several sequences of entities:
$$
[[D_1, V_{1,1},\cdots,V_{1,N_1}],\cdots,[D_S, V_{S,1},\cdots,V_{S,N_S}]]
$$
where $S$ is the number of visits, $N_i$ is the number of entities in the $i$-th visit, and $D_i$ represents the demographic token for the patient in the $i$-th visit. A learnable embedding is assigned to each entity. Besides the entity embeddings, we explicitly encode a pairwise representation for each entity pair. Specifically, for an entity pair $(V_n, V_m)$ in the same visit, we calculate the pairwise embedding $\textbf{R}_{n\leftarrow m}$ as follows:
$$
\textbf{R}_{n}=\text{Linear}_{\tau(V_n)}(\textbf{V}_{n}),\quad \textbf{R}_{m}=\text{Linear}_{\tau(V_m)}(\textbf{V}_{m});\\
\textbf{R}_{n\leftarrow m}=\text{Linear}(\textbf{R}_{n}||\textbf{R}_{m})
$$
where $\text{Linear}_{\tau(\cdot)}$ denotes a type-specific linear transformation. This encoding operates on every pair of entities within the same visit.
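
As a concrete illustration, here is a minimal PyTorch sketch of this pairwise encoding. It is a sketch under assumptions, not the official HEART implementation; the class and argument names (`RelationEmbedding`, `entity_types`) are ours.

```python
import torch
import torch.nn as nn

class RelationEmbedding(nn.Module):
    """Type-specific projections followed by pairwise fusion (a sketch)."""

    def __init__(self, dim: int, entity_types: list[str]):
        super().__init__()
        # One Linear_{tau(.)} per entity type (diagnosis, medication, lab, ...).
        self.type_proj = nn.ModuleDict({t: nn.Linear(dim, dim) for t in entity_types})
        # Fuses the concatenation [R_n || R_m] into R_{n <- m}.
        self.pair_proj = nn.Linear(2 * dim, dim)

    def forward(self, emb: torch.Tensor, types: list[str]) -> torch.Tensor:
        # emb: (N, dim) embeddings of the N entities in one visit.
        proj = torch.stack([self.type_proj[t](e) for e, t in zip(emb, types)])
        n = proj.size(0)
        # Build all ordered pairs within the visit: (N, N, 2*dim) -> (N, N, dim).
        pair = torch.cat([proj.unsqueeze(1).expand(n, n, -1),
                          proj.unsqueeze(0).expand(n, n, -1)], dim=-1)
        return self.pair_proj(pair)
```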

Computational cost is one of the biggest challenges in encoding these heterogeneous representations. To alleviate this, we implement a hierarchical encoding scheme that combines encounter-level and entity-level attention, as shown in Figure 3:



Specifically, for the entity-level context, we perform attention among the entities within the same visit:
$$
[\mathbf{D}',\mathbf{V}_1',\cdots,\mathbf{V}_N']=\text{Entity-Attn}([\mathbf{D},\mathbf{V}_1,\cdots,\mathbf{V}_N])
$$
In addition, the heterogeneous relation embeddings are introduced as a bias term that refines both the attention map and the context used to update the entity embeddings. For the encounter-level context, we restrict attention to the demographic tokens across all historical encounters:
$$
[\mathbf{D}_1',\cdots,\mathbf{D}_S']=\text{Enc-Attn}([\mathbf{D}_1,\cdots,\mathbf{D}_S])
$$
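
The sketch below shows one plausible way to wire these two levels together, reusing the relation embeddings from the previous sketch as an attention bias. The scalar bias projection (`rel_bias`) is our assumption about how $\mathbf{R}$ enters the attention map, not the exact HEART formulation.

```python
import math
import torch
import torch.nn as nn

class MultiLevelAttention(nn.Module):
    """Entity-level attention with a relation bias, plus encounter-level attention (a sketch)."""

    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.rel_bias = nn.Linear(dim, 1)  # maps R_{n<-m} to a scalar attention bias
        self.enc_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def entity_level(self, x: torch.Tensor, rel: torch.Tensor) -> torch.Tensor:
        # x: (T, dim) tokens [D, V_1, ..., V_N] of one visit;
        # rel: (T, T, dim) pairwise relation embeddings (zeros where undefined).
        scores = self.q(x) @ self.k(x).T / math.sqrt(x.size(-1))
        scores = scores + self.rel_bias(rel).squeeze(-1)  # heterogeneous bias term
        return torch.softmax(scores, dim=-1) @ self.v(x)

    def encounter_level(self, demos: torch.Tensor) -> torch.Tensor:
        # demos: (1, S, dim) demographic tokens of the S historical visits.
        out, _ = self.enc_attn(demos, demos, demos)
        return out
```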


### Pretraining Objectives

Most previous approaches adopt masked token prediction (MTP) for pretraining, which replaces actual tokens with [MASK] and performs single-label classification at each masked position. However, MTP is position-dependent and thus not suitable for EHR data due to the unordered nature of medical entities. In light of this, we adapt MTP to a missing entity prediction (MEP) task, which is position-agnostic and heterogeneity-aware. The main idea is to let the model perform multi-label classification based on one [MASK] for each entity type, as shown in Figure 4.



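Below is a minimal sketch of what an MEP head could look like, assuming one decoder per entity type and a binary cross-entropy multi-label loss; the names (`MEPHead`, `vocab_sizes`) are ours, not from the paper.

```python
import torch
import torch.nn as nn

class MEPHead(nn.Module):
    """One multi-label classifier per entity type for missing entity prediction (a sketch)."""

    def __init__(self, dim: int, vocab_sizes: dict[str, int]):
        super().__init__()
        # One decoder per entity type (diagnosis, medication, lab, ...).
        self.heads = nn.ModuleDict({t: nn.Linear(dim, v) for t, v in vocab_sizes.items()})
        self.loss_fn = nn.BCEWithLogitsLoss()

    def forward(self, mask_emb: dict[str, torch.Tensor],
                targets: dict[str, torch.Tensor]) -> torch.Tensor:
        # mask_emb[t]: (dim,) encoding of the type-t [MASK] token;
        # targets[t]: (vocab_t,) multi-hot vector of that type's masked entities.
        return sum(self.loss_fn(self.heads[t](mask_emb[t]), targets[t])
                   for t in mask_emb)
```
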
Besides, we also incorporate anomaly detection as an additional pretraining task, which encourages the model to identify unrelated entities given a context and to learn more robust representations. Specifically, we replace some of the entities with random entities of the same type to synthesize anomalous data, and a binary classifier is applied to predict whether each entity is an anomaly.
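
A minimal sketch of this corruption step follows; the replacement probability `p` and uniform sampling are illustrative choices, not the paper's exact recipe.

```python
import random
import torch
import torch.nn as nn

def corrupt_visit(entities: list[str], types: list[str],
                  vocab_by_type: dict[str, list[str]], p: float = 0.15):
    """Replace each entity with probability p by a random same-type entity."""
    corrupted, labels = [], []
    for ent, t in zip(entities, types):
        if random.random() < p:
            corrupted.append(random.choice(vocab_by_type[t]))
            labels.append(1.0)  # synthetic anomaly
        else:
            corrupted.append(ent)
            labels.append(0.0)  # original entity
    return corrupted, torch.tensor(labels)

# Binary classifier applied to each token's contextual embedding (dim assumed 128):
anomaly_head = nn.Linear(128, 1)
# loss = nn.BCEWithLogitsLoss()(anomaly_head(token_embs).squeeze(-1), labels)
```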


## Downstream Tasks

We evaluate HEART across 5 downstream tasks on 2 EHR datasets:
* Datasets: [MIMIC-III](https://mimic.mit.edu/docs/iii/) and [eICU](https://eicu-crd.mit.edu/about/eicu/).
* Downstream tasks: death prediction, prolonged length of stay (PLOS) prediction, readmission prediction, and next diagnosis prediction within 6/12 months.