270 repositories

rethinking-hybrid-attention
Public
Rethinking the Role of Efficient Attention in Hybrid Architectures
Python
•
MIT License
•0•1•0•0•Updated Jun 17, 2026
NOSA
Public
The official implementation of NOSA
Python
•
MIT License
•0•17•0•0•Updated Jun 11, 2026
OPD
Public
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe
mechanism llms on-policy-distillation
Python
•43•687•2•0•Updated May 30, 2026
DECO
Public
Source code for paper "DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices".
Python
•
Other
•0•2•0•0•Updated May 12, 2026
ProactiveAgent
Public
An LLM-based agent that proactively predicts its tasks.
Python
•
Apache License 2.0
•62•612•6•0•Updated May 12, 2026
LexRel
Public
Python
•0•1•0•0•Updated May 7, 2026
CPMobius
Public
Python
•
Apache License 2.0
•0•1•1•0•Updated Apr 29, 2026
JustRL
Public
[ICLR 2026 Blogpost Track Poster] JustRL: Scaling a 1.5B LLM with a Simple RL Recipe
Python
•13•279•1•0•Updated Apr 18, 2026
hybrid-linear-attention
Public
Code and models for the paper: Hybrid Linear Attention Done Right: Efficient Distillation and Effective Architectures for Extremely Long Contexts
Python
•2•36•1•0•Updated Apr 9, 2026
APB
Public
Official Implementation of APB (ACL 2025 main Oral) and Spava (ACL 2026 main).
C++
•5•37•0•0•Updated Apr 6, 2026
KARL
Public
KARL: Knowledge-Aware Reasoning and Reinforcement Learning for Knowledge-Intensive Visual Grounding
Python
•
MIT License
•1•68•1•0•Updated Apr 5, 2026
LexChain
Public
Python
•0•4•1•0•Updated Mar 25, 2026
SE-Bench
Public
Official repo for "SE-Bench: Benchmarking Self-Evolution with Knowledge Internalization"
Python
•
MIT License
•4•28•5•0•Updated Mar 24, 2026
LLMxMapReduce
Public
Python
•
Apache License 2.0
•62•874•0•0•Updated Mar 5, 2026
ACDiT
Public
ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer
Python
•
MIT License
•1•42•2•0•Updated Jan 29, 2026
KG-Infused-RAG
Public
Official implementation for the paper "KG-Infused RAG: Augmenting Corpus-Based RAG with External Knowledge Graphs"
Python
•1•23•0•0•Updated Jan 18, 2026
H-Neurons
Public
The official implementation of the paper: H-Neurons: On the Existence, Impact, and Origin of Hallucination-Associated Neurons in LLMs
Python
•
MIT License
•13•66•1•0•Updated Jan 14, 2026
BlockFFN
Public
Source code for the paper "BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity".
Python
•5•19•0•0•Updated Jan 10, 2026
LLaVA-UHD
Public
LLaVA-UHD v3: Progressive Visual Compression for Efficient Native-Resolution Encoding in MLLMs
Python
•
Apache License 2.0
•20•424•7•0•Updated Dec 20, 2025
ChartCoder
Public
[ACL'25 Main] ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation
Python
•5•79•3•0•Updated Dec 8, 2025
StateX
Public
The official implementation of the paper "StateX: Enhancing RNN Recall via Post-training State Expansion".
machine-learning memory rnn recall ssm mamba linear-attention llm long-context
Python
•0•3•0•0•Updated Oct 24, 2025
AgentRM
Public
[ACL 2025 main] AgentRM: Enhancing Agent Generalization with Reward Modeling
Python
•0•6•1•0•Updated Sep 29, 2025
stuffed-mamba
Public
The code of the paper Stuffed Mamba: Oversized States Lead to the Inability to Forget
machine-learning rnn mamba long-context
Python
•0•1•0•0•Updated Sep 28, 2025
BurstEngine
Public
BurstEngine is an efficient framework designed to train LLMs on long-sequence data.
Python
•3•9•0•0•Updated Sep 25, 2025
cost-optimal-gqa
Public
The code for the paper "Cost-Optimal Grouped-Query Attention for Long-Context Modeling"
natural-language-processing transformer attention long-context llms
Python
•1•4•1•0•Updated Sep 14, 2025
SIR-Bench
Public
Python
•
Apache License 2.0
•0•5•1•0•Updated Sep 12, 2025
Seq1F1B
Public
Sequence-level 1F1B schedule for LLMs.
Python
•
Other
•4.1k•37•1•0•Updated Aug 26, 2025
FR-Spec
Public
[ACL 2025 main] FR-Spec: Frequency-Ranked Speculative Sampling
C++
•3•55•3•0•Updated Jul 15, 2025
TritonBench
Public
TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators
Python
•
Apache License 2.0
•14•134•4•1•Updated Jun 14, 2025
ClueAnchor
Public
[EMNLP 2025 Findings] ClueAnchor: Clue-Anchored Knowledge Reasoning Exploration and Optimization for Retrieval-Augmented Generation
rag llm knowledge-augmentation
Python
•0•12•1•0•Updated Jun 11, 2025

THUNLP