I am currently an engineer at OpenDataLab, working on data for Large Vision-Language Models (LVLMs), with a focus on document understanding and parsing.
Recent Projects
Here is my Google Scholar
Reach out to me: [email protected]
Repository for the ACM MM 2022 paper "Exploring Effective Knowledge Transfer for Few-shot Object Detection"
DocLayout-YOLO: An efficient and robust document layout analysis method
Python 2
Forked from Vision-CAIR/MiniGPT-4
MiniGPT-4: Enhancing Vision-language Understanding with Advanced Large Language Models
Python 1
Forked from haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Python 1