Project 10- Fine-tuning Vision Language Models (VLMs) for Object Detection and Hierarchical Classification using the OpenVINO Ecosystem #29877
PraroopChanda
started this conversation in
Google Summer of Code
I have sent my initial proposal draft to your email address. Thank you!
Hi @rajeshgangireddy @adrianboguszewski @mlukasze
Hope you're doing well!
I am Praroop, currently a master's student at Texas A&M, focused on computer vision, multimodal learning, and generative AI.
I am very interested in Project 10 (Fine-tuning Vision Language Models (VLMs) for Object Detection and Hierarchical Classification using the OpenVINO Ecosystem).
I did some preliminary research and settled on Grounding DINO. I set up the codebase and ran a small fine-tune on the KITTI dataset, training only the decoder layers.
I used an NVIDIA A100 GPU. With a small batch size of 6, training used about 9-10 GB of VRAM.
You can find the GitHub repo with the setup and initial detection results here: https://github.com/PraroopChanda/GroundDINO_FineTune
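For readers unfamiliar with decoder-only fine-tuning, here is a minimal sketch of the general idea, not the code from the repo above. It assumes the model exposes a PyTorch-style `named_parameters()` and that decoder weights can be identified by the substring `"decoder"` in their names; the helper `freeze_except`, the stand-in classes, and the parameter names are all hypothetical and exist only so the snippet runs without PyTorch installed.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass
class FakeParam:
    """Stand-in for torch.nn.Parameter with only the fields used below."""
    size: int
    requires_grad: bool = True

    def numel(self) -> int:
        return self.size


class FakeModel:
    """Stand-in detector; the parameter names are hypothetical."""

    def __init__(self) -> None:
        self._params = {
            "backbone.layer0.weight": FakeParam(1000),
            "transformer.encoder.layers.0.weight": FakeParam(400),
            "transformer.decoder.layers.0.weight": FakeParam(300),
            "transformer.decoder.layers.0.bias": FakeParam(10),
        }

    def named_parameters(self) -> Iterator[Tuple[str, FakeParam]]:
        return iter(self._params.items())


def freeze_except(model, keywords: Iterable[str]) -> Tuple[int, int]:
    """Leave trainable only the parameters whose name contains a keyword.

    Returns (trainable_count, total_count) in number of elements.
    """
    keywords = list(keywords)
    trainable = total = 0
    for name, param in model.named_parameters():
        # Frozen parameters receive no gradient updates during training.
        param.requires_grad = any(k in name for k in keywords)
        total += param.numel()
        if param.requires_grad:
            trainable += param.numel()
    return trainable, total


model = FakeModel()
trainable, total = freeze_except(model, ["decoder"])
print(f"trainable elements: {trainable} / {total}")
```

With a real `torch.nn.Module`, the same loop should apply, and the optimizer would then be built only from parameters that still have `requires_grad` set. Freezing the backbone and encoder is one common way to keep optimizer state and gradient memory small, which is consistent with the modest VRAM use reported above.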
Further, I am planning to:
I would really love to hear your thoughts on this.
Attaching a couple of preliminary visual results below:
Best,
Praroop Chanda
https://praroopchanda.github.io/