
【Hackathon 8th No.36】Add gme-Qwen2-VL for PaddleMix #1103


Merged: 5 commits, Mar 27, 2025


ZhijunLStudio (Contributor) commented Mar 10, 2025

Add gme-Qwen2-VL for PaddleMix

[New Feature] Implement GmeQwen2VL Model and Multimodal Inference Pipeline

Changes Proposed

  1. Model Architecture & Inference Pipeline Implementation

    • Added the text, image, and fused embedding computation modules of the GmeQwen2VL model to paddlemix/models.
    • Integrated a multimodal retrieval task example in paddlemix/examples, including text-to-image similarity calculation and information retrieval workflows.
  2. Functional Enhancements

    • Supports custom instructions for text embeddings (e.g., "Find an image that matches the given text.").
    • Provides fused embedding interfaces (get_fused_embeddings()) and encoding functions for queries/databases (encode_queries() / encode_corpus()).
  3. Performance Validation

    • Internal testing shows performance comparable to the original repository, and the similarity calculations were validated for accuracy (see documentation for test results).
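
The text-to-image similarity step in the retrieval workflow can be illustrated with a minimal sketch. It assumes embeddings are plain lists of floats, as if already returned by encode_queries() / encode_corpus(); it does not call the GmeQwen2VL model. The helper names (`l2_normalize`, `cosine_scores`, `rank_corpus`) and the toy vectors are hypothetical and not part of this PR.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_scores(query_emb, corpus_embs):
    """Cosine similarity between one query embedding and each corpus embedding."""
    q = l2_normalize(query_emb)
    return [
        sum(a * b for a, b in zip(q, l2_normalize(doc)))
        for doc in corpus_embs
    ]


def rank_corpus(query_emb, corpus_embs, top_k=3):
    """Return (corpus_index, score) pairs for the top_k most similar entries."""
    scores = cosine_scores(query_emb, corpus_embs)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order[:top_k]]


# Toy stand-ins for real embeddings (assumption: real ones come from the model).
query = [1.0, 0.0, 0.0]
corpus = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [2.0, 0.0, 0.0],  # same direction as the query
    [1.0, 1.0, 0.0],  # 45 degrees from the query
]
results = rank_corpus(query, corpus, top_k=2)
```

Here `results` lists the index of the corpus entry pointing in the same direction as the query first, followed by the 45-degree entry. Normalizing before the dot product makes the score independent of embedding magnitude, which is why the `[2.0, 0.0, 0.0]` entry still scores as a perfect match.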


paddle-bot bot commented Mar 10, 2025

Thanks for your contribution!

lyuwenyu previously approved these changes Mar 10, 2025
lyuwenyu merged commit 486b496 into PaddlePaddle:develop on Mar 27, 2025
3 checks passed
Labels
contributor, PaddlePaddle Hackathon (PaddlePaddle Hackathon event issues and PRs)

4 participants