Strange generation speed #391

@ghost

Description

When I run a 30B model in 4-bit with --chat or --cai-chat, generation is 4-6 times slower than without those flags. You can see it in the screenshots:
first
second
Is this a bug or a feature?
My system specs: 12700K, 32 GB RAM (I also tried 64 GB, with the same result), 3090.
OS: Manjaro Linux, CUDA 11.7.
To sum up: with --chat or --cai-chat it generates at 3 it/s even with an empty context. Without them it runs at a stable 20 it/s.
