Line 42 in 37baab7:
`self.c_proj = nn.Linear(config.n_embd, config.n_embd)`
Why do we need an additional linear transformation after multi-head attention and before the MLP when the input and output dimensions are the same?
(I understand that this is how the original Transformer implementation was written, but I had taken this operation to be for dimension consistency between stacked attention layers. It seems superfluous here, since the first linear layer in the MLP can already take linear combinations of the attention outputs.)
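For context, here is a minimal sketch of where `c_proj` sits inside a multi-head self-attention block. This is not the repository's exact code; the `config.n_embd`/`config.n_head` fields and the use of `F.scaled_dot_product_attention` are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CausalSelfAttention(nn.Module):
    """Minimal multi-head self-attention sketch (illustrative, not the repo's code)."""

    def __init__(self, config):
        super().__init__()
        assert config.n_embd % config.n_head == 0
        self.n_head = config.n_head
        self.n_embd = config.n_embd
        # one projection produces q, k, v for all heads at once
        self.c_attn = nn.Linear(config.n_embd, 3 * config.n_embd)
        # the projection in question: applied after the heads are concatenated
        # back to n_embd, before the residual add and the MLP
        self.c_proj = nn.Linear(config.n_embd, config.n_embd)

    def forward(self, x):
        B, T, C = x.size()  # batch, sequence length, embedding dim
        q, k, v = self.c_attn(x).split(self.n_embd, dim=2)
        # reshape to (B, n_head, T, head_dim)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        # concatenate heads: up to this point each channel of y depends only on
        # a single head's value slice; c_proj is where those slices get mixed
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.c_proj(y)


# Quick shape check with a hypothetical config
if __name__ == "__main__":
    from types import SimpleNamespace
    cfg = SimpleNamespace(n_embd=768, n_head=12)
    attn = CausalSelfAttention(cfg)
    print(attn(torch.randn(2, 16, cfg.n_embd)).shape)  # torch.Size([2, 16, 768])
```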