Wrong number of trainable parameters printed with `strategy="deepspeed_stage_3"` #12130

## 🐛 Bug

At the beginning of training, a table with the number of trainable parameters is printed.
The count is wrong when `strategy="deepspeed_stage_3"` is used, even with only 1 GPU. The correct answer is 66 (32 × 2 weights + 2 biases for `torch.nn.Linear(32, 2)`):

**deepspeed_stage_3:**
<img width="225" alt="ds3" src="https://user-images.githubusercontent.com/12113751/155897470-6927fb41-d3c8-41d9-abca-1667ac115e34.png">

**ddp/deepspeed_stage_1/deepspeed_stage_2:**
<img width="225" alt="ddp" src="https://user-images.githubusercontent.com/12113751/155897427-d88276d0-5ed8-44cb-ae85-9c2fc506f6c5.png">

I'm guessing this has something to do with the parameter sharding used in stage 3?

I'm curious to know which 2 parameters are counted, since there are no other parameters in the model.
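One plausible explanation, assuming ZeRO-3 behaves as sketched here: each partitioned parameter's local `.data` is replaced by a one-element placeholder, while its full element count is kept in a `ds_numel` attribute. Under that assumption, `numel()` returns 1 for each of the two parameter tensors of `torch.nn.Linear(32, 2)` (weight and bias), which would give exactly 2. The snippet does not use DeepSpeed at all: `simulate_zero3_partition` is a hypothetical stand-in for the sharding, and the `ds_numel` attribute name is an assumption that has not been checked against the DeepSpeed 0.5.10 source.

```python
import torch


def count_trainable(model: torch.nn.Module, use_ds_numel: bool = True) -> int:
    """Count trainable parameter elements, preferring `ds_numel` when present."""
    total = 0
    for p in model.parameters():
        if not p.requires_grad:
            continue
        if use_ds_numel:
            # Assumption: a sharded parameter stores its full size in `ds_numel`,
            # while `p.numel()` only reflects the local placeholder tensor.
            total += getattr(p, "ds_numel", p.numel())
        else:
            # Naive count: looks only at the tensor each parameter currently holds.
            total += p.numel()
    return total


def simulate_zero3_partition(model: torch.nn.Module) -> None:
    """Hypothetical stand-in for ZeRO-3 sharding (not DeepSpeed's real code):
    record the full size, then swap the local data for a 1-element placeholder."""
    for p in model.parameters():
        p.ds_numel = p.numel()
        p.data = torch.ones(1, dtype=p.dtype, device=p.device)


model = torch.nn.Linear(32, 2)
print(count_trainable(model))                      # 66 before "partitioning"
simulate_zero3_partition(model)
print(count_trainable(model, use_ds_numel=False))  # naive count: 2
print(count_trainable(model))                      # 66 again via ds_numel
```

If this assumption holds, a summary that falls back to `ds_numel` would report 66 under stage 3, while one that only calls `numel()` would report 2, matching the screenshot above.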

### To Reproduce
```python
import os
import torch
from torch.utils.data import DataLoader, Dataset
from pytorch_lightning import LightningModule, Trainer

class RandomDataset(Dataset):
    def __init__(self, size, num_samples):
        self.len = num_samples
        self.data = torch.randn(num_samples, size)
    def __getitem__(self, index):
        return self.data[index]
    def __len__(self):
        return self.len
    
class BoringModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)
    def forward(self, x):
        return self.layer(x)
    def training_step(self, batch, batch_idx):
        loss = self(batch).sum()
        return {"loss": loss}
    def validation_step(self, batch, batch_idx):
        loss = self(batch).sum()
    def configure_optimizers(self):
        return torch.optim.SGD(self.layer.parameters(), lr=0.1)
    
def run():
    train_data = DataLoader(RandomDataset(32, 64), batch_size=2)
    val_data = DataLoader(RandomDataset(32, 64), batch_size=2)
    model = BoringModel()
    params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f'TRAINABLE PARAMS: {params}')
    trainer = Trainer(
        default_root_dir=os.getcwd(),
        limit_train_batches=1,
        limit_val_batches=1,
        limit_test_batches=1,
        num_sanity_val_steps=0,
        max_epochs=1,
        enable_model_summary=True,
        strategy="deepspeed_stage_3",
        gpus=[0]
    )
    trainer.fit(model, train_dataloaders=train_data, val_dataloaders=val_data)

run()
```


### Environment

- pytorch_lightning==1.5.10
- torch==1.10.2
- python==3.8.12
- deepspeed==0.5.10
- OS: Linux
- CUDA/cuDNN: 10.2
