Merged
Changes from 2 commits
8 changes: 6 additions & 2 deletions references/video_classification/train.py
@@ -116,8 +116,8 @@ def main(args):

     # Data loading code
     print("Loading data")
-    traindir = os.path.join(args.data_path, 'train_avi-480p')
-    valdir = os.path.join(args.data_path, 'val_avi-480p')
+    traindir = os.path.join(args.data_path, args.train_dir)
+    valdir = os.path.join(args.data_path, args.val_dir)
     normalize = T.Normalize(mean=[0.43216, 0.394666, 0.37645],
                             std=[0.22803, 0.22145, 0.216989])

@@ -203,6 +203,7 @@ def main(args):

     print("Creating model")
     model = torchvision.models.video.__dict__[args.model](pretrained=args.pretrained)
+    model.fc.out_features = args.output_classes
Member
This is not quite what I meant.

Changing `model.fc.out_features` doesn't change the number of output planes, because the model weights have already been generated.
What I meant was to do something like

model = torchvision.models.video.__dict__[args.model](pretrained=args.pretrained, num_classes=args.num_classes)

But then, using `pretrained=True` with a `num_classes` different from 400 will give an error, because the state dicts are not compatible.

I'd rather not expose this option in the training scripts, as the user can modify the model as they want to perform their type of fine-tuning.
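The distinction the reviewer is drawing can be illustrated with a minimal, self-contained sketch (a plain `nn.Linear` stand-in for the classification head, not the training script itself):

```python
import torch
import torch.nn as nn

# Stand-in for the model's head: r2plus1d_18 ends in a 512-feature
# fc layer with 400 outputs (Kinetics-400 classes).
fc_mutated = nn.Linear(512, 400)
fc_mutated.out_features = 10   # only mutates an attribute; the
                               # 400 x 512 weight tensor is untouched

fc_replaced = nn.Linear(512, 10)  # a new module actually has 10 outputs

x = torch.randn(1, 512)
print(fc_mutated(x).shape)   # torch.Size([1, 400]) -- still 400 outputs
print(fc_replaced(x).shape)  # torch.Size([1, 10])
```

The forward pass of `nn.Linear` uses the stored weight tensor directly, so the `out_features` attribute has no effect on the computation.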

Contributor Author

If fine-tuning on a custom dataset, one loads a pretrained model with that model's number of classes (400 here), and then changes the output classes to match the custom dataset.

Though the output planes are different, the biggest benefit of a pretrained model isn't in the `fc` part of the model, but in the convolutional filters of the earlier layers. The `fc` part gets retrained when fine-tuning this way.

> I'd rather not expose this option to the training scripts, as the user can modify the model as they want to perform their type of fine-tuning.

This change to `model.fc.out_features` is mandatory when fine-tuning on a custom dataset, so I thought it would make sense to have it in the args. But of course, that's your call :)

Member

What I mean is that this change doesn't actually change anything in the model: the weights are the same as before, and they are not resized.

Putting it differently, you'll only predict at most 400 classes in this way.

     model.to(device)
     if args.distributed and args.sync_bn:
         model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
@@ -274,7 +275,10 @@ def parse_args():
     parser = argparse.ArgumentParser(description='PyTorch Classification Training')

     parser.add_argument('--data-path', default='/datasets01_101/kinetics/070618/', help='dataset')
+    parser.add_argument('--train-dir', default='train_avi-480p', help='name of train dir')
+    parser.add_argument('--val-dir', default='val_avi-480p', help='name of val dir')
     parser.add_argument('--model', default='r2plus1d_18', help='model')
+    parser.add_argument('--output-classes', default=400, help='no. of output classes (if finetuning)')
     parser.add_argument('--device', default='cuda', help='device')
     parser.add_argument('--clip-len', default=16, type=int, metavar='N',
                         help='number of frames per clip')