
Support for large video datasets that contain a few corrupted files #1271

Closed
@ekosman


Hello,
I've been trying to load the Kinetics-400 dataset using the following code (the videos are in mp4 format):

from torchvision.datasets.folder import make_dataset
from torchvision.datasets.utils import list_dir
from torchvision.datasets.video_utils import VideoClips

frames_per_clip = 16
step_between_clips = 16
extensions = ('avi', 'mp4')
root = r'/kinetics2/kinetics2/train'

# One subdirectory per class; map each class name to an integer index.
classes = sorted(list_dir(root))
class_to_idx = {name: i for i, name in enumerate(classes)}

# Collect (path, class_index) pairs, then keep only the paths.
samples = make_dataset(root, class_to_idx, extensions, is_valid_file=None)
video_list = [path for path, _ in samples]

# Build the clip index over all videos.
video_clips = VideoClips(video_list, frames_per_clip, step_between_clips)

It prints "moov atom not found" and stops. I assume at least one file is corrupted, but I can't download it again. Is there a way to skip corrupted files like this?
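
One possible workaround, sketched here under assumptions rather than taken from torchvision itself, is to pre-filter the file list before building the clip index. The "moov atom not found" message usually means an MP4 file has no top-level `moov` box, so a standard-library scan of the top-level boxes can flag such files. The helper names `has_moov_atom` and `split_by_moov` are hypothetical. The check applies only to `.mp4` files and catches only this one failure mode; a file whose `moov` box is present but whose media data is truncated would still pass.

```python
import os
import struct


def has_moov_atom(path):
    """Return True if a top-level 'moov' box is found in the MP4 file at `path`.

    Heuristic only: walks the top-level box headers without decoding any video.
    """
    try:
        file_size = os.path.getsize(path)
        with open(path, "rb") as f:
            offset = 0
            while offset + 8 <= file_size:
                f.seek(offset)
                header = f.read(8)
                if len(header) < 8:
                    return False
                # Each box starts with a 32-bit big-endian size and a 4-byte type.
                size, box_type = struct.unpack(">I4s", header)
                if box_type == b"moov":
                    return True
                header_len = 8
                if size == 1:
                    # A size of 1 means a 64-bit size follows the type field.
                    ext = f.read(8)
                    if len(ext) < 8:
                        return False
                    size = struct.unpack(">Q", ext)[0]
                    header_len = 16
                elif size == 0:
                    # A size of 0 means this box runs to the end of the file.
                    return False
                if size < header_len:
                    # Malformed box; stop instead of looping forever.
                    return False
                offset += size
    except OSError:
        # Missing or unreadable files are treated as unusable.
        return False
    return False


def split_by_moov(video_paths):
    """Split paths into (kept, skipped); only .mp4 files are checked."""
    kept, skipped = [], []
    for path in video_paths:
        if path.lower().endswith(".mp4") and not has_moov_atom(path):
            skipped.append(path)
        else:
            kept.append(path)
    return kept, skipped
```

With this in place, the snippet above could call `video_list, skipped = split_by_moov(video_list)` just before constructing `VideoClips`, and log `skipped` to see which files were dropped. If labels are needed later, the same filter would have to be applied to `samples` so that paths and class indices stay aligned.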
