Added encoding and bits_per_sample to soundfile's backend save() #1274

Merged 12 commits on Feb 23, 2021

Conversation

prabhat00155
Addresses #1258.

mthrok left a comment

@prabhat00155 thanks for working on this. The PR looks good overall.

mthrok commented Feb 19, 2021

@prabhat00155
Can you merge/rebase on #1285? That should solve the issue with CI.

mthrok left a comment

Thanks for the update. I tested the code and it works fine for the wav format.

I have not checked flac/vorbis, but for the sph format there is an issue with subtype generation. Please refer to the comment.

The following is the script I used.

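import os

import torch
import torchaudio.backend._soundfile_backend as backend

# Every file below is written into tmp3/, so make sure the directory exists.
os.makedirs('tmp3', exist_ok=True)

# Two channels of 124 random float32 samples.
data = torch.randn(2, 124, dtype=torch.float32)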

encs = [
    (None, 8),
    ('PCM_U', None),
    ('PCM_U', 8),
    ('PCM_S', None),
    ('PCM_S', 16),
    ('PCM_S', 32),
    ('PCM_F', None),
    ('PCM_F', 32),
    ('PCM_F', 64),
    ('ULAW', None),
    ('ULAW', 8),
    ('ALAW', None),
    ('ALAW', 8),
]

for enc, bps in encs:
    path = f'tmp3/{enc}_{bps}.wav'
    print(path)
    backend.save(path, data, sample_rate=8000, encoding=enc, bits_per_sample=bps)

dtypes = [
    # torch.uint8,
    torch.int16,
    torch.int32,
    torch.float32,
]
for dtype in dtypes:
    path = f'tmp3/None_None_{dtype}.wav'
    print(path)
    backend.save(path, data.to(dtype), sample_rate=8000)

# bpss = [
#     # 8
#     # 16,
#     # 24,
#     # 32,
#     None,
# ]
# for bps in bpss:
#     path = f'tmp3/{bps}.flac'
#     backend.save(path, data, sample_rate=8000, encoding=None, bits_per_sample=bps)

encs = [
    ('PCM_S', None),
    ('PCM_S', 16),
    ('PCM_S', 32),
    ('ULAW', None),
    ('ULAW', 8),
    ('ALAW', None),
    ('ALAW', 8),
]

for enc, bps in encs:
    path = f'tmp3/{enc}_{bps}.sph'
    print(path)
    backend.save(path, data, sample_rate=8000, encoding=enc, bits_per_sample=bps)
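The "subtype generation" mentioned above is the step that turns an (encoding, bits_per_sample) pair into a libsndfile subtype name. A minimal sketch of such a mapping follows. It is not the code from this PR: the helper name `pick_subtype`, the lookup table, and the defaults for `bits_per_sample=None` are assumptions made for illustration (the 32-bit default for `PCM_S` mirrors what soxi reports for `PCM_S_None.sph`). Only the subtype strings themselves are libsndfile names.

```python
from typing import Optional

# Hypothetical lookup: (encoding, bits_per_sample) -> libsndfile subtype name.
_SUBTYPES = {
    ("PCM_U", 8): "PCM_U8",
    ("PCM_S", 8): "PCM_S8",
    ("PCM_S", 16): "PCM_16",
    ("PCM_S", 24): "PCM_24",
    ("PCM_S", 32): "PCM_32",
    ("PCM_F", 32): "FLOAT",
    ("PCM_F", 64): "DOUBLE",
    ("ULAW", 8): "ULAW",
    ("ALAW", 8): "ALAW",
}

# Assumed defaults used when bits_per_sample is None.
_DEFAULT_BITS = {"PCM_U": 8, "PCM_S": 32, "PCM_F": 32, "ULAW": 8, "ALAW": 8}


def pick_subtype(encoding: str, bits_per_sample: Optional[int] = None) -> str:
    """Return a subtype name, or raise ValueError for an unsupported pair."""
    if encoding not in _DEFAULT_BITS:
        raise ValueError(f"unknown encoding: {encoding!r}")
    bits = _DEFAULT_BITS[encoding] if bits_per_sample is None else bits_per_sample
    try:
        return _SUBTYPES[(encoding, bits)]
    except KeyError:
        raise ValueError(f"{encoding} does not support {bits} bits per sample")


if __name__ == "__main__":
    for enc, bps in [("PCM_S", None), ("PCM_S", 16), ("ULAW", None), ("ALAW", 8)]:
        print(enc, bps, "->", pick_subtype(enc, bps))
```

A real implementation would also need per-format rules (for example, which subtypes a given container accepts), which this sketch leaves out.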

mthrok commented Feb 19, 2021

Can you generate sph with ALAW encoding? I realized that the resulting files from the script above cannot be opened with soxi.

soxi tmp3/*.sph
soxi FAIL formats: can't open input file `tmp3/ALAW_8.sph': sph: unsupported coding `alaw'
soxi FAIL formats: can't open input file `tmp3/ALAW_None.sph': sph: unsupported coding `alaw'

Input File     : 'tmp3/PCM_S_16.sph'
Channels       : 2
Sample Rate    : 8000
Precision      : 16-bit
Duration       : 00:00:00.02 = 124 samples ~ 1.1625 CDDA sectors
File Size      : 1.52k
Bit Rate       : 785k
Sample Encoding: 16-bit Signed Integer PCM


Input File     : 'tmp3/PCM_S_32.sph'
Channels       : 2
Sample Rate    : 8000
Precision      : 32-bit
Duration       : 00:00:00.02 = 124 samples ~ 1.1625 CDDA sectors
File Size      : 2.02k
Bit Rate       : 1.04M
Sample Encoding: 32-bit Signed Integer PCM


Input File     : 'tmp3/PCM_S_None.sph'
Channels       : 2
Sample Rate    : 8000
Precision      : 32-bit
Duration       : 00:00:00.02 = 124 samples ~ 1.1625 CDDA sectors
File Size      : 2.02k
Bit Rate       : 1.04M
Sample Encoding: 32-bit Signed Integer PCM


Input File     : 'tmp3/ULAW_8.sph'
Channels       : 2
Sample Rate    : 8000
Precision      : 14-bit
Duration       : 00:00:00.02 = 124 samples ~ 1.1625 CDDA sectors
File Size      : 1.27k
Bit Rate       : 657k
Sample Encoding: 8-bit u-law


Input File     : 'tmp3/ULAW_None.sph'
Channels       : 2
Sample Rate    : 8000
Precision      : 14-bit
Duration       : 00:00:00.02 = 124 samples ~ 1.1625 CDDA sectors
File Size      : 1.27k
Bit Rate       : 657k
Sample Encoding: 8-bit u-law
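The sizes and bit rates in the soxi output can be sanity-checked by hand. The sketch below assumes a 1024-byte SPHERE header (an assumption, though it is consistent with the reported sizes) and that soxi derives the bit rate from the whole file size divided by the duration. Under those assumptions, 16-bit PCM gives 1024 + 2 × 124 × 2 = 1520 bytes, 32-bit PCM gives 2016 bytes, and 8-bit u-law gives 1272 bytes, matching the 1.52k, 2.02k and 1.27k figures above.

```python
HEADER_BYTES = 1024   # assumption: SPHERE header size, not stated in the thread
CHANNELS = 2          # from the soxi output
FRAMES = 124          # samples per channel, from the soxi output
SAMPLE_RATE = 8000    # Hz, from the soxi output
CDDA_SECTORS_PER_SECOND = 75


def expected_stats(bytes_per_sample: int):
    """Return (file size in bytes, bit rate in bit/s) for one sample width."""
    size = HEADER_BYTES + CHANNELS * FRAMES * bytes_per_sample
    duration = FRAMES / SAMPLE_RATE
    bit_rate = size * 8 / duration  # whole file, header included
    return size, bit_rate


if __name__ == "__main__":
    for label, width in [("PCM_S 16", 2), ("PCM_S 32", 4), ("ULAW 8", 1)]:
        size, bit_rate = expected_stats(width)
        print(f"{label}: {size} bytes, {bit_rate / 1000:.0f} kbit/s")
    sectors = FRAMES / SAMPLE_RATE * CDDA_SECTORS_PER_SECOND
    print(f"CDDA sectors: {sectors:.4f}")
```

The same duration (124 / 8000 s) times 75 sectors per second gives the 1.1625 CDDA sectors shown in every entry.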

prabhat00155 commented

Can you generate sph with ALAW encoding? I realized that the resulting files from the script above cannot be opened with soxi.

SoX doesn't seem to support ALAW for sphere; see https://sourceforge.net/p/sox/code/ci/master/tree/src/sphere.c#l79.

mthrok commented Feb 22, 2021

SoX doesn't seem to support ALAW for sphere, check this: https://sourceforge.net/p/sox/code/ci/master/tree/src/sphere.c#l79.

You are right. ffprobe can open the file, so ALAW encoding is working.

ffprobe version 4.2.4-1ubuntu0.1 Copyright (c) 2007-2020 the FFmpeg developers
  built with gcc 9 (Ubuntu 9.3.0-10ubuntu2)
...
Input #0, nistsphere, from 'tmp3/ALAW_8.sph':
  Duration: 00:00:00.02, bitrate: 656 kb/s
    Stream #0:0: Audio: pcm_alaw, 8000 Hz, 2 channels, s16, 128 kb/s

mthrok merged commit b8fd5e9 into pytorch:master on Feb 23, 2021

mthrok commented Feb 23, 2021

@prabhat00155 Thanks! Can you cherry-pick the same commit and open a PR against the release/0.8 branch?

prabhat00155 added a commit to prabhat00155/audio that referenced this pull request Feb 23, 2021
prabhat00155 added a commit to prabhat00155/audio that referenced this pull request Feb 23, 2021
mthrok pushed a commit that referenced this pull request Feb 23, 2021
prabhat00155 deleted the prabhat00155/modify_save branch February 23, 2021 18:26
mthrok pushed a commit to mthrok/audio that referenced this pull request Feb 26, 2021
* Add fx graph mode ptq static tuttorial

* Add fx graph mode ptq static tuttorial

* Remove `_tutorial` from the name so it doesn't build, will add _tutorial after 1.8

Co-authored-by: Brian Johnson <[email protected]>