• YouTube audio: levels, spectrum, sampling ...

    From J. P. Gilliver@21:1/5 to All on Fri Aug 4 02:24:23 2023
    OK, not strictly broadcast, but YouTube is almost a broadcast channel.

    I use yt-dlp a lot, usually on default settings (which AIUI usually gets
    the best available), and then an extractor set to "extract original
    audio" (I use Pazera, as it's easy to be sure it's extracting original
    without any further transcoding; however, it's just ffmpeg-based, and I
    presume any other similar would yield the same result). [Yes, I looked
    into using the audio-only settings for yt-dlp, but they didn't easily
    lend themselves to batching; besides, I sometimes _do_ want the video
    too - clip of an artist performing, and I want the audio-only one for
    use in the car. My muscle memory of the keystrokes to extract the audio
    means I can do it in seconds anyway.] I usually look at the resultant
    audio - sometimes with the intention of reducing the filesize, sometimes
    just out of curiosity. (I use GoldWave, but I presume almost any other
    similar utility - such as Audacity - would yield similar results.)
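
    For anyone who would rather script the "extract original audio" step than
    click through a GUI, here is a minimal Python sketch. It only builds the
    ffmpeg command line. `-vn` (drop the video) and `-c:a copy` (copy the audio
    stream without re-encoding) are real ffmpeg options; the function name, the
    example file name and the `.m4a` default are assumptions for illustration.
    The output extension has to suit the codec actually in the file (for
    example `.m4a` for AAC, `.opus` for Opus).

```python
import shlex
from pathlib import Path


def extract_audio_cmd(video: str, audio_ext: str = ".m4a") -> list[str]:
    """Build an ffmpeg command that copies the audio stream untouched.

    -vn drops the video stream; -c:a copy copies the audio packets
    without decoding or re-encoding them.
    """
    out = str(Path(video).with_suffix(audio_ext))
    return ["ffmpeg", "-i", video, "-vn", "-c:a", "copy", out]


if __name__ == "__main__":
    # "clip.mp4" is a hypothetical file name used only for illustration.
    print(shlex.join(extract_audio_cmd("clip.mp4")))
```

    Passing the resulting list to subprocess.run() for each file in a folder
    would give the batching mentioned above.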

    Several observations:

    1. The _vast_ majority are coded at 44100 Hz, stereo. I suppose that -
    "CD quality" - is the default setting for many capture/encoding devices,
    but it does seem overkill for mono material, especially of considerable
    age (such as from 78s). Still, I'm not surprised. (I very occasionally
    find one that _has_ been encoded mono. Though I don't think I've seen
    any encoded at less than 44100 - certainly if I have, it's been
    extremely rare.)

    2. The _level_ is often extremely low - especially for some old (say
    1960-1999) video clips. (Not all, by any means - but often enough to be noticeable.) By low, I mean I have to boost them by ×4, or even ×8 or occasionally ×16, to get the peaks above 50% full scale. (I only use
    powers of 2 to avoid distortion.) Is this something YouTube are
    imposing? Is audio level adjustment difficult on some common piece of
    video capture hardware/software? I even came across one recently where
    the uploader _said_ something like "this is quiet, you may have to
    adjust" in the notes, so s/he knew about it. This does seem odd.

    3. (This is the one that finally prompted me to post.) Far more often
    than not - I'd say over 90% of tracks - there's a visible (I can no
    longer hear that high) tone around 15½ kHz. I presume in the majority of
    cases, it's timebase - 15625 for "PAL" (yes I know, but YKWIM), 15750
    for NTSC; even where it's not from an actual video source, I presume it
    has picked it up somewhere in the processing, e. g. from a computer monitor/graphics card. This is _not_ what's puzzling me. What is, is
    that the spectrum is very often brickwalled at that line: even where the
    actual valid material is all below 15, 12, 11, 10, 8, or 6 kHz (you'd be surprised how much _does_ have nothing valid above those!), and the
    remainder is just uniform noise - it cuts off at the line. Can anyone
    think why? It's nowhere near the Nyquist limit of 22050; I could
    understand a rolloff _towards_ that to avoid aliasing, and that rolloff
    being gentle to avoid other adverse effects, but no, it's brickwalled,
    and at the line (which is _well_ below).
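
    The figures in the three observations can be checked with a few lines of
    Python. This is only a sketch of the arithmetic, assuming integer 16-bit
    PCM samples for the gain part (decoded lossy audio is often handled as
    floating point, where the argument is weaker). The function names are
    made up for illustration. The 15734 Hz figure for colour NTSC is my
    addition; 15750 Hz is the original monochrome 525-line value.

```python
import math


def gain_db(factor: float) -> float:
    """Amplitude gain factor expressed in decibels."""
    return 20 * math.log10(factor)


def line_frequency(lines_per_frame: int, frames_per_second: float) -> float:
    """Horizontal (line) scan frequency of a video standard, in Hz."""
    return lines_per_frame * frames_per_second


def scale_pcm16(sample: int, shift: int) -> int:
    """Multiply a 16-bit sample by 2**shift; exact unless it would clip."""
    scaled = sample << shift  # a left shift is an exact multiply by 2**shift
    if not -32768 <= scaled <= 32767:
        raise OverflowError("gain would clip a 16-bit sample")
    return scaled


if __name__ == "__main__":
    for factor in (4, 8, 16):
        print(f"x{factor} = {gain_db(factor):.2f} dB")
    print("625-line, 25 fps:", line_frequency(625, 25), "Hz")
    print("525-line, 30 fps:", line_frequency(525, 30), "Hz")
    print("525-line, 30000/1001 fps:", round(line_frequency(525, 30000 / 1001), 2), "Hz")
    print("Nyquist at 44100 Hz:", 44100 / 2, "Hz")
```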
    --
    J. P. Gilliver. UMRA: 1960/<1985 MB++G()AL-IS-Ch++(p)Ar@T+H+Sh0!:`)DNAf

    I admire him for the constancy of his curiosity, his effortless sense of authority and his ability to deliver good science without gimmicks.
    - Michael Palin on Sir David Attenborough, RT 2016/5/7-13

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Theo@21:1/5 to J. P. Gilliver on Fri Aug 4 12:00:55 2023
    J. P. Gilliver <G6JPG@255soft.uk> wrote:
    > 2. The _level_ is often extremely low - especially for some old (say
    > 1960-1999) video clips. (Not all, by any means - but often enough to be
    > noticeable.) By low, I mean I have to boost them by ×4, or even ×8 or
    > occasionally ×16, to get the peaks above 50% full scale. (I only use
    > powers of 2 to avoid distortion.) Is this something YouTube are
    > imposing? Is audio level adjustment difficult on some common piece of
    > video capture hardware/software? I even came across one recently where
    > the uploader _said_ something like "this is quiet, you may have to
    > adjust" in the notes, so s/he knew about it. This does seem odd.

    I suspect people aren't doing a lot of postprocessing: recording from
    analogue using their soundcard/etc, uploading the clips to YT, YT does the transcoding but isn't changing levels. It may be people are using line in inputs from analogue sources and not setting levels correctly, I don't know.

    > 3. (This is the one that finally prompted me to post.) Far more often
    > than not - I'd say over 90% of tracks - there's a visible (I can no
    > longer hear that high) tone around 15½ kHz. I presume in the majority of
    > cases, it's timebase - 15625 for "PAL" (yes I know, but YKWIM), 15750
    > for NTSC; even where it's not from an actual video source, I presume it
    > has picked it up somewhere in the processing, e. g. from a computer
    > monitor/graphics card. This is _not_ what's puzzling me. What is, is
    > that the spectrum is very often brickwalled at that line: even where the
    > actual valid material is all below 15, 12, 11, 10, 8, or 6 kHz (you'd be
    > surprised how much _does_ have nothing valid above those!), and the
    > remainder is just uniform noise - it cuts off at the line. Can anyone
    > think why? It's nowhere near the Nyquist limit of 22050; I could
    > understand a rolloff _towards_ that to avoid aliasing, and that rolloff
    > being gentle to avoid other adverse effects, but no, it's brickwalled,
    > and at the line (which is _well_ below).

    Maybe YT have a filter to block timebase frequencies? In the early days
    there was a lot of material uploaded from VHS (pre-2010 YT videos are often 240p or similar), which I suspect is where it's coming from on your
    examples. I wouldn't be surprised if the timebase leaked onto the audio
    track, but contemporary VHS hardware couldn't play it back so people didn't notice. They might do today, hence a reason to filter it out. And, as
    these mega-platforms go, it's easier to have a one-size-fits-all policy than
    to do any tailoring to the input material.

    Theo

  • From J. P. Gilliver@21:1/5 to Theo on Fri Aug 4 15:06:16 2023
    In message <J9j*zAZmz@news.chiark.greenend.org.uk> at Fri, 4 Aug 2023
    12:00:55, Theo <theom+news@chiark.greenend.org.uk> writes
    > I suspect people aren't doing a lot of postprocessing: recording from
    > analogue using their soundcard/etc, uploading the clips to YT, YT does the

    I suspect you're right there ...

    > transcoding but isn't changing levels. It may be people are using line in
    > inputs from analogue sources and not setting levels correctly, I don't know.

    ... and there.

    > Maybe YT have a filter to block timebase frequencies? In the early days
    > there was a lot of material uploaded from VHS (pre-2010 YT videos are often
    > 240p or similar), which I suspect is where it's coming from on your
    > examples. I wouldn't be surprised if the timebase leaked onto the audio
    > track, but contemporary VHS hardware couldn't play it back so people didn't
    > notice. They might do today, hence a reason to filter it out. And, as
    > these mega-platforms go, it's easier to have a one-size-fits-all policy than
    > to do any tailoring to the input material.
    >
    > Theo

    But that's the point: (A) I'd say 40-60% of clips _do_ have the timebase whistle, or at least _some_ peak between 15 and 16 kHz, so it's _not_
    being filtered - and (B) the _rest_ of the content is brickwalled _at_
    that tone. In other words, there is content (often mostly noise) _up_ to
    that tone, and zero above it:

    /\/\/\/\/\         |
              ---------|
                       |
                       |________

    Where /\/\ is the meaningful content, ----- is just noise, | is the
    tone, and _ is the nothing, up to Nyquist.

    It was/is the brickwalling that I find puzzling. Sure, if there had been
    a _notch_ around the two timebase frequencies, or a rolloff starting
    _below_ them. But brickwalling _at_ (but not including!) the tone seems
    very odd.
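
    For those without a spectrogram to hand, the "is there a tone at the line
    frequency, and anything above it?" question can be put to a file
    numerically. Below is a minimal sketch using the Goertzel algorithm in
    plain Python. It measures the power in a single DFT bin. The test signal
    is synthetic (a 1 kHz tone standing in for programme material plus a
    weaker 15625 Hz whistle), and the function names and amplitudes are
    assumptions for illustration, not taken from any real clip.

```python
import math


def goertzel_power(samples, sample_rate, freq):
    """Power in the DFT bin nearest `freq`, computed with the Goertzel algorithm."""
    n = len(samples)
    k = round(n * freq / sample_rate)  # nearest DFT bin index
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    # Squared magnitude of the bin, normalised by n**2 so that a sine of
    # amplitude A sitting exactly on the bin gives A**2 / 4.
    return (s1 * s1 + s2 * s2 - coeff * s1 * s2) / (n * n)


def make_test_signal(sample_rate=44100, n=8820):
    """Synthetic stand-in: 1 kHz 'programme' plus a weaker 15625 Hz whistle."""
    return [
        0.5 * math.sin(2 * math.pi * 1000 * i / sample_rate)
        + 0.1 * math.sin(2 * math.pi * 15625 * i / sample_rate)
        for i in range(n)
    ]


if __name__ == "__main__":
    signal = make_test_signal()
    for f in (1000, 15625, 15750, 18000):
        print(f"{f:>6} Hz: {goertzel_power(signal, 44100, f):.6f}")
```

    On a real clip one would feed in the decoded samples and compare the
    power at 15625 or 15750 Hz with a few frequencies either side; a brickwall
    would show as essentially nothing above the cutoff.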

    Have a look at some.
    --
    J. P. Gilliver. UMRA: 1960/<1985 MB++G()AL-IS-Ch++(p)Ar@T+H+Sh0!:`)DNAf

    After all is said and done, usually more is said.

  • From Brian Gaff@21:1/5 to J. P. Gilliver on Sat Aug 5 10:14:53 2023
    I think you expect too much of people. Most of those with footage tend to do
    what they always do and just put that up. If you really wanted to do the
    processing it's easy enough, but most just don't bother. I tend to know if I
    get anything off line be it podcast, YouTube or whatever and use Goldwave
    to fix stuff and resave it at the bit rate I like. If you want to hear brick
    wall filtering, any pop concert recently recorded by the BBC is like that.
    After all, FM never had anything above 15 kHz except noise most of the time.


    I have a few custom effects saved in Goldwave, one is superfast gain
    update, which effectively compresses the dynamic range. There are a few
    that only compress peaks, and some for matching the upper levels without clipping. I also have quite a few parametric settings like presence reduce,
    and tizzy enhance for those under the blanket recordings. There are very
    light touch noise reductions from clipboard as well, and some custom
    pop/click ones to clean up crackles.

    I also made a wide spatial stereo one which can enhance some stereo live
    recordings a lot, and does not cause distortion in the stereo centre.

    Brian

    --

    --:
    This newsgroup posting comes to you directly from...
    The Sofa of Brian Gaff...
    briang1@blueyonder.co.uk
    Blind user, so no pictures please
    Note this Signature is meaningless.!
  • From Brian Gaff@21:1/5 to Theo on Sat Aug 5 10:19:06 2023
    If you have an MP3 file, then mp3gain can change the levels without another
    pass through decode and encode making the sound lumpy and gritty, as it
    seems it's phase and levels that lossy encoding mostly affects.
    Brian
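
    A small sketch of how this fits with the powers-of-2 gains mentioned
    earlier in the thread. It assumes, as I understand it, that mp3gain works
    by changing the gain field stored in each MP3 frame, and that one step of
    that field scales the amplitude by 2^(1/4), about 1.5 dB. Treat that step
    size as an assumption rather than as documented behaviour. The names
    below are made up for illustration.

```python
import math

# Assumed size of one gain step: amplitude factor 2**(1/4), about 1.505 dB.
STEP_DB = 20 * math.log10(2 ** 0.25)


def steps_for_factor(factor: float) -> int:
    """Nearest whole number of gain steps for a given amplitude factor."""
    return round(20 * math.log10(factor) / STEP_DB)


if __name__ == "__main__":
    for factor in (2, 4, 8, 16):
        print(f"x{factor}: {steps_for_factor(factor)} steps")
```

    Under that assumption every doubling is exactly four steps, so ×4, ×8 and
    ×16 all land on whole steps.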

    --

    --:
    This newsgroup posting comes to you directly from...
    The Sofa of Brian Gaff...
    briang1@blueyonder.co.uk
    Blind user, so no pictures please
    Note this Signature is meaningless.!
  • From J. P. Gilliver@21:1/5 to Brian Gaff on Sat Aug 5 12:01:16 2023
    In message <ual3ui$1mkiv$1@dont-email.me> at Sat, 5 Aug 2023 10:14:53,
    Brian Gaff <brian1gaff@gmail.com> writes
    > I think you expect too much of people. Most of those with footage tend to do
    > what they always do and just put that up. If you really wanted to do the
    > processing its easy enough, but most just don't bother. I tend to know if I

    Indeed.

    > get anything off line be it podcast, Youtube or whatever and use Goldwave
    > to fix stuff and resave it at the bit rate I like. If you want to hear brick

    Actually, I rarely want to _change_ the sound of things I have
    downloaded - I just like to _look_ at them out of curiosity. (I do
    resave at lower bit rates *and sample rates* if they're fundamentally
    far too high, as it just offends me to have something that I can see is
    mono saved as stereo, or that has nothing above 8 or 10 kHz saved at
    44100.) I _used_ to do it to make smaller files; nowadays the cost of
    storage is so low that that's not that important, though I still do it.

    (I'd always understood that the algorithms look at stereo difference,
    and if there isn't much, should produce a smaller file - or, you can
    specify a lower bitrate and still get the same quality; however,
    manually telling it to encode as mono if I can see it's mono anyway,
    seems to more or less halve the filesize, so that aspect of data
    compression isn't having that much effect.)

    > wall filtering, any pop concert recently recorded by the bbc is like that.
    > After all FM never had anything above 15khz except noise most of the time.

    You'd think they'd set the brick wall to remove timebase whistle, if
    that's the case, though.

    > I have a few custom effects saved in Goldwave, one is superfast gain
    > update, which effectively compresses the dynamic range. There are a few
    > that only compress peaks, and some for matching the upper levels without

    Doesn't the built-in "Maximise" function do that? (I use it to _assess_
    the maximum level [and find when it is if not obvious], but I cancel it,
    as it would involve a non-binary gain adjustment.)

    > clipping. I also have quite a few parametric settings like presence reduce,
    > and tizzy enhance for those under the blanket recordings. There are very
    > light touch noise reductions from cclipboard as well, and some custom
    > pop/click ones to clean up crackles.

    You're obviously a more sophisticated user than I am. I think the only
    ones I've custom-saved are 11 kHz and 5500 Hz brickwalls that I use if
    I'm going to halve or quarter the sample rate of something that has noise
    (but no significant signal) above those (to avoid aliasing it down), and
    a ×4 gain.
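
    The reason for filtering first can be shown with a little arithmetic: a
    sketch, with a made-up function name, of where a component ends up if the
    sample rate is reduced without removing it. Halving 44100 Hz gives a new
    Nyquist limit of 11025 Hz and quartering gives 5512.5 Hz, which is why
    brickwalls at 11 kHz and 5500 Hz sit just underneath.

```python
def alias_frequency(freq_hz: float, sample_rate_hz: float) -> float:
    """Frequency at which a tone appears after sampling at sample_rate_hz.

    Anything above half the sample rate folds back below it.
    """
    folded = freq_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


if __name__ == "__main__":
    for new_rate in (22050, 11025):
        print(
            f"rate {new_rate} Hz: Nyquist {new_rate / 2} Hz, "
            f"15625 Hz would alias to {alias_frequency(15625, new_rate)} Hz"
        )
```

    So an unfiltered 15625 Hz whistle would land at 6425 Hz after halving the
    rate, and at 4600 Hz after quartering it, both well inside the audible
    range.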
    []
    I mainly use it just to look, rather than change, other than the
    recoding as mono or lower sample rate.

    (As for the purists who say _any_ recoding is further distortion - I
    accept this in theory, but on the whole find one such produces nothing I
    can hear; if I decide I want to apply some subsequent adjustment, I go
    back to the original file. Same as jpg images. When I'm trying things
    out in GoldWave, of course, I work in its memory where it's not
    encoded.)
    --
    J. P. Gilliver. UMRA: 1960/<1985 MB++G()AL-IS-Ch++(p)Ar@T+H+Sh0!:`)DNAf

    Science isn't about being right every time, or even most of the time. It is about being more right over time and fixing what it got wrong.
    - Scott Adams, 2015-2-2

  • From John Williamson@21:1/5 to J. P. Gilliver on Sat Aug 5 13:16:14 2023
    On 05/08/2023 12:01, J. P. Gilliver wrote:

    > You'd think they'd set the brick wall to remove timebase whistle, if
    > that's the case, though.

    What frequency of line whistle, though? On the current video streaming services, line whistle may be as low as 10 kHz or as high as 20 kHz,
    depending on the original video standard. While the high end isn't
    likely to be a problem, the lower end may mask wanted signals.

    It's not a new problem, one very famous album, much admired for its
    audio quality, has a constant scan whistle on most of the tracks from
    the monitors used on the studio computers.



    --
    Tciao for Now!

    John.

  • From J. P. Gilliver@21:1/5 to John Williamson on Sat Aug 5 23:04:26 2023
    In message <kj6ssfF83n0U1@mid.individual.net> at Sat, 5 Aug 2023
    13:16:14, John Williamson <johnwilliamson@btinternet.com> writes
    > On 05/08/2023 12:01, J. P. Gilliver wrote:
    >
    >> You'd think they'd set the brick wall to remove timebase whistle, if
    >> that's the case, though.
    >
    > What frequency of line whistle, though? On the current video streaming
    > services, line whistle may be as low as 10 kHz or as high as 20 kHz,
    > depending on the original video standard. While the high end isn't
    > likely to be a problem, the lower end may mask wanted signals.

    Most of the material I'm interested in is from say late 1950s up to
    about turn of the century - almost all SD video, 15625 or 15750 Hz. (I
    think anything early enough to have been on film - or system A [10125
    Hz, which I do remember!] - has probably been converted to SD video well
    before it got to YouTube.)

    > It's not a new problem, one very famous album, much admired for its
    > audio quality, has a constant scan whistle on most of the tracks from
    > the monitors used on the studio computers.

    At the other end, I remember an LP of Bob Newhart I borrowed from the
    library - late '70s or early '80s - had very noticeable mains buzz,
    presumably noticeable to me as being harmonics of 60 Hz, not of 50 which
    I'd probably developed a comb filter for.

    (Was the whistle on the album in question 15xxx Hz, or higher?)

    --
    J. P. Gilliver. UMRA: 1960/<1985 MB++G()AL-IS-Ch++(p)Ar@T+H+Sh0!:`)DNAf

    I hope you dream a pig.

  • From David Paste@21:1/5 to J. P. Gilliver on Mon Aug 21 09:00:21 2023
    On Friday, 4 August 2023 at 02:29:02 UTC+1, J. P. Gilliver wrote:

    > 3. (This is the one that finally prompted me to post.) Far more often
    > than not - I'd say over 90% of tracks - there's a visible (I can no
    > longer hear that high) tone around 15½ kHz. I presume in the majority of
    > cases, it's timebase - 15625 for "PAL" (yes I know, but YKWIM), 15750
    > for NTSC; even where it's not from an actual video source, I presume it
    > has picked it up somewhere in the processing, e. g. from a computer
    > monitor/graphics card. This is _not_ what's puzzling me. What is, is
    > that the spectrum is very often brickwalled at that line: even where the
    > actual valid material is all below 15, 12, 11, 10, 8, or 6 kHz (you'd be
    > surprised how much _does_ have nothing valid above those!), and the
    > remainder is just uniform noise - it cuts off at the line. Can anyone
    > think why? It's nowhere near the Nyquist limit of 22050; I could
    > understand a rolloff _towards_ that to avoid aliasing, and that rolloff
    > being gentle to avoid other adverse effects, but no, it's brickwalled,
    > and at the line (which is _well_ below).


    AFAIU the MP4 (-f 140 is generally the highest quality in that
    codec) audio downloads have a frequency roll-off at around that
    frequency (16 kHz iirc). This can be seen in Audacity using the
    spectrogram setting (right-click on the vertical kHz bar and
    select "zoom to fit"). I have started to download audio in OPUS
    (-f 251) as it doesn't seem to have this roll-off. I then
    batch-convert the files with FFMPEG (thank you to those in here
    who helped me with this in the past!) to MP3 for wider
    compatibility with my various media players.

    If I am downloading videos (usually music videos) I use
    the -f 140 + (whatever 1080 in h.264 video is, I can't remember
    offhand) as this offers best compatibility with media players in
    terms of video.

    Observations: MP4 audio is 44.1 kHz whilst OPUS is 48 kHz. The
    MP3s that FFMPEG creates retain the 48 kHz sampling rate, and I
    have had no compatibility problems with any of my players.
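
    In case it helps anyone wanting to repeat this, the workflow can be
    sketched in Python as two command builders. The format codes (-f 251,
    -f 140) are the ones quoted above; `libmp3lame` and `-q:a` are real
    ffmpeg options. The function names, the quality value of 2 and the file
    names are assumptions for illustration, and nothing is run here, the
    commands are only assembled.

```python
from pathlib import Path


def download_cmd(url: str, audio_format: str = "251") -> list[str]:
    """yt-dlp command for one audio-only format (251 = Opus, 140 = AAC in MP4)."""
    return ["yt-dlp", "-f", audio_format, url]


def to_mp3_cmd(src: str, quality: int = 2) -> list[str]:
    """ffmpeg command that re-encodes `src` to MP3 with the LAME encoder.

    -q:a selects LAME's VBR quality (lower is better); 2 is an assumed value.
    """
    dst = str(Path(src).with_suffix(".mp3"))
    return ["ffmpeg", "-i", src, "-c:a", "libmp3lame", "-q:a", str(quality), dst]


def batch_to_mp3_cmds(folder: str, pattern: str = "*.webm") -> list[list[str]]:
    """One ffmpeg command per matching file in `folder` (hypothetical layout)."""
    return [to_mp3_cmd(str(p)) for p in sorted(Path(folder).glob(pattern))]


if __name__ == "__main__":
    # "URL" and "track.webm" are placeholders, not real inputs.
    print(download_cmd("URL"))
    print(to_mp3_cmd("track.webm"))
```

    Each list can be handed to subprocess.run() as it stands.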

    I will include the data from MediaInfo if you are interested. (I
    also like the way YT-DLP includes the YT address ID as part of
    the file name. Very handy!). I hope I haven't wasted your time
    with this reply.

    MP4 file:

    General
    Complete name : C:\David's really great computer\yt-dlp\MP4 Kali Uchis – telepatía [Official Audio] [Dwzk-XZxZ4k].m4a
    Format : MPEG-4
    Format profile : Base Media
    Codec ID : isom (isom/iso2/mp41)
    File size : 2.47 MiB
    Duration : 2 min 40 s
    Overall bit rate mode : Constant
    Overall bit rate : 129 kb/s
    Writing application : Lavf60.5.100

    Audio
    ID : 1
    Format : AAC LC
    Format/Info : Advanced Audio Codec Low Complexity
    Codec ID : mp4a-40-2
    Duration : 2 min 40 s
    Bit rate mode : Constant
    Bit rate : 128 kb/s
    Channel(s) : 2 channels
    Channel layout : L R
    Sampling rate : 44.1 kHz
    Frame rate : 43.066 FPS (1024 SPF)
    Compression mode : Lossy
    Stream size : 2.45 MiB (99%)
    Title : ISO Media file produced by Google Inc.
    Language : English
    Default : Yes
    Alternate group : 1


    OPUS file:

    General
    Complete name : C:\David's really great computer\yt-dlp\OPUS Kali Uchis – telepatía [Official Audio] [Dwzk-XZxZ4k].webm
    Format : WebM
    Format version : Version 4
    File size : 2.57 MiB
    Duration : 2 min 40 s
    Overall bit rate : 134 kb/s
    Writing application : google/video-file
    Writing library : google/video-file

    Audio
    ID : 1
    Format : Opus
    Codec ID : A_OPUS
    Duration : 2 min 40 s
    Channel(s) : 2 channels
    Channel layout : L R
    Sampling rate : 48.0 kHz
    Bit depth : 16 bits
    Compression mode : Lossy
    Language : English
    Default : Yes
    Forced : No

  • From J. P. Gilliver@21:1/5 to David Paste on Thu Aug 24 00:38:07 2023
    In message <eae07e89-5e85-433d-8edc-30d438332630n@googlegroups.com> at
    Mon, 21 Aug 2023 09:00:21, David Paste <pastedavid@gmail.com> writes
    > AFAIU the MP4 (-f 140 is generally the highest quality in that
    > codec) audio downloads have a frequency roll-off at around that
    > frequency (16kHz iirc). This can be seen in audacity using the

    That would explain it. Yes, sometimes there is a trace of something
    between (what I assume is) the timebase signal and 16 kHz, i. e. just
    above the solid line.

    I guess it's just coincidence that the MP4 designers chose a cutoff so
    close to (but above) the timebase line.

    > spectrogram setting; right-click on the vertical kHz bar and

    (I use GoldWave - I'd probably be using Audacity, but I bought Goldwave
    before Audacity became common and free, and am used to it - and that has
    a spectrogram option. [I usually have one channel set to spectrogram,
    and the other set to X-Y so I can see immediately whether it's stereo or
    not.])

    > select "zoom to fit"). I have started to download audio in OPUS
    > (-f 251) as it doesn't seem to have this roll-off. Then
    > batch-converting files with FFMPEG (thank you to those in here
    > who helped me with this in the past!) to MP3 for wider
    > compatibility with my various media players.

    I'll admit I mostly just download with yt-dlp's default and extract the "original" audio, which nearly always comes out as [encoded as!] 44.1
    kHz stereo - very occasionally 48 kHz, equally rarely mono (44.1 I
    think).
    []
    --
    J. P. Gilliver. UMRA: 1960/<1985 MB++G()AL-IS-Ch++(p)Ar@T+H+Sh0!:`)DNAf

    Old soldiers never die - only young ones
