• Re: dpkg -b allowed to build with a non-utf8 control file

    From Guillem Jover@21:1/5 to Juanmi Taboada on Sat Apr 8 12:40:01 2023
    Hi!

    On Sun, 2023-03-19 at 17:09:00 +0100, Juanmi Taboada wrote:
    > Checking the documentation for deb packages, I read that the control file
    > should be UTF-8:
    >
    > - Reference:
    >   https://www.debian.org/doc/debian-policy/ch-controlfields.html
    > - 5.1 Syntax of control files, at the end: *"All control files must be
    >   encoded in UTF-8."*
    >
    > I was able to build a non-UTF-8 package using *dpkg -b*.
    >
    > This was originally reported in Landscape-Client:
    > https://bugs.launchpad.net/landscape-client/+bug/1813442
    >
    > The report refers to the first version, '1.0.0.944', of the package "veeam",
    > and points out:
    > "The strange character is the U+FFFD � REPLACEMENT CHARACTER."
    >
    > I was able to reproduce the problem in Landscape Client, and I discovered
    > that the error came from a wrong encoding used in the control file.
    > I made a wrongly encoded description, which reproduced the error on our side.
    >
    > Nevertheless, it is not a bug in Landscape but in dpkg, which allowed
    > building a deb package with a wrongly encoded control file.
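
A minimal sketch of how such a character can appear, assuming the description was saved as Latin-1 and the consumer decodes it as UTF-8 with replacement. The sample text and the decoding policy are assumptions for illustration; they are not taken from the veeam package or from Landscape's code.

```python
# Hypothetical description text; not the real package's description.
description = "Sauvegarde simplifiée"

# Assumption: the control file was saved as Latin-1 instead of UTF-8.
raw = description.encode("latin-1")

# A consumer that decodes as UTF-8 and substitutes invalid bytes
# ends up with U+FFFD in place of the accented letter.
shown = raw.decode("utf-8", errors="replace")

print(shown)
```

The single byte 0xE9 is not a valid UTF-8 sequence on its own, so the decoder replaces it with U+FFFD.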

    The dpkg deb822(5) man page has similar wording, I think mostly
    because it was adapted from the Debian policy. So, while I think
    settling on UTF-8 as the only supported encoding makes sense, dpkg
    itself does not really care, and will work with pretty much any
    encoding thrown at it; for the things it does care about, it
    restricts itself to just ASCII and tries to validate that strictly.
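
Since the build step does not reject such files, a packager could run a check of their own before calling dpkg -b. The following is only a sketch of that idea: the function name is hypothetical and it is not something dpkg provides.

```python
def find_non_utf8_lines(data: bytes) -> list[int]:
    """Return the 1-based numbers of lines that are not valid UTF-8."""
    bad = []
    # Splitting on ASCII newlines is safe: the byte 0x0A never occurs
    # inside a multi-byte UTF-8 sequence.
    for number, line in enumerate(data.splitlines(), start=1):
        try:
            line.decode("utf-8")  # strict decoding, raises on invalid bytes
        except UnicodeDecodeError:
            bad.append(number)
    return bad


# Hypothetical control file content with a Latin-1 encoded "é" on line 2.
sample = b"Package: demo\nDescription: caf\xe9\n"
print(find_non_utf8_lines(sample))
```

In practice the bytes would come from reading the control file in binary mode, and a non-empty result would be a reason to stop before building.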

    In this case I think there might be four (or more) potential bugs
    here:

     1) The deb822(5) man page should probably be clarified to distinguish
        what to expect about encodings.
     2) The dpkg-source (et al), dpkg-deb and dpkg might perhaps need to be
        improved to be more strict when parsing, and validating their
        inputs, including encoding.
     3) The affected packages with wrong encoding should get bugs filed
        and fixed.
     4) The landscape client software should ideally cope more gracefully,
        and not fail when confronted with wrongly encoded files? Because
        these can also be generated by something that is not dpkg-deb, as
        people seem to be fond of creating their own .deb packers for their
        build systems and other tooling.
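
For item 4, one way a consumer might cope is to try UTF-8 first and fall back to a single-byte encoding instead of failing. This is a sketch under the assumption that Latin-1 is an acceptable fallback; the function name is hypothetical and this is not how Landscape is known to behave.

```python
def decode_control_text(raw: bytes) -> tuple[str, str]:
    """Decode control data, returning the text and the encoding used."""
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Assumption: Latin-1 as the fallback. It maps every byte to a
        # character, so this branch cannot raise.
        return raw.decode("latin-1"), "latin-1"


print(decode_control_text(b"caf\xc3\xa9"))  # valid UTF-8 input
print(decode_control_text(b"caf\xe9"))      # Latin-1 input
```

Returning the encoding alongside the text lets the caller log or report the affected package rather than silently accepting it.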

    > The broken description package is attached for further study.

    Thanks, I've added an entry to my TODO to handle the above items from
    the dpkg side.

    Regards,
    Guillem

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Juanmi Taboada@21:1/5 to guillem@debian.org on Sat Apr 8 13:20:01 2023
    Thanks for the feedback.



