• multiple roles of d/copyright

    From Simon McVittie@21:1/5 to Scott Kitterman on Thu Feb 10 14:30:01 2022
    On Tue, 08 Feb 2022 at 08:59:23 -0500, Scott Kitterman wrote:
    > From my point of view, treating something like other common classes of RC bugs
    > means that the project is producing tools and processes to make detection of
    > such bugs more automated to remove them from the archive, that developers are
    > actively looking for them, and that they are routinely fixed in the normal
    > course of Debian development.

    I think part of the problem here might be that copyright information is "social", not "technical": software authors can claim copyright and/or authorship in various forms of human-readable, free-form text, which means
    any automated detection is necessarily going to be imperfect, and as long
    as our policy demands perfection, there will be a reluctance to automate
    this (or at least a reluctance to say that we are automating it).
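
To illustrate why automated detection is imperfect, here is a minimal sketch of a notice scanner. The pattern, the function name `find_notices` and the sample text are all assumptions made up for this example, not an existing Debian tool. It catches conventionally formatted notices but misses a free-form authorship claim.

```python
import re

# Hypothetical pattern: matches lines such as "Copyright (C) 2004 Jane Doe"
# or "© 2019-2021 Example Corp". Real notices are free-form text, so this
# will miss some claims and may report false positives.
NOTICE = re.compile(
    r"(?:copyright\s*(?:\(c\)|©)?|©)\s*(\d{4}(?:\s*[-,]\s*\d{4})*)[,\s]+(.+)",
    re.IGNORECASE,
)


def find_notices(text):
    """Return (years, holder) pairs for lines that look like copyright notices."""
    found = []
    for line in text.splitlines():
        match = NOTICE.search(line)
        if match:
            years, holder = match.groups()
            # Trim trailing comment terminators and punctuation from the holder.
            found.append((years.strip(), holder.rstrip(" */.")))
    return found


sample = """\
/* Copyright (C) 2004 Jane Doe */
# © 2019-2021 Example Corp
// Written by John Smith, who retains all rights.
"""

for years, holder in find_notices(sample):
    print(years, holder)
```

The third line of the sample is a plausible claim of rights, yet the scanner never sees it, which is the kind of gap a policy demanding perfection cannot tolerate.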

    Another part of the problem is that licensing and copyright-information
    bugs are not something that we are realistically going to find through
    normal use of software: if GTK crashes when you print on a Tuesday, one
    of our users will eventually notice, but if we have missed a copyright
    holder, it's unlikely that anyone is going to notice that omission from
    the list of around 400 potential copyright holders in <https://tracker.debian.org/media/packages/g/gtk4/copyright-4.6.0ds1-3>
    unless they repeat the time-consuming process of collecting possible
    copyright claims from the source code (as the ftp team presumably do). I
    have no idea how the maintainers of larger and more complicated packages
    manage to do this, or how the ftp team manage to review larger and more complicated packages in a finite time.

    I think the copyright file is doing several things which are perhaps in conflict:

    * It lets consumers of packages know what restrictions apply to their
    use of a package
    - This requires *most* of the license information, although not
    necessarily all of it: for example if a package like Linux is licensed
    under a mixture of GPL, LGPL, BSD and MIT licenses, it's usually
    sufficient to be aware of the most restrictive of those licenses, in
    this case GPL
    - Having too much information, however well-intentioned, actually works
    against this by making it harder to find what you need
    - I would argue that requiring the text of licenses like the CC family
    to be inlined into the copyright file works against this goal, by
    reducing the signal-to-noise ratio: if you are not familiar with a
    particular license, then obviously you will need to read its text
    to see what it means, but if you are looking at packages that have
    content under various semi-common licenses, you only need to read
    each license once
    - I would argue that requiring lists of copyright holders to be inlined
    into the same copyright file also works against this goal, again by
    harming the signal-to-noise ratio

    * It lets consumers of packages know that the package is DFSG-compliant
    - Same requirements as above

    * It's a place to reproduce information that licenses require us to, like
    a comprehensive set of copyright notices (if our interpretation of the
    applicable licenses is that pointing to nearby source code and calling
    it extremely comprehensive accompanying documentation is insufficient)
    - In this role, it's essentially write-only: we're doing this because
    we have been required to do it, more than because it's practically
    useful, and I don't expect anyone to actually read this, except for
    the maintainer when collecting it and the ftp team when verifying
    that it has been collected
    - In another subthread, Stephan Lachnit suggests using the SPDX format
    for this write-only information, which I think might be intended as
    a way to eventually separate it from the other roles of d/copyright

    * It gives authors due credit (which we are not *required* to do, but
    in previous discussions of d/copyright I've seen this cited as a reason
    why we *should* do this in order to be good citizens)
    - Note that collecting copyright holders is not necessarily actually
    helpful here, because that often means we are required to "credit"
    an employer, rather than mentioning the actual author
    - In a medium-sized package like GTK, it's not clear to me that a list of
    about 400 possible copyright holders is actually serving this purpose,
    because any individual contributor is lost in the noise

    * It lets us meet our self-imposed rules
    - This is circular, so I'm inclined to disregard it when discussing what
    the rules should be: we should set rules because they help us to
    achieve a goal, rather than for the sake of having rules

    * It lets the ftp team (or other interested reviewers) duplicate the
    info-collecting process to check that all of the above have been done
    - This is somewhat circular, because this is a way to support the other
    goals, not really a goal in its own right

    * Are there other relevant goals that I've missed here?

    I don't think conflating those goals and assuming they all need to be
    satisfied by a single file is necessarily going to lead to meeting any
    of those goals in an efficient way, let alone meeting all of them in
    an efficient way.

    smcv

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Scott Kitterman@21:1/5 to All on Thu Feb 10 15:10:01 2022
    How about it enables the project to comply with license requirements? I may have missed it, but I don't see that on your list. Like it or not, copyright is a thing and licenses (and our compliance with their requirements) are the only things that give us the right to distribute the packages that make up Debian.

    Policy 4.5.1 did relax the requirements around listing copyright holders (from the upgrading checklist):

    The copyright information for files in a package must be copied
    verbatim into "/usr/share/doc/PACKAGE/copyright" when

    1. the distribution license for those files requires that copyright
    information be included in all copies and/or binary
    distributions;

    2. the files are shipped in the binary package, either in source or
    compiled form; and

    3. the form in which the files are present in the binary package
    does not include a plain text version of their copyright
    notices.

    While I think listing all the copyright holders is a good idea (for the
    reasons you mention above), and I do so even when it's not legally required,
    the Debian Policy requirement is now the minimum needed to satisfy license
    requirements.
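
The three conditions quoted above combine with a logical "and", which can be sketched as a simple predicate. The `FileGroup` type and its field names are hypothetical, invented for this illustration; deciding the value of each field for real files still requires human judgment about the license and the package contents.

```python
from dataclasses import dataclass


@dataclass
class FileGroup:
    """Hypothetical summary of a set of files that share one license."""
    license_requires_notice_in_copies: bool       # condition 1
    shipped_in_binary_package: bool               # condition 2
    binary_form_includes_plain_text_notice: bool  # condition 3, negated below


def must_copy_notices(group: FileGroup) -> bool:
    """True when all three quoted conditions hold for this group of files."""
    return (
        group.license_requires_notice_in_copies
        and group.shipped_in_binary_package
        and not group.binary_form_includes_plain_text_notice
    )


# Compiled code whose license requires notices in binary distributions.
compiled = FileGroup(True, True, False)
# Scripts shipped as-is, with their notices still readable in the package.
scripts = FileGroup(True, True, True)

print(must_copy_notices(compiled))
print(must_copy_notices(scripts))
```

Under this reading, only the compiled files would need their notices copied verbatim; the scripts already carry a plain text version of theirs.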

    Scott K
  • From The Wanderer@21:1/5 to Scott Kitterman on Thu Feb 10 15:20:01 2022
    On 2022-02-10 at 09:06, Scott Kitterman wrote:

    > On Thursday, February 10, 2022 8:26:23 AM EST Simon McVittie wrote:
    >
    >> I think the copyright file is doing several things which are perhaps in
    >> conflict:
    >>
    >> * It's a place to reproduce information that licenses require us to, like
    >>   a comprehensive set of copyright notices (if our interpretation of the
    >>   applicable licenses is that pointing to nearby source code and calling
    >>   it extremely comprehensive accompanying documentation is insufficient)
    >
    > How about it enables the project to comply with license requirements? I
    > may have missed it, but I don't see that on your list.

    Isn't that the above paragraph?

    --
    The Wanderer

    The reasonable man adapts himself to the world; the unreasonable one
    persists in trying to adapt the world to himself. Therefore all
    progress depends on the unreasonable man. -- George Bernard Shaw


  • From Scott Kitterman@21:1/5 to All on Thu Feb 10 15:30:01 2022
    On Thursday, February 10, 2022 9:13:29 AM EST The Wanderer wrote:
    > On 2022-02-10 at 09:06, Scott Kitterman wrote:
    >> On Thursday, February 10, 2022 8:26:23 AM EST Simon McVittie wrote:
    >>> I think the copyright file is doing several things which are perhaps in
    >>> conflict:
    >>>
    >>> * It's a place to reproduce information that licenses require us to, like
    >>>   a comprehensive set of copyright notices (if our interpretation of the
    >>>   applicable licenses is that pointing to nearby source code and calling
    >>>   it extremely comprehensive accompanying documentation is insufficient)
    >>
    >> How about it enables the project to comply with license requirements? I
    >> may have missed it, but I don't see that on your list.
    >
    > Isn't that the above paragraph?

    It is. Thanks.

    Scott K
