• Rethink about who we (Debian) are in rapid dev cycle of deep learning

    From M. Zhou@21:1/5 to All on Thu Jan 13 01:20:01 2022
    Keywords: GPU computing support, AI applications & ML-Policy.

    Deep learning is a new area. From our past discussions, we have already noted that this area introduces many new questions to Debian. For example, the new AI applications may even challenge the definition of free software. In this article I shall share my latest reviews on related topics across multiple domains, reviews of some of my past forecasts, as well as some relevant development advice.

    Note, the whole article only conveys my own personal opinion, and does not represent any official opinion of the Debian Project.

    # Debian's GPU computing support -- how much should we do? ####################

    The recent success of deep learning partly depends on the development of GPUs, which can compute matrix multiplications hundreds of times faster than a CPU. Thus, GPU computation is very valuable. And intuitively, supporting GPU computation as much as we can from the Debian side is useful and valuable as well. Due to software license issues with a certain vendor, I have long been looking for the boundary -- how much should we do to support a certain type of GPU computation? Now I have finally figured out my own answer.

    Debian is merely a _downstream_ in terms of providing GPU support for the end-users. As long as the upstream is willing to give us the chance (legally) and is easy to cooperate with, we can support that. Otherwise a dead end will soon be reached, unsurprisingly.

    I've had some discussions with several fellow developers about suggesting that Debian buy some GPUs to extend its infrastructure for better GPU support.
    The plan to put forward those ideas to a larger audience inside Debian has
    been indefinitely postponed because we know the requirement of a non-free
    driver (there is no free alternative) would be a big problem.

    Although my initial thought was to make Debian useful in more areas like GPU computing, I finally realized that by accepting new non-free blobs as an organization, we are further losing our core value written on our homepage -- "a complete free operating system".

    My conclusion is: "Users with special demands can take care of themselves,
    as we are unable to go far on our own." In terms of GPU computing, Debian
    is providing a great system as a foundation for development and applications.

    Of course, deep learning frameworks are regular software we are already familiar enough with. Their GPU support simply depends on whether the necessary drivers and libraries are maintained in Debian.

    # AI Applications & ML-Policy #################################################

    I predict that the ML-Policy [1] will work as a warning on potential issues instead of some practical guidance on packaging, because there are (and will be) long-standing issues that are hard to overcome and that make our packages not really useful without external components. Throughout the whole ML-Policy, I think the most valuable warning is the definition of "ToxicCandy Model", which identifies a software freedom trap for random developers interested in AI software.

    Cool and useful stuff keeps emerging -- e.g., Facial Authentication for Linux
    https://github.com/boltgolt/howdy
    And it depends on some pre-trained models (licence: CC0-1.0):
    https://github.com/davisking/dlib-models
    People may still remember the past discussions on ML-Policy. When we treat pre-trained models as something like a picture or a song,
    they may enter our main archive. But when we try to exercise software freedom, things go wrong. For example, we can study a painting or a song and analyze it to learn something, but this does not work for pre-trained models. Without
    the training data there is not much of a way to study, learn from, or reproduce the pre-trained models. As per the definition in ML-Policy, the mentioned model is a ToxicCandy model.
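
    The distinction drawn above can be sketched as a small decision function. This is only an illustration of the reasoning in this post, not the normative ML-Policy wording: the class name, the two boolean fields, and the "Free Model" / "Non-Free Model" labels are assumptions made for the example, and the real criteria are in [1].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PretrainedModel:
    """Hypothetical description of a model; the fields are assumptions."""
    name: str
    model_license_is_free: bool        # e.g. weights released under CC0-1.0
    training_data_is_available: bool   # can users obtain and reuse the data?


def classify(model: PretrainedModel) -> str:
    """Simplified three-way split following the argument in the post."""
    if not model.model_license_is_free:
        return "Non-Free Model"
    if not model.training_data_is_available:
        # Freely licensed weights that cannot be studied or reproduced.
        return "ToxicCandy Model"
    return "Free Model"


# Values below mirror how the post characterizes the example, not an audit.
example = PretrainedModel(
    name="dlib-models",
    model_license_is_free=True,
    training_data_is_available=False,
)
print(classify(example))
```

    The point of the sketch is that a free license on the weights alone is not enough to reach the last branch.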

    Based on my interpretation, it means Debian might step aside from the world of AI applications to fully exercise software freedom. It's a pity but Debian's major role in the whole thing is a solid system.

    Workarounds to address that pity are possible, for example the past "Debian User Package Repository" idea: distribute only package building scripts
    to end-users so they can build the corresponding packages locally. In this way
    the license issues and software freedom issues are bypassed, as the user has decided to accept the potential issues.

    On the other hand, I'd advise people who want to package interesting AI applications to carefully evaluate whether they are mature enough -- and never to package a pure academic research project. This is largely because our development cycle is much slower than the revolution cycle in the deep learning field. Something better may appear before it clears our NEW queue...

    As for AI applications that require considerable computing power (GPU), the answer is rather distinct.

    [1] https://salsa.debian.org/deeplearning-team/ml-policy/-/blob/master/ML-Policy.rst

    # Concluding Remarks

    We maintain and provide a free operating system, and we value software freedom. My contribution here is to provide my understanding of the boundary between what we can do and what we can't do with respect to a new interesting area. At least I learned a lot when thinking about this, and got a deeper understanding of "what Debian is".

    Debian is wonderful because it is one of the few places on earth where people will shout when software freedom is potentially infringed.
    Indeed, Debian must have its own uniqueness in the impression of every long-term member of the project.

    Thank you for the excellent system, fellow developers.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Thomas Goirand@21:1/5 to M. Zhou on Thu Jan 13 12:20:02 2022
    On 1/13/22 01:00, M. Zhou wrote:
    Thank you for the excellent system, fellow developers.

    Thanks to you for all of your work in this field.

    Cheers,

    Thomas Goirand (zigo)

  • From Davide Prina@21:1/5 to M. Zhou on Thu Jan 13 22:50:01 2022
    On 13/01/22 01:00, M. Zhou wrote:

    Cool and useful stuff keeps emerging -- e.g., Facial Authentication for Linux
    https://github.com/boltgolt/howdy

    Note that privacy in the EU (European Union) is governed by the GDPR
    and the ePrivacy Directive.
    The Directive will be replaced by the ePrivacy Regulation, which will have
    stricter rules (it will probably be approved this year).

    Note: a directive must be implemented in the national law of each EU state,
    and each state can choose how to "implement" it. A regulation becomes
    effective law in every EU state simultaneously, with the same rules for all. (In reality this applies not only to EU states, but also to all states
    in the European Single Market that have not negotiated a
    special exception for the field covered by the regulation. For example,
    Norway, which is not an EU state, is subject to the GDPR:
    companies have been fined by the Norwegian data protection authority for violating the GDPR.)

    I have read that the new ePrivacy Regulation will introduce new strict
    rules. For example, no one may use AI for facial recognition
    (only the police may, and only in regulated cases), nor may it be
    used in a more generic fashion, for example to identify the kinds of people
    taking part in a demonstration (whether they are women or men,
    or mostly women or men, their religion, the color of their skin,
    their country or region of origin, ...).
    Note: in reality, facial recognition in public spaces is already illegal today.

    So facial recognition will be illegal for authenticating workers
    or for identifying clients in your shop or...

    Note also that some uses of data are currently illegal in the EU. For example, a
    company used public photos to train an AI and was
    fined for it, because the company did not have user consent
    for this data processing.

    If I am not mistaken, other states outside the EU are also introducing
    laws/privacy laws that limit AI usage.

    All of this is to say that AI in Debian can introduce not only license
    problems, but also legal problems.

    I think that if Debian gives users a general AI product that can be used
    to train models, then it is the user's responsibility (it is the
    user who selects what data to use for training and how the
    training data is used). But if Debian gives users a package that uses a trained
    model to do something, then I think there must be at least a disclaimer... So if there were a package frdm (Facial Recognition
    Display Manager) that allowed user authentication by facial
    recognition alone, whoever installs/configures it would probably have to be
    informed, and accept, that using this package in some states can violate
    the law if it is not used only for personal use (or something similar).

    I'm neither a legal expert nor a privacy expert.
    But I would be interested to know what other people think about that, and
    whether they are legal/privacy experts.

    Ciao
    Davide

  • From Free unofficial Italian translation@21:1/5 to All on Fri Jan 14 09:30:01 2022
    Thanks Davide, for talking about this. This is not just a legal problem, but a de facto reality implemented in disregard of any right to freedom. In order to prevent artificial intelligence from being used against the privacy of third parties, it is
    necessary to eliminate "the opportunity" (following the model of the fraud triangle, which includes cyber fraud), informing people about "cyber insecurity" and the undesirable effects of databases. It might seem like a trivial solution, but sometimes the
    simplest tools are the best.
    I personally thank the Debian teams and AI developers for their invaluable contribution.






  • From tomas@tuxteam.de@21:1/5 to Davide Prina on Fri Jan 14 10:20:02 2022
    On Thu, Jan 13, 2022 at 10:07:05PM +0100, Davide Prina wrote:
    On 13/01/22 01:00, M. Zhou wrote:

    Cool and useful stuff keeps emerging -- e.g., Facial Authentication for Linux
    https://github.com/boltgolt/howdy

    Note that privacy in the EU (European Union) is governed by the GDPR and the ePrivacy Directive.
    The Directive will be replaced by the ePrivacy Regulation, which will have
    stricter rules (it will probably be approved this year).

    As far as I understand the GDPR won't restrict the tech itself, but only
    its use. Which makes sense. Basically, no consent => no use, except in
    very restricted scenarios (e.g. public security).

    That said, to have a workable face recognition, you'll need a training
    set (at least with current "solutions"), so you'll have to collect
    consent from all those face "providers".

    All the above said, I'm not a lawyer. Nor do I play one on TV :)

    Cheers
    --
    t

  • From Free unofficial Italian translation@21:1/5 to All on Fri Jan 14 11:50:01 2022
    The effectiveness of the privacy law depends on the context. In fact, in cases of public security or if crimes are in progress, the effectiveness of the privacy law is limited or, in more serious cases, not taken into consideration. But tools such as
    artificial intelligence are also used to commit abuses of power (and not just by private individuals). Unfortunately, there is no efficient preventive "defensive" strategy (and in general, preventive "defensive" strategies are never efficient). Laws
    against illegal forms of control exist, however Snowden is still in Russia (and Obama was a civil rights advocate). It is a paradox, but no written law can prevent injustice.




  • From Free unofficial Italian translation@21:1/5 to All on Fri Jan 14 12:40:02 2022
    I don't know the law in Switzerland, but in your case you need to take into account workers' rights and not just the privacy law. Furthermore, the nature of the goods and services produced by the company must also be considered.




  • From Thomas Goirand@21:1/5 to Davide Prina on Fri Jan 14 12:20:02 2022
    On 1/13/22 22:07, Davide Prina wrote:
    So facial recognition will be illegal for authenticating workers
    or for identifying clients in your shop or...

    Let's say we have facial recognition to enter a data center, is this
    illegal as well? Will that be also illegal in Switzerland?

    Cheers,

    Thomas Goirand (zigo)

  • From Gard Spreemann@21:1/5 to M. Zhou on Fri Jan 14 16:00:01 2022
    Thank you for your work in this area, and wise thoughts, as always!

    "M. Zhou" <lumin@debian.org> writes:

    My conclusion is: "Users with special demands can take care of themselves,
    as we are unable to go far on our own." In terms of GPU computing, Debian
    is providing a great system as a foundation for development and applications.

    […]

    Based on my interpretation, it means Debian might step aside from the world of
    AI applications to fully exercise software freedom. It's a pity but Debian's major role in the whole thing is a solid system.

    I understand how you reach these conclusions, both from the POV of
    hardware driver non-freedom and from the POV of the toxic candy problem
    of trained models. And while I agree with your conclusions, I do worry
    about the prospect of the lines blurring.

    It's not unreasonable to expect that AI models become standard
    components of certain classes of software relatively soon. No matter our position on the matter, I suspect the matter will affect lots of
    "non-special", "ordinary" software sooner rather than later. That is not
    to say that that should change our position – it is just to say that I
    think we should worry.

    What do we do if/when an image compression scheme involving a deep
    learning model becomes popular? What do we do if/when every new FOSS
    game ships with an RL agent that takes 80 GPU-weeks of training to
    reproduce (and upstream supports nvidia only)? When every new text
    editor comes with an autocompleter based on some generative model that
    upstream trained on an unclearly licensed scraping of a gazillion
    webpages?


    -- Gard


  • From M. Zhou@21:1/5 to Gard Spreemann on Fri Jan 14 17:00:02 2022
    On Fri, 2022-01-14 at 15:35 +0100, Gard Spreemann wrote:

    I understand how you reach these conclusions, both from the POV of
    hardware driver non-freedom and from the POV of the toxic candy problem
    of trained models. And while I agree with your conclusions, I do worry
    about the prospect of the lines blurring.

    Indeed. But I eventually figured out that "lazy evaluation" on this
    problem is the most realistic solution for distribution developers.
    I'm not worried about it. See the reason below.


    It's not unreasonable to expect that AI models become standard
    components of certain classes of software relatively soon. No matter
    [...]
    What do we do if/when an image compression scheme involving a deep
    learning model becomes popular? What do we do if/when every new FOSS
    game ships with an RL agent that takes 80 GPU-weeks of training to
    reproduce (and upstream supports nvidia only)? When every new text
    editor comes with an autocompleter based on some generative model that
    upstream trained on an unclearly licensed scraping of a gazillion
    webpages?


    Indeed. Deep learning has been demonstrated to be effective in video
    compression as well. However, research projects are not entering
    Debian. Only implementations of industry standards enter
    our archive. Only when a standard like H.267 (imagined) really
    introduces deep learning as a part of the core algorithm should
    we worry about the blurred borderline. However, even if that
    eventually happens, upstreams such as videolan and ffmpeg will
    have to think about GPL interpretation before we think about it.
    There is already a historical example from ffmpeg where pre-trained convolution kernels (in a header file) are excluded from the GPL
    source code. And I bet even the ISO standards group has to
    think about the potential license/legal issues before introducing
    that.

    An RL agent that takes 80 GPU-weeks to train is also highly likely to
    require a powerful GPU for inference when we play such a game.
    I play lots of games, and what kind of open source game has
    reached that level of being so GPU-demanding? Before that
    comes true for free software games, it will first appear in
    commercial titles, ahead of free software games by decades.

    Generative models for code completion are already a widely known
    problem, such as GitHub's Copilot. They are fancy and useful,
    but before we really think about the blurred borderline, we
    have already seen how controversial it was.

    Let's step back a little bit. When what you said all comes true,
    there will be some way for the end users to install them onto
    the system.
    A relevant example is vscode. It is a prevalent editor,
    liked by a large user group across all systems. vscode's
    absence from the official repository is not stopping the upstream
    from distributing their own .deb packages. I understand how
    tricky it is to package in our archive. I believe the same
    thing will happen for new fancy AI tools (e.g., the face
    authentication for linux tool already has its own .deb package).

    Let me quote a fellow developer: "In Debian we should
    stop chasing rabbits." To me, "lazy evaluation" of these
    problems is seemingly the best strategy. Given Debian's
    role in this ecosystem, thinking about serious issues before
    our upstream does is destined to make negligible technical progress.

    When we really have to execute that "lazy evaluation", we
    will not be unprepared, since the community is already aware of
    the precautions and warnings.

  • From Davide Prina@21:1/5 to Thomas Goirand on Fri Jan 14 21:00:01 2022
    On 14/01/22 12:02, Thomas Goirand wrote:
    On 1/13/22 22:07, Davide Prina wrote:
    So facial recognition will be illegal for authenticating workers
    or for identifying clients in your shop or...

    Let's say we have facial recognition to enter a data center, is this
    illegal as well? Will that be also illegal in Switzerland?

    Switzerland is an anomaly: it is the state that gains the most advantage from
    the European Single Market, but it is not in the European Single Market,
    because each EU state has a separate "contract" with it. I know that
    the EU is trying to end these separate "contracts" and make Switzerland
    join the European Single Market (I don't know if they have already reached
    an agreement).

    If Switzerland joins the European Single Market, it will not be able to
    participate in the making of new EU laws, but it will need to adopt
    all the new EU laws that the European Single Market requires, for example
    privacy laws.

    Note: the ePrivacy Regulation has not yet been approved, so it can be
    changed before approval.

    But under the current privacy law, the Italian privacy authority has prohibited
    and fined a public administration that had started to use workers'
    fingerprints as a way of letting them enter/exit the premises.

    If you know Italian you can read the following (I have picked a random article): http://www.lavorosi.it/rapporti-di-lavoro/riservatezza/garante-privacy-ordinanza-del-14012021-no-alluso-delle-impronte-digitali-dei-dipendenti-s/

    Ciao
    Davide

  • From tomas@tuxteam.de@21:1/5 to Free unofficial Italian translation on Fri Jan 14 20:20:01 2022
    On Fri, Jan 14, 2022 at 05:33:32PM +0000, Free unofficial Italian translation - FUIT wrote:
    It may seem like a stupid question, but are there any open source programs based on artificial intelligence for the recognition and forensic analysis of the voice print?

    Wikipedia [1] is your friend. From there: bob.bio.spear [2] (GPLv3),
    ALIZE [3] (LGPL) (there may be others, of course).

    Cheers
    [1] https://en.wikipedia.org/wiki/Speaker_recognition
    [2] https://pypi.org/project/bob.bio.spear/
    [3] https://alize.univ-avignon.fr/mediawiki/index.php/Main_Page

    --
    tomás

  • From Davide Prina@21:1/5 to tomas@tuxteam.de on Sat Jan 15 12:20:02 2022
    On 14/01/22 07:01, tomas@tuxteam.de wrote:
    On Thu, Jan 13, 2022 at 10:07:05PM +0100, Davide Prina wrote:
    On 13/01/22 01:00, M. Zhou wrote:

    Cool and useful stuff keeps emerging -- e.g., Facial Authentication for Linux
    https://github.com/boltgolt/howdy

    Note that privacy in the EU (European Union) is governed by the GDPR and the ePrivacy Directive.
    The Directive will be replaced by the ePrivacy Regulation, which will have
    stricter rules (it will probably be approved this year).

    As far as I understand the GDPR won't restrict the tech itself, but only
    its use. Which makes sense. Basically, no consent => no use, except in
    very restricted scenarios (e.g. public security).

    That said, to have a workable face recognition, you'll need a training
    set (at least with current "solutions"), so you'll have to collect
    consent from all those face "providers".

    I think that is not so simple. The answer could be very long and
    detailed; I will try to be very concise and mention some points
    that I think can be very "interesting".

    If you manage biometric data of EU citizens you must also consider:

    * Citizens can revoke their consent, so you would probably have to retire your model
    and generate a new one without the revoked data. But if you have saved
    your model in a CVS/DVCS or similar... or you have distributed the
    model... how can you do that?

    * With the new ePrivacy legislation, in some cases consent has a
    period of validity (I don't know whether this also applies to this kind of use)
    and you need to obtain renewed consent... or delete the data (there are
    some exceptions, but I don't think they apply in these cases;
    and in any case these exceptions can have a longer period of validity).

    * If you store and use biometric data you have to inform the state's
    privacy authority and also obtain its approval for the use you are declaring. The
    consent is valid only if you have completed this step beforehand.

    * In theory, from the little I know about AI, a model is something
    similar to an aggregation/anonymization... but for facial recognition, a researcher has been able to extract original faces from a model used to generate faces of non-existent people. Other researchers have
    demonstrated that by using anonymized data, aggregated with public data,
    they can identify some of the real people in the anonymized data. In these
    cases the biometric data can be stored only in EU territory, and the
    servers where it is stored must not be accessible from servers outside
    EU territory (as in my previous reply, in reality there are other territories outside the EU if they are part of the...)

    All the above said, I'm not a lawyer. Nor do I play one on TV :)

    I'm not a law/privacy expert, so I may be mistaken about something.

    Ciao
    Davide

  • From Andrey Rahmatullin@21:1/5 to Free unofficial Italian translation on Sat Jan 15 14:20:01 2022
    On Fri, Jan 14, 2022 at 08:23:03PM +0000, Free unofficial Italian translation - FUIT wrote:
    I know wikipedia. I was hoping there was a forensic court expert for the voiceprint among you.
    This sounds like a wrong topic for debian-project@.

    --
    WBR, wRAR

  • From tomas@tuxteam.de@21:1/5 to Davide Prina on Sat Jan 15 14:40:01 2022
    On Sat, Jan 15, 2022 at 10:45:35AM +0100, Davide Prina wrote:
    On 14/01/22 07:01, tomas@tuxteam.de wrote:

    [...]

    That said, to have a workable face recognition, [...] you'll have to collect
    consent from all those face "providers".

    I think that is not so simple. The answer could be very long and detailed; I will try to be very concise and mention some points that I think can be very "interesting".

    Basically, we do agree: perhaps "collect consent" was a bit sloppy and suggested a one-time action. That wasn't what I wanted to convey -- for
    each image you use in your training set, you'd have to keep enough
    metadata to document the person's consent (and to make revocation
    possible). At each change, you'd have to re-train your model (or do
    something equivalent).
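
    As a rough illustration of the bookkeeping described above, here is a minimal sketch of a training set that keeps per-image consent metadata and flags the model as stale whenever the set changes. All names (ConsentRecord, TrainingSet, the subject and image identifiers) are hypothetical, and nothing here is meant as a statement of what the law actually requires.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class ConsentRecord:
    subject_id: str   # hypothetical identifier of the person who consented
    image_path: str   # the image covered by that consent


@dataclass
class TrainingSet:
    records: Dict[str, ConsentRecord] = field(default_factory=dict)
    needs_retraining: bool = False

    def add(self, record: ConsentRecord) -> None:
        """Register an image together with its consent metadata."""
        self.records[record.image_path] = record
        self.needs_retraining = True

    def revoke(self, subject_id: str) -> int:
        """Drop every image of one person; return how many were removed."""
        doomed = [p for p, r in self.records.items()
                  if r.subject_id == subject_id]
        for path in doomed:
            del self.records[path]
        if doomed:
            self.needs_retraining = True
        return len(doomed)

    def mark_trained(self) -> None:
        """Call after a model has been trained on the current records."""
        self.needs_retraining = False

    def usable_images(self) -> List[str]:
        return sorted(self.records)


ts = TrainingSet()
ts.add(ConsentRecord("alice", "a1.png"))
ts.add(ConsentRecord("alice", "a2.png"))
ts.add(ConsentRecord("bob", "b1.png"))
ts.mark_trained()
removed = ts.revoke("alice")
print(removed, ts.needs_retraining, ts.usable_images())
```

    The sketch deliberately leaves out the hard part raised earlier in the thread: copies of an already distributed model are not affected by the flag.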

    Cheers
    --
    t

  • From Paul Wise@21:1/5 to M. Zhou on Sun Jan 16 03:30:01 2022
    On Wed, 2022-01-12 at 19:00 -0500, M. Zhou wrote:

    I've had some discussions with several fellow developers about suggesting that Debian buy some GPUs to extend its infrastructure for better GPU support.

    Was there a plan for what to use these GPUs for?

    Were they needed for driver/other package building/testing?

    Were they to be used for libre model training?

    Although my initial thought is to make Debian useful in more areas like GPU computing, I finally realized that by accepting new non-free blobs as an organization, we are further losing our core value written on our homepage --
    "a complete free operating system".

    This isn't any different to most modern hardware devices, which either
    have non-free blobs embedded in them or have non-free blobs uploaded to
    them or both. Even worse, server hardware often requires proprietary
    software running in userspace to manage parts of the server. The modern hardware industry does not produce hardware that allows Debian to avoid
    dealing with these blobs in some way. GPUs aren't any different here
    IMO. Things may change with RISC-V, OpenBMC and other efforts though.

    I predict that the ML-Policy [1] will work as a warning on potential
    issues instead of some practical guidance on packaging

    Mostly agreed with this section.

    Based on my interpretation, it means Debian might step aside from the
    world of AI applications to fully exercise software freedom. It's a
    pity but Debian's major role in the whole thing is a solid system.

    I think we should simply follow our social contract and guidelines as
    usual. Package useful things, but place them in contrib or non-free as appropriate depending on the situation. Advocate for the release of
    libre training data, retraining from scratch, license changes etc.

    PS: I note that we already have Toxic Candy models in Debian main.
    For example the rnnoise model was trained from proprietary data
    but is available in Debian source packages:

    $ apt-file search -I dsc rnnoise

    --
    bye,
    pabs

    https://wiki.debian.org/PaulWise

  • From M. Zhou@21:1/5 to Paul Wise on Sun Jan 16 03:40:01 2022
    Hi Paul,

    Thanks for the additional questions.

    On Sun, 2022-01-16 at 09:43 +0800, Paul Wise wrote:
    On Wed, 2022-01-12 at 19:00 -0500, M. Zhou wrote:

    I've had some discussions with several fellow developers on suggesting Debian to buy some GPUs to extend its infrastructures for better GPU support.

    Was there a plan for what to use these GPUs for?

    No specific plan, but I can list some possible uses if we had one.

    Assuming it's an nvidia GPU, we can use it for

    1. building and testing cuda-related computational software,
    such as tensorflow-cuda, pytorch-cuda, magma, etc.
    (this demand is confirmed by us, the Debian deep learning team)

    2. building and testing some multimedia tools, such as ffmpeg
    (when linked against nvidia's library, the resulting ffmpeg
    binary is not redistributable).
    (this demand is not confirmed with multimedia team)

    3. building and testing GPU acceleration for software such as
    blender.
    (not confirmed with maintainer)

    4. transcoding our videos (e.g., our debconf videos.)
    (not confirmed with debconf team)

    5. training neural networks?
    (Such demand should be quite rare, given my viewpoint
    in the original post)

    And the problem is that nvidia-driver is non-free. It is
    unavoidable for any upper-layer application. The open source
    driver nouveau cannot do any of the above.

    Assuming it's an AMD GPU, we can use it for

    1. building and testing ROCm (AMD's open source counterpart
    to CUDA). It looks like the amdgpu driver in the kernel
    is enough to drive ROCm without requiring a non-free blob.
    (I'm not sure whether firmware is still required)
    (people on debian-ai@l.d.o have recently been working on packaging it)

    2. some deep learning frameworks, such as pytorch, have added
    ROCm support; we can build and test them

    3. build/test any software with OpenCL support, such as
    opencv, etc. So we don't have to do everything with pocl.

    4. and 5. same as nvidia's 4 and 5.

    Assuming it's an Intel GPU,

    I simply don't know. Let's wait and see the news.
    Intel is working on SYCL (an abstraction layer over OpenCL);
    its implementation is called DPC++ by the upstream. Intel has
    not yet merged SYCL into LLVM upstream.

    Were they needed for driver/other package building/testing?

    A non-free driver is required for nvidia GPUs. Unfortunately, for
    industry users (especially machine learning users), nvidia GPUs
    are the most widely supported and mature option.

    The kernel already has the driver for AMD GPUs. I'm just not sure
    whether firmware is required to run ROCm or OpenCL etc.

    Were they to be used for libre model training?

    As long as we finish the deep learning framework packaging
    with specific hardware support, we can do so -- as long as
    we have the corresponding "libre" data.



    This isn't any different to most modern hardware devices, which either
    have non-free blobs embedded in them or have non-free blobs uploaded to
    them or both. Even worse, server hardware often requires proprietary
    software running in userspace to manage parts of the server. The modern
    hardware industry does not produce hardware that allows Debian to avoid
    dealing with these blobs in some way. GPUs aren't any different here
    IMO. Things may change with RISC-V, OpenBMC and other efforts though.

    I still remember the microcode example from the last discussion,
    and it's true. But the proprietary server software is unavoidable
    if the server is to be fully functional, while a GPU is not.
    An infra server can be fully functional without a GPU -- a GPU
    is not a necessity.



    Based on my interpretation, it means Debian might step aside from the
    world of AI applications to fully exercise software freedom. It's a
    pity but Debian's major role in the whole thing is a solid system.

    I think we should simply follow our social contract and guidelines as
    usual. Package useful things, but place them in contrib or non-free as
    appropriate depending on the situation. Advocate for the release of
    libre training data, retraining from scratch, license changes etc.

    Yes, recalling our initial motivation and principles is a very good
    idea when facing complicated issues. I fully agree.

    PS: I note that we already have Toxic Candy models in Debian main.
    For example the rnnoise model was trained from proprietary data
    but is available in Debian source packages:

    $ apt-file search -I dsc rnnoise


    Well... right. I've seen related bug reports. Thanks!
