• [gentoo-dev] RFC: banning "AI"-backed (LLM/GPT/whatever) contributions

From Michał Górny@21:1/5 to All on Tue Feb 27 15:50:01 2024
    Hello,

    Given the recent spread of the "AI" bubble, I think we really need to
    look into formally addressing the related concerns. In my opinion,
    at this point the only reasonable course of action would be to safely
    ban "AI"-backed contribution entirely. In other words, explicitly
    forbid people from using ChatGPT, Bard, GitHub Copilot, and so on, to
    create ebuilds, code, documentation, messages, bug reports and so on for
    use in Gentoo.

    Just to be clear, I'm talking about our "original" content. We can't do
    much about upstream projects using it.


    Rationale:

    1. Copyright concerns. At this point, the copyright situation around
    generated content is still unclear. What's pretty clear is that pretty
    much all LLMs are trained on huge corpora of copyrighted material, and
    all fancy "AI" companies don't give shit about copyright violations.
    In particular, there's a good risk that these tools would yield stuff we
    can't legally use.

    2. Quality concerns. LLMs are really great at generating plausibly
    looking bullshit. I suppose they can provide good assistance if you are careful enough, but we can't really rely on all our contributors being
    aware of the risks.

    3. Ethical concerns. As pointed out above, the "AI" corporations don't
    give shit about copyright, and don't give shit about people. The AI
    bubble is causing huge energy waste. It is giving a great excuse for
    layoffs and increasing exploitation of IT workers. It is driving enshittification of the Internet, it is empowering all kinds of spam
    and scam.


    Gentoo has always stood out as something different, something that
    worked for people for whom mainstream distros were lacking. I think
    adding "made by real people" to the list of our advantages would be
    a good thing — but we need to have policies in place, to make sure shit doesn't flow in.

    Compare with the shitstorm at:
    https://github.com/pkgxdev/pantry/issues/5358

    --
    Best regards,
    Michał Górny



    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
• From Arsen Arsenović@21:1/5 to mgorny@gentoo.org on Tue Feb 27 16:20:01 2024
    Michał Górny <mgorny@gentoo.org> writes:

> Given the recent spread of the "AI" bubble, I think we really need to
> look into formally addressing the related concerns. [...]

+1. All I've seen from "generative" (read: auto-plagiarizing) A"I" is
spam and theft, and I have every intention of blocking it wherever my
vote counts.
    --
    Arsen Arsenović


  • From Kenton Groombridge@21:1/5 to All on Tue Feb 27 16:30:01 2024
    On 24/02/27 03:45PM, Michał Górny wrote:
> Given the recent spread of the "AI" bubble, I think we really need to
> look into formally addressing the related concerns. [...]


    I completely agree.

    Your rationale hits the most important concerns I have about these
    technologies in open source. There is a significant opportunity for
    Gentoo to set the example here.

    --
    Kenton Groombridge
    Gentoo Linux Developer, SELinux Project

  • From Alex Boag-Munroe@21:1/5 to Kenton Groombridge on Tue Feb 27 16:40:04 2024
    On Tue, 27 Feb 2024 at 15:21, Kenton Groombridge <concord@gentoo.org> wrote:

> On 24/02/27 03:45PM, Michał Górny wrote:
> > Given the recent spread of the "AI" bubble, I think we really need
> > to look into formally addressing the related concerns. [...]
>
> I completely agree.
>
> Your rationale hits the most important concerns I have about these
> technologies in open source. There is a significant opportunity for
> Gentoo to set the example here.
>
> --
> Kenton Groombridge
> Gentoo Linux Developer, SELinux Project

    A thousand times yes.

  • From Marek Szuba@21:1/5 to All on Tue Feb 27 17:20:01 2024
On 2024-02-27 14:45, Michał Górny wrote:

> In my opinion, at this point the only reasonable course of action
> would be to safely ban "AI"-backed contribution entirely. In other
> words, explicitly forbid people from using ChatGPT, Bard, GitHub
> Copilot, and so on, to create ebuilds, code, documentation, messages,
> bug reports and so on for use in Gentoo.

I very much support this idea, for all the three reasons quoted.

> 2. Quality concerns. LLMs are really great at generating plausibly
> looking bullshit. I suppose they can provide good assistance if you
> are careful enough, but we can't really rely on all our contributors
> being aware of the risks.

https://arxiv.org/abs/2211.03622

> 3. Ethical concerns.

...yeah. Seeing as we failed to condemn the Russian invasion of Ukraine
in 2022, I would probably avoid quoting this as a reason for banning
LLM-generated contributions. Even though I do, as mentioned above, very
much agree with this point.

--
Marecki


  • From Sam James@21:1/5 to Marek Szuba on Tue Feb 27 17:40:02 2024
    Marek Szuba <marecki@gentoo.org> writes:

> On 2024-02-27 14:45, Michał Górny wrote:
> [...]
> ...yeah. Seeing as we failed to condemn the Russian invasion of
> Ukraine in 2022, I would probably avoid quoting this as a reason for
> banning LLM-generated contributions. Even though I do, as mentioned
> above, very much agree with this point.

    That's not a technical topic and we had an extended discussion about
    what to do in -core, which included the risks of making life difficult
    for Russian developers and contributors.

    I don't think that's a helpful intervention here, sorry.

  • From Andreas K. Huettel@21:1/5 to All on Tue Feb 27 17:48:31 2024
Copy: mgorny@gentoo.org (Michał Górny)

    Am Dienstag, 27. Februar 2024, 15:45:17 CET schrieb Michał Górny:
> Hello,
>
> Given the recent spread of the "AI" bubble, I think we really need to
> look into formally addressing the related concerns. In my opinion,
> at this point the only reasonable course of action would be to safely
> ban "AI"-backed contribution entirely. In other words, explicitly
> forbid people from using ChatGPT, Bard, GitHub Copilot, and so on, to
> create ebuilds, code, documentation, messages, bug reports and so on
> for use in Gentoo.

Fully agree and support this.

> Just to be clear, I'm talking about our "original" content. We can't
> do much about upstream projects using it.

[...] or implementing it.

So, also, no objections against someone (a real person, working by
their own mental means) packaging AI software for Gentoo.


    --
    Andreas K. Hüttel
    dilfridge@gentoo.org
    Gentoo Linux developer
    (council, toolchain, base-system, perl, libreoffice)
  • From Ionen Wolkens@21:1/5 to All on Tue Feb 27 18:10:01 2024
    On Tue, Feb 27, 2024 at 03:45:17PM +0100, Michał Górny wrote:
> Given the recent spread of the "AI" bubble, I think we really need to
> look into formally addressing the related concerns. [...]

+1 from me, a clear stance before it really starts hitting Gentoo
sounds good.
    --
    ionen


  • From Matthias Maier@21:1/5 to mgorny@gentoo.org on Tue Feb 27 18:50:01 2024
    On Tue, Feb 27, 2024, at 08:45 CST, Michał Górny <mgorny@gentoo.org> wrote:

> Given the recent spread of the "AI" bubble, I think we really need to
> look into formally addressing the related concerns. [...]

    +1


> 2. Quality concerns. LLMs are really great at generating plausibly
> looking bullshit. I suppose they can provide good assistance if you
> are careful enough, but we can't really rely on all our contributors
> being aware of the risks.

    This is my main concern, but all of the other points are valid as well.


    Best,
    Matthias

  • From Rich Freeman@21:1/5 to mgorny@gentoo.org on Tue Feb 27 18:50:01 2024
    On Tue, Feb 27, 2024 at 9:45 AM Michał Górny <mgorny@gentoo.org> wrote:

> Given the recent spread of the "AI" bubble, I think we really need to
> look into formally addressing the related concerns.

> 1. Copyright concerns.

    I do think it makes sense to consider some of this.

    However, I feel like the proposal is redundant with the existing
    requirement to signoff on the DCO, which says:

    By making a contribution to this project, I certify that:

    1. The contribution was created in whole or in part by me, and
    I have the right to submit it under the free software license
    indicated in the file; or

    2. The contribution is based upon previous work that, to the best of
    my knowledge, is covered under an appropriate free software license,
    and I have the right under that license to submit that work with
    modifications, whether created in whole or in part by me, under the
    same free software license (unless I am permitted to submit under a
    different license), as indicated in the file; or

    3. The contribution is a license text (or a file of similar nature),
    and verbatim distribution is allowed; or

    4. The contribution was provided directly to me by some other person
    who certified 1., 2., 3., or 4., and I have not modified it.

    Perhaps we ought to just re-advertise the policy that already exists?
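For reference, the existing policy Rich refers to is the GLEP 76
Signed-off-by certification; a minimal sketch of the mechanics (the
repository, names, and file here are invented for illustration):

```shell
# Demonstrate how the DCO signoff is attached to a commit.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
git config user.name "Larry The Cow"
git config user.email "larry@example.org"
echo 'DESCRIPTION="demo"' > demo.ebuild
git add demo.ebuild
# -s / --signoff appends a "Signed-off-by: Name <email>" trailer,
# which is the contributor's DCO certification.
git commit -q -s -m "demo: add ebuild"
git log -1 --format=%B | grep "Signed-off-by: Larry The Cow"
```

The trailer is what reviewers check for; it carries the certification
quoted above, whatever tool produced the diff.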

> 2. Quality concerns.

    As far as quality is concerned, I again share the concerns you raise,
    and I think we should just re-emphasize what many other industries are
    already making clear - that individuals are responsible for the
    quality of their contributions. Copy/pasting it blindly from an AI is
    no different from copy/pasting it from some other random website, even
    if it is otherwise legal.

> 3. Ethical concerns.

    I think it is best to just avoid taking a stand on this. Our ethics
    are already documented in the Social Contract.

    I think everybody agrees that what is right and wrong is obvious and
    clear and universal. Then we're all shocked to find that large
    numbers of people have a universal perspective different from our own.
    Even if 90% of contributors agree with a particular position, if we
    start lopping off parts of our community 10% at a time we'll probably
    find ourselves alone in a room sooner or later. We can't make every
    hill the one to die on.

> I think adding "made by real people" to the list of our advantages
> would be a good thing

    Somehow I doubt this is going to help us steal market share from the
    numerous other popular source-based Linux distros. :)

    To be clear, I don't think it is a bad idea to just reiterate that we
    aren't looking for help from people who want to create scripts that
    pipe things into some GPT API and pipe the output into a forum, bug,
    issue, PR, or commit. I've seen other FOSS projects struggling with
    people trying to be "helpful" in this way. I just don't think any of
    this actually requires new policy. If we find our policy to be
    inadequate I think it is better to go back to the core principles and
    better articulate what we're trying to achieve, rather than adjust it
    to fit the latest fashions.

    --
    Rich

  • From Roy Bamford@21:1/5 to All on Tue Feb 27 19:00:01 2024
    On 2024.02.27 14:45, Michał Górny wrote:
> Given the recent spread of the "AI" bubble, I think we really need to
> look into formally addressing the related concerns. [...]



    Michał,

    An excellent piece of prose setting out the rationale.
    I fully support it.

    --
    Regards,

    Roy Bamford
    (Neddyseagoon) a member of
    elections
    gentoo-ops
    forum-mods
    arm64

  • From Ulrich Mueller@21:1/5 to All on Tue Feb 27 19:10:02 2024
    On Tue, 27 Feb 2024, Rich Freeman wrote:

> On Tue, Feb 27, 2024 at 9:45 AM Michał Górny <mgorny@gentoo.org> wrote:
>
> > Given the recent spread of the "AI" bubble, I think we really need
> > to look into formally addressing the related concerns.

    First of all, I fully support mgorny's proposal.

> > 1. Copyright concerns.
>
> I do think it makes sense to consider some of this.
>
> However, I feel like the proposal is redundant with the existing
> requirement to signoff on the DCO, which says:
> [...]

    I have been thinking about this aspect too. Certainly there is some
    overlap with our GLEP 76 policy, but I don't think that it is redundant.

I'd rather see it as a (much needed) clarification of how to deal with
AI-generated code. All the better if the proposal happens to agree with
policies that are already in place.

    Ulrich


  • From Sam James@21:1/5 to mgorny@gentoo.org on Tue Feb 27 19:10:02 2024
    Michał Górny <mgorny@gentoo.org> writes:

> Given the recent spread of the "AI" bubble, I think we really need to
> look into formally addressing the related concerns. [...]
>
> Just to be clear, I'm talking about our "original" content. We can't
> do much about upstream projects using it.


    I agree with the proposal, just some thoughts below.

    I'm a bit worried this is slightly performative - which is not a dig at
    you at all - given we can't really enforce it, and it requires honesty,
    but that's also not a reason to not try ;)


> 1. Copyright concerns. At this point, the copyright situation around
> generated content is still unclear. [...] In particular, there's a
> good risk that these tools would yield stuff we can't legally use.


It also creates risk for anyone basing products or tools on Gentoo if
we're not confident about the integrity / provenance of our work.

> [...]


    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Kenton Groombridge@21:1/5 to Ulrich Mueller on Tue Feb 27 19:30:01 2024
    On 24/02/27 07:07PM, Ulrich Mueller wrote:
    On Tue, 27 Feb 2024, Rich Freeman wrote:

    On Tue, Feb 27, 2024 at 9:45 AM Michał Górny <mgorny@gentoo.org> wrote:

    Given the recent spread of the "AI" bubble, I think we really need to
    look into formally addressing the related concerns.

    First of all, I fully support mgorny's proposal.

    1. Copyright concerns.

    I do think it makes sense to consider some of this.

    However, I feel like the proposal is redundant with the existing requirement to sign off on the DCO, which says:

    By making a contribution to this project, I certify that:

    1. The contribution was created in whole or in part by me, and
    I have the right to submit it under the free software license
    indicated in the file; or

    2. The contribution is based upon previous work that, to the best of
    my knowledge, is covered under an appropriate free software license,
    and I have the right under that license to submit that work with
    modifications, whether created in whole or in part by me, under the
    same free software license (unless I am permitted to submit under a
    different license), as indicated in the file; or

    3. The contribution is a license text (or a file of similar nature),
    and verbatim distribution is allowed; or

    4. The contribution was provided directly to me by some other person
    who certified 1., 2., 3., or 4., and I have not modified it.
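    For reference, these clauses are certified via git's standard sign-off trailer. A minimal sketch, using a throwaway repository and example identity (nothing Gentoo-specific):

    ```shell
    # "git commit -s" appends the Signed-off-by trailer that asserts the
    # certificate of origin quoted above; repo and identity are examples.
    repo=$(mktemp -d)
    cd "$repo"
    git init -q .
    git config user.name "Larry the Cow"
    git config user.email "larry@example.org"
    echo demo > file.txt
    git add file.txt
    git commit -q -s -m "app-misc/demo: add example file"
    # The last line of the commit message is now the trailer:
    git log -1 --format=%B | tail -n1
    # -> Signed-off-by: Larry the Cow <larry@example.org>
    ```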

    I have been thinking about this aspect too. Certainly there is some
    overlap with our GLEP 76 policy, but I don't think that it is redundant.

    I'd rather see it as a (much needed) clarification of how to deal with AI-generated code. All the better if the proposal happens to agree with
    policies that are already in place.

    Ulrich

    This is my interpretation of it as well, especially when it comes to
    para. 2:

    2. The contribution is based upon previous work that, to the best of
    my knowledge, is covered under an appropriate free software license,
    [...]

    It is extremely difficult (if not impossible) to verify this with some of
    these tools, and that's assuming that the user of these tools knows
    enough about how they work for this to be a concern to them. I would
    argue it's best to stay away from these tools at least until there is a
    clearer legal interpretation of their usage in relation to copyright.

    --
    Kenton Groombridge
    Gentoo Linux Developer, SELinux Project

  • From Peter =?ISO-8859-1?Q?B=F6hm?=@21:1/5 to All on Tue Feb 27 19:50:01 2024
    Am Dienstag, 27. Februar 2024, 18:50:15 CET schrieb Roy Bamford:
    On 2024.02.27 14:45, Michał Górny wrote:
    Hello,

    [...]

    Gentoo has always stood out as something different, something that
    worked for people for whom mainstream distros were lacking. I think
    adding "made by real people" to the list of our advantages would be
    a good thing — but we need to have policies in place, to make sure
    shit
    doesn't flow in.

    Compare with the shitstorm at: https://github.com/pkgxdev/pantry/issues/5358

    Michał,

    An excellent piece of prose setting out the rationale.
    I fully support it.

    I would like to add the following:

    Last year we had a chatbot in our Gentoo forum that posted 76 posts on 2024-12-19. An inexperienced moderator (me) then asked his colleagues on the basis of which forum rules we could ban this chatbot:

    "Do we have a rule somewhere that an AI and a chatbot are not allowed to log in? I have read our Guidelines ( https://forums.gentoo.org/viewtopic-t-525.html ) and found no such prohibition. On what basis could we even block
    a chatbot?"

    The answer from two experienced colleagues was that this is already covered by our forum rules, because chatbots usually cannot (yet) fulfill the requirements
    of a forum post and therefore violate our Guidelines.

    To be honest, I asked myself at the time what would happen if we had a clearly recognizable AI as a user that made (reasonably) sensible posts. We would then have no chance of banning this AI user without an explicit prohibition. I would be much more comfortable if we clearly communicated that we do not accept an AI as a user.

    Yes, I would also be very happy to see this proposal implemented.

    --
    Best regards,
    Peter (aka pietinger)

  • From =?UTF-8?Q?Micha=C5=82_G=C3=B3rny?=@21:1/5 to Oskari Pirhonen on Wed Feb 28 04:20:02 2024
    On Tue, 2024-02-27 at 21:05 -0600, Oskari Pirhonen wrote:
    What about cases where someone, say, doesn't have an excellent grasp of English and decides to use, for example, ChatGPT to aid in writing documentation/comments (not code) and puts a note somewhere explicitly mentioning what was AI-generated so that someone else can take a closer
    look?

    I'd personally not be the biggest fan of this if it wasn't in something
    like a PR or ml post where it could be reviewed before being made final.
    But the most important part IMO would be being up-front about it.

    I'm afraid that wouldn't help much. From my experience, it would be
    less effort for us to help write it from scratch than to try to
    untangle whatever verbose shit ChatGPT generates. Especially as
    a person with a poor grasp of the language could have trouble telling
    whether the generated text is actually meaningful.

    --
    Best regards,
    Michał Górny


  • From Ulrich Mueller@21:1/5 to All on Wed Feb 28 11:10:02 2024
    On Wed, 28 Feb 2024, Michał Górny wrote:

    On Tue, 2024-02-27 at 21:05 -0600, Oskari Pirhonen wrote:
    What about cases where someone, say, doesn't have an excellent grasp of
    English and decides to use, for example, ChatGPT to aid in writing
    documentation/comments (not code) and puts a note somewhere explicitly
    mentioning what was AI-generated so that someone else can take a closer
    look?

    I'd personally not be the biggest fan of this if it wasn't in something
    like a PR or ml post where it could be reviewed before being made final.
    But the most important part IMO would be being up-front about it.

    I'm afraid that wouldn't help much. From my experience, it would be
    less effort for us to help write it from scratch than to try to
    untangle whatever verbose shit ChatGPT generates. Especially as
    a person with a poor grasp of the language could have trouble telling
    whether the generated text is actually meaningful.

    But where do we draw the line? Are translation tools like DeepL [1]
    allowed? I don't see much of a copyright issue for these.

    Ulrich

    [1] https://www.deepl.com/translator

  • From =?UTF-8?Q?Micha=C5=82_G=C3=B3rny?=@21:1/5 to Ulrich Mueller on Wed Feb 28 14:20:02 2024
    On Wed, 2024-02-28 at 11:08 +0100, Ulrich Mueller wrote:
    On Wed, 28 Feb 2024, Michał Górny wrote:

    On Tue, 2024-02-27 at 21:05 -0600, Oskari Pirhonen wrote:
    What about cases where someone, say, doesn't have an excellent grasp of English and decides to use, for example, ChatGPT to aid in writing documentation/comments (not code) and puts a note somewhere explicitly mentioning what was AI-generated so that someone else can take a closer look?

    I'd personally not be the biggest fan of this if it wasn't in something like a PR or ml post where it could be reviewed before being made final. But the most important part IMO would be being up-front about it.

    I'm afraid that wouldn't help much. From my experience, it would be
    less effort for us to help write it from scratch than to try to
    untangle whatever verbose shit ChatGPT generates. Especially as
    a person with a poor grasp of the language could have trouble telling
    whether the generated text is actually meaningful.

    But where do we draw the line? Are translation tools like DeepL allowed?
    I don't see much of a copyright issue for these.

    I have a strong suspicion that these translation tools are trained
    on copyrighted translations of books and other copyrighted material.

    --
    Best regards,
    Michał Górny


  • From Arthur Zamarin@21:1/5 to All on Wed Feb 28 20:00:02 2024

    On 27/02/2024 16.45, Michał Górny wrote:
    Hello,

    Given the recent spread of the "AI" bubble, I think we really need to
    look into formally addressing the related concerns. In my opinion,
    at this point the only reasonable course of action would be to safely
    ban "AI"-backed contribution entirely. In other words, explicitly
    forbid people from using ChatGPT, Bard, GitHub Copilot, and so on, to
    create ebuilds, code, documentation, messages, bug reports and so on for
    use in Gentoo.

    Just to be clear, I'm talking about our "original" content. We can't do
    much about upstream projects using it.

    I support this motion.


    Rationale:

    1. Copyright concerns. At this point, the copyright situation around generated content is still unclear. What's pretty clear is that pretty
    much all LLMs are trained on huge corpora of copyrighted material, and
    all fancy "AI" companies don't give shit about copyright violations.
    In particular, there's a good risk that these tools would yield stuff we can't legally use.

    I know that GitHub Copilot can be limited to licenses, and even to just
    the current repository. Even so, I'm not sure that the copyright can
    be attributed to "me" rather than to the "AI" - so it's still a gray area.

    2. Quality concerns. LLMs are really great at generating plausibly
    looking bullshit. I suppose they can provide good assistance if you are careful enough, but we can't really rely on all our contributors being
    aware of the risks.

    Let me tell a story. I was interested in whether I could teach an LLM
    the ebuild format, as a possible helper tool for devs/non-devs. My
    prompt got huge, as I was teaching it all the details of ebuilds, where
    to feed in the source code (eclasses), and such. At one point, it even
    managed to output a close-enough Python distutils-r1 ebuild - the same
    level that `vim dev-python/${PN}/${PN}-${PV}.ebuild` creates using the
    Gentoo template. Yes, my long work resulted in no gain.

    For every other ebuild type - cmake, meson, go, rust - I always got a
    garbage ebuild. Yes, it generated a good DESCRIPTION and HOMEPAGE
    (simple stuff to copy from upstream) and even about 60% accuracy for LICENSE.
    But did you know we have an "intel80386" arch for KEYWORDS? That we can
    RESTRICT="install"? That we can use "^cat-pkg/pkg-1" syntax in deps? PATCHES
    with http URLs inside? And the list goes on. Sometimes it was even funny.

    So until a good prompt can be created for Gentoo, upon which we *might*
    reopen the discussion, I strongly support banning AI-generated
    ebuilds. Currently, good templates per category, copying other
    ebuilds as a starting point, or even just skel.ebuild - all three
    options bring much better results and waste less developer time.
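    For comparison, the skeleton such a template yields is short and regular; the package details below are purely illustrative, not taken from the tree:

    ```shell
    # Illustrative distutils-r1 skeleton (hypothetical package; real
    # ebuilds start from skel.ebuild or a category template like this).
    # Copyright 2024 Gentoo Authors
    # Distributed under the terms of the GNU General Public License v2

    EAPI=8

    DISTUTILS_USE_PEP517=setuptools
    PYTHON_COMPAT=( python3_{10..12} )
    inherit distutils-r1

    DESCRIPTION="One-line description of the package"
    HOMEPAGE="https://example.org/project"
    SRC_URI="https://example.org/${P}.tar.gz"

    LICENSE="MIT"
    SLOT="0"
    KEYWORDS="~amd64 ~x86"
    ```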

    3. Ethical concerns. As pointed out above, the "AI" corporations don't
    give shit about copyright, and don't give shit about people. The AI
    bubble is causing huge energy waste. It is giving a great excuse for
    layoffs and increasing exploitation of IT workers. It is driving enshittification of the Internet, it is empowering all kinds of spam
    and scam.


    Many companies that cite AI as a reason for layoffs are just inventing
    a justification out of bad will, or ignorance. The company I work at is
    using AI tools as a boost for productivity, but at all levels of
    management they know that AI can't replace a person - best case, it
    boosts them by 5-10%. The real reason for the current layoffs is the
    tightening of budgets across the industry (just a normal cycle; soon it
    will get better), so management prefers to lay off people other than
    themselves. So yeah, sad world.


    Gentoo has always stood out as something different, something that
    worked for people for whom mainstream distros were lacking. I think
    adding "made by real people" to the list of our advantages would be
    a good thing — but we need to have policies in place, to make sure shit doesn't flow in.

    Compare with the shitstorm at:
    https://github.com/pkgxdev/pantry/issues/5358


    Great read, so much WTF. This whole repo is just a cluster of AIs
    competing against each other.

    --
    Arthur Zamarin
    arthurzam@gentoo.org
    Gentoo Linux developer (Python, pkgcore stack, Arch Teams, GURU)


  • From Rich Freeman@21:1/5 to arthurzam@gentoo.org on Wed Feb 28 20:30:01 2024
    On Wed, Feb 28, 2024 at 1:50 PM Arthur Zamarin <arthurzam@gentoo.org> wrote:

    I know that GitHub Copilot can be limited to licenses, and even to just
    the current repository. Even though, I'm not sure that the copyright can
    be attributed to "me" and not the "AI" - so still gray area.

    So, AI copyright is a bit of a poorly defined area simply due to a
    lack of case law. I'm not all that confident that courts won't make
    an even bigger mess of it.

    There are half a dozen different directions I think a court might rule
    on the matter of authorship and derived works, but I think it is VERY
    unlikely that a court will rule that the copyright will be attributed
    to the AI itself, or that the AI itself ever was an author or held any
    legal rights to the work at any point in time. An AI is not a legal
    entity. The company that provides the service, its
    employees/developers, the end user, and the authors and copyright
    holders of works used to train the AI are all entities a court is
    likely to consider as having some kind of a role.

    That said, we live in a world where it isn't even clear if APIs can be copyrighted, though in practice enforcing such a copyright might be
    impossible. It could be a while before AI copyright concerns are
    firmly settled. When they are, I suspect it will be done in a way
    that frustrates just about everybody on every side...

    IMO the main risk to an organization (especially a transparent one
    like ours) from AI code isn't even whether it is copyrightable or not,
    but rather getting pulled into arguments and debates and possibly
    litigation over what is likely to be boilerplate code that needs a lot
    of cleanup anyway. Even if you "win" in court or the court of public
    opinion, the victory can be pyrrhic.

    --
    Rich

  • From Zoltan Puskas@21:1/5 to All on Fri Mar 1 07:40:01 2024
    Hi,

    Compare with the shitstorm at:
    https://github.com/pkgxdev/pantry/issues/5358

    Thank you for this, it made my day.

    Though I'm just a proxy maintainer for now, I also support this initiative; there should be some guard rails set up around LLM usage.

    1. Copyright concerns. At this point, the copyright situation around generated content is still unclear. What's pretty clear is that pretty
    much all LLMs are trained on huge corpora of copyrighted material, and
    all fancy "AI" companies don't give shit about copyright violations.
    In particular, there's a good risk that these tools would yield stuff we can't legally use.

    IANAL, but IMHO if we stop respecting copyright law, even if indirectly via LLMs, why should we expect others to respect our licenses? It could be prudent to wait and see where this will land.

    2. Quality concerns. LLMs are really great at generating plausibly
    looking bullshit. I suppose they can provide good assistance if you are careful enough, but we can't really rely on all our contributors being
    aware of the risks.

    From my personal experience of using GitHub Copilot fine-tuned on a large private code base, it functions mostly okay as a smarter auto-complete on a single line of code, but when it comes to multiple lines, even filling out boilerplate, it's at best a 'meh'. The problem is that while the output looks okay-ish, it often has subtle mistakes or
    hallucinates random additional stuff not relevant to the source file in question, so one ends up having to read and analyze the entire output of the LLM
    to fix problems with the code. I found that the mental and time overhead rarely makes it worth it, especially when a template can do a better job (e.g. this would be the case for ebuilds).

    Since during reviews we are supposed to be reading the entire contribution anyway, I'm not sure how much difference this makes, but I can see a developer who trusts an LLM
    too much ending up outsourcing the checking of the code to the reviewers, which means we need to be extra vigilant; it could lead to reduced trust in contributions.

    3. Ethical concerns. As pointed out above, the "AI" corporations don't
    give shit about copyright, and don't give shit about people. The AI
    bubble is causing huge energy waste. It is giving a great excuse for
    layoffs and increasing exploitation of IT workers. It is driving enshittification of the Internet, it is empowering all kinds of spam
    and scam.

    I agree. I'm already tired of AI-generated blog spam and so forth; it's such a waste of time and quite annoying. I'd rather not have that on our wiki pages too. The purpose of documenting things is to explain an area to someone new to it, or to write down unique quirks of a setup or a system. Since LLMs cannot write new original things, just rehash information they have seen, I'm not sure how they could be helpful for this at all, to be honest.

    Overall, my time is too valuable to sift through AI-generated BS when I'm trying
    to solve a problem; I'd prefer we keep well-curated, high-quality documentation
    where possible.

    Zoltan

  • From Sam James@21:1/5 to Matt Jolly on Fri Mar 1 08:10:02 2024
    Matt Jolly <kangie@gentoo.org> writes:

    But where do we draw the line? Are translation tools like DeepL
    allowed? I don't see much of a copyright issue for these.

    I'd also like to jump in and play devil's advocate. There's a fair
    chance that this is because I just got back from a
    supercomputing/research conf where LLMs were the hot topic in every keynote.

    As mentioned by Sam, this RFC is performative. Any users that are going
    to abuse LLMs are going to do it _anyway_, regardless of the rules. We already rely on common sense to filter these out; we're always going to
    have BS/spam PRs and bugs - I don't really think that content generated by an LLM is any worse.

    This doesn't mean that I think we should blanket allow poor-quality LLM contributions. It's especially important that we take into account the potential for bias, factual errors, and outright plagiarism when these
    tools are used incorrectly. We already have methods for weeding out low quality contributions and bad faith contributors - let's trust in these
    and see what we can do to strengthen these tools and processes.

    A bit closer to home for me, what about using a LLMs as an assistive technology / to reduce boilerplate? I'm recovering from RSI - I don't
    know when (if...) I'll be able to type like I used to again. If a model
    is able to infer some mostly salvageable boilerplate from its context
    window I'm going to use it and spend the effort I would writing that to
    fix something else; an outright ban on LLM use will reduce my _ability_
    to contribute to the project.

    Another person approached me after this RFC and asked whether tooling restricted to the current repo would be okay. For me, that'd be mostly acceptable, given it won't make suggestions based on copyrighted code.

    I also don't have a problem with LLMs being used to help refine commit
    messages as long as someone is being sensible about it (e.g. if, as in
    your situation, you know what you want to say but you can't type much).

    I don't know how to phrase a policy off the top of my head which allows
    those two things but not the rest.


    What about using a LLM for code documentation? Some models can do a
    passable job of writing decent quality function documentation and, in production, I _have_ caught real issues in my logic this way. Why should
    I type that out (and write what I think the code does rather than what
    it actually does) if an LLM can get 'close enough' and I only need to do light editing?

    I suppose in that sense, it's the same as blindly listening to any
    linting tool or warning without understanding what it's flagging and if
    it's correct.

    [...]
    As a final not-so-hypothetical, what about a LLM trained on Gentoo docs
    and repos, or more likely trained on exclusively open-source
    contributions and fine-tuned on Gentoo specifics? I'm in the process of spinning up several models at work to get a handle on the tech / turn
    more electricity into heat - this is a real possibility (if I can ever
    find the time).

    I think that'd be interesting. It also does a good job as a rhetorical
    point wrt the policy being a bit too blanket here.

    See https://www.softwareheritage.org/2023/10/19/swh-statement-on-llm-for-code/ too.


    The cat is out of the bag when it comes to LLMs. In my real-world job I
    talk to scientists and engineers using these things (for their
    strengths) to quickly iterate on designs, to summarise experimental
    results, and even to generate testable hypotheses. We're only going to
    see increasing use of this technology going forward.

    TL;DR: I think this is a bad idea. We already have effective mechanisms
    for dealing with spam and bad faith contributions. Banning LLM use by
    Gentoo contributors at this point is just throwing the baby out with the bathwater.

    The problem is that in FOSS, a lot of people are getting flooded with AI
    spam and therefore have little regard for any possibly-good parts of it.

    I count myself as part of that group - it's very much sludge and I feel
    tired just seeing it talked about at the moment.

    Is that super rational? No, but we're also volunteers and it's not
    unreasonable for said volunteers to then say "well I don't want any more
    of that".

    I think this colours a lot of the responses here, and it doesn't
    invalidate them, but it also explains why nobody is really interested
    in being open to this for now. Who can blame them (me included)?


    As an alternative, I'd be very happy with some guidelines for the use of LLMs
    and other assistive technologies, like "Don't use LLM code snippets
    unless you understand them", "Don't blindly copy and paste LLM output",
    or, my personal favourite, "Don't be a jerk to our poor bug wranglers".

    A blanket "No completely AI/LLM generated works" might be fine, too.

    Let's see how the legal issues shake out before we start pre-emptively banning useful tools. There's a lot of ongoing action in this space - at
    the very least I'd like to see some thorough discussion of the legal
    issues separately if we're making a case for banning an entire class of technology.

    I'm sympathetic to the arguments you've made here and I don't want to
    act like this sinks your whole argument (it doesn't), but this is
    typically not how legal issues are approached. People act conservatively
    if there's risk to them, not the other way around ;)

    [...]

    Thanks for making me think a bit more about it and considering some
    use cases I hadn't really thought about.

    I still don't really want ebuilds generated by LLMs, but I could live
    with:
    a) LLMs being used to refine commit messages;
    b) LLMs being used if restricted to suggestions from a FOSS-licenced
    codebase

    Matt


    thanks,
    sam

  • From Robin H. Johnson@21:1/5 to All on Tue Mar 5 07:20:01 2024
    (Full disclosure: I presently work for a non-FAANG cloud company
    with a primary business focus in providing GPU access, for AI & other workloads; I don't feel that is a conflict of interest, but understand
    that others might not feel the same way).

    Yes, we need to formally address the concerns.
    However, I don't come to the same conclusion about an outright ban.

    I think we need to:
    1. Short-term, clearly point out why much of the present output
    would violate existing policies, esp. the low-grade garbage output.
    2. Short & medium-term: a time-limited policy saying "no AI-backed
    works temporarily, while waiting for legal precedent", with clear
    guidelines about what the blocking issues are.
    3. Longer-term, produce a policy that shows how AI generation can be
    used for good, in a safe way**.
    4. Keep the human in the loop; no garbage reinforcing garbage.

    Further points inline.

    On Tue, Feb 27, 2024 at 03:45:17PM +0100, Michał Górny wrote:
    Hello,

    Given the recent spread of the "AI" bubble, I think we really need to
    look into formally addressing the related concerns. In my opinion,
    at this point the only reasonable course of action would be to safely
    ban "AI"-backed contribution entirely. In other words, explicitly
    forbid people from using ChatGPT, Bard, GitHub Copilot, and so on, to
    create ebuilds, code, documentation, messages, bug reports and so on for
    use in Gentoo.
    Are there footholds where you see AI tooling as acceptable to you
    today? AI summarization of inputs, if correct & free of hallucinations,
    is likely to be of immediate value. I see this coming up in terms of
    analyzing code backtraces as well as better license analysis tooling.
    The best tools here include citations that should be verified as to why
    the system thinks the outcome is correct: buyer beware if you don't
    verify the citations.

    Just to be clear, I'm talking about our "original" content. We can't do
    much about upstream projects using it.

    Rationale:

    1. Copyright concerns. At this point, the copyright situation around generated content is still unclear. What's pretty clear is that pretty
    much all LLMs are trained on huge corpora of copyrighted material, and
    all fancy "AI" companies don't give shit about copyright violations.
    In particular, there's a good risk that these tools would yield stuff we can't legally use.
    The Gentoo Foundation (and SPI) are both US legal entities. That means
    at least abiding by US copyright law...
    As of this writing, the US Copyright Office says AI-generated
    works are NOT eligible for their *own* copyright registration. The
    outputs are either un-copyrightable or, if they are sufficiently
    similar to existing works, the original copyright stands (with
    license and authorship markings required).

    That's going to be a problem if the EU, UK & other major WIPO members
    come to a different conclusion, but for now, as a US-based organization,
    Gentoo has the rules it must follow.

    The fact that output *might* be uncopyrightable, and NOT tagged as
    such, concerns me as much as the missing attribution & license
    statements. Enough untagged uncopyrightable material MAY invalidate
    larger copyrights.

    Clearer definitions of the distinction between public domain and
    uncopyrightable are also required in our Gentoo documentation (at a
    high level: ineligible vs. not copyrighted vs. expired vs.
    laws/acts-of-government vs. works-of-government, but there is nuance).


    2. Quality concerns. LLMs are really great at generating plausibly
    looking bullshit. I suppose they can provide good assistance if you are careful enough, but we can't really rely on all our contributors being
    aware of the risks.
    100% agree; the quality of output is the largest concern *right now*.
    The consistency of output is strongly related: given similar inputs
    (including best practices not changing over time), a tool should give
    similar outputs.

    How good must the output be to negate this concern?
    Current-state-of-the-art can probably write ebuilds with fewer QA
    violations than most contributors, esp. given automated QA checking
    tools for a positive reinforcement loop.
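    The loop hinted at above can be sketched generically. This is a
    hypothetical illustration, not existing Gentoo tooling: `generate` and
    `check` are stand-ins I've invented; in practice `check` might shell
    out to `pkgcheck scan` and `generate` to whatever model produces the
    ebuild.

    ```python
    # Sketch of a QA "positive reinforcement loop": regenerate a candidate
    # until the checker reports no findings, feeding findings back in.

    def refine_until_clean(generate, check, max_rounds=5):
        """generate(findings) -> str: candidate text (findings is the
        previous round's QA output, empty on round 1).
        check(candidate) -> list[str]: QA findings; [] means clean.
        Returns (candidate, rounds_used); raises if the budget runs out.
        """
        findings = []
        for rounds_used in range(1, max_rounds + 1):
            candidate = generate(findings)
            findings = check(candidate)
            if not findings:
                return candidate, rounds_used
        raise RuntimeError(
            f"still failing QA after {max_rounds} rounds: {findings}")
    ```

    The `max_rounds` cap matters: without it, a model that never satisfies
    the checker would loop forever, which is exactly the "garbage
    reinforcing garbage" failure mode to avoid.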

    Besides the actual output being low-quality, the larger problem is that
    users submitting it don't realize that it's low-quality (or in a few
    cases don't care).

    Gentoo's existing policies may only need tweaks & re-iteration here.
    - GLEP76 does not set out clear guidelines for uncopyrightable works.
    - GLEP76 should have a clarification that asserting GCO/DCO over
    AI-generated works at this time is not acceptable.

    3. Ethical concerns. As pointed out above, the "AI" corporations don't
    give shit about copyright, and don't give shit about people. The AI
    bubble is causing huge energy waste. It is giving a great excuse for
    layoffs and increasing exploitation of IT workers. It is driving enshittification of the Internet, it is empowering all kinds of spam
    and scam.
    Is an ethical AI entity possible? Your argument here is really an
    extension of a much older maxim: "there's no ethical consumption under
    capitalism". This can encompass most tech corporations, AI or not.
    It's just much more readily exposed with AI than with other "big tech"
    movements, because AI, and the name of AI, is being used to do immoral
    & unethical things far more frequently than before.

    A truly ethical AI entity should also not be the outcome of
    rent-seeking behaviors (maybe profit-seeking, but that returns to the
    perils of capitalism).

    The energy-waste argument is also one that needs to be made carefully:
    the training & fine-tuning phases today are an energy waste only when
    compared to the lifetime energy a human uses to learn the same things.
    When training gets more efficient, the human may be the energy waste
    ;-) [1].

    The generation/inference phases may be able to generate correct output
    MUCH more efficiently than a human. If I think of how many times I run
    "ebuild ... test" and "pkgcheck scan" on some packaging, trying to get
    it correct: the AI will be able to do a better job than most
    developers in a reasonable amount of time...

    Gentoo's purpose as an organization is not to be an arbiter of ethics,
    but we can stand against unethical actions. Where is that middle
    ground?

    At the top, I noted that it will be possible in future for AI generation
    to be used in a good, safe way, and we should provide some signals to
    the researchers behind the AI industry on this matter.

    What should it have?
    - The output has correct license & copyright attributions for portions
    that are copyrightable.
    - The output explicitly disclaims copyright for uncopyrightable
    portions (yes, this is a higher bar than we set for humans today).
    - The output is provably correct (QA checks, actually running tests,
    etc.).
    - The output is free of non-functional/nonsense garbage.
    - The output is free of hallucinations (i.e. doesn't invent
    dependencies that don't exist).

    Can you please contribute other requirements that you feel "good" AI output should have?

    [1]
    Citation needed; the best estimate I have:
    https://www.eia.gov/tools/faqs/faq.php?id=85&t=1 => 76 MMBtu/person/year
    https://www.wolframalpha.com/input?i=+76+MMBtu+to+MWh => 22.27 MWh/person/year
    vs.
    Facebook claims the entire model-development energy consumption for
    all 4 sizes of LLaMA was 2,638 MWh:
    https://kaspergroesludvigsen.medium.com/facebook-disclose-the-carbon-footprint-of-their-new-llama-models-9629a3c5c28b

    2638 / 22.27 => 118.45 people
    So development energy was the same as 118 average people doing average
    things for a year (not CompSci students compiling their code many
    times).
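    The footnote's arithmetic checks out; as a quick sanity check (the 76
    MMBtu/person/year EIA figure and Meta's 2,638 MWh LLaMA disclosure are
    the sources' claims, not measurements of mine):

    ```python
    # Verify the person-year equivalence claimed in footnote [1].
    MMBTU_TO_MWH = 0.293071  # 1 MMBtu = 0.293071 MWh

    per_person_mwh = 76 * MMBTU_TO_MWH          # US primary energy use per person/year
    people_equivalent = 2638 / per_person_mwh   # LLaMA training in person-years

    print(f"{per_person_mwh:.2f} MWh/person/year")   # ~22.27
    print(f"{people_equivalent:.1f} person-years")   # ~118.4
    ```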

    The outcome here: don't use AI where a human would be much more
    efficient, unless you have strong reasons why the AI would be better.
    We haven't crossed that threshold YET, but the day is coming,
    especially once costs amortize: training is a rare event compared to
    inference.


    --
    Robin Hugh Johnson
    Gentoo Linux: Dev, Infra Lead, Foundation President & Treasurer
    E-Mail : robbat2@gentoo.org
    GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
    GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136

  • From martin-kokos@21:1/5 to mgorny@gentoo.org on Wed Mar 6 15:00:01 2024
    On Tuesday, February 27th, 2024 at 3:45 PM, Michał Górny <mgorny@gentoo.org> wrote:

    [original proposal snipped]

    Compare with the shitstorm at:
    https://github.com/pkgxdev/pantry/issues/5358

    --
    Best regards,
    Michał Górny

    While I understand the concerns that may have triggered the feeling
    that a rule like this is needed, as someone from the field of machine
    learning (an AI engineer), I feel I need to add my brief opinion.

    The pkgxdev thing is very artificial, and if there is a threat to
    quality/integrity, it will not manifest itself as obviously, which
    brings me to:

    A rule like this is just not enforceable.

    The contributor, as the one who signed off, is responsible for the
    quality of the contribution, whether it was written with a plain
    editor, a dev environment with smart plugins (LSP), or their dog.

    Other organizations have already had to deal with automated
    contributions, which can sometimes go wrong for *all different* kinds
    of reasons, for much longer, and their approach may be an inspiration:
    [0] OpenStreetMap: automated edits - https://wiki.openstreetmap.org/wiki/Automated_Edits_code_of_conduct
    [1] Wikipedia: bot policy - https://en.wikipedia.org/wiki/Wikipedia:Bot_policy
    The AI that we are dealing with right now is just another means of
    automation, after all.

    As a machine learning engineer myself, I was contemplating creating an
    instance of a generative model for my own use from my own data, in
    which case the copyright and ethical points would absolutely not apply.
    Also, there are ethically and copyright-OK language model projects,
    such as Project Bergamot [2], vetted by universities and the EU, and
    also used by Mozilla [3] (one of the prominent ethical-AI proponents).

    Banning all tools just because some might not be up to moral standards
    puts the ones that are at a disadvantage, in our world as a whole.

    [2] Project Bergamot - https://browser.mt/
    [3] Mozilla blog: training translation models - https://hacks.mozilla.org/2022/06/training-efficient-neural-network-models-for-firefox-translations/

    - Martin

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Duncan@21:1/5 to All on Fri Mar 8 05:00:01 2024
    Robin H. Johnson posted on Tue, 5 Mar 2024 06:12:06 +0000 as excerpted:

    The energy waste argument is also one that needs to be made carefully:

    Indeed. In a Gentoo context, condemning AI for the computational
    energy waste? Maybe someone could argue that effectively. That
    someone isn't Gentoo. Something about people living in glass houses
    throwing stones...

    (And overall, I just don't see the original proposal aging well; like
    a regulation that all drivers must carry a buggy-whip... =:^
    Absolutely, tweak existing policies with some added AI context here or
    there, as others have already suggested, but let's leave it at that.)

    --
    Duncan - List replies preferred. No HTML msgs.
    "Every nonfree program has a lord, a master --
    and if you use the program, he is your master." Richard Stallman

  • From Fco. Javier Felix Belmonte@21:1/5 to All on Fri Mar 8 08:20:02 2024
    On 27/2/24 at 15:45, Michał Górny wrote:
    Hello,

    Given the recent spread of the "AI" bubble, I think we really need to
    look into formally addressing the related concerns. In my opinion,
    at this point the only reasonable course of action would be to safely
    ban "AI"-backed contribution entirely. In other words, explicitly
    forbid people from using ChatGPT, Bard, GitHub Copilot, and so on, to
    create ebuilds, code, documentation, messages, bug reports and so on for
    use in Gentoo.

    Just to be clear, I'm talking about our "original" content. We can't do
    much about upstream projects using it.



    I think it would be a big mistake, because in the end we would be
    shooting ourselves in the foot (I use machine translation; it may not
    come out meaning the same thing in English anyway).
    In the end it is a helping tool, and there is always human
    intervention to finish the job.

    In the end we are going to have to live with AIs in all the environments
    of our lives. The sooner we know how to manage them, the more productive
    we will be.




  • From =?UTF-8?Q?Micha=C5=82_G=C3=B3rny?=@21:1/5 to Sam James on Sat Mar 9 16:00:01 2024
    On Tue, 2024-02-27 at 18:04 +0000, Sam James wrote:
    I'm a bit worried this is slightly performative - which is not a dig at
    you at all - given we can't really enforce it, and it requires honesty,
    but that's also not a reason to not try ;)

    I don't think it's really possible or feasible to reliably detect such contributions, and even if it were, I don't think we want to go as far
    as to actively pursue anything that looks like one. The point
    of the policy is rather to make a statement that we don't want these,
    and to kindly ask users not to do that.

    --
    Best regards,
    Michał Górny



  • From =?UTF-8?Q?Micha=C5=82_G=C3=B3rny?=@21:1/5 to Duncan on Sat Mar 9 16:10:01 2024
    On Fri, 2024-03-08 at 03:59 +0000, Duncan wrote:
    Robin H. Johnson posted on Tue, 5 Mar 2024 06:12:06 +0000 as excerpted:

    The energy waste argument is also one that needs to be made carefully:

    Indeed. In a Gentoo context, condemning AI for the computative energy waste? Maybe someone could argue that effectively. That someone isn't Gentoo. Something about people living in glass houses throwing stones...

    Could you support that claim with actual numbers? Particularly, on
    the average energy use specifically due to the use of Gentoo on
    machines vs. the energy use of data centers dedicated purely to
    training LLMs? I'm not even talking about all the energy wasted as a
    result of these LLMs at work.

    --
    Best regards,
    Michał Górny



  • From =?UTF-8?Q?Micha=C5=82_G=C3=B3rny?=@21:1/5 to Sam James on Sat Mar 9 16:10:01 2024
    On Fri, 2024-03-01 at 07:06 +0000, Sam James wrote:
    Another person approached me after this RFC and asked whether tooling restricted to the current repo would be okay. For me, that'd be mostly acceptable, given it won't make suggestions based on copyrighted code.

    I think an important question is: how is it restricted? Are we
    talking about a tool that was clearly trained only on specific code,
    or about a tool that was trained on potentially copyrighted material
    and then artificially restricted to the repository (to paper over the
    concerns)? Can we trust the latter?

    --
    Best regards,
    Michał Górny



  • From Duncan@21:1/5 to All on Sat Mar 9 22:20:01 2024
    Michał Górny posted on Sat, 09 Mar 2024 16:04:58 +0100 as excerpted:

    On Fri, 2024-03-08 at 03:59 +0000, Duncan wrote:
    Robin H. Johnson posted on Tue, 5 Mar 2024 06:12:06 +0000 as excerpted:

    The energy waste argument is also one that needs to be made
    carefully:

    Indeed. In a Gentoo context, condemning AI for the computative energy
    waste? Maybe someone could argue that effectively. That someone isn't
    Gentoo. Something about people living in glass houses throwing
    stones...

    Could you support that claim with actual numbers? Particularly,
    on average energy use specifically due to use of Gentoo on machines vs. energy use of dedicated data centers purely for training LLMs? I'm not
    even talking of all the energy wasted as a result of these LLMs at work.

    Fair question. Actual numbers? No. But...

    I'm not saying don't use gentoo -- I'm a gentooer after all -- I'm
    saying gentoo simply isn't in a good position to condemn AI for its
    energy inefficiency. In fact, I'd claim that in the Gentoo case there
    are demonstrably more energy-efficient practical alternatives (can
    anyone sanely argue otherwise? there are binary distros, after all),
    while in the AI case, for some usage, AI is providing practical
    solutions where there simply /weren't/ practical solutions /at/ /all/
    before. In other cases, availability and scale were practically and
    severely cost-limited compared to the situation with AI. At least in
    those cases, despite high energy usage, AI *is* the most efficient --
    arguably including energy-efficient -- practical alternative, being
    the _only_ practical alternative, at least at scale. Can Gentoo
    _ever_ be called the _only_ practical alternative, at scale or not?

    Overall, I'd suggest that Gentoo is in as bad a situation as AI, or
    worse, in terms of being the most energy-efficient practical
    alternative, so it simply can't credibly make the energy-efficiency
    argument against AI. Debian/RedHat/etc., perhaps; a case could
    reasonably be made at least. Gentoo, no, not credibly.

    That isn't to say that Gentoo can't credibly take an anti-AI position
    based on the /other/ points discussed in-thread. But energy usage is just
    not an argument that can be persuasively made by Gentoo, thereby bringing
    down the credibility of the other arguments made with it that are
    otherwise viable.

    --
    Duncan - List replies preferred. No HTML msgs.
    "Every nonfree program has a lord, a master --
    and if you use the program, he is your master." Richard Stallman

  • From =?UTF-8?Q?Micha=C5=82_G=C3=B3rny?=@21:1/5 to All on Thu Mar 21 16:30:01 2024
    On Tue, 2024-02-27 at 15:45 +0100, Michał Górny wrote:
    [original proposal snipped]

    Since I've been asked to flesh out a specific motion, here's what I
    propose specifically:

    """
    It is expressly forbidden to contribute to Gentoo any content that has
    been created with the assistance of Natural Language Processing
    artificial intelligence tools. This motion can be revisited, should
    a case be made for such a tool that does not pose copyright, ethical
    and quality concerns.
    """

    This explicitly covers all GPTs, including ChatGPT and Copilot, which
    is the category causing the most concern at the moment. At the same
    time, it doesn't block more specific applications of machine learning
    to problem solving.

    Special thanks to Arthur Zamarin for consulting me on this.

    --
    Best regards,
    Michał Górny


