• unexpected failure for GSS Pg server

    From Matt Zagrabelny@21:1/5 to kerberos on Tue Feb 8 11:54:13 2022
    Greetings,

    I'm experiencing a failure between a GSS-enabled PostgreSQL server and my
    CLI client.

    To my knowledge nothing has changed on the system to create this failure.
    I did modify some Puppet configs, but according to the Puppet log output
    (and stat'ing /etc/postgresql-common/krb5.keytab) the file contents did
    not change.

    Here are the errors on the Pg side:

    2022-02-08 11:29:59.304 CST [19401] [unknown]@[unknown] FATAL: unsupported frontend protocol 1234.5680: server supports 2.0 to 3.0
    2022-02-08 11:29:59.315 CST [19402] mzagrabe@somedb FATAL: accepting GSS security context failed
    2022-02-08 11:29:59.315 CST [19402] mzagrabe@somedb DETAIL: Unspecified
    GSS failure. Minor code may provide more information: Request ticket
    server postgres/db.example.com@EXAMPLE.COM kvno 3 not found in keytab;
    keytab is likely out of date

    NOTE: I still have some Pg servers where my GSS authentication is not
    failing. On those systems, Pg still logs the first line above: "unsupported frontend protocol 1234.5680: server supports 2.0 to 3.0". I only included
    it above for full disclosure.

    Here is the error on the client side:

    GSSAPI continuation error: Unspecified GSS failure. Minor code may provide more information: Key version is not available

    Here is what klist has to say...

    Valid starting       Expires              Service principal
    02/08/2022 11:10:57  02/08/2022 21:10:57  krbtgt/EXAMPLE.COM@EXAMPLE.COM
        renew until 02/09/2022 11:10:46
    02/08/2022 11:11:05  02/08/2022 21:10:57  postgres/db.example.com@EXAMPLE.COM
        renew until 02/09/2022 11:10:46

    Looking at the krb5kdc logs, I see no differences between successful GSS Pg auths and the failure mentioned above:
    Feb 8 11:36:42 auth krb5kdc[434]: TGS_REQ (8 etypes {18 17 20 19 16 23 25 26}) [IPV6 redacted]: ISSUE: authtime 1644340257, etypes {rep=18 tkt=18 ses=18}, mzagrabe@EXAMPLE.COM for postgres/db.example.com@EXAMPLE.COM
    Feb 8 11:36:42 auth krb5kdc[434]: closing down fd 14
    Feb 8 11:42:45 auth krb5kdc[434]: TGS_REQ (8 etypes {18 17 20 19 16 23 25 26}) [IPV6 redacted]: ISSUE: authtime 1644340257, etypes {rep=18 tkt=18 ses=18}, mzagrabe@EXAMPLE.COM for postgres/successful.example.com@EXAMPLE.COM
    Feb 8 11:42:45 auth krb5kdc[434]: closing down fd 14

    Does anyone have any ideas on where to start looking to debug this issue?

    Thanks for any help!

    -m

  • From Matt Zagrabelny@21:1/5 to kerberos on Tue Feb 8 12:28:21 2022
    After thinking a bit more on the problem, I believe the issue is that I
    have mismatched kvnos (key version numbers).

    For now (before anyone spends any time on my initial request for help) I
    would say that I need to do more research and confirm that my process is working as expected.

    One request: where is the documentation that describes how/where the kvno
    is used between the KDC and principals?

    Thanks again!

    -m

  • From Dameon Wagner@21:1/5 to All on Tue Feb 8 23:01:21 2022
    On Tue, Feb 08 2022 at 12:28:21 -0600, Matt Zagrabelny via Kerberos scribbled
    in "Re: unexpected failure for GSS Pg server":
    After thinking a bit more on the problem, I believe the issue is that I
    have mismatched kvnos (key version numbers).

    Based on the server-side logs from your initial post I would
    completely agree with your recent thoughts.

    To confirm the state of things there are two (or three) commands that
    you will need to run: one on a client system, one on the
    PostgreSQL server, and (possibly, just for good measure) a third on
    your master KDC:

    * On a client run `kvno postgres/db.example.com@EXAMPLE.COM`. This
    should return the key version number of the
    "postgres/db.example.com" principal as fetched from the KDC.
    Going by your quoted logs, I would expect this to be "3", at
    least at the time you captured those logs.
    * On the postgres server run
    `klist -ek /etc/postgresql-common/krb5.keytab`. This should
    provide output that looks something like the following:

    Keytab name: FILE:/etc/postgresql-common/krb5.keytab
    KVNO Principal
    ---- -------------------------------------------------------------
    2 postgres/db.example.com@EXAMPLE.COM (aes256-cts-hmac-sha1-96)
    2 postgres/db.example.com@EXAMPLE.COM (aes128-cts-hmac-sha1-96)
    2 postgres/db.example.com@EXAMPLE.COM (arcfour-hmac)

    which may have a different selection of keytypes for your keytab,
    and possibly a different value in the KVNO column. I would
    guess that at the time you captured the logs, the KVNO listed
    would have been less than 3 (certainly should not be more, unless
    something rather weird has happened).

    * Optionally, you could also use `kadmin` to connect to your KDC,
    and run the subcommand `get_principal postgres/db.example.com`.
    This will also give you a list of the current kvno (for each
    keytype), which should match the value from the first command you
    ran on the client. More usefully, though, it will also list when
    the principal was "Last modified", and by whom, which may shed
    light on why it has changed. (See the sketch just after this list.)
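
    To give a concrete (and purely illustrative: the kvno, dates, and admin
    principal below are made up) idea of what to compare, the client-side
    check and the kadmin query would look roughly like this:

    $ kvno postgres/db.example.com@EXAMPLE.COM
    postgres/db.example.com@EXAMPLE.COM: kvno = 3

    kadmin:  get_principal postgres/db.example.com
    Principal: postgres/db.example.com@EXAMPLE.COM
    [...]
    Last modified: Tue Feb 08 10:15:00 CST 2022 (someadmin/admin@EXAMPLE.COM)
    [...]
    Number of keys: 2
    Key: vno 3, aes256-cts-hmac-sha1-96
    Key: vno 3, aes128-cts-hmac-sha1-96
    [...]

    If the "vno" the KDC reports is 3 while the keytab on the Pg host still
    holds kvno 2, that lines up exactly with the "kvno 3 not found in
    keytab" error in your server logs.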

    Armed with that information, the most likely solution would be to
    extract a fresh keytab (using either the kadmin "ktadd" subcommand, or
    the handy `k5srvutil` command).
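
    As a rough sketch (the admin principal here is a placeholder, and the
    keytab path is the one from your logs), either of the following run on
    the Pg host would write fresh keys into the keytab:

    # kadmin route: ktadd generates new random keys and bumps the kvno
    kadmin -p someadmin/admin -q "ktadd -k /etc/postgresql-common/krb5.keytab postgres/db.example.com"

    # k5srvutil route: re-keys every principal already in the keytab,
    # then drops the old key versions once they are no longer needed
    k5srvutil change -f /etc/postgresql-common/krb5.keytab
    k5srvutil delold -f /etc/postgresql-common/krb5.keytab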

    One caveat however (and something that has caught many sysadmins out
    at one time or another): If you're running a clustered system which is attempting to use the same principal on more than one server, creating
    a fresh keytab will increment the kvno in the KDC again, meaning it
    will be out of date on any other system that may be using the
    principal. In this case you'd have to copy the keytab to those
    affected systems by hand.
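
    In that situation something as simple as the following (the second
    hostname is a placeholder) is usually enough, as long as it goes over a
    secure channel and the file keeps its restrictive ownership and
    permissions:

    scp -p /etc/postgresql-common/krb5.keytab db2.example.com:/etc/postgresql-common/krb5.keytab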

    For now (before anyone spends any time on my initial request for
    help) I would say that I need to do more research and confirm that
    my process is working as expected.

    One request: where is the documentation that describes how/where the kvno
    is used between the KDC and principals?

    It may appear a bit old, but the O'Reilly book is still a classic
    resource for becoming familiar with Kerberos and how it functions.

    Cheers.

    Dameon.

    --
    <> ><> ><> ><> ><> ><> ooOoo <>< <>< <>< <>< <>< <><
    Dr. Dameon Wagner, Unix Platform Services
    IT Services, University of Oxford
    <> ><> ><> ><> ><> ><> ooOoo <>< <>< <>< <>< <>< <><

  • From Matt Zagrabelny@21:1/5 to Dameon Wagner on Wed Feb 16 19:58:27 2022
    Copy: kerberos@mit.edu (kerberos)

    On Tue, Feb 8, 2022 at 5:03 PM Dameon Wagner <dameon.wagner@it.ox.ac.uk>
    wrote:


    Armed with that information, the most likely solution would be to
    extract a fresh keytab (using either the kadmin "ktadd" subcommand, or
    the handy `k5srvutil` command).


    Thanks for the detailed instructions, Dameon!

    Do you know why performing the ktadd increases the kvno? I believe that is
    what tripped me up. I thought I was just "re-exporting" the key from the
    KDC.




    It may appear a bit old, but the O'Reilly book is still a classic
    resource for becoming familiar with Kerberos and how it functions.


    Ha! Yup! That book is in the office and I have been WFH for the last two
    years. :/

    Thanks again for the help!

    -m

  • From Dameon Wagner@21:1/5 to kerberos on Thu Feb 17 18:03:36 2022
    On Wed, Feb 16 2022 at 19:58:27 -0600, Matt Zagrabelny scribbled
    in "Re: unexpected failure for GSS Pg server":
    On Tue, Feb 8, 2022 at 5:03 PM Dameon Wagner <dameon.wagner@it.ox.ac.uk> wrote:


    Armed with that information, the most likely solution would be to
    extract a fresh keytab (using either the kadmin "ktadd" subcommand, or
    the handy `k5srvutil` command).


    Thanks for the detailed instructions, Dameon!

    Do you know why performing the ktadd increases the kvno? I believe that is what tripped me up. I thought I was just "re-exporting" the key from the
    KDC.

    Always happy to help :)

    The default behaviour of "ktadd" has always been to increment the kvno
    of the principal, as it's effectively a password change (though using
    a random key instead).

    If you are in the position of really really needing to export a keytab
    with the existing key(s) in it, you can, on the master KDC host, use
    the `kadmin.local` command. Once there, you can run the "ktadd"
    subcommand as before, but with the additional "-norandkey" option
    (which I believe is only available via `kadmin.local`). The downside
    of this though is that you're then responsible for securely
    transferring the keytab to the remote host that ultimately needs it.
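
    A minimal sketch of that workflow (run on the master KDC; the temporary
    path is just an example) would be:

    # export the current keys without generating new ones, so the kvno stays put
    kadmin.local -q "ktadd -norandkey -k /tmp/postgres.keytab postgres/db.example.com"

    # move it to the database host over a secure channel, then remove the local copy
    scp -p /tmp/postgres.keytab db.example.com:/etc/postgresql-common/krb5.keytab
    rm /tmp/postgres.keytab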


    It may appear a bit old, but the O'Reilly book is still a classic
    resource for becoming familiar with Kerberos and how it functions.


    Ha! Yup! That book is in the office and I have been WFH for the last two years. :/

    I'm in the same boat, so I know the feeling well. Thankfully I have my
    own copy within arm's reach.

    Back in the office soon though (which will be rather strange), with a
    more extensive library available :)

    Cheers.

    Dameon.

    --
    <> ><> ><> ><> ><> ><> ooOoo <>< <>< <>< <>< <>< <><
    Dr. Dameon Wagner, Unix Platform Services
    IT Services, University of Oxford
    <> ><> ><> ><> ><> ><> ooOoo <>< <>< <>< <>< <>< <><
