• Gemini Specification (Work in Progress)

    From Jason Evans@21:1/5 to All on Fri Apr 9 10:25:54 2021
    NOTE: This is a work in progress. Until it's finalized, this is NOT the official specification.

    # Abstract

    This document specifies the Gemini protocol for file transfer. It can be thought of as an incremental improvement over Gopher [RFC1436] rather than a stripped down HTTP [RFC7230]. It runs over TCP [STD7] port 1965 with encryption provided by TLS [RFC8446] with a simple request and response transaction.

    # Conventions used in this document

    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
    document are to be interpreted as described in [BCP14].

    # Overview

    An overriding goal of Gemini is to provide a simple protocol that is easy to implement (requiring a day or two of effort and only a few hundred lines for
    a server or a client) while still being useful.

    Gemini is served over TCP on port 1965 by default (the first manned Gemini mission, Gemini 3, flew in March 1965), using TLS to provide an encrypted transaction. Servers and clients MUST support TLS 1.2 or higher. The type
    of TLS certificate used (CA-based or self-signed) is specified in a "best practices" document as details are still being discussed. The default
    port of 1965 is an unprivileged port on most systems, so the use of an administrative account is not required to run the service.

    Addressing in Gemini is based on URIs [STD66], with the following modifications:

    1. the scheme used is "gemini";
    2. the userinfo portion of a URI MUST NOT be used;
    3. an empty path component and a path component of "/" are equivalent and
    servers MUST support both without sending a redirection;
    4. the port defaults to 1965 if not specified;
    5. an IP address SHOULD NOT be used in the authority section.
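
    The addressing rules above can be sketched in Python roughly as follows. This is only an illustration: the helper name normalize_gemini_url is hypothetical, and it uses the standard urllib.parse module rather than anything this document defines. It does not check for IP addresses in the authority.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORT = 1965  # default port named in this document


def normalize_gemini_url(url: str) -> str:
    """Hypothetical helper: apply the addressing rules to one URL."""
    parts = urlsplit(url)
    if parts.scheme.lower() != "gemini":
        raise ValueError("scheme must be 'gemini'")
    if parts.username is not None or parts.password is not None:
        raise ValueError("userinfo MUST NOT be used")
    if not parts.hostname:
        raise ValueError("missing host")
    host = parts.hostname
    if ":" in host:  # IPv6 literal: urlsplit strips the brackets
        host = f"[{host}]"
    # The port defaults to 1965, so it is left implicit when it matches.
    port = parts.port
    netloc = host if port in (None, DEFAULT_PORT) else f"{host}:{port}"
    # An empty path and "/" are equivalent; pick "/" as the canonical form.
    path = parts.path or "/"
    return urlunsplit(("gemini", netloc, path, parts.query, parts.fragment))
```

    For example, normalize_gemini_url("gemini://example.com") gives "gemini://example.com/".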

    While this document just covers the protocol with some mandates for what clients and servers have to do, there are other aspects of Gemini that
    aren't covered here in the specification which fall outside the core
    protocol. Implementors of both clients and servers are RECOMMENDED to
    follow the best practice guide for the Gemini protocol.

    # The use of TLS

    At the time of writing (2021), not all existing TLS libraries support TLS
    1.3, but a majority (all?) do support TLS 1.2, thus TLS 1.2 is the minimum required version. Implementations MUST support TLS SNI (Server Name Indication), and servers MUST use the TLS close_notify mechanism to
    close the connection. Clients SHOULD NOT close a connection by default, but MAY in case the content exceeds constraints set by the user.
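
    As a rough sketch of the client side of these requirements, Python's standard ssl module can be configured as below. The function names are hypothetical. Disabling CA validation here is an assumption that the application checks the server certificate itself (for example by pinning it); it is not something this section mandates.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Hypothetical helper: TLS client context for Gemini."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # TLS 1.2 is the minimum required version.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Assumption: certificate trust is decided by the application,
    # so CA-chain and hostname checks are switched off here.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def open_connection(host: str, port: int = 1965) -> ssl.SSLSocket:
    """Connect and start TLS; server_hostname makes the client send SNI."""
    sock = socket.create_connection((host, port), timeout=30)
    return make_client_context().wrap_socket(sock, server_hostname=host)
```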

    ## TLS Server certificates

    Clients can validate TLS connections however they like (including not at
    all) but the strongly RECOMMENDED approach is to implement a lightweight
    "TOFU" certificate-pinning system which treats self-signed certificates as first-class citizens. This greatly reduces TLS overhead on the network
    (only one cert needs to be sent, not a whole chain) and lowers the barrier
    to entry for setting up a Gemini site (no need to pay a CA or set up a Let's Encrypt cron job, just make a cert and go).

    TOFU stands for "Trust On First Use" and is a public-key security model
    similar to that used by OpenSSH. The first time a Gemini client connects to
    a server, it accepts whatever certificate it is presented. That
    certificate's fingerprint and expiry date are saved in a persistent database (like the known_hosts file for SSH), associated with the server's hostname.
    On all subsequent connections to that hostname, the received certificate's fingerprint is computed and compared to the one in the database. If the certificate is not the one previously received, but the previous
    certificate's expiry date has not passed, the user is shown a warning, analogous to the one web browser users are shown when receiving a
    certificate without a signature chain leading to a trusted CA.

    This model is by no means perfect, but it is better than just accepting self-signed certificates unconditionally.
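
    The TOFU procedure described above might look like the following sketch. Several details are assumptions made for illustration only: the store is a plain in-memory dict rather than a persistent database, the fingerprint is a SHA-256 hash of the DER-encoded certificate, and a changed certificate is silently accepted once the pinned one has expired (the text above does not say what to do in that case). The function names are hypothetical.

```python
import hashlib
from datetime import datetime, timezone


def fingerprint(der_cert: bytes) -> str:
    # Assumption: SHA-256 over the DER bytes serves as the fingerprint.
    return hashlib.sha256(der_cert).hexdigest()


def tofu_check(store: dict, hostname: str, der_cert: bytes,
               expiry: datetime, now: datetime) -> str:
    """Return 'new', 'match', 'warn' or 'renewed' for this connection."""
    fp = fingerprint(der_cert)
    known = store.get(hostname)
    if known is None:
        # First use: accept whatever certificate is presented and pin it.
        store[hostname] = (fp, expiry)
        return "new"
    known_fp, known_expiry = known
    if fp == known_fp:
        return "match"
    if now < known_expiry:
        # Different certificate while the pinned one is still valid.
        return "warn"
    # Assumption: the pinned certificate expired, so pin the new one.
    store[hostname] = (fp, expiry)
    return "renewed"
```

    In a real client the DER bytes could come from SSLSocket.getpeercert(binary_form=True); extracting the expiry date is left out of this sketch.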

    ## TLS Client certificates

    Although rarely seen on the web, TLS permits clients to identify themselves
    to servers using certificates, in exactly the same way that servers traditionally identify themselves to the client. Gemini includes the
    ability for servers to request in-band that a client repeats a request with
    a client certificate. This is a very flexible, highly secure and simple
    notion of client identity with several applications:

    * Short-lived client certificates which are generated on demand and deleted
    immediately after use can be used as "session identifiers" to maintain
    server-side state for applications. In this role, client certificates act
    as a substitute for HTTP cookies, but unlike cookies they are generated
    voluntarily by the client, and once the client deletes a certificate and
    its matching key, the server cannot possibly "resurrect" the same value
    later (unlike so-called "super cookies").

    * Long-lived client certificates can reliably identify a user to a
    multi-user application without the need for passwords which may be
    brute-forced. Even a stolen database table mapping certificate hashes to
    user identities is not a security risk, as rainbow tables for certificates
    are not feasible.

    * Self-hosted, single-user applications can be easily and reliably secured
    in a manner familiar from OpenSSH: the user generates a self-signed
    certificate and adds its hash to a server-side list of permitted
    certificates (analogous to the authorized_keys file for SSH).

    Gemini requests will typically be made without a client certificate. If a requested resource requires a client certificate and one is not included in
    a request, the server can respond with a status code of 60, 61 or 62 (see section "Client certificates"). A client certificate which is generated or loaded in response to such a status code has its scope bound to the same hostname as the request URL and to all paths below the path of the request
    URL path. E.g. if a request for gemini://example.com/foo returns status 60 and the user chooses to generate a new client certificate in response to
    this, that same certificate should be used for subsequent requests to gemini://example.com/foo, gemini://example.com/foo/bar/, gemini://example.com/foo/bar/baz, etc., until such time as the user decides
    to delete the certificate or to temporarily deactivate it. Interactive
    clients for human users SHOULD make such actions easy and generally give users full control over the use of client certificates.
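
    The scope rule for client certificates can be illustrated with a small check like the one below. The function name cert_applies is hypothetical, and treating "below the path" as a match on whole path segments (so that /foobar is not below /foo) is an interpretation, not something the text states.

```python
from urllib.parse import urlsplit


def cert_applies(scope_url: str, request_url: str) -> bool:
    """Hypothetical check: is request_url within the certificate's scope?

    scope_url is the URL whose request triggered the certificate.
    """
    scope = urlsplit(scope_url)
    req = urlsplit(request_url)
    # The scope is bound to the same hostname as the original request.
    if scope.hostname != req.hostname:
        return False
    scope_path = scope.path or "/"
    req_path = req.path or "/"
    if req_path == scope_path:
        return True
    # Assumption: "below" means the scope path followed by further segments.
    prefix = scope_path if scope_path.endswith("/") else scope_path + "/"
    return req_path.startswith(prefix)
```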

    ## TLS Issues

    Both clients and servers SHOULD handle the case when the TLS close_notify mechanism is not used (such as a low level socket error that closes the
    socket without properly terminating the TLS connection). A client SHOULD notify the user of such a case; the server MAY log such a case.

    Implementors should be aware that TLS 1.2 will send the server name and
    the client certificate (if used) in the clear as part of the encryption negotiation phase of the protocol. A client MAY warn a user if a TLS 1.2 connection is established, and SHOULD warn the user when a client certificate will be transmitted via TLS 1.2.

    # Requests

    The client connects to the server and sends a request which consists of an absolute URI followed by a CR (character 13) and LF (character 10). The augmented BNF [STD68] for this is:

    request = absolute-URI CRLF

    ; absolute-URI from [STD66]
    ; CRLF from [STD68]

    When making a request, the URI MUST NOT exceed 1024 bytes, and a server MUST reject requests where the URI exceeds this limit. A server MUST reject a request with a userinfo portion. Clients MUST NOT send a fragment as part
    of the request, and a server MUST reject such requests as well. If a client
    is making a request with an empty path, the client SHOULD add a trailing '/'
    to the request, but a server MUST be able to deal with an empty path.
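
    A server-side check of these request constraints could be sketched as follows. The function name check_request is hypothetical, and raising ValueError is just one way for the sketch to signal that the request should be rejected.

```python
from urllib.parse import urlsplit

MAX_URI_BYTES = 1024  # limit on the URI, not counting the CRLF


def check_request(line: bytes) -> str:
    """Hypothetical validator: return the URI or raise ValueError."""
    if not line.endswith(b"\r\n"):
        raise ValueError("request must end with CRLF")
    raw = line[:-2]
    if len(raw) > MAX_URI_BYTES:
        raise ValueError("URI exceeds 1024 bytes")
    uri = raw.decode("utf-8")  # UnicodeDecodeError is a ValueError
    parts = urlsplit(uri)
    if not parts.scheme:
        raise ValueError("an absolute URI is required")
    if parts.username is not None or parts.password is not None:
        raise ValueError("userinfo is not allowed")
    if "#" in uri:
        raise ValueError("fragments are not allowed in requests")
    # An empty path is acceptable; the server must be able to handle it.
    return uri
```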

    # Replies

    Upon a request, the server will send back a status and in the case of a successful request, the content requested by the client. The status
    consists of a two digit response code, possibly some additional information (which depends upon the response being sent) followed by a CR and LF. The augmented BNF:

    reply = input / success / redirect / tempfail / permfail / auth

    input = '1' DIGIT SP prompt CRLF
    success = '2' DIGIT SP mimetype CRLF body
    redirect = '3' DIGIT SP URI-reference CRLF
    ; NOTE: [STD66] allows "" as a valid
    ; URI-reference. This is not intended to
    ; be valid for cases of redirection.
    tempfail = '4' DIGIT [SP errormsg] CRLF
    permfail = '5' DIGIT [SP errormsg] CRLF
    auth = '6' DIGIT [SP errormsg] CRLF

    prompt = 1*(SP / VCHAR)
    mimetype = type '/' subtype *(';' parameter)
    errormsg = 1*(SP / VCHAR)
    body = *OCTET

    VCHAR =/ UTF8-2v / UTF8-3 / UTF8-4
    UTF8-2v = %xC2 %xA0-BF ; no C1 control set
    / %xC3-DF UTF8-tail

    ; URI-reference from [STD66]
    ;
    ; type from [RFC2045]
    ; subtype from [RFC2045]
    ; parameter from [RFC2045]
    ;
    ; CRLF from [STD68]
    ; DIGIT from [STD68]
    ; SP from [STD68]
    ; VCHAR from [STD68]
    ; OCTET from [STD68]
    ; WSP from [STD68]
    ;
    ; UTF8-3 from [STD63]
    ; UTF8-4 from [STD63]
    ; UTF8-tail from [STD63]

    The VCHAR rule from [STD68] is extended to include the non-control
    codepoints from Unicode (and encoded as UTF-8 [STD63]). The body type is unspecified here, as the contents depend upon the MIME type of the content being served. Upon sending the complete response (which may include
    content), the server closes the connection and MUST use the TLS close_notify mechanism to inform the client that no more data will be sent.
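
    A client might parse the status line of a reply along these lines. This is a minimal sketch: the function name parse_header is hypothetical, it only checks the overall shape given by the grammar above, and it does not validate the MIME type or the characters of the message.

```python
def parse_header(line: bytes):
    """Hypothetical parser: return (status, meta) for one reply header."""
    if not line.endswith(b"\r\n"):
        raise ValueError("header must end with CRLF")
    text = line[:-2].decode("utf-8")
    # The status is separated from the rest by a single space.
    code, _, meta = text.partition(" ")
    if len(code) != 2 or not (code.isascii() and code.isdigit()):
        raise ValueError("status must be exactly two digits")
    status = int(code)
    if not 10 <= status <= 69:
        raise ValueError("status out of range")
    # 1x, 2x and 3x replies require a prompt, MIME type or URI reference;
    # for 4x, 5x and 6x the message is optional.
    if status < 40 and not meta:
        raise ValueError("missing additional information")
    return status, meta
```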

    The status values range from 10 to 69 inclusive, although not all values
    are currently defined. They are grouped such that a client MAY use the
    initial digit to handle the response, but the additional digit is there to further clarify the status, and it is RECOMMENDED that clients use the additional digit when deciding what to do. Servers MUST NOT send status
    codes that are not defined.

    # Status codes

    There are six groups of status codes:

    10-19 Input expected
    20-29 Success
    30-39 Redirection
    40-49 Temporary failure
    50-59 Permanent failure
    60-69 Client certificates

    A client MUST reject any status code less than '10' or greater than '69'
    and warn the user of such. A client SHOULD deal with undefined status codes between '10' and '69' per the default action of the initial digit. So a
    status of '14' should be acted upon as if the client received a '10'; a
    status of '22' should be acted upon as if the client received a '20'.
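
    The fallback rule for undefined status codes can be written as a short function. This is a sketch: the name effective_status is hypothetical, and the set of defined codes is taken from the sections of this document.

```python
# Status codes defined in this document.
DEFINED = {10, 11, 20, 30, 31, 40, 41, 42, 43, 44,
           50, 51, 52, 53, 59, 60, 61, 62}


def effective_status(status: int) -> int:
    """Hypothetical helper: the status a client should act upon."""
    if status < 10 or status > 69:
        raise ValueError("status outside 10-69 must be rejected")
    if status in DEFINED:
        return status
    # Undefined code: fall back to the default of its group (x0).
    return status // 10 * 10
```

    With this, effective_status(14) gives 10 and effective_status(22) gives 20, matching the examples above.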

    ## Input expected

    The server is expecting user input from the client. The additional
    information sent after the status code is the text that a client MUST use to prompt the user for the information, and that information is sent back to
    the same URI as the query portion. Spaces MUST be encoded as '%20'. There
    are currently two status codes defined under this category.

    input = '1' DIGIT SP prompt CRLF
    prompt = 1*(SP / VCHAR)

    If a client receives a 1x response to a URI that already contains a query string, the client MUST replace the query string with the user input. For example, if the given URI results in a 10 response:

    gemini://example.net/search?hello

    The client will send as a request:

    gemini://example.net/search?the%20user%20input
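
    Building the follow-up request URL from the user's input might be done as in this sketch, which uses urllib.parse from the Python standard library. The function name input_url is hypothetical.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def input_url(url: str, user_input: str) -> str:
    """Hypothetical helper: replace the query of url with the user input."""
    parts = urlsplit(url)
    # safe="" percent-encodes everything except unreserved characters,
    # so spaces become %20 rather than '+'.
    query = quote(user_input, safe="")
    # Any existing query string is replaced, not appended to.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))
```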

    ### Status 10

    The basic input status code. A client MUST prompt a user for input, which
    should be URI-encoded per [STD66] and sent as a query to the same URI that generated this response.

    ### Status 11---sensitive input

    As per status code 10, but for use with sensitive input such as passwords. Clients should present the prompt as per status code 10, but the user's
    input should not be echoed to the screen to prevent it being read by
    "shoulder surfers".

    ## Success

    The request was handled and the server has content to send to the client.
    The additional information is the MIME type of the content, specified per [RFC2045]. Clients MUST deal with MIME parameters that are not understood by simply ignoring them.

    Response bodies are just raw content, text or binary, as with Gopher [RFC1436]. There is no support for compression, chunking or any other kind
    of content or transfer encoding. The server closes the connection after the final byte; there is no "end of response" signal.

    Internet media types are registered with a canonical form. Content
    transferred via Gemini MUST be represented in the appropriate canonical form prior to its transmission except for "text" types, as defined in the next paragraph.

    When in canonical form, media subtypes of the "text" type use CRLF as the
    text line break. Gemini relaxes this requirement and allows the transport
    of text media with plain LF alone (but NOT a plain CR alone) representing a line break when it is done consistently for an entire response body. Gemini clients MUST accept CRLF and bare LF as being representative of a line break
    in text media received via Gemini.
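
    One way a client could split text media into lines under this rule is sketched below. The function name split_lines is hypothetical. Python's built-in str.splitlines() is deliberately avoided because it also breaks on a bare CR and on several other characters.

```python
def split_lines(text: str) -> list:
    """Hypothetical helper: split on CRLF or bare LF, never on bare CR."""
    lines = text.split("\n")
    # A CR directly before the LF belongs to the line break, not the line.
    return [line[:-1] if line.endswith("\r") else line for line in lines]
```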

    Clients MUST support MIME types of text/gemini with a character set of
    UTF-8, and text/plain, with a character set of either US-ASCII [STD80]
    (which is a strict subset of UTF-8) or UTF-8. A client MAY support
    text/plain with other character sets. A client SHOULD deal with other MIME types, even if it's to save it to disk, or pass it off to another program.

    The specification for text/gemini is given in the text/gemini specification.

    The only defined status under this group is 20.

    success = '2' DIGIT SP mimetype CRLF body
    mimetype = type '/' subtype *(';' parameter)
    body = *OCTET

    ### Status 20

    The server has successfully parsed and understood the request, and will
    serve up content of the given MIME type.

    ## Redirection

    The server is sending the client a new location where the content is
    located. The additional information is an absolute or relative URI. If a server sends a redirection in response to a request with a query string, the client MUST NOT apply the query string to the new location; if the query
    string is important to the new location, the server MAY include the query as part of the redirection. A server SHOULD NOT include fragments in redirections, but if one is given, and a client already has a fragment it
    could apply (from the original URI), it is up to the client which fragment
    to apply. Clients MUST limit the number of redirections they follow to 5. There are two defined status codes in this category.

    redirect = '3' DIGIT SP URI-reference CRLF
    ; NOTE: RFC-3986/3987 allow "" as a valid
    ; URI-reference. This is not intended to
    ; be valid for cases of redirection.
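
    Redirect handling on the client could be sketched as follows. The names resolve_redirect and follow are hypothetical, and fetch_header stands for any callable that performs a request and returns (status, meta). One implementation detail is worth noting: Python's urllib.parse.urljoin only resolves relative references for schemes it knows about, so this sketch temporarily borrows "http" for the join.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

MAX_REDIRECTS = 5  # limit required of clients


def resolve_redirect(base: str, target: str) -> str:
    """Hypothetical helper: resolve a redirect target against base."""
    if not target:
        raise ValueError("an empty redirect target is not valid")
    if urlsplit(target).scheme:
        return target  # already an absolute URI
    b = urlsplit(base)
    # Borrow "http" so that urljoin performs relative resolution.
    joined = urljoin(urlunsplit(("http", b.netloc, b.path, b.query, "")), target)
    j = urlsplit(joined)
    # The old query is only kept if the target itself carried one.
    return urlunsplit((b.scheme, j.netloc, j.path, j.query, j.fragment))


def follow(start: str, fetch_header):
    """Follow at most MAX_REDIRECTS redirections starting from start."""
    url = start
    for _ in range(MAX_REDIRECTS + 1):
        status, meta = fetch_header(url)
        if not 30 <= status <= 39:
            return url, status, meta
        url = resolve_redirect(url, meta)
    raise RuntimeError("too many redirections")
```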

    ### Status 30---temporary redirection

    The basic redirection code. The redirection is temporary and the client
    should continue to request the content with the original URI.

    ### Status 31---permanent redirection

    The location of the content has moved permanently to a new location, and clients SHOULD use the new location to retrieve the given content from then
    on.

    ## Temporary failure

    The request has failed. There is no response body. The nature of the
    failure is temporary, i.e. an identical request MAY succeed in the future.
    The optional message MAY provide additional information on the failure and
    if given, a client SHOULD display it to the user. There are five status
    codes under this category.

    tempfail = '4' DIGIT [SP errormsg] CRLF
    errormsg = 1*(SP / VCHAR)

    ### Status 40

    An unspecified condition exists on the server that is preventing the content from being served, but a client can try again to obtain the content.

    ### Status 41---server unavailable

    The server is unavailable due to overload or maintenance. (cf HTTP 503)

    ### Status 42---CGI error

    A CGI process, or similar system for generating dynamic content, died unexpectedly or timed out.

    ### Status 43---proxy error

    A proxy request failed because the server was unable to successfully
    complete a transaction with the remote host. (cf HTTP 502, 504)

    ### Status 44---slow down

    The server is requesting the client to slow down requests. The client SHOULD use an exponential backoff, where subsequent delays between requests are doubled until this status is no longer returned.
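
    The doubling of delays can be sketched as a generator. The starting delay and the upper cap are assumptions chosen for illustration; this document only says that delays are doubled. The function names are hypothetical.

```python
from itertools import islice


def backoff_delays(initial: float = 1.0, maximum: float = 300.0):
    """Yield delays in seconds, doubling each time up to an assumed cap."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * 2, maximum)


def first_delays(n: int, initial: float = 1.0, maximum: float = 300.0) -> list:
    """Return the first n delays, e.g. for inspection."""
    return list(islice(backoff_delays(initial, maximum), n))
```

    A client would wait for the next delay after each status 44 reply and start over once a different status is returned.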

    ## Permanent failure

    The request has failed. There is no response body. The nature of the
    failure is permanent, and further requests of the content will return the
    same status and a client SHOULD NOT make the same request. The optional message MAY provide additional information on the failure and if given, a client SHOULD display it to the user. There are five status codes under
    this category.

    permfail = '5' DIGIT [SP errormsg] CRLF
    errormsg = 1*(SP / VCHAR)

    ### Status 50

    This is the general permanent failure code.

    ### Status 51---not found

    The requested resource could not be found (you can't find things at Area 51) and no further information is available. It may exist in the future, it may not. Who knows?

    ### Status 52---gone

    The resource requested is no longer available and will not be available
    again. Search engines and similar tools should remove this resource from
    their indices. Content aggregators should stop requesting the resource and convey to their human users that the subscribed resource is gone. (cf HTTP 410)

    ### Status 53---proxy request refused

    The request was for a resource at a domain not served by the server and the server does not accept proxy requests.

    ### Status 59---bad request

    The server was unable to parse the client's request, presumably due to a malformed request, or the request violated the constraints listed in the
    Requests section.

    ## Client certificates

    The requested resource requires a client certificate to access. If the
    request was made without a certificate, it should be repeated with one. If
    the request was made with a certificate, the server did not accept it and
    the request should be repeated with a different certificate. The additional information may contain more details about why the certificate was required,
    or rejected; servers SHOULD include such information, and clients SHOULD display it to the user. There are three status codes defined for this category.

    auth = '6' DIGIT [SP errormsg] CRLF
    errormsg = 1*(SP / VCHAR)

    ### Status 60

    The content requires a client certificate. The client MUST provide a certificate for the content. The certificate is limited to the host and
    path, and a server MAY require a different certificate for a different path
    on the same host. A server SHOULD allow the same certificate to be used for any content along the given path. Examples:

    gemini://example.com/private/ -- requires certificate A
    gemini://example.com/private/r1 -- requires certificate A
    gemini://example.com/private/r2/r3 -- requires certificate A
    gemini://example.com/other/ -- requires certificate B
    gemini://example.com/other/r1 -- requires certificate B
    gemini://example.com/other/r2/r3 -- requires certificate B
    gemini://example.com/random -- no certificate required

    ### Status 61---certificate not authorized

    The supplied client certificate is not authorised for accessing the
    particular requested resource. The problem is not with the certificate
    itself, which may be authorised for other resources.

    ### Status 62---certificate not valid

    The supplied client certificate was not accepted because it is not valid.
    This indicates a problem with the certificate in and of itself, with no consideration of the particular requested resource. The most likely cause
    is that the certificate's validity start date is in the future or its expiry date has passed, but this code may also indicate an invalid signature, or a violation of X.509 standard requirements.

    # Examples of Gemini requests

    The examples below have two parties, the Server and Client. Actions of each are in square brackets '[]', literal text in quotes (but the quotes are NOT included in the input) with a few terminals like 'CRLF' indicating the characters with codes 13 and 10, 'mimetype' representing a MIME type per
    [RFC2045], and 'content...' meaning the content requested.

    This example is a server requiring user input, which the client gathers, then resubmits the request with the user input:

    Client: [opens connection]
    Client: "gemini://example.net/search" CRLF
    Server: "10 Please input a search term" CRLF
    Server: [closes connection]
    Client: [prompts user, gets input]
    Client: [opens connection]
    Client: "gemini://example.net/search?gemini%20search%20engines" CRLF
    Server: "20 " mimetype CRLF content...
    Server: [closes connection]

    The client is requesting some content, which in this example, is an image
    file:

    Client: [opens connection]
    Client: "gemini://example.net/image.jpg" CRLF
    Server: "20 image/jpeg" CRLF <binary data of JPEG image>
    Server: [closes connection]

    For this example the server is redirecting the client to the new location of
    a resource:

    Client: [opens connection]
    Client: "gemini://example.net/current" CRLF
    Server: "30 /new" CRLF
    Server: [closes connection]
    Client: [opens connection]
    Client: "gemini://example.net/new" CRLF
    Server: "20 " mimetype CRLF content...
    Server: [closes connection]

    Here we have a server requesting a client certificate, and the client
    providing one on the subsequent request:

    Client: [opens connection, no client certificate sent]
    Client: "gemini://example.net/application/" CRLF
    Server: "60 Certificate required to maintain server-side state" CRLF
    Server: [closes connection]
    Client: [does application specific actions to get certificate]
    Client: [opens connection, client certificate sent]
    Client: "gemini://example.net/application/" CRLF
    Server: "20 " mimetype CRLF content...
    Server: [closes connection]

    In this example, the server is sending a temporary failure with additional
    text describing the error:

    Client: [opens connection]
    Client: "gemini://example.net/data" CRLF
    Server: "41 Undergoing maintenance at this time" CRLF
    Server: [closes connection]

    And the final example, a permanent failure without any further explanation:

    Client: [opens connection]
    Client: "gemini://example.net/data" CRLF
    Server: "50" CRLF
    Server: [closes connection]

    # Normative References

    [BCP14] Key words for use in RFCs to Indicate Requirement Levels
    [RFC2045] Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies
    [RFC3987] Internationalized Resource Identifiers (IRIs)
    [STD63] UTF-8, a transformation format of ISO 10646
    [STD66] Uniform Resource Identifier (URI): Generic Syntax
    [STD68] Augmented BNF for Syntax Specifications: ABNF
    [STD80] ASCII format for network interchange

    # Informative References

    [RFC1436] The Internet Gopher Protocol
    [RFC5246] The Transport Layer Security (TLS) Protocol Version 1.2
    [RFC7230] Hypertext Transfer Protocol
    [RFC8446] The Transport Layer Security (TLS) Protocol Version 1.3
    [STD7] Transmission Control Protocol
