Forum: >>> Magnum BBS <<<

SRFI-167 rework dubbed okdb (formely okvslite)

From Amirouche Boubekki@21:1/5 to All on Sat Apr 10 04:49:42 2021

I discovered several infelicities in okvs SRFI-167 (and SRFI-168 nstore).

I set myself the task to rework it. Here are the problems with SRFI-167:

- Too much optional arguments;
- No clear definition for a transaction;
- Too broad in scope (try to accommodate both FoundationDB and the various embedded OKVS);

Also, there is no way to tell how big a subspace is, hence implement query optimizations. That was due to the fact that both FoundationDB, and embedded OKVS do not offer that feature, which was acknowledge by WiredTiger maintainer that it was a design
mistakes. I guess all OKVS were inspired by BerkeleyDB and its successor WiredTiger that do not provide that feature, but that is essential to build a database system that wants to be performant.

To tackle those problems I to rework SRFI-167, as follow:

- Drop all existing optional arguments, downstream will need to deal with the problem of making it possible to swap backend;
- Define transaction guarantees, so far it is the strongest scheme aka. serializable;
- Focus on the embedded OKVS interface with a cursor to navigate the key space;

Since there is no library that can implement the new interface, I took the liberty to add the optional ability to apply compression or encryption on the database blocks, that may be useful to save some space, and also help with privacy in some
configurations.

I am happy with the current state of the specification. It is missing hooks to implement triggers (before commit hook), and notifications (after commit hook), I may add them later depending on how I make progress with higher-level libraries that are now
called extensions. Thread safety at the object level is not specified yet, even if concurrent transaction behavior is.

Here is the pre-pre-SRFI:

## Abstract

General purpose backend storage datastructure for building in-memory
or on-disk databases that can handle concurrency thanks to ACID
transactions.

## Rationale

`okdb` can be the primitive datastructure for building many
datastructures. Low level extensions include counter, bag, set,
mapping-multi, binary object store. Higher level extensions include [Entity-Attribute-Value](https://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model)
possibly supported by datalog, Generic Tuple Store (nstore) inspired
from [Resource Description Framework](https://en.wikipedia.org/wiki/Resource_Description_Framework)
that can trivially match the query capabilities of [SPARQL](https://www.w3.org/TR/rdf-sparql-query/), nstore can
painlessly implement [RDF-star](https://w3c.github.io/rdf-star/cg-spec/2021-02-18.html), or
even the Versioned Generic Tuple Store (vnstore). Also, it is possible
to implement a property graph database, ranked set, leaderboard,
priority queue. It is possible to implement efficiently geometric
queries such as xz-ordered curves.

`okdb` is useful in the context of on-disk persistence. `okdb` is also
useful in a context such as client-server applications, where the
client need to cache heterogeneous data. It may be used in the
browser, or in microservice configuration as a shared in-memory
datastructure.

There is several existing databases that expose an interface similar
to `okdb`, and even more that use an ordered key-value store
(okvs) as their backing storage.

While `okdb` interface is lower-level than the mainstream SQL, it is
arguably more useful because the implementation specifics stems from
well-known datastructure part of every software engineering
curriculum, namely binary trees, also because it allows to implement
SQL, last but not least it reflects the current practice that builds (distributed) databases systems based on a similar interface.

`okdb` extends, and departs from the common okvs interface
inherited from BerkeleyDB, to ease the implementation thanks to
bounded keys and values, while making the implementation of efficient extensions also easier thanks to the ability to estimate the count of
keys, and the size of key-value pairs, in a given range.

This SRFI should offer grounds for apprentices to learn about data
storage. It will also offer a better story (the best?) for data
durability in Scheme implementations.

## Reference

### `(make-okdb filepath [block-read block-write]) string? procedure? procedure? → okdb?`

Rationale: In SRFI-167, `make-okvs` could take various options. The
interface was difficult, and did not work well. Instead, of trying to
define a couple of options, a left aside others, it is up to the
implementer and possibly eventually to the user to deal with options
in an appropriate way. One way, for the implementer to enable options
is to create a super procedure that a) returns multiple values,
including the constructor, b) rely on generic methods, or something
else.

Return a handle on the database. `FILEPATH` may be a string describing
the on-disk file or directory where the database will be saved. In
case `okdb` work only from memory, it should be ignored.

`BLOCK-READ` and `BLOCK-WRITE` if provided will be used to consume or
produce bytes that will read or written to disk, possibly using
cryptography or compression on input bytes. Both `BLOCK-READ` and
`BLOCK-WRITE` will take a generator and accumulator as argument. In
case the implementation works fully from memory, `BLOCK-READ` and
`BLOCK-WRITE` should be ignored.

### `(okdb? obj) * → boolean?`

Returns `#t` if `OBJ` is an `<okdb>` instance. Otherwise, returns
`#f`.

### `(okdb-close! db) okdb?`

Close `DB`. All transactions that completed successfully should be
available the next time the database is open with `make-okdb` except
in the case of fully in-memory database.

### `(okdb-transaction? obj) * → boolean?`

Returns `#t` if `OBJ` is an `<okdb-transaction>` instance. Otherwise,
returns `#f`.

### `(okdb-cursor? obj) * → boolean?`

Returns `#t` if `OBJ` is an `<okdb-cursor>` instance. Otherwise,
returns `#f`.

### `(okdb-handle? obj) * → boolean?`

Returns `#t` if `OBJ` is satisfy either `okdb?`, `okdb-transaction?`,
or `okdb-cursor?`. Otherwise, returns `#f`.

### `(okdb-key-max-size handle)`

Rationale: Most Ordered Key-Value Store do not specify the maximum
size of key, making both the implementation and its use erratic. The
same maximum size might not work in all situations, hence it might be
subject to customization. The important is to guarantee some
predicatable performance when keys follow that constraint. It also
makes the implementation of `okdb` easier, among other thing because
the implementation does not need to have to handle large binaries.
That is not a negligeable constraint for the user as keys max size are
not necessarly predicatable, but in any case should be small since in
most implementations those are kept in memory.

Return the maximum size of a key of the database associated with
`HANDLE`.

### `(okdb-value-max-size handle)`

Rationale: Same as the above: it is easier to implement. For the user perspective, it is much easier to handle the situation of large values
since they can be split without loosing features.

Return the maximum size of a value of the database associated with
`HANDLE`.

### `(okdb-conflict? obj)`

Returns `#t` if `OBJ` is a conflict error. Otherwise returns
`#f`. Such object may be raised by `okdb-in-transaction`.

### `(okdb-in-transaction okdb proc [failure [success]]) okdb? procedure? procedure? procedure? → (values (every? *))`

Rationales:

- `okdb-in-transaction` does not include a retry logic when
`okdb-conflict?` is raised because retrying might require to wait
which depends on the implementation but also and more importantly on
user code. The user is in the best position to know how, and when to
retry the transaction. The last resort strategy is not even to retry
the transaction immediatly, but to put the operation in queue possibly persisted in the database, and force the serialization through a
single thread. In any case, retry should be explicit in user code.

- Serializable scheme trades guarantees regarding the consistency of
the data, and hence ease of development because the state of the
database is determinist versus performance. The prescription of
serializable transactions is a strong requirement, that was thusfar
almost completly left aside in the industry in favor of snapshot
isolation. The philosophy here is: *make it work, then make it
fast*. It is not possible to build reliable systems upon claims that
are weak, or false, in the general case.

- Nested transactions were ruled out because it is still not clear
whether they put a strain on the implementation that does not yield
much help in user code. Nested transactions are similar to
savepoints or autonomous transactions.

`okvs-in-transaction` describes the extent of the atomic property, the
A in [ACID](https://en.wikipedia.org/wiki/ACID), of changes against
the underlying database. A transaction will apply all database
operations in `PROC` or none: all or nothing. When
`okdb-in-transaction` returns successfully, the changes will be
visible for future transactions, and implement durability, D in ACID,
and when the database implements on-disk storage, across restarts. In
case of error, changes will not be visible to other transactions in
all cases. Regarding isolation, and consistency, respectively the I
and C in ACID,
[serializable](https://en.wikipedia.org/wiki/Serializability)
transactions is prescribed: the concurrent execution of
`(okvs-in-transaction okdb proc ...)` should render the database as if
the concurrent transactions were executed serially ie. without
overlapping time, in some order, possibly rejecting some of them with
an error that satisfy `okdb-conflict?`. In particular, it is stronger
than snapshot isolation.

Begin a transaction against the database, and execute `PROC`. `PROC`
is called with first and only argument an object that satisfy `okdb-transaction?`. In case of error, rollback the transaction and
execute `FAILURE` with the error object as argument. The default value
of `FAILURE` re-raise the error with `raise`. Otherwise, executes
`SUCCESS` with the returned values of `PROC`. The default value of
`SUCCESS` is the procedure `values`.

`okdb` does not support nested transactions.

TODO: what about hooks

In case `okvs-in-transactions` raise an error that satisfy
`okdb-conflict?`, the user may re-run the same transaction taking care
that `PROC` is idempotent.

### `(okdb-call-with-cursor handle proc) okdb-handle? procedure? → (values (every? *))`

Open a cursor against `HANDLE` and call `PROC` with it. When `PROC`
returns, the cursor is closed.

If `HANDLE` satisfy `okdb?`, `okdb-call-with-cursor` must start a
transaction.

If `HANDLE` satisfy `okdb-cursor?`, `okdb-call-with-cursor` must pass
a cursor positioned at the same position as `HANDLE` and the same keys snapshot.

The produced cursor is not positioned, except when `HANDLE` satisfy `okdb-cursor?`, it is an error to call `okdb-cursor-next?`, `okdb-cursor-previous?`, `okdb-cursor-key`, or
`okdb-cursor-value`. When `HANDLE` does not satisfy `okdb-cursor?`,
the user should call `okdb-cursor-seek?` immediatly after `okdb-call-with-cursor`.

The cursor that is created will see a snapshot of keys of the database
inside the transaction taken when `okdb-call-with-cursor` is
called. During the extent of `PROC`, `okdb-set!` and `okdb-remove!`
will not change the position of the cursor, and the cursor will see
removed keys and not see added keys. Keys which value was changed
during cursor navigation, that exist when `okdb-call-with-cursor` is
called, can be seen. That is, the cursor is stable.

### `(okdb-estimate-key-count handle [key other [offset [limit]]]) handle? bytevector? bytevector? integer? integer? → integer?`

Rationale: It is helpful to know how big is a range to be able to tell
which index to use as seed. Imagine a query against two attributes,
each attribute with their own index and no compound index: being able
to tell which subspace contains less keys, can speed up significantly
query time.

Return an estimate count of keys between `KEY` and `OTHER`. If `KEY`
and `OTHER` are omitted return the approximate count of keys in the
whole database.

If `OFFSET` is provided, `okdb-estimate-key-count` will skip the first
`OFFSET` keys from the count.

If `LIMIT` is provided, `okdb-estimate-key-count` will consider `LIMIT`
keys from the count.

### `(okdb-estimate-bytes-count handle [key [other [offset [limit]]]]) okdb-handle? bytevector? bytevector? integer? integer? → integer?`

Rationale: That is useful in cases where the size of transaction is
limited.

Return the estimated size in bytes of key-value pairs in the subspace
described by `KEY` and `OTHER`. If `OTHER` is omitted, return the
approximate size of the key-value pair associated with
`KEY`. Otherwise, return the estimated size of the whole database
associated with `HANDLE`.

If `OFFSET` is provided, `okdb-estimate-bytes-count` will skip the
first `OFFSET` keys from the count.

If `LIMIT` is provided, `okdb-estimate-bytes-count` will consider `LIMIT`
keys from the count.

### `(okdb-set! handle key value) okdb-handle? bytevector? bytevector?`

Associate to the bytevector `KEY`, with the bytevector `VALUE`. If
`HANDLE` satisfy `OKDB?` start a transaction to execute the operation.

If `HANDLE` satisfy `okdb-cursor?`, `okdb-set!` does not change the
position of `HANDLE`.

### `(okdb-remove! handle key) okdb-handle? bytevector?`

Removes the bytevector `KEY`, and its associated value. If `HANDLE`
satisfy `OKDB?` start a transaction to execute the operation.

If `HANDLE` satisfy `okdb-cursor?`, `okdb-set!` does not change the
position of `HANDLE`.

### `(okdb-cursor-seek cursor strategy key) okdb-cursor? symbol? bytevector? → symbol?`

Position the `CURSOR` using `STRATEGY`. `STRATEGY` can be one of the
following symbol:

- `less-than-or-equal`
- `equal`
- `greater-than-or-equal`

The strategy `less-than-or-equal` will first seek for the biggest key
that is less than `KEY`, if there is one, it returns the symbol `less`. Otherwise, if there is a key that is equal to `KEY` it will return the
symbol `equal`. If there is no valid position for the given `KEY`, it
fallbacks to the symbol `not-found`.

The strategy `equal` will seek a key that is equal to `KEY`. If there
is one it will return the symbol `equal`. Otherwise, it returns the
symbol `not-found`.

The strategy `greater-than-equal` will first seek the smallest key that
is greater than `KEY`, if there is one, it returns the symbol
`greater`. Otherwise, if there is a key that is equal to `KEY` it
will return the symbol `equal`. If there is no valid position for the
given `KEY`, it fallbacks the symbol `not-found`.

### `(okdb-cursor-next? cursor) okdb-cursor? → boolean?`

Move the `CURSOR` to the next key if any. Return `#t` if there is such
a key. Otherwise returns `#f`. `#f` means the cursor reached the end
of the key space.

### `(okdb-cursor-previous? cursor) okdb-cursor? → boolean?`

Move the `CURSOR` to the previous key if any. Return `#t` if there is
such a key. Otherwise returns `#f`. `#f` means the cursor reached the
begining of the key space.

### `(okdb-cursor-key cursor)` okdb-cursor? → bytevector?`

Return the key bytevector where `CURSOR` is positioned. It is an error
to call `okdb-cursor-key`, when `CURSOR` reached the begining or end
of the key space or when `CURSOR` is not positioned.

### `(okdb-cursor-value cursor)` okdb-cursor? → bytevector?`

Return the value bytevector where `CURSOR` is positioned. It is an
error to call `okdb-cursor-key`, when `CURSOR` reached the begining or
end of the key space or when `CURSOR` is not positioned.

### `(okdb-query handle key [other [offset [limit]]]) handle? bytevector? bytevector? integer? integer? → (either? bytevector? procedure?)`

`OKDB-QUERY` will query the associated database. If only `KEY` is
provided it will return the associated value bytevector or `#f`. If
`OTHER` is provided there is two cases:

- `KEY < OTHER` then `okdb-query` returns a generator with all the
key-value pairs present in the database between `KEY` and `OTHER`
excluded ie. without the key-value pair associated with `OTHER` if
any.

- `OTHER < KEY` then `okdb-query` returns the equivalent of reversing
the generator returned by `(okdb-query handle KEY OTHER)`.

If `OFFSET` is provided the generator will skip as much key-value
pairs from the start of the generator.

If `LIMIT` is provided the generator will generate at most `LIMIT`
key-value pairs.

### `(okdb-bytevector-next-prefix bytevector) bytevector? → bytevector?`

Return the bytevector that follows `BYTEVECTOR` according to
lexicographic order that is not prefix of `BYTEVECTOR` such as the
following code iterates over all keys that have `key` as prefix:

```scheme
(okdb-query handle key (okdb-bytevector-next-prefix key))
```

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

Who's Online
Recent Visitors
- Briels
  Tue Apr 23 20:54:03 2024
  from Uk via SSH
- Cronus
  Tue Apr 23 19:46:51 2024
  from Provo, Ut via SSH
- Keyop
  Tue Apr 23 19:40:37 2024
  from Huddersfield, West Yorkshire via SSH
- Keyop
  Tue Apr 23 10:35:46 2024
  from Huddersfield, West Yorkshire via SSH

System Info

Sysop:	Keyop
Location:	Huddersfield, West Yorkshire, UK
Users:	296
Nodes:	16 (2 / 14)
Uptime:	10:52:06
Calls:	6,645
Calls today:	5
Files:	12,190
Messages:	5,326,610

SRFI-167 rework dubbed okdb (formely okvslite)

Who's Online

Recent Visitors

System Info