• "Recovery" in large, embedded datasets

    From Don Y@21:1/5 to All on Fri Aug 7 10:18:29 2020
    [This is probably better in s.e.design but the "RT" nature of
    the question is more significant than the "embedded"]

    Given an application with a large (100M+), embedded dataset
    (mutable and immutable entities), what's your verification
    and recovery strategy against data corruption at (or before)
    run-time?

    E.g., one could make a pass over the entire dataset at IPL
    and verify its integrity prior to starting the application
    (similar to verifying a ROM checksum at POST). If an
    error is encountered, you could attempt to restore some
    "default" settings (based on where the error is encountered,
    assuming it can be verified as a separate unit from the rest
    of the store). Or, you could <panic> -- not usually an
    acceptable approach in an unattended device!
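
    [A minimal sketch of the verify-at-IPL pass in C, assuming the
    store can be carved into independently checksummed units, each
    with an optional "default" image. The names store_unit,
    crc32_calc and verify_at_ipl are hypothetical, as is the choice
    of CRC-32 as the integrity check.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: one entry per separately verifiable unit. */
typedef struct {
    uint8_t       *data;      /* live (possibly corrupt) contents     */
    const uint8_t *defaults;  /* known-good defaults, NULL if none    */
    size_t         len;       /* size of data[] and defaults[]        */
    uint32_t       crc;       /* CRC recorded when data[] was written */
} store_unit;

/* Bitwise CRC-32 (reflected, polynomial 0xEDB88320). Slow but small;
   a table-driven version would be the usual choice for 100M+ bytes. */
static uint32_t crc32_calc(const uint8_t *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Walk every unit before the application starts. A unit that fails
   its check is overwritten with its defaults when it has any.
   Returns the number of units that could NOT be repaired, so the
   caller can decide whether to run degraded or <panic>. */
static size_t verify_at_ipl(store_unit *units, size_t count)
{
    size_t unrecoverable = 0;

    assert(units != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        store_unit *u = &units[i];

        if (crc32_calc(u->data, u->len) == u->crc)
            continue;                       /* unit is intact */

        if (u->defaults != NULL) {
            memcpy(u->data, u->defaults, u->len);
            u->crc = crc32_calc(u->data, u->len);
        } else {
            unrecoverable++;                /* nothing to fall back on */
        }
    }
    return unrecoverable;
}
```

    [The return value leaves the policy decision to the caller: zero
    means every unit is either intact or back at its defaults.]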

    [Of course, this "verify-at-IPL" approach neatly avoids considering
    the possibility that the data might subsequently be corrupted
    and only detected as such at load/use time... perhaps a different
    recovery option, then?]

    The advantage of such an approach is that it moves this activity
    out of the timeliness constraints imposed by the application
    (assuming IPL is not directly related to any specific deadlines).

    The fly in the ointment is the cost of recovery if it is done
    DURING run-time, as this introduces (variable) delays based
    on the nature of the error and the extent of the recovery effort
    (i.e., do you scrub the original store? Or just substitute
    "good" values for use NOW? What if the values have consequences
    for other aspects of the store?). This likely impacts your ability
    to meet the deadlines applicable to that task.
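
    [One way to picture the "substitute good values NOW, scrub later"
    option, sketched in C. Everything here is an assumption made for
    illustration: the fixed UNIT_LEN, the rt_unit layout, the use of a
    Fletcher-16 sum as the check, and the names rt_fetch and
    rt_scrub_step. It does not address values that have consequences
    for other parts of the store.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { UNIT_LEN = 16 };            /* hypothetical fixed unit size */

typedef struct {
    uint8_t  live[UNIT_LEN];       /* working copy, may be corrupt   */
    uint8_t  fallback[UNIT_LEN];   /* known-good substitute values   */
    uint16_t sum;                  /* Fletcher-16 of live[] at write */
    bool     needs_scrub;          /* repair deferred to background  */
} rt_unit;

static uint16_t fletcher16(const uint8_t *p, size_t n)
{
    uint16_t a = 0, b = 0;
    while (n--) {
        a = (uint16_t)((a + *p++) % 255u);
        b = (uint16_t)((b + a) % 255u);
    }
    return (uint16_t)((b << 8) | a);
}

/* Deadline path: one checksum over UNIT_LEN bytes, so its cost does
   not depend on whether an error was found. It never repairs the
   store; it only flags the unit and hands back the substitute. */
static const uint8_t *rt_fetch(rt_unit *u)
{
    assert(u != NULL);
    if (fletcher16(u->live, UNIT_LEN) == u->sum)
        return u->live;
    u->needs_scrub = true;
    return u->fallback;
}

/* Background path: repairs at most one flagged unit per call (the
   search for it is linear in count). Returns true if a unit was
   scrubbed, false if nothing was pending. */
static bool rt_scrub_step(rt_unit *units, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (units[i].needs_scrub) {
            memcpy(units[i].live, units[i].fallback, UNIT_LEN);
            units[i].sum = fletcher16(units[i].live, UNIT_LEN);
            units[i].needs_scrub = false;
            return true;
        }
    }
    return false;
}
```

    [The point of the split is that the time-critical task only ever
    calls rt_fetch, while rt_scrub_step runs from an idle or
    low-priority context where the variable repair cost is tolerable.]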

    [Note that this problem persists even if you try to "preverify"
    (and possibly recover) the applicable portions of the store "before
    they are needed", as the task is likely only made ready at the
    moment it needs to run!]

    Of course, the easy approach is just to <panic> and throw a
    "Check Engine" indication... <frown>
