"Recovery" in large, embedded datasets
From
Don Y@21:1/5 to
All on Fri Aug 7 10:18:29 2020
[This is probably better in s.e.design but the "RT" nature of
the question is more significant than the "embedded"]
Given an application with a large (100M+), embedded dataset
(mutable and immutable entities), what's your verification
and recovery strategy against data corruption at (or before)
run-time?
E.g., one could make a pass over the entire dataset at IPL
and verify its integrity prior to starting the application
(similar to verifying a ROM checksum at POST). If an
error is encountered, you could attempt to restore some
"default" settings (based on where the error is encountered,
assuming it can be verified as a separate unit from the rest
of the store). Or, you could <panic> -- not usually an
acceptable approach in an unattended device!
[Of course, this "verify-at-IPL" approach neatly avoids considering
the possibility that the data might subsequently be corrupted
and only detected as such at load/use time... perhaps a different
recovery option, then?]
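For that load/use-time case, one option is to check integrity on every
fetch and hand back a known-good fallback on a mismatch, flagging the
damaged record for repair later (rather than scrubbing it inline and
blowing the deadline). A toy sketch, with hypothetical names and a
deliberately trivial integrity check standing in for a real CRC:

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

typedef struct {
    uint32_t value;
    uint32_t sum;              /* here: value XOR a fixed key */
} record_t;

#define SUM_KEY 0xA5A5A5A5u    /* toy check, NOT a real CRC */

static bool dirty[128];        /* records flagged for later repair */

/* Fetch a record's value, verifying it at the moment of use.
   On a mismatch, return the caller-supplied fallback and mark
   the record dirty so a background task can scrub it later. */
uint32_t load_record(const record_t *r, size_t idx, uint32_t fallback)
{
    if ((r->value ^ SUM_KEY) != r->sum) {
        dirty[idx] = true;     /* defer the scrub out of this path */
        return fallback;
    }
    return r->value;
}
```

The per-fetch cost is small and bounded, which is the point: the
*detection* stays cheap and deterministic even if the *repair* isn't.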
The advantage of such an approach is that it moves this activity
out of the timeliness constraints imposed by the application
(assuming IPL is not directly tied to any specific deadlines).
The fly in the ointment is the cost of recovery if it's done
DURING run-time, as this introduces (variable) delays based
on the nature of the error and the extent of the recovery effort
(i.e., do you scrub the original store? Or just substitute
"good" values for use NOW? What if those values have consequences
for other aspects of the store?). This likely impacts your ability
to meet the deadlines applicable to that task.
[Note that this problem persists even if you try to "preverify"
(and possibly recover) the applicable portions of the store "before
they are needed" -- the task is likely only made ready at the
moment it's actually needed!]
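One way to split the difference is the substitute-NOW / scrub-later
pattern: the deadline-bound task hands off the damaged index through a
small queue, and a low-priority (or idle-time) task drains it and does
the expensive repair. A sketch, assuming a single producer and single
consumer (a real RT design would also need memory barriers or a lock):

```c
#include <stddef.h>
#include <stdbool.h>

#define QLEN 16                        /* capacity QLEN-1 entries */
static size_t  queue[QLEN];            /* indices awaiting repair */
static unsigned head, tail;            /* ring-buffer cursors     */

/* Called from the deadline-bound path: O(1), never blocks.
   Returns false if the queue is full (caller can retry later). */
bool scrub_enqueue(size_t idx)
{
    unsigned next = (tail + 1) % QLEN;
    if (next == head)
        return false;                  /* full: don't stall the RT task */
    queue[tail] = idx;
    tail = next;
    return true;
}

/* Called from the low-priority scrubber when the system is idle.
   Returns false when there is nothing left to repair. */
bool scrub_dequeue(size_t *idx)
{
    if (head == tail)
        return false;
    *idx = queue[head];
    head = (head + 1) % QLEN;
    return true;
}
```

The RT task pays only the enqueue cost; the variable-length repair
(and any knock-on fixes to dependent parts of the store) happens
entirely outside the deadline path.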
Of course, the easy approach is just to <panic> and throw a
"Check Engine" indication... <frown>
--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)