• Scheduling algorithms for "foreign" binaries

    From Don Y@21:1/5 to All on Fri Jan 3 16:15:15 2020
    I have an "open" system to which "uncertified" binaries can be added,
    at any time. Once added, they can be invoked at will (even their own!).

    [Core services are implemented at a different level -- different
    resource controls, scheduling, etc. -- to ensure their continuous
    availability and service guarantees. These services tend to be
    either more demanding (e.g., multimedia) or essential (e.g., comms).
    Letting foreign binaries compete with that closed system jeopardizes
    those guarantees and the functionality of the system, as a whole]

    The resources available in the system vary, over time. This is a
    consequence of the dynamic nature of task invocations as well as
    the dynamic nature of hardware availability (i.e., more nodes
    on-line means more potential resources). Because the system is open,
    none of these decisions can be made at design time.

    Resource usage per task (lowercase T) is tracked and constrained.
    So, the system can see who's using what and act on (against!)
    abusers. This can be with or without prejudice (e.g., "Sorry,
    I assume you're only using the resources that you NEED to perform
    your job but we don't have those resources to spare, currently.
    Too bad, so sad, good bye!")

    The real-time scheduler makes scheduling decisions based on deadline data
    (with input from the resource manager... some greedy task may be deemed
    a poor choice to run even though it looks to be the most urgent!)

    With no certifying authority in place, there's nothing to keep a task
    from claiming it has urgent deadlines in an attempt to buck the queue.
    This, of course, can work against it because the scheduler's job isn't
    to ensure the most urgent deadlines are met but, rather, that the
    most "work" gets done (because there is no independent authority
    to indicate that task A's job is more meaningful than task B's!).
    So, the scheduler may decide that the resource (i.e., time) spent
    on that "urgent" task is better spent on helping some other task(s)
    achieve *their* goals. (Too bad, so sad, good bye!)

    Treating the expressed deadline as always "hard" -- after which, there
    is no point in continuing the task's work -- is artificially draconian.
    If a (foreign!) task knows that it faces termination if its deadline
    *happens* to be too soon (for current conditions), then there is
    pressure on it to delay that deadline -- perhaps too long.

    There are occasions when a "softer" deadline is more appropriate and
    could provide opportunity for the workload scheduler to bring more
    resources on-line and rebalance the load.

    This suggests a simple/intuitive approach is to allow these foreign tasks
    to express TWO deadlines:
    - the deadline that they would LIKE to meet
    - the deadline after which their efforts are irrelevant

    Noting, of course, that they might not meet *either* of them and, in
    some cases, may never be granted admission if the workload scheduler
    deems resources to be insufficient.
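
    A minimal sketch of how such a two-deadline declaration might look, in
    Python. The names `Deadlines`, `Outcome` and `classify` are hypothetical
    (nothing here is the system's actual interface), times are assumed to be
    seconds measured from the task's release, and the button-press numbers
    are purely illustrative.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    ON_TIME = "met the preferred deadline"
    LATE_BUT_USEFUL = "missed the preferred deadline, result still useful"
    WORTHLESS = "past the final deadline, safe to abandon"


@dataclass(frozen=True)
class Deadlines:
    """The two deadlines a task declares, in seconds after its release."""
    preferred: float  # the deadline the task would LIKE to meet
    final: float      # the deadline after which its efforts are irrelevant

    def __post_init__(self):
        # Assumed sanity rule: the final deadline cannot precede the preferred one.
        if not 0 <= self.preferred <= self.final:
            raise ValueError("need 0 <= preferred <= final")

    def classify(self, elapsed: float) -> Outcome:
        """Classify a completion that happens `elapsed` seconds after release."""
        if elapsed <= self.preferred:
            return Outcome.ON_TIME
        if elapsed <= self.final:
            return Outcome.LATE_BUT_USEFUL
        return Outcome.WORTHLESS


# Illustrative numbers only: a button press that should be handled quickly.
button = Deadlines(preferred=0.2, final=2.0)
for t in (0.15, 1.0, 3.0):
    print(t, button.classify(t).name)
```

    The window between the two deadlines is where a scheduler would have
    room to bring more resources on-line before the work becomes pointless.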

    Two questions:
    - what situation(s) does this approach fail to address?
    - how could a hostile actor manipulate these parameters to game the system
      (i.e., artificially inflate his significance)?

    There's an AI that advises the workload scheduler so the system can
    eventually adjust its bias towards (or against) the task in question.
    But, the time for the AI to learn that can be long (in user time),
    depending on how often the task is invoked and how often the system
    needs to kill() it.

    Email preferred (for those of you that have access to it)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Richard Damon@21:1/5 to Don Y on Fri Jan 3 22:02:07 2020
    On 1/3/20 6:15 PM, Don Y wrote:
    Email preferred (for those of you that have access to it)

    (Hard to e-mail an .invalid address)

    Handling totally unverified programs is of course tough. I presume the
    system sandboxes each application well enough that they can't really do
    any harm besides using up resources, but that does sort of limit what
    they can productively do.

    My general approach to a system like this is to schedule based on some
    performance vs. cost metric, with cost being a combination of resources
    needed and urgency. Tasks that want resources NOW pay a premium for
    them, and if they don't quickly produce some useful work, they quickly
    starve themselves of resources since they can't afford them. You only
    penalize tasks that *start* with short deadlines; a task that has been
    around for a while and has not been given resources gets some real
    priority once its deadline starts to loom.
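
    A rough sketch of that pricing idea, in Python. Everything here is an
    assumption made for illustration: the `Account` type, the `price`
    formula, and the notion that useful work is credited back to the task.
    The premium is computed from the deadline as originally declared (time
    remaining plus time already waited), so only tasks that *start* urgent
    pay heavily.

```python
from dataclasses import dataclass


@dataclass
class Account:
    """Hypothetical per-task credit balance."""
    credits: float


def price(resource_units: float, seconds_to_deadline: float,
          seconds_waiting: float) -> float:
    """Cost of running now: resources scaled by an urgency premium.

    The premium depends on the originally declared horizon, so a task
    that has waited a long time is not charged extra when its deadline
    finally looms. The formula itself is an assumption.
    """
    horizon = seconds_to_deadline + seconds_waiting
    premium = 1.0 + 1.0 / max(horizon, 1e-3)
    return resource_units * premium


def run_slice(acct: Account, resource_units: float,
              seconds_to_deadline: float, seconds_waiting: float,
              useful_work: float) -> bool:
    """Charge for one slice; return False if the task cannot afford it."""
    cost = price(resource_units, seconds_to_deadline, seconds_waiting)
    if cost > acct.credits:
        return False  # the task has starved itself
    acct.credits += useful_work - cost  # useful work earns credit back
    return True


# A task that claims a 0.5 s deadline from the start and produces nothing:
greedy = Account(credits=50.0)
first = run_slice(greedy, 10, 0.5, 0.0, useful_work=0.0)
second = run_slice(greedy, 10, 0.5, 0.0, useful_work=0.0)
print(first, second, greedy.credits)
```

    This sketch only keeps the price low for long-waiting tasks; actively
    boosting their priority would need an extra term.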

  • From Don Y@21:1/5 to Richard Damon on Sat Jan 4 18:47:21 2020
    Hi Richard,

    On 1/3/2020 8:02 PM, Richard Damon wrote:
    Handling totally unverified programs is of course tough. I presume the
    system sandboxes each application well enough that they can't really do
    any harm besides using up resources, but that does sort of limit what
    they can productively do.

    Yes. The CM system requires each "app" to declare (in fine-grained detail)
    its resource requirements along with the interface(s) and operations that
    it expects to use as well as the interfaces/operations that it will
    provide. The idea is that the user can SEE the "costs" and potential
    interactions before agreeing to admit the app to their system. (I'll build
    an AI to help advise the user as he typically won't be able to evaluate
    all of this technical detail).

    Because this information is published, *other* folks can comment on the
    relative merit of the app's "requirements" vs. functionality offered
    (do you really want a "flashlight app" looking at your address book?).

    The installer sets up the machinery to ensure these resources and
    interfaces will be made available (when the task/app is invoked). The OS
    subsequently ensures that nothing that hasn't been explicitly declared
    and "agreed upon" (contract) is available to the app.

    Additionally, if an app TRIES to do something that it hasn't declared
    to the installer, then it is unceremoniously killed, marked as "hostile",
    and blacklisted from running, thereafter. (there's no conceivable
    explanation for trying to do something that you shouldn't; you're either
    *malicious* or *buggy*!)
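
    The declare-then-enforce contract described above could be pictured
    roughly as follows, in Python. The `Gatekeeper` class, its method names
    and the (interface, operation) pairs are all hypothetical stand-ins; the
    real installer and OS machinery are not shown in this thread.

```python
class UndeclaredOperation(Exception):
    """Raised when an app attempts something it never declared."""


class Gatekeeper:
    """Hypothetical stand-in for the installer plus OS enforcement."""

    def __init__(self):
        self.manifests = {}     # app name -> frozenset of (interface, operation)
        self.blacklist = set()  # apps marked "hostile"

    def admit(self, app, declared):
        """Record the operations the user agreed to when installing the app."""
        if app in self.blacklist:
            raise PermissionError(f"{app} is blacklisted")
        self.manifests[app] = frozenset(declared)

    def invoke(self, app, interface, operation):
        """Allow a declared operation; kill and blacklist on anything else."""
        if app in self.blacklist or app not in self.manifests:
            raise PermissionError(f"{app} may not run")
        if (interface, operation) not in self.manifests[app]:
            del self.manifests[app]  # "killed"
            self.blacklist.add(app)  # marked hostile, never runs again
            raise UndeclaredOperation(f"{app}: {interface}.{operation}")
        return True


gate = Gatekeeper()
gate.admit("flashlight", {("led", "on"), ("led", "off")})
print(gate.invoke("flashlight", "led", "on"))
try:
    gate.invoke("flashlight", "address_book", "read")
except UndeclaredOperation as exc:
    print("blacklisted:", exc)
```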

    But, I can't prevent an app from using the resources that it *has* been
    granted on admission -- unless the system's instantaneous needs (e.g.,
    load) indicate a need for constraint.

    The scheduling criteria are the means by which I specify the limits
    on the app's need for "TIMELY work time".

    My general approach to a system like this is to schedule based on some
    performance vs. cost metric, with cost being a combination of resources
    needed and urgency. Tasks that want resources NOW pay a premium for
    them, and if they don't quickly produce some useful work, they quickly
    starve themselves of resources since they can't afford them. You only
    penalize tasks that *start* with short deadlines; a task that has been
    around for a while and has not been given resources gets some real
    priority once its deadline starts to loom.

    The problem is that you can't easily deduce "importance" from any
    scheduling metric. And if you let the task define its own importance,
    you rely on the benevolence AND omniscience of the developer to
    come up with a "correct" assessment of its self-worth; does he have
    adequate understanding of the other tasks in the system to be able to
    objectively rate THIS task in that continuum of importance?

    [This is why static priority schemes are silly; all they do is provide
    an /ex post facto/ mechanism to tweak the system's performance to
    compensate for issues that should have been accommodated in the *design*.
    By extension, quasi-dynamic schemes are equally so. Open the system to
    foreign binaries and you may as well assume everything will "need" to
    run at MAX_PRIORITY!]

    A task with a short deadline (lumping the "soft" and "hard" together, for
    the moment) and minimal resource requirements may be totally
    insignificant. E.g., a task that winks an indicator/annunciator (short
    deadline, minimal resources) can likely be elided without impacting
    system performance (esp. if its dismissal is only transient). OTOH, a
    task with a distant deadline might be VERY important, regardless of the
    amount of resources used.

    I'm looking for something that is easy for a developer to "grok" (a very
    deliberate choice of words!) and relatively easy for him to quantify and
    express. (I don't want him to have to compute some "work metric" for his
    application in order to determine its viability as a competing app.)

    *And*, I want it easy/apparent for the potentially hostile/exploitive
    developer to understand how his "greed" can backfire on his application's
    performance/utility ("You want all these resources? Well, then I guess
    YOU are the prime candidate to delay or kill-off when the system load
    increases! <smile>")
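
    One way the "greed backfires" rule might be expressed, as a Python
    sketch. The function name `shed`, the single scalar "demand" per task
    and the greediest-first ordering are assumptions for illustration; the
    actual resource manager presumably weighs more than one number.

```python
from typing import Dict, List


def shed(demands: Dict[str, float], capacity: float) -> List[str]:
    """Pick tasks to delay or kill, greediest first, until the load fits.

    `demands` maps a task name to the resources it was granted on
    admission; `capacity` is what the system can currently supply.
    """
    load = sum(demands.values())
    victims = []
    # Visit tasks from the largest declared demand down to the smallest.
    for name, demand in sorted(demands.items(), key=lambda kv: kv[1],
                               reverse=True):
        if load <= capacity:
            break
        victims.append(name)
        load -= demand
    return victims


tasks = {"greedy": 60.0, "modest": 25.0, "tiny": 5.0}
print(shed(tasks, capacity=40.0))   # load 90 exceeds 40: the biggest goes first
print(shed(tasks, capacity=100.0))  # everything fits: nobody is shed
```

    Under this rule, over-declaring resources to look important simply
    moves a task to the front of the eviction list.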

    The dual deadline *feels* like it should be easy to understand AND
    express. E.g., you'd *like* that "1_Hz_indicator_wink" to be invoked
    at a very specific time/deadline (perhaps 0.5 seconds?). And, it surely
    wouldn't make any sense to let it run much later than 0.9999999 seconds;
    it's "worthless" beyond that point! You'd *like* that button press to be
    acted upon within ~200ms -- and definitely wouldn't want the action to
    be more than a second or two delayed (lest the user "forget" that he
    pressed the button). OTOH, if you're tagging "commercials" in a media
    stream, you only need to get it done before the user is likely going to
    want to view that edited stream (hours? days??)!

    Furthermore, these criteria remain valid regardless of *where* the task
    is executing; they don't need to be "adjusted" based on the "priorities"
    of any co-resident tasks if it migrates to a different node! (though its
    RELATIVE "importance" may change, based on the actors that it's competing
    with in its new execution environment). And, the criteria don't change
    when an entirely new set of ADDITIONAL tasks are added to the system!

    I can't think of a situation that the dual deadline criteria don't
    address -- even if only suboptimally. Nor can I think of a hack that
    allows it to be exploited /with a likelihood of success/! (you're just
    as likely to end up shooting yourself in the foot if you try to game
    the system)
