• Get real, user and system time

    From Cecil Westerhof@21:1/5 to All on Sat Jul 18 16:26:35 2020
    In Linux we have the time command to get the real, user and system
    time that a command takes.
    I know how to get the real time a method takes, but is there a way to
    get the user and system time a method takes also?

    --
    Cecil Westerhof
    Senior Software Engineer
    LinkedIn: http://www.linkedin.com/in/cecilwesterhof

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Martin Gregorie@21:1/5 to Cecil Westerhof on Sat Jul 18 20:59:31 2020
    On Sat, 18 Jul 2020 16:26:35 +0200, Cecil Westerhof wrote:

    In Linux we have the time command to get the real, user and system time
    that a command takes.
    I know how to get the real time a method takes, but is there a way to
    get the user and system time a method takes also?

    The only way I can think of offhand would be the Stopwatch class, which
    appeared here some time ago. It has start() and stop() methods, which are
    called around the code being timed, and toString() to display the elapsed
    time in seconds. Its resolution is to the nearest millisecond.

    I've not included it because my copy is 118 lines, tolerably commented.
    I can e-mail you a copy if it looks useful.


    --
    Martin | martin at
    Gregorie | gregorie dot org

  • From Luuk@21:1/5 to Martin Gregorie on Mon Jul 20 08:06:31 2020
    On 18-7-2020 22:59, Martin Gregorie wrote:
    On Sat, 18 Jul 2020 16:26:35 +0200, Cecil Westerhof wrote:

    In Linux we have the time command to get the real, user and system time
    that a command takes.
    I know how to get the real time a method takes, but is there a way to
    get the user and system time a method takes also?

    The only way I can think of, offhand would be the Stopwatch class, which
    appeared here some time ago. I won't include it as the (reasonably
    well-commented) copy I have is 118 lines. It has start(), stop() methods,
    so start/stop should be called to time a process and toString() to display
    the elapsed time in seconds. Its resolution is to the nearest millisecond.

    I've not included it because my copy is 118 lines, tolerably commented.
    I can e-mail you a copy if it looks useful.



    Does it look like this class: https://introcs.cs.princeton.edu/java/stdlib/Stopwatch.java.html

  • From Martin Gregorie@21:1/5 to Luuk on Mon Jul 20 08:55:33 2020
    On Mon, 20 Jul 2020 08:06:31 +0200, Luuk wrote:

    On 18-7-2020 22:59, Martin Gregorie wrote:
    On Sat, 18 Jul 2020 16:26:35 +0200, Cecil Westerhof wrote:

    In Linux we have the time command to get the real, user and system
    time that a command takes.
    I know how to get the real time a method takes, but is there a way to
    get the user and system time a method takes also?

    The only way I can think of, offhand would be the Stopwatch class,
    which appeared here some time ago. I won't include it as the
    (reasonably well-commented) copy I have is 118 lines. It has start(),
    stop() methods, so start/stop should be called to time a process and
    toString() to display the elapsed time in seconds. Its resolution is to
    the nearest millisecond.

    I've not included it because my copy is 118 lines, tolerably commented.
    I can e-mail you a copy if it looks useful.



    Does it look like this class: https://introcs.cs.princeton.edu/java/stdlib/Stopwatch.java.html

    It's a similar idea, but the version I have doesn't have a main() method
    and has start(), stop() and reset() methods, IOW it's better adapted to
    what you're trying to do:


    Stopwatch sw = new Stopwatch();
    ...
    sw.start();
    myMethod(arguments);
    sw.stop();
    System.err.println("myMethod: " + sw.toString());
    ...

    The output would be something like:

    myMethod: Stopwatch(2020.07.20.09.53.56, 2020.07.20.09.53.59, 3.001 seconds)
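
    A minimal sketch of a stopwatch along these lines, assuming wall-clock
    timing via System.currentTimeMillis(). It is not the 118-line class
    mentioned above: the field names, the elapsedMillis() helper and the
    timestamp pattern are assumptions made to mimic the sample output, and
    it measures real (elapsed) time only, not user or system time.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

/** Hypothetical stand-in, not the class discussed in this thread. */
class Stopwatch {
    private long startMillis;   // wall-clock time recorded by start()
    private long stopMillis;    // wall-clock time recorded by stop()
    private boolean running;

    /** Begin timing from now. */
    void start() {
        startMillis = System.currentTimeMillis();
        stopMillis = startMillis;
        running = true;
    }

    /** Freeze the elapsed time; ignored if the watch is not running. */
    void stop() {
        if (running) {
            stopMillis = System.currentTimeMillis();
            running = false;
        }
    }

    /** Clear both timestamps so the watch can be reused. */
    void reset() {
        startMillis = 0;
        stopMillis = 0;
        running = false;
    }

    /** Elapsed wall-clock milliseconds (a live value while running). */
    long elapsedMillis() {
        long end = running ? System.currentTimeMillis() : stopMillis;
        return end - startMillis;
    }

    @Override
    public String toString() {
        // Assumed pattern, chosen to resemble the output shown in the post.
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy.MM.dd.HH.mm.ss");
        long elapsed = elapsedMillis();
        return String.format(Locale.ROOT, "Stopwatch(%s, %s, %.3f seconds)",
                fmt.format(new Date(startMillis)),
                fmt.format(new Date(startMillis + elapsed)),
                elapsed / 1000.0);
    }

    public static void main(String[] args) throws InterruptedException {
        Stopwatch sw = new Stopwatch();
        sw.start();
        Thread.sleep(50);   // stands in for myMethod(arguments)
        sw.stop();
        System.err.println("myMethod: " + sw);
    }
}
```

    Because it reads the wall clock, a system clock adjustment during the
    run would skew the result; System.nanoTime() avoids that but cannot
    supply the start and stop timestamps.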


    --
    Martin | martin at
    Gregorie | gregorie dot org

  • From Daniele Futtorovic@21:1/5 to Cecil Westerhof on Tue Jul 21 01:40:05 2020
    On 2020-07-18 16:26, Cecil Westerhof wrote:
    In Linux we have the time command to get the real, user and system
    time that a command takes.
    I know how to get the real time a method takes, but is there a way to
    get the user and system time a method takes also?

    I don't know, but I doubt it very much:

    1. Those are POSIX features, right? They don't have a ready equivalent
    in other systems, right? Then Java, which is supposed to be platform-independent, won't have them built-in.

    2. System and user time are process-level notions, aren't they? I don't
    know the C world that well, but could you get them at the method-level
    if you limited the scope to native code on a POSIX-compliant system? If
    not, you can be certain they won't be available in Java.

    3. How would this even work? What if the thread executing your method is
    suspended and another takes over for a few loops? What about parallel
    execution? How would you sort those out in terms of method-level user
    and system time?

    I don't see it. I don't even get it.

    --
    DF.

  • From Martin Gregorie@21:1/5 to Daniele Futtorovic on Tue Jul 21 11:52:59 2020
    On Tue, 21 Jul 2020 01:40:05 +0200, Daniele Futtorovic wrote:

    1. Those are POSIX features, right? They don't have a ready equivalent
    in other systems, right? Then Java, which is supposed to be platform-independent, won't have them built-in.

    Right, and you often need to go deeper - that is, once you've used the
    UNIX/Linux time command to help determine that it's not something dumb in
    your own code that's gobbling cpu or wasting time waiting.

    Here's a case from around 2000 when DEC were still a thing and their big
    iron was running UNIX on Alpha chips.

    At the time I was involved in a high performance system for a telco: its
    job was to ingest logs from their switches and populate a data warehouse
    with the data - it was like drinking from a firehose. We used a
    conventional star schema for the DW, but some of the dimensions contained
    a lot of data, so we needed to compress it. The initial approach was
    to use the dimensions themselves as indexes: each held the data, e.g. a
    phone number (19 characters are needed to hold a full international
    number), while the key was an integer. The conversion was initially done
    by SQL lookup on the DW, assigning a new key and adding the resulting row
    to the dimension each time the lookup scored a miss.

    This allowed log items to be loaded at 350 items a sec, max - limited by
    the disk system. We needed to load 2000 items/sec, with 8 dimension
    lookups needed per item loaded. Merely by replacing dimension data in the
    fact table with these integer keys we got something like a 3x data
    compression ratio, which is why we did it.

    Some detective work here showed that DEC's state-of-the-art SCSI
    controllers were single threaded and had all SCSI drive chains connected
    to a common bus, so no simultaneous access to disks on different threads.
    On top of that, the controller couldn't interleave commands and responses,
    so it could only access one disk at a time, sequentially. Each
    controller had something like 8 strings of 8 disks attached to it, but
    the whole thing ran at the speed of a single disk.

    IOW, we needed to do the key generation using in-memory lookups.
    Fortunately we had lots of RAM, so built a configurable process that used
    Red-Black B-trees to perform the lookups and then passed the log items,
    complete with indexes, to the DW loader. We had a separate lookup process
    for each key so they could all run in parallel, which meant in practice
    that each lookup was chewing through its configured task on a different
    batch of input: each batch was loaded into memory and passed from task to
    task until it was finally handed to the DW loader and removed from memory
    when it had been loaded into the DW. The DW loading process was now fast
    enough, but the lookups bottlenecked at around 800 lookups/sec in a
    growing B-tree. More detective work needed.
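
    The lookup-or-assign step described above (return the existing integer
    key on a hit, hand out a new one on a miss) can be sketched in Java
    with a TreeMap. This is only an illustration of the idea, not the C
    code from that system: the class name DimensionKeys, the keyFor()
    method and the starting key value of 1 are assumptions.

```java
import java.util.TreeMap;

/** Hypothetical in-memory replacement for one dimension-table lookup. */
class DimensionKeys {
    // TreeMap is a red-black tree, so lookups and inserts are O(log n).
    private final TreeMap<String, Integer> keys = new TreeMap<>();
    private int nextKey = 1;   // assumed first key value

    /** Returns the integer key for a value, assigning a new key on a miss. */
    int keyFor(String value) {
        Integer key = keys.get(value);
        if (key == null) {          // miss: allocate the next key
            key = nextKey++;
            keys.put(value, key);
        }
        return key;
    }

    /** Number of distinct values seen so far. */
    int size() {
        return keys.size();
    }

    public static void main(String[] args) {
        DimensionKeys phoneNumbers = new DimensionKeys();
        System.out.println(phoneNumbers.keyFor("+31 20 555 0101"));  // miss, prints 1
        System.out.println(phoneNumbers.keyFor("+44 20 5550 0102")); // miss, prints 2
        System.out.println(phoneNumbers.keyFor("+31 20 555 0101"));  // hit, prints 1
    }
}
```

    One instance per dimension mirrors the one-lookup-process-per-key
    layout; persisting newly assigned keys back to the dimension table is
    left out of the sketch.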

    It turned out that this DEC UNIX had a Mach-based kernel with a single
    request queue that serialised malloc() calls. Yes, I know this is a Java
    group, but this system was written in C. Worse, it turned out that adding
    a single node to the B-tree required three malloc() calls: one each for
    the node root, the left+right pointers and the node data. So I wrote my
    own B-tree management function (cribbed from Sedgewick: "Algorithms") and
    that gave 2100 lookups/sec. Still not enough: we needed at least around
    7500 lookups/sec to handle the projected log volumes.

    It turned out that sbrk(), which allocated chunks of heap storage, was
    yet another call that the Mach kernel serialised, so I rewrote the
    B-tree algorithm again, this time grabbing several MB of RAM at a time and
    doing its own storage allocation inside these chunks. This did the trick:
    it now ran at 25,000 lookups/sec.
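
    A rough Java analogue of that last step, offered as a sketch only: tree
    nodes live in large preallocated arrays and are handed out by index, so
    adding a node does not call the allocator. Everything here is an
    assumption for illustration (the ChunkedTree name, the chunk parameter,
    using the node index as the integer key). Unlike the red-black tree in
    the story it is an unbalanced binary search tree, and growing copies
    the arrays, which a C arena that simply adds chunks would not do.

```java
import java.util.Arrays;

/** Hypothetical binary search tree whose nodes are slots in big arrays. */
class ChunkedTree {
    private static final int NIL = -1;   // marks a missing child
    private final int chunk;             // slots added per growth step
    private String[] values;
    private int[] left;
    private int[] right;
    private int used = 0;                // slots handed out so far
    private int root = NIL;

    ChunkedTree(int chunk) {
        if (chunk <= 0) {
            throw new IllegalArgumentException("chunk must be positive");
        }
        this.chunk = chunk;
        values = new String[chunk];
        left = new int[chunk];
        right = new int[chunk];
    }

    /** Returns the slot index for value, inserting it on a miss. */
    int keyFor(String value) {
        if (root == NIL) {
            root = newNode(value);
            return root;
        }
        int cur = root;
        while (true) {
            int cmp = value.compareTo(values[cur]);
            if (cmp == 0) {
                return cur;                      // hit
            }
            int next = (cmp < 0) ? left[cur] : right[cur];
            if (next == NIL) {
                // Allocate first: newNode() may replace the arrays.
                int node = newNode(value);
                if (cmp < 0) {
                    left[cur] = node;
                } else {
                    right[cur] = node;
                }
                return node;
            }
            cur = next;
        }
    }

    /** Takes the next free slot, growing all arrays by one chunk if full. */
    private int newNode(String value) {
        if (used == values.length) {
            int capacity = values.length + chunk;
            values = Arrays.copyOf(values, capacity);
            left = Arrays.copyOf(left, capacity);
            right = Arrays.copyOf(right, capacity);
        }
        values[used] = value;
        left[used] = NIL;
        right[used] = NIL;
        return used++;
    }

    int size() {
        return used;
    }

    public static void main(String[] args) {
        ChunkedTree tree = new ChunkedTree(1 << 16);
        System.out.println(tree.keyFor("m"));   // first insert, prints 0
        System.out.println(tree.keyFor("c"));   // second insert, prints 1
        System.out.println(tree.keyFor("m"));   // hit, prints 0
    }
}
```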

    The point of this story is that merely looking at elapsed time and CPU
    time usage couldn't tell us much: the user process was too slow, but we
    already knew that. The UNIX 'time' command showed that it wasn't
    CPU-bound and nor were any of the system processes, so we needed to look
    at other stuff to work out where the problems were. This is what we found:

    - disk controller schematics showed that all disk strings were
    attached to a common bus, serialising access to them

    - the tech specs for the disk controller told us that it would not
    interleave commands to disks on a string even though SCSI disk drives
    were capable of handling interleaved commands

    Now we knew why the Data Warehouse performance couldn't be improved
    beyond what we'd already achieved by tuning it.

    - reading the source code of the tsearch() function told us about the
    multiple malloc()s needed to add a node to the tree

    FWIW the Java TreeMap class is also a Red-Black binary tree
    implementation, so similar to the C library tsearch() function
    and friends, but seems to have somewhat better performance than the
    DEC implementation.

    - the bottleneck caused by the way the Mach kernel serialised each type
    of system call was found by looking at kernel statistics and kernel
    documentation.

    2. System and user time are process-level notions, aren't they?

    Yes.

    I don't know the C world that well, but could you get them at the method-level if you limited the scope to native code on a
    POSIX-compliant system? If not, you can be certain they won't
    be available in Java.

    No. UNIX/Linux only aggregates elapsed and mill time per process, so
    even for a multi-threaded process you only get process totals. Under
    UNIX/Linux the JVM is a multi-threaded process with a single thread
    handling user program execution: IIRC it spawns threads to handle garbage
    collection and Swing/awt but nothing external to the JVM process will see
    either these or any user-defined threads that your program may execute.

    If you watch 'top' while a Java program is running, you only see one
    entry for it.

    C programs can launch multiple processes and share data between them:
    these will show up in 'top' as separate processes accumulating their own
    totals for clock and mill time and with their own memory allocations.
    IIRC shared memory is credited to the process that allocated it.

    3. How would even this work? What if the thread executing your method is suspended and another takes over for a few loops? What about parallel execution? How would you sort those out in terms of method-level user
    and system time?

    All accumulated into the process totals. See above.
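
    For comparison, on JVMs that support it the standard
    java.lang.management.ThreadMXBean reports CPU time per thread rather
    than per process, which gets close to what the original question asks
    for. A minimal sketch, assuming the method under test runs entirely on
    the calling thread; the CpuTimer class, its measure() method and the
    Times holder are hypothetical names, and system time is estimated as
    total CPU time minus user time.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.Locale;

/** Hypothetical helper: real, user and system time for one Runnable. */
class CpuTimer {
    /** One measurement; all values are in nanoseconds. */
    static final class Times {
        final long realNanos;
        final long userNanos;
        final long systemNanos;

        Times(long realNanos, long userNanos, long systemNanos) {
            this.realNanos = realNanos;
            this.userNanos = userNanos;
            this.systemNanos = systemNanos;
        }

        @Override
        public String toString() {
            return String.format(Locale.ROOT, "real %.3fs user %.3fs sys %.3fs",
                    realNanos / 1e9, userNanos / 1e9, systemNanos / 1e9);
        }
    }

    /** Runs task on the calling thread and measures that thread only. */
    static Times measure(Runnable task) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!bean.isCurrentThreadCpuTimeSupported() || !bean.isThreadCpuTimeEnabled()) {
            throw new UnsupportedOperationException(
                    "per-thread CPU time is not available on this JVM");
        }
        long real0 = System.nanoTime();
        long cpu0 = bean.getCurrentThreadCpuTime();    // user + system
        long user0 = bean.getCurrentThreadUserTime();  // user mode only
        task.run();
        long user1 = bean.getCurrentThreadUserTime();
        long cpu1 = bean.getCurrentThreadCpuTime();
        long real1 = System.nanoTime();
        long user = user1 - user0;
        // The two counters are sampled separately, so clamp at zero.
        long system = Math.max(0, (cpu1 - cpu0) - user);
        return new Times(real1 - real0, user, system);
    }

    public static void main(String[] args) {
        long[] sink = new long[1];   // keeps the loop from being optimised away
        Times t = measure(() -> {
            for (int i = 0; i < 50_000_000; i++) {
                sink[0] += i % 7;
            }
        });
        System.out.println(t + " (checksum " + sink[0] + ")");
    }
}
```

    Time the thread spends suspended shows up in the real figure but not in
    user or system, and work handed to other threads (pools, parallel
    streams, garbage collection) is not counted. The resolution of the user
    and system figures depends on the platform.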


    --
    Martin | martin at
    Gregorie | gregorie dot org

  • From Eric Sosman@21:1/5 to Martin Gregorie on Tue Jul 21 08:42:32 2020
    Topic drift ...

    On 7/21/2020 7:52 AM, Martin Gregorie wrote:

    [...] Under UNIX/
    Linux the JVM is a multi-threaded process with a single thread handling
    user program execution: IIRC it spawns threads to handle garbage
    collection and Swing/awt but nothing external to the JVM process will see either these or any user-defined threads that your program may execute.

    This sounds like the "green threads" model used in Old-Time Original
    Java of a quarter-century ago. Is Linux still stuck with that relic?
    (IIRC there was a time when Linux sort of faked multi-threading by
    running multiple processes, but I kind of thought "LinuxThreads" had
    long since been replaced by "NPTL.")

    Anyhow: If you're right, that would explain why nobody runs Java
    on Linux multi-core servers ...

    --
    esosman@comcast-dot-net.invalid
    Maskless Trump is a bare-faced liar.
    One hundred eighty-three days to go.

  • From Martin Gregorie@21:1/5 to Eric Sosman on Tue Jul 21 16:55:22 2020
    On Tue, 21 Jul 2020 08:42:32 -0400, Eric Sosman wrote:
    This sounds like the "green threads" model used in Old-Time Original
    Java of a quarter-century ago. Is Linux still stuck with that relic?

    Pass on that: I only started to use Linux 20 years back. At that time it
    used the standard C libraries.

    (IIRC there was a time when Linux sort of faked multi-threading by
    running multiple processes, but I kind of thought "LinuxThreads" had
    long since been replaced by "NPTL.")

    Don't forget that all Unices have always provided two ways of doing the
    same sort of thing:

    - a single process with multiple threads running in it

    - multiple, separately forked, processes that access one or more shared
    memory modules.

    These will look different when looked at by top and similar tools, though
    they use the same basic tools (semaphores, locks, etc).

    AFAIK Java uses the standard thread module but I could be wrong: I
    haven't examined its inner workings - just know that a running Java
    program looks the same in 'top' as a threaded C program while multiple
    processes with shared memory look quite different.

    Anyhow: If you're right, that would explain why nobody runs Java
    on Linux multi-core servers ...

    More do than you may realise: my mail archive is Java and runs on a dual
    Athlon. I run a variety of Java applications on this Intel i5 based
    laptop. All my computers run Linux apart from one or two PIC-based
    devices (PicAXE and Parallax STAMP) but those don't run Java.


    --
    Martin | martin at
    Gregorie | gregorie dot org

  • From Eric Sosman@21:1/5 to Martin Gregorie on Tue Jul 21 15:46:23 2020
    On 7/21/2020 12:55 PM, Martin Gregorie wrote:
    On Tue, 21 Jul 2020 08:42:32 -0400, Eric Sosman wrote:
    This sounds like the "green threads" model used in Old-Time Original
    Java of a quarter-century ago. Is Linux still stuck with that relic?
    [...]
    AFAIK Java uses the standard thread module but I could be wrong: I
    haven't examined its inner workings - just know that a running Java
    program looks the same in 'top' as a threaded C program while multiple
    processes with shared memory look quite different.

    Okay, then: The external evidence ("JVM has only one `top' line")
    means that Java isn't using the old "LinuxThreads" implementation, but
    we can't really tell whether it is or isn't using "green threads."

    Anyhow: If you're right, that would explain why nobody runs Java
    on Linux multi-core servers ...

    More do than you may realise: my mail archive is Java and runs on a dual
    Athlon. I run a variety of Java applications on this Intel i5 based
    laptop. All my computers run Linux apart from one or two PIC-based
    devices (PicAXE and Parallax STAMP) but those don't run Java.

    I guess my intended ironic tone didn't survive transcription onto
    Usenet. Shoulda used a smiley ...

    --
    esosman@comcast-dot-net.invalid
    Maskless Trump is a bare-faced liar.
    One hundred eighty-three days to go.

  • From Martin Gregorie@21:1/5 to Eric Sosman on Tue Jul 21 20:16:04 2020
    On Tue, 21 Jul 2020 15:46:23 -0400, Eric Sosman wrote:

    I guess my intended ironic tone didn't survive transcription onto Usenet. Shoulda used a smiley ...

    Yeah - sorry 'bout that.

    I've seen a few architectures that I'd be interested to see Java run on,
    just to see if it could be done. Things like a Tandem NonStop fault-
    tolerant system for instance.


    --
    Martin | martin at
    Gregorie | gregorie dot org

  • From Daniele Futtorovic@21:1/5 to Martin Gregorie on Wed Jul 22 00:23:02 2020
    On 2020-07-21 13:52, Martin Gregorie wrote:
    On Tue, 21 Jul 2020 01:40:05 +0200, Daniele Futtorovic wrote:
    2. System and user time are process-level notions, aren't they?

    Yes.

    I don't know the C world that well, but could you get them at the
    method-level if you limited the scope to native code on a
    POSIX-compliant system? If not, you can be certain they won't
    be available in Java.

    No. UNIX/Linux only aggregates elapsed and mill time per process, so
    even for a multi-threaded process you only get process totals. Under
    UNIX/Linux the JVM is a multi-threaded process with a single thread
    handling user program execution: IIRC it spawns threads to handle garbage
    collection and Swing/awt but nothing external to the JVM process will see
    either these or any user-defined threads that your program may execute.

    Thanks for the confirmation, Martin.

    --
    DF.

  • From Martin Gregorie@21:1/5 to Martin Gregorie on Tue Jul 21 23:14:04 2020
    On Tue, 21 Jul 2020 16:55:22 +0000, Martin Gregorie wrote:

    - multiple, separately forked, processes that access one or more shared
    memory modules.

    These will look different when looked at by top and similar tools though
    use the same basic tools (semaphores, locks, etc).

    I should have added that this is a more thread-safe way to write C, but
    of course it's sort of irrelevant to Java because the latter's memory
    management is much safer, especially for strings: not allowing space for
    the terminating NULL (or forgetting to append the NULL if you're
    assembling a string) are classic ways of causing a segfault crash.

    There are other uses for shared memory: in the system which had the slow
    lookup problem, the log loader process read each log file into a 1-2MB
    chunk of shared memory. It then handed the shared memory pointer to a
    file control process. Then, as each task in the data prep chain (i.e. the
    dimension key lookups and database loader) finished processing a log, it
    released its connection to that log, causing the file controller to mark
    the log as ready for the next process in the line, and requested a
    connection to the next available log.

    This was inherently fast because each log file was only loaded into
    memory while being formatted with NULLs where the integer keys would go.
    This done, the log didn't move in memory or have its size changed until
    it had been loaded into the database and deleted.


    --
    Martin | martin at
    Gregorie | gregorie dot org
