• Fastest way to update 20,000+ game objects position?

    From Zenchess@21:1/5 to All on Wed Sep 2 20:40:04 2020
    Hi, I wrote an external interfacing wrapper for Raylib, and duplicated their BunnyMark demo. I was surprised that I could spawn far less bunnies without a lot of FPS drop. After some profiling, it turned out that updating the positions of each bunny
    by += velocity was taking a very long time, something like 50+ms on 20,000 objects.

    What would be the fastest way to do this just inside dolphin? I know that the theoretical fastest way would probably be to write a c++ library to do it.

    The way I was doing it was I had bunny objects, with points for position and velocity. They were stored in an OrderedCollection, and then a do:[:each| each position: each position + each velocity].

    I know very little about optimization in Smalltalk so I would appreciate any advice on this topic, thanks - Jacob

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Sergio Del Franco@21:1/5 to All on Fri Sep 4 09:03:53 2020
    El Thursday, September 3, 2020 a la(s) 12:40:05 AM UTC-3, Zenchess escribió:
    Hi, I wrote an external interfacing wrapper for Raylib, and duplicated their BunnyMark demo. I was surprised that I could spawn far less bunnies without a lot of FPS drop. After some profiling, it turned out that updating the positions of each bunny by
    += velocity was taking a very long time, something like 50+ms on 20,000 objects.
    What would be the fastest way to do this just inside dolphin? I know that the theoretical fastest way would probably be to write a c++ library to do it.
    The way I was doing it was I had bunny objects, with points for position and velocity. They were stored in an OrderedCollection, and then a do:[:each| each position: each position + each velocity].
    I know very little about optimization in Smalltalk so I would appreciate any advice on this topic, thanks - Jacob

    The profiler should help you find where the problem is.
    E.g.:
    [BunnyMarkDemo run] profile.
    Then select the sample set in the first tab and analize the information the last four tabs. As this is your first time, I recommend you to start by the last one, "Time".

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bruno Buzzi Brassesco@21:1/5 to All on Mon Sep 7 10:02:51 2020
    Hi,

    Did you take a look at:
    https://www.youtube.com/watch?v=NiJWcQJseLQ https://www.youtube.com/watch?v=nIb9rmxJ4Ko https://www.youtube.com/watch?v=_6Obmt0hGh4 https://www.youtube.com/watch?v=koOsS-vD-XA

    For all simulations you have to create a subclass od DoubleBufferedView (or maybe create your own DoubleBufferedView).

    regards,
    bruno

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Zenchess@21:1/5 to All on Mon Sep 7 21:54:06 2020
    Yes, I've already been rendering games with opengl. I'm using GLFW to create the window, not dolphin.
    The problem isn't a rendering issue - it's that running the code I gave above takes a very long time to execute.

    Here is an example code that illustrates the issue a := OrderedCollection new.

    1 to: 20000 do: [:count|
    a add: (2@2)].

    Time millisecondsToRun: [
    1 to: 20000 do: [:index|
    (a at: index) x: index]
    ]

    would give like 50-80 ms to run the second block.


    This is not the exact code in my update loop, but basically I am updating the position of 20,000 or so game objects every frame and it's taking around 50-80 ms to run code like the above. I was just seeing if there is a more performant way of updating
    a large number of objects in dolphin, i.e. perhaps I shouldnt use a class at all for my game objects, and store everything in an array, an orderedCollection, I'm not really sure.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From john.aspinall@gmail.com@21:1/5 to Zenchess on Tue Sep 8 00:17:37 2020
    What's the spec of your machine? For me your test code runs in 4ms - this is in Windows 10 running under VMWare on a 10 year old Mac, so probably not especially performant by today's standards.

    I'd second the suggestion of running a test in the Profiler to highlight any hotspots in your code.


    On Tuesday, September 8, 2020 at 5:54:07 AM UTC+1, Zenchess wrote:
    Yes, I've already been rendering games with opengl. I'm using GLFW to create the window, not dolphin.
    The problem isn't a rendering issue - it's that running the code I gave above takes a very long time to execute.

    Here is an example code that illustrates the issue a := OrderedCollection new.

    1 to: 20000 do: [:count|
    a add: (2@2)].

    Time millisecondsToRun: [
    1 to: 20000 do: [:index|
    (a at: index) x: index]
    ]

    would give like 50-80 ms to run the second block.


    This is not the exact code in my update loop, but basically I am updating the position of 20,000 or so game objects every frame and it's taking around 50-80 ms to run code like the above. I was just seeing if there is a more performant way of updating
    a large number of objects in dolphin, i.e. perhaps I shouldnt use a class at all for my game objects, and store everything in an array, an orderedCollection, I'm not really sure.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bruno Buzzi Brassesco@21:1/5 to All on Tue Sep 8 04:41:01 2020
    a := OrderedCollection new.
    1 to: 20000 do: [:count|
    a add: (2@2)].

    Time millisecondsToRun: [
    1 to: 20000 do: [:index|
    (a at: index) x: index]
    ]

    would give like 50-80 ms to run the second block.

    In my machine run in 2 millisecods.

    What version of Dolphin are you using ?
    Also as John asked which hardware do you have in your machine ?

    regards,
    bruno

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Ala'a Alawi@21:1/5 to Bruno Buzzi Brassesco on Tue Sep 8 10:39:31 2020
    Is it milli or micro seconds? I maybe reading something wrong

    On Tuesday, September 8, 2020 at 3:41:02 PM UTC+4, Bruno Buzzi Brassesco wrote:
    a := OrderedCollection new.
    1 to: 20000 do: [:count|
    a add: (2@2)].

    Time millisecondsToRun: [
    1 to: 20000 do: [:index|
    (a at: index) x: index]
    ]

    would give like 50-80 ms to run the second block.

    In my machine run in 2 millisecods.

    What version of Dolphin are you using ?
    Also as John asked which hardware do you have in your machine ?

    regards,
    bruno

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Zenchess@21:1/5 to All on Tue Sep 8 15:55:46 2020
    Sorry about that, that wasn't actually the code that was taking a while, actually what I was doing in my code was:
    Object subclass: #Bunny
    instanceVariableNames: 'position speed color'
    classVariableNames: ''
    poolDictionaries: ''
    classInstanceVariableNames: ''

    (accessors for position, speed and color)

    bunnies := OrderedCollection new.
    1 to: 20000 do: [:index| |bunny|
    bunny := Bunny new position: 200@200; speed: 5@5;yourself.
    bunnies add: bunny.
    ].


    and then my game loop would need to call the below to update them once per frame (of course I wasn't using time millisecondsToRun to run during the actual game loop)


    Time millisecondsToRun:[bunnies do: [:each|
    each position: each position + each speed.
    ((each position x + (32)) > 800) | ((each position x + 32) < 0) ifTrue: [each speed x: each speed x* -1].
    (each position y + (32)) > height | ((each position y + 32) < 0) ifTrue: [each speed y: each speed y* -1].
    ]].

    which takes 13 milliseconds on my dolphin (my processor is an i7 7700k). However when I profiled the same code in c++ it would only take about 200 microseconds.

    Anyway, thanks for the responses, I will check the profiler in dolphin to see if I can optimize this.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From john.aspinall@gmail.com@21:1/5 to Zenchess on Wed Sep 9 01:02:00 2020
    Dolphin is interpreted so will always be slower than a compiled language. This paper is rather old but still worth reading on the pros, cons and philosophy behind Dolphin's interpreter:

    http://www.object-arts.com/downloads/papers/TheInterpreterIsDead.PDF

    As a further comparison, running your test code in Pharo (JIT compiled) takes 2ms, compared to 18ms in Dolphin on my machine (I've changed the undefined 'height' for 800 in order to run the code).

    In terms of optimisation, the Point arithmetic (position + speed) appears to be the main hotspot, possibly down to creating a new object. Reworking this to directly modify the existing position as follows:

    each position x: each position x + each speed x.
    each position y: each position y + each speed y.

    ...brings my runtime down to 12ms. This can be slightly reduced further (and the code tidied) by implementing a method in Point to do this:

    moveBy: aPoint

    x := x + aPoint x.
    y := y + aPoint y

    The test code then becomes:

    each position moveBy: each speed.

    ...and runtime is 11ms.

    A further optimisation is to use the shortcutting form of or: [..] instead of | in the conditional checks:

    (((each position x + (32)) > 800) or: [(each position x + 32) < 0]) ifTrue: [each speed x: each speed x* -1].
    (((each position y + (32)) > 800) or: [(each position y + 32) < 0]) ifTrue: [each speed y: each speed y* -1].

    My final runtime is then 8ms.

    HTH.

    John



    On Tuesday, September 8, 2020 at 11:55:48 PM UTC+1, Zenchess wrote:
    Sorry about that, that wasn't actually the code that was taking a while, actually what I was doing in my code was:
    Object subclass: #Bunny
    instanceVariableNames: 'position speed color'
    classVariableNames: ''
    poolDictionaries: ''
    classInstanceVariableNames: ''

    (accessors for position, speed and color)

    bunnies := OrderedCollection new.
    1 to: 20000 do: [:index| |bunny|
    bunny := Bunny new position: 200@200; speed: 5@5;yourself.
    bunnies add: bunny.
    ].


    and then my game loop would need to call the below to update them once per frame (of course I wasn't using time millisecondsToRun to run during the actual game loop)


    Time millisecondsToRun:[bunnies do: [:each|
    each position: each position + each speed.
    ((each position x + (32)) > 800) | ((each position x + 32) < 0) ifTrue: [each speed x: each speed x* -1].
    (each position y + (32)) > height | ((each position y + 32) < 0) ifTrue: [each speed y: each speed y* -1].
    ]].

    which takes 13 milliseconds on my dolphin (my processor is an i7 7700k). However when I profiled the same code in c++ it would only take about 200 microseconds.

    Anyway, thanks for the responses, I will check the profiler in dolphin to see if I can optimize this.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Zenchess@21:1/5 to All on Wed Sep 9 21:53:39 2020
    Thanks John. Do you think there is a way to further optimize it, for example not using points, or some other different way of structuring the data?
    Optimization in smalltalk is a topic I am interested in but have read nothing about as far as I remember. Eventually I guess I will have to learn c++ better
    so I can check out what is actually happening behind the scenes.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From john.aspinall@gmail.com@21:1/5 to Zenchess on Thu Sep 10 01:15:44 2020
    Nice idea. I removed the use of points, instead storing positionX, positionY, speedX and speedY directly within Bunny, and added Bunny>>moveAndCheck

    moveAndCheck

    positionX := positionX + speedX.
    positionY := positionY + speedY.

    "Also remove numeric calculations here for small extra gain"
    ((positionX > 768) or: [positionX < -32]) ifTrue: [speedX := speedX * -1].
    ((positionY > 768) or: [positionY < -32]) ifTrue: [speedY := speedY * -1].

    Test code then simply becomes:

    Time millisecondsToRun: [bunnies do: [ :each | each moveAndCheck]].

    This brings my runtime down to 4ms, the gain likely coming from the removal of a number of message sends (which are relatively expensive, as detailed in the "Interpreter is dead slow?" paper).

    I'm going to stick my neck out and say this is about as fast as this code can be made to run in Dolphin, though I'd be delighted to be proved wrong!

    Cheers.

    John


    On Thursday, September 10, 2020 at 5:53:40 AM UTC+1, Zenchess wrote:
    Thanks John. Do you think there is a way to further optimize it, for example not using points, or some other different way of structuring the data?
    Optimization in smalltalk is a topic I am interested in but have read nothing about as far as I remember. Eventually I guess I will have to learn c++ better
    so I can check out what is actually happening behind the scenes.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)