Forum: >>> Magnum BBS <<<

deque<> vs. list<> for a queue

From Bonita Montero@21:1/5 to All on Fri Nov 17 11:06:02 2023

I know that a deque is superior to a list if you want to have a queue
because it does chunk-allocations, thereby allogating the storage for
multiple nodes at once. But I was curious about what's the actual per- formance-impact. And I was interested in how many allocations were
saved through a deque. So I overloaded new and delete, redirected the allocation to malloc and free and counted the number of allocations.

#include <iostream>
#include <deque>
#include <string>
#include <sstream>
#include <chrono>
#include <optional>
#include <list>

using namespace std;
using namespace chrono;

uint64_t allocs = 0;

void *operator new( size_t n )
{
++allocs;
return malloc( n );
}

void operator delete( void *p ) noexcept
{
free( p );
}

int main()
{
auto bench = []( auto queue )
{
for( size_t i = 0; i != 10'000; ++i )
queue.emplace_back( (ostringstream() << "no short string
optimization: " << i).str() );
auto start = high_resolution_clock::now();
constexpr size_t ROUNDS = 100'000'000;
uint64_t allocsBefore = ::allocs;
for( size_t r = ROUNDS; r--; )
{
string str( move( queue.front() ) );
queue.pop_front();
queue.emplace_back( move( str ) );
}
double ns = (double)duration_cast<nanoseconds>( high_resolution_clock::now() - start ).count() / ROUNDS;
uint64_t allocs = ::allocs - allocsBefore;
cout << ns << ": " << allocs << endl;
};
bench( deque<string>() );
bench( list<string>() );
}

With MSVC I get the following results on a AMD 7950X Zen4 CPU:
4.01774: 6384
23.5851: 100000000
So the deque is about eight times faster and has 15.600 times less
allocations. So the chunk size must be really large with MSVC's
deque.
With g++ 12.3.0 the results are the following:
3.13869: 6250000
11.8536: 100000000
So the deque is about 3.5 times faster and the chunks seem rather
small with 16 elements per chunk. Nevertheless libstdc++ seems to
have the faster memory allocator so that allocating so much more
chunks doesn't result in less performance. The better allocation
performance shows also with the list-performance.
So the decision is for a deque if you can afford the memory blend.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Bonita Montero@21:1/5 to All on Fri Nov 17 11:29:27 2023

I guess that MSVC's deque-implementation attaches an just emptied front
chunk to the back's capactity. Maybe I'll find a way to prove that.

Am 17.11.2023 um 11:06 schrieb Bonita Montero:

I know that a deque is superior to a list if you want to have a queue
because it does chunk-allocations, thereby allogating the storage for multiple nodes at once. But I was curious about what's the actual per- formance-impact. And I was interested in how many allocations were
saved through a deque. So I overloaded new and delete, redirected the allocation to malloc and free and counted the number of allocations.

#include <iostream>
#include <deque>
#include <string>
#include <sstream>
#include <chrono>
#include <optional>
#include <list>

using namespace std;
using namespace chrono;

uint64_t allocs = 0;

void *operator new( size_t n )
{
    ++allocs;
    return malloc( n );
}

void operator delete( void *p ) noexcept
{
    free( p );
}

int main()
{
    auto bench = []( auto queue )
    {
        for( size_t i = 0; i != 10'000; ++i )
            queue.emplace_back( (ostringstream() << "no short string
optimization: " << i).str() );
        auto start = high_resolution_clock::now();
        constexpr size_t ROUNDS = 100'000'000;
        uint64_t allocsBefore = ::allocs;
        for( size_t r = ROUNDS; r--; )
        {
            string str( move( queue.front() ) );
            queue.pop_front();
            queue.emplace_back( move( str ) );
        }
        double ns = (double)duration_cast<nanoseconds>( high_resolution_clock::now() - start ).count() / ROUNDS;
        uint64_t allocs = ::allocs - allocsBefore;
        cout << ns << ": " << allocs << endl;
    };
    bench( deque<string>() );
    bench( list<string>() );
}

With MSVC I get the following results on a AMD 7950X Zen4 CPU:
    4.01774: 6384
    23.5851: 100000000
So the deque is about eight times faster and has 15.600 times less allocations. So the chunk size must be really large with MSVC's
deque.
With g++ 12.3.0 the results are the following:
    3.13869: 6250000
    11.8536: 100000000
So the deque is about 3.5 times faster and the chunks seem rather
small with 16 elements per chunk. Nevertheless libstdc++ seems to
have the faster memory allocator so that allocating so much more
chunks doesn't result in less performance. The better allocation
performance shows also with the list-performance.
So the decision is for a deque if you can afford the memory blend.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Pavel@21:1/5 to Bonita Montero on Fri Nov 17 21:42:49 2023

Bonita Montero wrote:

I know that a deque is superior to a list if you want to have a queue
because it does chunk-allocations, thereby allogating the storage for multiple nodes at once. But I was curious about what's the actual per- formance-impact. And I was interested in how many allocations were
saved through a deque. So I overloaded new and delete, redirected the allocation to malloc and free and counted the number of allocations.

#include <iostream>
#include <deque>
#include <string>
#include <sstream>
#include <chrono>
#include <optional>
#include <list>

using namespace std;
using namespace chrono;

uint64_t allocs = 0;

void *operator new( size_t n )
{
    ++allocs;
    return malloc( n );
}

void operator delete( void *p ) noexcept
{
    free( p );
}

int main()
{
    auto bench = []( auto queue )
    {
        for( size_t i = 0; i != 10'000; ++i )
            queue.emplace_back( (ostringstream() << "no short string
optimization: " << i).str() );
        auto start = high_resolution_clock::now();
        constexpr size_t ROUNDS = 100'000'000;
        uint64_t allocsBefore = ::allocs;
        for( size_t r = ROUNDS; r--; )
        {
            string str( move( queue.front() ) );
            queue.pop_front();
            queue.emplace_back( move( str ) );
        }
        double ns = (double)duration_cast<nanoseconds>( high_resolution_clock::now() - start ).count() / ROUNDS;
        uint64_t allocs = ::allocs - allocsBefore;
        cout << ns << ": " << allocs << endl;
    };
    bench( deque<string>() );
    bench( list<string>() );
}

With MSVC I get the following results on a AMD 7950X Zen4 CPU:
    4.01774: 6384
    23.5851: 100000000
So the deque is about eight times faster and has 15.600 times less allocations. So the chunk size must be really large with MSVC's
deque.
With g++ 12.3.0 the results are the following:
    3.13869: 6250000
    11.8536: 100000000
So the deque is about 3.5 times faster and the chunks seem rather
small with 16 elements per chunk. Nevertheless libstdc++ seems to
have the faster memory allocator so that allocating so much more
chunks doesn't result in less performance. The better allocation
performance shows also with the list-performance.
So the decision is for a deque if you can afford the memory blend.

This particular test will most likely be bested by boost::circular_buffer.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Bonita Montero@21:1/5 to All on Sat Nov 18 05:19:04 2023

Am 18.11.2023 um 03:42 schrieb Pavel:

This particular test will most likely be bested by boost::circular_buffer.

I've got about 18 cycles per rotate from the back to the front unter
linux. A circular buffer wouldn't be much more efficient and has the
drawback that it is expensive to extend.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

Who's Online
Recent Visitors
- Bob Worm
  Tue Jun 11 07:19:27 2024
  from Wales, Uk via Raw
- Cronus
  Mon Jun 10 23:25:08 2024
  from Provo, Ut via SSH
- Bob Worm
  Mon Jun 10 22:16:20 2024
  from Wales, Uk via Raw
- Bob Worm
  Mon Jun 10 22:01:32 2024
  from Wales, Uk via Raw
- Keyop
  Mon Jun 10 11:17:06 2024
  from Huddersfield, West Yorkshire via SSH
- Keyop
  Mon Jun 10 11:16:42 2024
  from Huddersfield, West Yorkshire via SSH
- Bob Worm
  Mon Jun 10 08:16:19 2024
  from Wales, Uk via Raw
- Keyop
  Mon Jun 10 01:56:48 2024
  from Huddersfield, West Yorkshire via SSH

System Info

Sysop:	Keyop
Location:	Huddersfield, West Yorkshire, UK
Users:	307
Nodes:	16 (2 / 14)
Uptime:	30:37:57
Calls:	6,907
Calls today:	1
Files:	12,376
Messages:	5,427,923
Posted today:	1

deque<> vs. list<> for a queue

Who's Online

Recent Visitors

System Info