I know that a deque is superior to a list if you want to have a queue
because it does chunk-allocations, thereby allogating the storage for multiple nodes at once. But I was curious about what's the actual per- formance-impact. And I was interested in how many allocations were
saved through a deque. So I overloaded new and delete, redirected the allocation to malloc and free and counted the number of allocations.
#include <iostream>
#include <deque>
#include <string>
#include <sstream>
#include <chrono>
#include <optional>
#include <list>
using namespace std;
using namespace chrono;
uint64_t allocs = 0;
void *operator new( size_t n )
{
++allocs;
return malloc( n );
}
void operator delete( void *p ) noexcept
{
free( p );
}
int main()
{
auto bench = []( auto queue )
{
for( size_t i = 0; i != 10'000; ++i )
queue.emplace_back( (ostringstream() << "no short string
optimization: " << i).str() );
auto start = high_resolution_clock::now();
constexpr size_t ROUNDS = 100'000'000;
uint64_t allocsBefore = ::allocs;
for( size_t r = ROUNDS; r--; )
{
string str( move( queue.front() ) );
queue.pop_front();
queue.emplace_back( move( str ) );
}
double ns = (double)duration_cast<nanoseconds>( high_resolution_clock::now() - start ).count() / ROUNDS;
uint64_t allocs = ::allocs - allocsBefore;
cout << ns << ": " << allocs << endl;
};
bench( deque<string>() );
bench( list<string>() );
}
With MSVC I get the following results on a AMD 7950X Zen4 CPU:
4.01774: 6384
23.5851: 100000000
So the deque is about eight times faster and has 15.600 times less allocations. So the chunk size must be really large with MSVC's
deque.
With g++ 12.3.0 the results are the following:
3.13869: 6250000
11.8536: 100000000
So the deque is about 3.5 times faster and the chunks seem rather
small with 16 elements per chunk. Nevertheless libstdc++ seems to
have the faster memory allocator so that allocating so much more
chunks doesn't result in less performance. The better allocation
performance shows also with the list-performance.
So the decision is for a deque if you can afford the memory blend.
I know that a deque is superior to a list if you want to have a queue
because it does chunk-allocations, thereby allogating the storage for multiple nodes at once. But I was curious about what's the actual per- formance-impact. And I was interested in how many allocations were
saved through a deque. So I overloaded new and delete, redirected the allocation to malloc and free and counted the number of allocations.
#include <iostream>
#include <deque>
#include <string>
#include <sstream>
#include <chrono>
#include <optional>
#include <list>
using namespace std;
using namespace chrono;
uint64_t allocs = 0;
void *operator new( size_t n )
{
++allocs;
return malloc( n );
}
void operator delete( void *p ) noexcept
{
free( p );
}
int main()
{
auto bench = []( auto queue )
{
for( size_t i = 0; i != 10'000; ++i )
queue.emplace_back( (ostringstream() << "no short string
optimization: " << i).str() );
auto start = high_resolution_clock::now();
constexpr size_t ROUNDS = 100'000'000;
uint64_t allocsBefore = ::allocs;
for( size_t r = ROUNDS; r--; )
{
string str( move( queue.front() ) );
queue.pop_front();
queue.emplace_back( move( str ) );
}
double ns = (double)duration_cast<nanoseconds>( high_resolution_clock::now() - start ).count() / ROUNDS;
uint64_t allocs = ::allocs - allocsBefore;
cout << ns << ": " << allocs << endl;
};
bench( deque<string>() );
bench( list<string>() );
}
With MSVC I get the following results on a AMD 7950X Zen4 CPU:
4.01774: 6384
23.5851: 100000000
So the deque is about eight times faster and has 15.600 times less allocations. So the chunk size must be really large with MSVC's
deque.
With g++ 12.3.0 the results are the following:
3.13869: 6250000
11.8536: 100000000
So the deque is about 3.5 times faster and the chunks seem rather
small with 16 elements per chunk. Nevertheless libstdc++ seems to
have the faster memory allocator so that allocating so much more
chunks doesn't result in less performance. The better allocation
performance shows also with the list-performance.
So the decision is for a deque if you can afford the memory blend.
This particular test will most likely be bested by boost::circular_buffer.
Sysop: | Keyop |
---|---|
Location: | Huddersfield, West Yorkshire, UK |
Users: | 307 |
Nodes: | 16 (2 / 14) |
Uptime: | 30:37:57 |
Calls: | 6,907 |
Calls today: | 1 |
Files: | 12,376 |
Messages: | 5,427,923 |
Posted today: | 1 |