I'm currently working on something that's supposed to match names in
DNS queries against a large "list of bad domains you don't want to be
talking to". The test file I'm working with has 1,118,228
entries. Reading all of the information in it into memory needs a little
over 34M of RAM.
A program using calloc to allocate the necessary
storage ends up with a >49M heap due to memory wasted by GNU malloc on $something.
In contrast to this, using mmap to allocate an area of a suitable size
and putting the structures and domain names there just wastes 2400
bytes.
The code will possibly also run on a modern "embedded system"/SOHO
router. While these are pretty powerful nowadays, 15M of RAM getting
sucked into malloc for no particular purpose seems excessive to me.
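For illustration, here is a minimal sketch of the mmap approach described above, assuming one anonymous mapping sized up front and a simple bump allocator; the names (arena_alloc and so on) and sizes are made up, not taken from the actual program:

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

static char  *arena;
static size_t arena_off, arena_size;

/* Bump allocator: one mmap up front, no per-chunk header at all. */
static void *arena_alloc(size_t n)
{
    n = (n + 7) & ~(size_t)7;               /* keep 8-byte alignment */
    if (arena_size - arena_off < n)
        return NULL;
    void *p = arena + arena_off;
    arena_off += n;
    return p;
}

int main(void)
{
    arena_size = (size_t)36 << 20;          /* a bit over the ~34M payload */
    arena = mmap(NULL, arena_size, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (arena == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* Store one domain name the way the entries above might be stored. */
    const char *domain = "tracker.example";
    char *name = arena_alloc(strlen(domain) + 1);
    if (!name)
        return 1;
    strcpy(name, domain);

    printf("stored \"%s\", %zu bytes of arena used\n", name, arena_off);
    return 0;
}

Untouched pages of an anonymous mapping cost no physical memory until first written, which is why the only waste in this scheme is the unused tail of the last page.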
Marcel Mueller <news.5.maazl@spamgourmet.org> wrote:
In contrast to this, using mmap to allocate an area of a suitable
size and putting the structures and domain names there just wastes
2400 bytes.
mmap operates directly at the MMU level, typically allocating 4k pages.
It is efficient if it is not used too often or for too small chunks.
I used mmap on Linux in a compression utility recently that had to
scan the same file a number of times, because I assumed it would be
faster than normal I/O. Turned out I was wrong: in fact, using
open(), read() etc. was about twice as fast, which I still find
confusing because I thought all the standard C I/O used mmap (or its
kernel hook) eventually anyway. Strange.
On 2021-07-13, Rainer Weikusat wrote:
I'm currently working on something that's supposed to match names in
DNS queries against a large "list of bad domains you don't want to be
talking to". The test file I'm working with has 1,118,228
entries. Reading all of the information in it into memory needs a little
over 34M of RAM. A program using calloc to allocate the necessary
storage ends up with a >49M heap due to memory wasted by GNU malloc on
$something.
You mean each entry is a separate calloc(), or do you do calloc(1118228, sizeof(entry))?
If it's the former, it sounds about right. I don't think this is fragmentation, just the allocator's per-chunk metadata overhead.
Each chunk needs some overhead to hold the size/flags and the previous
chunk's size. It seems like each entry is about 31 bytes (34M / 1118228), and
you say ~45 bytes per entry (49M / 1118228) are used, so that's ~14 bytes of overhead, probably the two size_t's on x64.
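If you want to see the per-chunk cost directly, glibc's malloc_usable_size() makes it easy to measure; a rough sketch (the 31-byte request is just the average entry size worked out above, not a number from the original program):

#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>            /* malloc_usable_size() is glibc-specific */

int main(void)
{
    enum { N = 1000000, REQ = 31 };
    size_t usable = 0;

    for (int i = 0; i < N; i++) {
        char *p = malloc(REQ);
        if (!p)
            return 1;
        usable += malloc_usable_size(p);
        /* chunks deliberately leaked: we only want the footprint */
    }

    /* malloc_usable_size() reports the rounded-up usable size but not
       the chunk header, so the real per-chunk cost is this figure
       plus roughly 8 more bytes on x86-64. */
    printf("requested %d bytes/chunk, usable %.1f bytes/chunk\n",
           REQ, (double)usable / N);
    return 0;
}

On glibc/x86-64 each chunk carries a size word and is rounded up to a 16-byte multiple, which is consistent with the ~14 bytes per entry estimated above.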
On 14.07.21 at 09:24, MrSpud_pjnlvZ@r7hwq328lx.tv wrote:
On Tue, 13 Jul 2021 22:46:37 +0200
Marcel Mueller <news.5.maazl@spamgourmet.org> wrote:
In contrast to this, using mmap to allocate an area of a suitable size
and putting the structures and domain names there just wastes 2400
bytes.
mmap operates directly at the MMU level, typically allocating 4k pages.
It is efficient if it is not used too often or for too small chunks.
I used mmap on Linux in a compression utility recently that had to scan the
same file a number of times, because I assumed it would be faster than normal
I/O. Turned out I was wrong: in fact, using open(), read() etc. was about
twice as fast, which I still find confusing because I thought all the standard
C I/O used mmap (or its kernel hook) eventually anyway. Strange.
The problem with mmap is that it does not read anything. You will get a
page fault every 4kB, and so the data is read in 4k chunks, allocating
physical memory in the same chunk size. This causes
significant fragmentation of the physical memory, leading to many TLB
entries.
In contrast, you probably did not use read() with only 4kB chunks. But even
if you did, the file system cache has optimizations for sequential
reads and uses read-ahead.
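If the fault-per-4kB behaviour is what hurts, Linux at least lets you tell the kernel about the access pattern up front; a hedged sketch of scanning a mapped file with read-ahead requested (error handling kept minimal):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2)
        return 1;
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    struct stat st;
    if (fstat(fd, &st) < 0)
        return 1;

    unsigned char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    /* Ask for aggressive read-ahead instead of faulting page by page. */
    madvise(p, st.st_size, MADV_SEQUENTIAL);

    unsigned long sum = 0;
    for (off_t i = 0; i < st.st_size; i++)
        sum += p[i];                    /* the scan that triggers the faults */
    printf("checksum %lu\n", sum);

    munmap(p, st.st_size);
    close(fd);
    return 0;
}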
On Thu, 15 Jul 2021 07:37:42 +0200, Marcel Mueller <news.5.maazl@spamgourmet.org> wrote:
In contrast, you probably did not use read() with only 4kB chunks. But even
if you did, the file system cache has optimizations for sequential
reads and uses read-ahead.
In that case, does anyone know how read() etc. on Linux actually access the file information? It seems I was wrong about the I/O library mapping the file (unless it does it in chunks), so are there other kernel hooks it
uses?
I used mmap on Linux in a compression utility recently that had to
scan the same file a number of times, because I assumed it would be
faster than normal I/O. Turned out I was wrong: in fact, using
open(), read() etc. was about twice as fast, which I still find
confusing because I thought all the standard C I/O used mmap (or its
kernel hook) eventually anyway. Strange.
I did a similar experiment a couple of decades ago, with similar
results.
The reason, as I understand it, is that changing your process's
virtual memory mapping is relatively expensive, even compared to the
copying involved in read().
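For anyone who wants to repeat the experiment, the read() side can be as simple as the sketch below; the 64kB buffer is a guess at what a reasonable test would use, not what the original utility did:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2)
        return 1;
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    static char buf[1 << 16];           /* 64kB per syscall */
    unsigned long sum = 0;
    ssize_t n;

    /* Plain read() loop: the kernel copies out of the page cache while
       its sequential read-ahead keeps filling it in front of us. */
    while ((n = read(fd, buf, sizeof buf)) > 0)
        for (ssize_t i = 0; i < n; i++)
            sum += (unsigned char)buf[i];

    printf("checksum %lu\n", sum);
    close(fd);
    return 0;
}

Timing this against the mmap scan sketched earlier, over the same file both cold and cached, reproduces the kind of comparison described above.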
MrSpud_na@7a31.net writes:
Marcel Mueller <news.5.maazl@spamgourmet.org> wrote:
In contrast, you probably did not use read() with only 4kB chunks. But even
if you did, the file system cache has optimizations for sequential
reads and uses read-ahead.
In that case, does anyone know how read() etc. on Linux actually access the
file information? It seems I was wrong about the I/O library mapping the
file (unless it does it in chunks), so are there other kernel hooks it
uses?
Scott has given you most of the answer. The connection to memory mapping
is that when you memory-map (part of) a file, the physical RAM used for
that mapping is the same RAM that is used for the kernel's page cache for
the corresponding part of the file (with some caveats about private
mappings).
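That sharing is easy to demonstrate: write through the file descriptor and the change appears in an existing MAP_SHARED mapping without any further read, because both sit on the same physical page. A small sketch (the scratch file name is made up):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int fd = open("scratch.tmp", O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    if (write(fd, "before", 6) != 6)
        return 1;

    char *p = mmap(NULL, 6, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* Modify the file through the fd, then look at the mapping:
       the update is visible because the mapping and the page cache
       share the same physical page. */
    pwrite(fd, "AFTER!", 6, 0);
    printf("mapping now reads: %.6s\n", p);

    munmap(p, 6);
    close(fd);
    unlink("scratch.tmp");
    return 0;
}

With MAP_PRIVATE the first write to a page would instead trigger copy-on-write, which is the caveat mentioned above.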