Package: apt
Version: 2.7.12
I noticed that searching for packages is very slow if the package lists are compressed. To reproduce, remove `/var/lib/apt/lists`, enable
Acquire::GzipIndexes "true"; Acquire::CompressionTypes::Order:: "gz";
and run `apt update`. This enables LZ4 compression on my systems, but I don't think the exact method matters. You can then run `apt search librust`, which takes about 19 seconds in a Debian 12 container (docker.io/debian:12 has compression already set up), compared to 0.4 seconds without compression.
Also tested on Ubuntu 22.04 and 24.04, so the exact APT version shouldn't matter too much.
I tried to look into it, and `strace -e trace=openat apt-cache search librust` shows it reopening and re-reading one of the package lists:
openat(AT_FDCWD, "/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_jammy_universe_binary-amd64_Packages.lz4", O_RDONLY) = 16
librust-addr2line+default-dev - Cross-platform symbolication library - feature "default"
openat(AT_FDCWD, "/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_jammy_universe_binary-amd64_Packages.lz4", O_RDONLY) = 16
librust-addr2line+object-dev - Cross-platform symbolication library - feature "object"
openat(AT_FDCWD, "/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_jammy_universe_binary-amd64_Packages.lz4", O_RDONLY) = 16
librust-addr2line+rustc-demangle-dev - Cross-platform symbolication library - feature "rustc-demangle"
openat(AT_FDCWD, "/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_jammy_universe_binary-amd64_Packages.lz4", O_RDONLY) = 16
librust-addr2line+std-dev - Cross-platform symbolication library - feature "std"
(you can use -e trace=openat,read to confirm that it's actually reading the file)
I believe it's quadratic in the number of search results, and this is related to the pseudo-indexing mechanism used by APT (see `pkgRecords::Lookup` in apt-pkg). Each lookup call will have to decompress the file in order to seek to the destination.
Unfortunately, I suspect this isn't exactly an easy fix, given the current design.
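To make the suspected quadratic cost concrete, here is a rough cost model in Python. It is only a sketch, not APT code: the function name, the record spacing and the number of matches are all made up for illustration. It assumes that reaching an offset in a compressed stream means decompressing everything before it, and compares reopening the file for every lookup with reading forward through offsets in ascending order.

```python
def bytes_decompressed(offsets, reopen_each_time):
    """Count the bytes a stream decompressor must process to reach each offset.

    Assumption: a compressed stream cannot jump to an arbitrary position,
    so reaching offset N means decompressing everything before it.
    """
    total = 0
    position = 0
    for target in offsets:
        if reopen_each_time or target < position:
            position = 0  # reopen: start again from the beginning
        total += target - position  # decompress (and discard) up to target
        position = target
    return total


# Hypothetical index: 1,000 matching records spaced 1,000 bytes apart.
offsets = [i * 1000 for i in range(1, 1001)]

reopened = bytes_decompressed(offsets, reopen_each_time=True)
forward = bytes_decompressed(offsets, reopen_each_time=False)

print(f"reopen per lookup: {reopened:,} bytes")  # 500,500,000
print(f"forward only:      {forward:,} bytes")   # 1,000,000
```

With these made-up numbers, reopening for every lookup processes about 500 times as much data as a single forward pass, and the gap grows with the number of results.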
On Thu, Mar 21, 2024 at 06:01:12PM +0200, Laurențiu Nicola wrote:
Going to respond to this but also including responses to your followup email which has a broken Subject:
Searching works by ordering the packages based on file, offset
and then iterating over them and looking them up. Seeking forward
to a higher offset does not involve a reopen, we just skip content
in between.
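
A small Python sketch of that idea, for illustration only: `SkippingReader`, the fixed 16-byte record layout and the sample offsets are hypothetical and do not mirror the apt-pkg implementation. It uses the standard `gzip` module to show that lookups sorted by offset need a single open, while the same lookups in arbitrary order force a reopen every time the target lies behind the current position.

```python
import gzip
import io


class SkippingReader:
    """Read records from a gzip blob, reopening only when seeking backwards."""

    def __init__(self, blob):
        self.blob = blob
        self.reopens = 0
        self._open()

    def _open(self):
        self.stream = gzip.GzipFile(fileobj=io.BytesIO(self.blob))
        self.position = 0
        self.reopens += 1

    def read_at(self, offset, length):
        if offset < self.position:
            self._open()  # a compressed stream cannot be rewound in place
        self.stream.read(offset - self.position)  # skip the content in between
        data = self.stream.read(length)
        self.position = offset + len(data)
        return data


# Hypothetical package list: 100 fixed-width records of 16 bytes each.
records = [f"pkg{i:04d}".ljust(15).encode() + b"\n" for i in range(100)]
blob = gzip.compress(b"".join(records))

# Offsets of the matching records, in the order the search found them.
wanted = [16 * i for i in (42, 7, 99, 13)]

unordered = SkippingReader(blob)
for offset in wanted:
    unordered.read_at(offset, 16)

in_order = SkippingReader(blob)
names = [in_order.read_at(offset, 16).strip() for offset in sorted(wanted)]

print(unordered.reopens, in_order.reopens)  # 3 1
print(names)
```

In a real lookup the sort key would be (file, offset), as described above; a single file keeps the sketch short.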
Full-text search looks inside the description in the section parsed
for each package.
It's not clear why this fails on bookworm - I can reproduce that -
it certainly is fine in git main on my Ubuntu 24.04 system.