• Bug#1064629: libamd-comgr2: segfault in rocfft

    From Cordell Bloor@21:1/5 to All on Sun Feb 25 08:50:01 2024
    Package: libamd-comgr2
    Version: 6.0+git20231212.4510c28+dfsg-1~exp2
    Severity: important
    X-Debbugs-Cc: cgmb@slerp.xyz

    Dear Maintainer,

    The rocfft tests began segfaulting on all architectures when rocm-compilersupport 6.0+git20231212.4510c28+dfsg-1~exp2 was uploaded to unstable. The failure is visible in the CI run for 6.0+git20231212.4510c28+dfsg-1~exp1 on experimental:
    https://ci.rocm.debian.net/packages/r/rocfft/unstable/amd64+gfx1030/6343/
    as compared with the run for 5.2.3-2:
    https://ci.rocm.debian.net/packages/r/rocfft/unstable/amd64+gfx1030/6466/

    I've captured a backtrace, although it's very unclear to me what the
    problem is:

    root@b50a9fa13687:~# gdb /usr/libexec/rocm/librocfft0-tests/rocfft-test
    GNU gdb (Debian 13.2-1) 13.2
    Copyright (C) 2023 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.
    Type "show copying" and "show warranty" for details.
    This GDB was configured as "x86_64-linux-gnu".
    Type "show configuration" for configuration details.
    For bug reporting instructions, please see: <https://www.gnu.org/software/gdb/bugs/>.
    Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

    For help, type "help".
    Type "apropos word" to search for commands related to "word"...
    Reading symbols from /usr/libexec/rocm/librocfft0-tests/rocfft-test...

    This GDB supports auto-downloading debuginfo from the following URLs:
    <https://debuginfod.debian.net>
    Enable debuginfod for this session? (y or [n]) y
    Debuginfod has been enabled.
    To make this setting permanent, add 'set debuginfod enabled on' to .gdbinit.
    Reading symbols from /root/.cache/debuginfod_client/b2eea099f3a928be0c9fb7ba45fbee4d9b157b43/debuginfo...
    (gdb) r
    Starting program: /usr/libexec/rocm/librocfft0-tests/rocfft-test
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
    single epsilon: 3.75e-05 double epsilon: 1e-15
    Random seed: 4218654847
    rocFFT version: 1.0.21.
    [==========] Running 289668 tests from 43 test suites.
    [----------] Global test environment set-up.
    [----------] 1 test from manual
    [ RUN ] manual.vs_fftw
    Manual test:
    length: 8
    istride: 1
    idist: 8
    ostride: 1
    odist: 8
    batch: 1
    isize: 8
    osize: 8
    ioffset: 0 0
    ooffset: 0 0
    in-place
    transform_type: fft_transform_type_complex_forward
    fft_array_type_complex_interleaved -> fft_array_type_complex_interleaved
    single-precision
    ilength: 8
    olength: 8
    ibuffer_size: 64
    obuffer_size: 64

    Token: complex_forward_len_8_single_ip_batch_1_istride_1_CI_ostride_1_CI_idist_8_odist_8_ioffset_0_0_ooffset_0_0
    [New Thread 0x7ffff33576c0 (LWP 4429)]
    [New Thread 0x7ffff2b546c0 (LWP 4430)]
    [Thread 0x7ffff2b546c0 (LWP 4430) exited]
    [New Thread 0x7fffebbff6c0 (LWP 4431)]
    [New Thread 0x7fffeb3fe6c0 (LWP 4432)]
    [Detaching after vfork from child process 4433]
    [Thread 0x7fffeb3fe6c0 (LWP 4432) exited]
    [Thread 0x7fffebbff6c0 (LWP 4431) exited]
    [New Thread 0x7fffeb3fe6c0 (LWP 4434)]
    [New Thread 0x7fffebbff6c0 (LWP 4435)]
    [Thread 0x7fffeb3fe6c0 (LWP 4434) exited]
    [New Thread 0x7fffeb3fe6c0 (LWP 4436)]
    [Thread 0x7fffeb3fe6c0 (LWP 4436) exited]
    [Thread 0x7fffebbff6c0 (LWP 4435) exited]
    [ OK ] manual.vs_fftw (2682 ms)
    [----------] 1 test from manual (2682 ms total)

    [----------] 26 tests from rocfft_UnitTest
    [ RUN ] rocfft_UnitTest.default_load_callback_complex_single
    [New Thread 0x7fffeb3fe6c0 (LWP 4437)]
    [New Thread 0x7fffebbff6c0 (LWP 4438)]
    [Detaching after vfork from child process 4439]
    [Thread 0x7fffebbff6c0 (LWP 4438) exited]
    [Thread 0x7fffeb3fe6c0 (LWP 4437) exited]
    [New Thread 0x7fffebbff6c0 (LWP 4440)]
    [New Thread 0x7ffff03ff6c0 (LWP 4441)]
    [Thread 0x7fffebbff6c0 (LWP 4440) exited]

    Thread 1 "rocfft-test" received signal SIGSEGV, Segmentation fault.
    0x00007ffff40d6208 in ?? () from /lib/x86_64-linux-gnu/libamdhip64.so.5
    (gdb) thread apply all bt

    Thread 12 (Thread 0x7ffff03ff6c0 (LWP 4441) "rocfft-test"):
    #0 __GI___ioctl (fd=fd@entry=3, request=request@entry=3222817548) at ../sysdeps/unix/sysv/linux/ioctl.c:36
    #1 0x00007ffff349ee90 in kmtIoctl (fd=3, request=request@entry=3222817548, arg=arg@entry=0x7ffff03fddc0) at ./src/libhsakmt.c:13
    #2 0x00007ffff3497ddc in hsaKmtWaitOnMultipleEvents_Ext (event_age=0x7ffff03fdea8, Milliseconds=3, WaitOnAll=<optimized out>, NumEvents=<optimized out>, Events=0x7ffff03fde78) at ./src/events.c:407
    #3 hsaKmtWaitOnMultipleEvents_Ext (Events=0x7ffff03fde78, NumEvents=1, WaitOnAll=<optimized out>, Milliseconds=3, event_age=0x7ffff03fdea8) at ./src/events.c:378
    #4 0x00007ffff349854b in hsaKmtWaitOnEvent_Ext (Event=<optimized out>, Milliseconds=<optimized out>, event_age=<optimized out>) at ./src/events.c:226
    #5 0x00007ffff3537640 in rocr::core::InterruptSignal::WaitRelaxed (this=0x5555895d6f00, condition=HSA_SIGNAL_CONDITION_NE, compare_value=1, timeout=<optimized out>, wait_hint=HSA_WAIT_STATE_BLOCKED) at ./src/core/runtime/interrupt_signal.cpp:241
    #6 0x00007ffff353734e in rocr::core::InterruptSignal::WaitAcquire (this=<optimized out>, condition=<optimized out>, compare_value=<optimized out>, timeout=<optimized out>, wait_hint=<optimized out>) at ./src/core/runtime/interrupt_signal.cpp:249
    #7 0x00007ffff352beeb in rocr::HSA::hsa_signal_wait_scacquire (hsa_signal=..., condition=HSA_SIGNAL_CONDITION_NE, compare_value=1, timeout_hint=4000000, wait_state_hint=HSA_WAIT_STATE_BLOCKED) at ./src/core/runtime/hsa.cpp:1220
    #8 0x00007ffff41274bd in ?? () from /lib/x86_64-linux-gnu/libamdhip64.so.5
    #9 0x00007ffff3e4cab8 in ?? () from /lib/x86_64-linux-gnu/libamdhip64.so.5
    #10 0x00007ffff40bfef6 in ?? () from /lib/x86_64-linux-gnu/libamdhip64.so.5
    #11 0x00007ffff39e745c in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:444
    #12 0x00007ffff3a67bbc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

    Thread 2 (Thread 0x7ffff33576c0 (LWP 4429) "rocfft-test"):
    #0 __GI___ioctl (fd=fd@entry=3, request=request@entry=3222817548) at ../sysdeps/unix/sysv/linux/ioctl.c:36
    #1 0x00007ffff349ee90 in kmtIoctl (fd=3, request=request@entry=3222817548, arg=arg@entry=0x7ffff3355d40) at ./src/libhsakmt.c:13
    #2 0x00007ffff3497ddc in hsaKmtWaitOnMultipleEvents_Ext (event_age=0x7ffff3355df0, Milliseconds=4294967294, WaitOnAll=<optimized out>, NumEvents=<optimized out>, Events=0x7ffff3355e80) at ./src/events.c:407
    #3 hsaKmtWaitOnMultipleEvents_Ext (Events=0x7ffff3355e80, NumEvents=3, WaitOnAll=<optimized out>, Milliseconds=4294967294, event_age=0x7ffff3355df0) at ./src/events.c:378
    #4 0x00007ffff3556d29 in rocr::core::Signal::WaitAny (signal_count=signal_count@entry=10, hsa_signals=hsa_signals@entry=0x7fffec001180, conds=conds@entry=0x7fffec000dc0, values=values@entry=0x7fffec001210, timeout=timeout@entry=18446744073709551615, wait_hint=<optimized out>, wait_hint@entry=HSA_WAIT_STATE_BLOCKED, satisfying_value=<optimized out>) at ./src/core/runtime/signal.cpp:321
    #5 0x00007ffff3532e37 in rocr::AMD::hsa_amd_signal_wait_any (signal_count=10, hsa_signals=0x7fffec001180, conds=0x7fffec000dc0, values=0x7fffec001210, timeout_hint=timeout_hint@entry=18446744073709551615, wait_hint=wait_hint@entry=HSA_WAIT_STATE_BLOCKED, satisfying_value=0x7ffff3355fb8) at ./src/core/runtime/hsa_ext_amd.cpp:572
    #6 0x00007ffff354effa in rocr::core::Runtime::AsyncEventsLoop () at ./src/core/runtime/runtime.cpp:1125
    #7 0x00007ffff34f9dbb in rocr::os::ThreadTrampoline (arg=<optimized out>) at ./src/core/util/lnx/os_linux.cpp:80
    #8 0x00007ffff39e745c in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:444
    #9 0x00007ffff3a67bbc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

    Thread 1 (Thread 0x7ffff33695c0 (LWP 4426) "rocfft-test"):
    #0 0x00007ffff40d6208 in ?? () from /lib/x86_64-linux-gnu/libamdhip64.so.5
    #1 0x00007ffff40fcf5b in ?? () from /lib/x86_64-linux-gnu/libamdhip64.so.5
    #2 0x00007ffff410643b in ?? () from /lib/x86_64-linux-gnu/libamdhip64.so.5
    #3 0x00007ffff41067b3 in ?? () from /lib/x86_64-linux-gnu/libamdhip64.so.5
    #4 0x00007ffff40c7c56 in ?? () from /lib/x86_64-linux-gnu/libamdhip64.so.5
    #5 0x00007ffff3fd28f3 in ?? () from /lib/x86_64-linux-gnu/libamdhip64.so.5
    #6 0x00007ffff3fd34b2 in hipModuleLaunchKernel () from /lib/x86_64-linux-gnu/libamdhip64.so.5
    #7 0x00007ffff7d04a97 in RTCKernel::launch (this=0x7ffed4046940, kargs=..., gridDim=..., blockDim=..., lds_bytes=<optimized out>, stream=<optimized out>) at ./library/src/rtc_kernel.cpp:105
    #8 0x00007ffff7d047b4 in RTCKernel::launch (this=0x7ffed4046940, data=...) at ./library/src/rtc_kernel.cpp:76
    #9 0x00007ffff7a6406d in TransformPowX (execPlan=..., in_buffer=0x7fffffffcdc8, out_buffer=0x7fffffffcdb0, info=0x7fffffffcd00) at ./library/src/powX.cpp:592
    #10 0x00007ffff7a60001 in rocfft_execute (plan=0x555581cf67e0, in_buffer=0x555581cf67e0, out_buffer=<optimized out>, info=<optimized out>) at ./library/src/transform.cpp:150
    #11 0x000055555561ebea in Test_Callback::forward_transform<HIP_vector_type<float, 2u>, HIP_vector_type<float, 2u> > (this=0x7fffffffcf80, apply_callback=true, host_mem_in=std::vector of length 256, capacity 256 = {...}, host_mem_out=std::vector of length 256, capacity 256 = {...}) at ./clients/tests/default_callbacks_test.cpp:280
    #12 0x000055555561d1d9 in Test_Callback::run<HIP_vector_type<float, 2u>, HIP_vector_type<float, 2u>, float> (this=0x7fffffffcf80, low_bound=<error reading variable: That operation is not available on integers of more than 8 bytes.>, up_bound=<error reading variable: That operation is not available on integers of more than 8 bytes.>, host_mem_in=std::vector of length 256, capacity 256 = {...}, host_mem_out=std::vector of length 256, capacity 256 = {...}, host_mem_out_no_cb=std::vector of length 0, capacity 0) at ./clients/tests/default_callbacks_test.cpp:182
    #13 0x000055555561cdd4 in Test_Callback::Test_Callback (this=0x0, _N=256, _dim=1, _frwd_transf_type=rocfft_transform_type_complex_forward, _frwd_transf_precision=rocfft_precision_single, _cb_type=DefaultCallbackType::STORE, _seed=1) at ./clients/tests/default_callbacks_test.cpp:124
    #14 0x000055555561c7f9 in rocfft_UnitTest_default_load_callback_complex_single_Test::TestBody (this=<optimized out>) at ./clients/tests/default_callbacks_test.cpp:402
    #15 0x00005555556921e7 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ()
    #16 0x0000555555679fde in testing::Test::Run() ()
    #17 0x000055555567a195 in testing::TestInfo::Run() ()
    #18 0x000055555567a37f in testing::TestSuite::Run() ()
    #19 0x000055555568799c in testing::internal::UnitTestImpl::RunAllTests() ()
    #20 0x0000555555692877 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ()
    #21 0x000055555567a574 in testing::UnitTest::Run() ()
    #22 0x000055555558a0ec in RUN_ALL_TESTS () at /usr/include/gtest/gtest.h:2317
    #23 main (argc=<optimized out>, argv=<optimized out>) at ./clients/tests/gtest_main.cpp:412
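
The session above runs the whole suite interactively before reaching the crash. As a rough sketch only, the same backtrace could probably be collected non-interactively for just the failing case. The binary path and test name below are copied from the log. The helper name gdb_batch_command is hypothetical. The script relies on gdb's -batch, -ex and --args options and on GoogleTest's --gtest_filter flag, and assumes the crash still reproduces when the test is run in isolation, which has not been verified here.

```python
import shlex
from typing import List

# Both values are copied from the gdb session in this report.
TEST_BINARY = "/usr/libexec/rocm/librocfft0-tests/rocfft-test"
FAILING_TEST = "rocfft_UnitTest.default_load_callback_complex_single"


def gdb_batch_command(binary: str, gtest_filter: str) -> List[str]:
    """Build a gdb command line that runs one GoogleTest case and dumps all backtraces.

    Hypothetical helper for illustration; it only assembles the arguments.
    """
    return [
        "gdb",
        "-batch",                      # exit once the -ex commands have run
        "-ex", "run",                  # start the test binary
        "-ex", "thread apply all bt",  # same command used interactively above
        "--args", binary,              # everything after this goes to the program
        f"--gtest_filter={gtest_filter}",
    ]


if __name__ == "__main__":
    # Print a copy-pasteable shell command rather than executing it,
    # because running it needs the test package and a supported GPU.
    print(shlex.join(gdb_batch_command(TEST_BINARY, FAILING_TEST)))
```

Printing the command instead of executing it keeps the sketch usable on machines without the rocfft test package installed. The output can be pasted into a shell on the affected system.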

    -- System Information:
    Debian Release: trixie/sid
    APT prefers unstable
    APT policy: (500, 'unstable')
    Architecture: amd64 (x86_64)

    Kernel: Linux 6.6.15-amd64 (SMP w/32 CPU threads; PREEMPT)
    Locale: LANG=C, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE not set
    Shell: /bin/sh linked to /usr/bin/dash
    Init: unable to detect

    Versions of packages libamd-comgr2 depends on:
    ii libc6 2.37-15
    ii libgcc-s1 14-20240201-3
    ii libllvm17 1:17.0.6-5
    ii libstdc++6 14-20240201-3
    ii libzstd1 1.5.5+dfsg2-2
    ii zlib1g 1:1.3.dfsg-3+b1

    libamd-comgr2 recommends no packages.

    libamd-comgr2 suggests no packages.

    -- no debconf information
