I tested Debian's most recent m68k kernels from the 6.0.x and 6.1.x series, and
neither of them boots on my Amiga 4000/060. Both get stuck at the ABCDGHIJK message.
FWIW, I noticed that the kernel image itself is already over 7 MB, not sure whether this is a problem.
Anyone else tried a recent kernel on their Amigas?
Looks surprisingly similar to the issue reported by Stan.
Do the mitigations given in https://lore.kernel.org/all/CAMuHMdUtkr2zvZiJfLXvs9d_inJbktSNqQQfO1oxnJHZeoYcHg@mail.gmail.com
help?
> FWIW, I noticed that the kernel image itself is already over 7 MB, not sure whether this is a problem.
Depends on how much RAM you have ;-)
> Anyone else tried a recent kernel on their Amigas?
I really should start booting on real Amiga hardware again...
a1 is just before the end of your RAM chunk. If that's a longword
access, you'd fall over the edge :) Can you disassemble the code snippet
(or memcmp()) so we can see what's happening?
I do recall recent changes to the mm code, but that was for NOMMU. I
wonder whether there was anything else that would introduce an implicit assumption about memory starting at 0x0 ...
Hi Geert!
On Tue, 2023-02-21 at 15:55 +0100, Geert Uytterhoeven wrote:
> Looks surprisingly similar to the issue reported by Stan.
> Do the mitigations given in
> https://lore.kernel.org/all/CAMuHMdUtkr2zvZiJfLXvs9d_inJbktSNqQQfO1oxnJHZeoYcHg@mail.gmail.com
> help?
The kernel actually crashes with a backtrace:
ABCDGHIJK
[ 0.000000] Linux version 6.0.0-6-m68k (debian-kernel@lists.debian.org) (gcc-12 (Debian 12.2.0-9) 12.2.0, GNU ld (GNU Binutils for Debian) 2.39) #1 Debian 6.0.12-1 (2022-12-09)
[ 0.000000] Enabling workaround for errata I14
[ 0.000000] printk: bootconsole [debug0] enabled
[ 0.000000] Amiga hardware found: [A4000] VIDEO BLITTER AUDIO FLOPPY A4000_IDE KEYBOARD MOUSE SERIAL PARALLEL A3000_CLK CHIP_RAM PAULA LISA ALICE_PAL ZORRO3
[ 0.000000] initrd: 0ef0602c - 0f800000
[ 0.000000] Zone ranges:
[ 0.000000] DMA [mem 0x0000000008000000-0x000000f7ffffffff]
[ 0.000000] Normal empty
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000008000000-0x000000000f7fffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000008000000-0x000000000f7fffff]
[ 0.000000] Unable to handle kernel access at virtual address (ptrval)
[ 0.000000] Oops: 00000000
[ 0.000000] Modules linked in:
[ 0.000000] PC: [<00201d3c>] memcmp+0x28/0x56
[ 0.000000] SR: 2709 SP: (ptrval) a2: 004a5580
[ 0.000000] d0: 00000003 d1: 00000001 d2: 00201d14 d3: 00000272
[ 0.000000] d4: 00012750 d5: 08023ec0 a0: 0000000c a1: 0f7ffff4
[ 0.000000] Process swapper (pid: 0, task=(ptrval))
[ 0.000000] Frame format=4 fault addr=0f7ffff4 fslw=01051000
[ 0.000000] Stack from 004a3fac:
[ 0.000000] 00201d14 00000272 00374e40 0f7ffff4 0f800000 00534b22 0f7ffff4 0042e325
[ 0.000000] 0000000c 0055c000 00000272 00012750 08023ec0 00012750 080dbf48 08001000
[ 0.000000] 08001000 0f7ffff0 00553d9a 00000000 00533872
[ 0.000000] Call Trace: [<00201d14>] memcmp+0x0/0x56
[ 0.000000] [<00374e40>] _printk+0x0/0x18
[ 0.000000] [<00534b22>] start_kernel+0x8a/0x5d6
[ 0.000000] [<00012750>] LOGTBL+0x228/0x800
[ 0.000000] [<00012750>] LOGTBL+0x228/0x800
[ 0.000000] [<00533872>] _sinittext+0x872/0x11f8
[ 0.000000]
[ 0.000000] Code: b288 661e 4280 6030 2a49 284b 264c 224d <bb8c> 66ea 5988 7003 b088 65f0 224d 264c 60dc 4283 1631 1800 4282 1433 1800 2003
[ 0.000000] Disabling lock debugging due to kernel taint
[ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
[ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
>> FWIW, I noticed that the kernel image itself is already over 7 MB, not sure whether this is a problem.
> Depends on how much RAM you have ;-)
128 MB.
>> Anyone else tried a recent kernel on their Amigas?
> I really should start booting on real Amiga hardware again...
You should ;-).
Adrian
Hi Michael!
On Wed, 2023-02-22 at 10:09 +1300, Michael Schmitz wrote:
> a1 is just before the end of your RAM chunk. If that's a longword
> access, you'd fall over the edge :) Can you disassemble the code snippet
> (or memcmp()) so we can see what's happening?
Here you go:
00201d14 <memcmp>:
201d14: 48e7 301c moveml %d2-%d3/%a3-%a5,%sp@-
201d18: 226f 0018 moveal %sp@(24),%a1
201d1c: 266f 001c moveal %sp@(28),%a3
201d20: 206f 0020 moveal %sp@(32),%a0
201d24: 7003 moveq #3,%d0
201d26: b088 cmpl %a0,%d0
201d28: 650a bcss 201d34 <memcmp+0x20>
201d2a: 4281 clrl %d1
201d2c: b288 cmpl %a0,%d1
201d2e: 661e bnes 201d4e <memcmp+0x3a>
201d30: 4280 clrl %d0
201d32: 6030 bras 201d64 <memcmp+0x50>
201d34: 2a49 moveal %a1,%a5 <======= 0x0f7ffff4
201d36: 284b moveal %a3,%a4
201d38: 264c moveal %a4,%a3
201d3a: 224d moveal %a5,%a1
201d3c: bb8c cmpml %a4@+,%a5@+ <======= a5 will be 0x0f800000 after post-increment
201d3e: 66ea bnes 201d2a <memcmp+0x16>
201d40: 5988 subql #4,%a0
201d42: 7003 moveq #3,%d0
201d44: b088 cmpl %a0,%d0
201d46: 65f0 bcss 201d38 <memcmp+0x24>
201d48: 224d moveal %a5,%a1
201d4a: 264c moveal %a4,%a3
201d4c: 60dc bras 201d2a <memcmp+0x16>
201d4e: 4283 clrl %d3
201d50: 1631 1800 moveb %a1@(0,%d1:l),%d3
201d54: 4282 clrl %d2
201d56: 1433 1800 moveb %a3@(0,%d1:l),%d2
201d5a: 2003 movel %d3,%d0
201d5c: 9082 subl %d2,%d0
201d5e: 5281 addql #1,%d1
201d60: b483 cmpl %d3,%d2
201d62: 67c8 beqs 201d2c <memcmp+0x18>
201d64: 4cdf 380c moveml %sp@+,%d2-%d3/%a3-%a5
201d68: 4e75 rts
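The listing above can be paraphrased in C roughly as follows. This is only a sketch of one reading of the instructions, not the kernel's source: the name memcmp_sketch and the local variable names are hypothetical, and memcpy() stands in for the longword loads done by cmpml.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C reading of the disassembly; not taken from lib/string.c. */
int memcmp_sketch(const void *cs, const void *ct, size_t count)
{
	const unsigned char *a1 = cs;	/* %a1: first buffer  */
	const unsigned char *a3 = ct;	/* %a3: second buffer */

	/* 201d34..201d46: compare one 32-bit longword per iteration. */
	while (count > 3) {
		uint32_t w1, w2;

		memcpy(&w1, a1, sizeof(w1));
		memcpy(&w2, a3, sizeof(w2));
		if (w1 != w2)
			break;	/* let the byte loop locate the difference */
		a1 += 4;
		a3 += 4;
		count -= 4;
	}

	/* 201d2a..201d62: compare whatever is left one byte at a time. */
	for (size_t i = 0; i != count; i++) {
		int diff = a1[i] - a3[i];

		if (diff != 0)
			return diff;
	}
	return 0;
}
```

In this reading, the faulting instruction at memcmp+0x28 corresponds to the longword loads at the top of the while loop.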
The kernel image is actually unstripped. Is there a config option for that?
Do we want to keep symbols in a non-debug kernel?
> I do recall recent changes to the mm code, but that was for NOMMU. I
> wonder whether there was anything else that would introduce an implicit
> assumption about memory starting at 0x0 ...
Sounds like a possible culprit.
Adrian
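The operands of the faulting compare can be checked against the memory map in the panic log. A minimal sketch, assuming a0 (0x0000000c) is memcmp's remaining byte count and that node 0 covers 0x08000000 up to, but not including, 0x0f800000; last_byte() and access_in_ram() are hypothetical helpers, not kernel functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Address of the last byte touched by a len-byte access at addr (len > 0). */
uint32_t last_byte(uint32_t addr, uint32_t len)
{
	return addr + len - 1;
}

/* True if [addr, addr + len) lies entirely inside [ram_start, ram_end). */
bool access_in_ram(uint32_t addr, uint32_t len,
		   uint32_t ram_start, uint32_t ram_end)
{
	if (addr < ram_start || addr > ram_end)
		return false;
	return len <= ram_end - addr;
}
```

With the values from the log, last_byte(0x0f7ffff4, 0xc) is 0x0f7fffff, and access_in_ram(0x0f7ffff4, 0xc, 0x08000000, 0x0f800000) is true: the 12 bytes being compared end exactly at the last byte of the node and do not extend past it.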
Hi Adrian,
On 22/02/23 10:46, John Paul Adrian Glaubitz wrote:
> Hi Michael!
> On Wed, 2023-02-22 at 10:09 +1300, Michael Schmitz wrote:
>> a1 is just before the end of your RAM chunk. If that's a longword
>> access, you'd fall over the edge :)
Actually it isn't that close - if I read the stack correctly, we're
comparing 0xc bytes starting at 0x0f7ffff4, which runs up to 0x0f7fffff.
The post-increment of a5 to 0x0f800000 might cause a pre-fetch beyond
end of memory - how does that get handled?
> The kernel image is actually unstripped. Is there a config option for
> that?
I'm sure the compressed kernel image is stripped but includes the
kernel symbol table (see below). The symbol table is definitely good
to have (otherwise you'd have to figure out what all the addresses on the
stack mean from a separate symbol table).
> Do we want to keep symbols in a non-debug kernel?
Definitely ...
Cheers,
Michael
Output of objdump -h:
vmlinux-6.2.0-rc8-atari-fpuemu-atafbfix+: file format elf32-m68k
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 0030169c 00001000 00001000 00001000 2**2 CONTENTS, ALLOC, LOAD, READONLY, CODE
1 __ex_table 00001ab0 003026a0 003026a0 003026a0 2**2 CONTENTS, ALLOC, LOAD, READONLY, DATA
2 .rodata 000c81e8 00305000 00305000 00305000 2**4 CONTENTS, ALLOC, LOAD, DATA
3 __ksymtab 00009a14 003cd1e8 003cd1e8 003cd1e8 2**2 CONTENTS, ALLOC, LOAD, READONLY, DATA
4 __ksymtab_gpl 000057c0 003d6bfc 003d6bfc 003d6bfc 2**2 CONTENTS, ALLOC, LOAD, READONLY, DATA
5 __ksymtab_strings 000166a3 003dc3bc 003dc3bc 003dc3bc 2**0 CONTENTS, ALLOC, LOAD, READONLY, DATA
6 __param 000006cc 003f2a60 003f2a60 003f2a60 2**1 CONTENTS, ALLOC, LOAD, READONLY, DATA
7 __modver 00000088 003f312c 003f312c 003f312c 2**1 CONTENTS, ALLOC, LOAD, DATA
8 .notes 00000054 003f31b4 003f31b4 003f31b4 2**2 CONTENTS, ALLOC, LOAD, READONLY, DATA
9 .data 00051a20 003f4000 003f4000 003f4000 2**4 CONTENTS, ALLOC, LOAD, DATA
10 .bss 0002266c 00445a20 00445a20 00445a20 2**4 ALLOC
11 .init.text 00017be0 00469000 00469000 00447000 2**2 CONTENTS, ALLOC, LOAD, READONLY, CODE
12 .init.data 00004c1c 00480be0 00480be0 0045ebe0 2**2 CONTENTS, ALLOC, LOAD, DATA
13 .m68k_fixup 00000480 004857fc 004857fc 004637fc 2**0 CONTENTS, ALLOC, LOAD, DATA
14 .init_end 00000384 00485c7c 00485c7c 00463c7c 2**0 ALLOC
15 .comment 0000002d 00000000 00000000 00463c7c 2**0 CONTENTS, READONLY
Will try earlier kernels until I find the one where the breakage was introduced. The latest kernel currently known to work is 5.10.5.
FYI:
Just caught this while trying a recompile of kernel 5.15.2 from kernel.org;
builds under debootstrap/sbuild and under qemu-system-m68k both produce this issue:
CC mm/process_vm_access.o
CC mm/page_alloc.o
mm/page_alloc.c: In function ‘mem_init_print_info’:
mm/page_alloc.c:8163:27: warning: comparison between two arrays [-Warray-compare]
8163 | if (start <= pos && pos < end && size > adj) \
| ^~
mm/page_alloc.c:8167:9: note: in expansion of macro ‘adj_init_size’
8167 | adj_init_size(__init_begin, __init_end, init_data_size,
| ^~~~~~~~~~~~~
mm/page_alloc.c:8163:27: note: use ‘&__init_begin[0] <= &_sinittext[0]’ to compare the addresses
8163 | if (start <= pos && pos < end && size > adj) \
| ^~
mm/page_alloc.c:8167:9: note: in expansion of macro ‘adj_init_size’
8167 | adj_init_size(__init_begin, __init_end, init_data_size,
| ^~~~~~~~~~~~~
mm/page_alloc.c:8163:41: warning: comparison between two arrays [-Warray-compare]
8163 | if (start <= pos && pos < end && size > adj) \
| ^
mm/page_alloc.c:8167:9: note: in expansion of macro ‘adj_init_size’
8167 | adj_init_size(__init_begin, __init_end, init_data_size,
| ^~~~~~~~~~~~~
mm/page_alloc.c:8163:41: note: use ‘&_sinittext[0] < &__init_end[0]’ to compare the addresses
8163 | if (start <= pos && pos < end && size > adj) \
| ^
mm/page_alloc.c:8167:9: note: in expansion of macro ‘adj_init_size’
8167 | adj_init_size(__init_begin, __init_end, init_data_size,
| ^~~~~~~~~~~~~
mm/page_alloc.c:8163:27: warning: comparison between two arrays [-Warray-compare]
8163 | if (start <= pos && pos < end && size > adj) \
| ^~
mm/page_alloc.c:8169:9: note: in expansion of macro ‘adj_init_size’
8169 | adj_init_size(_stext, _etext, codesize, _sinittext, init_code_size);
| ^~~~~~~~~~~~~
mm/page_alloc.c:8163:27: note: use ‘&_stext[0] <= &_sinittext[0]’ to compare the addresses
8163 | if (start <= pos && pos < end && size > adj) \
| ^~
mm/page_alloc.c:8169:9: note: in expansion of macro ‘adj_init_size’
8169 | adj_init_size(_stext, _etext, codesize, _sinittext, init_code_size);
| ^~~~~~~~~~~~~
mm/page_alloc.c:8163:41: warning: comparison between two arrays [-Warray-compare]
8163 | if (start <= pos && pos < end && size > adj) \
| ^
mm/page_alloc.c:8169:9: note: in expansion of macro ‘adj_init_size’
8169 | adj_init_size(_stext, _etext, codesize, _sinittext, init_code_size);
| ^~~~~~~~~~~~~
mm/page_alloc.c:8163:41: note: use ‘&_sinittext[0] < &_etext[0]’ to compare the addresses
8163 | if (start <= pos && pos < end && size > adj) \
| ^
mm/page_alloc.c:8169:9: note: in expansion of macro ‘adj_init_size’
8169 | adj_init_size(_stext, _etext, codesize, _sinittext, init_code_size);
| ^~~~~~~~~~~~~
mm/page_alloc.c:8163:27: warning: comparison between two arrays [-Warray-compare]
8163 | if (start <= pos && pos < end && size > adj) \
| ^~
mm/page_alloc.c:8170:9: note: in expansion of macro ‘adj_init_size’
8170 | adj_init_size(_sdata, _edata, datasize, __init_begin, init_data_size);
| ^~~~~~~~~~~~~
mm/page_alloc.c:8163:27: note: use ‘&_sdata[0] <= &__init_begin[0]’ to compare the addresses
8163 | if (start <= pos && pos < end && size > adj) \
| ^~
mm/page_alloc.c:8170:9: note: in expansion of macro ‘adj_init_size’
8170 | adj_init_size(_sdata, _edata, datasize, __init_begin, init_data_size);
| ^~~~~~~~~~~~~
mm/page_alloc.c:8163:41: warning: comparison between two arrays [-Warray-compare]
8163 | if (start <= pos && pos < end && size > adj) \
| ^
mm/page_alloc.c:8170:9: note: in expansion of macro ‘adj_init_size’
8170 | adj_init_size(_sdata, _edata, datasize, __init_begin, init_data_size);
| ^~~~~~~~~~~~~
mm/page_alloc.c:8163:41: note: use ‘&__init_begin[0] < &_edata[0]’ to compare the addresses
8163 | if (start <= pos && pos < end && size > adj) \
| ^
mm/page_alloc.c:8170:9: note: in expansion of macro ‘adj_init_size’
8170 | adj_init_size(_sdata, _edata, datasize, __init_begin, init_data_size);
| ^~~~~~~~~~~~~
mm/page_alloc.c:8163:27: warning: comparison between two arrays [-Warray-compare]
8163 | if (start <= pos && pos < end && size > adj) \
| ^~
mm/page_alloc.c:8171:9: note: in expansion of macro ‘adj_init_size’
8171 | adj_init_size(_stext, _etext, codesize, __start_rodata, rosize);
| ^~~~~~~~~~~~~
mm/page_alloc.c:8163:27: note: use ‘&_stext[0] <= &__start_rodata[0]’ to compare the addresses
8163 | if (start <= pos && pos < end && size > adj) \
| ^~
mm/page_alloc.c:8171:9: note: in expansion of macro ‘adj_init_size’
8171 | adj_init_size(_stext, _etext, codesize, __start_rodata, rosize);
| ^~~~~~~~~~~~~
mm/page_alloc.c:8163:41: warning: comparison between two arrays [-Warray-compare]
8163 | if (start <= pos && pos < end && size > adj) \
| ^
mm/page_alloc.c:8171:9: note: in expansion of macro ‘adj_init_size’
8171 | adj_init_size(_stext, _etext, codesize, __start_rodata, rosize);
| ^~~~~~~~~~~~~
mm/page_alloc.c:8163:41: note: use ‘&__start_rodata[0] < &_etext[0]’ to compare the addresses
8163 | if (start <= pos && pos < end && size > adj) \
| ^
mm/page_alloc.c:8171:9: note: in expansion of macro ‘adj_init_size’
8171 | adj_init_size(_stext, _etext, codesize, __start_rodata, rosize);
| ^~~~~~~~~~~~~
mm/page_alloc.c:8163:27: warning: comparison between two arrays [-Warray-compare]
8163 | if (start <= pos && pos < end && size > adj) \
| ^~
mm/page_alloc.c:8172:9: note: in expansion of macro ‘adj_init_size’
8172 | adj_init_size(_sdata, _edata, datasize, __start_rodata, rosize);
| ^~~~~~~~~~~~~
mm/page_alloc.c:8163:27: note: use ‘&_sdata[0] <= &__start_rodata[0]’ to compare the addresses
8163 | if (start <= pos && pos < end && size > adj) \
| ^~
mm/page_alloc.c:8172:9: note: in expansion of macro ‘adj_init_size’
8172 | adj_init_size(_sdata, _edata, datasize, __start_rodata, rosize);
| ^~~~~~~~~~~~~
mm/page_alloc.c:8163:41: warning: comparison between two arrays [-Warray-compare]
8163 | if (start <= pos && pos < end && size > adj) \
| ^
mm/page_alloc.c:8172:9: note: in expansion of macro ‘adj_init_size’
8172 | adj_init_size(_sdata, _edata, datasize, __start_rodata, rosize);
| ^~~~~~~~~~~~~
mm/page_alloc.c:8163:41: note: use ‘&__start_rodata[0] < &_edata[0]’ to compare the addresses
8163 | if (start <= pos && pos < end && size > adj) \
| ^
mm/page_alloc.c:8172:9: note: in expansion of macro ‘adj_init_size’
8172 | adj_init_size(_sdata, _edata, datasize, __start_rodata, rosize);
| ^~~~~~~~~~~~~
CC mm/init-mm.o
CC mm/memblock.o
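The compiler notes above already name the change that silences the warning: compare the addresses of the first elements rather than the arrays themselves. A minimal sketch of that form follows. Only the condition is taken from the warnings; the "size -= adj" body is an assumption, and image[] and demo_adjust() are hypothetical stand-ins for the linker-provided section symbols. Pointers into a single buffer are used so that the comparisons are well defined in standard C.

```c
#include <stddef.h>

/*
 * Same condition as in the warning text, rewritten the way the compiler
 * note suggests. The "size -= adj" action is assumed for illustration.
 */
#define adj_init_size(start, end, size, pos, adj)			\
	do {								\
		if (&start[0] <= &pos[0] && &pos[0] < &end[0] &&	\
		    size > adj)						\
			size -= adj;					\
	} while (0)

/* Hypothetical stand-in for the kernel image. */
static char image[64];

/* Returns size, reduced by adj if pos lies inside [start, end). */
size_t demo_adjust(size_t start_off, size_t end_off, size_t pos_off,
		   size_t size, size_t adj)
{
	char *start = image + start_off;
	char *end = image + end_off;
	char *pos = image + pos_off;

	adj_init_size(start, end, size, pos, adj);
	return size;
}
```

For example, demo_adjust(0, 32, 8, 100, 10) returns 90 because offset 8 falls inside [0, 32), while demo_adjust(0, 32, 40, 100, 10) leaves the size at 100.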
the only commits to hit arch/m68k/mm between 5.15 and now are:
29f28f8b826d m68k: fix livelock in uaccess
6d0b92254510 m68k/mm: enable ARCH_HAS_VM_GET_PAGE_PROT
d92725256b4f mm: avoid unnecessary page fault retires on shared memory types
f95a387cdeb3 m68k: coldfire: drop ISA_DMA_API support
05d51e42df06 m68k: Introduce a virtual m68k machine
c4d5b6eef258 m68k: mm: Remove check for VM_IO to fix deferred I/O
36ef159f4408 mm: remove redundant check about FAULT_FLAG_ALLOW_RETRY bit
0e25498f8cd4 exit: Add and use make_task_dead.
376e3fdecb0d m68k: Enable memtest functionality
952eea9b01e4 memblock: allow to specify flags with memblock_add_node()
The first is a fix for the second so these should be tested together.
None appear suspect to me.
Running memtest could incur a boot delay but AFAIR that isn't enabled by default, and it isn't implicated in the panic log Adrian posted.
Hi Stephen,
that's apparently been corrected in later versions. Commit ca831f29f8f25c97182e726429b38c0802200c8f (in from 5.17).
I doubt this would lead to different code generated.
Which was the first broken version you tried? That would narrow down the search range considerably...
Cheers,
Michael
Hi Michael!
I don't have time this weekend to bisect the issue, but I think I can start bisecting it on Sunday evening. I will give it a try on Amiga Forever.
Adrian
> that's apparently been corrected in later versions. Commit ca831f29f8f25c97182e726429b38c0802200c8f (in from 5.17).
> I doubt this would lead to different code generated.
> Which was the first broken version you tried? That would narrow down
> the search range considerably...
[ 0.000000] Linux version 5.17.0-1-m68k (debian-kernel@lists.debian.org) (gcc-11 (Debian 11.2.0-20) 11.2.0, GNU ld (GNU Binutils for Debian) 2.38) #1 Debian 5.17.3-1 (2022-04-18)
[ 0.000000] Linux version 5.18.0-3-m68k (debian-kernel@lists.debian.org) (gcc-11 (Debian 11.3.0-4) 11.3.0, GNU ld (GNU Binutils for Debian) 2.38.90.20220713) #1 Debian 5.18.14-1 (2022-07-23)
On Tue, Feb 21, 2023 at 4:53 PM John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> wrote:
> The kernel actually crashes with a backtrace:
> [ 0.000000] Unable to handle kernel access at virtual address (ptrval)
I see the same issue on my A4000, bisecting...
On Sun, Feb 26, 2023 at 12:02 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> I see the same issue on my A4000, bisecting...
Bisected to commit 376e3fdecb0dcae2 ("m68k: Enable memtest
functionality") in v5.17-rc1. Reverting that on top of latest fixes the issue.
Gr{oetje,eeting}s,
Geert
> Bisected to commit 376e3fdecb0dcae2 ("m68k: Enable memtest
> functionality") in v5.17-rc1. Reverting that on top of latest fixes
> the issue.
Yes, I'm sorry to say that was the only likely candidate. I can't see why, though - are Macs all configured to have RAM start at address zero, and possibly contiguous, Finn?
I wonder whether Finn's memtest patch merely exposed another MM bug.
On Mon, 27 Feb 2023, Michael Schmitz wrote:
>> Bisected to commit 376e3fdecb0dcae2 ("m68k: Enable memtest
>> functionality") in v5.17-rc1. Reverting that on top of latest fixes
>> the issue.
> Yes, I'm sorry to say that was the only likely candidate. Can't see why
> though - are Macs all configured to have RAM start at address zero, and
> possibly contiguous, Finn?
I don't really understand your question. This was not a Mac patch. The
issue seems to be about the locations initrd_start and initrd_end in
relation to the various memory segments (?)
This seems to be the same bug that was raised about 6 months ago... I had thought it was a bootloader bug but I'm out of my depth here.
https://lists.debian.org/debian-68k/2022/09/msg00047.html https://lists.debian.org/debian-68k/2022/09/msg00051.html https://lists.debian.org/debian-68k/2022/09/msg00055.html
On Mon, 27 Feb 2023, Michael Schmitz wrote:
I wonder whether Finn's memtest patch merely exposed another MM bug
A kernel patch may be easier than a bootloader patch (even if this is a bootloader bug) particularly if it affects multiple platforms.
A partial revert of my patch (below) will probably avoid the issue, but
with the side effect that use of memtest will clobber the initrd.
The initrd and memtest features aren't usually needed together. At the
time when I needed the memtest feature I did not have confidence in the hardware. An initrd wasn't very useful at that point.
diff --git a/arch/m68k/kernel/setup_mm.c b/arch/m68k/kernel/setup_mm.c
index 3a2bb2e8fdad..92f1b9268dff 100644
--- a/arch/m68k/kernel/setup_mm.c
+++ b/arch/m68k/kernel/setup_mm.c
@@ -326,6 +326,8 @@ void __init setup_arch(char **cmdline_p)
 		panic("No configuration setup");
 	}

+	paging_init();
+
 #ifdef CONFIG_BLK_DEV_INITRD
 	if (m68k_ramdisk.size) {
 		memblock_reserve(m68k_ramdisk.addr, m68k_ramdisk.size);
@@ -335,8 +337,6 @@ void __init setup_arch(char **cmdline_p)
 	}
 #endif

-	paging_init();
-
 #ifdef CONFIG_NATFEAT
 	nf_init();
 #endif
Hi Finn,
FTR, here is the diff of the dmesg between good and bad:
+initrd: 07f61166 - 08000000
This is wrong (note the 6 trailing zeros), as phys_to_virt() is not
working correctly yet (module_fixup() is called from paging_init()).
Zone ranges:
DMA [mem 0x0000000007400000-0x0000007fffffffff]
Normal empty
Movable zone start for each node
Early memory node ranges
node 0: [mem 0x0000000007400000-0x0000000007ffffff]
Initmem setup node 0 [mem 0x0000000007400000-0x0000000007ffffff]
-initrd: 00b61166 - 00c00000
This is correct (note the 5 trailing zeros).
-pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
-pcpu-alloc: [0] 0
[...]
+Unable to handle kernel access at virtual address (ptrval)
+Oops: 00000000
+Modules linked in:
+PC: [<002c11be>] memcmp+0x2c/0x5c
+SR: 2700 SP: (ptrval) a2: 003bd560
+d0: 0035eb83 d1: 07fffff8 d2: 002c1192 d3: 000000e6
+d4: 000684e8 d5: 00447000 a0: 0000000c a1: 07fffff4
+Process swapper (pid: 0, task=(ptrval))
+Frame format=7 eff addr=003bbfbc ssw=0505 faddr=07fffff4
+wb 1 stat/addr/data: 0005 00447000 07401000
+wb 2 stat/addr/data: 0005 000000e6 000684e8
+wb 3 stat/addr/data: 0005 003bbfb4 002c1192
+push data: 07401000 002c7d82 07401000 074a2cf4
+Stack from 003bbfb4:
+002c1192 000000e6 002c7d82 00428eda 07fffff4 0035eb7f 0000000c 00447000
+000000e6 000684e8 00447000 07401000 074bec08 07401000 074a2cf4 07fffff0
+00440406 00000000 00428322
+Call Trace: [<002c1192>] memcmp+0x0/0x5c
+[<002c7d82>] _printk+0x0/0x18
+[<00428eda>] start_kernel+0x80/0x5b0
+[<000684e8>] pcpu_alloc+0x88/0x3b4
+[<00428322>] _sinittext+0x322/0x9b0
On Mon, Feb 27, 2023 at 7:30 AM Finn Thain <fthain@linux-m68k.org> wrote:
On Mon, 27 Feb 2023, Michael Schmitz wrote:
I wonder whether Finn's memtest patch merely exposed another MM bug
A kernel patch may be easier than a bootloader patch (even if this is a
bootloader bug) particularly if it affects multiple platforms.
A partial revert of my patch (below) will probably avoid the issue, but
with the side effect that use of memtest will clobber the initrd.
Which we can avoid, by moving the ramdisk handling inside paging_init().
The initrd and memtest features aren't usually needed together. At the
time when I needed the memtest feature I did not have confidence in the
hardware. An initrd wasn't very useful at that point.
diff --git a/arch/m68k/kernel/setup_mm.c b/arch/m68k/kernel/setup_mm.c
index 3a2bb2e8fdad..92f1b9268dff 100644
--- a/arch/m68k/kernel/setup_mm.c
+++ b/arch/m68k/kernel/setup_mm.c
@@ -326,6 +326,8 @@ void __init setup_arch(char **cmdline_p)
panic("No configuration setup");
}
+ paging_init();
+
#ifdef CONFIG_BLK_DEV_INITRD
if (m68k_ramdisk.size) {
memblock_reserve(m68k_ramdisk.addr, m68k_ramdisk.size);
Presumably something in memblock_reserve() relies on having
called paging_init() before?
I'll do some more debugging later today...
@@ -335,8 +337,6 @@ void __init setup_arch(char **cmdline_p)
}
#endif
- paging_init();
-
#ifdef CONFIG_NATFEAT
nf_init();
#endif
Am 27.02.2023 um 18:55 schrieb Finn Thain:
On Mon, 27 Feb 2023, Michael Schmitz wrote:
Bisected to commit 376e3fdecb0dcae2 ("m68k: Enable memtest
functionality") in v5.17-rc1. Reverting that on top of latest fixes
the issue.
Yes, I'm sorry to say that was the only likely candidate. Can't see why
though - are Macs all configured to have RAM start at address zero, and
possibly contiguous, Finn?
I don't really understand your question. This was not a Mac patch. The
issue seems to be about the locations initrd_start and initrd_end in
relation to the various memory segments (?)
I didn't realize that - thanks for pointing this out.
This seems to be the same bug that was raised about 6 months ago... I had
thought it was a bootloader bug but I'm out of my depth here.
https://lists.debian.org/debian-68k/2022/09/msg00047.html
https://lists.debian.org/debian-68k/2022/09/msg00051.html
https://lists.debian.org/debian-68k/2022/09/msg00055.html
I had forgotten all about that one... Thanks for jogging my memory!
In this case though, the bug happens when the ramdisk is loaded in the
lowest address memory chunk, at least at a lower address than the one
the kernel runs from.
The crashes in the above thread were all from boots where the initrd got loaded at the end of the memory chunk the kernel runs from.
Time to try using copy_from_kernel_nofault() to copy the ramdisk into
its final location? (just kidding)
Hi,
On 27.2.2023 9.19, Michael Schmitz wrote:
Am 27.02.2023 um 18:55 schrieb Finn Thain:
On Mon, 27 Feb 2023, Michael Schmitz wrote:
Bisected to commit 376e3fdecb0dcae2 ("m68k: Enable memtest
functionality") in v5.17-rc1. Reverting that on top of latest fixes
the issue.
Yes, I'm sorry to say that was the only likely candidate. Can't see why
though - are Macs all configured to have RAM start at address zero, and
possibly contiguous, Finn?
I don't really understand your question. This was not a Mac patch. The
issue seems to be about the locations initrd_start and initrd_end in
relation to the various memory segments (?)
I didn't realize that - thanks for pointing this out.
This seems to be the same bug that was raised about 6 months ago... I had
thought it was a bootloader bug but I'm out of my depth here.
https://lists.debian.org/debian-68k/2022/09/msg00047.html
https://lists.debian.org/debian-68k/2022/09/msg00051.html
https://lists.debian.org/debian-68k/2022/09/msg00055.html
I had forgotten all about that one... Thanks for jogging my memory!
In this case though, the bug happens when the ramdisk is loaded in the
lowest address memory chunk, at least at a lower address than the one
the kernel runs from.
I'm wondering whether this old Atari side boot issue is related at all...
When adding Linux bootinfo support to the Hatari emulator (from the Aranym
emulator) a few years ago, I noticed that:
"Linux barfs at ST-RAM memory range given after TT-RAM. However, if
kernel is loaded to TT-RAM and ST-RAM range is given before TT-RAM
range, kernel crashes."
The only working config was Linux being loaded to ST-RAM, TT-RAM being given only after that in bootinfo, and the initrd ramdisk after the kernel.
Based on mails in archive, this seemed to have been a known Linux/Atari
issue already in 2013.
The crashes in the above thread were all from boots where the initrd
got loaded at the end of the memory chunk the kernel runs from.
Time to try using copy_from_kernel_nofault() to copy the ramdisk into
its final location? (just kidding)
- Eero
PS. For people familiar only with Amiga terminology, ST-RAM = chip RAM, TT-RAM = fast RAM.
Hi Geert,
adding Mike Rapoport to the recipient list, as he would know whether memblock_reserve() relies on paging_init() having run.
Cheers,
Michael
Am 27.02.2023 um 21:26 schrieb Geert Uytterhoeven:
Hi Finn,
FTR, here is the diff of the dmesg between good and bad:
+initrd: 07f61166 - 08000000
This is wrong (note the 6 trailing zeros), as phys_to_virt() is not
working correctly yet (module_fixup() is called from paging_init()).
Zone ranges:
DMA [mem 0x0000000007400000-0x0000007fffffffff]
Normal empty
Movable zone start for each node
Early memory node ranges
node 0: [mem 0x0000000007400000-0x0000000007ffffff]
Initmem setup node 0 [mem 0x0000000007400000-0x0000000007ffffff]
-initrd: 00b61166 - 00c00000
This is correct (note the 5 trailing zeros).
-pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
-pcpu-alloc: [0] 0
[...]
+Unable to handle kernel access at virtual address (ptrval)
+Oops: 00000000
+Modules linked in:
+PC: [<002c11be>] memcmp+0x2c/0x5c
+SR: 2700 SP: (ptrval) a2: 003bd560
+d0: 0035eb83 d1: 07fffff8 d2: 002c1192 d3: 000000e6
+d4: 000684e8 d5: 00447000 a0: 0000000c a1: 07fffff4
+Process swapper (pid: 0, task=(ptrval))
+Frame format=7 eff addr=003bbfbc ssw=0505 faddr=07fffff4
+wb 1 stat/addr/data: 0005 00447000 07401000
+wb 2 stat/addr/data: 0005 000000e6 000684e8
+wb 3 stat/addr/data: 0005 003bbfb4 002c1192
+push data: 07401000 002c7d82 07401000 074a2cf4
+Stack from 003bbfb4:
+002c1192 000000e6 002c7d82 00428eda 07fffff4 0035eb7f 0000000c 00447000
+000000e6 000684e8 00447000 07401000 074bec08 07401000 074a2cf4 07fffff0
+00440406 00000000 00428322
+Call Trace: [<002c1192>] memcmp+0x0/0x5c
+[<002c7d82>] _printk+0x0/0x18
+[<00428eda>] start_kernel+0x80/0x5b0
+[<000684e8>] pcpu_alloc+0x88/0x3b4
+[<00428322>] _sinittext+0x322/0x9b0
On Mon, Feb 27, 2023 at 7:30 AM Finn Thain <fthain@linux-m68k.org> wrote:
On Mon, 27 Feb 2023, Michael Schmitz wrote:
I wonder whether Finn's memtest patch merely exposed another MM bug
A kernel patch may be easier than a bootloader patch (even if this is a bootloader bug) particularly if it affects multiple platforms.
A partial revert of my patch (below) will probably avoid the issue, but with the side effect that use of memtest will clobber the initrd.
Which we can avoid, by moving the ramdisk handling inside paging_init().
The initrd and memtest features aren't usually needed together. At the time when I needed the memtest feature I did not have confidence in the hardware. An initrd wasn't very useful at that point.
diff --git a/arch/m68k/kernel/setup_mm.c b/arch/m68k/kernel/setup_mm.c
index 3a2bb2e8fdad..92f1b9268dff 100644
--- a/arch/m68k/kernel/setup_mm.c
+++ b/arch/m68k/kernel/setup_mm.c
@@ -326,6 +326,8 @@ void __init setup_arch(char **cmdline_p)
panic("No configuration setup");
}
+ paging_init();
+
#ifdef CONFIG_BLK_DEV_INITRD
if (m68k_ramdisk.size) {
memblock_reserve(m68k_ramdisk.addr, m68k_ramdisk.size);
Presumably something in memblock_reserve() relies on having
called paging_init() before?
I'll do some more debugging later today...
@@ -335,8 +337,6 @@ void __init setup_arch(char **cmdline_p)
}
#endif
- paging_init();
-
#ifdef CONFIG_NATFEAT
nf_init();
#endif
Hi Mike,
On Mon, Feb 27, 2023 at 12:34 PM Mike Rapoport <rppt@kernel.org> wrote:
On Mon, Feb 27, 2023 at 10:42:34PM +1300, Michael Schmitz wrote:
Am 27.02.2023 um 21:26 schrieb Geert Uytterhoeven:
On Mon, 27 Feb 2023, Michael Schmitz wrote:
I wonder whether Finn's memtest patch merely exposed another MM bug
A kernel patch may be easier than a bootloader patch (even if this is a
bootloader bug) particularly if it affects multiple platforms.
A partial revert of my patch (below) will probably avoid the issue, but
with the side effect that use of memtest will clobber the initrd.
Which we can avoid, by moving the ramdisk handling inside paging_init().
The initrd and memtest features aren't usually needed together. At the
time when I needed the memtest feature I did not have confidence in the
hardware. An initrd wasn't very useful at that point.
diff --git a/arch/m68k/kernel/setup_mm.c b/arch/m68k/kernel/setup_mm.c
index 3a2bb2e8fdad..92f1b9268dff 100644
--- a/arch/m68k/kernel/setup_mm.c
+++ b/arch/m68k/kernel/setup_mm.c
@@ -326,6 +326,8 @@ void __init setup_arch(char **cmdline_p)
panic("No configuration setup");
}
+ paging_init();
+
#ifdef CONFIG_BLK_DEV_INITRD
if (m68k_ramdisk.size) {
memblock_reserve(m68k_ramdisk.addr, m68k_ramdisk.size);
Presumably something in memblock_reserve() relies on having
called paging_init() before?
memblock_reserve() does not rely on paging_init() as it operates on physical addresses and it does not care if memory was already registered.
What does rely on paging_init() is phys_to_virt() in the line after memblock_reserve():
initrd_start = (unsigned long)phys_to_virt(m68k_ramdisk.addr);
initrd_end = initrd_start + m68k_ramdisk.size;
So to have both memtest and initrd we'd need something like
memblock_reserve(m68k_ramdisk.addr, m68k_ramdisk.size);
paging_init() {
/* setup page tables and memblock */
early_memtest();
}
initrd_start = (unsigned long)phys_to_virt(m68k_ramdisk.addr);
or
paging_init(); /* without early_memtest() */
memblock_reserve(m68k_ramdisk.addr, m68k_ramdisk.size);
initrd_start = (unsigned long)phys_to_virt(m68k_ramdisk.addr);
early_memtest();
Of course... /me bangs his head against the TFT for not having
realized before the values saved into initrd_{start,end} are not just
for printing in the pr_info() line...
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
On Mon, Feb 27, 2023 at 10:42:34PM +1300, Michael Schmitz wrote:
Am 27.02.2023 um 21:26 schrieb Geert Uytterhoeven:
FTR, here is the diff of the dmesg between good and bad:
+initrd: 07f61166 - 08000000
This is wrong (note the 6 trailing zeros), as phys_to_virt() is not working correctly yet (module_fixup() is called from paging_init()).
Zone ranges:
DMA [mem 0x0000000007400000-0x0000007fffffffff]
Normal empty
Movable zone start for each node
Early memory node ranges
node 0: [mem 0x0000000007400000-0x0000000007ffffff]
Initmem setup node 0 [mem 0x0000000007400000-0x0000000007ffffff]
-initrd: 00b61166 - 00c00000
This is correct (note the 5 trailing zeros).
-pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
-pcpu-alloc: [0] 0
[...]
+Unable to handle kernel access at virtual address (ptrval)
+Oops: 00000000
+Modules linked in:
+PC: [<002c11be>] memcmp+0x2c/0x5c
+SR: 2700 SP: (ptrval) a2: 003bd560
+d0: 0035eb83 d1: 07fffff8 d2: 002c1192 d3: 000000e6
+d4: 000684e8 d5: 00447000 a0: 0000000c a1: 07fffff4
+Process swapper (pid: 0, task=(ptrval))
+Frame format=7 eff addr=003bbfbc ssw=0505 faddr=07fffff4
+wb 1 stat/addr/data: 0005 00447000 07401000
+wb 2 stat/addr/data: 0005 000000e6 000684e8
+wb 3 stat/addr/data: 0005 003bbfb4 002c1192
+push data: 07401000 002c7d82 07401000 074a2cf4
+Stack from 003bbfb4:
+002c1192 000000e6 002c7d82 00428eda 07fffff4 0035eb7f 0000000c 00447000
+000000e6 000684e8 00447000 07401000 074bec08 07401000 074a2cf4 07fffff0
+00440406 00000000 00428322
+Call Trace: [<002c1192>] memcmp+0x0/0x5c
+[<002c7d82>] _printk+0x0/0x18
+[<00428eda>] start_kernel+0x80/0x5b0
+[<000684e8>] pcpu_alloc+0x88/0x3b4
+[<00428322>] _sinittext+0x322/0x9b0
On Mon, Feb 27, 2023 at 7:30 AM Finn Thain <fthain@linux-m68k.org> wrote:
On Mon, 27 Feb 2023, Michael Schmitz wrote:
I wonder whether Finn's memtest patch merely exposed another MM bug
A kernel patch may be easier than a bootloader patch (even if this is a bootloader bug) particularly if it affects multiple platforms.
A partial revert of my patch (below) will probably avoid the issue, but with the side effect that use of memtest will clobber the initrd.
Which we can avoid, by moving the ramdisk handling inside paging_init().
The initrd and memtest features aren't usually needed together. At the time when I needed the memtest feature I did not have confidence in the hardware. An initrd wasn't very useful at that point.
diff --git a/arch/m68k/kernel/setup_mm.c b/arch/m68k/kernel/setup_mm.c
index 3a2bb2e8fdad..92f1b9268dff 100644
--- a/arch/m68k/kernel/setup_mm.c
+++ b/arch/m68k/kernel/setup_mm.c
@@ -326,6 +326,8 @@ void __init setup_arch(char **cmdline_p)
panic("No configuration setup");
}
+ paging_init();
+
#ifdef CONFIG_BLK_DEV_INITRD
if (m68k_ramdisk.size) {
memblock_reserve(m68k_ramdisk.addr, m68k_ramdisk.size);
Presumably something in memblock_reserve() relies on having
called paging_init() before?
memblock_reserve() does not rely on paging_init() as it operates on
physical addresses and it does not care if memory was already registered.
What does rely on paging_init() is phys_to_virt() in the line after memblock_reserve():
initrd_start = (unsigned long)phys_to_virt(m68k_ramdisk.addr);
initrd_end = initrd_start + m68k_ramdisk.size;
So to have both memtest and initrd we'd need something like
memblock_reserve(m68k_ramdisk.addr, m68k_ramdisk.size);
paging_init() {
/* setup page tables and memblock */
early_memtest();
}
initrd_start = (unsigned long)phys_to_virt(m68k_ramdisk.addr);
or
paging_init(); /* without early_memtest() */
memblock_reserve(m68k_ramdisk.addr, m68k_ramdisk.size);
initrd_start = (unsigned long)phys_to_virt(m68k_ramdisk.addr);
early_memtest();