• OT: on the meaning of the objdump addresses

    From Meredith Montgomery@21:1/5 to All on Tue Nov 16 17:04:25 2021
    Here's a brief passage from objdump on a MSYS2 system on Windows 10.

    --8<---------------cut here---------------start------------->8---
    %objdump.exe -D hello.exe
    hello.exe: file format pei-x86-64

    Disassembly of section .text:

    0000000100401000 <WinMainCRTStartup>:
    100401000: 55 push %rbp
    100401001: 48 89 e5 mov %rsp,%rbp
    100401004: 48 83 ec 20 sub $0x20,%rsp
    100401008: 48 8d 0d 71 00 00 00 lea 0x71(%rip),%rcx # 100401080 <main>
    10040100f: e8 0c 01 00 00 call 100401120 <msys_crt0>
    [...]
    --8<---------------cut here---------------end--------------->8---

    What is the meaning of these hexadecimal addresses, first column?

    I don't believe these addresses are really virtual-memory addresses,
    meaning that machine code will be loaded to that place. But they could
    be because it's just virtual memory. If they're not the right
    addresses, why does the compiler or the linker (which?) write them in
    the first place in the executable?

    Is there any book that I could read that would explain these systems a
    bit --- about any system specifically such as a FreeBSD, an OpenBSD or a
    GNU system or Windows would be fine, too? Thank you!

    I don't know where to ask this question, but I'm sure some of you know a
    lot about this.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Scott Lurndal@21:1/5 to Meredith Montgomery on Tue Nov 16 20:15:34 2021
    Meredith Montgomery <mmontgomery@levado.to> writes:
    Here's a brief passage from objdump on a MSYS2 system on Windows 10.

    --8<---------------cut here---------------start------------->8---
    %objdump.exe -D hello.exe
    hello.exe: file format pei-x86-64

    Disassembly of section .text:

    0000000100401000 <WinMainCRTStartup>:
    100401000: 55 push %rbp
    100401001: 48 89 e5 mov %rsp,%rbp
    100401004: 48 83 ec 20 sub $0x20,%rsp
    100401008: 48 8d 0d 71 00 00 00 lea 0x71(%rip),%rcx # 100401080 <main>
    10040100f: e8 0c 01 00 00 call 100401120 <msys_crt0>
    [...]
    --8<---------------cut here---------------end--------------->8---

    What is the meaning of these hexadecimal addresses, first column?

    As you suggest, these are the virtual addresses to which that code
    will be loaded.

    In general, the executable file (PE/COFF for Windows, ELF for
    Unix/Linux) contains a header which describes the layout of the
    executable file.

    https://man7.org/linux/man-pages/man5/elf.5.html

    The header will point to a table consisting of the program
    sections that need to be loaded when the program is executed.
    (called program headers in ELF). Each entry contains the
    program address (which could be virtual, or if the code is
    running sans the benefit of an operating system, it could
    be physical) at which that section of the codefile should be
    loaded.
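
    For ELF, the table lookup described above can be sketched in a few
    lines of Python. This is only an illustration, assuming a 64-bit
    little-endian ELF image; the function name read_program_headers is
    made up for this example, while the field order follows the elf(5)
    man page linked above. PE/COFF uses a different layout and is not
    covered here.

```python
import struct

# ELF64 file header: e_ident[16], e_type, e_machine, e_version, e_entry,
# e_phoff, e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum,
# e_shentsize, e_shnum, e_shstrndx (little-endian is an assumption).
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")
# ELF64 program header: p_type, p_flags, p_offset, p_vaddr, p_paddr,
# p_filesz, p_memsz, p_align.
PHDR = struct.Struct("<IIQQQQQQ")
PT_LOAD = 1  # segment type for "load this into memory"


def read_program_headers(image: bytes):
    """Return (p_vaddr, p_paddr, p_memsz) for each PT_LOAD entry."""
    fields = EHDR.unpack_from(image, 0)
    ident, e_phoff = fields[0], fields[5]
    e_phentsize, e_phnum = fields[9], fields[10]
    # Magic number, then EI_CLASS == 2 (64-bit), EI_DATA == 1 (little-endian).
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("not a 64-bit little-endian ELF image")
    segments = []
    for i in range(e_phnum):
        entry = PHDR.unpack_from(image, e_phoff + i * e_phentsize)
        p_type, vaddr, paddr, memsz = entry[0], entry[3], entry[4], entry[6]
        if p_type == PT_LOAD:
            segments.append((vaddr, paddr, memsz))
    return segments
```

    Feeding it the bytes of an ELF executable (for example the result of
    open(path, "rb").read()) lists the addresses at which each loadable
    segment asks to be placed.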

    The compiler doesn't generate addresses at all, most addresses in
    intermediate object files are relocatable (generally relative to
    the start of a function) and the final program
    addresses are assigned by the linker (or for the case of dynamically
    loaded shared objects, the run-time linker/loader).

  • From Meredith Montgomery@21:1/5 to Scott Lurndal on Wed Nov 17 12:18:23 2021
    scott@slp53.sl.home (Scott Lurndal) writes:

    Meredith Montgomery <mmontgomery@levado.to> writes:
    Here's a brief passage from objdump on a MSYS2 system on Windows 10.

    --8<---------------cut here---------------start------------->8---
    %objdump.exe -D hello.exe
    hello.exe: file format pei-x86-64

    Disassembly of section .text:

    0000000100401000 <WinMainCRTStartup>:
    100401000: 55 push %rbp
    100401001: 48 89 e5 mov %rsp,%rbp
    100401004: 48 83 ec 20 sub $0x20,%rsp
    100401008: 48 8d 0d 71 00 00 00 lea 0x71(%rip),%rcx # 100401080 <main>
    10040100f: e8 0c 01 00 00 call 100401120 <msys_crt0>
    [...]
    --8<---------------cut here---------------end--------------->8---

    What is the meaning of these hexadecimal addresses, first column?

    As you suggest, these are the virtual addresses to which that code
    will be loaded.

    In general, the executable file (PE/COFF for Windows, ELF for
    Unix/Linux) contains a header which describes the layout of the
    executable file.

    https://man7.org/linux/man-pages/man5/elf.5.html

    Thank you.

    The header will point to a table consisting of the program
    sections that need to be loaded when the program is executed.
    (called program headers in ELF). Each entry contains the
    program address (which could be virtual, or if the code is
    running sans the benefit of an operating system, it could
    be physical) at which that section of the codefile should be
    loaded.

    That makes sense. Thank you. But, of course, when we see a virtual
    address in an executable, we can't assume it will be loaded at that
    exact address, right? I suppose the system reserves the right to
    relocate these bytes in virtual memory. (Perhaps some DLL needs
    some of these locations for whatever reason?)

    The compiler doesn't generate addresses at all, most addresses in
    intermediate object files are relocatable (generally relative to
    the start of a function) and the final program
    addresses are assigned by the linker (or for the case of dynamically
    loaded shared objects, the run-time linker/loader).

    That makes sense. (By ``compiler'' I had in mind the whole
    executable-production pipeline.) Perhaps an illustration of ``relative
    to the start of a function'' is the following COFF hello.o.

    --8<---------------cut here---------------start------------->8---
    %objdump -d hello.o
    hello.o: file format pe-x86-64
    Disassembly of section .text:
    0000000000000000 <main>:
    0: 55 push %rbp
    1: 48 89 e5 mov %rsp,%rbp
    4: 48 83 ec 30 sub $0x30,%rsp
    8: 89 4d 10 mov %ecx,0x10(%rbp)
    b: 48 89 55 18 mov %rdx,0x18(%rbp)
    f: 4c 89 45 20 mov %r8,0x20(%rbp)
    [...]
    --8<---------------cut here---------------end--------------->8---

    The addresses there just count the bytes the code takes up. For all I
    know they could even be just a nice output given by objdump and not
    actually present in hello.o. I don't know.

  • From Scott Lurndal@21:1/5 to Meredith Montgomery on Wed Nov 17 18:20:42 2021
    Meredith Montgomery <mmontgomery@levado.to> writes:
    scott@slp53.sl.home (Scott Lurndal) writes:

    Meredith Montgomery <mmontgomery@levado.to> writes:
    Here's a brief passage from objdump on a MSYS2 system on Windows 10.

    --8<---------------cut here---------------start------------->8---
    %objdump.exe -D hello.exe
    hello.exe: file format pei-x86-64

    Disassembly of section .text:

    0000000100401000 <WinMainCRTStartup>:

    What is the meaning of these hexadecimal addresses, first column?

    [snip]

    The header will point to a table consisting of the program
    sections that need to be loaded when the program is executed.
    (called program headers in ELF). Each entry contains the
    program address (which could be virtual, or if the code is
    running sans the benefit of an operating system, it could
    be physical) at which that section of the codefile should be
    loaded.

    That makes sense. Thank you. But, of course, when we see a virtual
    address in an executable, we can't assume it will be loaded at that
    exact address, right?

    It must, by definition, be loaded at that virtual address.

    If you're not familiar with how virtual memory works, a quick
    high-level refresher:

    Modern (post-80286 for Intel, earlier for mainframes) CPUs
    support multiple "address spaces". Leaving aside the exotic
    systems like the B6500, there is a physical address space and
    one or more virtual address spaces.

    Processors support virtual address spaces using translation
    tables in memory that translate a virtual address into a
    corresponding physical address which describes the actual
    location in memory of the data referred to via the virtual address.

    Each process/task/thread supported by the operating system will
    have a unique virtual address space assigned to it, and the
    physical memory will be apportioned between those virtual address
    spaces by the operating system as needs require. If a valid
    translation is not present for a virtual address, the processor
    traps to the operating system to resolve the issue (i.e. allocate
    a free physical page to the virtual address and, if necessary,
    load the required data from backing storage into that newly
    allocated page). Returning from the trap will re-execute the
    instruction that caused the "page fault" and the application will
    continue using the data at the newly loaded virtual page.
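
    The translate-or-trap cycle described above can be modelled with a
    toy in Python. This is a deliberately simplified sketch: it assumes a
    single-level table and 4 KiB pages, whereas real processors walk
    multi-level tables in hardware, and the names AddressSpace,
    handle_page_fault and translate are invented for the illustration.

```python
PAGE_SIZE = 4096  # assumed page size for this toy model


class AddressSpace:
    """Toy page table: virtual page number -> physical frame number."""

    def __init__(self, free_frames):
        self.page_table = {}
        self.free_frames = list(free_frames)
        self.faults = 0

    def handle_page_fault(self, vpn):
        # Stand-in for the OS trap handler: take a free physical frame
        # and install a translation for the faulting virtual page.
        self.faults += 1
        self.page_table[vpn] = self.free_frames.pop(0)

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn not in self.page_table:
            self.handle_page_fault(vpn)  # "trap", then retry the access
        return self.page_table[vpn] * PAGE_SIZE + offset
```

    Two AddressSpace objects can map the same virtual address, such as
    0x100401000, to different physical frames, which is why every process
    can use the same virtual addresses without conflict.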


    I suppose the system reserves itself the right to
    relocate these bytes in virtual memory itself. (Perhaps some DLL needs
    some of these locations for whatever reason?)

    Shared objects (Unix/Linux) and Dynamically Linked Libraries (DLLs in
    Windows) are compiled into what is called "Position Independent Code" and
    use various indirection techniques to interface between the library
    and the application such that the libraries can be loaded anywhere
    in the virtual address space.
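
    One ingredient of position independence on x86-64 is addressing
    relative to the instruction pointer. The sketch below, in Python,
    decodes the two instructions from the hello.exe dump at the top of
    the thread; it assumes the last four bytes of each instruction are a
    signed little-endian displacement (true for these two encodings, not
    for x86 instructions in general), and the helper name
    rip_relative_target is made up. It does not model the other
    indirection techniques, such as import tables or a GOT/PLT.

```python
import struct


def rip_relative_target(insn_addr, insn_bytes):
    """Target of an instruction whose last 4 bytes are a signed
    displacement relative to the address of the next instruction."""
    (disp,) = struct.unpack("<i", insn_bytes[-4:])
    return insn_addr + len(insn_bytes) + disp


LEA = bytes.fromhex("488d0d71000000")  # lea 0x71(%rip),%rcx
CALL = bytes.fromhex("e80c010000")     # call <msys_crt0>

# At the addresses shown by objdump, the targets match the dump:
# 0x100401080 <main> and 0x100401120 <msys_crt0>.
main_addr = rip_relative_target(0x100401008, LEA)
crt0_addr = rip_relative_target(0x10040100F, CALL)
```

    Because only the distance is encoded, moving the whole section by
    some delta moves both the instruction and its target by the same
    delta, so the bytes stay valid without being patched.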


    The compiler doesn't generate addresses at all, most addresses in
    intermediate object files are relocatable (generally relative to
    the start of a function) and the final program
    addresses are assigned by the linker (or for the case of dynamically
    loaded shared objects, the run-time linker/loader).

    That makes sense. (By ``compiler'' I had in mind the whole
    executable-production pipeline.) Perhaps an illustration of ``relative
    to the start of a function'' is the following COFF hello.o.

    --8<---------------cut here---------------start------------->8---
    %objdump -d hello.o
    hello.o: file format pe-x86-64
    Disassembly of section .text:
    0000000000000000 <main>:
    0: 55 push %rbp
    1: 48 89 e5 mov %rsp,%rbp
    4: 48 83 ec 30 sub $0x30,%rsp
    8: 89 4d 10 mov %ecx,0x10(%rbp)
    b: 48 89 55 18 mov %rdx,0x18(%rbp)
    f: 4c 89 45 20 mov %r8,0x20(%rbp)
    [...]
    --8<---------------cut here---------------end--------------->8---

    The addresses there just count the bytes the code takes up. For all I
    know they could even be just a nice output given by objdump and not
    actually present in hello.o. I don't know.

    They are produced by the disassembler as it interprets the bytestream,
    since instructions on x86 are variable in length. The object file
    you dumped also contains a relocation table with entries that point to
    bytes within the instruction byte stream that contain addresses of
    unresolved symbols; the linker uses the relocation entries to update the
    instruction stream with the correct address of the symbol after it
    has been determined.
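
    As a rough sketch of that patching step, the Python below applies
    PC-relative 32-bit relocations to a byte stream. It is a toy model
    under stated assumptions: the relocation list is a plain list of
    (offset, symbol name) pairs rather than a real COFF or ELF table, any
    addend that real formats keep in the field is ignored, and the name
    apply_rel32 is invented for the example.

```python
import struct


def apply_rel32(code, section_addr, relocs, symbols):
    """Patch each 4-byte field at `offset` with the distance from the
    end of that field to the named symbol (a rel32-style fixup)."""
    patched = bytearray(code)
    for offset, name in relocs:
        next_insn = section_addr + offset + 4  # address just past the field
        struct.pack_into("<i", patched, offset, symbols[name] - next_insn)
    return bytes(patched)


# An unresolved call: opcode e8 followed by a zero placeholder.
unlinked = bytes.fromhex("e800000000")
linked = apply_rel32(
    unlinked,
    section_addr=0x10040100F,            # where the linker placed the call
    relocs=[(1, "msys_crt0")],           # field starts 1 byte into the code
    symbols={"msys_crt0": 0x100401120},  # final address of the symbol
)
```

    With the addresses taken from the hello.exe dump, the patched bytes
    come out as e8 0c 01 00 00, the same encoding objdump shows for the
    call to msys_crt0.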

  • From Jorgen Grahn@21:1/5 to Scott Lurndal on Thu Nov 18 07:22:45 2021
    On Wed, 2021-11-17, Scott Lurndal wrote:
    Meredith Montgomery <mmontgomery@levado.to> writes:
    scott@slp53.sl.home (Scott Lurndal) writes:

    Meredith Montgomery <mmontgomery@levado.to> writes:
    Here's a brief passage from objdump on a MSYS2 system on Windows 10.

    --8<---------------cut here---------------start------------->8---
    %objdump.exe -D hello.exe
    hello.exe: file format pei-x86-64

    Disassembly of section .text:

    0000000100401000 <WinMainCRTStartup>:

    What is the meaning of these hexadecimal addresses, first column?

    [snip]

    The header will point to a table consisting of the program
    sections that need to be loaded when the program is executed.
    (called program headers in ELF). Each entry contains the
    program address (which could be virtual, or if the code is
    running sans the benefit of an operating system, it could
    be physical) at which that section of the codefile should be
    loaded.

    That makes sense. Thank you. But, of course, when we see a virtual
    address in an executable, we can't assume it will be loaded at that
    exact address, right?

    It must, by definition, be loaded at that virtual address.

    Yes, and it has important effects, such as seeing the same addresses
    in core dumps and various stack traces.

    Shared libraries, on the other hand, are not loaded into predictable
    addresses.

    [long snip]

    /Jorgen

    --
    // Jorgen Grahn <grahn@ Oo o. . .
    \X/ snipabacken.se> O o .

  • From Scott Lurndal@21:1/5 to Jorgen Grahn on Thu Nov 18 18:17:53 2021
    Jorgen Grahn <grahn+nntp@snipabacken.se> writes:
    On Wed, 2021-11-17, Scott Lurndal wrote:
    Meredith Montgomery <mmontgomery@levado.to> writes:
    scott@slp53.sl.home (Scott Lurndal) writes:

    Meredith Montgomery <mmontgomery@levado.to> writes:
    Here's a brief passage from objdump on a MSYS2 system on Windows 10.

    --8<---------------cut here---------------start------------->8---
    %objdump.exe -D hello.exe
    hello.exe: file format pei-x86-64

    Disassembly of section .text:

    0000000100401000 <WinMainCRTStartup>:

    What is the meaning of these hexadecimal addresses, first column?

    [snip]

    The header will point to a table consisting of the program
    sections that need to be loaded when the program is executed.
    (called program headers in ELF). Each entry contains the
    program address (which could be virtual, or if the code is
    running sans the benefit of an operating system, it could
    be physical) at which that section of the codefile should be
    loaded.

    That makes sense. Thank you. But, of course, when we see a virtual
    address in an executable, we can't assume it will be loaded at that
    exact address, right?

    It must, by definition, be loaded at that virtual address.

    Yes, and it has important effects, such as seeing the same addresses
    in core dumps and various stack traces.

    Shared libraries, on the other hand, are not loaded into predictable
    addresses.

    While generally true, there was a period of time in the 1980s
    when SVR3.2 had a form of COFF static shared library that required
    loading at a fixed address. That made it very complicated to include
    multiple shared libraries in an application without the VAs clashing,
    particularly in a 32-bit address space where half was occupied by the
    OS (3b2, x86).

    John Levine discusses them here:

    https://www.cs.tufts.edu/~nr/cs257/archive/john-levine/linker09.ps.gz

    Note that you'll need some form of postscript reader (e.g. ghostscript)
    to read the document. It is _not_ pdf.

  • From Kaz Kylheku@21:1/5 to Scott Lurndal on Fri Nov 19 06:34:01 2021
    On 2021-11-18, Scott Lurndal <scott@slp53.sl.home> wrote:
    Jorgen Grahn <grahn+nntp@snipabacken.se> writes:
    On Wed, 2021-11-17, Scott Lurndal wrote:
    Shared libraries, on the other hand, are not loaded into predictable
    addresses.

    While generally true, there was a period of time in the 1980s
    when SVR3.2 had a form of COFF static shared library that required loading at a fixed address.

    There was a period of time in the 1990's, when GNU/Linux distros had the
    same thing.

    This is why we have "libc.so.6", where the 6 has nothing to do with the
    GLibc 2 version number.

    Prior to that there were libc4 and libc5 based on GNU C Library 1.

    Under libc4, binaries were a.out, and I think shared libraries were
    still not relocatable. It may be libc5 that made them relocatable.
    libc6 was then based on GNU C Library 2, and so we have that 6 in the
    sonames.

  • From Kenny McCormack@21:1/5 to 480-992-1380@kylheku.com on Fri Nov 19 08:27:16 2021
    In article <20211118222512.683@kylheku.com>,
    Kaz Kylheku <480-992-1380@kylheku.com> wrote:
    ...
    While generally true, there was a period of time in the 1980s
    when SVR3.2 had a form of COFF static shared library that required loading at
    a fixed address.

    There was a period of time in the 1990's, when GNU/Linux distros had the
    same thing.

    This is why we have "libc.so.6", where the 6 has nothing to do with the
    GLibc 2 version number.

    Prior to that there were libc4 and libc5 based on GNU C Library 1.

    Under libc4, binaries were a.out, and I think shared libraries were
    still not relocatable. It may be libc5 that made them relocatable.
    libc6 was then based on GNU C Library 2, and so we have that 6 in the
    sonames.

    But what about (and whatever became of) libcs 1, 2, and 3?

    --
    b w r w g y b r y b

  • From Meredith Montgomery@21:1/5 to Scott Lurndal on Fri Nov 19 16:13:52 2021
    scott@slp53.sl.home (Scott Lurndal) writes:

    Meredith Montgomery <mmontgomery@levado.to> writes:
    scott@slp53.sl.home (Scott Lurndal) writes:

    Meredith Montgomery <mmontgomery@levado.to> writes:
    Here's a brief passage from objdump on a MSYS2 system on Windows 10.

    --8<---------------cut here---------------start------------->8---
    %objdump.exe -D hello.exe
    hello.exe: file format pei-x86-64

    Disassembly of section .text:

    0000000100401000 <WinMainCRTStartup>:

    What is the meaning of these hexadecimal addresses, first column?

    [snip]

    The header will point to a table consisting of the program
    sections that need to be loaded when the program is executed.
    (called program headers in ELF). Each entry contains the
    program address (which could be virtual, or if the code is
    running sans the benefit of an operating system, it could
    be physical) at which that section of the codefile should be
    loaded.

    That makes sense. Thank you. But, of course, when we see a virtual
    address in an executable, we can't assume it will be loaded at that
    exact address, right?

    It must, by definition, be loaded at that virtual address.

    If you're not familiar with how virtual memory works, a quick
    high-level refresher:

    Thank you. That was a very nice read along with the subsequent posts
    that followed. I do have a much better picture in mind now. Thanks
    very much.
