• Actually Portable Executable

    From Ben Collver@21:1/5 to All on Fri Feb 17 22:36:07 2023
    # αcτµαlly pδrταblε εxεcµταblε (actually portable executable)

    One day, while studying old code, I found out that it's possible to
    encode Windows Portable Executable files as a UNIX Sixth Edition
    shell script, due to the fact that the Thompson Shell didn't use a
    shebang line. Once I realized it's possible to create a synthesis of
    the binary formats being used by Unix, Windows, and MacOS, I couldn't
    resist the temptation of making it a reality, since it means that
    high-performance native code can be almost as pain-free as web apps.
    Here's how it works:

    MZqFpD='
    BIOS BOOT SECTOR'
    exec 7<> $(command -v $0)
    printf '\177ELF...LINKER-ENCODED-FREEBSD-HEADER' >&7
    exec "$0" "$@"
    exec qemu-x86_64 "$0" "$@"
    exit 1
    REAL MODE...
    ELF SEGMENTS...
    OPENBSD NOTE...
    NETBSD NOTE...
    MACHO HEADERS...
    CODE AND DATA...
    ZIP DIRECTORY...
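
    The layout above relies on each loader recognizing a file by its
    leading magic bytes: "MZ" for DOS/PE and "\177ELF" for ELF, while a
    shell simply reads MZqFpD='...' as a variable assignment. As a rough
    illustration only (classify_header and the enum names are
    hypothetical, not Cosmopolitan code), that check might look like this
    in C:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum header_kind { HDR_UNKNOWN, HDR_MZ, HDR_ELF };

/* Reports which loader would claim a file, given its first n bytes. */
enum header_kind classify_header(const unsigned char *p, size_t n) {
  assert(p != NULL);
  if (n >= 4 && memcmp(p, "\177ELF", 4) == 0) return HDR_ELF;
  if (n >= 2 && memcmp(p, "MZ", 2) == 0) return HDR_MZ;
  return HDR_UNKNOWN;
}
```

    A file that starts with MZqFpD=' is classified as HDR_MZ. Once the
    script's printf has written an ELF header over the first bytes, the
    same file would be classified as HDR_ELF.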

    I started a project called Cosmopolitan which implements the
    αcτµαlly pδrταblε εxεcµταblε format. I chose the name because I like
    the idea of having the freedom to write software without restrictions
    that transcends traditional boundaries. My goal has been helping C become
    a build-once run-anywhere language, suitable for greenfield
    development, while avoiding any assumptions that would prevent
    software from being shared between tech communities. Here's how
    simple it is to get started:

    gcc -g -O -static -fno-pie -no-pie -mno-red-zone -nostdlib -nostdinc \
    -o hello.com hello.c -Wl,--oformat=binary -Wl,--gc-sections \
    -Wl,-z,max-page-size=0x1000 -fuse-ld=bfd -gdwarf-4 -Wl,-T,ape.lds \
    -include cosmopolitan.h crt.o ape.o cosmopolitan.a

    https://justine.lol/cosmopolitan/ape.lds
    https://justine.lol/cosmopolitan/cosmopolitan.h
    https://justine.lol/cosmopolitan/crt.o
    https://justine.lol/cosmopolitan/ape.o
    https://justine.lol/cosmopolitan/cosmopolitan.a

    In the above one-liner, we've basically reconfigured the stock
    compiler on Linux so it outputs binaries that'll run on MacOS,
    Windows, FreeBSD, OpenBSD, and NetBSD too. They also boot from the
    BIOS. Please note this is intended for people who don't care about
    desktop GUIs, and just want stdio and sockets without devops toil.

    # Platform Agnostic C / C++ / FORTRAN Tooling

    Who could have predicted that cross-platform native builds would be
    this easy? As it turns out, they're surprisingly cheap too. Even with
    all the magic numbers, win32 utf-8 polyfills, and bios bootloader
    code, exes still end up being roughly 100x smaller than Go Hello
    World:

    life.com is 12kb (symbols, source)
    https://justine.lol/life.com
    https://worker.jart.workers.dev/life.com.dbg
    https://raw.githubusercontent.com/jart/cosmopolitan/1.0/examples/life.c

    hello.com is 16kb (symbols, source)
    https://justine.lol/hello.com
    https://worker.jart.workers.dev/hello.com.dbg
    https://raw.githubusercontent.com/jart/cosmopolitan/1.0/examples/hello.c

    Please note that zsh has a minor backwards compatibility glitch with
    Thompson Shell [update 2021-02-15: zsh has now been patched] so try
    sh hello.com rather than ./hello.com. That one thing aside, if it's
    this easy, why has no one done this before? The best answer I can
    give is that it requires a minor ABI change, where C preprocessor
    macros relating to system interfaces need to be symbolic. This is
    barely an issue, except in cases like switch(errno){case EINVAL:...}.
    If we feel comfortable bending the rules, then the GNU Linker can
    easily be configured to generate at linktime all the PE/Darwin data
    structures we need, without any special toolchains.
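
    To make the switch(errno) caveat concrete, here is a minimal sketch
    in C. The names sym_EINVAL, sym_EAGAIN, init_errnos, and describe are
    hypothetical stand-ins rather than Cosmopolitan's actual symbols. The
    values are the ones I'm confident of: EINVAL is 22 on Linux and the
    BSDs, while EAGAIN is 11 on Linux and 35 on the BSDs and MacOS.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical symbolic errno constants. In a binary that runs on
 * several systems these cannot be #define'd integers, because the
 * right value is only known once the host has been detected.
 */
int sym_EINVAL;
int sym_EAGAIN;

/* Assumed startup hook that picks the host's numbering. */
void init_errnos(int host_is_linux) {
  sym_EINVAL = 22;
  sym_EAGAIN = host_is_linux ? 11 : 35;
}

/*
 * `case sym_EAGAIN:` would not compile, because case labels must be
 * integer constant expressions. An if-chain works with variables.
 */
const char *describe(int err) {
  assert(sym_EINVAL != 0); /* init_errnos() must have run first */
  if (err == sym_EINVAL) return "invalid argument";
  if (err == sym_EAGAIN) return "try again";
  return "unknown";
}
```

    After init_errnos(1), describe(11) reports "try again"; after
    init_errnos(0), the same answer comes back for 35 instead.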

    # PKZIP Executables Make Pretty Good Containers

    Single-file executables are nice to have. There are a few cases where
    static executables depending on system files make sense, e.g.
    zoneinfo. However, we can't make that assumption if we're building
    binaries intended to run on multiple distros with Windows support too.

    As it turns out, PKZIP was designed to place its magic marker at the
    end of file, rather than the beginning, so we can synthesize
    ELF/PE/MachO binaries with ZIP too! I was able to implement this
    efficiently in the Cosmopolitan codebase using a few lines of linker
    script, along with a program for incrementally compressing sections.
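
    ZIP readers find the archive by searching backward from the end of
    the file for the end of central directory record, whose signature is
    the four bytes "PK\005\006". That is why executable content can sit
    in front of it. Below is a minimal C sketch of that search, assuming
    the whole file is already in memory. find_eocd is a hypothetical
    helper, not something taken from Cosmopolitan or a ZIP library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define EOCD_MIN_SIZE 22 /* fixed-size part of the record */

/*
 * Scans backward for the end of central directory signature.
 * Returns its offset, or -1 if none is found. Real tools bound the
 * scan, since the trailing comment is at most 65535 bytes long.
 */
long find_eocd(const unsigned char *buf, size_t len) {
  assert(buf != NULL);
  if (len < EOCD_MIN_SIZE) return -1;
  for (size_t i = len - EOCD_MIN_SIZE + 1; i-- > 0;) {
    if (memcmp(buf + i, "PK\005\006", 4) == 0) return (long)i;
  }
  return -1;
}
```

    A result of -1 means the buffer holds no ZIP directory. Any other
    result is the offset at which a ZIP tool would start reading.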

    It's possible to run unzip -vl executable.com to view its contents.
    It's also possible on Windows 10 to change the file extension to .zip
    and then open it in Microsoft's bundled ZIP GUI. Having that
    flexibility of being able to easily edit assets post-compilation
    means we can also do things like create an easily distributable
    JavaScript interpreter that reflectively loads interpreted sources
    via zip.

    hellojs.com is 300kb (symbols, source)
    https://justine.lol/hellojs.com
    https://worker.jart.workers.dev/hellojs.com.dbg
    https://github.com/jart/cosmopolitan/blob/1.0/examples/hellojs.c

    Cosmopolitan also uses the ZIP format to automate compliance with the
    GPLv2 [update 2020-12-28: APE is now licensed ISC]. The
    non-commercial libre build is configured, by default, to embed any
    source file linked from within the hermetic make mono-repo. That
    makes binaries roughly 10x larger. For example:

    life2.com is 216kb (symbols, source)
    https://justine.lol/life2.com
    https://worker.jart.workers.dev/life2.com.dbg
    https://github.com/jart/cosmopolitan/blob/1.0/examples/life.c

    hello2.com is 256kb (symbols, source)
    https://justine.lol/hello2.com
    https://worker.jart.workers.dev/hello2.com.dbg
    https://github.com/jart/cosmopolitan/blob/1.0/examples/hello.c

    Rock musicians have a love-hate relationship with dynamic range
    compression, since it removes a dimension of complexity from their
    music, but is necessary in order to sound professional. Bloat might
    work by the same principles, in which case, zip source file embedding
    could be a more socially conscious way of wasting resources in order
    to gain appeal with the non-classical software consumer.

    # x86-64 Linux ABI Makes a Pretty Good Lingua Franca

    It wasn't until very recently in computing history that a clear
    shakeout occurred with hardware architectures, which is best
    evidenced by the TOP 500 list. Outside phones, routers, mainframes,
    and cars, the consensus surrounding x86 is so strong, that I'd
    compare it to the Tower of Babel. Thanks to Linus Torvalds, we not
    only have a consensus on architecture, but we've come pretty close to
    having a consensus on the input output mechanism by which programs
    communicate with their host machines, via the SYSCALL instruction.
    He accomplished that by sitting at home in a bathrobe sending emails
    to huge corporations, getting them to agree to devote their resources
    to creating something beautifully opposite to the tragedy of the
    commons.

    So I think it's really the best of times to be optimistic about
    systems engineering. We agree more on sharing things in common than
    we ever have. There are still outliers like the plans coming out of
    Apple and Microsoft we hear about in the news, where they've sought
    to pivot PCs towards ARM. I'm not sure why we need a C-Class
    Macintosh, since the x86_64 patents should expire this year. Apple
    could have probably made their own x86 chip without paying royalties.
    The free/open architecture that we've always dreamed of, might turn
    out to be the one we're already using.

    If a microprocessor architecture consensus finally exists, then I
    believe we should be focusing on building better tools that help
    software developers benefit from it. One of the ways I've been
    focusing on making a contribution in that area, is by building a
    friendlier way to visualize the impact that x86-64 execution has on
    memory. It should hopefully clarify how αcτµαlly pδrταblε
    εxεcµταblε works.

    You'll notice that execution starts off by treating the Windows PE
    header as though it were code. For example, the ASCII string "MZqFpD"
    decodes as pop %r10 ; jno 0x4a ; jo 0x4a and the string "\177ELF"
    decodes as jg 0x47. It then hops through a mov statement which tells
    us the program is being run from userspace rather than being booted,
    and then hops to the entrypoint.
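
    The jump targets quoted above can be checked by hand. Opcodes 0x70
    through 0x7f are two-byte short conditional jumps whose second byte
    is a signed displacement measured from the end of the instruction.
    Here is a small C sketch of that arithmetic (short_jcc_target is a
    hypothetical helper, not the decoder the emulator uses):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Target of the short conditional jump at offset `at` in `code`.
 * Returns -1 if the bytes there are not a two-byte Jcc.
 */
long short_jcc_target(const unsigned char *code, size_t len, size_t at) {
  assert(code != NULL);
  if (at + 2 > len) return -1;
  if ((code[at] & 0xf0) != 0x70) return -1;
  /* rel8 is signed and relative to the next instruction */
  return (long)(at + 2) + (int8_t)code[at + 1];
}
```

    In "MZqFpD" the bytes 'q' 'F' (0x71 0x46) sit at offset 2, giving
    2 + 2 + 0x46 = 0x4a, and 'p' 'D' (0x70 0x44) at offset 4 give
    4 + 2 + 0x44 = 0x4a. In "\177ELF" the bytes 0x7f 0x45 at offset 0
    give 0 + 2 + 0x45 = 0x47.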

    Magic numbers are then easily unpacked for the host operating system
    using decentralized sections and the GNU Assembler .sleb128
    directive. Low entropy data like UNICODE bit lookup tables will
    generally be decoded using either a 103 byte LZ4 decompressor or a 17
    byte run-length decoder, and runtime code morphing can easily be done
    using Intel's 3kb x86 decoder.
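
    The .sleb128 directive stores integers as signed LEB128: seven bits
    per byte, least significant group first, with the high bit marking
    continuation. For readers unfamiliar with the encoding, here is a
    minimal C sketch of a decoder. It is a generic illustration and not
    Cosmopolitan's unpacking code, and sleb128_decode is a hypothetical
    name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decodes one signed LEB128 value from `p` (at most `len` bytes).
 * Stores the number of bytes consumed in *used, or 0 if the input
 * is truncated or longer than a 64-bit value allows.
 */
int64_t sleb128_decode(const unsigned char *p, size_t len, size_t *used) {
  uint64_t result = 0;
  unsigned shift = 0;
  size_t i = 0;
  unsigned char byte;
  assert(p != NULL && used != NULL);
  do {
    if (i == len || shift >= 64) {
      *used = 0;
      return 0;
    }
    byte = p[i++];
    result |= (uint64_t)(byte & 0x7f) << shift;
    shift += 7;
  } while (byte & 0x80);
  /* sign-extend when the final group has its 0x40 bit set */
  if (shift < 64 && (byte & 0x40)) result |= ~(uint64_t)0 << shift;
  *used = i;
  return (int64_t)result;
}
```

    The single byte 0x7f decodes to -1, the pair 0x80 0x01 decodes to
    128, and the three bytes 0xc0 0xbb 0x78 decode to -123456.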

    https://github.com/jart/cosmopolitan/blob/1.0/libc/sysv/systemfive.S
    https://github.com/jart/cosmopolitan/blob/1.0/libc/str/lz4cpy.c
    https://github.com/jart/cosmopolitan/blob/1.0/libc/nexgen32e/rldecode.S
    https://github.com/jart/cosmopolitan/blob/1.0/third_party/xed/x86ild.greg.c

    Please note that this emulator isn't a requirement. αcτµαlly
    pδrταblε εxεcµταblεs work fine if you just run them on the shell, the
    NT command prompt, or boot them from the BIOS. This isn't a JVM. You
    only use the emulator if you need it. For example, it's helpful to be
    able to have cool visualizations of how program execution impacts
    memory.

    It'll be nice to know that any normal PC program we write will "just
    work" on Raspberry Pi and Apple ARM. All we have to do is embed an
    ARM build of the emulator above within our x86 executables, and have
    them morph and re-exec appropriately, similar to what Cosmopolitan is
    already doing with qemu-x86_64, except that this wouldn't need
    to be installed beforehand. The tradeoff is that, if we do this,
    binaries will only be 10x smaller than Go's Hello World, instead of
    100x smaller. The other tradeoff is the GCC Runtime Exception forbids
    code morphing, but I already took care of that for you, by rewriting
    the GNU runtimes.

    The most compelling use case for making x86-64-linux-gnu as tiny as
    possible, with the availability of full emulation, is that it enables
    normal simple native programs to run everywhere including web
    browsers by default. Many of the solutions built in this area tend to
    focus too much on the interfaces that haven't achieved consensus,
    like GUIs and threads, otherwise they'll just emulate the entire
    operating system, like Docker or Fabrice Bellard running Windows in
    browsers. I think we need compatibility glue that just runs programs,
    ignores the systems, and treats x86_64-linux-gnu as a canonical
    software encoding.

    # Long Lifetime Without Maintenance

    One of the reasons why I love working with a lot of these old
    technologies, is that I want any software work I'm involved in to
    stand the test of time with minimal toil. Similar to how the Super
    Mario Bros ROM has managed to survive all these years without needing
    a GitHub issue tracker.

    I believe the best chance we have of doing that, is by gluing
    together the binary interfaces that've already achieved a
    decades-long consensus, and ignoring the APIs. For example, here are
    the magic numbers used by Mac, Linux, BSD, and Windows distros.
    They're worth seeing at least once in your life, since these numbers
    underpin the internals of nearly all the computers, servers, and
    phones you've used.

    https://github.com/jart/cosmopolitan/blob/1.0/libc/sysv/consts.sh

    If we focus on the subset of numbers all systems share in common, and
    compare it to their common ancestor, Bell System Five, we can see
    that few things about systems engineering have changed in the last 40
    years at the binary level. Magnums are boring. Platforms can't break
    them without breaking themselves. Few people have proposed visions
    over the years on why UNIX numerology needs to change.

    # download [Linux/Windows/DOS/MacOS/FreeBSD/OpenBSD/NetBSD]

    emulator.com (280k PE+ELF+MachO+ZIP+SH)
    https://justine.lol/emulator.com

    tinyemu.com (188k PE+ELF+MachO+ZIP+SH)
    https://justine.lol/tinyemu.com

    # source code

    https://raw.githubusercontent.com/jart/cosmopolitan/1.0/ape/ape.S

    https://raw.githubusercontent.com/jart/cosmopolitan/1.0/ape/ape.lds

    https://github.com/jart/cosmopolitan/blob/1.0/tool/build/blinkenlights.c

    https://github.com/jart/cosmopolitan/blob/1.0/third_party/xed/x86ild.greg.c

    https://github.com/jart/cosmopolitan/blob/1.0/libc/sysv/syscalls.sh

    https://github.com/jart/cosmopolitan/blob/1.0/libc/sysv/consts.sh

    # programs

    life.com (12kb ape symbols)
    https://justine.lol/life.com
    https://worker.jart.workers.dev/life.com.dbg

    sha256.elf (3kb x86_64-linux-gnu)
    https://justine.lol/sha256.elf

    hello.bin (55b x86_64-linux-gnu)
    https://justine.lol/hello.bin

    # example

    bash hello.com # runs it natively
    ./hello.com # runs it natively
    ./tinyemu.com hello.com # just runs program
    ./emulator.com -t life.com # show debugger gui
    echo hello | ./emulator.com sha256.elf

    # manual

    SYNOPSIS

    ./emulator.com [-?HhrRstv] [ROM] [ARGS...]

    DESCRIPTION

    Emulates x86 Linux Programs w/ Dense Machine State Visualization
    Please keep still and only watchen astaunished das blinkenlights

    FLAGS

    -h help
    -z zoom
    -v verbosity
    -r real mode
    -s statistics
    -H disable highlight
    -t tui debugger mode
    -R reactive tui mode
    -b ADDR push a breakpoint
    -L PATH log file location

    ARGUMENTS

    ROM files can be ELF or a flat αcτµαlly pδrταblε εxεcµταblε.
    It should use x86_64 in accordance with the System Five ABI.
    The SYSCALL ABI is defined as it is written in Linux Kernel.

    FEATURES

    8086, 8087, i386, x86_64, SSE3, SSSE3, POPCNT, MDA, CGA, TTY

    WEBSITE

    https://justine.lol/blinkenlights/

    Written by Justine Tunney

    jtunney@gmail.com
