Back when I was a kid, in the days before Google, you tried to figure out how things worked just by poking around your computer. For example, I noticed that professional programs were always .com/.exe files, but I only knew how to edit simple batch scripts. So of course, the "logical" thing to try was to write a .bat and save it with a different extension and see what happened.
.exe files created this way wouldn't launch, but a .com created this way would instantly reboot the computer. Even more weirdly, this only worked for the specific command that I happened to enter in my first attempt -- it might have been something like "cd games". If I changed anything at all, it no longer worked. Only years later did I realize that the characters I typed were directly executed as machine code, and that I must have stumbled upon a set of instructions that caused a CPU exception!
For several years, this weird file was the only way I knew to programmatically reboot a computer, so I renamed it "reset.com" and kept it around. I think it stopped working only in the Windows XP era when DOS programs were finally sandboxed to some extent.
>So of course, the "logical" thing to try was to write a .bat and save it with a different extension and see what happened.
Ha ha, same here. Also, the next thing I did was to open .exe files in Notepad and conclude that this was what real code looked like, and that the programmers had written the programs that way.
When a little bit later I learned that programs were written in languages like C++, I got a book about C++ while on vacation. I read a bit of that book, wrote down the Hello World program on paper, wrote variations of it, and sketched a couple of programs that would ask questions and say something. When I got back from vacation I sat down in front of my computer, typed in Hello World, compiled it with Bloodshed Dev-C++ (which the book had recommended), and double-clicked the .exe file, only to have it open a window and immediately close. I had failed to understand that I needed to run the program via cmd.exe, because of course Hello World exits immediately once it has printed its text.
This discouraged me a bit, so I stayed away from programming languages for several years after that. But at least I kept learning HTML, a bit about reverse engineering, and a couple of other things. I eventually got back to programming when I started high school: we were using Texas Instruments TI-84 Plus calculators and were given a couple of programs written in TI-BASIC to store on our calculators. Once I'd written some programs of my own in TI-BASIC, I read a book about PHP and wrote some PHP.
Such was the beginning of my experience with programming.
> I had failed to understand that I needed to run the program via cmd.exe, because of course Hello World exits immediately once it has printed its text.
And to this day, people writing command-line programs in an IDE like Visual Studio stumble on this fact, and the workaround is still the same: adding a stdin read or something like system("PAUSE") at the end, pausing execution until the user hits Enter.
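For instance, a minimal sketch in C -- nothing Visual Studio specific, the getchar() is the pause:

    #include <stdio.h>

    int main(void)
    {
        printf("Hello, world!\n");
        printf("Press Enter to exit...\n");
        getchar();  /* block on stdin so the console window stays open */
        return 0;
    }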
'cd ' disassembles into arpl [si+0x20],sp, which looks like a privileged instruction but actually isn't; on the other hand, it is not even a valid instruction in real/virtual-8086 mode and thus causes an illegal-instruction exception.
It is interesting to note that the code for the warmest of warm reboots on a PC is 0xcd 0x19 (int 0x19). This jumps back to the point in the BIOS code where it starts looking for an operating system to boot. Needless to say, this works reliably only in pure real mode, and it confuses NT's NTLDR in a really interesting way (the thing was obviously designed originally to be run from DOS and contains some leftover logic from that time).
The best part was sneaking that instruction sequence somewhere into the middle of an innocuous-looking C program (pointer arithmetic in DOS was truly a power to behold; and string literals are mutable in real mode!), and then submitting it to the teacher for grading.
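Roughly like this, I'd guess -- a minimal sketch, assuming a 16-bit DOS compiler building a tiny-model .COM (so CS == DS and the string literal is reachable as code); "\xcd\x19" is int 19h:

    int main(void)
    {
        /* The two bytes CD 19 encode 'int 19h'. In the tiny memory
           model, code and data share one segment, so calling into
           the string literal executes those bytes. */
        void (*warmboot)(void) = (void (*)(void)) "\xcd\x19";
        warmboot();     /* jumps back into the BIOS boot sequence */
        return 0;       /* never reached */
    }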
It produces an error message along the lines of "your computer does not have enough free conventional memory to run Windows NT". Interestingly, the message is actually translated into the language of the NT install, says that you need 639kB of free memory for Windows NT, and reports whatever the amount of free conventional memory in DOS was before the "reboot" (which makes sense, given that at that point the original DOS is still in memory, probably in a completely working state).
The interesting thing about this is that there probably is no other way to get the same message, because NTLDR isn't valid as anything DOS can execute (IIRC it is not an MZ image and is too large to be loaded as a COM).
It is designed to run in real mode. The NT boot block probably even loads it as if it were a .COM, but it is too large for DOS's .COM loader. The actual binary is a concatenation of two files: there is a real-mode part (on XP SP3 it is 0x4cd0 bytes long, most of which are either zeros or text strings, including the aforementioned error message), and the rest of the file is a 32-bit PE binary, complete with a "This program cannot be run in DOS mode" stub. The PE part is obviously somewhat special, because it has to be able to switch to real or vm86 mode in order to call the BIOS and run NTDETECT.COM, which is a valid DOS .COM (from a quick look at the code it even appears to take command-line arguments), except for the obvious fact that it does not make any DOS calls.
Edit:
The message in XP's NTLDR says that it needs 512kB of conventional memory (I think I've seen the 639kB number on NT4 or 2000), and there is also a message along the lines of "Windows NT needs 7MB of expanded memory", which is somewhat interesting given that BugCheck 0x0000007D INSTALL_MORE_MEMORY is defined as "more than 5MB", and given that when you boot NT normally on a computer with 4MB of RAM you get the 0x0000007D BSOD, not the message from NTLDR.
>a .com created this way would instantly reboot the computer. Even more weirdly, this only worked for the specific command that I happened to enter in my first attempt
Interesting that you did that by chance, since the probability was somewhat low. But possible, of course.
I used to assemble a tiny .COM program on new computers sometimes, using just DEBUG.COM (the built-in rudimentary debugger that came with DOS), using something like the following (the dash is the DEBUG prompt):
DEBUG reboot.com
-A
JMP F000:FFF0
(blank line here ends assembly input)
-RCX
:5
-W
-Q
Meaning of the above: A enters assembly mode, where you type assembly-language instructions; the JMP jumps to that address (Intel x86 segment:offset format), which was the location of the reset entry point in the ROM BIOS; an empty line ends assembly input; RCX sets the CX register to the number of bytes W should write (a far JMP is 5 bytes); W writes the code to reboot.com; Q quits DEBUG.
This!! And as I recall, if we replace -W with -G (for 'go') and hit Enter, it would immediately reboot. Then, there also used to be a flag (the word at 0040:0072 in the BIOS data area, IIRC) that could be set to select either a warm or a cold boot. Nostalgic. :-)
In the context of the current thread, your handle is cool too!! As I recall, the code for interrupts up to INT 19H (or 20?) was stored in the ROM BIOS, and for INT 21H and up, in the hidden system files (io.sys/msdos.sys), which got loaded just before command.com at boot.
If I remember correctly, the lowest DOS interrupt was 20h. But I don't know if all interrupts below that were actually used. The highest that I remember is 1Ch for the timer interrupt (for which the stock BIOS interrupt handler just returns immediately, but which was commonly overridden by TSRs to simulate multitasking).
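The 1Ch hook looked roughly like this -- a sketch assuming Turbo C, whose <dos.h> provides getvect()/setvect() and the 'interrupt' keyword (a real TSR would terminate-and-stay-resident via keep() instead of unhooking):

    #include <dos.h>
    #include <stdio.h>

    static void interrupt (*old_1c)(void);
    static volatile unsigned long ticks = 0;

    static void interrupt tick(void)
    {
        ticks++;        /* the "background" work, ~18.2 times per second */
        old_1c();       /* chain to the previous 1Ch handler */
    }

    int main(void)
    {
        old_1c = getvect(0x1C);
        setvect(0x1C, tick);
        while (ticks < 91)          /* busy-wait roughly 5 seconds */
            ;
        setvect(0x1C, old_1c);      /* unhook before exiting */
        printf("counted %lu ticks\n", ticks);
        return 0;
    }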
Also there were two groups of interrupts, IIRC: DOS interrupts and BIOS interrupts. BIOS ones were lower level and I guess some of the DOS interrupts used some BIOS ones. I remember a friend using BIOS interrupts to write a program to format a floppy disk and we sat there watching the drive make sounds as it did it.
There were actually four groups:
CPU interrupts - handlers for CPU errors like division by zero (0h) or overflow (4h).
IRQ interrupts - handlers for hardware events; e.g. hardware timer (8h) or keyboard (9h).
BIOS interrupts - for BIOS-implemented syscalls; e.g. video output (10h) or sector-oriented disk I/O (13h). Also callbacks to be invoked by BIOS, like the aforementioned 1Ch for configurable timer.
DOS interrupts - for DOS-implemented syscalls; mostly this was 21h, with the actual syscall ID in a register. This is where you got things like file-oriented I/O and some basic memory management. (A short C sketch follows this list.)
And then there was "everything else", meaning things installed by third party TSRs and drivers. For example, 33h for the Microsoft mouse driver (which all other mouse drivers promptly copied, so it became the de facto standard mouse API in DOS), or 3Eh for Borland's software floating point emulator.
Of these, only the first two are actually "special" as far as hardware is concerned - the rest is just convention.
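As promised above, a concrete look at the DOS group -- a sketch assuming Turbo C's int86() from <dos.h> and the small memory model (so DS already points at the data segment). It invokes DOS service 09h, "print $-terminated string", via INT 21h:

    #include <dos.h>

    static char msg[] = "Hello from INT 21h$";

    int main(void)
    {
        union REGS r;
        r.h.ah = 0x09;            /* AH = syscall ID: print string */
        r.x.dx = FP_OFF(msg);     /* DS:DX -> '$'-terminated string */
        int86(0x21, &r, &r);
        return 0;
    }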
Interesting, didn't know that there were 4 groups. I do remember INT 21H and setting a value in a register to specify which syscall you want run. Turbo Pascal had some support for assembly and even register handling, IIRC, so you could write such low-level programs in it. It also had absolute memory addressing, which could be used for anything where you need to manipulate fixed, known memory addresses, such as video RAM.
> using BIOS interrupts to write a program to format a floppy disk and we sat there watching the drive make sounds...
There used to be a virus/trojan that would execute the floppy drive head re-calibration interrupt (BIOS) in an infinite loop to cause the head alignment to go out of whack, rendering the drive defunct.
I'd imagine that wouldn't have worked on NT or 2000 either. You likely just jumped from the 9x series of OSes to XP, which was the first consumer version to offer "good" memory protection. Full explanation of how you could do this here[0].
It really was the wild west back when 9x was popular. I'm genuinely surprised they ran as well as they did. NT was a vast improvement.
While NT was a vast improvement, 3.1/9x was a pretty significant feat of engineering; in comparison, NT is "just another OS".
In my opinion the lacking memory protection was not about the architecture itself but about the fact that the system was meant to be single-user and thus there were no permissions. IIRC from Win32 code you had to jump through some hoops to be able to access whatever memory/hardware you wanted (i.e. you could not overwrite random non-owned memory by mistake); the fact that you could do almost whatever you wanted in vm86 code was essentially a feature, and the fact that Win16 code could access any other Win16 process's data was a weird design decision that is also shared by all NT versions up to 2000 (where it is configurable).
> the fact that you could do almost whatever you wanted in vm86 code was essentially a feature, and the fact that Win16 code could access any other Win16 process's data was a weird design decision that is also shared by all NT versions up to 2000 (where it is configurable).
Hey, I actually liked that! Back when the Internet was young and the OS was single-user, I felt like I had more control over my own machine than I do now. Being able to mess with other processes' memory was a feature for me :).
> The format of a COM file is... um, none. There is no format.
This makes any small file a valid COM file as far as Windows is concerned. NTVDM doesn't care, it will happily execute your holiday snaps if given the chance. It's not difficult to craft a valid GIF, PNG, etc that does something useful when executed from byte 0.
Such an image will pass most MIME-sniffing protections. For example, given such an image and a "foo.png.exe" filename in the Content-Disposition header, Internet Explorer used to skip all security warnings. Combined with "Hide extensions for known file types", it would ask you where you'd like to save "foo.png", preserving the executable extension behind your back.
Upon double-click the loader notices the MZ signature is missing, fires up NTVDM, and starts executing the image from byte 0. If running under NTVDM is too restrictive, it can always break out with BOP instructions.
The lack of structure also makes COM files a simple vector for exploiting hash collisions. Any two prefix blocks with matching hashes that can survive execution can be used to create two variants of a program with matching hashes. Bit differences between the two blocks can be used as switches to control program behaviour.
Nice info. My suspicion is this also explains why many corrupted EXE files produce the cryptic "program is too large to fit in memory" error if you try to execute them:
The corruption mangles the header, removing the "MZ" magic number -> Windows thinks the file is a COM executable and attempts to load it into memory, COM-style -> the file is not actually a COM though, and thus is far too large to fit into the memory available to the loader -> that error appears.
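The check itself is trivial -- a toy version in portable C (file name hypothetical), just to show how little the loader has to go on:

    #include <stdio.h>

    int main(void)
    {
        unsigned char sig[2];
        FILE *f = fopen("PROGRAM.EXE", "rb");
        if (f == NULL)
            return 1;
        if (fread(sig, 1, 2, f) == 2 && sig[0] == 'M' && sig[1] == 'Z')
            printf("MZ header present: load as EXE\n");
        else
            printf("no MZ header: fall back to COM-style loading\n");
        fclose(f);
        return 0;
    }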
I miss being able to make small programs in DEBUG at a DOS prompt and just dump the machine code to a COM file. It was a great way to test out and learn assembly language, and you didn't need a compiler or any other programs beyond what was built in, plus a nice printed list of interrupts.
Some simple things I remember playing with in middle school (these might work in a DOSBox emulator window?):
pc speaker (via DEBUG's O command, which writes a byte to an I/O port):
o 61 1
change video mode (in DEBUG's assembler, numbers are already hex, so no 0x prefix):
mov ax,13
int 10
Then there were cool MS-DOS debug scripts: text listings of assembly and DEBUG commands that, when piped into DEBUG, wrote out a COM file.
Some could low-level format hard drives, and there were lots of other fun utilities and amusements, much more so than batch files. I'd say the modern analogue is Python scripts.
It's an x86 emulator that works at the hardware level (so from the BIOS on, you're running the real code that ran on those DOS machines), with a focus on emulating the hardware of that era specifically, ranging from the earliest 8086 machines and MDA adapters up to the 3dfx Voodoo era. Obviously this is mostly useful for gaming, but if you want to hack around with TASM or Borland Pascal just like in the good old times, it's also great.
Yeah, I remember getting a huge Microsoft reference book as a birthday present that listed all the BIOS, DOS and VGA system calls and their INT numbers and arguments. I then used it and DEBUG to write a tiny 'paint' program in ~80 bytes that did the following (a rough C reconstruction follows the list):
1. entered VGA mode
2. enabled the mouse cursor
3. checked the mouse status
4. turned on the pixel at the mouse location if the button was down
5. goto 3
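Something like this, I imagine -- a reconstruction in C rather than the original machine code, assuming Turbo C's int86() and a loaded mouse driver (INT 33h); IIRC the Microsoft driver reports x on a 640-wide virtual grid in mode 13h, hence the halving:

    #include <dos.h>

    int main(void)
    {
        union REGS r, m, p;

        r.x.ax = 0x0013;            /* 1. INT 10h: 320x200x256 VGA mode */
        int86(0x10, &r, &r);

        r.x.ax = 0x0000;            /* reset/detect the mouse driver */
        int86(0x33, &r, &r);
        r.x.ax = 0x0001;            /* 2. INT 33h: show the mouse cursor */
        int86(0x33, &r, &r);

        for (;;) {                  /* 5. goto 3 (Ctrl+Break to quit) */
            m.x.ax = 0x0003;        /* 3. INT 33h: BX=buttons, CX=x, DX=y */
            int86(0x33, &m, &m);
            if (m.x.bx & 1) {       /* 4. left button down: plot a pixel */
                p.h.ah = 0x0C;      /* INT 10h: write graphics pixel */
                p.h.al = 15;        /* color: white */
                p.h.bh = 0;         /* video page 0 */
                p.x.cx = m.x.cx >> 1;   /* virtual 0..639 -> real 0..319 */
                p.x.dx = m.x.dx;
                int86(0x10, &p, &p);
            }
        }
    }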
If I remember correctly, not just the files themselves but the entire memory available to COM programs at run time was limited to 64k.
It's funny how engineers think those times are long gone whereas the demand for very compact systems with low energy footprint is still there. So mastering programming within kilobytes is still a thing, don't disregard it!
COM files were not limited to only 64k of accessible memory; the difference was that upon loading, a COM file got one full preallocated 64kB segment that was used for all of its data, code and stack, while MZ-format files specified in their header how many paragraphs (segment-register increments, i.e. 16-byte blocks) they needed for code and how many additional uninitialized paragraphs should follow that (minimum and maximum required).
In both cases programs were free to allocate additional memory from DOS, but by default EXE files were loaded in such a way that this wasn't necessary: they automatically got the largest contiguous block of free memory allocated to them (in fact they got the highest-addressed block of free memory, but in any sane state of DOS that was also the largest).
> upon loading, a COM file got one full preallocated 64kB segment
IIRC, COM files got all available memory up to the 640K boundary, and if you wanted to load additional programs, you had to shrink the preallocated region first.
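In C the shrink looked roughly like this -- a sketch assuming Turbo C, where setblock() wraps INT 21h function 4Ah ("resize memory block") and _psp holds our PSP segment; CHILD.EXE is hypothetical. (Turbo C's startup code normally does such a shrink for you, which is why spawnl() works out of the box.)

    #include <dos.h>
    #include <process.h>

    int main(void)
    {
        /* Give back everything beyond our own 64kB:
           4096 paragraphs * 16 bytes = 65536 bytes. */
        setblock(_psp, 4096);
        return spawnl(P_WAIT, "CHILD.EXE", "CHILD", (char *) 0);
    }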
To be more specific, COM files were loaded into a single segment (64k) - this is where the limitation comes from.
At the time of DOS, memory access was unrestricted, so, in a basic form, one had direct access to the entire base memory (640k). DOS viruses, for example, frequently resided just below the top of it, and/or manipulated the interrupt vector table (located at the very beginning).
On the platforms DOS was meant for, the hundred or so bytes of memory wasted by creating another process (PSP + a few bytes for the actual code) were significant, so the simple approach of writing a trivial .COM that does EXEC(COMMAND.EXE), EXIT is wasteful. DOS does not have an exact equivalent of unix-style exec(), so replacing the process in memory would involve either adding that to DOS or implementing it in the .COM stub; both approaches have the slight problem that there is no way to guarantee that the block of free memory DOS selected for the .COM stub (which thus starts with the PSP of our process, its starting paragraph being our PID) is large enough for the EXE to fit there.
Also, such a .COM stub would have to be stored on disk as a separate file, which on FAT16 would probably waste most of a cluster, given that it would be a stub only a few bytes long.
NTFS gained symlink support in 2007. FAT never had it. If presented with this problem, I'd say changing the executable loader is a better solution to the problem than adding symlink support to the file system.
The days of .COM/.EXE got me started in programming. I thought, gee, I have some game ideas. So I looked into BASIC; it seemed great, but the distributed files required a certain flavor of BASIC. After looking into C for developing standalone binaries, I switched to C. Programming something like DOOM required 'vertical strips' calculated across the screen and other graphics gems, so I learned about graphics gems and programming. C++ and Turbo Pascal, and then Delphi and VB6, were released. It seemed like a never-ending journey!
Today we have Cocos2d-x for cross-platform sprites, so games can be quickly deployed on multiple platforms, plus the Unity and Unreal engines for 3D development. It's much better today, but I wonder what it's like not to have the kind of background of learning that I did.
Too bad that the things done back then for 'compatibility' mean that things now suck, because breaking changes weren't made back then. This is probably why UTC time on Windows is still broken.
Ah, so DOS did this too! I was aware that Windows NT similarly has a few “.COM” command-line utilities that are actually Portable Executables (the successor to the successor to the MZ format), for compatibility's sake.
I actually find it somewhat hard to believe that DOS ever cared about the extension of the filename passed to EXEC, because the same syscall is also used (with AL != 0) for overlays and can also be used as a generic MZ loader; in both these cases such files typically did not have the EXE extension.
In DOS 5.0+ (at least; never tried it on earlier versions) and probably every version of Windows, DOS's EXEC and Windows' CreateProcess()/LoadLibrary() do not care about the filename and just use the loader implementation appropriate for whatever was detected from the header.
ShellExecute() (and its DOS "equivalent", EXEC(%COMSPEC%, "/C foo")) does care about the extension. In DOS there is a hardcoded list of extensions that are executable (BAT, COM, EXE, in this order, with BAT being handled by command.com itself; there is no way to directly execute an image with a different extension from the command line), while in Windows what happens depends on the contents of HKCR.
One notable feature of how this is handled by DOS is that when you have both foo.exe and foo.com, foo.com takes precedence (IIRC this has something to do with the fact that the traditional Windows entry point was called WIN.COM, although it was always an MZ binary).
If I remember correctly, the order was COM, EXE, BAT.
On NT platforms today, this is actually user-configurable via the PATHEXT environment variable - and that definitely has them in that order.
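You can check the order from any C program, since PATHEXT is a plain environment variable:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        const char *p = getenv("PATHEXT");
        /* Typically prints ".COM;.EXE;.BAT;.CMD;..." -- COM first. */
        printf("%s\n", p != NULL ? p : "(PATHEXT not set)");
        return 0;
    }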
I wonder why that was the case. I remember reading some explanation about how it was about backwards compatibility, but that doesn't make much sense, since the first release of DOS 1.0 already had all three. Strictly speaking, COM as a format predates DOS (it's originally from CP/M), while EXE and BAT are more recent additions; but COM files compiled for CP/M cannot run on DOS for other reasons anyway, so there's still no compat issue there.
Ah! I used to wonder about that when I was a kid playing with DOS. I had the vague notion that COM programs were 'dumber' or less complex than EXEs and were often part of the operating system. And if a 'big' application was a COM, then it meant there were a bunch of additional COM or EXE files in the directory.
That's one way to immortalize yourself. I wonder if 'MZ' (Mark Zbikowski's initials) is the most common set of electronic initials in the world. I can't think of any other initials or acronym repeated more frequently than (# of Windows systems) x (# of EXE files per system).
>So when did the program loader change to ignore the extension entirely and just use the presence or absence of an MZ header to determine what type of program it is? Compatibility, of course.
Either "Compatibility" is a time, or they should have asked "why".
IIRC, when the DOS/Windows shell needs to resolve "program" to either "program.com" or "program.exe", the .com file takes precedence - ISTR this being abused by malware once upon a time.
I'm guessing it's because `example.com` comes before `example.exe` when you sort alphabetically? So when it goes looking for `example`, it gets the list of files and takes the first matching one it sees.
My first experience with truly low-level programming was when I was a teenager. I saw a list of PC interrupts in BYTE magazine and was interested in the one that dropped you into the ROM-based BASIC (18h), something I'd never seen. I stuck the bytes into a text editor, saved the result as a .com file, and boom! Something that took you to BASIC -- which turned out not to be very useful compared to the Apple II (and its DOS) that I was used to at school, but still.
COM is a raw binary image that just gets copied into memory. Because it is a holdover from CP/M, it can only occupy one 64k segment for code, data -- everything.
EXE files have a header which tells the loader whether and how they use different segments in memory. They can have much larger memory footprints than COM programs.
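For the curious, that header is the well-documented 28-byte MZ header -- a sketch in C (the field names are mine; the layout is the standard one):

    #include <stdint.h>

    struct mz_header {
        uint16_t signature;        /* "MZ" = 0x5A4D */
        uint16_t last_page_bytes;  /* bytes used in the last 512-byte page */
        uint16_t pages;            /* file size in 512-byte pages */
        uint16_t reloc_entries;    /* relocation table entry count */
        uint16_t header_paras;     /* header size in 16-byte paragraphs */
        uint16_t min_alloc;        /* min extra paragraphs needed */
        uint16_t max_alloc;        /* max extra paragraphs wanted */
        uint16_t ss, sp;           /* initial stack (SS relative to load) */
        uint16_t checksum;
        uint16_t ip, cs;           /* entry point (CS relative to load) */
        uint16_t reloc_offset;     /* file offset of relocation table */
        uint16_t overlay;          /* overlay number, 0 = main program */
    };

    int main(void)
    {
        return sizeof(struct mz_header) == 28 ? 0 : 1;  /* sanity check */
    }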
Good stuff. Good to know. Thanks, I needed that. Always wondered. Good explanation, 20 years late for me to read this, but better late than never.
Yes, it sounds like a COM file, with no relocation dictionary, etc., would have to be loaded at the same virtual memory address. IIRC one of the OP comments mentioned this.
Uh, more of the same, piled higher and deeper?
Okay, for my startup I decided to use the .NET Framework 4 with Visual Basic .NET and the usual acronyms: SQL (Structured Query Language) Server, ADO.NET (ActiveX Data Objects for using SQL Server), ASP.NET (Active Server Pages for writing Web pages), IIS (Internet Information Server, which actually makes the TCP/IP calls for moving the data on the Internet and runs my VB.NET program that writes the Web pages), etc. Okay.
But eventually it dawned on me that there is a lot of code running on my Windows machine that very likely is not from Visual Basic and .NET; I'm concluding that there is an older, say, Windows 95/98/NT development environment based on C++ and a lot of library calls, a message queue for inputs, some entry points and callbacks to them, and quite a lot of functionality not also in .NET.
E.g., when my HP laser printer quit and I got a Brother laser printer, my old HP/GL (Hewlett-Packard Graphics Language, a cute, simple thing for printing and drawing on pages) program for printing files didn't work -- the Brother printer didn't do with HP/GL just what my HP printer did.
So, via .NET I wrote a replacement for such simple printing. Well, for writing the characters to the page, object, image, whatever it was (I don't recall the details just now), all I could see to use were some GDI+ (Graphics Device Interface) calls, but those seem not to be the usual way old Windows programs write to paper, graphics files, or the screen; instead there's something else, maybe from before .NET, and I don't know what that older stuff was. E.g., GDI+ or whatever it is didn't let me accurately calculate how many pixels horizontally a string of printable characters would occupy when printed, and thus all my careful arithmetic about alignment became just crude approximations -- no doubt Firefox, Chrome, Windows Image and FAX Viewer, etc. all use something better, more accurate.
So, what am I missing? Am I supposed to get out some 20-year-old books by Petzold or some such, get a Windows Software Development Kit, review C and C++, get lots of include files organized, etc., and start with some sample code that illustrates all the standard pieces of a standard Windows app?
Okay, but it appears that .NET does not yet replace all that old stuff?
I mean, is it possible to write a full function Windows app with just .NET, VB.NET/C#?
Q 1. What really is that old stuff? Where is it? Are there good tutorials? E.g., if we are to explain COM files, then let's also point to explanations of some of the other old stuff also still important?
Q 2. Is the functionality of that old stuff, for user level programs, maybe not for device drivers, by now all available via .NET but I've just not found it all yet? If so, where is it in .NET, say, for graphics and printing the .NET presentation thingy or whatever?
The code for my startup looks fine, all seems to work, does all I intended, but still I wonder about that old stuff. E.g., eventually I noticed that my favorite editor KEdit, apparently like any well-behaved old standard Windows program, will let me print a file. And the printing works great! I get a standard Windows popup window that lets me select font, font size, bold or italic, etc., and it works fine with the Brother printer. KEdit knows some things about writing to a printer that I didn't see in .NET. So, with just .NET, am I missing a lot of Windows functionality?
> Yes, it sounds like a COM file, with no relocation dictionary, etc., would have to be loaded at the same virtual memory address.
Because of x86's real-mode segmented architecture, it only has to be loaded at the beginning of a segment (i.e. at offset 0 -- although actually it's offset 256, with the first 256 bytes reserved for the program segment prefix). The segment part of the address can be anything.
And since real-mode segments overlap, this means the program can be loaded anywhere at a 16-byte ("paragraph") boundary, with no need for any kind of relocation.
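The arithmetic that makes this work, as a runnable toy -- physical address = segment * 16 + offset, so overlapping segments give the loader a free "relocation" to any paragraph-aligned address:

    #include <stdio.h>

    static unsigned long phys(unsigned seg, unsigned off)
    {
        return 16UL * seg + off;    /* the real-mode address formula */
    }

    int main(void)
    {
        /* Two different segment:offset pairs, one physical byte: */
        printf("%05lX\n", phys(0xF000, 0xFFF0));  /* FFFF0, reset vector */
        printf("%05lX\n", phys(0xFFFF, 0x0000));  /* FFFF0 again */
        return 0;
    }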
The "stuff" you are asking about is the original win32 API, which as far as I know is still the bread and butter for everything running on top of a windows install. .NET and its various frameworks like WPF or WinForms or GDI+ may be reimplementations of GUI components instead of reusing existing GDI (part of Win32) components but certainly Win32 is the gateway to anything getting done still.
I got the impression that GDI+ is some slightly off-on-the-side, goofy thing, good for some purposes but not the main way that, say, C++/Win32 programmers at Mozilla write to the screen or a printer -- that there are other, in most ways better, ways in Win32.
So, it looks like I should get some documentation on Win32, scan it, see what the major functionality is, and then see if I can find that functionality in .NET class wrappers or some such, so that I don't have to mud-wrestle with C structs or C++ classes.
I just did a Google search on some of these keywords and found that Microsoft is now awash in developer videos, which you're supposed to download as MP4 and view in Windows Media Player, now talking about not just "productivity" but "creativity" in apps. Hmm ....
It took me 30 minutes of mud wrestling, two e-mail messages, lots of retries, a phone call from one of their robots, etc., just to "sign in". They had up a page with radio buttons or round check boxes or some such, with a lot of gibberish but no instructions about what the heck I was supposed to do with that page. Typical -- just try it and find out! So, to try it, just run the mouse over and click on everything in sight, hope you don't cause the universe to shrink to nothing, and see what you can do. With enough such exploratory clicking, I was able to "sign in".
Hmm .... Microsoft is failing at basic, old UI lessons about being clear about what you're asking users to do, getting a low grade in basic functionality, yet asking their devoted, dedicated developers to move from "productivity" to "creativity", e.g., where a screen now looks like some "acrylic" surface with some nice emotional impressions? Hmm.
Uh, Gates, buddy, art is one heck of a challenging field to get right! Not so easy to keep up with Michelangelo, Bernini, Bach, Wagner, Degas, Renoir, etc.!
Why don't you plan to return to art and creativity after you are able to get sign ins on your Web sites working and to write documentation that does well explaining your current work including .NET and Win32?
PS
Okay, at Google I started typing "Win32 dev ..." or some such and right away got a little window of a search for Win32 development tutorials, or some such. Sooooo, that search is very popular!
By now I've used enough Win32 applications to appreciate that Microsoft has had a framework, starter program, standard first program or some such with all the key stuff for printing, using the system clipboard, getting data from the mouse and keyboard, etc. All that stuff is rock-solid standard, so don't program it, maybe even don't try to study it; just USE it as it is, because it is needed and it WORKS, and getting the same from documentation alone might require massive mud wrestling!
The page does say that Win32 is not the up-to-date way to develop, or some such. Okay, sure: for 64-bit addressing, I can believe that Win32 won't work! So, this Microsoft comment seems to imply that you can get all the Win32 functionality, and more, with 64-bit addressing via C#, VB.NET, etc. Okay. Maybe if I keep reading I'll see how to do that, too!
PPS
Yup, suddenly a little more of the world is starting to make sense!
Microsoft Win32 to Microsoft .NET Framework API Map
with "applies to"
Microsoft® .NET Framework version 1.0 or 1.1
which is essentially just what I hoped existed! And the page does fairly strongly suggest that anything that can be done in Win32 can also be done in .NET! And from the 1.0 and 1.1 -- that's old -- apparently .NET offered essentially all the Win32 stuff right from its beginning. Since my code was written with .NET 4, I should be able to include essentially any Win32 functionality I want!
GOOD! Thank you Microsoft! Looks good! Took me a while to find, but, again, looks good.
And it's clear from how Google responded I'm not nearly the first looking for that info!
Great! Now I don't have to get out a 20-year-old book from Petzold! And I get to continue to f'get about C and C++ and stay with VB.NET, or C# if Microsoft insists -- I hope they don't drop VB.NET!
Good! And it's a nice looking Web page, printed, 64 pages long, that is, not short!
Correct me if I'm wrong, but I seem to be the only person here to have weathered the entire evolution from the 8008 on, for a living, not as a hobby (though it's both for me).
I am possibly approaching Senior Curmudgeon status, but I mean well. Even the early chips definitely required RTFM and some serious visualization chops; there are millions of bad and very bad things in a random poke, and those two help you probe their borders instead of blue-screening your brain. I see examples of ignoring both boasted about here. It's not lame to study before committing, and the needed info has never been secret. Please forgive me this, if you feel targeted. I am 70 years old, am still doing this shit on modern multicore 64-bit architectures, and would not be here without doing what I say. RTFM, run the code in your head, and when you've got both, start running the real thing. Short term it's not for the impatient, but long term?
I wrote in ASM up to the i286, and always had the specifications handy. Never poke around randomly executing nonsense code. Too easy to trash your HD, if you had one. Or find the real "Halt and Catch Fire". Messy. The books were easily available, going back to way before even the 4004; no excuse for not using them to run your probes in the virtual model of the chip implanted in your head from reading that book. The first experiments should then be in areas of confidence, with valid data and no addressing violations. Steps, not f*-it leaping off a cliff into clouds.
The 8080 was the last small-address-space part, with only 64K total addressable. I consider it the root of all evil, as it were, because it set the parameters for the 64K-limited program. All else flows from there, because of the decision to maintain backward binary compatibility. So it was wrapped in redirectors and segment registers in an expanded 1M space, without any security or permissions management. Which would have cost peanuts: a flag and some specific address-decoder disables. Simple days.
I often took advantage of COM runtimes in those days. Often, and actually most usefully, without a DOS or RTOS between me and the metal. It was trivial to write a serial-interface COM loader and drop my FORTH into it for intelligent debugging. Or anything else, investigative or command.
The freedom started to go away with the extra wrappers in the 80286, which among other things introduced "Real" and "Protected" execution modes, plus some extended native addressing modes. Starting to grow up: it can still run 8008 COM code, but in that mode it sees only the COM-compatible register set and has no access to the execution-mode flag. Much "safer", sorta. Except it still gets the 32-bit 808x indirect instruction address modes, if invoked in Real Mode.
Heard about "non-modal design"? The 68000 got much closer. Every additional wrapper introduces large arrays of data re-routing and management logic. This needs a restart in design and philosophy. Dropping COM support hardware and only allowing virtual machines to play is today's answer: getting closer, and much safer. And as core counts go up, have only the latest clean architectures run on most cores, no wrappers, and a dedicated legacy core. Or chip, with FPGA run-time-loadable architecture for that, again no wrappers. But it may already be too late. Too much investment that may get broken in amusing ways.
After the i286, the i386 and on to the present have focused more on speed, size of, and access to Protected mode, runtimes, sharing memory, and various support tricks. All carefully avoiding disturbing the ability to run by-now-ancient OS utilities, buried deep in the underground of the OS, till WinNT, Win32, and the Pentium etc., IIRC. Though I swear I still see several in the process list on W7.