Thursday, March 09, 2006

ASM: Working with the Video Palette

I finally figured out how I screwed up this little program I wrote in ASM. I was experimenting with the color palette, and I finally remembered that in mode 13h, the 256-color palette’s RGB components are actually 6 bits each, not 8 bits. So when I was looping through 256 values instead of 64, it caused all sorts of unexpected behavior. What a PITA.

Anyway, below is the final result. This program doesn’t do anything really fancy. It simply switches to mode 13h, fills the whole screen with a single palette entry, and then gradually increases that entry’s blue component from min to max, causing the screen as a whole to go from black to blue. This also gave me a chance to work with the DOS interrupt for retrieving the time to cause a slight delay in the loop.

0BD0:0100 B81300        MOV     AX,0013
0BD0:0103 CD10          INT     10
0BD0:0105 B800A0        MOV     AX,A000
0BD0:0108 8ED8          MOV     DS,AX
0BD0:010A 8EC0          MOV     ES,AX
0BD0:010C 31FF          XOR     DI,DI
0BD0:010E B80303        MOV     AX,0303
0BD0:0111 B9F000        MOV     CX,00F0
0BD0:0114 51            PUSH    CX
0BD0:0115 B9A000        MOV     CX,00A0
0BD0:0118 F3            REPZ
0BD0:0119 AB            STOSW
0BD0:011A 59            POP     CX
0BD0:011B E2F7          LOOP    0114
0BD0:011D 31DB          XOR     BX,BX
0BD0:011F FEC7          INC     BH
0BD0:0121 B94000        MOV     CX,0040
0BD0:0124 BAC803        MOV     DX,03C8
0BD0:0127 B003          MOV     AL,03
0BD0:0129 EE            OUT     DX,AL
0BD0:012A 42            INC     DX
0BD0:012B B000          MOV     AL,00
0BD0:012D EE            OUT     DX,AL
0BD0:012E EE            OUT     DX,AL
0BD0:012F 88F8          MOV     AL,BH
0BD0:0131 EE            OUT     DX,AL
0BD0:0132 FEC7          INC     BH
0BD0:0134 90            NOP
0BD0:0135 90            NOP
0BD0:0136 50            PUSH    AX
0BD0:0137 53            PUSH    BX
0BD0:0138 51            PUSH    CX
0BD0:0139 52            PUSH    DX
0BD0:013A B42C          MOV     AH,2C
0BD0:013C CD21          INT     21
0BD0:013E 89D3          MOV     BX,DX
0BD0:0140 BA1200        MOV     DX,0012
0BD0:0143 01D3          ADD     BX,DX
0BD0:0145 CD21          INT     21
0BD0:0147 39D3          CMP     BX,DX
0BD0:0149 7DFA          JGE     0145
0BD0:014B 5A            POP     DX
0BD0:014C 59            POP     CX
0BD0:014D 5B            POP     BX
0BD0:014E 58            POP     AX
0BD0:014F E2D3          LOOP    0124
0BD0:0151 B80300        MOV     AX,0003
0BD0:0154 CD10          INT     10
0BD0:0156 B8004C        MOV     AX,4C00
0BD0:0159 CD21          INT     21

So let’s go through each section of the program.

0BD0:0100 B81300        MOV     AX,0013
0BD0:0103 CD10          INT     10

This section, of course, switches the video mode to mode 13h, which is 320x200 with 256 colors.

0BD0:0105 B800A0        MOV     AX,A000
0BD0:0108 8ED8          MOV     DS,AX
0BD0:010A 8EC0          MOV     ES,AX
0BD0:010C 31FF          XOR     DI,DI

Here I am setting up ES:DI to point to the video memory at A000:0000 so I can draw directly to the screen. This is much faster than using the BIOS interrupts to draw to the screen.
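As a rough illustration of the addressing involved, here is a small Python model (not DOS code) of how a real-mode segment:offset pair maps to a physical address, and where a given pixel lands in mode 13h. The function names are my own, and the dimensions are the standard 320x200 of mode 13h with one byte per pixel.

```python
# Model of real-mode addressing and mode 13h pixel offsets.
# Function names are illustrative; they are not part of the original program.

SCREEN_WIDTH = 320    # mode 13h uses one byte per pixel
SCREEN_HEIGHT = 200


def linear_address(segment, offset):
    """Physical address on an 8086: segment * 16 + offset, kept to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


def pixel_offset(x, y):
    """Offset of pixel (x, y) from the start of video memory (the value for DI)."""
    return y * SCREEN_WIDTH + x


print(hex(linear_address(0xA000, 0x0000)))                   # start of video memory
print(pixel_offset(SCREEN_WIDTH - 1, SCREEN_HEIGHT - 1))     # last visible pixel
```

Since the last visible pixel sits below offset 64,000, the whole screen fits inside a single 64 KB segment, which is why one segment register is enough here.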

0BD0:010E B80303        MOV     AX,0303
0BD0:0111 B9F000        MOV     CX,00F0

Here I am setting up my AX register to put palette entry number 3 on the screen. I am using the full word size (03h in both bytes) so each store writes two pixels and the program only has to loop half as many times. Then I am setting up CX to loop 240 (F0h) times, once per row. Each pass runs an inner loop of 160 word stores, so the store actually executes 240 x 160 = 38,400 times and writes 76,800 bytes. Mode 13h really only has 200 rows (64,000 bytes), so C8h would have been enough; the extra stores simply wrap DI around within the 64 KB segment and rewrite pixels that are already set. I could have set up a single loop instead of doing the outer loop, inner loop code, but I wanted to work with a few extra instructions.

0BD0:0114 51            PUSH    CX
0BD0:0115 B9A000        MOV     CX,00A0
0BD0:0118 F3            REPZ
0BD0:0119 AB            STOSW
0BD0:011A 59            POP     CX
0BD0:011B E2F7          LOOP    0114

This is the inner loop. First I push the value of CX onto the stack so we do not lose the value of the outer loop’s counter. Then I move the value 160 (A0h) into CX to represent the horizontal resolution (I’ll explain why in a minute). Then I repeat the STOSW instruction 160 times (DEBUG displays the REP prefix as REPZ), which puts the value of AX (0303) into the memory pointed to by ES:DI, starting at A000:0000, and advances DI by 2 after each store. So why use 160 instead of the horizontal resolution of 320? The reason is that I am using a word instruction (STOSW) instead of a byte instruction (STOSB), so I only need to go through the inner loop half as many times since I am working with a larger unit of data. Once the inner loop is complete, I retrieve the current value of the outer loop counter from the stack and LOOP back to the PUSH instruction. LOOP automatically decreases the outer loop counter by 1 and repeats the inner loop until the counter reaches 0.
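To check the arithmetic of the fill, here is a small Python model of the nested loop (a sketch, not DOS code). It assumes the direction flag is clear so that STOSW moves DI upward, and it treats DI as a 16-bit register that wraps at 64 KB. The `fill` function and its parameters are names I made up for the illustration.

```python
# Model of the nested fill loop; not DOS code.
# Assumes the direction flag is clear, so STOSW moves DI upward.

SEGMENT_SIZE = 0x10000   # DI is 16 bits, so offsets wrap at 64 KB


def fill(rows, words_per_row, value=0x0303):
    """Simulate PUSH CX / MOV CX,words / REP STOSW / POP CX / LOOP."""
    memory = bytearray(SEGMENT_SIZE)
    di = 0
    stores = 0
    for _ in range(rows):                  # outer LOOP, one pass per row
        for _ in range(words_per_row):     # REP STOSW
            memory[di] = value & 0xFF                      # AL goes to the low byte
            memory[(di + 1) % SEGMENT_SIZE] = value >> 8   # AH goes to the high byte
            di = (di + 2) % SEGMENT_SIZE
            stores += 1
    return memory, di, stores


for rows in (240, 200):
    memory, di, stores = fill(rows, 160)
    print(rows, "rows:", stores, "word stores,", stores * 2, "bytes, final DI =", hex(di))
```

With 200 rows the loop stops exactly at the end of the visible screen, while 240 rows runs past 64 KB and wraps around. A single REP STOSW with CX set to the total word count would do the same work without the outer loop.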

0BD0:011D 31DB          XOR     BX,BX
0BD0:011F FEC7          INC     BH
0BD0:0121 B94000        MOV     CX,0040

What I am doing here is preparing another loop to ramp up through the blue range for the palette entry I drew to the screen. First I clear out BX using the XOR technique, then I increment BH by 1, which leaves BX holding 0100h. I could have simply moved 0100h into BX, but I wanted to demonstrate the XOR technique and the INC instruction. Then I move the value 64 (40h) into CX. This is the number of levels a 6-bit blue component can take. Because BH starts at 1 rather than 0, the values the loop writes run from 01h to 40h.
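The ramp itself is easy to model. The sketch below (Python, not DOS code) lists the values BH takes over the 64 iterations and what would be stored if the DAC keeps only the low 6 bits of each value, which is an assumption on my part about how the hardware or emulator treats the upper two bits. `blue_ramp` is a made-up helper name.

```python
# Model of the blue ramp: BH starts at `start` and is incremented once per pass.
# Assumption: the DAC keeps only the low 6 bits of each value written to it.


def blue_ramp(start=1, count=0x40):
    """Return (values written from BH, values a 6-bit DAC would keep)."""
    written = [(start + i) & 0xFF for i in range(count)]   # BH is an 8-bit register
    stored = [value & 0x3F for value in written]           # assumed 6-bit masking
    return written, stored


written, stored = blue_ramp()
print("first written:", written[0], "last written:", written[-1])
print("last stored under the 6-bit assumption:", stored[-1])
```

Under that assumption the final pass would write 40h, which masks down to 0, so the last step drops back to black just before the program exits. Starting BH at 0 (`blue_ramp(start=0)`) keeps every value within 0 to 3Fh.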

0BD0:0124 BAC803        MOV     DX,03C8
0BD0:0127 B003          MOV     AL,03
0BD0:0129 EE            OUT     DX,AL

Here I am getting ready to work directly with the VGA card’s I/O ports. Writing to port 3C8h tells the video card which palette entry I want to set; the entry number goes in register AL (3 here, the color I filled the screen with). Then I use the OUT instruction to send it to that port.

0BD0:012A 42            INC     DX
0BD0:012B B000          MOV     AL,00
0BD0:012D EE            OUT     DX,AL
0BD0:012E EE            OUT     DX,AL
0BD0:012F 88F8          MOV     AL,BH
0BD0:0131 EE            OUT     DX,AL
0BD0:0132 FEC7          INC     BH

Now we will output the new palette values directly to the video card. Rather than moving the data port, 3C9h, into DX, I simply increase DX by 1. This shortcut will not work when reading from the palette, however, since the index port for reads is 3C7h while the data port is still 3C9h. Next I set AL to 0, since I want red and green to stay out of the color. I could have XORed AL with itself, but I wanted to demonstrate moving 0 into AL as an alternative method. For an 8-bit register the two are the same size: XOR AL,AL and MOV AL,00 are both two bytes. Word of mouth is that XORing a register with itself to zero it is slightly faster, but I have never been able to confirm this myself. Now that AL is set, I output the red and green values with the two OUT instructions. Then I move BH to AL, since I am using BH to store the current value that I want to set blue to. Once I OUT that to the VGA card, I increase BH by 1 for the next iteration of the loop.
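The port sequence is easier to follow with a toy model of the DAC. The Python class below is a simplified sketch of the write path only (index to 3C8h, then red, green and blue to 3C9h); the class and helper names are mine, and it models the 6-bit components by masking each value, which is an assumption about how out-of-range values are treated.

```python
# Simplified model of the VGA DAC write path (ports 3C8h and 3C9h).
# Class and function names are illustrative only.


class DacModel:
    def __init__(self):
        self.palette = [[0, 0, 0] for _ in range(256)]
        self.index = 0        # entry selected through port 3C8h
        self.component = 0    # 0 = red, 1 = green, 2 = blue

    def out(self, port, value):
        """Model an OUT DX,AL to one of the two DAC write ports."""
        if port == 0x3C8:                 # select the entry to write
            self.index = value & 0xFF
            self.component = 0
        elif port == 0x3C9:               # data arrives as red, green, blue
            self.palette[self.index][self.component] = value & 0x3F
            self.component += 1
            if self.component == 3:       # after blue, advance to the next entry
                self.component = 0
                self.index = (self.index + 1) & 0xFF
        else:
            raise ValueError("port not modelled: %#x" % port)


def set_entry(dac, entry, red, green, blue):
    """Same order of OUTs as the listing: index, then red, green, blue."""
    dac.out(0x3C8, entry)
    for value in (red, green, blue):
        dac.out(0x3C9, value)


dac = DacModel()
set_entry(dac, 3, 0, 0, 0x20)
print(dac.palette[3], dac.index)
```

The automatic advance after the third data write is why a whole run of consecutive entries can be loaded by writing the index once and then streaming the components.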

0BD0:0134 90            NOP
0BD0:0135 90            NOP

This little bit of code was a booboo that I fixed. I had left in an instruction that was unnecessary, so I replaced that instruction with 2 NOP instructions. This is a useful instruction to keep in mind if you get into software cracking. I won’t elaborate on why that is though ;)
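One reason to overwrite an instruction with NOPs instead of deleting it is that the short jumps and LOOPs in the listing store their targets as a signed one-byte displacement from the following instruction, so shifting the code would throw them off. The Python sketch below decodes the three displacements from the listing and shows a length-preserving patch. The helper names are mine, and the two bytes being patched are a stand-in, since the instruction that was actually removed is not shown in the post.

```python
# Decode short (rel8) branch targets and patch bytes with NOP (90h).
# Helper names are illustrative; the patched bytes are a hypothetical stand-in.


def rel8_target(address, encoded):
    """Target of a 2-byte LOOP or short conditional jump located at `address`."""
    displacement = encoded[1]
    if displacement >= 0x80:              # the displacement is a signed byte
        displacement -= 0x100
    return (address + 2 + displacement) & 0xFFFF


def nop_out(code, start, length):
    """Return a copy of `code` with `length` bytes at `start` replaced by NOPs."""
    patched = bytearray(code)
    patched[start:start + length] = b"\x90" * length
    return bytes(patched)


print(hex(rel8_target(0x011B, bytes.fromhex("E2F7"))))   # LOOP in the fill
print(hex(rel8_target(0x0149, bytes.fromhex("7DFA"))))   # JGE in the delay
print(hex(rel8_target(0x014F, bytes.fromhex("E2D3"))))   # LOOP for the ramp

# INC BH, a hypothetical 2-byte leftover instruction, then PUSH AX
original = bytes.fromhex("FEC788F850")
print(nop_out(original, 2, 2).hex().upper())
```

Because the patched code is the same length as the original, every address after it stays where it was and the encoded displacements remain valid.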

0BD0:0136 50            PUSH    AX
0BD0:0137 53            PUSH    BX
0BD0:0138 51            PUSH    CX
0BD0:0139 52            PUSH    DX
0BD0:013A B42C          MOV     AH,2C
0BD0:013C CD21          INT     21
0BD0:013E 89D3          MOV     BX,DX
0BD0:0140 BA1200        MOV     DX,0012
0BD0:0143 01D3          ADD     BX,DX
0BD0:0145 CD21          INT     21
0BD0:0147 39D3          CMP     BX,DX
0BD0:0149 7DFA          JGE     0145
0BD0:014B 5A            POP     DX
0BD0:014C 59            POP     CX
0BD0:014D 5B            POP     BX
0BD0:014E 58            POP     AX
0BD0:014F E2D3          LOOP    0124

Now this whole section of code is a little tricky to understand. First I push all the registers I am using onto the stack. On later processors there is a PUSHA instruction that will do this automatically, but DEBUG only assembles 8086/8088 instructions, which do not include PUSHA, so I have to do it manually. Keep track of the order in which registers are pushed onto the stack, since the stack is a Last In, First Out data structure. So when I pop the registers again, DX will be the first one out.
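The effect of the pop order can be shown with a small Python model of the stack (a sketch, not DOS code). The register values and the helper name are made up for the illustration.

```python
# Model of saving and restoring AX, BX, CX, DX around code that changes them.
# Register values and names below are hypothetical.

PUSH_ORDER = ("AX", "BX", "CX", "DX")


def save_and_restore(registers, clobber, pop_order):
    """Push in PUSH_ORDER, apply `clobber`, then pop in `pop_order`."""
    regs = dict(registers)
    stack = []
    for name in PUSH_ORDER:
        stack.append(regs[name])        # PUSH
    regs.update(clobber)                # the code in between changes registers
    for name in pop_order:
        regs[name] = stack.pop()        # POP takes the most recent value
    return regs


before = {"AX": 0x0020, "BX": 0x2100, "CX": 0x0020, "DX": 0x03C9}
clobber = {"AX": 0x2C00, "BX": 0x051C, "DX": 0x051D}

correct = save_and_restore(before, clobber, tuple(reversed(PUSH_ORDER)))
wrong = save_and_restore(before, clobber, PUSH_ORDER)
print(correct == before)
print({name: hex(value) for name, value in wrong.items()})
```

Popping in the same order as the pushes does not fail outright; it quietly hands each register another register's saved value, which makes this kind of mistake hard to spot.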

Next I prepare to call DOS function 2Ch. Most DOS functions are under interrupt 21h. This one gets the current time and returns it in several registers. However, I am only interested in register DX, which holds the seconds in DH and the hundredths of a second in DL. I move DX into register BX for later comparison. Next I move the value 12h (18) into DX and add it to BX. I did it this way because, for some bizarre reason, adding the value directly was causing bugs in this routine. Then I call the function again and compare the new time against the value in BX. If the current time has passed the time stored in BX, that is, the starting time plus 18 hundredths of a second, the loop can continue; otherwise, I repeat the check until it has. While this is not exactly the most effective way to do a delay (it does not account for the hundredths or seconds rolling over), it works for this program. Once the delay is complete, I restore the values of all the registers. Then I loop all the way back up to the code that outputs the data to the VGA port.
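To see how the comparison behaves, here is a small Python model of the delay (a sketch, not DOS code). It packs seconds and hundredths into a 16-bit value the way DX holds them, adds 12h, and steps an idealised clock one hundredth at a time until the JGE would fall through. The real DOS clock advances in coarser steps, so the exact counts are only indicative, and all function names are mine.

```python
# Model of the delay: BX = DX + 12h, then loop while BX >= DX (signed compare).
# Assumes an idealised clock that advances one hundredth of a second per step.


def pack(seconds, hundredths):
    """DX as returned by INT 21h AH=2Ch: DH = seconds, DL = hundredths."""
    return (seconds << 8) | hundredths


def to_signed16(value):
    return value - 0x10000 if value & 0x8000 else value


def keeps_waiting(target_bx, now_dx):
    """CMP BX,DX followed by JGE: keep looping while BX >= DX."""
    return to_signed16(target_bx) >= to_signed16(now_dx)


def first_exit(seconds, hundredths, max_steps=6000):
    """Hundredths elapsed before the loop exits, or None within one minute."""
    target = (pack(seconds, hundredths) + 0x12) & 0xFFFF
    start = seconds * 100 + hundredths
    for step in range(1, max_steps + 1):
        now = (start + step) % 6000           # seconds and hundredths repeat each minute
        if not keeps_waiting(target, pack(now // 100, now % 100)):
            return step
    return None


for sec, hun in ((5, 10), (5, 90), (59, 90)):
    print(sec, hun, first_exit(sec, hun))
```

In this model the usual case waits a little over 18 hundredths, a start late in a second exits early when the seconds tick over, and a start in the last fraction of second 59 never sees a clock value larger than the target. A delay based on the difference between two readings, with the rollover handled explicitly, would avoid that last case.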

0BD0:0151 B80300        MOV     AX,0003
0BD0:0154 CD10          INT     10
0BD0:0156 B8004C        MOV     AX,4C00
0BD0:0159 CD21          INT     21

This will set the video mode back to 80x25 text mode and exit to DOS.

Although many consider DOS to be obsolete, I find that DOS has incredible educational value. Since DOS is not as controlled as current operating systems, you can directly access memory and hardware for experimentation. While this was the cause of many of the headaches of working with DOS in its heyday, it also helps students understand the fundamentals of hardware I/O in an unrestricted environment. I have yet to build myself a lab PC with DOS on it so I can build some circuits I want to demonstrate and control them via the PC parallel port. For the time being I will have to live with working in VMware and hold off on the circuits.
