
XP running on a 486 cpu


Dietmar


Hi,

I am trying to install XP SP3 on the Shuttle Hot 433 board with a 486 cpu.

But very early in Setup a message appears, saying that the 486 cpu does not support the opcode cmpxchg8b, and so XP can't be installed.

I also tried an XP SP3 installation from another compi in IDE mode; it crashes at once.

Now I search the hex with IDA Pro for this cmpxchg8b on a finished XP SP3 install.

On a first try I find it in ntoskrnl.exe (single-cpu version) and in ntdll.dll.

There may be other PE files in XP with this opcode too.

The use is always the same. This opcode does an atomic compare-and-exchange on 8 bytes in memory.

So, when a working solution is found, the replacement in the other files is easy!

I try to replace it with a series of opcodes that the 486 cpu understands.

This is not easy.

I found this (Edit: This is wrong).

    push    ebx                 ; save nonvolatile registers
    push    ebp

    xor     ebx, ebx            ; zero out new pointer
    mov     ebp, ecx            ; save listhead address
    mov     edx, [ebp] + 4      ; get current sequence number
    mov     eax, [ebp] + 0      ; get current next link

Efls10:
    or      eax, eax            ; check if list is empty
    jz      short Efls20        ; if z set, list is empty
    mov     ecx, edx            ; copy sequence number
    mov     cx, bx              ; clear depth leaving sequence number

    jnz     short Efls10        ; if z clear, exchange failed

Efls20:
    pop     ebp                 ; restore nonvolatile registers
    pop     ebx

    ret

I tried this as a replacement for the function ExInterlockedFlushSList in ntoskrnl.exe in XP SP3.

The funny thing is that the opcode cmpxchg8b qword ptr [ebp+0] is simply deleted. Maybe it works on NT4, but for me it crashes XP.

EDIT: Maybe this i386 version of ExInterlockedFlushSList really works only on a compi with 1 cpu and 1 core, like a 486 cpu from 1992.

Then my test on a modern compi will fail.

It can also be that I now use a mix of cmpxchg8b, nothing at all, and cmpxchg on one compi, because I simulated only one appearance of this function in ntoskrnl.exe. Funny, this code is from Cutler, 13 March 1996, and it is identical in XP SP3.

This is the original ExInterlockedFlushSList in XP SP3, first introduced in NT4 Service Pack 4:

Hex code 53 55 33 DB 8B E9 8B 55 04 8B 45 00 0B C0 74 0B 8B CA 66 8B CB 0F C7 4D 00 75 F1 5D 5B C3

.text:0040B0B2 ; Exported entry   7. ExInterlockedFlushSList
.text:0040B0B2
.text:0040B0B2 ; =============== S U B R O U T I N E =======================================
.text:0040B0B2
.text:0040B0B2
.text:0040B0B2                 public ExInterlockedFlushSList
.text:0040B0B2 ExInterlockedFlushSList proc near       ; CODE XREF: sub_45F0DF:loc_45F0F7p
.text:0040B0B2                 push    ebx
.text:0040B0B3                 push    ebp
.text:0040B0B4                 xor     ebx, ebx
.text:0040B0B6                 mov     ebp, ecx
.text:0040B0B8                 mov     edx, [ebp+4]
.text:0040B0BB                 mov     eax, [ebp+0]
.text:0040B0BE
.text:0040B0BE loc_40B0BE:                             ; CODE XREF: ExInterlockedFlushSList+19j
.text:0040B0BE                 or      eax, eax
.text:0040B0C0                 jz      short loc_40B0CD
.text:0040B0C2                 mov     ecx, edx
.text:0040B0C4                 mov     cx, bx
.text:0040B0C7                 cmpxchg8b qword ptr [ebp+0]
.text:0040B0CB                 jnz     short loc_40B0BE
.text:0040B0CD
.text:0040B0CD loc_40B0CD:                             ; CODE XREF: ExInterlockedFlushSList+Ej
.text:0040B0CD                 pop     ebp
.text:0040B0CE                 pop     ebx
.text:0040B0CF                 retn
.text:0040B0CF ExInterlockedFlushSList endp
.text:0040B0CF
.text:0040B0CF ; ---------------------------------------------------------------------------
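
As a side note on searching other PE files for this opcode: a small Python sketch of a raw byte scan. It assumes the memory form of cmpxchg8b is encoded as 0F C7 with the ModRM reg field equal to 1, which matches the bytes 0F C7 4D 00 in the hex code above. The helper name find_cmpxchg8b is made up for this sketch, and a plain pattern scan can also hit data or the middle of another instruction, so every hit still needs a look in the disassembler.

```python
# Sketch: locate cmpxchg8b encodings (0F C7 /1 with a memory operand) in raw bytes.
# find_cmpxchg8b is a hypothetical helper name, not part of any tool.

def find_cmpxchg8b(code: bytes) -> list[int]:
    """Return offsets where the byte pattern 0F C7 /1 (memory form) starts."""
    hits = []
    for i in range(len(code) - 2):
        if code[i] == 0x0F and code[i + 1] == 0xC7:
            modrm = code[i + 2]
            mod = modrm >> 6          # 3 would mean a register operand
            reg = (modrm >> 3) & 7    # opcode extension, 1 selects cmpxchg8b
            if reg == 1 and mod != 3:
                hits.append(i)
    return hits

# Hex code of ExInterlockedFlushSList as posted above.
ORIGINAL = bytes.fromhex(
    "53 55 33 DB 8B E9 8B 55 04 8B 45 00 0B C0 74 0B "
    "8B CA 66 8B CB 0F C7 4D 00 75 F1 5D 5B C3"
)

offsets = find_cmpxchg8b(ORIGINAL)
# Add the function start 0x40B0B2 to get addresses comparable to the listing.
print([hex(0x40B0B2 + off) for off in offsets])
```

The single hit is at offset 21, which corresponds to address 0040B0C7 in the listing.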

With PE Maker I relocate this function in ntoskrnl.exe.

This works(!).

I do the relocation because the following replacement is bigger than the original hex code.

I split the cmpxchg8b opcode into 2 parts with lock cmpxchg,

because the 486 cpu understands this. But I get a Bsod. I use Windbg, but can't find the reason.

I checked my hex code several times and found no error. The only thing that in my eyes can happen is a missing synchronization between the 2 cmpxchg.

This does not happen with cmpxchg8b, because the memory is locked during the whole operation.

Here is my last try for the replacement of ExInterlockedFlushSList:

 

.data:004762B2 ; ---------------------------------------------------------------------------
.data:004762B2 ; Exported entry   7. ExInterlockedFlushSList
.data:004762B2
.data:004762B2                 public ExInterlockedFlushSList
.data:004762B2 ExInterlockedFlushSList:                ; CODE XREF: sub_45F0DF:loc_45F0F7p
.data:004762B2                                         ; DATA XREF: .edata:off_5AC2A8o
.data:004762B2                 push    ebx
.data:004762B3                 push    ebp
.data:004762B4                 xor     ebx, ebx
.data:004762B6                 mov     ebp, ecx
.data:004762B8                 mov     edx, [ebp+4]
.data:004762BB                 mov     eax, [ebp+0]
.data:004762BE
.data:004762BE loc_4762BE:                             ; CODE XREF: .data:004762D5j
.data:004762BE                 or      eax, eax
.data:004762C0                 jz      short loc_4762DA
.data:004762C2                 mov     ecx, edx
.data:004762C4                 mov     cx, bx
.data:004762C7                 lock cmpxchg [ebp+4], eax
.data:004762CC                 mov     ecx, edx
.data:004762CE                 mov     edx, ecx
.data:004762D0                 lock cmpxchg [ebp+0], eax
.data:004762D5                 jnz     short near ptr loc_4762BE+1
.data:004762D7                 nop
.data:004762D8                 nop
.data:004762D9                 nop
.data:004762DA
.data:004762DA loc_4762DA:                             ; CODE XREF: .data:004762C0j
.data:004762DA                 pop     ebp
.data:004762DB                 pop     ebx
.data:004762DC                 nop
.data:004762DD                 nop
.data:004762DE                 nop
.data:004762DF                 retn
.data:004762DF ; ---------------------------------------------------------------------------

I put this via relocation at the new address 4762B2. This is in the .data section and not in the .text section. But this does not matter, because when I put the original hex code at this new place, it works. The original place at 40B0B2 I fill with 00 00 00.. to make sure that my function at the new place is now used.
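
A rough Python model of one pass through the replacement listing may help to see what it does. It assumes the documented behavior of cmpxchg: EAX is compared with the destination, on a match the source register is stored, otherwise the destination is loaded into EAX. The dictionaries mem and regs and the header values are made up for illustration. Under these assumptions the source operand is EAX itself, so EBX and ECX never reach memory and the header is never cleared. Separately, the disassembler shows the jnz target as loc_4762BE+1, which is one byte into or eax, eax, so the hand-assembled jump offset may be off by one.

```python
# Sketch: emulate the two "lock cmpxchg" replacement on a tiny register/memory model.

def cmpxchg32(mem, addr, regs, src):
    """cmpxchg [addr], src: compare EAX with the dword at addr.
    Equal: store src there and return True (ZF=1).
    Not equal: load the dword into EAX and return False (ZF=0)."""
    if regs["eax"] == mem[addr]:
        mem[addr] = regs[src]
        return True
    regs["eax"] = mem[addr]
    return False

def replacement_body(mem, regs):
    """One pass over the instructions between loc_4762BE and the jnz."""
    # mov ecx, edx / mov cx, bx
    regs["ecx"] = (regs["edx"] & 0xFFFF0000) | (regs["ebx"] & 0xFFFF)
    cmpxchg32(mem, 4, regs, "eax")          # lock cmpxchg [ebp+4], eax
    regs["ecx"] = regs["edx"]               # mov ecx, edx
    regs["edx"] = regs["ecx"]               # mov edx, ecx
    return cmpxchg32(mem, 0, regs, "eax")   # lock cmpxchg [ebp+0], eax

# Hypothetical list header: first entry at 0x8A619AB8, depth 3, sequence 7.
mem = {0: 0x8A619AB8, 4: 0x00070003}
regs = {"eax": mem[0], "edx": mem[4], "ebx": 0, "ecx": 0}
zf = replacement_body(mem, regs)
print(zf, hex(mem[0]), hex(mem[4]), hex(regs["eax"]))
```

In this model the pass ends with ZF clear and the header unchanged, so even with a correct jump target the loop would never finish for a non-empty list.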

I want to get better in Assembler. There is no free AI for Assembler on the Internet. Do you have an idea, @Mov AX, 0xDEAD?

ChatGPT, Bard AI and Bing behave like crazy when it comes to hex code.

Dietmar

Edited by Dietmar


Surely people at Vogons have tried similar things. I just did a quick search with the terms: site:vogons.org "windows xp" 486 cmpxchg8b

See for example the post from KCompRoom2000 here:

https://www.vogons.org/viewtopic.php?t=82914

EDIT: Also the link to the POD tests at winhistory.de here:

https://www.vogons.org/viewtopic.php?t=75778

PS: this is my 486 system, with DOS and Windows 95: https://www.vogons.org/viewtopic.php?p=1117089#p1117089

Edited by gerwin

Hi,

I also found this, but I have no idea how to make a simulation for the 486 cpu from it, because it has a retn, and a second retn is not good in a function.

Dietmar

    the single instruction

        lock    cmpxchg8b qword ptr [ebp]

is replaceable with the following sequence

        pushfd
try:
        cli
        lock    bts dword ptr [edi],0
        jnb     acquired
        popfd
        pushfd
wait:
        test    dword ptr [edi],1
        je      try
        pause                   ; if available
        jmp     wait

acquired:
        cmp     eax,[ebp]
        jne     keep
        cmp     edx,[ebp+4]
        je      exchange
keep:
        mov     eax,[ebp]
        mov     edx,[ebp+4]
        jmp     done

exchange:
        mov     [ebp],ebx
        mov     [ebp+4],ecx
done:
        mov     byte ptr [edi],0
        popfd

and this

        lock    cmpxchg8b qword ptr [esi]

is replaceable with the following sequence

        pushfd
try:
        cli
        lock    bts dword ptr [edi],0
        jnb     acquired
        popfd
        pushfd
wait:
        test    dword ptr [edi],1
        je      try
        pause                   ; if available
        jmp     wait

acquired:
        cmp     eax,[esi]
        jne     keep
        cmp     edx,[esi+4]
        je      exchange
keep:
        mov     eax,[esi]
        mov     edx,[esi+4]
        jmp     done

exchange:
        mov     [esi],ebx
        mov     [esi+4],ecx
done:
        mov     byte ptr [edi],0
        popfd
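
The quoted sequence can be sketched in Python to show its structure: take a lock, compare both halves, then either exchange or reload, and release the lock. This is only a model under assumptions. threading.Lock stands in for the lock bit at [edi], there is no equivalent of cli, and the function name emulated_cmpxchg8b is invented here. One thing the sketch makes visible: the trailing popfd in the quoted sequence restores the saved flags, so the ZF result of the compare would have to be carried to the following jnz in some other way. The sketch returns it explicitly.

```python
import threading

# guard plays the role of the lock bit at [edi] in the quoted sequence.
guard = threading.Lock()

def emulated_cmpxchg8b(mem, eax, edx, ebx, ecx):
    """Lock-protected stand-in for "lock cmpxchg8b qword ptr [mem]".
    mem is a two-item list [low dword, high dword].
    Returns (zf, eax, edx) as the instruction would leave them."""
    with guard:                          # try / wait / acquired ... release
        if eax == mem[0] and edx == mem[1]:
            mem[0], mem[1] = ebx, ecx    # exchange: store ECX:EBX
            return True, eax, edx
        return False, mem[0], mem[1]     # keep: reload EDX:EAX from memory

mem = [0x55667788, 0x11223344]
result = emulated_cmpxchg8b(mem, 0x55667788, 0x11223344, 0, 0x11220000)
print(result[0], [hex(v) for v in mem])
```

Such a lock only helps if every piece of code that touches the same 8 bytes goes through the same lock.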

 


Well, you certainly can translate this instruction into 32-bit code.

You have already used the "cmpxchg" instruction,
but it will sometimes do the wrong job,
because it compares only 32 bits and already reacts to those 32 bits. Whether that compare matched or not already changes the result,
because it can react to either the first 32 bits or the next 32 bits.
(.data:004762D5                 jnz     short near ptr loc_4762BE+1  - this discards the result of the first 32-bit compare and reacts only to the compare of the next 32 bits.)

But you need the result of a 64-bit compare!


It seems to me that you can also solve this problem by

making 2 compares with "cmp" instructions for the flags/reaction.

Now the point is not to make the same mistake (if you just do the 32-bit compare again, it reads the next 32 bits and ignores the result for the first 32 bits
from the first compare).
You need a reaction to the first compare (if it matched),
then you do the "cmp" again and react a second time.


If both compares matched, you do the reaction as described (otherwise the other described reaction):

https://www.felixcloutier.com/x86/cmpxchg8b:cmpxchg16b

That instruction description does not actually say anything about exchanging the values;
it only talks about whether the 64-bit compare was equal.

It says that if the compare was equal, the value in ECX:EBX is stored to the destination,
otherwise the destination is loaded into EDX:EAX
(which does not look like an exchange to me). Maybe the description is lacking (what I usually do then is try it out and take a look).


// If it were an exchange, it would be:

(reading the code later, I don't see a common exchange;
a common exchange would be eax swapped with edx, so that eax holds edx and edx holds eax):
4 assembly "mov" instructions (2 for the destination and 2 for the source)

or:
2 times the "xchg" instruction
//

But looking at your assembly code, it seems different to me.
I don't see an exchange (let me say I'm not entirely certain here, but it might help to talk about it):

The cmpxchg8b instruction seems to compare the registers EDX and EAX for equality
with a memory location given as an offset (base pointer register "EBP") (qword ptr [ebp+0]) (qword usually describes a 64-bit access (word * 4 (16 bits * 4))).
If the result was equal, it should store ECX and EBX to that offset (otherwise it probably loads the values from there into EDX and EAX).

The next instruction is "jnz". It still has the result of this compare; if they were not equal it jumps back to "Efls10" (which seems to be a loop to me),
if they were equal, it continues to the end and ends this function.


Looking at your code again, "lock cmpxchg [ebp+4], eax" has no reaction, but it needs one (as said before, it needs a reaction to both 32-bit halves).
If the compare fails, it needs to end there (not just always continue).
Done your way, the first 32 bits can give a false result, and if the next 32 bits are right, it still does the job, while it should not.
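
The point about needing both compare results before reacting can be illustrated with a small Python sketch. Both function names and the values are invented for this example. naive_split_cas reacts to each 32-bit half on its own, full_cas checks both halves first and writes only if both matched. Note that full_cas alone is not atomic; on real hardware it would still need a lock or disabled interrupts around it.

```python
def naive_split_cas(mem, eax, edx, ebx, ecx):
    """Two independent 32-bit compare-and-store steps (the pattern criticised above).
    mem is [low dword, high dword]. Returns the ZF of the last compare only."""
    low_ok = eax == mem[0]
    if low_ok:
        mem[0] = ebx
    high_ok = edx == mem[1]
    if high_ok:
        mem[1] = ecx
    return high_ok

def full_cas(mem, eax, edx, ebx, ecx):
    """Check both halves first, store only if both matched."""
    if eax == mem[0] and edx == mem[1]:
        mem[0], mem[1] = ebx, ecx
        return True
    return False

# The caller holds a stale high half (sequence 6 where memory has sequence 7).
torn = [0xAAAA0000, 0x00070003]
naive_zf = naive_split_cas(torn, 0xAAAA0000, 0x00060003, 0, 0x00060000)

intact = [0xAAAA0000, 0x00070003]
full_zf = full_cas(intact, 0xAAAA0000, 0x00060003, 0, 0x00060000)

print(naive_zf, [hex(v) for v in torn])
print(full_zf, [hex(v) for v in intact])
```

In the naive version the low half is already overwritten although the compare as a whole failed, which is exactly the half-updated state a 64-bit compare is there to prevent.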

---------------------
In case the 64-bit guys appear: 64 bit is not necessarily needed.

If you have to use more than 32 bits, there are several methods to solve this (to name a few):


1:
Use 2 registers and just build the behavior for that.

There is a 32-bit assembly instruction that is used for this (CDQ - Convert Word to Doubleword/Convert Doubleword to Quadword).


2:
An offset to somewhere in memory that is bigger than 32 bits, handled as 64 bits.

3 (even more is possible with an offset location):
If you have more than 64 bits of flags, you just need an offset to a location where you actually control the flags or data.


4:
For data movements there is the REP prefix.

The CPU can see that it has to move a certain amount of data, and it can translate the data movement into something it can actually process.

The FSB (quad pumped) to the RAM does such a thing.

Unlike what the 64-bit guys might think, you don't need a 64-bit offset for this.

Another example would be the cache; HDDs use a cache to buffer the data.

That data can then be processed differently, like with 2 bits (wires), 4 bits, 16, 32, 64 or even more (it rather comes down to what the physical cable/wire can do).


Posted (edited)

Now I will describe as well as I can how the function ExInterlockedFlushSList works in XP SP3.

cmpxchg8b works on 64 contiguous bits. Those 64 bits (8 bytes) sit in the memory (RAM) of the compi at a given place.

Those 64 bits are given to cmpxchg8b indirectly, through the 32-bit register EBP of the cpu.

EBP holds a 32-bit address, which points exactly to the first byte of those 64 bits.

Even though EBP holds only a 32-bit address in XP,

cmpxchg8b qword ptr [ebp+0] works on all 64 bits starting at the RAM location given by ebp.

The cmpxchg8b instruction works directly on these 64 bits in memory.

So we have cmpxchg8b qword ptr [ebp+0].

Example: The 64 bits in memory are 0x1122334455667788. 11223344 are the higher 32 bits, 55667788 the lower 32 bits.

EAX holds 0x55667788 and EDX holds 94712056 (any values).

Now only the 32 bits in EAX are compared via cmpxchg8b with the 64 bits in RAM. (Only the lower 32 bits of each are compared.)

This behavior is because we have a 32-bit OS.

The higher bits in EDX are just ignored, and so are the higher 32 bits of the 64 bits in RAM.

By the way, this means that when we use "lock cmpxchg" in a simulation, it makes no sense to use "lock cmpxchg" 2 times. Here we need the "lock" because only cmpxchg8b is atomic by itself, meaning no other processor can disturb the memory during its compare operation. For cmpxchg this is only guaranteed with the lock prefix before it.

In my example we have the case that the lower 32 bits in RAM and in EAX are identical.

In this case, the lower 32 bits (of the 64-bit value in memory) will be replaced with the 32 bits stored in ebx.

But EBX = 00 00 00 00. This means the real list in memory is filled to half, from the bottom, with 00. From a 32-bit view, this list is now completely empty.

The higher 32 bits in RAM are not changed, whatever is there and whatever is in EDX.

The Zero flag is set after a change happens.

If the bits in EAX and the lower 32 bits of the 64 bits in RAM are not identical,

cmpxchg8b will do nothing with the 64 bits in memory and also changes nothing in EAX, EDX, EBX, ECX, EBP.

So, in this case cmpxchg8b has the same effect as 90 90 90 90. The Zero flag is NOT set.

Now I see what happens with my try, when I just replace cmpxchg8b qword ptr [ebp+0] with 90 90 90 90.

At once I have an infinite loop, because no Zero flag is set.

It is unclear to me why there is this loop. In my eyes, on the first try both lower 32-bit pairs are identical and are exchanged against 00 00 00 00.
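
For cross-checking, here is a Python sketch of cmpxchg8b as the instruction reference linked earlier in this thread (felixcloutier.com) describes it: the full 64-bit value in EDX:EAX is compared with the 8 bytes in memory, a match stores ECX:EBX, and a mismatch loads the memory value into EDX:EAX. That is a different reading from the 32-bit-only compare described above, so take it as something to test against, not as a verdict. The dictionary regs and the list mem are just modelling choices. Under this reading the loop has a purpose: a failed compare refreshes EDX:EAX, and the retry then succeeds.

```python
def cmpxchg8b(mem, regs):
    """Model of cmpxchg8b m64 following the documented description.
    mem is [low dword, high dword]; regs holds eax, edx, ebx, ecx.
    Returns the resulting ZF as a bool."""
    if regs["eax"] == mem[0] and regs["edx"] == mem[1]:
        mem[0], mem[1] = regs["ebx"], regs["ecx"]   # m64 := ECX:EBX
        return True
    regs["eax"], regs["edx"] = mem[0], mem[1]       # EDX:EAX := m64
    return False

# The example from the post: memory holds 0x1122334455667788.
mem = [0x55667788, 0x11223344]
regs = {"eax": 0x55667788, "edx": 0x94712056, "ebx": 0, "ecx": 0}

first = cmpxchg8b(mem, regs)    # EDX differs from the high dword
second = cmpxchg8b(mem, regs)   # retry with the reloaded EDX:EAX
print(first, second, [hex(v) for v in mem])
```

In this model the first attempt fails because EDX does not match the high dword, memory stays as it is, and only the second attempt writes.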

Edited by Dietmar

Posted (edited)

Now we come to the whole work of the function ExInterlockedFlushSList in XP SP3.

After it is called, this function starts with

push    ebx                         ; Push the value of the ebx register onto the stack to save it there; its value is not changed.
push    ebp                         ; Push the value of the ebp register onto the stack to save it there; its value is not changed.
xor     ebx, ebx                    ; Set the ebx register to zero (EBX = 00 00 00 00) by a bitwise XOR with itself.
mov     ebp, ecx                    ; Copy the value of the ecx register into the ebp register (the ECX value has to be prepared outside this function).
mov     edx, [ebp+4]                ; Copy the high 32-bit value stored at the RAM address [ebp+4] into the edx register (ebp was just loaded from ecx).
mov     eax, [ebp+0]                ; Copy the low 32-bit value stored at the RAM address [ebp+0] into the eax register (ebp was just loaded from ecx).

Now we have an empty ebx, the lower 32 bits from RAM at the address in ecx in eax, and the higher 32 bits from that address in edx.

or eax, eax                 ; If eax was zero, the zero flag will be set. If eax was non-zero, the zero flag will be cleared.

jz short loc_4762CD  ; If EAX was zero, we jump (short) over the whole compare, to address 4762CD.

mov ecx, edx              ; Now we move the content of edx to ecx. The old content of ecx is lost, the content of edx is kept. But the old content of ecx is (see before) already saved in ebp. ECX now holds the higher 32 bits of the 64 bits in RAM.

mov cx, bx     ; cx represents the lower 16 bits of the ecx register. bx represents the lower 16 bits of the ebx register.

mov cx, bx copies the content of the lower 16 bits of the ebx register (bx) into the lower 16 bits of the ecx register (cx).

The upper 16 bits of both ebx and ecx remain unchanged. This means: in ECX now only the 2 highest bytes of the 64 bits in memory survive. They can be 00 00 too. So it is not impossible that ECX = 00 00 00 00, but only when the 2 highest bytes of the 64 bits in memory are also 00 00.

Example:

EBX = 0x12345678    (upper 16 bits: 0x1234, lower 16 bits: 0x5678)
ECX = 0x98765432    (upper 16 bits: 0x9876, lower 16 bits: 0x5432)

Now mov cx, bx

EBX remains unchanged (0x12345678).

ECX will have only its lower 16 bits replaced with the lower 16 bits from bx = 0x5678.

The upper 16 bits of ECX will remain the same (0x9876).

So the only change from mov cx, bx in this example is

ECX = 0x98765678
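
The same arithmetic as a tiny Python check. The helper name mov_cx_bx is made up; it only mirrors the masking that a 16-bit move into cx performs on the 32-bit register.

```python
def mov_cx_bx(ecx: int, ebx: int) -> int:
    """Return the new ECX after mov cx, bx:
    upper 16 bits of ECX are kept, lower 16 bits are taken from EBX."""
    return (ecx & 0xFFFF0000) | (ebx & 0x0000FFFF)

print(hex(mov_cx_bx(0x98765432, 0x12345678)))  # 0x98765678

# In the function itself EBX is zero, so only the upper 16 bits of ECX survive.
print(hex(mov_cx_bx(0x00070003, 0)))
```

With EBX = 0 this matches the comment from the source listing in the first post: the depth is cleared and the sequence number is left in place.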

 

jnz short loc_4762BE    ; If the operation cmpxchg8b qword ptr [ebp+0] changes RAM via EBX, the Zero flag is set.

Then we leave the loop and continue with the opcode right after this jnz short loc_4762BE instruction.

If the bits in EAX and the lower 32 bits of the 64 bits in RAM are not identical, cmpxchg8b qword ptr [ebp+0] does nothing to any memory or register, and the Zero flag is not set.

So the jump to loc_4762BE happens.

pop ebp  ; Fetches the topmost value from the stack, stores it in the ebp register and removes it from the stack.

pop ebx  ; Fetches the now topmost value from the stack, stores it in the ebx register and removes it from the stack.

retn         ; Returns from the function ExInterlockedFlushSList to the caller.

It removes the return address from the stack (the address where the function was called from)

and jumps to the popped return address, effectively resuming execution at the point where the function was called.

Edited by Dietmar

Posted (edited)

Here is the function ExInterlockedFlushSList from XP SP3, relocated by me:

.data:004762B2 ; Exported entry   7. ExInterlockedFlushSList
.data:004762B2
.data:004762B2 ; =============== S U B R O U T I N E =======================================
.data:004762B2
.data:004762B2
.data:004762B2                 public ExInterlockedFlushSList
.data:004762B2 ExInterlockedFlushSList proc near       ; CODE XREF: sub_45F0DF:loc_45F0F7p
.data:004762B2                                         ; DATA XREF: .edata:off_5AC2A8o
.data:004762B2                 push    ebx
.data:004762B3                 push    ebp
.data:004762B4                 xor     ebx, ebx
.data:004762B6                 mov     ebp, ecx
.data:004762B8                 mov     edx, [ebp+4]
.data:004762BB                 mov     eax, [ebp+0]
.data:004762BE
.data:004762BE loc_4762BE:                             ; CODE XREF: ExInterlockedFlushSList+19j
.data:004762BE                 or      eax, eax
.data:004762C0                 jz      short loc_4762CD
.data:004762C2                 mov     ecx, edx
.data:004762C4                 mov     cx, bx
.data:004762C7                 cmpxchg8b qword ptr [ebp+0]
.data:004762CB                 jnz     short loc_4762BE
.data:004762CD
.data:004762CD loc_4762CD:                             ; CODE XREF: ExInterlockedFlushSList+Ej
.data:004762CD                 pop     ebp
.data:004762CE                 pop     ebx
.data:004762CF                 retn
.data:004762CF ExInterlockedFlushSList endp
.data:004762CF
.data:004762CF ; ---------------------------------------------------------------------------

 

Edited by Dietmar

Posted (edited)

And now the explanation of what this function ExInterlockedFlushSList really does:

The calling function passes the register ECX to this function ExInterlockedFlushSList.

ECX holds the start address of a 64-bit list in memory.

Now the function ExInterlockedFlushSList handles 2 scenarios. In the first, EAX=0 and ECX=NULL is given back to the calling function, which means that such a list never existed. In the second scenario, EAX is not NULL. In this case the ONLY thing that ExInterlockedFlushSList does is to delete the pointer in the register ECX. But the 2 highest bytes are stored in ECX. So ECX is mostly not NULL, only when the 2 highest bytes are 00 00.

The list itself stays untouched in memory. But now the calling function has lost all information about the place of this list in memory, because of the work of ExInterlockedFlushSList on ECX. And it can't be repaired by the calling function via ECX, because ECX contains only the 2 highest bytes of the 64 bits in RAM. The whole list is kept in RAM, and also with its higher 32 bits in EDX and its lower 32 bits in EAX.

Edited by Dietmar

I make a new try with my hacked function

.text:0040B0B2 ; Exported entry   7. ExInterlockedFlushSList
.text:0040B0B2
.text:0040B0B2 ; =============== S U B R O U T I N E =======================================
.text:0040B0B2
.text:0040B0B2
.text:0040B0B2                 public ExInterlockedFlushSList
.text:0040B0B2 ExInterlockedFlushSList proc near       ; CODE XREF: sub_45F0DF:loc_45F0F7p
.text:0040B0B2                                         ; DATA XREF: .edata:off_5AC2A8o
.text:0040B0B2                 push    ebx
.text:0040B0B3                 push    ebp
.text:0040B0B4                 xor     ebx, ebx
.text:0040B0B6                 mov     ebp, ecx
.text:0040B0B8                 mov     edx, [ebp+4]
.text:0040B0BB                 mov     eax, [ebp+0]
.text:0040B0BE                 or      eax, eax
.text:0040B0C0                 jz      short loc_40B0C9
.text:0040B0C2                 mov     ecx, edx
.text:0040B0C4                 mov     cx, bx
.text:0040B0C7                 xor     ecx, ecx
.text:0040B0C9
.text:0040B0C9 loc_40B0C9:                             ; CODE XREF: ExInterlockedFlushSList+Ej
.text:0040B0C9                 pop     ebp
.text:0040B0CA                 pop     ebx
.text:0040B0CB                 nop
.text:0040B0CC                 nop
.text:0040B0CD                 nop
.text:0040B0CE                 nop
.text:0040B0CF                 retn
.text:0040B0CF ExInterlockedFlushSList endp
.text:0040B0CF
.text:0040B0CF ; ---------------------------------------------------------------------------

Hex code

53 55 33 DB 8B E9 8B 55 04 8B 45 00 09 C0 74 07 8B CA 66 89 D9 33 C9 5D 5B 90 90 90 90 C3
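
A Python model of this patched listing, as a sketch only. It assumes the usual fastcall convention where the result comes back in EAX, and the header values are invented. The function name hacked_flush is made up. The model shows that the patched code reads the header and changes ECX, but never writes to memory, so two calls in a row hand out the same first entry. Whether that is what leads to the bugcheck is only a guess.

```python
def hacked_flush(header):
    """Model of the patched listing. header is [low dword, high dword].
    Returns the value left in EAX."""
    eax, edx = header[0], header[1]   # mov edx, [ebp+4] / mov eax, [ebp+0]
    if eax != 0:                      # or eax, eax / jz
        ecx = edx & 0xFFFF0000        # mov ecx, edx / mov cx, bx (EBX is zero)
        ecx = 0                       # xor ecx, ecx
    return eax                        # the header itself is never written

# Hypothetical header: first entry at 0x8A619AB8, depth 3, sequence 7.
header = [0x8A619AB8, 0x00070003]
first = hacked_flush(header)
second = hacked_flush(header)
print(hex(first), hex(second), [hex(v) for v in header])
```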

 

 

 

But I get this Bsod

kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high.  This is usually
caused by drivers using improper addresses.
If kernel debugger is available get stack backtrace.
Arguments:
Arg1: 0a130038, memory referenced
Arg2: 00000002, IRQL
Arg3: 00000000, value 0 = read operation, 1 = write operation
Arg4: f7839bd8, address which referenced memory

Debugging Details:
------------------


READ_ADDRESS:  0a130038

CURRENT_IRQL:  2

FAULTING_IP:
storport!StorPortExtendedFunction+57cd
f7839bd8 8b7e24          mov     edi,dword ptr [esi+24h]

DEFAULT_BUCKET_ID:  DRIVER_FAULT

BUGCHECK_STR:  0xD1

PROCESS_NAME:  System

ANALYSIS_VERSION: 6.3.9600.17237 (debuggers(dbg).140716-0327) x86fre

DPC_STACK_BASE:  FFFFFFFFF78A3000

TRAP_FRAME:  f78a2ef8 -- (.trap 0xfffffffff78a2ef8)
ErrCode = 00000000
eax=8a619ab8 ebx=00000000 ecx=8a619b4c edx=00000000 esi=0a130014 edi=8a619ab8
eip=f7839bd8 esp=f78a2f6c ebp=f78a2f78 iopl=0         nv up ei pl zr na pe nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00010246
storport!StorPortExtendedFunction+0x57cd:
f7839bd8 8b7e24          mov     edi,dword ptr [esi+24h] ds:0023:0a130038=????????
Resetting default scope

LAST_CONTROL_TRANSFER:  from 80532747 to 804e3592

STACK_TEXT:  
f78a2aac 80532747 00000003 f78a2e08 00000000 nt!RtlpBreakWithStatusInstruction
f78a2af8 8053321e 00000003 0a130038 f7839bd8 nt!KiBugCheckDebugBreak+0x19
f78a2ed8 804e187f 0000000a 0a130038 00000002 nt!KeBugCheck2+0x574
f78a2ed8 f7839bd8 0000000a 0a130038 00000002 nt!KiTrap0E+0x233
WARNING: Stack unwind information not available. Following frames may be wrong.
f78a2f78 f783a26e 8a619ab8 8a6129f0 8a4be024 storport!StorPortExtendedFunction+0x57cd
f78a2fa8 f782b356 8a610438 8a619ab8 8a610438 storport!StorPortExtendedFunction+0x5e63
f78a2fd0 804dbbd4 8a6129ac 8a612938 00000000 storport!DllInitialize+0xfc5
f78a2ff4 804db89e f789ded8 00000000 00000000 nt!KiRetireDpcList+0x46
f78a2ff8 f789ded8 00000000 00000000 00000000 nt!KiDispatchInterrupt+0x2a
804db89e 00000000 00000009 bb835675 00000128 0xf789ded8


STACK_COMMAND:  kb

FOLLOWUP_IP:
storport!StorPortExtendedFunction+57cd
f7839bd8 8b7e24          mov     edi,dword ptr [esi+24h]

SYMBOL_STACK_INDEX:  4

SYMBOL_NAME:  storport!StorPortExtendedFunction+57cd

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: storport

IMAGE_NAME:  storport.sys

DEBUG_FLR_IMAGE_TIMESTAMP:  6142afab

IMAGE_VERSION:  6.1.7601.25735

FAILURE_BUCKET_ID:  0xD1_storport!StorPortExtendedFunction+57cd

BUCKET_ID:  0xD1_storport!StorPortExtendedFunction+57cd

ANALYSIS_SOURCE:  KM

FAILURE_ID_HASH_STRING:  km:0xd1_storport!storportextendedfunction+57cd

FAILURE_ID_HASH:  {2d353e86-f9c7-de18-d8db-956bcb502646}

Followup: MachineOwner
---------


Posted (edited)

So I think that even on one cpu with one core and one thread, with this approach,

cmpxchg8b qword ptr [ebp+0]

is necessary.

Dietmar

PS: Now I think that I read the paper from Cutler wrong.

There is NO version for the .386 at all in this paper.

Edited by Dietmar

Use a slim lock instead.

If an SList node is present, it must be processed (Next and Depth zeroed). A pointer to the next node in the list must be returned.
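
What this means for the original listing can be sketched in Python. The sketch assumes the 32-bit header layout that the comments in the source listing suggest (next link in the low dword, depth in the low 16 bits and sequence number in the high 16 bits of the high dword), the documented 64-bit compare of cmpxchg8b, and a return value in EAX. The function names and header values are illustrative only.

```python
def cmpxchg8b(mem, regs):
    """64-bit compare of EDX:EAX with mem; store ECX:EBX on a match,
    otherwise reload EDX:EAX from mem. Returns ZF."""
    if regs["eax"] == mem[0] and regs["edx"] == mem[1]:
        mem[0], mem[1] = regs["ebx"], regs["ecx"]
        return True
    regs["eax"], regs["edx"] = mem[0], mem[1]
    return False

def ex_interlocked_flush_slist(header):
    """Model of the original listing. header is [Next, (Sequence << 16) | Depth].
    Returns the value left in EAX."""
    regs = {"ebx": 0, "eax": header[0], "edx": header[1], "ecx": 0}
    while regs["eax"] != 0:                       # or eax, eax / jz
        regs["ecx"] = regs["edx"] & 0xFFFF0000    # mov ecx, edx / mov cx, bx
        if cmpxchg8b(header, regs):               # jnz loops on failure
            break
    return regs["eax"]

# Hypothetical header: first entry at 0x8A619AB8, depth 3, sequence 7.
header = [0x8A619AB8, 0x00070003]
first = ex_interlocked_flush_slist(header)
second = ex_interlocked_flush_slist(header)
print(hex(first), hex(second), [hex(v) for v in header])
```

In this model the first call returns the old first entry and leaves Next and Depth zeroed with the sequence number kept, and the second call returns 0 because the list is now empty.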

 


Posted (edited)

@jumper

I do not think that the register is always set to ECX = NULL. Only when the 2 highest bytes are also 00 00.

Because in that case, my fake function from above would always work.

Can you please explain to me in detail how you think ExInterlockedFlushSList works?

"If an SList node is present, it must be processed (Next and Depth zeroed). A pointer to the next node in the list must be returned."

To me this sounds as if something of the original list has to be given back to the calling function via the register ECX, meaning ECX is not NULL if a real list exists. But from the code I see that the lowest 16 bits of ECX are set to zero for sure.

mov ebp, ecx means that the original pointer to the list in ecx is now saved in ebp.

mov edx, [ebp+4] means that the original content in RAM, to which the pointer shifted by 4 bytes = 32 bits points, is now stored in edx. EBP holds the original pointer from ECX. It points to the lowest byte of the 64 real bits in RAM. So EDX now contains the whole higher 32 bits (not a pointer) of the original 64 bits in RAM.

With mov eax, [ebp+0], EAX holds the original content of the lower 32 bits of the original 64 bits in RAM.

With mov ecx, edx, ECX now also holds the higher 32 bits from RAM (no pointer any more, the address of the 64 bits is lost).

With mov cx, bx the lowest 16 bits in ECX are now set to 00 00, because EBX is completely empty.

What is in ECX now? The 2 highest bytes of the original 64 bits in RAM, with 00 00 at the end.

EBP still holds the pointer to the lowest byte in RAM, but with [  ], [EBP+0] becomes the real original 64 bits in RAM.

Now the lower 32 bits of the original 64 bits in RAM are compared with the content of EAX.

EAX also holds the lower 32 bits, so the same bits as at the address [EBP+0].

The lower half of the 64-bit list in memory is filled with 00 00 00 00, because EBX = 00 00 00 00.

The upper half of the 64-bit list in memory stays untouched.

So there is no loop at all, the Zero flag is set.

But ECX = the 2 highest bytes of the original 64 bits in RAM, followed by 00 00.

Even though no value is directly returned from this function, ECX contains the 2 highest bytes of the original 64 bits in RAM.

EBP and EBX are restored from the stack to their original values from before the function was called.

EAX still holds the lower 32 bits of the original 64 bits in RAM. EDX still holds the higher 32 bits from RAM.

So the address (pointer) to the 64 bits in RAM is lost. Also, the real 64-bit list keeps only its upper 32 bits.

The lower 32 bits of this list become 00 00 00 00.

So, where is the flush? The pointer to the 64 bits in RAM is completely destroyed.

A simulation of cmpxchg8b has to show exactly the same values in all the registers as here. This can be tested by hand.

Edited by Dietmar

6 hours ago, Dietmar said:

Here is the from me relocated function ExInterlockedFlushSList from XP SP3

...
.data:004762BE loc_4762BE:                             ; CODE XREF: ExInterlockedFlushSList+19j
.data:004762BE                 or      eax, eax
.data:004762C0                 jz      short loc_4762CD
...

or  eax, eax - what does that do? Is this meant to initialize the flags OF, CF, or to modify the SF, ZF, PF flags?
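
A short Python sketch of the flag results of or eax, eax, based on the documented behavior of the OR instruction: the value of EAX stays the same, OF and CF are cleared, and SF, ZF and PF are set from the result. The function name or_eax_eax is invented for the sketch. In the listing only ZF matters, because the following jz tests whether EAX is zero.

```python
def or_eax_eax(eax: int) -> dict:
    """Flags after `or eax, eax`; EAX itself is unchanged since x | x == x."""
    result = eax & 0xFFFFFFFF
    return {
        "ZF": result == 0,                              # set when EAX is zero
        "SF": bool(result >> 31),                       # copy of bit 31
        "PF": bin(result & 0xFF).count("1") % 2 == 0,   # even parity of low byte
        "CF": False,                                    # OR always clears CF
        "OF": False,                                    # OR always clears OF
    }

print(or_eax_eax(0))
print(or_eax_eax(0x8A619AB8))
```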

Edited by Mark-XP