Dietmar

Everything posted by Dietmar

  1. @user57 What do you think: on a one-core CPU

     CMPXCHG8B qword ptr [Destination]
     <==>
     if (EDX:EAX == *Destination) {
         ZF = 1;
         *Destination = ECX:EBX;
     } else {
         ZF = 0;
         EDX:EAX = *Destination;
     }
     <==>
             cli
             push ebp
             push ebx
             pushf
     newtry: mov ebp, ecx
             mov eax, [ebp]
             mov edx, [ebp + 4]
             cmp eax, [ebp]
             jne fail
             cmp edx, [ebp + 4]
             jne fail
             mov [ebp], ebx
             mov [ebp + 4], ecx
             jmp done
     fail:   mov eax, [ebp]
             mov edx, [ebp + 4]
     done:   jne newtry
             popf
             pop ebx
             pop ebp
             sti
             retn
  2. @user57 Waaaaooohhhh, yours works!!! This means that XP on a 486 can be done. I am not good at assembler; mine does not work. How did you learn it? I am so, so curious ^^
     Dietmar

     Here is your code for XP SP3: the working function ExInterlockedFlushSList without any cmpxchg8b, which I relocated in ntoskrnl.exe to have enough space.

     53 55 9C FA 33 DB 8B E9 8B 55 04 8B 45 00 0B C0
     74 1F 8B CA 66 89 D9 3B 45 00 75 0D 3B 55 04 75
     08 89 5D 00 89 4D 04 EB 06 8B 45 00 8B 55 04 75
     D5 FB 9D 5D 5B 90 90 90 90 90 90 90 90 C3

     And here is the ntoskrnl.exe freshly built from it: https://ufile.io/n1a92q0w

     .data:004762B2 ; Exported entry 7. ExInterlockedFlushSList
     .data:004762B2 ; =============== S U B R O U T I N E =======================================
     .data:004762B2                 public ExInterlockedFlushSList
     .data:004762B2 ExInterlockedFlushSList proc near       ; CODE XREF: sub_45F0DF:loc_45F0F7p
     .data:004762B2                                         ; DATA XREF: .edata:off_5AC2A8o
     .data:004762B2                 push    ebx
     .data:004762B3                 push    ebp
     .data:004762B4                 pushf
     .data:004762B5                 cli
     .data:004762B6                 xor     ebx, ebx
     .data:004762B8 loc_4762B8:                             ; CODE XREF: ExInterlockedFlushSList:loc_4762E1j
     .data:004762B8                 mov     ebp, ecx
     .data:004762BA                 mov     edx, [ebp+4]
     .data:004762BD                 mov     eax, [ebp+0]
     .data:004762C0                 or      eax, eax
     .data:004762C2                 jz      short loc_4762E3
     .data:004762C4                 mov     ecx, edx
     .data:004762C6                 mov     cx, bx
     .data:004762C9                 cmp     eax, [ebp+0]
     .data:004762CC                 jnz     short loc_4762DB
     .data:004762CE                 cmp     edx, [ebp+4]
     .data:004762D1                 jnz     short loc_4762DB
     .data:004762D3                 mov     [ebp+0], ebx
     .data:004762D6                 mov     [ebp+4], ecx
     .data:004762D9                 jmp     short loc_4762E1
     .data:004762DB ; ---------------------------------------------------------------------------
     .data:004762DB loc_4762DB:                             ; CODE XREF: ExInterlockedFlushSList+1Aj
     .data:004762DB                                         ; ExInterlockedFlushSList+1Fj
     .data:004762DB                 mov     eax, [ebp+0]
     .data:004762DE                 mov     edx, [ebp+4]
     .data:004762E1 loc_4762E1:                             ; CODE XREF: ExInterlockedFlushSList+27j
     .data:004762E1                 jnz     short loc_4762B8
     .data:004762E3 loc_4762E3:                             ; CODE XREF: ExInterlockedFlushSList+10j
     .data:004762E3                 sti
     .data:004762E4                 popf
     .data:004762E5                 pop     ebp
     .data:004762E6                 pop     ebx
     .data:004762E7                 nop
     .data:004762E8                 nop
     .data:004762E9                 nop
     .data:004762EA                 nop
     .data:004762EB                 nop
     .data:004762EC                 nop
     .data:004762ED                 nop
     .data:004762EE                 nop
     .data:004762EF                 retn
     .data:004762EF ExInterlockedFlushSList endp
     .data:004762EF ; ---------------------------------------------------------------------------
  3. .data:004762B2 ; Exported entry 7. ExInterlockedFlushSList
     .data:004762B2 ; =============== S U B R O U T I N E =======================================
     .data:004762B2                 public ExInterlockedFlushSList
     .data:004762B2 ExInterlockedFlushSList proc near       ; CODE XREF: sub_45F0DF:loc_45F0F7p
     .data:004762B2                 push    ebx
     .data:004762B3                 push    ebp
     .data:004762B4                 xor     ebx, ebx
     .data:004762B6                 mov     ebp, ecx
     .data:004762B8                 mov     edx, [ebp+4]
     .data:004762BB                 mov     eax, [ebp+0]
     .data:004762BE                 or      eax, eax
     .data:004762C0                 jz      short loc_4762FA
     .data:004762C2                 mov     ecx, edx
     .data:004762C4                 mov     cx, bx
     .data:004762C7                 pushf
     .data:004762C8 loc_4762C8:                             ; CODE XREF: ExInterlockedFlushSList+25j
     .data:004762C8                 cli
     .data:004762C9                 lock btr dword ptr [edi], 0
     .data:004762CE                 jnz     short loc_4762DB
     .data:004762D0                 popf
     .data:004762D1 loc_4762D1:                             ; CODE XREF: ExInterlockedFlushSList+27j
     .data:004762D1                 test    dword ptr [edi], 1
     .data:004762D7                 jz      short loc_4762C8
     .data:004762D9                 jmp     short loc_4762D1
     .data:004762DB ; ---------------------------------------------------------------------------
     .data:004762DB loc_4762DB:                             ; CODE XREF: ExInterlockedFlushSList+1Cj
     .data:004762DB                 cmp     eax, [ebp+0]
     .data:004762DE                 jnz     short loc_4762E5
     .data:004762E0                 cmp     edx, [ebp+4]
     .data:004762E3                 jz      short loc_4762ED
     .data:004762E5 loc_4762E5:                             ; CODE XREF: ExInterlockedFlushSList+2Cj
     .data:004762E5                 mov     eax, [ebp+0]
     .data:004762E8                 mov     edx, [ebp+4]
     .data:004762EB                 xor     ebx, ebx
     .data:004762ED loc_4762ED:                             ; CODE XREF: ExInterlockedFlushSList+31j
     .data:004762ED                 mov     [ebp+0], ebx
     .data:004762F0                 mov     [ebp+4], ecx
     .data:004762F3                 mov     byte ptr [ebp+0], 0
     .data:004762F7                 pop     ebp
     .data:004762F8                 pop     ebx
     .data:004762F9                 retn
     .data:004762FA ; ---------------------------------------------------------------------------
     .data:004762FA loc_4762FA:                             ; CODE XREF: ExInterlockedFlushSList+Ej
     .data:004762FA                 pop     ebp
     .data:004762FB                 pop     ebx
     .data:004762FC                 nop
     .data:004762FD                 nop
     .data:004762FE                 nop
     .data:004762FF                 retn
     .data:004762FF ExInterlockedFlushSList endp
  4. Now I am making a new attempt with the simulated cmpxchg8b opcode for the 486 CPU.
  5. Yessaaa, I understand 100% what this function is doing. While thinking hard today about the few lines of assembler in the function ExInterlockedFlushSList, I started to wonder why this is there: jnz short loc_40B0BE (a loop). In my eyes no loop is needed, if I understand the opcode correctly.

     Now I understand what happens: cmpxchg8b without lock is not really atomic, it seems not even on a single CPU. So it can happen that the content in memory does NOT match the content in EDX:EAX, because something else overwrites the 64-bit content in memory during this operation, or just disturbs the operation. Crazy. Oh my, what kind of errors may be hiding in other OSes too.

     Cutler solves this problem with his loop. When the content in memory was disturbed, or the cmpxchg8b operation failed, there is a next try, and a next try, and a next try...

     I tested this by disabling the loop with 90 90. Oh my, BSOD. So it seems that this operation is disturbed nearly always. I can overcome this by changing the opcode from cmpxchg8b ==> lock cmpxchg8b in the original ntoskrnl.exe from XP SP3. Now, without the loop, XP boots. This means that only with the lock prefix nobody is allowed to disturb the cmpxchg8b operation, and so any simulator needs real atomicity.

     Waaaoh, this is the first time that a mod in this function works for me.
     Dietmar
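One general reason for a retry loop around a compare-and-exchange is that the comparand is read before the instruction executes, so anything that changes the header in between makes the compare fail. The following is a minimal sketch of that pattern in C11, not the kernel's code: `flush_header` is a hypothetical name, and the header layout (next pointer in the low 32 bits, sequence in the top 16 bits, depth below it) is an assumption taken from the listings in this thread.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical 64-bit list header (layout assumed from the listings):
   bits 0..31 = next pointer, bits 32..47 = depth, bits 48..63 = sequence. */
static uint32_t flush_header(_Atomic uint64_t *header)
{
    uint64_t old = atomic_load(header);   /* snapshot; may be stale by the time we exchange */
    for (;;) {
        if ((uint32_t)old == 0)
            return 0;                     /* list empty, nothing to flush */

        /* New value: next = 0, depth = 0, sequence kept. */
        uint64_t desired = old & 0xFFFF000000000000ULL;

        /* Succeeds only if the header still equals the snapshot.
           On failure, `old` is refreshed with the current value. */
        if (atomic_compare_exchange_strong(header, &old, desired))
            return (uint32_t)old;         /* old first-entry pointer */
    }
}
```

As with cmpxchg8b, a failed exchange hands back the current memory value, which is why the loop can retry without reading the header again.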
  6. I think that I made a mistake in my description of how CMPXCHG8B works. It really works on ALL 64 bits in memory. The lower 32 bits are held in EAX, the higher 32 bits in EDX. Now comes the check: ONLY when all 64 bits in memory match the EDX:EAX combination is the ECX:EBX combination written back into those 64 bits in memory. ECX holds the higher 32 bits for the exchange and EBX the lower 32 bits. ONLY when this exchange has happened is the zero flag ZF set.
     Dietmar

     CMPXCHG8B qword ptr [Destination]
     <==>
     if (EDX:EAX == *Destination) {
         ZF = 1;
         *Destination = ECX:EBX;
     } else {
         ZF = 0;
         EDX:EAX = *Destination;
     }

     PS: This is the basis for the simulator.
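The pseudocode above can be written as a small C model, which may help when checking a simulator by hand. This is only a sketch: the `Regs` struct and `model_cmpxchg8b` are illustrative names, and the model is single-threaded, so it says nothing about atomicity.

```c
#include <stdbool.h>
#include <stdint.h>

/* The registers CMPXCHG8B touches, plus the zero flag (illustrative type). */
typedef struct {
    uint32_t eax, ebx, ecx, edx;
    bool zf;
} Regs;

/* Single-threaded model of CMPXCHG8B m64; not atomic by itself. */
static void model_cmpxchg8b(Regs *r, uint64_t *dest)
{
    uint64_t expected = ((uint64_t)r->edx << 32) | r->eax;
    if (*dest == expected) {
        *dest = ((uint64_t)r->ecx << 32) | r->ebx;   /* store ECX:EBX */
        r->zf = true;
    } else {
        r->eax = (uint32_t)*dest;                    /* load current value */
        r->edx = (uint32_t)(*dest >> 32);
        r->zf = false;
    }
}
```

With memory 0x1122334455667788, EDX:EAX = 11223344:55667788 and ECX:EBX = 11220000:00000000, the model stores 0x1122000000000000 and sets ZF. With a non-matching EDX:EAX it leaves memory alone, clears ZF and reloads EDX:EAX.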
  7. @user57 Give me the assembler code of the emulator and I will test it. After one day of work on these few lines, I think I now understand how this function works. But emulating it is another hard job.
     Dietmar
  8. @Mark-XP
     or eax, eax ; If eax was zero, the zero flag will be set. If eax was non-zero, the zero flag will be cleared.
     Dietmar
  9. @jumper I do not think that the register is always set to ECX = NULL. Only when the 2 highest bytes are also 00 00. Because in that case my fake function from above would always work. Can you please explain to me in detail what you think about how ExInterlockedFlushSList works?

     "If an SList node is present, it must be processed (Next and Depth zeroed). A pointer to the next node in the list must be returned."

     To me this sounds as if something of the original list has to be given back to the calling function via the register ECX, meaning ECX is not NULL if a real list exists. But from the code I see that the lowest 16 bits of ECX are for sure set to zero.

     mov ebp, ecx means that the original pointer to the list in ecx is now saved in ebp.

     mov edx, [ebp+4] means that the content in RAM at the pointer shifted by 4 bytes (= 32 bits) is now stored in edx. EBP holds the original pointer from ECX. It points to the lowest byte of the real 64 bits in RAM. So EDX now contains the whole higher 32 bits (not a pointer) of the original 64 bits in RAM.

     mov eax, [ebp+0] puts the lower 32 bits of the original 64 bits in RAM into EAX.

     mov ecx, edx puts the higher 32 bits from RAM into ECX as well (no pointer any more; the address of the 64 bits is lost from ECX).

     mov cx, bx sets the lowest 16 bits of ECX to 00 00, because EBX is completely empty.

     What is now in ECX? The 2 highest bytes of the original 64 bits in RAM, with 00 00 at the end. EBP still holds the pointer to the lowest byte in RAM, but with [ ] it becomes the real original 64 bits in RAM. Now the lower 32 bits of the original 64 bits in RAM are compared with the content of EAX. EAX also holds the lower 32 bits, so the same bits as at the address [EBP+0]. The lower half of the 64-bit list in memory is filled with 00 00 00 00, because EBX = 00 00 00 00. The upper half of the 64-bit list in memory stays untouched.

     So there is no loop at all; the zero flag is set. But ECX = the 2 highest bytes of the original 64 bits in RAM, followed by 00 00. Even though no value is returned directly from this function, ECX contains the 2 highest bytes of the original 64 bits in RAM. EBP and EBX are restored from the stack to their original values from before the function was called. EAX still holds the lower 32 bits of the original 64 bits in RAM, and EDX still holds the higher 32 bits. So the address (pointer) to the 64 bits in RAM is lost. Also, the real 64-bit list keeps only its upper 32 bits; the lower 32 bits of this list become 00 00 00 00. So, where is the flush? The pointer to the 64 bits in RAM is completely destroyed.

     A simulation of cmpxchg8b has to show exactly these values in all the registers. This can be tested by hand.
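To test the register values by hand, the listing can be stepped through in a small C model. This is a sketch under stated assumptions: `trace_flush` and `Regs` are hypothetical names, the model assumes a single CPU with nothing interrupting between the reads and the exchange, and it follows the full 64-bit compare of CMPXCHG8B. It does not model ECX holding the list head address on entry.

```c
#include <stdint.h>

typedef struct { uint32_t eax, ebx, ecx, edx; } Regs;

/* Uninterrupted walk through the flush listing.
   mem[0] stands for [ebp+0], mem[1] for [ebp+4]. */
static Regs trace_flush(uint32_t mem[2])
{
    Regs r = {0};
    r.ebx = 0;                     /* xor ebx, ebx */
    r.edx = mem[1];                /* mov edx, [ebp+4] */
    r.eax = mem[0];                /* mov eax, [ebp+0] */
    if (r.eax == 0)                /* or eax, eax / jz */
        return r;                  /* empty list: memory untouched */

    /* mov ecx, edx / mov cx, bx */
    r.ecx = (r.edx & 0xFFFF0000u) | (r.ebx & 0x0000FFFFu);

    /* cmpxchg8b [ebp]: memory still equals EDX:EAX, so ECX:EBX is stored */
    if (mem[0] == r.eax && mem[1] == r.edx) {
        mem[0] = r.ebx;
        mem[1] = r.ecx;
    }
    return r;
}
```

For the header 11223344:55667788 this leaves EAX = 55667788, EDX = 11223344, ECX = 11220000 and memory 11220000:00000000. On x86 a function result is conventionally taken from EAX, which matches the slist.asm comment in this thread that the address of the entry at the top of the list is returned as the function value.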
  10. So I think that even on one CPU with one core and one thread, with this approach cmpxchg8b qword ptr [ebp+0] is necessary.
      Dietmar

      PS: Now I think that I misread the paper from Cutler. There is NO version for .386 at all in this paper.
  11. I am making a new attempt with my hacked function:

      .text:0040B0B2 ; Exported entry 7. ExInterlockedFlushSList
      .text:0040B0B2 ; =============== S U B R O U T I N E =======================================
      .text:0040B0B2                 public ExInterlockedFlushSList
      .text:0040B0B2 ExInterlockedFlushSList proc near       ; CODE XREF: sub_45F0DF:loc_45F0F7p
      .text:0040B0B2                                         ; DATA XREF: .edata:off_5AC2A8o
      .text:0040B0B2                 push    ebx
      .text:0040B0B3                 push    ebp
      .text:0040B0B4                 xor     ebx, ebx
      .text:0040B0B6                 mov     ebp, ecx
      .text:0040B0B8                 mov     edx, [ebp+4]
      .text:0040B0BB                 mov     eax, [ebp+0]
      .text:0040B0BE                 or      eax, eax
      .text:0040B0C0                 jz      short loc_40B0C9
      .text:0040B0C2                 mov     ecx, edx
      .text:0040B0C4                 mov     cx, bx
      .text:0040B0C7                 xor     ecx, ecx
      .text:0040B0C9 loc_40B0C9:                             ; CODE XREF: ExInterlockedFlushSList+Ej
      .text:0040B0C9                 pop     ebp
      .text:0040B0CA                 pop     ebx
      .text:0040B0CB                 nop
      .text:0040B0CC                 nop
      .text:0040B0CD                 nop
      .text:0040B0CE                 nop
      .text:0040B0CF                 retn
      .text:0040B0CF ExInterlockedFlushSList endp
      .text:0040B0CF ; ---------------------------------------------------------------------------

      Hex code:

      53 55 33 DB 8B E9 8B 55 04 8B 45 00 09 C0 74 07
      8B CA 66 89 D9 33 C9 5D 5B 90 90 90 90 C3

      But I get this BSOD:

      kd> !analyze -v
      *******************************************************************************
      *                                                                             *
      *                        Bugcheck Analysis                                    *
      *                                                                             *
      *******************************************************************************

      DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
      An attempt was made to access a pageable (or completely invalid) address at an
      interrupt request level (IRQL) that is too high. This is usually
      caused by drivers using improper addresses.
      If kernel debugger is available get stack backtrace.
      Arguments:
      Arg1: 0a130038, memory referenced
      Arg2: 00000002, IRQL
      Arg3: 00000000, value 0 = read operation, 1 = write operation
      Arg4: f7839bd8, address which referenced memory

      Debugging Details:
      ------------------

      READ_ADDRESS: 0a130038

      CURRENT_IRQL: 2

      FAULTING_IP:
      storport!StorPortExtendedFunction+57cd
      f7839bd8 8b7e24 mov edi,dword ptr [esi+24h]

      DEFAULT_BUCKET_ID: DRIVER_FAULT

      BUGCHECK_STR: 0xD1

      PROCESS_NAME: System

      ANALYSIS_VERSION: 6.3.9600.17237 (debuggers(dbg).140716-0327) x86fre

      DPC_STACK_BASE: FFFFFFFFF78A3000

      TRAP_FRAME: f78a2ef8 -- (.trap 0xfffffffff78a2ef8)
      ErrCode = 00000000
      eax=8a619ab8 ebx=00000000 ecx=8a619b4c edx=00000000 esi=0a130014 edi=8a619ab8
      eip=f7839bd8 esp=f78a2f6c ebp=f78a2f78 iopl=0 nv up ei pl zr na pe nc
      cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010246
      storport!StorPortExtendedFunction+0x57cd:
      f7839bd8 8b7e24 mov edi,dword ptr [esi+24h] ds:0023:0a130038=????????
      Resetting default scope

      LAST_CONTROL_TRANSFER: from 80532747 to 804e3592

      STACK_TEXT:
      f78a2aac 80532747 00000003 f78a2e08 00000000 nt!RtlpBreakWithStatusInstruction
      f78a2af8 8053321e 00000003 0a130038 f7839bd8 nt!KiBugCheckDebugBreak+0x19
      f78a2ed8 804e187f 0000000a 0a130038 00000002 nt!KeBugCheck2+0x574
      f78a2ed8 f7839bd8 0000000a 0a130038 00000002 nt!KiTrap0E+0x233
      WARNING: Stack unwind information not available. Following frames may be wrong.
      f78a2f78 f783a26e 8a619ab8 8a6129f0 8a4be024 storport!StorPortExtendedFunction+0x57cd
      f78a2fa8 f782b356 8a610438 8a619ab8 8a610438 storport!StorPortExtendedFunction+0x5e63
      f78a2fd0 804dbbd4 8a6129ac 8a612938 00000000 storport!DllInitialize+0xfc5
      f78a2ff4 804db89e f789ded8 00000000 00000000 nt!KiRetireDpcList+0x46
      f78a2ff8 f789ded8 00000000 00000000 00000000 nt!KiDispatchInterrupt+0x2a
      804db89e 00000000 00000009 bb835675 00000128 0xf789ded8

      STACK_COMMAND: kb

      FOLLOWUP_IP:
      storport!StorPortExtendedFunction+57cd
      f7839bd8 8b7e24 mov edi,dword ptr [esi+24h]

      SYMBOL_STACK_INDEX: 4

      SYMBOL_NAME: storport!StorPortExtendedFunction+57cd

      FOLLOWUP_NAME: MachineOwner

      MODULE_NAME: storport

      IMAGE_NAME: storport.sys

      DEBUG_FLR_IMAGE_TIMESTAMP: 6142afab

      IMAGE_VERSION: 6.1.7601.25735

      FAILURE_BUCKET_ID: 0xD1_storport!StorPortExtendedFunction+57cd

      BUCKET_ID: 0xD1_storport!StorPortExtendedFunction+57cd

      ANALYSIS_SOURCE: KM

      FAILURE_ID_HASH_STRING: km:0xd1_storport!storportextendedfunction+57cd

      FAILURE_ID_HASH: {2d353e86-f9c7-de18-d8db-956bcb502646}

      Followup: MachineOwner
      ---------
  12. And now the explanation of what the function ExInterlockedFlushSList really does:

      The calling function passes the register ECX to ExInterlockedFlushSList. ECX holds the start address of a 64-bit list in memory. Now ExInterlockedFlushSList distinguishes 2 scenarios.

      In the first scenario, ECX = NULL is given back to the calling function, which means that no such list ever existed, because EAX = 0.

      In the second scenario, EAX is not NULL. In this case the ONLY thing that ExInterlockedFlushSList does is delete the pointer in the register ECX. But the 2 highest bytes are stored in ECX. So ECX is mostly not NULL, only when the 2 highest bytes are 00 00. The list itself stays untouched in memory. But now the calling function has lost all information about where this list is in memory, because of what ExInterlockedFlushSList did to ECX. And the calling function cannot repair this via ECX, because ECX contains only the 2 highest bytes of the 64 bits in RAM. The whole list is kept in RAM, and also with its higher 32 bits in EDX and its lower 32 bits in EAX.
  13. Here is the function ExInterlockedFlushSList from XP SP3, relocated by me:

      .data:004762B2 ; Exported entry 7. ExInterlockedFlushSList
      .data:004762B2 ; =============== S U B R O U T I N E =======================================
      .data:004762B2                 public ExInterlockedFlushSList
      .data:004762B2 ExInterlockedFlushSList proc near       ; CODE XREF: sub_45F0DF:loc_45F0F7p
      .data:004762B2                                         ; DATA XREF: .edata:off_5AC2A8o
      .data:004762B2                 push    ebx
      .data:004762B3                 push    ebp
      .data:004762B4                 xor     ebx, ebx
      .data:004762B6                 mov     ebp, ecx
      .data:004762B8                 mov     edx, [ebp+4]
      .data:004762BB                 mov     eax, [ebp+0]
      .data:004762BE loc_4762BE:                             ; CODE XREF: ExInterlockedFlushSList+19j
      .data:004762BE                 or      eax, eax
      .data:004762C0                 jz      short loc_4762CD
      .data:004762C2                 mov     ecx, edx
      .data:004762C4                 mov     cx, bx
      .data:004762C7                 cmpxchg8b qword ptr [ebp+0]
      .data:004762CB                 jnz     short loc_4762BE
      .data:004762CD loc_4762CD:                             ; CODE XREF: ExInterlockedFlushSList+Ej
      .data:004762CD                 pop     ebp
      .data:004762CE                 pop     ebx
      .data:004762CF                 retn
      .data:004762CF ExInterlockedFlushSList endp
      .data:004762CF ; ---------------------------------------------------------------------------
  14. Now we come to the whole work of the function ExInterlockedFlushSList in XP SP3. After being called, the function starts with:

      push ebx ; Push the value of the ebx register onto the stack to save it there; the value itself is not changed.
      push ebp ; Push the value of the ebp register onto the stack to save it there; the value itself is not changed.
      xor ebx, ebx ; Set the ebx register to zero (EBX = 00 00 00 00) by a bitwise XOR with itself.
      mov ebp, ecx ; Copy the value of the ecx register into the ebp register (the ECX value has to be prepared outside this function).
      mov edx, [ebp+4] ; Copy the high 32-bit value stored at the RAM address [ebp+4] into the edx register (ebp was just loaded from ecx).
      mov eax, [ebp+0] ; Copy the low 32-bit value stored at the RAM address [ebp+0] into the eax register (ebp was just loaded from ecx).

      Now we have an empty ebx, the lower 32 bits in RAM at the address from ecx, and the higher 32 bits at the address from ecx.

      or eax, eax ; If eax was zero, the zero flag will be set. If eax was non-zero, the zero flag will be cleared.
      jz short loc_4762CD ; If EAX was zero, we jump (short) over the whole compare, to address 4762CD.
      mov ecx, edx ; Now we move the content of edx to ecx. The old content of ecx is lost, the content of edx is still kept. But the old content of ecx was (see above) already saved in ebp. ECX now holds the higher 32 bits of the 64 bits in RAM.
      mov cx, bx ; cx is the lower 16 bits of the ecx register, bx the lower 16 bits of the ebx register. mov cx, bx copies the lower 16 bits of ebx (bx) into the lower 16 bits of ecx (cx). The upper 16 bits of both ebx and ecx remain unchanged.

      This means: in ECX only the 2 highest bytes of the 64 bits in memory survive. They can also be 00 00. So it is not impossible that ECX = 00 00 00 00, but only when the 2 highest bytes of the 64 bits in memory are also 00 00.

      Example:
      EBX = 0x12345678 (upper 16 bits: 0x1234, lower 16 bits: 0x5678)
      ECX = 0x98765432 (upper 16 bits: 0x9876, lower 16 bits: 0x5432)
      Now mov cx, bx:
      EBX remains unchanged (0x12345678). ECX has only its lower 16 bits replaced with the lower 16 bits from bx = 0x5678. The upper 16 bits of ECX remain the same (0x9876). So the only change from mov cx, bx in this example is ECX = 0x98765678.

      jnz short loc_4762BE ; If the operation cmpxchg8b qword ptr [ebp+0] changes RAM via EBX, the zero flag is set. Then we leave the loop and continue with the next opcode after this jnz short loc_4762BE instruction. If the bits in EAX and the lower 32 bits of the 64 bits in RAM are not identical, cmpxchg8b qword ptr [ebp+0] does nothing at all to any memory or register. But the zero flag is not set, so the jump to loc_4762BE happens.
      pop ebp ; Fetch the topmost value from the stack, store it in the ebp register and remove it from the top of the stack.
      pop ebx ; Fetch the now topmost value from the stack, store it in the ebx register and remove it from the stack.
      retn ; Return from the function ExInterlockedFlushSList to the caller. This removes the return address from the stack (the address the function was called from) and jumps to it, effectively resuming execution at the point where the function was called.
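The `mov cx, bx` example can be checked with a few lines of C. This is only an illustration of the bit arithmetic; `mov_cx_bx` is a made-up helper name.

```c
#include <stdint.h>

/* Model of `mov cx, bx`: the low 16 bits of ecx are replaced by the
   low 16 bits of ebx; the high 16 bits of ecx are kept. */
static uint32_t mov_cx_bx(uint32_t ecx, uint32_t ebx)
{
    return (ecx & 0xFFFF0000u) | (ebx & 0x0000FFFFu);
}
```

`mov_cx_bx(0x98765432, 0x12345678)` gives 0x98765678, as in the example. With EBX = 0, as in the flush routine, only the upper 16 bits of the old ECX survive: `mov_cx_bx(0x11223344, 0)` gives 0x11220000.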
  15. Now I will describe as well as I can how the function ExInterlockedFlushSList in XP SP3 works.

      cmpxchg8b works on 64 contiguous bits. Those 64 bits (8 bytes) sit at a given place in the memory (RAM) of the computer. Here they are passed to cmpxchg8b indirectly through the 32-bit register EBP. EBP holds a 32-bit address, which points exactly to the first byte of those 64 bits. Even though EBP in XP holds only a 32-bit address, cmpxchg8b qword ptr [ebp+0] works on all 64 bits starting at the RAM location given by ebp. The cmpxchg8b instruction works directly on these 64 bits in memory. So we have cmpxchg8b qword ptr [ebp+0].

      Example: the 64 bits in memory are 0x1122334455667788. 11223344 are the higher 32 bits, 55667788 the lower 32 bits. EAX holds 0x55667788, EDX holds 94712056 (any value). As I understand it, only the 32 bits in EAX are compared via cmpxchg8b with the 64 bits in RAM (only the lower 32 bits of each are compared). This behavior is because we have a 32-bit OS. The higher bits in EDX are just ignored, and so are the higher 32 bits of the 64 bits in RAM.

      By the way, this means that when we use "lock cmpxchg" in a simulation, it makes no sense to use "lock cmpxchg" 2 times. Here we need the "lock" because only cmpxchg8b is atomic by itself, meaning no other processor can disturb the memory during its compare operation. For cmpxchg this is only guaranteed with the lock prefix before it.

      In my example the lower 32 bits in RAM and in EAX are identical. In this case, the lower 32 bits (of the 64-bit value in memory) are replaced with the 32 bits stored in ebx. But EBX = 00 00 00 00. This means the real list in memory is filled with 00 up to half, from the bottom. From a 32-bit view, this list is now completely empty. The higher 32 bits in RAM are not changed, whatever is there and whatever is in EDX. The zero flag is set after a change happens.

      If the bits in EAX and the lower 32 bits of the 64 bits in RAM are not identical, cmpxchg8b does nothing with the 64 bits in memory and also changes nothing in EAX, EDX, EBX, ECX, EBP. So in this case cmpxchg8b has the same effect as 90 90 90 90. The zero flag is NOT set.

      Now I see what happens with my attempt, when I just replace cmpxchg8b qword ptr [ebp+0] with 90 90 90 90. At once I have an infinite loop, because the zero flag is never set. It is unclear to me why this loop is there. In my eyes, on the first try both lower 32-bit values are identical and are exchanged for 00 00 00 00.
  16. @PPeti66x A nice solution for this would be an intermediate layer between the opcode and the CPU: at the moment a file asks the 486 CPU to execute the cmpxchg8b opcode, this layer performs exactly the same operations on all the registers that cmpxchg8b does. It would be like a software simulation of cmpxchg8b sitting directly in front of the CPU. The program for this can be written in C. I will call it 486.dll. At once, such an XP would work on 386, 486, 586 and 686 CPUs with full functionality. And once this has been done one time (crazy work), other opcodes unknown to other CPUs can be handled the same way.
      Dietmar

      PS: @Mov AX, 0xDEAD I remember that you have a tool that can check whether 2 binaries do the same thing.
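A first step for any such simulation layer is recognizing the instruction bytes. CMPXCHG8B m64 is encoded as 0F C7 /1, optionally preceded by the LOCK prefix F0, and a CPU that lacks the instruction raises an invalid-opcode exception when it meets these bytes. The following is a minimal sketch in C of the recognition step only. `match_cmpxchg8b` is a hypothetical helper, it ignores other prefixes such as segment overrides, and it does not decode the memory operand.

```c
#include <stddef.h>
#include <stdint.h>

/* If `code` starts with [F0] 0F C7 /1 (CMPXCHG8B m64), return the
   offset of the ModRM byte; otherwise return 0. */
static size_t match_cmpxchg8b(const uint8_t *code, size_t len)
{
    size_t i = 0;
    if (len > 0 && code[0] == 0xF0)
        i = 1;                                   /* optional LOCK prefix */
    if (len - i < 3)
        return 0;                                /* too short to hold 0F C7 ModRM */
    if (code[i] != 0x0F || code[i + 1] != 0xC7)
        return 0;

    uint8_t modrm = code[i + 2];
    if (((modrm >> 3) & 7) != 1)
        return 0;                                /* reg field must be /1 */
    if ((modrm >> 6) == 3)
        return 0;                                /* operand must be memory */
    return i + 2;
}
```

For example, `cmpxchg8b qword ptr [ebp+0]` assembles to 0F C7 4D 00, for which the function returns 2. With the lock prefix (F0 0F C7 4D 00) it returns 3.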
  17. For the 486 CPU, this AMD64 code would look something like this.
      Dietmar

      8B 01           ; mov eax, [ecx]
      25 00 00 00 FE  ; and eax, 0fe000000H
      #ifdef NTOS_KERNEL_RUNTIME
      80 38 01        ; cmp byte ptr [eax], 1
      F6 D1           ; not cl
      D1 C2           ; ror eax, 1
      C1 F8 2B        ; sar eax, 43
      #else
      C1 F8 2A        ; sar eax, 42
      #endif
      C3              ; ret
  18. I think this is a crazy nice way to overcome this problem, and it works on every CPU. The only problem is translating this code from AMD64 with 64-bit registers to x86 (486 CPU) with 32-bit registers. I think it can be done.
      Dietmar

      For 64 bit:

              mov     rax, [rcx]              ; get address, sequence, and depth
              and     rax, 0fe000000H         ; isolate packed address
      ;
      ; The following code takes advantage of the fact that the high order bit
      ; for user mode addresses is zero and for system addresses is one.
      ;
      ifdef NTOS_KERNEL_RUNTIME
              cmp     rax, 1                  ; set carry if address is zero
              cmc                             ; set carry if address is not zero
              rcr     rax, 1                  ; rotate carry into high bit
              sar     rax, 63 - 43            ; extract first entry address
      else
              shr     rax, 63 - 42            ; extract first entry address
      endif
              ret                             ; return
  19. @PPeti66x Hi, not so far. But when I look at the source code of XP SP1 or even 64-bit XP, each contains a file named slist.asm. The 64-bit version is from 2000, same author. But the 64-bit slist.asm takes another route and does not use cmpxchg8b. If these 64-bit opcodes can be translated into x86 code, this would also be a possibility. But I have no idea how to achieve this in hex code.

      The x86 file is unchanged since 1996 for NT4. "Only" the cmpxchg8b opcode has to be simulated with opcodes from the 486 CPU. Cutler in 1996, and so also for XP SP1, solved this problem by just "jumping" over this opcode during assembly for .386 (and not .586). I tried 90 90 90 90 for every appearance of this opcode in ntoskrnl.exe from XP SP3. But BSOD. Strange, because XP SP3 also uses exactly the same code from Cutler from 1996, as you can see in ntoskrnl.exe via IDA Pro. It is a heavy memory operation, and with more than 1 CPU there can be problems with this "jump". But the 486 has only 1 processor, so it may be possible.
      Dietmar

      EDIT: With WinDbg, starting with bu ExInterlockedFlushSList, I get to the entry point of my modded driver. And via trace (t, F8) I can see that the code in my modded driver was fully entered and left with retn, no BSOD.

      slist.asm XP SP1

              title  "Interlocked Support"
      ;++
      ;
      ; Copyright (c) 1996 Microsoft Corporation
      ;
      ; Module Name:
      ;
      ;    slist.asm
      ;
      ; Abstract:
      ;
      ;    This module implements functions to support interlocked S-List
      ;    operations.
      ;
      ; Author:
      ;
      ;    David N. Cutler (davec) 13-Mar-1996
      ;
      ; Environment:
      ;
      ;    Any mode.
      ;
      ; Revision History:
      ;
      ;--
      .386p
              .xlist
      include ks386.inc
      include callconv.inc            ; calling convention macros
      include mac386.inc
              .list

      _TEXT$00   SEGMENT DWORD PUBLIC 'CODE'
              ASSUME  DS:FLAT, ES:FLAT, SS:NOTHING, FS:NOTHING, GS:NOTHING

              page , 132
              subttl  "Interlocked Flush Sequenced List"
      ;++
      ;
      ; PSINGLE_LIST_ENTRY
      ; FASTCALL
      ; RtlpInterlockedFlushSList (
      ;    IN PSINGLE_LIST_ENTRY ListHead
      ;    )
      ;
      ; Routine Description:
      ;
      ;    This function removes the entire list from a sequenced singly
      ;    linked list so that access to the list is synchronized in an MP system.
      ;    If there are no entries in the list, then a value of NULL is returned.
      ;    Otherwise, the address of the entry at the top of the list is removed
      ;    and returned as the function value and the list header is set to point
      ;    to NULL.
      ;
      ; Arguments:
      ;
      ;    (ecx) = ListHead - Supplies a pointer to the sequenced listhead from
      ;         which the list is to be flushed.
      ;
      ; Return Value:
      ;
      ;    The address of the entire current list, or NULL if the list is
      ;    empty.
      ;
      ;--

      ;
      ; These old interfaces just fall into the new ones
      ;
      cPublicFastCall ExInterlockedFlushSList, 1
      fstENDP ExInterlockedFlushSList

      cPublicFastCall RtlpInterlockedFlushSList, 1
      cPublicFpo 0,1

      ;
      ; Save nonvolatile registers and read the listhead sequence number followed
      ; by the listhead next link.
      ;
      ; N.B. These two dwords MUST be read exactly in this order.
      ;
              push    ebx                     ; save nonvolatile registers
              push    ebp                     ;
              xor     ebx, ebx                ; zero out new pointer
              mov     ebp, ecx                ; save listhead address
              mov     edx, [ebp] + 4          ; get current sequence number
              mov     eax, [ebp] + 0          ; get current next link

      ;
      ; N.B. The following code is the retry code should the compare
      ;      part of the compare exchange operation fail
      ;
      ; If the list is empty, then there is nothing that can be removed.
      ;
      Efls10: or      eax, eax                ; check if list is empty
              jz      short Efls20            ; if z set, list is empty
              mov     ecx, edx                ; copy sequence number
              mov     cx, bx                  ; clear depth leaving sequence number

      .586
      ifndef NT_UP
         lock cmpxchg8b qword ptr [ebp]       ; compare and exchange
      else
              cmpxchg8b qword ptr [ebp]       ; compare and exchange
      endif
      .386
              jnz     short Efls10            ; if z clear, exchange failed

      ;
      ; Restore nonvolatile registers and return result.
      ;
      cPublicFpo 0,0
      Efls20: pop     ebp                     ; restore nonvolatile registers
              pop     ebx                     ;
              fstRET  RtlpInterlockedFlushSList

      fstENDP RtlpInterlockedFlushSList

              page , 132
              subttl  "Interlocked Pop Entry Sequenced List"
      ;++
      ;
      ; PVOID
      ; FASTCALL
      ; RtlpInterlockedPopEntrySList (
      ;    IN PSLIST_HEADER ListHead
      ;    )
      ;
      ; Routine Description:
      ;
      ;    This function removes an entry from the front of a sequenced singly
      ;    linked list so that access to the list is synchronized in an MP system.
      ;    If there are no entries in the list, then a value of NULL is returned.
      ;    Otherwise, the address of the entry that is removed is returned as the
      ;    function value.
      ;
      ; Arguments:
      ;
      ;    (ecx) = ListHead - Supplies a pointer to the sequenced listhead from
      ;         which an entry is to be removed.
      ;
      ; Return Value:
      ;
      ;    The address of the entry removed from the list, or NULL if the list is
      ;    empty.
      ;
      ;--

      ;
      ; These older interfaces just fall into the new code below
      ;
      cPublicFastCall InterlockedPopEntrySList, 1
      fstENDP InterlockedPopEntrySList

      cPublicFastCall ExInterlockedPopEntrySList, 2
      fstENDP ExInterlockedPopEntrySList

      cPublicFastCall RtlpInterlockedPopEntrySList, 1
      cPublicFpo 0,2

      ;
      ; Save nonvolatile registers and read the listhead sequence number followed
      ; by the listhead next link.
      ;
      ; N.B. These two dwords MUST be read exactly in this order.
      ;
              push    ebx                     ; save nonvolatile registers
              push    ebp                     ;
              mov     ebp, ecx                ; save listhead address

      ;
      ; N.B. The following code is the continuation address should a fault
      ;      occur in the rare case described below.
      ;
              public  ExpInterlockedPopEntrySListResume
              public  _ExpInterlockedPopEntrySListResume@0
      ExpInterlockedPopEntrySListResume:      ;
      _ExpInterlockedPopEntrySListResume@0:   ;
              mov     edx,[ebp] + 4           ; get current sequence number
              mov     eax,[ebp] + 0           ; get current next link

      ;
      ; If the list is empty, then there is nothing that can be removed.
      ;
      Epop10: or      eax, eax                ; check if list is empty
              jz      short Epop20            ; if z set, list is empty
              lea     ecx, [edx-1]            ; Adjust depth only

      ;
      ; N.B. It is possible for the following instruction to fault in the rare
      ;      case where the first entry in the list is allocated on another
      ;      processor and free between the time the free pointer is read above
      ;      and the following instruction. When this happens, the access fault
      ;      code continues execution by skipping the following instruction.
      ;      This results in the compare failing and the entire operation is
      ;      retried.
      ;
              public  ExpInterlockedPopEntrySListFault
      ExpInterlockedPopEntrySListFault:       ;
              mov     ebx, [eax]              ; get address of successor entry

              public  _ExpInterlockedPopEntrySListEnd@0
      _ExpInterlockedPopEntrySListEnd@0:      ;

      .586
      ifndef NT_UP
         lock cmpxchg8b qword ptr [ebp]       ; compare and exchange
      else
              cmpxchg8b qword ptr [ebp]       ; compare and exchange
      endif
      .386
              jnz     short Epop10            ; if z clear, exchange failed

      ;
      ; Restore nonvolatile registers and return result.
      ;
      cPublicFpo 0,0
      Epop20: pop     ebp                     ; restore nonvolatile registers
              pop     ebx                     ;
              fstRET  RtlpInterlockedPopEntrySList

      fstENDP RtlpInterlockedPopEntrySList

              page , 132
              subttl  "Interlocked Push Entry Sequenced List"
      ;++
      ;
      ; PVOID
      ; FASTCALL
      ; RtlpInterlockedPushEntrySList (
      ;    IN PSLIST_HEADER ListHead,
      ;    IN PVOID ListEntry
      ;    )
      ;
      ; Routine Description:
      ;
      ;    This function inserts an entry at the head of a sequenced singly linked
      ;    list so that access to the list is synchronized in an MP system.
      ;
      ; Arguments:
      ;
      ;    (ecx) ListHead - Supplies a pointer to the sequenced listhead into which
      ;          an entry is to be inserted.
      ;
      ;    (edx) ListEntry - Supplies a pointer to the entry to be inserted at the
      ;          head of the list.
      ;
      ; Return Value:
      ;
      ;    Previous contents of ListHead. NULL implies list went from empty
      ;    to not empty.
      ;
      ;--

      ;
      ; This old interface just fall into the new code below.
      ;
      cPublicFastCall ExInterlockedPushEntrySList, 3
              pop     [esp]                   ; Drop the lock argument
      fstENDP ExInterlockedPushEntrySList

      cPublicFastCall InterlockedPushEntrySList, 2
      fstENDP InterlockedPushEntrySList

      cPublicFastCall RtlpInterlockedPushEntrySList, 2
      cPublicFpo 0,2

      ;
      ; Save nonvolatile registers and read the listhead sequence number followed
      ; by the listhead next link.
      ;
      ; N.B. These two dwords MUST be read exactly in this order.
      ;
              push    ebx                     ; save nonvolatile registers
              push    ebp                     ;
              mov     ebp, ecx                ; save listhead address
              mov     ebx, edx                ; save list entry address
              mov     edx,[ebp] + 4           ; get current sequence number
              mov     eax,[ebp] + 0           ; get current next link
      Epsh10: mov     [ebx], eax              ; set next link in new first entry
              lea     ecx, [edx+010001H]      ; increment sequence number and depth

      .586
      ifndef NT_UP
         lock cmpxchg8b qword ptr [ebp]       ; compare and exchange
      else
              cmpxchg8b qword ptr[ebp]        ; compare and exchange
      endif
      .386
              jnz     short Epsh10            ; if z clear, exchange failed

      ;
      ; Restore nonvolatile registers and return result.
      ;
      cPublicFpo 0,0
              pop     ebp                     ; restore nonvolatile registers
              pop     ebx                     ;
              fstRET  RtlpInterlockedPushEntrySList

      fstENDP RtlpInterlockedPushEntrySList

      ;++
      ;
      ; SINGLE_LIST_ENTRY
      ; FASTCALL
      ; InterlockedPushListSList (
      ;    IN PSLIST_HEADER ListHead,
      ;    IN PSINGLE_LIST_ENTRY List,
      ;    IN PSINGLE_LIST_ENTRY ListEnd,
      ;    IN ULONG Count
      ;    )
      ;
      ; Routine Description:
      ;
      ;    This function will push multiple entries onto an SList at once
      ;
      ; Arguments:
      ;
      ;    ListHead - List head to push the list to.
;
;     List - The list to add to the front of the SList
;     ListEnd - The last element in the chain
;     Count - The number of items in the chain
;
; Return Value:
;
;     PSINGLE_LIST_ENTRY - The old header pointer is returned
;
;--

cPublicFastCall InterlockedPushListSList, 4

        cPublicFpo 0,4
        push    ebx                     ; save nonvolatile registers
        push    ebp                     ;
        mov     ebp, ecx                ; save listhead address
        mov     ebx, edx                ; save list entry address
        mov     edx,[ebp] + 4           ; get current sequence number
        mov     eax,[ebp] + 0           ; get current next link
Epshl10:
        mov     ecx, [esp+4*3]          ; Fetch address of list tail
        mov     [ecx], eax              ; Store new forward pointer in tail entry
        lea     ecx, [edx+010000H]      ; increment sequence number
        add     ecx, [esp+4*4]          ; Add in new count to create correct depth

.586
ifndef NT_UP
        lock cmpxchg8b qword ptr [ebp]  ; compare and exchange
else
        cmpxchg8b qword ptr[ebp]        ; compare and exchange
endif
.386
        jnz     short Epshl10           ; if z clear, exchange failed

        cPublicFpo 0,0
        pop     ebp                     ; restore nonvolatile registers
        pop     ebx                     ;

        fstRET  InterlockedPushListSList

fstENDP InterlockedPushListSList

;++
;
; PSINGLE_LIST_ENTRY
; FirstEntrySList (
;     IN PSLIST_HEADER SListHead
;     )
;
; Routine Description:
;
;    This function returns the address of the first entry in the SLIST or
;    NULL.
;
; Arguments:
;
;    ListHead (rcx) - Supplies a pointer to the sequenced listhead from
;        which the first entry address is to be computed.
;
; Return Value:
;
;    The address of the first entry is the specified, or NULL if the list is
;    empty.
;
;--

cPublicProc _FirstEntrySList, 1
        cPublicFpo 1,0

        mov     eax, [esp+4]
        mov     eax, [eax]
        stdRET  _FirstEntrySList

stdENDP _FirstEntrySList

;++
;
; LONGLONG
; FASTCALL
; RtlInterlockedCompareExchange64 (
;     IN OUT PLONGLONG Destination,
;     IN PLONGLONG Exchange,
;     IN PLONGLONG Comperand
;     )
;
; Routine Description:
;
;    This function performs a compare and exchange of 64-bits.
;
; Arguments:
;
;    (ecx) Destination - Supplies a pointer to the destination variable.
;
;    (edx) Exchange - Supplies a pointer to the exchange value.
;
;    (esp+4) Comperand - Supplies a pointer to the comperand value.
;
; Return Value:
;
;    The current destination value is returned as the function value.
;
;--

cPublicFastCall RtlInterlockedCompareExchange64, 3

        cPublicFpo 0,2

;
; Save nonvolatile registers and read the exchange and comperand values.
;

        push    ebx                     ; save nonvolatile registers
        push    ebp                     ;
        mov     ebp, ecx                ; set destination address
        mov     ebx, [edx]              ; get exchange value
        mov     ecx, [edx] + 4          ;
        mov     edx, [esp] + 12         ; get comperand address
        mov     eax, [edx]              ; get comperand value
        mov     edx, [edx] + 4          ;

.586
ifndef NT_UP
        lock cmpxchg8b qword ptr [ebp]  ; compare and exchange
else
        cmpxchg8b qword ptr[ebp]        ; compare and exchange
endif
.386

;
; Restore nonvolatile registers and return result in edx:eax.
;

        cPublicFpo 0,0

        pop     ebp                     ; restore nonvolatile registers
        pop     ebx                     ;

        fstRET  RtlInterlockedCompareExchange64

fstENDP RtlInterlockedCompareExchange64

_TEXT$00 ends
        end
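The listings above update the dword at [ebp]+4 in three different ways: push adds 010001H, pop subtracts 1, and push-list adds 010000H plus Count. Going only by the source comments, this dword seems to pack the depth into the low 16 bits and the sequence number into the high 16 bits. A minimal C sketch of that arithmetic follows; the function names are hypothetical and not taken from any Windows header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the dword at [ebp]+4 in the listings:
   low 16 bits = depth, high 16 bits = sequence number. */

/* push: lea ecx, [edx+010001H] -> sequence + 1 and depth + 1 */
static inline uint32_t after_push(uint32_t depth_seq)
{
    return depth_seq + 0x00010001u;
}

/* pop: lea ecx, [edx-1] -> depth - 1 only (list is known non-empty) */
static inline uint32_t after_pop(uint32_t depth_seq)
{
    return depth_seq - 1u;
}

/* flush: mov cx, bx with ebx = 0 -> depth cleared, sequence kept */
static inline uint32_t after_flush(uint32_t depth_seq)
{
    return depth_seq & 0xFFFF0000u;
}

/* push list: lea ecx, [edx+010000H] then add ecx, Count */
static inline uint32_t after_push_list(uint32_t depth_seq, uint32_t count)
{
    return depth_seq + 0x00010000u + count;
}

static inline uint16_t depth_of(uint32_t depth_seq)
{
    return (uint16_t)(depth_seq & 0xFFFFu);
}

static inline uint16_t sequence_of(uint32_t depth_seq)
{
    return (uint16_t)(depth_seq >> 16);
}
```

The point of the sequence half is that two headers with the same next link can still differ, so the 8-byte compare notices a push that happened in between.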
  20. Hi,
I also found this, but I have no idea how to make a simulation for the 486 cpu from it, because it has a retn, and a second retn is not good in a function.
Dietmar

The single instruction

        lock cmpxchg8b qword ptr [ebp]

is replaceable with the following sequence

        pushfd
try:
        cli
        lock bts dword ptr [edi],0
        jnb     acquired
        popfd
        pushfd
wait:
        test    dword ptr [edi],1
        je      try
        pause                           ; if available
        jmp     wait
acquired:
        cmp     eax,[ebp]
        jne     keep
        cmp     edx,[ebp+4]
        je      exchange
keep:
        mov     eax,[ebp]
        mov     edx,[ebp+4]
        jmp     done
exchange:
        mov     [ebp],ebx
        mov     [ebp+4],ecx
done:
        mov     byte ptr [edi],0
        popfd

and this

        lock cmpxchg8b qword ptr [esi]

is replaceable with the following sequence

        pushfd
try:
        cli
        lock bts dword ptr [edi],0
        jnb     acquired
        popfd
        pushfd
wait:
        test    dword ptr [edi],1
        je      try
        pause                           ; if available
        jmp     wait
acquired:
        cmp     eax,[esi]
        jne     keep
        cmp     edx,[esi+4]
        je      exchange
keep:
        mov     eax,[esi]
        mov     edx,[esi+4]
        jmp     done
exchange:
        mov     [esi],ebx
        mov     [esi+4],ecx
done:
        mov     byte ptr [edi],0
        popfd
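To make the logic of the quoted sequence easier to follow, here is a minimal C sketch of the same idea: a lock bit guards an ordinary compare and store. It is a model under assumptions, not a drop-in kernel routine. `emu_lock` is a hypothetical stand-in for the bit at [edi], the interrupt handling (cli / popfd) cannot be expressed in portable C and is left out, and the return value plays the role of ZF. One thing worth noting about the assembly as quoted: the final popfd restores the flags saved before the compare, so a following jnz would not see the compare result unless ZF is carried some other way.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the lock bit at [edi]. */
static atomic_flag emu_lock = ATOMIC_FLAG_INIT;

/* Models "lock cmpxchg8b [dest]" with a lock bit.
   *expected plays EDX:EAX, desired plays ECX:EBX,
   the return value plays ZF (true = exchanged). */
static inline bool emulated_cas64(uint64_t *dest, uint64_t *expected,
                                  uint64_t desired)
{
    bool equal;

    /* try/wait: spin until the bit was clear before we set it (lock bts / jnb) */
    while (atomic_flag_test_and_set_explicit(&emu_lock, memory_order_acquire)) {
        /* busy-wait; the assembly uses pause here when available */
    }

    equal = (*dest == *expected);   /* acquired: the two cmp instructions */
    if (equal)
        *dest = desired;            /* exchange: store ECX:EBX */
    else
        *expected = *dest;          /* keep: reload EDX:EAX */

    /* done: mov byte ptr [edi],0 */
    atomic_flag_clear_explicit(&emu_lock, memory_order_release);
    return equal;
}
```

This only protects the 8 bytes if every writer goes through the same lock bit. Code that still touches the header with a real cmpxchg8b or with plain stores is not held back by it.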
  21. Hi,
I am trying to install XP SP3 on the Shuttle Hot 433 board with a 486 cpu. But very early in Setup comes a message that the 486 cpu does not support the opcode cmpxchg8b, so XP can't be installed. I also tried an XP SP3 installation from another computer in IDE mode; it crashes at once.

Now I look with IDA Pro for this cmpxchg8b in a ready XP SP3 install. On a first try I find it in ntoskrnl.exe (one cpu) and in ntdll.dll. There may be other PE files in XP with this opcode, too. The use is always the same: this opcode does an atomic 64-bit compare-and-exchange on memory. So, when a working solution is found, the replacement in other files is easy!

I try to replace it with a series of opcodes that the 486 cpu understands. This is not easy. I found this (Edit: This is wrong).

        push    ebx                     ; save nonvolatile registers
        push    ebp
        xor     ebx, ebx                ; zero out new pointer
        mov     ebp, ecx                ; save listhead address
        mov     edx, [ebp] + 4          ; get current sequence number
        mov     eax, [ebp] + 0          ; get current next link
Efls10: or      eax, eax                ; check if list is empty
        jz      short Efls20            ; if z set, list is empty
        mov     ecx, edx                ; copy sequence number
        mov     cx, bx                  ; clear depth leaving sequence number
        jnz     short Efls10            ; if z clear, exchange failed
Efls20: pop     ebp                     ; restore nonvolatile registers
        pop     ebx
        ret

I tried this as a replacement for the function ExInterlockedFlushSList in ntoskrnl.exe in XP SP3. The funny thing is that the opcode cmpxchg8b qword ptr [ebp+0] is simply deleted. Maybe it works on NT4, but for me it crashes XP.

EDIT: Maybe this version of ExInterlockedFlushSList for the i386 cpu really works only on a computer with 1 cpu and 1 core, like a 486 from 1992. Then my test on a modern computer will fail. It can also be that I now use a mix of cmpxchg8b, nothing in its place, and cmpxchg on one computer, because I simulated only one appearance of this function in ntoskrnl.exe.

Funny, this is from Cutler, 13.
March 1996, now also identical in XP SP3.

This is the original ExInterlockedFlushSList in XP SP3, first introduced in NT4 Service Pack 4. Hex code:

53 55 33 DB 8B E9 8B 55 04 8B 45 00 0B C0 74 0B 8B CA 66 8B CB 0F C7 4D 00 75 F1 5D 5B C3

.text:0040B0B2 ; Exported entry 7. ExInterlockedFlushSList
.text:0040B0B2
.text:0040B0B2 ; =============== S U B R O U T I N E =======================================
.text:0040B0B2
.text:0040B0B2
.text:0040B0B2                 public ExInterlockedFlushSList
.text:0040B0B2 ExInterlockedFlushSList proc near       ; CODE XREF: sub_45F0DF:loc_45F0F7p
.text:0040B0B2                 push    ebx
.text:0040B0B3                 push    ebp
.text:0040B0B4                 xor     ebx, ebx
.text:0040B0B6                 mov     ebp, ecx
.text:0040B0B8                 mov     edx, [ebp+4]
.text:0040B0BB                 mov     eax, [ebp+0]
.text:0040B0BE
.text:0040B0BE loc_40B0BE:                             ; CODE XREF: ExInterlockedFlushSList+19j
.text:0040B0BE                 or      eax, eax
.text:0040B0C0                 jz      short loc_40B0CD
.text:0040B0C2                 mov     ecx, edx
.text:0040B0C4                 mov     cx, bx
.text:0040B0C7                 cmpxchg8b qword ptr [ebp+0]
.text:0040B0CB                 jnz     short loc_40B0BE
.text:0040B0CD
.text:0040B0CD loc_40B0CD:                             ; CODE XREF: ExInterlockedFlushSList+Ej
.text:0040B0CD                 pop     ebp
.text:0040B0CE                 pop     ebx
.text:0040B0CF                 retn
.text:0040B0CF ExInterlockedFlushSList endp
.text:0040B0CF
.text:0040B0CF ; ---------------------------------------------------------------------------

With PE Maker I relocate this function in ntoskrnl.exe. This works(!). I do the relocation because the following replacement is bigger than the original hex code.

I split the cmpxchg8b opcode into 2 parts with lock cmpxchg, because the 486 cpu understands this. But BSOD. I use WinDbg, but can't catch the reason. I checked my hex code several times and found no error. The only thing that can happen, in my eyes, is missing synchronization between the 2 cmpxchg. This does not happen with cmpxchg8b, because there the whole 8-byte compare and exchange is done by one instruction.
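The worry about missing synchronization between two 32-bit cmpxchg can be shown with a small deterministic C model. This is only a sketch under assumptions: `split_cas64`, `whole_cas64` and `push_entry` are hypothetical names, and the callback `between` stands for whatever another cpu or an interrupt handler does in the gap between the two halves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint32_t lo, hi; } pair64;   /* lo = next link, hi = depth/sequence */
typedef void (*interloper_fn)(pair64 *);

/* One 32-bit compare-and-exchange step. */
static inline bool cas32(uint32_t *dest, uint32_t expected, uint32_t desired)
{
    if (*dest != expected)
        return false;
    *dest = desired;
    return true;
}

/* Two separate 32-bit steps; `between` runs in the gap. */
static inline bool split_cas64(pair64 *dest, pair64 expected, pair64 desired,
                               interloper_fn between)
{
    bool hi_ok = cas32(&dest->hi, expected.hi, desired.hi);
    if (between)
        between(dest);
    bool lo_ok = cas32(&dest->lo, expected.lo, desired.lo);
    return hi_ok && lo_ok;
}

/* All-or-nothing 8-byte step, as cmpxchg8b behaves. */
static inline bool whole_cas64(pair64 *dest, pair64 expected, pair64 desired)
{
    if (dest->lo != expected.lo || dest->hi != expected.hi)
        return false;
    *dest = desired;
    return true;
}

/* Hypothetical interloper: pushes one entry (new link, sequence and depth + 1). */
static inline void push_entry(pair64 *h)
{
    h->lo = 0x2000u;
    h->hi += 0x00010001u;
}
```

Starting from a header with link 0x1000 and depth 1, a flush through `split_cas64` with `push_entry` in the gap reports failure, yet it has already cleared the depth. The header ends with depth 1 although two entries are linked. With `whole_cas64` the same race just fails and leaves depth 2.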
Here is my last try for the replacement of ExInterlockedFlushSList:

.data:004762B2 ; ---------------------------------------------------------------------------
.data:004762B2 ; Exported entry 7. ExInterlockedFlushSList
.data:004762B2
.data:004762B2                 public ExInterlockedFlushSList
.data:004762B2 ExInterlockedFlushSList:                ; CODE XREF: sub_45F0DF:loc_45F0F7p
.data:004762B2                                         ; DATA XREF: .edata:off_5AC2A8o
.data:004762B2                 push    ebx
.data:004762B3                 push    ebp
.data:004762B4                 xor     ebx, ebx
.data:004762B6                 mov     ebp, ecx
.data:004762B8                 mov     edx, [ebp+4]
.data:004762BB                 mov     eax, [ebp+0]
.data:004762BE
.data:004762BE loc_4762BE:                             ; CODE XREF: .data:004762D5j
.data:004762BE                 or      eax, eax
.data:004762C0                 jz      short loc_4762DA
.data:004762C2                 mov     ecx, edx
.data:004762C4                 mov     cx, bx
.data:004762C7                 lock cmpxchg [ebp+4], eax
.data:004762CC                 mov     ecx, edx
.data:004762CE                 mov     edx, ecx
.data:004762D0                 lock cmpxchg [ebp+0], eax
.data:004762D5                 jnz     short near ptr loc_4762BE+1
.data:004762D7                 nop
.data:004762D8                 nop
.data:004762D9                 nop
.data:004762DA
.data:004762DA loc_4762DA:                             ; CODE XREF: .data:004762C0j
.data:004762DA                 pop     ebp
.data:004762DB                 pop     ebx
.data:004762DC                 nop
.data:004762DD                 nop
.data:004762DE                 nop
.data:004762DF                 retn
.data:004762DF ; ---------------------------------------------------------------------------

I put this via relocation at the new address 4762B2. This is in the .data section and not in the .text section. But this does not matter, because when I put the original hex code at this new place, it works. The original place at 40B0B2 I fill with 00 00 00.. to make sure that my function at the new place is now used.

I want to get better in assembler. There is no free AI for assembler on the Internet. Do you have an idea @Mov AX, 0xDEAD? ChatGPT, Bard AI and Bing behave like crazy when it comes to hex code.
Dietmar
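Two hedged observations on the listings in this post, reading only what is printed there. In the version with cmpxchg8b deleted, nothing ever writes the list header, and the final jnz tests the ZF left by or eax, eax, so on a non-empty list it would loop forever. In the split version, lock cmpxchg [ebp+4], eax uses eax as the source register, so a successful compare stores the same value back, and the jump to loc_4762BE+1 lands inside the two-byte or eax, eax. As a reference for what the routine is meant to do, here is a minimal single-cpu C model. `slist_model`, `cas8_uniprocessor` and `flush_model` are hypothetical names, and the compare-and-store is only safe if nothing can interrupt it (one core, interrupts off).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the 8-byte list header used by the listings. */
typedef struct {
    uint32_t next;       /* [ebp+0]: first entry, 0 when the list is empty */
    uint32_t depth_seq;  /* [ebp+4]: depth in low 16 bits, sequence in high 16 */
} slist_model;

/* Stand-in for cmpxchg8b on one cpu with interrupts off; not atomic by itself. */
static inline bool cas8_uniprocessor(slist_model *h, slist_model *expected,
                                     slist_model desired)
{
    if (h->next == expected->next && h->depth_seq == expected->depth_seq) {
        *h = desired;            /* ZF = 1: store ECX:EBX */
        return true;
    }
    *expected = *h;              /* ZF = 0: reload EDX:EAX */
    return false;
}

/* Models ExInterlockedFlushSList: returns the old first entry, empties the list. */
static inline uint32_t flush_model(slist_model *h)
{
    slist_model seen = *h;                        /* mov edx,[ebp+4] / mov eax,[ebp+0] */
    while (seen.next != 0) {                      /* or eax,eax / jz */
        slist_model empty = { 0u, seen.depth_seq & 0xFFFF0000u }; /* ebx = 0, mov cx,bx */
        if (cas8_uniprocessor(h, &seen, empty))   /* cmpxchg8b / jnz */
            break;
    }
    return seen.next;                             /* result in eax */
}
```

The store inside the compare step is the part that actually empties the list, which is why removing the instruction without a substitute cannot work.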
  22. Hi,
for a few Euro I got a 486 board with an empty BIOS battery, a Shuttle Hot 433 v1. Oh crazy, I can't boot this computer without the Dallas battery. I came up with the idea to mod the BIOS so that it no longer waits on the CMOS error. I pulled the BIOS chip from another old computer, because only that chip is an EEPROM and can be flashed without crazy UV light. With the TL 866 Plus EEPROM programmer I read the BIOS out and modded it.

Now the freshly modded BIOS also recognizes my oldest 8.4 GByte harddisk, which was not recognized before. For full XP SP3 I need a harddisk of about 1 GByte at minimum.

The next problem was that this board does not recognize my memory, PCI graphics card, or mouse. The ISA card from an i386 computer is now recognized under the name Trident Super VGA. Still no mouse. The cache on this board is 256 kB. Right now I work with 4 MB, which was the only stick that was recognized until now, brrr.. Win98SE boots, not slow.

I added an AMD AM486 DX4-100 SV8T. Oh.. crazy to set those millions of jumpers. Something must be wrong in the heads of those manufacturers: for example there are 6 positions for one jumper, but sometimes they are counted vertically, sometimes horizontally and sometimes mixed. About 40 jumpers. This cpu wants 3 Volt, the board offers only 3.3 Volt, so I chose that. Voila, Win98SE works! XP SP3 will be tomorrow;))..
Dietmar

EDIT: The 100 MHz cpu runs hot without any cooler, heatsink or fan. The DX-33 MHz cpu before did not need a cooler at all.

EDIT2: I succeeded in installing 256 MB of RAM on this 486 board. But still no mouse and no working PCI graphics card.

EDIT3: The PCI GT610 graphics card is not recognized, maybe because it also offers HDMI and not only VGA(?!).
  23. Off topic: Soon I will try to install XP on an i386 cpu, because on a real i486 I already succeeded. I got a 386 DX-25 board, an Octek Jaguar II with 32 MB, and will report soon.
Dietmar
  24. I just got my Vobis Highscreen Tower from May 1992 back. New Dallas battery chip, 486 DX 33 MHz cpu. With the 2x CD-ROM that I bought in 1993. Oh.. so much fun to install XP SP3 there.
Dietmar