Everything posted by user57

  1. I tested that website on a native Win7 machine (with all updates) and an unmodified official Chrome 109. Supermium is already far ahead of that, as seen even on Win7. On a native Win10 machine with Edge, Win10's Edge might show a few more, but not all, while Supermium shows all of them.
  2. Hmm, for some reason I can only type a few words. I removed the links and it still does not work. Edited, because I could not write the entire text.
  3. Happy to see Chappell again. The first function he mentions shows a concrete use for the cmpxchg8b instruction: making a 64-bit change in 32-bit mode (ExInterlockedCompareExchange64). The sequence Chappell describes is actually the same as the code I wrote. Chappell also says Microsoft used that code if the cmpxchg8b instruction was not found, and that Microsoft stopped making that check as of Windows 5.1 (XP) (so we are a little smarter in that sense now).

     However, the next part describes a downside: Chappell says it needs a storage object for multiprocessors (that SLIST_HEADER structure might actually be such a storage object). But I use 2 moves for exactly what Chappell mentions (64-bit PTEs: I build up those entries, then move them in 2 steps, high and low parts), and it does not cause a crash for what I used it for. That might need confirmation from others; maybe it is worth a try (and Dietmar has a 486 CPU, which does not use multiple processors or hyperthreading anyway). A thread/processor switch takes time. If that happened at random, the entire kernel would interfere with itself all the time, the biggest BSOD I can think of.

     Dietmar might be able to use the next part from Chappell, which describes what Microsoft does to test whether that instruction is available. He says that (before WinXP) Microsoft checks it by masking EFLAGS with the mask 0x00200000. If that cannot be done, there is no CPUID instruction (that information is already "almost" enough to not use cmpxchg8b). But Microsoft does it correctly: if the first mask check succeeds, Microsoft uses the CPUID instruction and checks the CX8 flag. This makes certain that CPUID is available and also that cmpxchg8b is available (just in case the CPU has CPUID but does not support cmpxchg8b). I think Dietmar can use that information if he makes the 486 DLL he mentioned and wants to do the check correctly.
  4. And the firmware translates this correctly? It would make sense that the hard drive's firmware actually knows this and translates these to physical places on the real hard drive. If the partition can be filled with the clusters the way you want them, what is even the problem?
  5. Cixert, the creator of this thread, has mentioned other methods. It always came down to using bigger sectors, and Milkinis mentioned it again. Some say that already worked for them. It is a similar discussion: https://msfn.org/board/topic/176480-2-tib-limit-size-in-mbr-hard-drives/#comments

     User-mode wise it does not seem a problem to me, since it uses the OVERLAPPED structure. It contains 2 x 32-bit offsets (64 bits in total), which get translated to a physical address on the hard drive (I think I recently pointed out somewhere that 64 bits are passed via a structure): https://learn.microsoft.com/en-us/windows/win32/api/minwinbase/ns-minwinbase-overlapped

     The hard drive is a good example of why and how more than 32 bits were passed. There already were HDDs with more than 4 GB, so we have an established method, because hard drives reached that range a lot earlier than RAM did.

     To be honest it does not look hard either, since the function can already do that. Sure, I do not know the Windows driver for now, but that raises the question of why the driver cannot do that. As far as I know it, it looks simple to me: it just has to convert the 64-bit address given in the OVERLAPPED structure to a physical offset on the disk.

     Whether the "cluster/sector" size is 512, 4096 or whatever, that is easy too. It just means you can reach more data with the same offset width. To make an example: if the sector size were 1, you would have the 4 GB limit with a 32-bit offset, but that simply did not use the other 32 bits (which are available). With 512-byte sectors the limit is 4 G (32 bit) * 512 = about 2.2 TB, and with 4096-byte sectors you have 8 times more space.

     GPT is a partition table, not a disk. The partition table is a small structure on the disk (in the past it was easy to corrupt, and you had bad luck if it got damaged). That is why you do not get access to it too easily.
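     The sector arithmetic in this post can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up here, and the only thing taken from the OVERLAPPED documentation is that the 64-bit offset is stored as a low 32-bit part (Offset) and a high 32-bit part (OffsetHigh).

```python
# Illustrative sketch only; all function names here are hypothetical.

def max_addressable_bytes(lba_bits, sector_size):
    """Largest capacity reachable with an lba_bits-wide sector number."""
    return (1 << lba_bits) * sector_size


def split_offset(offset):
    """Split a 64-bit byte offset into (low, high) 32-bit halves,
    the way OVERLAPPED carries it in Offset and OffsetHigh."""
    return offset & 0xFFFFFFFF, (offset >> 32) & 0xFFFFFFFF


def byte_offset_to_sector(offset, sector_size):
    """Convert a byte offset to (sector number, offset inside that sector)."""
    return divmod(offset, sector_size)


if __name__ == "__main__":
    for size in (1, 512, 4096):
        print(size, max_addressable_bytes(32, size))
    print(split_offset(0x100000200))
    print(byte_offset_to_sector(0x100000200, 512))
```

     With a 32-bit sector number the limit scales directly with the sector size, which is the whole point of the "bigger sectors" idea: nothing else about the MBR fields has to change.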
  6. Well, GPT might be the wrong idea in this case. The idea was an MBR with bigger sectors, even though the title was about reading the GPT partition. GPT has no real use except the higher possible disk space. The idea that came up was just to increase the MBR sector size; booting from or reading a GPT partition would then be a different question. That Paragon driver is made from a public driver, but it does not increase the MBR sector size. That driver probably emulates another disk, on which the driver does the reads and writes. Only if the Windows driver really cannot do that would a driver change be needed.
  7. Well, I do not know what this firmware is written in, but even if it were pure assembly code I could certainly change that code to fit all needs. I suspect the firmware is C/C++ (there are some differences between these, but they are not big and I know them too), combined with some assembly code. I can certainly understand and change that code, but it is something to read into. I do not know all the disk standards, but that is something a programmer can do. I was involved in Chrome GDI, Supermium, LLVM, SumatraPDF and that HEIC image encoder, to name a few. It took some time to read into that codec, but I do understand the code: https://msfn.org/board/topic/185879-winxp-hevcheifheic-image-encoderdecoder/#comment-1254293
  8. Since it is finalized, you should write it up and make a release. You told us it is acting oddly slow? Maybe you should try the code I posted above. That can actually be the behavior sometimes: one effect can be that the subtraction keeps it from popping/pushing well, and then an escape or other logic has to take it out. Happy to see that you found a new section to use too. I told you it is risky to just use other RAM, and the one you had was in use. roytam gave you the right solution for this. Happy to see the 486 working well. Interesting to see that XP actually chose 32 MB instead of 256 MB. Caches usually make the computer faster: https://learn.microsoft.com/en-us/windows/win32/api/winioctl/ns-winioctl-storage_write_cache_property https://www.seagate.com/de/de/support/kb/disabling-the-write-cache-feature-in-windows-2000-xp-vista-and-windows-7-187751en/
  9. I could write assembly or C++ for a firmware, but I think I need a drive to test with.
  10. This is a good time to talk about the CPUID instruction. It returns information about the processor and stores it in EAX, EBX, ECX and EDX. Very interesting for WinXP might be the PSE flag and the PAE flag, with an interesting result, since we always have the claim around somewhere that "32 bits or wires are the limit for 32 bits". That author actually wrote it like this, under "Summary of 32-bit paging": "This allows a maximum RAM configuration of 2^52 bytes, or 4 petabytes (about 4.5×10^15 bytes)." And it tells us Win2k actually used these methods: "Windows 2000 Datacenter Memory Limit 32 GB RAM". https://en.wikipedia.org/wiki/Physical_Address_Extension https://en.wikipedia.org/wiki/PSE-36 https://en.m.wikipedia.org/wiki/Page_Size_Extension So we might be able to, but OS, CPU and bus/RAM all have to support it.

     But back to the CPUID instruction. It reports which instructions can be used or what "technology" is available on this CPU, including whether it can execute cmpxchg8b. In EDX: MMX (bit 23), CX8 (bit 8 = cmpxchg8b), PSE (page size extension, bit 3), PAE (physical address extension, bit 6). In ECX: AVX (bit 28), SSE4.2, SSE4.1, SSE3 and so on.

     The operating system usually should know whether an instruction is invalid. If it just continues, it might use SSE or MMX instructions, which should cause a BSOD. So rather be safe and fill them in with a CPUID result that you actually captured with a CPUID script on an old CPU (a script for CPUID is easy to write and can be found on the web), maybe from a late 486 CPU (those are said to have the CPUID instruction, as we can google). Then you know for sure what those CPUs actually gave back as a result (the few flags, such as cmpxchg8b if it was available, you can just clear). Then you fill either the registers or the place where Windows stores that information, so the OS/WinXP can react to it, in case WinXP has no reaction of its own when the instruction was not correctly recognized, failed, etc.
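     The flag positions named in this post can be decoded with a small Python sketch. It assumes the EDX and ECX values were already captured from CPUID leaf 1 by some other tool; the helper names are made up for illustration, and the sample register values in the demo are invented, not results from a real CPU. The SSE3, SSE4.1 and SSE4.2 bit numbers (0, 19, 20) are not in the post and come from the CPUID documentation.

```python
# Illustrative sketch only; helper names and sample values are hypothetical.

EFLAGS_ID = 0x00200000  # EFLAGS bit 21: if software can toggle it, CPUID exists

# CPUID leaf 1 feature bits
EDX_FLAGS = {"PSE": 3, "PAE": 6, "CX8": 8, "MMX": 23}
ECX_FLAGS = {"SSE3": 0, "SSE4.1": 19, "SSE4.2": 20, "AVX": 28}


def decode(reg_value, table):
    """Map each flag name to True/False for the given register value."""
    return {name: bool((reg_value >> bit) & 1) for name, bit in table.items()}


def can_use_cmpxchg8b(id_bit_toggles, edx):
    """Two-step check from the post: CPUID must exist, then CX8 must be set."""
    return bool(id_bit_toggles and decode(edx, EDX_FLAGS)["CX8"])


if __name__ == "__main__":
    made_up_edx = 0x00000003          # invented value: CX8 bit is clear
    print(decode(made_up_edx, EDX_FLAGS))
    print(can_use_cmpxchg8b(True, made_up_edx))
    print(can_use_cmpxchg8b(True, made_up_edx | (1 << 8)))
```

     Keeping the two steps separate mirrors the order described above: a CPU without CPUID never reaches the CX8 test, and a CPU with CPUID but without CX8 is still rejected.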
  11. CPUID is not an essential instruction. However, you should set its results to values that the OS/WinXP can act on as for a 486 CPU. https://www.felixcloutier.com/x86/cpuid https://en.m.wikipedia.org/wiki/CPUID
  12. You do not necessarily have to use a near jmp. A short jump is distance-based with a signed byte (-128 to +127).
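     Whether a short jump still reaches its target after code was inserted can be checked like this. This is a minimal sketch: the function name is made up, and it assumes the usual 2-byte short-jump encoding, where the signed displacement is counted from the end of the jump instruction.

```python
# Illustrative sketch only; the function name is hypothetical.

def fits_short_jump(jump_addr, target_addr, insn_len=2):
    """True if target is reachable with a signed 8-bit displacement.

    The displacement is relative to the address of the next instruction,
    i.e. jump_addr + insn_len (2 bytes for jmp short / jcc short).
    """
    disp = target_addr - (jump_addr + insn_len)
    return -128 <= disp <= 127


if __name__ == "__main__":
    # Addresses as in the unpatched "jnz short loc_8013CEAA" at 8013CEBC
    print(fits_short_jump(0x8013CEBC, 0x8013CEAA))
    # A target far away needs a near jump instead
    print(fits_short_jump(0x8013CEBC, 0x80140E17))
```

     If the check fails after extra instructions were added between the jump and its target, the short form has to be replaced by a near jump, which is longer and shifts the following code again.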
  13. Well, honestly I do not want to study the entire thing behind that. If it is a PCB control (which I do not know, nor think), you have to study the entire function chain for this, all of Windows in relation to this. At the very least, the entire behavior related to the SLIST_HEADER/PKSPIN_LOCK structures is needed. That raises a big question why those 2 structures would actually be that; it sounds at least very odd to me. So I want to say I am out of this for now.

     I remember Intel removed the lock prefix because a virus once used it to hide its activity/itself (if I remember correctly, it executes the lock prefix, but it no longer has any effect, which lets normal activity continue). That description from the MASM archive tells us that lock rep was already removed on a 286 CPU, so a 486 is affected (wanna go back to a 186? (joke)).

     A different CPU, however, needs some time to react if an interference should happen. To be honest I do not think it does: I changed the entire IDT and even made it invalid, and not even executing 10 instructions caused a problem. If there were a fault within those 10 instructions then maybe, but this is not the case. These mov instructions are in the nanosecond range; I do not think anything can interrupt that fast. A thread/CPU switch takes time; more like 10 milliseconds would be the scale here (for others: nanoseconds are a lot faster than milliseconds, 1,000,000 ns = 1 ms).

     If the thinking was about some kind of high-level-language problem like a "Java atom": in Java and other programming languages atomicity comes from the language itself and is not CPU based. Only assembly actually does such a thing, and assembly does not work like a high-level language. IRQL, STI/CLI and lock: 2 locks, then 2 instructions, then locks again do not make an "atomic move" either. Again, I do not think that is the problem. The REP instruction without lock would still be done as 1 executed instruction; this goes as fast as the CPU can handle it, whatever exact cycles that causes on the CPU itself.

     I think if there is a problem, the problem is not with the emulation but elsewhere. I say that without having written a lot of code to try things out or having looked at the WRK. Dietmar could look at those 5 functions in the Win2k kernel too; maybe that helps, or maybe not if the structure handling changed. If somebody has proof or the right knowledge, let me know.

     Actually, maybe the cmpxchg8b instruction was not entirely used, only a part of what it does. Some changes can also be skipped: some are bad, like a BSOD; some continue without full functionality; some work correctly; some work but not that well; and some produce code that just does not change anything and still functions.

     Making certain what controls SLIST_HEADER and PKSPIN_LOCK would be a next step, to see whether the functions did the right things. But another fault could also be the problem. It would not be uncommon that once 1 problem is solved another problem appears that has nothing to do with the first (I just wanted to say that in case; hopefully not the problem for now). Let us just say those 2 structures (if correctly changed by the emulation) will very likely be processed by some further code (why would an atomic move be needed?). https://www.nirsoft.net/kernel_struct/vista/SLIST_HEADER.html
  14. There might be one: that would be the REP instruction. https://www.felixcloutier.com/x86/rep:repe:repz:repne:repnz It can have a lock prefix. It is actually used for buffers, not for smaller moves.
  15. Interesting. That is neither "atomic" across the 2 moves, nor does it clear the interrupt flag. I wrote to Dietmar in a private message that he might leave it out. It also does not have the checks or the loop, and the cmpxchg8b compare is not done correctly. Maybe it just fulfills what that function needs. That can be: instead of just replacing the instruction, the function was written to its real needs, so we did not have to be so specific and only needed the correct function behavior. Well done.
  16. roytam gave you the patch code. There are 5 changes:
     lea ecx, sub_40078C    <--- the first function that gets replaced if it finds that CPUID number (ExInterlockedCompareExchange64)
     lea ecx, loc_4006F0    <-- next (ExInterlockedPopEntrySList)
     lea ecx, loc_400704    (ExInterlockedPushEntrySList)
     lea ecx, loc_400714    (ExInterlockedFlushSList)
     lea ecx, ExInterlockedAddLargeInteger
     The last one is different: it replaces sub_402352 with ExInterlockedAddLargeInteger. Somebody can check that the functions are at these places.
  17. Well, I would think a different offset is possible, but if they are equal it overwrites the value at that offset with ECX and EBX. If you know about logic circuits you might know why. But this is exactly why I wanted to say he does not need that instruction; at least I do not see a reason for it. The function itself seems to compare: "if (the 64-bit entry at this offset still has the same value as EAX and EDX) -> change the value at that offset to ECX and EBX (then it actually has a changed 64-bit value there)".

     Function descriptions: https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nf-wdm-exinterlockedflushslist https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nf-wdm-exinterlockedpopentryslist https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nf-wdm-exinterlockedpushentrylist

     "If there were entries on the specified list, ExInterlockedFlushSList returns a pointer to the first SLIST_ENTRY structure that was an entry on the list; otherwise, it returns NULL."

     This opens the question why they used this instruction to return a pointer to the first SLIST_ENTRY structure and to change that SLIST_ENTRY (which is an internal Windows structure).

     About "atomic": it is "supposed to be" a "non-interruptible operation". Therefore Dietmar probably cleared the interrupt flag (so no interrupts; actually it still gets them, but that is another story) (the other part is the "lock" prefix). The next question that comes to mind is changing 64 bits at once (non-interruptible), not stepwise, which is what cmpxchg8b actually does (even though it is 32-bit, it just uses 2 registers). BUT I have never seen that to be needed for 64 bits; that is far too small to have an interaction. Not even interrupts, if they are changed (and those are constantly used), have that problem. https://www.quora.com/What-is-the-meaning-of-atomic-in-programming

     We are also talking about a function here, so the function itself might actually be "atomic" if it does its job, because the function has a start and an end to solve this step. We are not in C++, which might have code like "atom (this)"; in assembly you have to write the real instruction that physically does it (if that was a problem we might have some more answers by now).

     Dietmar just said that he wants to replace cmpxchg8b with working alternative code. It might be a bit of a road, but over time we will find it. I strongly suspect that list is for threading/multicore.
  18. I heard Chappell died a few months ago; sad story, we could still use him. Reading Chappell's writing, it says there once was a solution that does not use the instruction if the CPU does not support it. That does not necessarily mean that if you just take a different one from a different OS version it simply works; maybe, maybe not.

     Why would it have to be that other cmpxchg instruction? The Linux one is not perfect. Depending on what the other routines do, the Linux one might work, but it is certainly not 100% correct, while mine is. The Linux one looks almost the same as the one I posted, but it does not compare the 64 bits for the false result (maybe the Linux solution does not need to, but again, mine is correct and the Linux one is not). So why not "just the right one"? Doing it another way costs more instructions and maybe fixes; there are certainly multiple solutions. https://www.felixcloutier.com/x86/cmpxchg

     The description might be wrong this time: according to the description here, unlike cmpxchg8b it always compares EAX with the memory location. The description does not say that a register other than EAX can be chosen.

     Well, this time your code might work, but you are rather trying to fix up the results. The ZF non-reaction is set to just go back; that is OK, but you have to do this in every function like this. Also, that makes 2 lock xchg and 2 lock cmpxchg. You use cmpxchg because of the atomicity question? If you have to change 64 bits at once, then it might be atomic for the 64 bits, doing 64 bits in 1 step. Just having the lock prefix does not turn it into a 64-bit mov.
  19. LOCK CMPXCHG [EBP], EAX <-- that already makes a change if the 32 bits matched, but it actually needs the 64-bit compare before making any changes, because if both 32 + 32 bits are not the same it must not do that. That is the first mistake. Again, that other part makes no decision; it jumps to the same place on both ZF = 1 and ZF = 0:
     .data:004762D5    jz short loc_4762E1
     .data:004762D7    jmp short loc_4762E1
  20. .text:8013CEA0 ExInterlockedPopEntrySList proc near ; CODE XREF: CcScheduleReadAhead+2BBp
     .text:8013CEA0                                      ; sub_80108058+10p ...
     .text:8013CEA0     push ebx
     .text:8013CEA1     push ebp
                        pushf
                        cli
     .text:8013CEA2     mov ebp, ecx
     .text:8013CEA4
     .text:8013CEA4 loc_8013CEA4:    <-- this seems to have a jump to it ; DATA XREF: .text:loc_80140E17o
     .text:8013CEA4     mov edx, [ebp+4]
     .text:8013CEA7     mov eax, [ebp+0]
     .text:8013CEAA
     .text:8013CEAA loc_8013CEAA:    // valid ; CODE XREF: ExInterlockedPopEntrySList+1Cj
     .text:8013CEAA     or eax, eax
     .text:8013CEAC     jz short end_of_ExInterlockedPopEntrySList    // has to be changed
     .text:8013CEAE     mov ecx, edx
     .text:8013CEB0     add ecx, 0FFFFh
     .text:8013CEB6
     .text:8013CEB6 loc_8013CEB6:    <-- seems to have some jumps to it too ; DATA XREF: sub_80140AF4:loc_80140AFDo
     .text:8013CEB6                  ; .text:80140D28o
     .text:8013CEB6     mov ebx, [eax]
                        cmp eax, [ebp+0]
                        jnz loc_fail    // something we did
                        cmp edx, [ebp+4]
                        jnz loc_fail    // again
                        mov [ebp+0], ebx
                        mov [ebp+4], ecx
                        jmp loop_check_ExInterlockedPopEntrySList    // the loop check
     loc_fail:
                        mov eax, [ebp+0]
                        mov edx, [ebp+4]
     .text:8013CEB8 loop_check_ExInterlockedPopEntrySList:
     .text:8013CEBC     jnz short loc_8013CEAA    // valid but needs a fix to reach that "or eax, eax" loop
     .text:8013CEBE
     .text:8013CEBE loc_8013CEBE/end_of_ExInterlockedPopEntrySList: ; CODE XREF: ExInterlockedPopEntrySList+Cj
                        sti
                        popf
     .text:8013CEBE     pop ebp
     .text:8013CEBF     pop ebx
     .text:8013CEC0     retn
     .text:8013CEC0 ExInterlockedPopEntrySList endp
     -------------------------------------------------------

     Well, the data refs do not make sense at these spots; a display bug? At 8013CEA4 it says it gets 1 or more jumps and that 80140E17 is 1 of them. Since pushf and cli shifted the offsets by 2 bytes, the jump target is scrambled unless the location is fixed up. It would be common to see some jumps into different functions and the other way around. Location 8013CEB6 seems to be jumped to from sub_80140AF4:loc_80140AFD and .text:80140D28 (you should look at least at those 3 spots for this jump). It looks like the IDA disassembler to me; you best search for the addresses they get jumped from.

     The jz at 8013CEAC has to be fixed to jump to the end, that is, to sti / end_of_ExInterlockedPopEntrySList:
     .text:8013CEAC     jz end_of_ExInterlockedPopEntrySList
     .text:8013CEAA loc_8013CEAA: that one is valid, but since we have more code, the jump that goes there is a bit longer. That one is shown in the visible code at 8013CEBC jnz short loc_8013CEAA. .text:8013CEAC is valid but also needs an adjustment to reach loc_8013CEBE/end_of_ExInterlockedPopEntrySList.

     If you want you can try to remove sti, cli, popf and pushf (but it has to be all 4).
     ---------------------
     You could also use a different method. cmpxchg8b has 4 bytes of opcode and jnz short has 2, so 6 bytes. You need 5, which makes a jmp to your location + 1 nop.
                        cmpxchg8b qword ptr [ebp+0]
                        jnz short loc_8013CEAA
     Those two you replace with jmp + nop to your memory location. At your memory location you then do:
                        cmp eax, [ebp+0]
                        jnz loc_fail2    // something we did
                        cmp edx, [ebp+4]
                        jnz loc_fail2    // again
                        mov [ebp+0], ebx
                        mov [ebp+4], ecx
                        jmp the_check    // the loop check
     loc_fail2:
                        mov eax, [ebp+0]
                        mov edx, [ebp+4]
     the_check:
                        jnz short loc_8013CEAA    // this one is the conditional jmp to that loop (or eax, eax)
                        // now you just have to jump back
                        jmp to (.text:8013CEBE loc_8013CEBE/end_of_ExInterlockedPopEntrySList)    // just back, as if the instruction had happened

     I do not know if that NT version can be used for XP; they might have used a different behavior.
  21. At the moment there are 3 makers (the image format is called .avif): AOM, SVT-AV1 and rav1e. Then there is the one called HEIC/H.265/x265, done in hardware (for example Nvidia's NVDEC). All of those say that they are new HEVC (H.265) codecs; is there proof? Hardware might be faster at the moment, but not better (at 1:12 or 9:29, the mountain in the background). Software is a lot clearer; SVT at both P3 and P6 is better: https://youtu.be/5rgteZRNb-A?t=72 Another candidate for pictures is JXL: https://www.youtube.com/watch?v=w7UDJUCMTng But we have to consider what settings were used; that makes a big difference. Others even mention an H.266 codec: https://de.wikipedia.org/wiki/Versatile_Video_Coding There should be a real comparison, always going for the best settings the encoder offers, for both still pictures and motion video (also looking at B-frames, because often the secondary pictures are just compressed more strongly; in that case the first picture might look good, but the second looks rather blurred).
  22.                public ExInterlockedPopEntrySList
     .data:004762F2 ExInterlockedPopEntrySList proc near ; CODE XREF: sub_40E06D+1DAp
     .data:004762F2                                      ; sub_41159B+8Ap ...
     .data:004762F2     push ebx    ; ExInterlockedPopEntrySList
     .data:004762F3     push ebp
                        pushf
                        cli
     .data:004762F4     mov ebp, ecx
     loc_jumper_unknown:
     .data:004762F6     mov edx, [ebp+4]
     .data:004762F9     mov eax, [ebp+0]
     .data:004762FC
     .data:004762FC loc_4762FC: ; CODE XREF: ExInterlockedPopEntrySList+17j
     .data:004762FC     or eax, eax
     .data:004762FE     jz short loc_end
     .data:00476300     lea ecx, [edx-1]
     loc_jumper_unknown2:
     .data:00476303     mov ebx, [eax]
     .data:00476305 loc_jumper_unknown3:
                        cmp eax, [ebp+0]
                        jnz short loc_4762E5
                        cmp edx, [ebp+4]
                        jnz short loc_4762E5
     .data:004762ED     mov [ebp+0], ebx
     .data:004762F0     mov [ebp+4], ecx
     .data:004762F3     jmp loop_check
     .data:004762E5     mov eax, [ebp+0]
     .data:004762E8     mov edx, [ebp+4]
     loop_check:
     .data:00476309     jnz short loc_4762FC
     .data:0047630B
     .data:0047630B loc_47630B: / loc_end: ; CODE XREF: ExInterlockedPopEntrySList+Cj
                        sti
                        popf
     .data:0047630B     pop ebp
     .data:0047630C     pop ebx
     .data:0047630D
     .data:0047630E
     .data:0047630F     retn
     .data:0047630F ExInterlockedPopEntrySList endp

     For the C++ code you just have to look at the translation the C++ compiler produced; if it is equal, good.

     For this other function there is a jmp to "mov ebx, [eax]" from 40a7470. This means that if the code is changed, that jump has to be adjusted to land there (if not, BSOD from another part of this code), since you added assembly instructions at the start (we could get rid of pushf, cli, sti and popf to keep that location in place). That extra jump is missing in your 3rd post of code too. It has 3 jumps that have to be fixed from other parts:
     0040B0DE     jz short loc_40B0EB    (has to be adjusted, it has more code below now)
     For the others I have written the locations; I think you can solve this. Tell me if this works. That instruction does not work in my VM, so I cannot see how it behaves; if I could, that would make it a lot easier.
  23. Happy to see you had a good result. Is it working now? Does this one work? The jumps have to be fixed to the right locations.
     .data:004762B2                 public ExInterlockedFlushSList
     .data:004762B2 ExInterlockedFlushSList proc near ; CODE XREF: sub_45F0DF:loc_45F0F7p
     .data:004762B2     push ebx
     .data:004762B3     push ebp
                        pushf
                        cli
     .data:004762B4     xor ebx, ebx
     .data:004762B6     mov ebp, ecx
     .data:004762B8     mov edx, [ebp+4]
     .data:004762BB     mov eax, [ebp+0]
     loc_1:
     .data:004762BE     or eax, eax
     .data:004762C0     jz short loc_end (004762F7)
     .data:004762C2     mov ecx, edx
     .data:004762C4     mov cx, bx
     .data:004762C7
     .data:004762C8
     .data:004762C8
     .data:004762C8
     .data:004762C9
     .data:004762CE
     .data:004762D0
     .data:004762D1
     .data:004762D1
     .data:004762D1
     .data:004762D7
     .data:004762D9
     .data:004762DB ; ---------------------------------------------------------------------------
     .data:004762DB ; emulation of CMPXCHG8B
     .data:004762DB
     .data:004762DB     cmp eax, [ebp+0]
     .data:004762DE     jnz short loc_4762E5
     .data:004762E0     cmp edx, [ebp+4]
     .data:004762E3     jnz short loc_4762E5
     .data:004762E5
     .data:004762E5
     .data:004762ED
     .data:004762ED     mov [ebp+0], ebx
     .data:004762F0     mov [ebp+4], ecx
     .data:004762F3     jmp loop_check (004762F3)
     .data:004762E5     mov eax, [ebp+0]
     .data:004762E8     mov edx, [ebp+4]
     .data:004762EB
     .data:004762ED
     .data:004762ED
     .data:004762ED ; end emulation of CMPXCHG8B
     .data:004762F0 ; ---------------------------------------------------------------------------
     loop_check:
     .data:004762F3     jnz short loc_1 (004762BE)
     loc_end:
     .data:004762F7     sti
     .data:004762F8     popf
     .data:004762F9     pop ebp
     .data:004762FA     pop ebx
     .data:004762FB     retn
     .data:004762FF ExInterlockedFlushSList endp
  24. You are making the same mistake. That instruction sets flags and reacts depending on whether the compare matched or not. There are 2 problems I can tell for certain.

     In the first step, cmpxchg can have 2 results (if equal it does the mov to memory; if not, it does the mov to a register), and it should not do that yet because it has to compare 64 bits. With a 32-bit compare it already reacts to the 32 bits (the other 32 bits are ignored). Then the following happens: the flags are lost, and the reaction for the first 32 bits (equal or not) has already taken place.

     Then you run the code again, but here sit the same problems. Now the flags get changed a second time (and they should not). The compare reacts to the next 32 bits (while ignoring the first 32 bits): if that compare was equal it sets the values, and if not it sets no values (but you need the full 64 bits). The flag register (ZF) is read as if the first 32 bits were not there. In other words, the results are scrambled.

     The solution does not look that hard to me. You need 2 compares to see whether the 64 bits to be compared are equal, before you apply the 64-bit result. If those 2 compares were equal, you set the values at the memory location; otherwise you need an extra branch for the other case, which stores them into EDX and EAX (the flag should still be valid, unless you use an instruction that affects flags).

     Compare EDX and EAX with the destination operand; if equal, store ECX and EBX to the destination operand (the destination operand is an 8-byte memory location).

     // CMPXCHG8B should be removed and replaced by this code:
     // CMPXCHG8B - 32 bit emulator
     cmp dword ptr [ebp], eax          // eax is supposed to be the low part
     jne skip_and_load_edx_eax
     cmp dword ptr [ebp+4], edx        // edx is supposed to be the high part
     jne skip_and_load_edx_eax
     // 64 bits were equal, change with ECX and EBX
     mov dword ptr [ebp], ebx          // supposed to have the low part
     mov dword ptr [ebp+4], ecx        // supposed to have the high part
     jmp end_of_CMPXCHG8B
     // they were not equal, do as the instruction is described and load those to EDX and EAX
     skip_and_load_edx_eax:
     mov eax, dword ptr [ebp]          // supposed to be the low part
     mov edx, dword ptr [ebp+4]        // supposed to be the high part
     end_of_CMPXCHG8B:
     // CMPXCHG8B - 32 bit emulator end
     // normal code continues

     This emulates the 1 line of CMPXCHG8B; it should also leave the correct flags. Jumps might need an adjustment to their usual locations.

     // Notice: I could not test whether the order is right (upper and lower parts). The description might say something about the upper and lower part, but as I remember you can never be exactly certain about this (in memory, if you have 11223344, the 44 holds the high-order bits; very old architectures store that differently too, but in this case we do not have that problem, even on a 486). If that does not work I can certainly fix it. I need a test to make certain how the instruction behaves; after that I can see its exact behavior. The instruction description, however, says EDX and ECX contain the high part: https://www.felixcloutier.com/x86/cmpxchg8b:cmpxchg16b

     If the order is different, then just swap ebp and ebp+4 (or change the registers assigned to those ebp locations):

     // CMPXCHG8B - 32 bit emulator
     cmp dword ptr [ebp], edx          // if different, edx is supposed to be the low part
     jne skip_and_load_edx_eax
     cmp dword ptr [ebp+4], eax        // if different, eax is supposed to be the high part
     jne skip_and_load_edx_eax
     // 64 bits were equal, change with ECX and EBX
     mov dword ptr [ebp], ecx          // if different, ecx has the low part
     mov dword ptr [ebp+4], ebx        // if different, ebx has the high part
     jmp end_of_CMPXCHG8B
     // they were not equal, do as the instruction is described and load those to EDX and EAX
     skip_and_load_edx_eax:
     mov edx, dword ptr [ebp]          // supposed to be the low part
     mov eax, dword ptr [ebp+4]        // supposed to be the high part
     // your 55667788 example says so
     end_of_CMPXCHG8B:
     // CMPXCHG8B - 32 bit emulator end
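     The difference between one 64-bit compare and two independent 32-bit cmpxchg steps can be modeled in Python. This is a simplified sketch, not the kernel code: memory is a plain dict of 32-bit values keyed by byte address, all function names are made up, and locking, interrupts and real flag registers are left out. The register roles (EDX:EAX as comparand, ECX:EBX as new value, low half at the lower address) follow the documented CMPXCHG8B behavior.

```python
# Illustrative model only; names and the dict-based "memory" are hypothetical.

def cmpxchg8b(mem, addr, eax, edx, ebx, ecx):
    """Model of CMPXCHG8B: compare EDX:EAX with the 8 bytes at addr.
    Equal: store ECX:EBX, ZF=1. Not equal: load memory into EDX:EAX, ZF=0.
    Returns (eax, edx, zf)."""
    low, high = mem[addr], mem[addr + 4]
    if low == eax and high == edx:      # both halves checked before any write
        mem[addr], mem[addr + 4] = ebx, ecx
        return eax, edx, 1
    return low, high, 0


def cmpxchg32(mem, addr, comparand, src):
    """Model of a 32-bit CMPXCHG on one half. Returns (value, zf)."""
    if mem[addr] == comparand:
        mem[addr] = src
        return comparand, 1
    return mem[addr], 0


def broken_two_step(mem, addr, eax, edx, ebx, ecx):
    """The mistake described above: each half reacts on its own,
    and only the second compare decides the final ZF."""
    eax, _ = cmpxchg32(mem, addr, eax, ebx)
    edx, zf = cmpxchg32(mem, addr + 4, edx, ecx)
    return eax, edx, zf


if __name__ == "__main__":
    good = {0: 1, 4: 2}
    bad = {0: 1, 4: 2}
    # Low half matches, high half does not: nothing may be written.
    print(cmpxchg8b(good, 0, 1, 99, 10, 20), good)
    print(broken_two_step(bad, 0, 1, 99, 10, 20), bad)
```

     In the broken variant the low half is already overwritten when the high half turns out to differ, so the 8-byte value ends up half-updated; the two-compare version never writes before both halves have matched.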
  25. Well, you can certainly translate this instruction into a 32-bit variant. In your code you already used the "cmpxchg" instruction, but it will sometimes do the wrong job because it compares only 32 bits (and then already reacts to those 32 bits). Whether that compare matched or not has already changed the result, because it can react to either the first 32 bits or the next 32 bits. (.data:004762D5 jnz short near ptr loc_4762BE+1: doing that again erases the result of the first 32-bit compare and reacts only to the next 32-bit compare.) But you need the result of the 64-bit compare!

     It seems to me you can also solve this by making 2 compares with "cmp" for the flags/reaction. Now it is about not making the same mistake (if you just do the 32-bit compare again, it reads the next 32 bits and ignores the first 32 bits from the first compare). You need a reaction to the first compare, then do "cmp" again and react a second time. If both compares matched, you do the reaction just as described (else the other described reaction): https://www.felixcloutier.com/x86/cmpxchg8b:cmpxchg16b

     That instruction description does not actually say anything about exchanging the values. It just says that if the 64-bit compare was equal, it stores the data from ECX and EBX into the destination, and otherwise loads the destination into EDX and EAX (which does not look like an exchange to me). Maybe the description is lacking (what I usually do then is try it out and take a look).

     // If it were an exchange (reading the code later, I do not see a common exchange; a common exchange would be eax and edx swapping their values), it would be: 4 "mov" instructions (2 for the destination and 2 for the source), or 2 "xchg" instructions.

     // But! Looking at your assembly code it seems different to me. I do not see an exchange (let me just say I am not entirely certain here, but it might help to talk about it). The cmpxchg8b instruction seems to compare the registers EDX and EAX for equality with the value at a memory location addressed via EBP (qword ptr [ebp+0]) (qword usually describes a 64-bit access: word * 4 = 16 bits * 4). If the compare was equal it should store ECX and EBX to that location (otherwise it probably loads that value into EDX and EAX). The next instruction is "jnz"; it still sees the result of this compare. If they were not equal it jumps back to "Efls10" (which looks like a loop to me); if they were equal it continues to the end of this function.

     Looking at your code again, "lock cmpxchg [ebp+4], eax" has no reaction, but it might need one (as said before, it needs a reaction to both 32-bit halves). If the compare failed it needs to stop there (not always just continue). Done that way, the first 32 bits can have a false result, and if the next 32 bits are right it still does the job, while it should not.

     ---------------------
     If the 64-bit guys show up: that is not necessarily needed. If you have to use more than 32 bits, there are several methods to solve this (to name a few):
     1: Use 2 registers and just build the behavior for that. There is a 32-bit assembly instruction used for this (CDQ - Convert Word to Doubleword/Convert Doubleword to Quadword).
     2: An offset to somewhere in memory that is bigger than 32 bits, handled as 64 bits.
     3 (even more is possible with an offset location): If you have more than 64 bits of flags, you just need an offset to a location where you actually handle the flags or data.
     4: For moving blocks of data there is the REP instruction. The CPU can see that it has to move a certain amount of data, and the CPU can translate the movement into something it can actually process. The FSB (quad pumped) to the RAM does such a thing; unlike what the 64-bit guys might think, you do not need a 64-bit offset for this. Another example would be the cache: HDDs use a cache to fill up the data. That data can then be processed differently, like with 2 bits (wires), 4 bits, 16, 32, 64 or even more (it rather comes down to what the physical cable/wire can do).