Tripredacus Posted August 20, 2019

I am working on a project in which I want to create a workaround for a flat file database issue, referenced here: https://msfn.org/board/topic/179737-cause-and-workaround-for-a-50k-file-size-limit-in-dos/

Two files in particular are potentially a problem, as they have a fixed size or addressable size. The software does not replace data; rather, it appends it and does not actually recognize the EOF. The EOF is actually in the middle of a dataset, so when data is entered, it writes the half that it can, then duplication and data tabulation bugs begin to occur. I am fortunate that I have a backup from just before and just after this issue appeared, as well as the original files.

I have already attempted to manually figure out how the addressing works in my design, but keep running into problems getting the correct results. So there is something I am just not grasping about how these files work, and maybe someone has an idea about it. Perhaps whatever the programmer used to create this has a named method that I can look into... if I knew what it was.

The files that this program uses are non-indexed, non-addressed but in hex. It uses hex values and appears to "count" entries (or known block sizes) to find relevant data. Examples:

FILE1 has 44 blocks per entry
FILE2 has 51 blocks per entry
ADDRESS refers to data in either file in blocks of 30+20 (15+10)

An address of 01 00 within the first block of 30 (or 15) points to the first entry in the FILE2 file (offset 0). An address of 01 00 in the first block of 20 (or 10), which is the second section, points to the first entry in the FILE1 file (offset 0). In the 3rd block of 30, an address of 10 00 is the 16th* entry (offset 40B) in the FILE2 file, and so on.

Does this method have a name? How would I search for how this data is being organized?
* This is one example where I can't figure out the math on locating the data manually. It is some math error I am making in my head. I know that 10 in hex is 16 in decimal, I know the data is the 16th entry (I counted it, and this is why I am using low numbers in my examples) but I do not know how 40B is determined. 40B in hex to dec is 1035, divided by 51 is ~20 not 16. 40B is the offset shown in the hex editor.
jaclaz Posted August 21, 2019

Well, I can understand maybe 1/3 of what you wrote. If there is no sensitive data involved, post the actual files, or at least some snippets of the files as they are.

jaclaz
Tripredacus (Author) Posted August 21, 2019

Here is what the beginning of the address file looks like:

EDIT: Typo in this picture: "03 is first" should be "03 is third".

Here is where the address file moves past 255 and FILE2 example data:

The two questions I have are:

1. Does this flat file database method have a particular name?
2. How do I determine the offset of an entry using the address file?

In this example, 03 00 (from the address file) points to the third entry in FILE2, which starts at offset 8A. 8A to Dec is 138 / 3 = 46. Since the data entries are in 55 byte groups, this is the wrong math. I need to be able to take the entry from the address file and find it in FILE1 or FILE2. Is the problem that I am doing a decimal conversion in trying to figure this out?

FILE2 reference: the first entry starts at offset 0, the second entry starts at offset 45, and the third entry is at offset 8A.
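A note on the arithmetic in the post above: the three FILE2 offsets quoted there (0, 45 and 8A, all hex) are evenly spaced, which suggests a fixed record length of 0x45 = 69 bytes rather than 51 or 55. Under that assumption a 1-based entry number n starts at (n - 1) * 69, and entry 16 then lands on 0x40B, the offset reported in the first post. The Python sketch below only illustrates that reasoning; the record length is inferred from the quoted offsets, not confirmed by anyone in the thread, and the names `observed_offsets` and `entry_offset` are made up for the example.

```python
# Offsets of the first three FILE2 entries as read from the hex editor (from the post).
observed_offsets = [0x00, 0x45, 0x8A]

# Assumption: records are fixed-length, so the gap between entries is the record size.
strides = {b - a for a, b in zip(observed_offsets, observed_offsets[1:])}
assert len(strides) == 1, "entries are not evenly spaced"
RECORD_SIZE = strides.pop()  # 0x45 == 69 bytes (inferred, not documented)


def entry_offset(entry_number: int, record_size: int = RECORD_SIZE) -> int:
    """Byte offset of a 1-based entry number in a fixed-length record file."""
    if entry_number < 1:
        raise ValueError("entry numbers start at 1")
    return (entry_number - 1) * record_size


# Entry 16 is the one whose offset (40B) could not be reproduced by hand.
for n in (1, 2, 3, 16):
    print(f"entry {n:>2} -> offset 0x{entry_offset(n):X}")
```

The key point is the "minus one": the first entry sits at offset 0, so the 16th entry is 15 record lengths into the file, not 16.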
jaclaz Posted August 21, 2019

First thing, it is a two-byte numbering, little endian. I.e. what you call "01" is actually "01 00" and it means 0x0001. The 01 01 is actually 257; if you prefer, the sequence is:

0x00FE 254
0x00FF 255
0x0100 256
0x0101 257

So, since 0x0000 is not used, you can have at most 0xFFFF values, i.e. 65535.

It is "queer" how there are holes in the numbering.

I continue to fail to understand half of what you are saying. What (the heck) do you mean by:

Quote
EDIT: Typo in this picture, "03 is first, should be 03 is third)

How (the heck) are you determining that the entry "Mike" corresponds to entry #03?

Can you post the first (say) 512 bytes of File 2 (as opposed to an arbitrary small snippet of it starting at 0x0080)? You can replace text like "Mike" with "wxyz" if there is any sensitive data.

Loosely, it resembles FAT16 a lot: http://www.ntfs.com/fat-allocation.htm

jaclaz
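The two-byte little-endian reading described above can be checked with a few lines of Python. This is only a sketch; `decode_entry` is a hypothetical helper name, and the byte pairs are the ones mentioned in the thread.

```python
import struct


def decode_entry(pair: bytes) -> int:
    """Interpret two bytes as an unsigned 16-bit little-endian number."""
    (value,) = struct.unpack("<H", pair)
    return value


# Byte pairs as they appear in the hex editor, low byte first.
for raw in ("01 00", "10 00", "FE 00", "FF 00", "00 01", "01 01"):
    value = decode_entry(bytes.fromhex(raw))
    print(f"{raw} -> 0x{value:04X} = {value}")
```

Read this way, the trailing 00 is simply the high byte of a small number, and "01 01" is entry 257 rather than "entry 1 of group 1".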
Tripredacus (Author) Posted August 21, 2019

I do not know whether the software is actually storing the data in hex. There is no key as to what kind of files these are, and I have only decoded how they work by opening the files in a hex editor, making a change with the program and comparing the changes. I think that the data is not actually in hex, but I am viewing it in a hex editor. I was unable to view the data properly in Notepad or SciTE, but it did show properly in UltraEdit... which automatically showed the data in hex editor mode.

I know Mike is entry #3 because it is the third entry in FILE2.

That was my main concern. The ADDRESS file only stores the position of the data entry in FILE2, BUT in FILE2 the entries have no index value. I can easily look at the ADDRESS file and say "ok, I need to find the 200th entry in FILE2", but then when I go into FILE2, I would have to manually count the entries (and hope I didn't miscount) to find what is in the 200th entry. What I want to do is determine how to calculate where in FILE2 I can read the data, either by the offset shown in the hex editor or by using a character position/count.

I will need to know this math portion for when I write a program to manage the "deleted" content. So far, my attempts at determining the position of the data in FILE2 from the values in ADDRESS have not worked. Even with the Mike example, I can't do the math properly.
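For the goal stated above (jumping straight to the Nth entry instead of counting by eye), a fixed-length record file can be read by seeking to the entry's offset. The sketch below is a hedged illustration, not the program's actual logic: it assumes the 69-byte (0x45) FILE2 record length implied by the offsets 0, 45 and 8A quoted earlier in the thread, uses a synthetic in-memory file in place of the real data, and `read_record` is a hypothetical helper name.

```python
import io
import struct

BATTER_RECORD_SIZE = 0x45  # 69 bytes; inferred from the offsets 0, 45, 8A, not documented


def read_record(stream, entry_number: int, record_size: int = BATTER_RECORD_SIZE) -> bytes:
    """Return the raw bytes of a 1-based entry from a fixed-length record file."""
    if entry_number < 1:
        raise ValueError("entry numbers start at 1")
    stream.seek((entry_number - 1) * record_size)
    record = stream.read(record_size)
    if len(record) != record_size:
        raise EOFError(f"entry {entry_number} is past the end of the file")
    return record


# Synthetic stand-in for FILE2: three records, each filled with its own entry number.
fake_file2 = io.BytesIO(b"".join(bytes([n]) * BATTER_RECORD_SIZE for n in (1, 2, 3)))

# "03 00" is the value taken from the ADDRESS file in the example above.
(entry,) = struct.unpack("<H", bytes.fromhex("03 00"))
record = read_record(fake_file2, entry)
print(entry, len(record), record[:4])
```

With the real data, the same function would be called on a file opened in binary mode, for example `open("BATTERS", "rb")`, provided the inferred record length holds for that file.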
jumper Posted August 22, 2019

After reviewing both threads, it appears the high scores (or similar) database for your DOS game has filled up.

Data entries are added by appending them to the end of file #2. Entries in file #2 are "deleted" by simply not referencing them from the index file #1. File #2 was never expected to grow through decades of use to anywhere near 50KB. A design flaw has limited that growth to 50991 bytes, preventing any more additions or deletions.

My only question is the name of the DOS game. Please attach a directory listing along with full copies of the two database files so jaclaz, others, and/or I can verify your analysis.
Tripredacus (Author) Posted August 22, 2019

The licensing status is "unknown", as it is one of those products made by a company that was bought by another company, which was folded into another company, etc. The common thought is that it is "abandonware", and it can be found on those websites. The program is called "Major League Manager" by Spinnaker Software. It runs on DOS. The only version online is the "1990" release (it has 1989 rosters); apparently the older versions have not been found.

The insinuation that the designer never expected it to be used "for years" is true... BUT this bug is not due to the software being used for an extreme length of time. I first encountered this issue within a year of first using the software back in the early 90s. The game is designed for you to create teams, and to create players for those teams. It also allows you to "delete" players, whether they are created or built-in. If you created 10 teams and the 25 players for each of those teams, you'd hit the bug quite quickly.

I have attached my notes regarding the files. For the purpose of clarity, in this thread "FILE2" is BATTERS and "FILE1" is PITCHERS. Also note that the stock program does not have any teams in the custom league. In my notes, "MLM2018" is my active custom version that has about a half dozen slots left in the BATTERS file. "MLM2019" was the next instance I was working on, which encountered the EOF situation.

FileDetail.txt
jaclaz Posted August 22, 2019

Can you post the actual files?

This:

Quote
The following data: 01 00 02 00 03 00 04 00 05 00 06 00 07 00 08 00 09 00 0A 00 0B 00 0C 00 0D 00 0E 00 0F 00 Represents these batters 01-0F (Brady Anderson through Craig Worthington) Where 00 is the delimiter.

is not accurate: 00 is, as said above, not a delimiter; it is more likely the high byte of a two-byte number.

Quote
Team data consists of 15 batters and 10 pitchers.

Good, this explains why the second grouping stops at 0x00A0.

jaclaz
Tripredacus (Author) Posted August 22, 2019

You are right. It is closer to being a section. 01 00 is the first player in the first "group" of the batters file, because further along there is a 01 01, which is the first player in the second "group". This is only evident in the stock files, which are "in order" like this, where everyone in the first group is in order in the Address file. Files from the updated version are not so nice to look at.

I have attached address and batters from the stock version.

ADDRESS BATTERS
jaclaz Posted August 23, 2019

Ok, in the meantime get TinyHexer from here: https://www.softpedia.com/get/Others/Miscellaneous/tiny-hexer.shtml

Put the attached .mps in the \mirkes.de\Tiny Hexer\scripts\Structure Viewer folder and use it on the BATTERS file.

There is a little glitch, as the single quote character in the batter name makes things "go green", but it is only a visual one.

Of course I need BOTH the "stock" and the "modified" files to go on.

jaclaz

BATTERS.mps

Edited August 23, 2019 by jaclaz
Tripredacus (Author) Posted August 23, 2019

This program certainly looks a bit different. Attached is the Pitchers file, as well as some 3 year old versions of the modified files. In the 2016 example, players who have a decimal value of 49 are "deleted" players, as Team 49 (in the 2016 version) is the "temporary" team that I always move players to for deletion.

PITCHERS ADDRESS2016 BATTERS2016 PITCHERS2016
jaclaz Posted August 23, 2019

Try the attached ADDRESS.mps and PITCHERS.mps.

jaclaz

ADDRESS.mps PITCHERS.mps

Edited August 23, 2019 by jaclaz
Tripredacus (Author) Posted August 26, 2019

The duplication of next batters does not appear in the raw data. There is no repeating of data that I can see. It is easier to see differences using the Address2016 file rather than the default one.
jaclaz Posted August 26, 2019

You are right, I was *somehow* parsing the same range twice, but the record size is correct, 50 bytes.

What throws off manually viewing the file is that some entries have 00's, i.e. not all entries are "populated".

Attached is the corrected Address.MPS; replace the old one with this (the "whatever" field is gone).

Now, in "Address" the Entry # should be the Team #, whilst in "Batters" and "Pitchers" the "Prefix" should be the Team #.

I.e. in Batters, Batter #589 Roy White should belong to Team #42 (value of "Prefix"). And in Address, in Entry #42 there is (last filled entry in Batters) #589.

As well, in Pitchers, Pitcher #387 Mike Torres should belong to Team #42 (value of "Prefix"). And in Address, in Entry #42 there is (last filled entry in Pitchers) #387.

jaclaz

ADDRESS.mps

Edited August 26, 2019 by jaclaz
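The 50-byte ADDRESS record described in this thread (15 batter numbers followed by 10 pitcher numbers, each a two-byte little-endian value, with 00 00 in unpopulated slots) can be split apart as sketched below. This is an illustration under assumptions, not a decoding of the real files: it assumes team numbers count from 1 with team 1 at offset 0, treats a zero value as an empty slot, and runs on synthetic data. `parse_team` and `TEAM_LAYOUT` are hypothetical names.

```python
import struct

TEAM_RECORD_SIZE = 50                     # 15 batters + 10 pitchers, two bytes each
TEAM_LAYOUT = struct.Struct("<15H10H")    # 25 unsigned 16-bit little-endian values


def parse_team(address_bytes: bytes, team_number: int):
    """Return (batter_numbers, pitcher_numbers) for a 1-based team number."""
    start = (team_number - 1) * TEAM_RECORD_SIZE  # assumption: team 1 is at offset 0
    fields = TEAM_LAYOUT.unpack_from(address_bytes, start)
    batters = [n for n in fields[:15] if n != 0]   # 0 is treated as an empty slot
    pitchers = [n for n in fields[15:] if n != 0]
    return batters, pitchers


# Synthetic ADDRESS data: team 1 holds batters 1..15 and pitchers 1..10;
# team 2 holds a single batter (589) and a single pitcher (387), the rest is zero.
team1 = TEAM_LAYOUT.pack(*range(1, 16), *range(1, 11))
team2 = TEAM_LAYOUT.pack(589, *([0] * 14), 387, *([0] * 9))
address = team1 + team2

print(parse_team(address, 1))
print(parse_team(address, 2))
```

Each number returned is an entry number in BATTERS or PITCHERS, so it still has to be turned into a byte offset using that file's own record length.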