How to determine hex locations? (flat file database structure)


I am working on a project in which I want to be able to create a workaround to a flat file database issue, referenced here:

https://msfn.org/board/topic/179737-cause-and-workaround-for-a-50k-file-size-limit-in-dos/

In particular, two files are potentially a problem, as they have a fixed size or a fixed addressable size. The software does not replace data; it appends it, and it does not actually recognize the EOF. The EOF is actually in the middle of a dataset, so when data is entered, it writes the half that it can, then duplication and data tabulation bugs begin to occur. I am fortunate that I have a backup from just before and just after this issue appeared, as well as the original files. I have already attempted to manually figure out how the addressing works in my design, but I keep running into problems getting the correct results.

So there is something I am just not grasping about how these files work and maybe someone has an idea about it. Perhaps whatever the programmer used to create this has a named method that I can look into... if I knew what it was.

The files that this program uses are non-indexed, non-addressed but in hex. It uses hex values and appears to "count" entries (or known block sizes) to find relevant data. Examples:

FILE1 has 44 blocks per entry
FILE2 has 51 blocks per entry
ADDRESS refers to data in either file in blocks of 30+20 (15+10)

For example, an address of 01 00 within the first block of 30 (or 15) points to the first entry in FILE2 (offset 0). An address of 01 00 in the first block of 20 (or 10), which is the second section, points to the first entry in FILE1 (offset 0). In the third block of 30, an address of 10 00 is the 16th* entry (offset 40B) in FILE2, and so on.

Does this method have a name? How would I search for how this data is being organized?

* This is one example where I can't figure out the math on locating the data manually. It is some math error I am making in my head. I know that 10 in hex is 16 in decimal, I know the data is the 16th entry (I counted it, and this is why I am using low numbers in my examples) but I do not know how 40B is determined. 40B in hex to dec is 1035, divided by 51 is ~20 not 16. 40B is the offset shown in the hex editor.



Here is what the beginning of the address file looks like:

6MYvIro.jpg

EDIT: Typo in this picture: "03 is first" should be "03 is third". :blushing:

Here is where the address file moves past 255 and FILE2 example data:

j74cmMv.jpg

The two questions I have are:

1. Does this flat file database method have a particular name?

2. How do I determine the offset of an entry using the address file? In this example, 03 00 (from the address file) points to the third entry in FILE2, which starts at offset 8A.
8A in decimal is 138, and 138 / 3 = 46. Since the data entries are in 55-byte groups, this is the wrong math. I need to be able to take the entry from the address file and find it in FILE1 or FILE2.
Is the problem that I am doing a decimal conversion in trying to figure this out?

FILE2 reference. First entry starts at offset 0. Second entry starts at offset 45, third entry is at offset 8A.
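
The three offsets quoted here (0, 45, 8A) are hexadecimal and evenly spaced, which suggests a fixed record length of 0x45 = 69 bytes rather than 51 or 55. Under that assumption the offset of entry n is (n - 1) * 69, which also gives 15 * 69 = 1035 = 0x40B for the 16th entry mentioned earlier. A minimal Python sketch follows; `RECORD_SIZE`, `entry_offset` and `entry_at` are names made up for illustration, and the record length is inferred from the quoted offsets, not confirmed against the program:

```python
# Assumption: every FILE2 record is 0x45 (69) bytes long, inferred from the
# offsets 0x00, 0x45 and 0x8A quoted for entries 1, 2 and 3.
RECORD_SIZE = 0x45


def entry_offset(entry_number, record_size=RECORD_SIZE):
    """Return the byte offset where a 1-based entry number starts."""
    if entry_number < 1:
        raise ValueError("entry numbers start at 1")
    return (entry_number - 1) * record_size


def entry_at(offset, record_size=RECORD_SIZE):
    """Inverse lookup: the 1-based entry number that starts at this offset."""
    index, remainder = divmod(offset, record_size)
    if remainder:
        raise ValueError("offset is not on a record boundary")
    return index + 1


for n in (1, 2, 3, 16):
    print(n, hex(entry_offset(n)))
```

Run as-is, the loop prints 0x0, 0x45, 0x8a and 0x40b for entries 1, 2, 3 and 16, matching the offsets seen in the hex editor. The entry number is reduced by one because the first entry sits at offset 0.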


First thing, it is a two-byte numbering, little endian.

I.e. what you call "01" is actually "01 00" and it means 0x0001.

The 0101 is actually 257; if you prefer, the sequence is:

0x00FE 254

0x00FF 255

0x0100 256 

0x0101 257

So, since 0x0000 is not used, you can have at most 0xFFFF values, i.e. 65535.
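
A minimal Python sketch of that decoding, using the standard `struct` module with the `<H` format (unsigned 16-bit, little endian). The helper name `decode_u16_le` is made up for illustration:

```python
import struct


def decode_u16_le(pair):
    """Decode two bytes as an unsigned 16-bit little-endian number."""
    (value,) = struct.unpack("<H", pair)
    return value


# Byte pairs as they appear in the hex editor, low byte first.
for raw in ("01 00", "10 00", "FF 00", "00 01", "01 01"):
    print(raw, "->", decode_u16_le(bytes.fromhex(raw)))
```

This prints 1, 16, 255, 256 and 257: the second byte of each pair counts multiples of 256.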

It is "queer" how there are holes in the numbering.

I continue to fail to understand half of what you are saying, what (the heck) do you mean by:

Quote

EDIT: Typo in this picture, "03 is first, should be 03 is third) 

How (the heck) are you determining that the entry "Mike" is corresponding to entry #03?

Can you post the first (say) 512 bytes of File 2 (as opposed to an arbitrary small snippet of it starting at 0x0080?) 

You can replace text like "Mike" with "wxyz" if there is any sensitive data.

Loosely, it resembles FAT16 quite a lot:

http://www.ntfs.com/fat-allocation.htm

jaclaz


I do not know whether the software is actually storing the data in hex. There is no key as to what kind of files these are, and I have only decoded how they work by opening the files in a hex editor, making a change with the program and comparing the changes. I think that the data is not actually in hex, but I am viewing it in a hex editor. I was unable to view the data properly in Notepad or Scite, but it did show properly in UltraEdit, which automatically showed the data in hex editor mode.

I know Mike is entry #3 because it is the third entry in FILE2. That was my main concern. The ADDRESS file only stores the position of the data entry in FILE2, BUT in FILE2 the entries have no index value. I can easily look at the ADDRESS file and say "ok, I need to find the 200th entry in FILE2", but then in FILE2 I would have to manually count the entries (and hope I didn't miscount) to find what is in the 200th entry. What I want to do is determine how to calculate where in FILE2 I can read the data, either by the offset shown in the hex editor or by using a character position/count. I will need this math when I write a program to manage the "deleted" content. So far, my attempts at determining the position of the data in FILE2 from the values in ADDRESS have not worked. I can't even do the math properly for the Mike example.
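
For a file made of fixed-length records, the counting can be replaced by a seek: entry N starts at (N - 1) * record length. The Python sketch below assumes nothing about this particular program; `read_record` is a made-up helper name, and the record length is a parameter because the real value has to be established from the files. The demonstration uses an in-memory stream with 4-byte records instead of the real data:

```python
import io


def read_record(stream, entry_number, record_size):
    """Read the 1-based entry_number from a binary stream of fixed-size records."""
    if entry_number < 1:
        raise ValueError("entry numbers start at 1")
    stream.seek((entry_number - 1) * record_size)
    data = stream.read(record_size)
    if len(data) != record_size:
        raise EOFError("entry lies beyond the end of the data")
    return data


# Stand-in data: three 4-byte records instead of the real file.
demo = io.BytesIO(b"AAAA" + b"BBBB" + b"CCCC")
print(read_record(demo, 2, 4))
```

With a real file, the stream would come from `open(path, "rb")` and `record_size` would be the true record length in bytes.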


After reviewing both threads, it appears the high scores (or similar) database for your DOS game has filled up. Data entries are added by appending them to the end of file #2. Entries in file #2 are "deleted" by simply not referencing them from the index file #1. File #2 was never expected to grow through decades of use to anywhere near 50KB. A design flaw has limited that growth to 50991 bytes preventing any more additions or deletions.

My only question is the name of the DOS game. Please attach a directory listing along with full copies of the two database files so jaclaz, others, and/or I can verify your analysis.


The licensing status is "unknown" as it is one of those products made by a company, who was bought by another company, that was folded into another company, etc. The common thought is that it is "abandonware" and can be found on those websites. The program is called "Major League Manager" by Spinnaker Software. It runs on DOS. The only version online is the "1990" release (has 1989 rosters) but apparently the older versions have not been found.

The insinuation that the designer never expected it to be used "for years" is true... BUT this bug the software has is not due to an extreme length of time it is used. I first encountered this issue within a year of using the software initially back in the early 90s. The game is designed for you to create teams, and to create players for those teams. It also allows you to "delete" players, whether they are created or built-in. If you created 10 teams and the 25 players for each of those teams, you'd hit the bug quite quickly.

I have attached my notes regarding the files. For the purpose of clarity, in this thread "FILE2" is BATTERS and "FILE1" is pitchers.
Also to note: the stock program does not have any teams in custom league. In my notes, "MLM2018" is my active custom version that has about a half dozen slots left in the BATTERS file. "MLM2019" was the next instance I was working on that encountered the EOF situation.

FileDetail.txt


Can you post the actual files?

 

This:

Quote

The following data:
01 00 02 00 03 00 04 00 05 00 06 00 07 00 08 00 09 00 0A 00 0B 00 0C 00 0D 00 0E 00 0F 00
Represents these batters 01-0F (Brady Anderson through Craig Worthington)
Where 00 is the delimiter.

Is not accurate: 00 is (as said above) not a delimiter. It is more likely the high-order byte of a two-byte number.

Quote

Team data consists of 15 batters and 10 pitchers. 

Good, this explains why second grouping stops at 0x00A0.
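
With 15 batters and 10 pitchers per team, each stored as a two-byte number, one team takes 15 * 2 + 10 * 2 = 50 bytes. A hedged Python sketch of splitting such a block follows. It assumes the batters come first and the pitchers second, as in the 30+20 layout described at the start of the thread; `TEAM_RECORD` and `parse_team` are made-up names, and the ten zero-valued pitcher slots in the sample are padding invented for the demonstration:

```python
import struct

# Assumed layout of one team block: 15 batter numbers followed by 10 pitcher
# numbers, each an unsigned 16-bit little-endian value (50 bytes in total).
TEAM_RECORD = struct.Struct("<15H10H")


def parse_team(record):
    """Split one 50-byte team block into (batter numbers, pitcher numbers)."""
    values = TEAM_RECORD.unpack(record)
    return list(values[:15]), list(values[15:])


# The 30 batter bytes quoted from the notes, plus 20 zero bytes as stand-in
# pitcher slots (hypothetical padding, not taken from the real file).
batter_bytes = bytes.fromhex(
    "01 00 02 00 03 00 04 00 05 00 06 00 07 00 08 00 "
    "09 00 0A 00 0B 00 0C 00 0D 00 0E 00 0F 00"
)
batters, pitchers = parse_team(batter_bytes + bytes(20))
print(batters)
print(pitchers)
```

Read this way, the quoted bytes decode to the batter numbers 1 through 15, with no delimiter involved.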

jaclaz


You are right. It is closer to being a section. 01 00 is the first player in the first "group" of the batters file, because further along there is a 01 01, which is the first player in the second "group".

This is only evident in the stock files, which are "in order" like this, where everyone in the first group are in order in the Address file. Files from the updated version are not so nice to look at.

I have attached address and batters from the stock version.

ADDRESS BATTERS


Ok, in the meantime get TinyHexer from here:

https://www.softpedia.com/get/Others/Miscellaneous/tiny-hexer.shtml

Put the attached .mps in the \mirkes.de\Tiny Hexer\scripts\Structure Viewer folder and use it on the BATTERS file.

There is a little glitch, as the single quote character in the batter name makes things "go green", but it is only a visual one.

Of course I need BOTH the "stock" and the "modified" files to go on.

jaclaz

 

Batters.jpg

BATTERS.mps

Edited by jaclaz

This program certainly looks a bit different. Attached is the Pitchers file, as well as some 3 year old versions of the modified files.

In the 2016 example, players who have a Decimal value of 49 are players who are "deleted" as Team 49 (in the 2016 version) is the "temporary" team that I always move players to for deletion.

PITCHERS ADDRESS2016 BATTERS2016 PITCHERS2016


You are right, I was *somehow* parsing the same range twice :w00t: :blushing:,

but the record size is correct, 50 bytes.

What throws off manually viewing the file is that some entries have 00's, i.e. not all entries are "populated".

Attached is the corrected Address.MPS; replace the old one with this (no more "whatever" field ;))

Now, in "Address" Entry # should be the Team #, whilst in "Batters" and "Pitchers" the "Prefix" should be the Team #.

I.e. in Batters, Batter #589  Roy White should belong to Team #42 (value of "Prefix").

And in Address, in Entry # 42 there is (last filled entry in Batters) #589.

As well, in Pitchers, Pitcher #387 Mike Torres should belong to team #42 (value of "Prefix").

And in Address in Entry #42 there is (last filled entry in Pitchers) #387.

jaclaz

 

ADDRESS.mps

Edited by jaclaz
