Converting Documents into a dictionary


Hi guys!

I have a customer who has created lots of documents with words and their meanings etc...

Does anyone know how he can convert these documents into a dictionary?

bookie32


Hi Tripredacus :D

It is a long story....

This customer is a writer and has written several books here in Sweden.

In the nineties he started creating Word documents with words, intending them to become a dictionary. He has been doing it ever since and has God knows how many Word documents that he wants to convert to a dictionary...

I know that you can actually create your own dictionary in Word, but he didn't go about it that way...

So now he has all these Word documents he wants to convert, and then create a PDF of everything...

bookie32


I'm sure the 90s DOC format is completely different from the modern DOC/DOCX formats. If he has used any sort of consistent format over the years, it is possible to write a script or program to parse the files and put the information into a single file, or into a database from which a single file can then be generated. I doubt there are any ready-made solutions for what you are looking for.
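
To illustrate the "database, then single file" route, here is a minimal Python sketch. It assumes the term/definition pairs have already been extracted from the documents somehow; the table name `entries`, the function names and the sample words are all made up for the example, and it is not a finished tool.

```python
import sqlite3


def build_database(pairs, path=":memory:"):
    """Store (term, definition) pairs in a SQLite table.

    The table layout is an assumption for this sketch. Duplicate terms
    are silently skipped, so the first definition seen is kept.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries ("
        "term TEXT PRIMARY KEY, definition TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO entries (term, definition) VALUES (?, ?)",
        pairs,
    )
    conn.commit()
    return conn


def export_text(conn):
    """Return all entries as one block of text, one 'term: definition' per line."""
    rows = conn.execute("SELECT term, definition FROM entries ORDER BY term")
    return "\n".join(f"{term}: {definition}" for term, definition in rows)


if __name__ == "__main__":
    # Invented sample data, not taken from the customer's documents.
    conn = build_database([("hund", "a dog"), ("bil", "a car")])
    print(export_text(conn))
```

Note that `ORDER BY term` here uses SQLite's default binary comparison, which will not put å, ä and ö in proper Swedish alphabetical order, so a real export would need a suitable collation or a sort step outside the database.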


The file format (i.e. whether the files are in one of the various .doc or .docx formats) is largely irrelevant, as there are many converters to plainer formats such as .txt or .csv (I would guess Unicode is needed, as Swedish has a lot of "strange" characters). Given the intended use, losing the formatting of the text might not be a problem (or it might be one, as bold and italic are widely used in dictionaries).

The real issue is that if these .docs are more "freestyle notes" than anything else, it will be tough to write a program/script capable of properly separating the fields.

Essentially, a dictionary is structured as a two-field database, term/definition or key/value. If there is a meaningful, possibly unique, delimiter between the two, importing/converting the files will be easy to script; still, there will be errors, edge cases and whatnot.
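
As a rough sketch of what such a script could look like in Python, assuming the documents have already been converted to plain text with one entry per line and that the delimiter is " - " (both are assumptions, the real files may look quite different): lines that do not fit the pattern are collected for manual review rather than guessed at. The function name `parse_entries` and the sample lines are invented for the example.

```python
def parse_entries(lines, delimiter=" - "):
    """Split 'term<delimiter>definition' lines into a dict.

    Returns (entries, problems). 'problems' lists (line_number, text)
    for lines with no delimiter, an empty field, or a repeated term,
    so they can be checked by hand.
    """
    entries = {}
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first occurrence only, so the definition
        # may itself contain the delimiter.
        term, sep, definition = line.partition(delimiter)
        term, definition = term.strip(), definition.strip()
        if not sep or not term or not definition or term in entries:
            problems.append((number, line))
            continue
        entries[term] = definition
    return entries, problems


if __name__ == "__main__":
    # Invented sample lines for illustration only.
    sample = ["äpple - en frukt", "", "båt - ett fartyg", "rad utan avgränsare"]
    entries, problems = parse_entries(sample)
    print(entries)
    print(problems)
```

The `problems` list is where the "errors/edge cases" end up; how many there are would give a quick idea of how consistent the documents really are.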

Then, a dedicated "dictionary/lexicography" tool might be needed for editing/assembling/formatting, for example:

https://tshwanedje.com/tshwanelex/

https://tshwanedje.com/tshwanelex/overview.html

jaclaz
