Saturday 10 October 2015

Playing with the puzzle pieces of the ancient text

By my estimate, the reading I am in the process of doing is about 2.5 years from finished. So I thought I would try to reduce the time it takes me. To do this I have invented an algorithm to translate the work I have not yet read based on the work I have already done. This will impose a style as well as glosses on the work I have not yet read. There are a few later books in Aramaic that do not happily submit to my approach, but there we are.

Here it is. Really simple. Take sequences of 5 words that are already 'translated' and wherever those sequences recur, impose the existing translation on them. This works for 1320 words. At the same time, impose the calculated root and semantic domain already decided for the words that were done by hand (about 37,700 of 300,000 or so, so far). Then do it for four-word sequences, then three, then two. The fewer words in the sequence, the higher the probability that the guess will be not quite 'right'. Then put in a guess for the remaining single words that are not part of any existing translated tuple of words.
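What follows is only a minimal Python sketch of the steps above, not the actual routine. The function name `propagate`, the layout of the data (a list of words plus a dictionary of hand glosses keyed by position), and the rules for ties are all assumptions made for illustration. It carries only the gloss; the root and semantic domain could ride along in the same way.

```python
from collections import Counter

def propagate(words, hand, max_n=5):
    """Return {index: (gloss, n)} guesses for words that have no hand gloss.

    words: list of source words in text order.
    hand:  dict mapping index -> gloss for words translated by hand.
    n is the length of the matched sequence (1 = single-word guess).
    """
    guesses = {}
    # Longest sequences first, so the most reliable guess wins.
    for n in range(max_n, 1, -1):
        # Learn the n-word sequences that are fully translated by hand.
        table = {}
        for i in range(len(words) - n + 1):
            span = range(i, i + n)
            if all(j in hand for j in span):
                key = tuple(words[i:i + n])
                # Assumption: if a sequence has several translations, keep the first.
                table.setdefault(key, tuple(hand[j] for j in span))
        # Impose them wherever the same sequence recurs.
        for i in range(len(words) - n + 1):
            glosses = table.get(tuple(words[i:i + n]))
            if glosses is None:
                continue
            for offset, gloss in enumerate(glosses):
                j = i + offset
                if j not in hand and j not in guesses:
                    guesses[j] = (gloss, n)
    # Single-word fallback (assumption): most common hand gloss for that word.
    by_word = {}
    for j, gloss in hand.items():
        by_word.setdefault(words[j], Counter())[gloss] += 1
    for j, word in enumerate(words):
        if j not in hand and j not in guesses and word in by_word:
            guesses[j] = (by_word[word].most_common(1)[0][0], 1)
    return guesses

# Placeholder tokens, not real text: positions 0-2 were done by hand.
words = ["a", "b", "c", "a", "b", "d", "c"]
hand = {0: "A", 1: "B", 2: "C"}
result = propagate(words, hand)
```

In this toy run, positions 3 and 4 are filled from the two-word sequence, position 6 gets a single-word guess, and position 5 stays unknown because its word was never translated by hand. Keeping `n` beside each guess records how much to trust it.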

It takes about a minute for the computer to run this algorithm. I could set it to run every time I change something, but there's no need. Every week or so I run it, and each time it reduces the number of words that are unknown. As I confirm the roughly derived root of each unknown word and classify or reclassify it, the whole process increases in speed.

Before I did this for the first time this past week, I had about 180,000 single-word guesses and 83,000 unknowns, and 50% of the time the guesses were not useful. Now I have 132,000 single-word guesses, the same 83,000 unknowns, but about 48,000 multi-word guesses that are more likely to be acceptable. These should help reading and speed up the writing process. We'll see.

I have written routines to parse the Hebrew character by character, but they are detailed and complex and uncertain. I use them occasionally to analyse where I am. The grammar analyzer shows me where there are prefixes and suffixes and wandering letters that I have not got a handle on, but to reverse it and take into account the sequences and probabilities of tense, aspect, mode, construct and the infinite variations in the use of prepositions seems a long task. Probably easier to read without it.

[There was a bug in my auto-translate routines. I had corrected it, but had not noticed that it wiped out 450 of my previously manually translated words. That's a little over 1%. But the sideswipe showed me that those had been inconsistently translated! Worth the review. Consistency isn't everything. In fact in acrostics it is impossible, and with language change over time it is just not the way language works, but for the puzzler it is an issue and possibly worth correcting. The bug shows me how to test for consistency and difference as the project progresses.]
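One way such a consistency test might look, as a hedged sketch only: group the hand glosses by source word and list every word that has more than one. The function name `inconsistent`, the data layout, and the transliterated tokens are assumptions for illustration, not the routine actually used.

```python
from collections import defaultdict

def inconsistent(words, hand):
    """Return {word: sorted glosses} for words glossed by hand in more than one way.

    words: list of source words in text order.
    hand:  dict mapping index -> gloss for words translated by hand.
    """
    seen = defaultdict(set)
    for j, gloss in hand.items():
        seen[words[j]].add(gloss)
    return {w: sorted(g) for w, g in seen.items() if len(g) > 1}

# Illustrative tokens only: "qdm" stands in for a word with several senses.
words = ["qdm", "gn", "qdm"]
hand = {0: "east", 1: "garden", 2: "of old"}
report = inconsistent(words, hand)
```

A word on this list is not necessarily an error, since homonyms and acrostics call for different glosses; the report only points to places worth a second look.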

I am currently working on Genesis 2 - hah! I left this till quite late since I knew I would be biased. How do you like this touch (if it remains)?

And Yahweh God planted a garden in Eden from confrontation
and he set up there the earthling that he fashioned.

קדם is a set of homonyms - might mean in the East, or might mean of old or might mean confrontation such as happened between the earthling and Yahweh. Happened? Happens all the time to all of us. We are from of old confrontational whether we be from East or West. (In Genesis where the word is used twice in this passage, it has different accentuation in the two cases, the first on the first syllable and the second on the second syllable. See the music.)


