Thursday 2 February 2017

Reading project - Status by word guess

This is the current state of the incomplete parts of Bob's Bible.

My algorithms guess translations as I proceed. I start with sets of 6 Hebrew words in sequence that are completed and compare them with the remaining sets of 6 that are not done. If I get a match here, the odds are that the translation should be consistent over the phrase. It is important that I not do things piecemeal or this process would fail: I must know what is done and what is not done or I will overwrite my work. (I learned this the hard way.) I then work with 5 words at a time, then 4, and so on down to 1 at a time. (1 at a time sometimes gives a better guess than 2 or 3, but is rarely better than 4, 5, or 6 at a time.)
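For readers who think in code, the longest-phrase-first idea can be sketched roughly as below. This is only an illustration, not my actual Oracle code: the function names `build_index` and `guess`, the list-of-pairs data shape, and the "first occurrence wins" rule are all assumptions made for the example, and the letters stand in for Hebrew words and English glosses.

```python
from typing import Dict, List, Optional, Tuple

Pair = Tuple[str, str]  # (Hebrew word, English gloss): a hypothetical data shape


def build_index(done: List[Pair], n: int) -> Dict[Tuple[str, ...], Tuple[str, ...]]:
    """Map each run of n completed Hebrew words to its glosses.

    Assumption for this sketch: the first occurrence of a phrase wins.
    """
    index: Dict[Tuple[str, ...], Tuple[str, ...]] = {}
    for i in range(len(done) - n + 1):
        window = done[i:i + n]
        key = tuple(hebrew for hebrew, _ in window)
        index.setdefault(key, tuple(gloss for _, gloss in window))
    return index


def guess(done: List[Pair], todo: List[str],
          max_len: int = 6) -> List[Optional[Tuple[str, int]]]:
    """For each untranslated word, return (guessed gloss, phrase length) or None."""
    guesses: List[Optional[Tuple[str, int]]] = [None] * len(todo)
    for n in range(max_len, 0, -1):  # 6 words at a time, then 5, ... down to 1
        index = build_index(done, n)
        for i in range(len(todo) - n + 1):
            # Know what is done: never overwrite a guess from a longer phrase.
            if any(g is not None for g in guesses[i:i + n]):
                continue
            glosses = index.get(tuple(todo[i:i + n]))
            if glosses is not None:
                for k, gloss in enumerate(glosses):
                    guesses[i + k] = (gloss, n)
    return guesses


# Placeholder data: lowercase letters are "Hebrew", uppercase are "English".
done = [("a", "A"), ("b", "B"), ("c", "C"), ("d", "D")]
print(guess(done, ["b", "c", "x", "a"], max_len=3))
# [('B', 2), ('C', 2), None, ('A', 1)]
```

Recording the phrase length next to each guess is a choice made for this sketch; it makes it easy to count guesses by length or to leave out the single-word guesses.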

For the books I have not yet completed, here is my status by guessed words, excluding the single guesses. You can see that the guesses will make a large difference in typing and lookup time. There are still plenty of decisions to make for each word and phrase. (And I can sure waste a lot of time trying to get all my figures to balance. I must redesign my data at some point. That's a big job - I probably won't get around to it without a significant amount of funding so I could hire some trained folks in Oracle, Hebrew, Unicode, Music, Linguistics, Biblical Studies and so on. That is, not at present!)

Completed chapters are a long way from these guessed words, but the guesses reduce the translation time for a chapter by an order of magnitude. They also enforce consistency. (But I still spend a lot of time checking consistency and concordance after the fact. I just completed a design of a complete verse checker that will help me see inconsistent phrasing where perhaps I have no excuse for it.)

The system also allows me to be interrupted continuously without losing my place. Interruption is the nature of life.

The progress (above) may suggest that Deuteronomy, Jeremiah, 2 Kings, and 2 Chronicles have a lot of shared words and phrases. I could do more extensive analysis here, but I will leave that till later in the project.

Here is some more detail on the guesses. About 49,000 pairs are guessed and 88,500 singles. These would overwhelm the y axis below, so I only show the patterns of similar phrases of length 3 or longer. About 23,000 guesses of what I have yet to do can be seen in this graph. Overall by word today, just over 160,000 are guessed by phrase pattern matching, and just under 36,000 are unknown. Of these, just over 1,000 have not even been assigned a semantic domain - probably a lot of place and person names are among them.



