Natural language processing: Russian-English translator

Title:

Creation of an Interactive Russian-English Translation Program

Objective:

The goal of this project is to produce a program that can translate Russian 
sentences into English, asking the user to check its parse of each sentence.

Justification:

Though many translation programs exist, most are designed to scan text 
automatically and produce immediate output. As a result, instead of correcting 
one simple mistake in the program’s parse of the source language, the human 
translator must find and fix every error that mistake causes in the 
translation. By asking the user to 
correct parsing, an interactive program could improve the efficiency of a human 
translator working with a machine.

The program will translate from Russian to English, rather than the reverse, 
because Russian conveys semantic and syntactic information primarily through 
word endings rather than word order. Word endings are more rigid and easier to 
detect than word order. Generating English sentences is also less nuanced than 
generating Russian ones: Russian uses word order to convey subtleties of 
meaning, whereas English requires a specific order simply to make sense. The 
output will take the form of simple, formulaic English sentences that bluntly 
convey the basic information in the Russian input, excluding any information 
conveyed by word order.
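As a rough illustration of why endings are easier to work with than order, the 
sketch below reads grammatical roles from the endings of two nouns and emits 
the same formulaic English sentence for both word orders. The lexicon, the 
names analyze and translate, and the two-ending table are assumptions made for 
this example only; real Russian morphology has many more patterns and 
ambiguities.

```python
# Toy lexicon: stems and glosses are illustrative assumptions, not a real dictionary.
NOUN_STEMS = {"мам": "mom", "пап": "dad"}
VERBS = {"любит": "loves", "видит": "sees"}

# Singular endings for nouns in -а only; real Russian has many more patterns.
CASE_BY_ENDING = {"а": "nominative", "у": "accusative"}


def analyze(word):
    """Return (role, english) for one lower-case word, or raise ValueError."""
    if word in VERBS:
        return "verb", VERBS[word]
    stem, ending = word[:-1], word[-1:]
    if stem in NOUN_STEMS and ending in CASE_BY_ENDING:
        return CASE_BY_ENDING[ending], NOUN_STEMS[stem]
    raise ValueError(f"unknown word: {word}")


def translate(sentence):
    """Build a fixed subject-verb-object English sentence from the roles found."""
    roles = {}
    for word in sentence.lower().rstrip(".").split():
        role, english = analyze(word)
        roles[role] = english
    return f"{roles['nominative']} {roles['verb']} {roles['accusative']}."


print(translate("Мама любит папу."))
print(translate("Папу любит мама."))
```

Both calls produce the same English sentence, because the roles come from the 
endings and the Russian word order is ignored, as the proposal intends.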

Description:

The program will first perform an analysis of the input text, attempting to parse 
the sentence and interpret meanings where possible. This would probably be best 
achieved with an object-oriented design, in which each word is stored along 
with its translation, possible alternate meanings, part of speech, grammatical 
exceptions, and functions to generate its forms. The analysis need not be as 
accurate as that of fully automatic translators, since the user will be given 
the opportunity to correct it, but it should be accurate enough that the user 
does not spend too much time correcting the parse.
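One possible shape for such a word object is sketched below. The class name 
NounEntry, its field names, and the single declension table are assumptions 
made for illustration; the proposal does not fix a design. The spelling of 
"книги" is stored as an override for simplicity, though a fuller program would 
probably encode the underlying spelling rule instead.

```python
from dataclasses import dataclass, field

# Regular singular endings for nouns in -а; one illustrative paradigm only.
A_DECLENSION_SINGULAR = {
    "nominative": "а",
    "genitive": "ы",
    "dative": "е",
    "accusative": "у",
    "instrumental": "ой",
    "prepositional": "е",
}


@dataclass
class NounEntry:
    """Hypothetical dictionary entry for one Russian noun."""
    stem: str
    translation: str
    alternates: list = field(default_factory=list)   # other possible meanings
    part_of_speech: str = "noun"
    exceptions: dict = field(default_factory=dict)   # case -> irregular form

    def form(self, case):
        """Generate the singular form for a case, honoring stored exceptions."""
        if case in self.exceptions:
            return self.exceptions[case]
        return self.stem + A_DECLENSION_SINGULAR[case]


mama = NounEntry("мам", "mom", alternates=["mother"])
kniga = NounEntry("книг", "book", exceptions={"genitive": "книги"})
print(mama.form("genitive"), kniga.form("genitive"))
```

Keeping the form generator on the entry means the dictionary stores one stem 
per word rather than every inflected form.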

After performing its parse, the program will present its interpretation to the user 
for review. The user will be able to correct errors in the identification of 
antecedents, the cases of words, relevant semantic information, and the 
translation of words in context. Using this corrected information, as well as 
translation algorithms based on correspondences between Russian and English 
structures, the program will produce an output in English to convey the same 
information as the input.
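The review step might look something like the sketch below, which uses the 
sentence "Мать любит дочь": both nouns have identical nominative and 
accusative forms, so endings alone cannot settle who loves whom. The parse 
layout (a list of dictionaries), the function names, and the draft parse itself 
are assumptions for this example. Corrections are passed in as data rather than 
read from the keyboard so the logic stays separate from the interface.

```python
def apply_corrections(parse, corrections):
    """Return a new parse with user-supplied field overrides applied.

    corrections maps a token index to a dict of fields to overwrite.
    """
    corrected = [dict(token) for token in parse]  # leave the draft untouched
    for index, changes in corrections.items():
        corrected[index].update(changes)
    return corrected


def render(parse):
    """Produce a formulaic subject-verb-object English sentence."""
    by_case = {t["case"]: t["english"] for t in parse if t["pos"] == "noun"}
    verb = next(t["english"] for t in parse if t["pos"] == "verb")
    return f"{by_case['nominative']} {verb} {by_case['accusative']}."


# Hypothetical draft parse: the program could not tell the cases apart and
# marked both nouns as nominative.
draft = [
    {"word": "мать", "pos": "noun", "case": "nominative", "english": "mother"},
    {"word": "любит", "pos": "verb", "english": "loves"},
    {"word": "дочь", "pos": "noun", "case": "nominative", "english": "daughter"},
]

# The user marks the third word (index 2) as accusative.
final = apply_corrections(draft, {2: {"case": "accusative"}})
print(render(final))
```

A single correction to the parse is enough to fix the output, which is the 
efficiency gain the proposal is aiming for.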

Limitations:

Experts have spent years on the problems of natural language processing. The 
chances of producing a professional-quality translator within a year are extremely 
low. Even with the complexity of the input language limited and the user 
correcting the automatic parse, simply presenting the right fields for 
verification without cluttering the screen could be difficult. The dictionary 
could also pose problems. It must store all of the words, along with their parts of 
speech, grammatical exceptions, and basic semantic information relevant to 
morphology. To make the translations correct, even a relatively small dictionary 
would require a large amount of disk space. In order to complete the project 
within the year, the complexity of both the source and target languages will have 
to be severely limited. The English sentences produced will convey the same 
basic information as the Russian, at the expense of linguistic elegance and 
directly equivalent structures.