New Mexico Supercomputing Challenge
Challenge Team Interim Report
One of the major reasons many people find computers intimidating is simply that computers have a language of their own. When we ask people to do something or to give us information, we do not have to go through manuals and training courses first. Why should this be necessary in order to work with computers? I have set out to start a process that may eventually change this. I hope to write a program that can read plain English (and that can later be adapted to read other languages) and understand it well enough to gather information and answer questions about what it has gathered. It will break each sentence down into components to build an internal description of the sentence's meaning, which it will then compare with the descriptions of other sentences to find the flow of logic (if any) in what it has been given. When a request for information is made, it will build sentences from the relevant information stored in its database.
To break down the sentences, the program will have a dictionary of words and the parts of speech each word can take. For example, it could store the word "four" as "four:0,4". Here the numbers represent the parts of speech the word "four" may take: 0 represents noun, and 4 represents adjective. Using this list and information about sentence structure, the program will assign each word a second number showing its function in the sentence, and then select from the word's list the part-of-speech number that fits that function. As it reads, the program will write these sentences, in their parsed form, to a database. Later, when the database is queried, the program will break the query down to determine the nature of the request so that it can search the database (possibly using a dictionary of synonyms to avoid missed connections) for relevant information. Finally, it will assemble this information into sentences that will be output to the user.
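The dictionary format described above could be read along the following lines. This is only a minimal sketch in Python: the entry "four:0,4" and the meanings of codes 0 and 4 come from the report, while the function names (`parse_entry`, `load_dictionary`, `possible_parts`) are hypothetical and the remaining codes are left unfilled.

```python
# Part-of-speech codes. Only 0 (noun) and 4 (adjective) come from the report;
# any other code would have to be filled in from the real numbering scheme.
POS_NAMES = {0: "noun", 4: "adjective"}


def parse_entry(line):
    """Parse one dictionary line such as 'four:0,4' into (word, [codes])."""
    word, _, codes = line.strip().partition(":")
    if not word or not codes:
        raise ValueError(f"malformed dictionary entry: {line!r}")
    return word.lower(), [int(code) for code in codes.split(",")]


def load_dictionary(lines):
    """Build a word -> list-of-codes mapping, skipping blank lines."""
    return dict(parse_entry(line) for line in lines if line.strip())


def possible_parts(dictionary, word):
    """Return readable names for the parts of speech a word may take."""
    codes = dictionary.get(word.lower(), [])
    return [POS_NAMES.get(code, f"code {code}") for code in codes]


lexicon = load_dictionary(["four:0,4"])
print(possible_parts(lexicon, "Four"))
```

Lookups are lowercased so that a word at the start of a sentence still matches its entry; a word missing from the dictionary simply yields an empty list.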
Most of the work so far has been planning: determining how to break the sentences down, how to represent them internally, and how to work with the information gathered from the sentences the program reads. I have determined the numbering scheme that will be used to describe words, in which the numbers represent the part of speech a word has taken and the part of the sentence in which it is located. I have also worked on some very crude code to break the sentences down. However, most of the actual coding still needs to be done.
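To illustrate the idea of describing each word by two numbers, one for its part of speech and one for its place in the sentence, here is a rough Python sketch. Only the codes 0 (noun) and 4 (adjective) come from the report. The function codes `SUBJECT` and `MODIFIER`, the `ALLOWED` rule table, and the names `TaggedWord` and `tag` are all assumptions made for illustration, since the actual numbering scheme is not given here.

```python
from dataclasses import dataclass

# Part-of-speech codes taken from the report's example.
NOUN, ADJECTIVE = 0, 4

# Hypothetical function codes; the report does not list the real ones.
SUBJECT, MODIFIER = 100, 101

# Hypothetical rule table: which parts of speech can fill each function.
ALLOWED = {SUBJECT: (NOUN,), MODIFIER: (ADJECTIVE,)}


@dataclass(frozen=True)
class TaggedWord:
    word: str
    part_of_speech: int  # the code selected from the word's dictionary list
    function: int        # the word's role in the sentence


def tag(word, candidates, function):
    """Pick the candidate part of speech that fits the given function."""
    for code in ALLOWED.get(function, ()):
        if code in candidates:
            return TaggedWord(word, code, function)
    raise ValueError(f"{word!r} cannot serve as function {function}")


print(tag("four", [NOUN, ADJECTIVE], MODIFIER))
```

With these assumed rules, "four" used as a modifier resolves to code 4 (adjective), while the same word used as a subject would resolve to code 0 (noun).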
When done, this program should be able to read documents not necessarily written for its comprehension; that is, it should be able to understand English in the form people actually use. I hope to solve the major problem of getting around grammatical errors, since parsers in general expect their tokens to be in the correct order. I also hope to find a way around misspellings (possibly by relying on the ispell spell checker). Although the program may not immediately be able to read English as well as humans do, it will be a step toward that ultimate goal.
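One way the misspelling problem might be approached is to map an unknown word to its closest dictionary entry before parsing. The sketch below does not use ispell; it substitutes Python's standard `difflib.get_close_matches` as a stand-in, and the function name `correct_word` and the 0.8 similarity cutoff are assumptions chosen for illustration.

```python
import difflib


def correct_word(word, vocabulary, cutoff=0.8):
    """Return the vocabulary word closest to `word`, or None if none is close.

    `vocabulary` is a list of known lowercase words; `cutoff` is the minimum
    similarity (0 to 1) that a candidate must reach to be accepted.
    """
    lowered = word.lower()
    if lowered in vocabulary:
        return lowered
    matches = difflib.get_close_matches(lowered, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


known_words = ["computer", "four"]
print(correct_word("computr", known_words))
```

Returning None for words with no close match lets the caller decide whether to skip the word, ask the user, or record it as unknown, rather than silently guessing.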
For questions about the Supercomputing Challenge, a 501(c)(3) organization, contact us at: consult1516 @ supercomputingchallenge.org
New Mexico Supercomputing Challenge, Inc.
80 Cascabel Street
Los Alamos, New Mexico 87544