Dodona | A Natural Language Question-Answering Bot

Documentation

dodona.py
fuzzystack.py
helper.py
nlp.py
parsetree.py
session.py
webUI.py
xml_parser.py
zephyrUI.py

dodona.py

Everything begins in dodona.py. This file loads Dodona's knowledge database, initializes the Zephyr settings, and enters an infinite loop. In this loop, she receives messages and keeps track of "sessions": for each user who zephyrs Dodona, she keeps a separate conversation. When she receives a message, she checks the sender and either creates a new session or passes the message along to an existing session.
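
The dispatch step inside the loop might look roughly like the sketch below. It is an assumption about the shape of the code, not a copy of dodona.py: the Session stand-in, its handle method, and the dispatch function are hypothetical names, and the Zephyr receive call is replaced by plain function arguments.

```python
class Session:
    """Hypothetical stand-in for the Session class in session.py."""

    def __init__(self, name):
        self.name = name
        self.messages = []

    def handle(self, message):
        # Hypothetical method: the real Session parses the message instead.
        self.messages.append(message)
        return "%s said: %s" % (self.name, message)


def dispatch(sessions, sender, message):
    """Route a message to the sender's session, creating one if needed."""
    if sender not in sessions:
        sessions[sender] = Session(sender)
    return sessions[sender].handle(message)


# One dictionary entry per user keeps the conversations separate.
sessions = {}
dispatch(sessions, "alice", "hello")
dispatch(sessions, "bob", "what is emacs?")
dispatch(sessions, "alice", "tell me about emacs")
```

Keying the dictionary on the sender means a second message from the same user reuses that user's session instead of starting a new conversation.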

fuzzystack.py

Documentation coming soon!

helper.py

Documentation coming soon!

nlp.py

This file contains functions used by session.py to find topics and subtopics. nlp.py provides a function that determines the type of a sentence (question, statement, or command) from the productions of its parse. It then uses the sentence type to find the topic through the nouns and prepositional phrases of the sentence's object, whose location depends on the structure of that sentence type.

get_sentence_type(parse) - determines the sentence type recursively, based on the rules the tree is built out of.

If the sentence is headed by either of the nonterminals Ind_Clause_Ques or Ind_Clause_Ques_Aux, then it is a question.
If the sentence is headed by either of the nonterminals Ind_Clause or Ind_Clause_Pl, and the verb phrase of the sentence is of type VP_Inf, then the sentence is a command.
If the sentence is headed by either of the nonterminals Ind_Clause or Ind_Clause_Pl, and the verb phrase is anything but VP_Inf, then the sentence is a statement.
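
The three rules above can be sketched as follows. This is a minimal illustration, not the code in nlp.py: it assumes a parse is a nested tuple of the form (label, child, ...) with words as plain strings, the function name sentence_type is hypothetical, and the labels S, Wh, V, NP, N, P, PP and VP_1st in the sample trees are made up for the example.

```python
QUESTION_HEADS = {"Ind_Clause_Ques", "Ind_Clause_Ques_Aux"}
CLAUSE_HEADS = {"Ind_Clause", "Ind_Clause_Pl"}


def sentence_type(tree):
    """Return "question", "command", "statement", or None."""
    if not isinstance(tree, tuple):
        return None  # a bare word carries no sentence type
    label, children = tree[0], tree[1:]
    if label in QUESTION_HEADS:
        return "question"
    if label in CLAUSE_HEADS:
        child_labels = [c[0] for c in children if isinstance(c, tuple)]
        return "command" if "VP_Inf" in child_labels else "statement"
    # Not a clause head yet: keep descending (the recursive part).
    for child in children:
        found = sentence_type(child)
        if found is not None:
            return found
    return None


# Sample trees with assumed shapes.
question = ("S", ("Ind_Clause_Ques", ("Wh", "what"),
                  ("VP_3rd", ("V", "is"), ("NP", "your", "name"))))
command = ("S", ("Ind_Clause",
                 ("VP_Inf", ("V", "tell"), ("NP", "me"),
                  ("PP", ("P", "about"), ("N", "emacs")))))
statement = ("S", ("Ind_Clause", ("NP", "I"),
                   ("VP_1st", ("V", "like"), ("N", "emacs"))))

print(sentence_type(question), sentence_type(command), sentence_type(statement))
```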

find_PP(parse) - recursively searches the parsetree for the first prepositional phrase.

find_noun(parse[, exceptions]) - recursively searches the parsetree for the first noun or noun phrase which is not in exceptions.

find_compound_noun(parse) - recursively searches the parsetree for the first compound noun (that is, two or more nouns strung together, like "Cambridge City Council").

find_after_verb(parse) - recursively searches the parsetree for the first After_Verb_Tr or After_Verb_In structure. The After_Verb structures are placeholders for any combination of things that can go after a verb. If there are v verb combinations and a combinations of things that can follow a verb, then with the After_Verb phrases we only have to write v rules for the verbs (plus the After_Verb rules themselves), instead of v*a rules.
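
The find_* functions above all share one pattern: a depth-first search that returns the first matching subtree. A minimal sketch of that pattern is shown here, assuming a parse is a nested tuple (label, child, ...) with words as plain strings. The helper names find_first and words, the labels V, NP, N, P, and the internal layout of After_Verb_Tr are assumptions for illustration only.

```python
def words(tree):
    """Join the words under a subtree into one string."""
    if not isinstance(tree, tuple):
        return tree
    return " ".join(words(child) for child in tree[1:])


def find_first(tree, labels, exceptions=()):
    """Return the first subtree whose label is in labels and whose
    words are not listed in exceptions, or None if there is none."""
    if not isinstance(tree, tuple):
        return None
    if tree[0] in labels and words(tree) not in exceptions:
        return tree
    for child in tree[1:]:
        found = find_first(child, labels, exceptions)
        if found is not None:
            return found
    return None


tree = ("VP_Inf", ("V", "tell"),
        ("After_Verb_Tr",
         ("NP", ("N", "me")),
         ("PP", ("P", "about"), ("NP", ("N", "emacs")))))

pp = find_first(tree, {"PP"})                              # like find_PP
noun = find_first(tree, {"N", "NP"}, exceptions={"me"})    # like find_noun
after_verb = find_first(tree, {"After_Verb_Tr", "After_Verb_In"})
print(words(pp), "|", words(noun), "|", words(after_verb))
```

Passing exceptions lets a caller skip a noun it has already rejected (here "me") and continue to the next candidate.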

find_topic(parse[, type]) - finds the topic of a sentence based on whether it is a question, statement, or command, as they all have different structures, and therefore must be analyzed independently of one another.

Questions
Questions will always begin with an Ind_Clause_Ques or Ind_Clause_Ques_Aux structure. The important substructure of Ind_Clause_Ques and Ind_Clause_Ques_Aux will be either VP_3rd, Ind_Clause_Ques_Aux, Interrog_Clause, Ind_Clause_Inf, or Ind_Clause_Inf_3rd.

If the substructure is VP_3rd, then this constitutes a sentence such as "What is your name?", because "is your name" is VP_3rd. We would therefore like to find the object after the verb, found in a prepositional phrase or an After_Verb phrase.

If the substructure is Ind_Clause_Ques or Ind_Clause_Ques_Aux, then it will contain an Ind_Clause_Inf or Ind_Clause_Inf_3rd. See below.

If the substructure is Interrog_Clause, it may contain an After_Verb phrase or a PP, which contains the topic we want, so we search for those phrases.

If the substructure is Ind_Clause_Inf or Ind_Clause_Inf_3rd, then we can simply take the independent clause and treat it as a statement. For example, "do you know about emacs?" contains the independent clause "you know about emacs". However, the verb is an infinitive (it is the auxiliary "do" which gets conjugated). See below for how statements are handled.

Statements
In all cases with statements, the important rule begins with Ind_Clause, Ind_Clause_Pl, Ind_Clause_Inf, or Ind_Clause_Inf_3rd. All of these are of the form NP_# VP_%. Therefore, in all of these cases, we want to look for a prepositional phrase or an After_Verb phrase, either of which could contain the topic.

Commands
Commands stem from only a single rule, Ind_Clause -> VP_Inf. For example, "Tell me about emacs" is simply an infinitive verb phrase. Therefore, we search the tree until we find VP_Inf, and then look in that structure for a prepositional phrase or an After_Verb phrase.
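
The statement and command cases can be sketched together, since both end in the same search. The code below is a hedged illustration rather than the real find_topic: it assumes nested-tuple parses of the form (label, child, ...), the names topic_phrase, find_first and words are hypothetical, the sample labels other than the documented nonterminals are invented, and trying the prepositional phrase before the After_Verb phrase is an assumption, since the text does not state the order.

```python
AFTER_VERB = {"After_Verb_Tr", "After_Verb_In"}


def words(tree):
    """Join the words under a subtree into one string."""
    if not isinstance(tree, tuple):
        return tree
    return " ".join(words(child) for child in tree[1:])


def find_first(tree, labels):
    """Depth-first search for the first subtree with one of the labels."""
    if not isinstance(tree, tuple):
        return None
    if tree[0] in labels:
        return tree
    for child in tree[1:]:
        found = find_first(child, labels)
        if found is not None:
            return found
    return None


def topic_phrase(tree, sentence_type):
    """Return the words of the phrase likely to hold the topic, or None."""
    root = tree
    if sentence_type == "command":
        # Commands: narrow the search to the infinitive verb phrase first.
        root = find_first(tree, {"VP_Inf"})
        if root is None:
            return None
    phrase = find_first(root, {"PP"}) or find_first(root, AFTER_VERB)
    return words(phrase) if phrase is not None else None


command = ("S", ("Ind_Clause", ("VP_Inf", ("V", "tell"),
           ("After_Verb_Tr", ("NP", ("N", "me")),
            ("PP", ("P", "about"), ("NP", ("N", "emacs")))))))
statement = ("S", ("Ind_Clause", ("NP_1st", ("Pro", "I")),
             ("VP_1st", ("V", "use"),
              ("After_Verb_Tr", ("NP", ("N", "emacs"))))))

print(topic_phrase(command, "command"))
print(topic_phrase(statement, "statement"))
```

The result is a "full" topic such as "about emacs", which still has to be reduced to a dictionary key before it can be looked up.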

parsetree.py

This is the file in which parsing actually occurs. Dodona uses an EarleyChartParser which operates on the ContextFreeGrammar produced from our custom set of rules. The file contains options to parse whole files, single sentences, or even a single noun phrase. It also has the ability to add more vocabulary to its rule set, recreating both the grammar and the parser to incorporate these changes without necessitating a restart of Dodona.

__init__(self[, rules_file, vocab_file]) - reads in grammar rules from rules_file (which defaults to rules.gr) and vocab rules from vocab_file (which defaults to vocabulary.gr), and creates self.cfg and self.parser from those rules.

self.rules - a list of grammar rules

self.cfg - of type ContextFreeGrammar, created from self.rules

self.parser - of type EarleyChartParser, created from self.cfg

add_new_vocab_rule(self, rule) - adds a new vocab rule to self.rules and subsequently updates self.cfg and self.parser.

parse_file(self, file) - parses sentences listed in a file.

parse_sent(self, sen) - parses a single sentence, and returns the parse or a list of foreign words.

parse_NP(self, sen) - parses a phrase, beginning with an NP instead of S.

rand_sent(self) - generates a random sentence from self.cfg.
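
The "add a rule, then rebuild" behavior of add_new_vocab_rule can be illustrated with a toy class. This is only a sketch under stated assumptions: ToyParser, _rebuild and knows are hypothetical names, the rule syntax "Noun -> 'emacs'" is assumed rather than taken from rules.gr, and a plain dictionary stands in for the grammar and chart parser that the real class builds.

```python
class ToyParser:
    """Toy stand-in for the Parser class in parsetree.py."""

    def __init__(self, rules):
        self.rules = list(rules)
        self._rebuild()

    def _rebuild(self):
        # Hypothetical helper: regroup the rule strings by left-hand side.
        # The real class would recreate self.cfg and self.parser here.
        self.cfg = {}
        for rule in self.rules:
            lhs, rhs = [part.strip() for part in rule.split("->", 1)]
            self.cfg.setdefault(lhs, []).append(rhs)

    def add_new_vocab_rule(self, rule):
        # Derived structures are rebuilt right away, so no restart is needed.
        self.rules.append(rule)
        self._rebuild()

    def knows(self, pos, word):
        """Hypothetical query: is word listed under the part of speech pos?"""
        return "'%s'" % word in self.cfg.get(pos, [])


parser = ToyParser(["Noun -> 'emacs'", "Verb -> 'tell'"])
before = parser.knows("Noun", "zephyr")
parser.add_new_vocab_rule("Noun -> 'zephyr'")
after = parser.knows("Noun", "zephyr")
print(before, after)
```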

session.py

Each particular conversation is managed here. The class Session is in charge of parsing all zephyrs sent to Dodona to find topics and subtopics, and returning either the requested information or a relevant error message. Session alters its behavior according to the current state of Dodona, treating responses differently depending on whether it just displayed the subtopics of a topic, asked about the part of speech of a word, or successfully answered a question.

__init__(self, name, topics, bot) - initializes the memory, name, dictionary of topics, parser, and zephyrbot unique to this Session.

self.memory - of type FuzzyStack, initialized with self.name and self.topics.

self.name - the name of the user who initialized this Session.

self.topics - the default dictionary of topics

self.parser - of type Parser, from parsetree.py. Contains useful functions for parsing sentences.

self.bot - of type IO, from zephyrUI.py. Contains useful functions for sending and receiving zephyrs.

question(self) - retrieves the most recent message, states, topic, and data from the memory, and based on their values, decides what to do. It may either attempt to learn a new word, or search for a topic in the message it just received.

clear(self) - clears all information from the memory, and reinitializes the memory with the set of topics parsed from the XML files.

_learn(self) - begins the learning process. Asks the user what part of speech the word is, and keeps track of the remaining words which we still need to learn about.

_part_of_speech(self, mess, step) - part of Dodona's word-learning algorithm. Learns the part of speech for the word, and either moves to the next step (for example, if the word is a verb, we want to know all conjugations of that verb) or ends the learning process.

_add_new_word(self, word, pos) - adds a new word to Dodona's vocabulary. This method updates vocabulary.gr and refreshes the parsing tools contained in self.parser.

_AI(self, mess[, d, k]) - uses self.parser to parse the message. It then uses functions from nlp.py to find the noun phrase which houses the topic of the sentence. It then calls self._topic to find out whether she recognizes the topic, whether there is a subtopic as well, or whether she can't determine anything from the sentence.

_topic(self, top[, d, k, ques_word]) - checks the knowledge dictionaries to see if the topic we determined is contained there. Searches for both topics and subtopics, comes up with an answer for the user, and returns it to self._AI. It finds the topic and/or subtopic by breaking the "full" topic into pieces. For example, we might have found "function keys in emacs". This is clearly the topic of the sentence, but Dodona doesn't have an entry in her dictionary for "function keys in emacs"; she has an entry for "emacs", and then a subentry for "function keys". In addition, if the full topic were just "about emacs", we would still need to extract "emacs".

_topic does this semantic analysis on the full topic, and extracts subtopics and topics. It looks at all the nouns in the full topic, and tries to match them to keys in the dictionary, hoping to find a topic and subtopic. It is often difficult to tell when you should separate nouns or keep them together, which is why we find all of the nouns and noun phrases. For example, in "copying and pasting with the mouse in emacs", is the subtopic "copying and pasting", "mouse", "copying and pasting with the mouse", "copying", or "pasting"? The answer is "copying and pasting with the mouse", but depending on how the sentence was parsed, that answer might not be so obvious.

If the method is not able to find a topic and subtopic from the nouns, then it searches for just a topic out of the nouns. If it can't match anything there, either, then we can't determine what the user is asking.
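
The two-pass matching described above (topic plus subtopic first, then topic alone) can be sketched like this. The function name match_topic, the nested-dictionary layout, and the placeholder answers are assumptions for illustration. The real knowledge dictionaries are built from the XML files and may be shaped differently.

```python
def match_topic(nouns, topics):
    """Return (topic, subtopic), (topic, None), or (None, None)."""
    # First pass: look for a topic together with one of its subtopics.
    for topic in nouns:
        subtopics = topics.get(topic)
        if subtopics is None:
            continue
        for sub in nouns:
            if sub != topic and sub in subtopics:
                return topic, sub
    # Second pass: settle for a topic on its own.
    for topic in nouns:
        if topic in topics:
            return topic, None
    # Nothing matched: we can't tell what the user is asking.
    return None, None


# Assumed layout: topic -> {subtopic: answer}, with placeholder answers.
topics = {
    "emacs": {
        "function keys": "placeholder answer",
        "copying and pasting with the mouse": "placeholder answer",
    }
}

both = match_topic(["function keys in emacs", "function keys", "emacs"], topics)
only_topic = match_topic(["about emacs", "emacs"], topics)
nothing = match_topic(["vim"], topics)
print(both, only_topic, nothing)
```

Passing every noun and noun phrase found in the full topic, rather than a single guess, is what lets a long candidate such as "copying and pasting with the mouse" match when it happens to be a key.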

webUI.py

Documentation coming soon!

xml_parser.py

Documentation coming soon!

zephyrUI.py

Takes care of the interface to zephyr. Most importantly, it defines:

__init__(self[, c]) - initializes the python-zephyr utilities, and sets self.cls to c (which defaults to dodona-test).

self.cls - the class on which messages are sent and received.

send(self, mess[, name]) - sends a string mess to the class self.cls, optionally prefixing the message with name.

receive_from_subs(self[, return_sender]) - receives messages from the class self.cls, skipping any that are empty or that Dodona sent herself. If return_sender is specified, it returns the sender of the message in addition to the message body; otherwise it returns just the message body.
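
The filtering done by receive_from_subs can be sketched without the Zephyr library. In this hedged example, next_message is a hypothetical function, and a list of (sender, body) pairs stands in for the incoming zephyrs. The real method reads from python-zephyr and its details may differ.

```python
def next_message(incoming, own_name, return_sender=False):
    """Return the first message that is neither empty nor our own."""
    for sender, body in incoming:
        if sender == own_name or not body.strip():
            continue  # skip our own messages and blank ones
        if return_sender:
            return sender, body
        return body
    return None  # nothing usable in this batch


queue = [("dodona", "Hello!"), ("alice", "   "), ("bob", "what is emacs?")]

body_only = next_message(queue, "dodona")
with_sender = next_message(queue, "dodona", return_sender=True)
print(body_only, with_sender)
```

Returning the sender alongside the body is what allows the main loop to pick the right session for each message.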