Zork – Interpreter

One of the biggest challenges of this project is trying to create a flexible system that can understand the English language. How do I get the computer to understand that certain words relate to certain actions and objects when there are so many possible configurations of words to use?

Overview

The code in this lesson may be a little hard to follow without a proper initial overview. Our goal will be to create a class which can “understand” a user’s typed input by converting it into an “interpretation”. This means it will have some idea of an action that needs to be performed, as well as what, if any, objects to perform the action upon or with.

There is a large variety of sentences that can be understood by Zork. Here are a few templates and examples:

  • {ACTION}
    • “north”, exits in cardinal directions, or abbreviations like “n”
    • “save”, “load”
  • {ACTION} {TARGET} where the target can be referenced by a noun, or more specifically referenced by including adjectives
    • “open the small mailbox”, or simply “open mailbox”
    • “read the leaflet”
  • {ACTION} {TARGETS} where the targets can be separated by commas and / or “and”
    • “take the elvish sword and brass lantern”, or “take sword and lantern”
    • “take egg, sword and lantern”
  • {ACTION} {TARGET} {SPECIFIER} where the specifier could be something like an adverb or a preposition, and the target and specifier can appear in any order
    • “turn lantern on”
    • “turn off lantern”
  • {ACTION} {TARGET(S)} {SPECIFIER} {TARGET}
    • “put the treasure in the case”, or “pour the water on the ground”
    • “light candles with match”
    • “take the sandwich and garlic from the sack”

Our database currently has a list of “nouns” and “adjectives” which will be useful in determining what objects a user might be intending to refer to. The rest of the sentence doesn’t yet have a clear way to be understood. To solve this, any system which I create will register key words with the interpreter so it can better split apart and map a sentence. The system will provide “commands” which are usually verbs like “take” or “read” but could also include things like the direction a user wants to go “NE”, “West”, “Up” etc. It will also need to provide any sort of specifier that it needs in order to work with the command which will be something like an adverb or preposition. At this point the interpreter will have all of the vocabulary necessary to complete an interpretation.

To map a sentence I follow a particular order. First I look for the command of the sentence. If no command is found then I dont have any system registered which can understand what to do with the input. The input is considered invalid and I abort early. If the command is found I remove it from the sentence so I am only left with the remaining words to map out.

Next I look for a specifier word. If it is found I then split the sentence based on the range before the specifier and the range after the specifier. Otherwise I just have one range to consider.

Using the remaining ranges I split the words into a target or targets if there is anything to split on such as a comma or joining word like “and”. For each target I can then check for adjectives to help filter the possible number of candidates to return.

Let’s follow the above steps on an example sentence:

  1. “take sandwich and garlic from sack” – typed into UI by user
  2. “sandwich and garlic from sack” – “take” is recognized as a command and removed
  3. [“sandwich and garlic”, “sack”] – “from” is recognized as a specifier and the sentence is split into two target ranges
  4. [“sandwich”, “garlic”] – the first target range is split further by the word “and”

What we are left with is a command: “take”, a specifier: “from”, two primary targets: “sandwich” and “garlic”, and a secondary target: “sack”. The interpreter will then look up all of the targets in the database for candidates. These are any entities that match the “noun” and “adjectives” that were found when determining what the target was.

Note that there can be zero, one or more than one candidate match for any given target. For example, the user may say “open window” but the database will find multiple entities which can be described as a window – some boarded windows along the side of the house as well as a window in the back of the house which is not boarded. It will be up to the implementing system to decide how to filter and validate the candidates that are supplied.

User Action

I mentioned before that my systems would register certain commands and specifiers with the interpreter so that it has a complete set of vocabulary to work with. The registration process will be based on the following protocol:

import Foundation

protocol UserAction {
    var commands: Set<String> { get }
    var specifiers: Set<String> { get }
    func handle(interpretation: Interpretation)
}

Interpretation

The final result returned by our interpreter is stored as a new object called an interpretation. This will include a variety of information that can be useful to the systems, including the break down of the command, specifier, primary and secondary targets.

import Foundation

class Interpretation {
    static let invalid = Interpretation(command: "", specifier: "", primary: [], secondary: [])
    
    let command: String
    let specifier: String
    let primary: [Target]
    let secondary: [Target]
    
    init(command: String, specifier: String, primary: [Target], secondary: [Target]) {
        self.command = command
        self.specifier = specifier
        self.primary = primary
        self.secondary = secondary
    }
}

Target

The interpretation’s arrays for primary targets and secondary targets are also a new object which holds a variety of relevant information. We include the original user’s input which is handy in the event that there are no potential matches or for when we want to know how a user referred to an entity. We also include an array of the entities which matched a database fetch based on the nouns and adjectives provided.

I also include a “match” and “error” field which can be used in iterative filtering and validating for the entity which was provided for a particular action. The match will always be the single entity which best fits the intent of the sentence, or will be nil when it is unclear or there are not matches. In those cases an error message should be available which can be presented to the user. Note that I only allow the error to be set while the current value is nil. This way the first iteration of filtering or validation to fail is the the one whose error message will be stored.

import Foundation

class Target {
    let userInput: String
    var candidates: [Entity]
    var match: Entity? = .None
    var error: String? {
        get {
            return _error
        }
        set(newValue) {
            _error = _error ?? newValue
        }
    }
    private var _error: String? = .None
    
    init(userInput: String, candidates: [Entity]) {
        self.userInput = userInput
        self.candidates = candidates
    }
}

Noun

Let’s go ahead and add a model which relates to the Noun table in our database. We will be using it in the interpreter in order to know what words actually map to a game entity. The structure of this model and its extension should already be familiar to you based on the previous lesson where we created models for the entity and component, etc.

import Foundation
import SQLite

struct Noun {
    
    // MARK: - Fields
    static let component_id: Int64 = 6
    static let table = Table("Nouns")
    static let noun_id_column = Expression<Int64>("id")
    static let text_column = Expression<String>("text")
    let row: Row
    
    // MARK: - Properties
    var id: Int64 {
        get {
            return row[Noun.noun_id_column]
        }
    }
    
    var text: String {
        get {
            return row[Noun.text_column]
        }
    }
}

extension Entity {
    func getNouns() -> [Noun] {
        let components = getComponents(Noun.component_id)
        let ids = components.map({ $0.dataID })
        let table = Noun.table.filter(ids.contains(Noun.noun_id_column))
        let result = DataManager.instance.prepare(table).map({ Noun(row: $0) })
        return result
    }
}

Adjective

We will also need a model for the adjective table:

import Foundation
import SQLite

struct Adjective {
    
    // MARK: - Fields
    static let component_id: Int64 = 11
    static let table = Table("Adjectives")
    static let adjective_id_column = Expression<Int64>("id")
    static let text_column = Expression<String>("text")
    let row: Row
    
    // MARK: - Properties
    var id: Int64 {
        get {
            return row[Adjective.adjective_id_column]
        }
    }
    
    var text: String {
        get {
            return row[Adjective.text_column]
        }
    }
}

extension Entity {
    func getAdjectives() -> [Adjective] {
        let components = getComponents(Adjective.component_id)
        let ids = components.map({ $0.dataID })
        let table = Adjective.table.filter(ids.contains(Adjective.adjective_id_column))
        let result = DataManager.instance.prepare(table).map({ Adjective(row: $0) })
        return result
    }
}

Label System

Here I decided to create a separate system for fetching nouns and adjectives. I could have just used static extension methods as I did for the entity, but I thought I would show another work flow which might feel more true to the ECS architecture pattern.

import Foundation
import SQLite

class LabelSystem {
    class func fetchNoun(text: String) -> Noun? {
        let textNoCase = Expression<String>("text").collate(.Nocase)
        let table = Noun.table.filter(textNoCase == text)
        guard let match = DataManager.instance.prepare(table).first else { return .None }
        let result = Noun(row: match)
        return result
    }
    
    class func fetchAdjective(text: String) -> Adjective? {
        let textNoCase = Expression<String>("text").collate(.Nocase)
        let table = Adjective.table.filter(textNoCase == text)
        guard let match = DataManager.instance.prepare(table).first else { return .None }
        let result = Adjective(row: match)
        return result
    }
    
    class func fetchEntitiesForNoun(noun: Noun) -> [Entity] {
        let result = Entity.fetchAllByComponentID(Noun.component_id, dataID: noun.id)
        return result
    }
}

Interpreter System

The overview does a good job explaining what happens in this system, but there are a few fine points to discuss:

import Foundation

class InterpreterSystem {
    
    // MARK: - Singleton
    static let instance = InterpreterSystem()
    
    // TODO: - ADD CODE HERE
}

This class is implemented as a singleton. Although there will only be one place that needs to convert user input into an interpretation, there will be many classes registering user actions. The class will also hold its own fields so the singleton pattern worked well for me.

// MARK: - Fields
private var verbs: Set<String> = []
private lazy var verbRegex: String = {
    return "\\b\(self.verbs.joinWithSeparator("\\b|\\b"))\\b"
}()

private var prepositions: Set<String> = []
private lazy var prepositionRegex: String = {
    return "\\b\(self.prepositions.joinWithSeparator("\\b|\\b"))\\b"
}()

The interpreter will maintain a set of verbs (actions or commands) and prepositions (the specifiers which could also be adverbs). I named them using parts of speech here because I will also be referring to nouns and adjectives while mapping sentences so it may provide a better context to understand than commands and specifiers even if it isn’t technically accurate based on its potential contents. By using a set for each I dont have to worry that more than one system may use the same command or specifier, I will only need to store one of each.

I also included a lazy property which creates a regex for both sets of words. Since it is lazy it wont be created until I actually start trying to interpret user input, which shouldn’t be able to happen until after all of the systems have had a chance to setup and register properly. The regex looks for any of the words using the word boundary. For example I want to find “look” as an action but not “looking” which might be an adjective for “looking glass”. Likewise I want to find “take” as an action but not “stake” which could be a noun representing an entity target.

private var handlers: [UserAction] = []

// MARK: - Public
func register(action: UserAction) {
    verbs.unionInPlace(action.commands)
    prepositions.unionInPlace(action.specifiers)
    handlers.append(action)
}

I expose a public method for the systems to register user actions, and store them in a private field that is an array of all registered handlers.

func handleUserInput(input: String) {
    let result = interpret(input)
    
    guard result !== Interpretation.invalid else {
        LoggingSystem.instance.addLog("I don't understand")
        return
    }
    
    for handler in handlers {
        handler.handle(result)
    }
}

Another public method is exposed in order to actually try to create an interpretation object from a string of input. When the interpretation is invalid (such as when no command was recognized) then the interpreter aborts early and provides a message to the user that it didn’t understand what to do. Otherwise, it loops through all of the registered handlers giving them an opportunity to react to the new interpretation.

// MARK: - Private
private func interpret(input: String) -> Interpretation {
    var text = input.stringByTrimmingCharactersInSet(NSCharacterSet.whitespaceCharacterSet())
    
    var command: String = ""
    var specifier: String = ""
    var primary: [Target] = []
    var secondary: [Target] = []
    
    let searchOptions: NSStringCompareOptions = [NSStringCompareOptions.RegularExpressionSearch, NSStringCompareOptions.CaseInsensitiveSearch]
    guard let commandRange = text.rangeOfString(verbRegex, options: searchOptions) else {
        return Interpretation.invalid
    }
    
    command = text.substringWithRange(commandRange).uppercaseString
    
    text.removeRange(commandRange)
    text = text.stringByTrimmingCharactersInSet(NSCharacterSet.whitespaceCharacterSet())
    
    if let specifierRange = text.rangeOfString(prepositionRegex, options: searchOptions) {
        specifier = text.substringWithRange(specifierRange).uppercaseString
        
        var elements: [String] = []
        
        let preText = text.substringWithRange(text.startIndex ..< specifierRange.startIndex)
        if preText.characters.count > 0 {
            elements.append(preText)
        }
        
        let postText = text.substringWithRange(specifierRange.endIndex ..< text.endIndex)
        if postText.characters.count > 0 {
            elements.append(postText)
        }
        
        if let preTargets = elements.first {
            primary = splitTargets(preTargets)
        }
        
        if let postTargets = elements.last where elements.first != elements.last {
            secondary = splitTargets(postTargets)
        }
    } else {
        primary = splitTargets(text)
    }
    
    return Interpretation(command: command, specifier: specifier, primary: primary, secondary: secondary)
}

This somewhat lengthy method starts the process of mapping the sentence just like I mentioned in the overview. It starts out by finding the command and specifier and splitting the target ranges based on a specifier being found.

private func splitTargets(text: String) -> [Target] {
    guard let regex = try? NSRegularExpression(pattern: "\\b(?i)AND\\b", options: []) else { return [] }
    let nsString = text as NSString
    let splitter = ","
    let modifiedString = regex.stringByReplacingMatchesInString(text, options: [], range: NSRange(location: 0, length: nsString.length), withTemplate: splitter)
    let entries = modifiedString.componentsSeparatedByString(splitter).map({ $0.stringByTrimmingCharactersInSet(NSCharacterSet.whitespaceCharacterSet()) }).filter({ $0 != "" })
    var items: [Target] = []
    for entry in entries {
        let item = mapTarget(entry)
        items.append(item)
    }
    return items
}

The “splitTargets” method takes the range of a sentence which appears before or after a specifier and splits it into potential targets. It is able to handle splitting on commas or on the word “and” but does so by replacing occurrences of the word “and” with a comma and then splitting the string into an array based on the comma character.

private func mapTarget(input: String) -> Target {
    var words = input.componentsSeparatedByCharactersInSet(NSCharacterSet.whitespaceCharacterSet()).filter({ $0.characters.count > 0 })
    
    var noun: Noun?
    for (index, element) in words.enumerate() {
        if let result = LabelSystem.fetchNoun(element) {
            noun = result
            words.removeAtIndex(index)
            break
        }
    }
    
    guard let matchingNoun = noun else { return Target(userInput: input, candidates: []) }
    let entities = LabelSystem.fetchEntitiesForNoun(matchingNoun)
    
    var adjectiveIDs: [Int64] = []
    for word in words {
        if let adjective = LabelSystem.fetchAdjective(word) {
            adjectiveIDs.append(adjective.id)
        }
    }
    
    let candidates = entities.filter { (entity) -> Bool in
        let entityAdjectiveIDs = entity.getAdjectives().map({ $0.id })
        for adjectiveID in adjectiveIDs {
            if !entityAdjectiveIDs.contains(adjectiveID) {
                return false
            }
        }
        return true
    }
    
    return Target(userInput: input, candidates: candidates)
}

Now that we have filtered the sentence down to what we believe is a single target, we use this method to break up the remaining words and attempt to find matches in our database. First we look for a matching noun because there should be exactly one. If we don’t find a matching noun then we can abort early with no potential candidates. Otherwise we perform a fetch for all of the entities in the database that share that noun.

Next, any remaining words are considered adjectives which describe our target. We loop through the words to see which of them appear as registered adjectives in our database. If we find a matching adjective then we loop through our candidates and remove any which do not also share that adjective.

An entity may include adjectives but we only provide filtering when the user does. For example if the only word mentioned is “dagger” then we might be able to return a “rusty dagger” and “jeweled dagger” from the database. If the user specifies “rusty dagger” and the system recognized “rusty” and the “jeweled dagger” is NOT rusty, then it will be removed as a potential candidate and only the single “rusty dagger” will be returned. If I use a word that is not registered such as “the” or “snotty” then no candidates will be filtered based on those words. For example, “take the snotty dagger” would end up interpreting “the” and “snotty” as adjectives but not find them in the database so any “dagger” would still be a potential candidate.

You are welcome to change this pattern if you like. You could force all adjectives to be found and matched such that “snotty dagger” would not return a match even if there were daggers, simply because the game didn’t understand “snotty” – the system which handles the command might end up responding with something like “I don’t see a snotty dagger.” or some other snarky reply. You might also choose to ignore common words such as “the” so you didn’t have to add it to every entity.

Demo

This has been a pretty long lesson, but I never feel satisfied without some sort of demo to show what we’ve accomplished. So for this demo, we will implement a sample “User Action” that allows us to see how various sentences will be diagrammed into the various parts such as the command, specifier and targets.

To begin, replace the previous demo code in the “GameViewController” with the following snippet. I have created a short user action which includes a variety of sample commands and specifiers so you can try out a good variety of sentences with just this single action:

// TODO: Delete demo code
extension GameViewController: UserAction {
    
    var commands: Set<String> {
        get {
            return ["TAKE", "PUT", "OPEN", "READ", "TURN", "NORTH", "NE"];
        }
    }
    
    var specifiers: Set<String> {
        get {
            return ["IN", "OUT", "FROM", "ON", "OFF"];
        }
    }
    
    func handle(interpretation: Interpretation) {
        LoggingSystem.instance.addLog("command: \(interpretation.command)")
        if interpretation.specifier.characters.count > 0 {
            LoggingSystem.instance.addLog("specifier: \(interpretation.specifier)")
        }
        if interpretation.primary.count > 0 {
            LoggingSystem.instance.addLog("found \(interpretation.primary.count) primary target(s):")
            for target in interpretation.primary {
                LoggingSystem.instance.addLog("  * \(target.userInput) has \(target.candidates.count) candidate(s)")
            }
        }
        if interpretation.secondary.count > 0 {
            LoggingSystem.instance.addLog("found \(interpretation.secondary.count) secondary target(s):")
            for target in interpretation.secondary {
                LoggingSystem.instance.addLog("  * \(target.userInput) has \(target.candidates.count) candidate(s)")
            }
        }
    }
}

We will also need to remove the statement in the “textFieldShouldReturn” method that called our demo method:

// REPLACE THIS:
examine(input)

// WITH THIS:
InterpreterSystem.instance.handleUserInput(input)

Finally we will need to register our demo user action with the interpreter system so that it has some vocabulary to work with. Add the following snippet to the end of the “viewDidLoad” method:

InterpreterSystem.instance.register(self)

That’s it, go ahead and build and run and try it out for yourself. Here are a few sentences I tried:

Zork Interpreter Demo photo ECS_Zork_04_01_Demo_zpsia92dncq.png

Summary

In this lesson we created a system which can interpret a string of text that is input by the user into something a game can work with. We found ways to map the sentence into commands, specifiers and targets which actually pointed to game objects in our database. Then we finished up with a quick demo that showed how our interpreter would respond to any sentence you want to throw at it. Sentences which it understands are broken down and described in the logs.

Don’t forget that this project is accompanied by a repository HERE. There will be one commit per lesson so you can see exactly how the project would have been cofigured and how the code would look at each step.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s