SpellBinder Frequently Asked Questions

  • How can we characterize the audience for chat bots?

    A long time ago we wrote about 3 types of clients chatting with ALICE: A, B and C.

    Group A are the abusive clients, who address the bot with aggression, insults, and sexually explicit language. About 10% of the clients fall into this group.

    Group B are the “average” people, who seem to enjoy chatting with ALICE. They comprise about 80% of the clients.

    Group C are the “critics”, “computer scientists”, or “cognitive scientists” who try the bot briefly and then walk away with a dismissive comment like “it’s random” or “it’s just like ELIZA”. They are aggressive in a different way: these are the “Turing Test judges” who ask the bot socially awkward questions like “Which is bigger, a 747 or my big toe?” They comprise about 10% of the clients.

  • Are SpellBinder bots “random”?

    1. We’ve been hearing the “random response” criticism of ALICE since day one.
    2. A lot of hard work went into SpellBinder to make sure the results are not “random”. For example, if you ask Fake Kirk “What is your name?” he says his name, which is not a random response, and thousands of other examples like this one exist (see the first AIML sketch after this list).
    3. The definition of “random” is in the eye of the beholder. One person’s lack of linkage is another person’s suspension of disbelief.
    4. One objective thing we can measure is conversation length. If people are chatting with Fake Captain Kirk for 20, 40, or 80 interactions, and apparently having fun, who is to say the bot is unresponsive, for them at least?
    5. As we say in the marketing literature, SpellBinder is intended to create a good “first draft” of the bot. The quality of these bots has already been improved by applying existing AIML tools.
    6. The more transcripts are available, the better the bot SpellBinder will create, and if the transcript is a directed interview (asking Superbot-style questions), it does even better. Across 72 episodes of Star Trek, Captain Kirk has 6,000 lines of “usable” dialogue, which results in a bot with about 2,000 categories: roughly 10 times the size of ELIZA, or 1/60th the size of ALICE. More dialogue means more categories and a better bot.
    7. Some of the chatters have challenged Fake Kirk with questions like “What is a phaser?” Nowhere in any Star Trek episode does any character ask what a phaser is, and no one explains it; it is just understood. So the first-draft Fake Kirk can only give a vague response to a question like that (“What is warp speed?”, “What is beaming?”, “What is NCC1701?”). During the targeting phase, we can of course put in all the commonsense Star Trek knowledge (the second sketch after this list shows what such a hand-written category might look like). But it would be a profound A.I. that could watch Star Trek and then answer “What is a phaser?”
    8. To me, the test of a “random” bot should be something like this: if all the bot’s replies were crammed into one category with <pattern>*</pattern>, it would be totally random. Test that baseline on 1000 inputs, then put those same 1000 inputs into another bot. If a significant portion of the responses are more accurate, then that bot is not “random” (the first sketch after this list shows the baseline).
    9. We could easily merge Fake Kirk with an existing bot, say ALICE, and produce one that can answer many silly questions like “Where is Japan?” Maybe in the short term some people would have a better experience, but so far I’ve resisted, if only to run the experiment of creating a new, unique personality from scratch.
    10. Finally, it is random! That is, each AIML category in the final bot may contain a <random> list of responses, so repeating the same input will not always produce the same reply.
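
    To make points 2, 8, and 10 concrete, here is a minimal AIML sketch. The reply texts are invented for illustration and are not taken from the actual Fake Kirk files. The single catch-all category is the “totally random” baseline of point 8: every input falls through to <pattern>*</pattern> and draws a reply from its <random> list. The exact-match category below it outranks the wildcard for its input, which is what makes that answer non-random in the sense of point 2.

        <aiml version="1.0.1">

          <!-- Degenerate "random bot": one catch-all category whose reply
               is drawn at random from the list, whatever the input. -->
          <category>
            <pattern>*</pattern>
            <template>
              <random>
                <li>Fascinating.</li>
                <li>I am a starship captain, not a philosopher.</li>
                <li>Kirk out.</li>
              </random>
            </template>
          </category>

          <!-- A targeted category: an exact-word pattern outranks the
               wildcard, so this input always reaches this answer. -->
          <category>
            <pattern>WHAT IS YOUR NAME</pattern>
            <template>I am Captain James T. Kirk of the starship Enterprise.</template>
          </category>

        </aiml>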
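
    And here is the kind of hand-written category that might be added during the targeting phase of point 7, to cover commonsense Star Trek knowledge that never appears in the transcripts. The wording of the answer is again invented for illustration rather than drawn from any episode or from the actual bot.

        <aiml version="1.0.1">

          <!-- Illustrative commonsense answer added by hand during targeting. -->
          <category>
            <pattern>WHAT IS A PHASER</pattern>
            <template>A phaser is the standard Starfleet energy weapon, which can be set to stun or to kill.</template>
          </category>

          <!-- Related phrasings can be reduced to the same answer with <srai>. -->
          <category>
            <pattern>WHAT DOES A PHASER DO</pattern>
            <template><srai>WHAT IS A PHASER</srai></template>
          </category>

        </aiml>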