
Using PubNub: A Real-world Example of Building a Talking Chat That Displays Emotions


Over the past seven years, PubNub, a startup founded in California in 2010, has evolved into what it calls a global Data Stream Network (DSN). At its core, it is an IaaS solution built to meet the demand for real-time messaging.

Currently, Distillery is one of four PubNub certified development partners. We mention this not to boast, but to share what we learned about using and implementing PubNub. To do so, we’ll walk through the demo project that Distillery built to obtain that partnership.

If you want to see the code right away (C# and JavaScript), you can do so by opening our GitHub repository. For those of you interested in learning more about the potential of PubNub implementation, please continue on to the article below.

Overview of PubNub’s Capabilities

PubNub offers a few main service categories:

  • Real-time messaging. This API is a publish/subscribe system built on a global infrastructure of 15 points of presence, with a stated latency of 250 ms. It supports low-latency messaging between any number of devices and users anywhere in the world. Its capabilities include support for heavily loaded channels, data compression, and automatic message bundling when connection quality is poor.
  • Presence. This API tracks the current state of clients, from basic information such as “online/offline” status to notifications that a contact is typing a message to you.
  • Functions. This feature was previously named BLOCKS and was recently rebranded. (To be more precise, the rebranding is still in progress.) Functions are JavaScript scripts that run on PubNub servers. They are used to filter, aggregate, or transform data, and to communicate with third-party services, as you will see below.
  • Some of PubNub’s other service categories include Storage & Playback, Stream Controller, and Mobile Push.
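
To make the publish/subscribe idea concrete, here is a toy in-memory sketch in JavaScript. It is not PubNub's implementation or SDK: `createBroker` and its `subscribe` and `publish` methods are hypothetical names, used only to show that publishers and subscribers share a channel name and nothing else.

```javascript
// Toy in-memory publish/subscribe broker (illustration only, not the PubNub SDK).
function createBroker() {
    const subscribers = new Map(); // channel name -> Set of callbacks

    return {
        // Register a callback for a channel; returns a function that unsubscribes.
        subscribe(channel, callback) {
            if (!subscribers.has(channel)) {
                subscribers.set(channel, new Set());
            }
            subscribers.get(channel).add(callback);
            return () => subscribers.get(channel).delete(callback);
        },
        // Deliver a message to every current subscriber of the channel.
        // Returns the number of deliveries made.
        publish(channel, message) {
            const callbacks = subscribers.get(channel) || new Set();
            callbacks.forEach((callback) => callback({ channel, message }));
            return callbacks.size;
        }
    };
}
```

A real network adds global routing, persistence, and access control on top of this basic contract, which is where the services listed above come in.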

To implement all these features, PubNub offers more than 70 SDKs that cover most popular programming languages, as well as IoT platforms such as Arduino, Raspberry Pi, and even Samsung Smart TV. (The full list is available here.)

Creating Our PubNub Project

With the theory covered, we can turn to the practical work. Becoming a PubNub partner starts with completing a project on PubNub using the following test task: “Use two different SDKs and the following features: Presence, PAM, and 1 Block.” PAM stands for PubNub Access Manager. It’s a security plug-in that lets you control access at the application, channel, or user level.

Because the task’s requirements are fairly basic, they leave plenty of room for imagination. Ours led us to an idea that is neither especially useful nor especially profound: a talking chat. To make it more fun, we decided not only to convert text to speech, but also to let the speech convey emotions.

The concept of our application is incredibly simple: it’s a website with only two pages. First, the user comes to an initial login page that has no real authentication features. To be redirected to the chat page, the user must choose a nickname and a working mode (“Full” or “Read Only”). The chat page has a separate window for the chat messages, including such system messages as “John joined the channel,” a text input field, and a drop-down list of emotions. When you receive new messages from other users, the messages are converted into speech, and the speech conveys an emotion that was chosen by the sender.

We decided to use the basic BLOCK by IBM Watson, which requires only minimal adjustments (mostly, setting the voice’s parameters). At the time of this article’s publication, only three voices supported emotions: en-US_AllisonVoice (female), en-US_LisaVoice (female), and en-US_MichaelVoice (male). Just a couple of months earlier, only Allison was able to read text with emotions, so the service is clearly, if slowly, improving.

Now let’s look at the server code, which is so simple that it’s almost primitive:

public class HomeController : Controller
{
    public ActionResult Login()
    {
        return View();
    }

    [HttpPost]
    public ActionResult Main(LoginDTO loginDTO)
    {
        String chatChannel = ConfigurationHelper.ChatChannel;
        String textToSpeechChannel = ConfigurationHelper.TextToSpeechChannel;
        String authKey = loginDTO.Username + DateTime.Now.Ticks.ToString();

        var chatManager = new ChatManager();
            
        if (loginDTO.ReadAccessOnly)
        {
            chatManager.GrantUserReadAccessToChannel(authKey, chatChannel);
        }
        else
        {
            chatManager.GrantUserReadWriteAccessToChannel(authKey, chatChannel);
        }

        chatManager.GrantUserReadWriteAccessToChannel(authKey, textToSpeechChannel);

        var authDTO = new AuthDTO()
        {
            PublishKey = ConfigurationHelper.PubNubPublishKey,
            SubscribeKey = ConfigurationHelper.PubNubSubscribeKey,
            AuthKey = authKey,
            Username = loginDTO.Username,
            ChatChannel = chatChannel,
            TextToSpeechChannel = textToSpeechChannel
        };

        return View(authDTO);
    }
}

The “Main” controller method receives the DTO from the login form, obtains channel information from the configuration parameters (one channel is for the chat, and another is for communication with IBM Watson), identifies the access level by calling the corresponding “ChatManager” class object’s methods, and sends all the gathered information to the page. The rest happens in the front-end part of the solution. In order to give you the full picture, we’ll show the “ChatManager” class listing that encapsulates all the interactions with the PubNub SDK:

public class ChatManager
{
    private const String PRESENCE_CHANNEL_SUFFIX = "-pnpres";

    private Pubnub pubnub;

    public ChatManager()
    {
        var pnConfiguration = new PNConfiguration();
        pnConfiguration.PublishKey = ConfigurationHelper.PubNubPublishKey;
        pnConfiguration.SubscribeKey = ConfigurationHelper.PubNubSubscribeKey;
        pnConfiguration.SecretKey = ConfigurationHelper.PubNubSecretKey;
        pnConfiguration.Secure = true;

        pubnub = new Pubnub(pnConfiguration);
    }

    public void ForbidPublicAccessToChannel(String channel)
    {
        pubnub.Grant()
            .Channels(new String[] { channel })
            .Read(false)
            .Write(false)
            .Async(new AccessGrantResult());
    }

    public void GrantUserReadAccessToChannel(String userAuthKey, String channel)
    {
        pubnub.Grant()
            .Channels(new String[] { channel, channel + PRESENCE_CHANNEL_SUFFIX })
            .AuthKeys(new String[] { userAuthKey })
            .Read(true)
            .Write(false)
            .Async(new AccessGrantResult());
    }

    public void GrantUserReadWriteAccessToChannel(String userAuthKey, String channel)
    {
        pubnub.Grant()
            .Channels(new String[] { channel, channel + PRESENCE_CHANNEL_SUFFIX })
            .AuthKeys(new String[] { userAuthKey })
            .Read(true)
            .Write(true)
            .Async(new AccessGrantResult());
    }
}

Note the “PRESENCE_CHANNEL_SUFFIX” constant. Presence uses a separate channel for its messages, which (by the current convention) is named after the main channel with a “-pnpres” suffix. The PubNub Access Manager “Grant” call does not infer this channel, so the Presence channel must be listed explicitly when defining the access level.
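
The two grant methods in “ChatManager” differ only in the write flag, and both must name the Presence channel explicitly. A minimal JavaScript sketch of that rule follows. `buildGrantRequest` is a hypothetical helper, not part of any PubNub SDK; it only assembles the values for one grant and makes no network call.

```javascript
// The "-pnpres" suffix follows the Presence channel naming convention.
const PRESENCE_CHANNEL_SUFFIX = "-pnpres";

// Hypothetical helper: builds the data for one grant, covering the channel
// and its Presence companion. It does not call PubNub.
function buildGrantRequest(authKey, channel, readOnly) {
    return {
        channels: [channel, channel + PRESENCE_CHANNEL_SUFFIX],
        authKeys: [authKey],
        read: true,        // every user of the chat may read
        write: !readOnly   // only "Full" mode users may publish
    };
}
```

For example, `buildGrantRequest("john-key", "chat", true)` lists both `chat` and `chat-pnpres` and sets `write` to `false`.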

var pubnub;
var chatChannel;
var textToSpeechChannel;
var username;

function init(publishKey, subscribeKey, authKey, user, chat, textToSpeech) {
    pubnub = new PubNub({
        publishKey: publishKey,
        subscribeKey: subscribeKey,
        authKey: authKey,
        uuid: user
    });

    // Assign the globals directly; "this" is not the global object
    // in strict mode or in modules.
    username = user;
    chatChannel = chat;
    textToSpeechChannel = textToSpeech;

    addListener();
    subscribe();
}

The first thing we have to do in the JavaScript code is initialize the SDK. To keep things simple, we store certain pieces of data in global variables. After initialization, we add a listener for the events we’re interested in and subscribe to the chat channel, the Presence channel, and the IBM Watson channel. Let’s start with the subscription:

function subscribe() {
    pubnub.subscribe({
        channels: [chatChannel, textToSpeechChannel],
        withPresence: true
    });
}

The “subscribe” code is straightforward; “addListener” is a bit more involved:

function addListener() {
    pubnub.addListener({
        status: function (statusEvent) {
            if (statusEvent.category === "PNConnectedCategory") {
                getOnlineUsers();
            }
        },
        message: function (message) {
            if (message.channel === chatChannel) {
                var jsonMessage = JSON.parse(message.message);
                var chat = document.getElementById("chat");
                if (chat.value !== "") {
                    chat.value = chat.value + "\n";
                }

                chat.value = chat.value + jsonMessage.Username + ": " +
                    jsonMessage.Message;
                // Scroll after appending so the newest message is visible.
                chat.scrollTop = chat.scrollHeight;
            }
            else if (message.channel === textToSpeechChannel) {
                if (message.publisher !== username) {
                    var audio = new Audio(message.message.speech);
                    audio.play();
                }
            }
        },
        presence: function (presenceEvent) {
            if (presenceEvent.channel === chatChannel) {
                if (presenceEvent.action === 'join') {
                    if (!UserIsOnTheList(presenceEvent.uuid)) {
                        AddUserToList(presenceEvent.uuid);
                    }

                    PutStatusToChat(presenceEvent.uuid, 
                        "joins the channel");
                }
                else if (presenceEvent.action === 'timeout') {
                    if (UserIsOnTheList(presenceEvent.uuid)) {
                        RemoveUserFromList(presenceEvent.uuid);
                    }

                    PutStatusToChat(presenceEvent.uuid, 
                        "was disconnected due to timeout");
                }
            }
        }
    });
}

First, to catch the moment when the current user connects to the channel, we handle the “PNConnectedCategory” status in the status callback. It’s important to use this status rather than the Presence “join” event, because we need to fetch the list of all users only once, whereas “join” fires every time a new client connects to the chat.
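
The helper functions called from the presence handler (`UserIsOnTheList`, `AddUserToList`, `RemoveUserFromList`) are not shown in this article. As a hedged sketch of the bookkeeping they imply, the hypothetical `applyPresenceEvent` below keeps the roster in a `Set`. It also handles a `leave` action, which is an addition in this sketch: the listing above reacts only to `join` and `timeout`.

```javascript
// Hypothetical roster bookkeeping for presence events (names are illustrative).
// Returns a status line for the chat window, or null if nothing should be shown.
function applyPresenceEvent(onlineUsers, presenceEvent) {
    const { action, uuid } = presenceEvent;

    if (action === "join") {
        onlineUsers.add(uuid); // a Set ignores duplicates, so no membership check is needed
        return uuid + " joins the channel";
    }
    if (action === "leave") {
        onlineUsers.delete(uuid);
        return uuid + " left the channel";
    }
    if (action === "timeout") {
        onlineUsers.delete(uuid);
        return uuid + " was disconnected due to timeout";
    }
    return null; // other actions are ignored in this sketch
}
```

Keeping the roster logic separate from the DOM makes it possible to exercise it without a browser or a live PubNub connection.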

Second, when a new message arrives, we check which channel it came from. There are two cases: for the chat channel, we append the text to the chat window; for the IBM Watson channel, we create an “Audio” object from the link Watson sent in order to load and play the audio file. Messages published by the current user are not played back.
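
The chat branch of the listener can be isolated into a small pure function, which makes the parsing step easier to test. This is a sketch, not the code from our repository: `formatChatLine` is a hypothetical name, and returning `null` for malformed payloads is an assumed policy.

```javascript
// Hypothetical helper: turns the JSON string published on the chat channel
// into the "Username: Message" line shown in the chat window.
function formatChatLine(rawMessage) {
    let parsed;
    try {
        parsed = JSON.parse(rawMessage);
    } catch (error) {
        return null; // not valid JSON, so there is nothing to display
    }
    if (parsed === null || typeof parsed !== "object" ||
        typeof parsed.Username !== "string" || typeof parsed.Message !== "string") {
        return null; // valid JSON, but not the expected shape
    }
    return parsed.Username + ": " + parsed.Message;
}
```

The listener would then append the returned line only when it is not `null`, so a malformed message from another client cannot throw inside the callback.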

Another interesting thing happens when we try to send a message:

function publish(message) {
    var jsonMessage = {
        "Username": username,
        "Message": message
    };

    var publishConfig = {
        channel: chatChannel,
        message: JSON.stringify(jsonMessage)
    };

    pubnub.publish(publishConfig);

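    // Build the SSML payload; <express-as> carries the selected emotion.
    var emotedText = '<speak>';

    var selectedEmotion = iconSelect.getSelectedValue();

    if (selectedEmotion !== "") {
        emotedText += '<express-as type="' + selectedEmotion + '">';
    }

    emotedText += message;

    if (selectedEmotion !== "") {
        emotedText += '</express-as>';
    }

    emotedText += '</speak>';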

    jsonMessage = {
        "text": emotedText
    };

    publishConfig = {
        channel: textToSpeechChannel,
        message: jsonMessage
    };

    pubnub.publish(publishConfig);
}

First, we create the message itself, define its configuration (to pass to the SDK), and only then send it. From here it gets more interesting. To transform our text into speech, we send it to the IBM Watson channel. To define the emotional coloring of the speech, we use Speech Synthesis Markup Language (SSML), specifically the <express-as> tag. As you’ve probably guessed, if the user has “Read Only” permission, the chat message will be blocked by PAM and will never reach its recipients.
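
Because the chat text comes straight from the user, it is worth escaping it before embedding it in markup. The sketch below shows one way to build the SSML string. It assumes IBM Watson's `express-as` element with a `type` attribute, and `GoodNews` in the usage note is only an example value. `escapeXml` and `buildEmotedSsml` are hypothetical helpers, not functions from the repository or from any SDK.

```javascript
// Escape characters that are special in XML so user text cannot break the markup.
function escapeXml(text) {
    return text
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;")
        .replace(/'/g, "&apos;");
}

// Hypothetical helper: wraps text in <speak>, adding an <express-as> element
// only when an emotion is selected. The emotion value is assumed to be one
// that the chosen voice accepts.
function buildEmotedSsml(text, emotion) {
    const body = escapeXml(text);
    if (!emotion) {
        return "<speak>" + body + "</speak>";
    }
    return '<speak><express-as type="' + escapeXml(emotion) + '">' +
        body + "</express-as></speak>";
}
```

With these helpers, `buildEmotedSsml("Hi & bye", "GoodNews")` produces `<speak><express-as type="GoodNews">Hi &amp; bye</express-as></speak>`, and an empty emotion yields a plain `<speak>` wrapper.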

Additional Real-world Examples of PubNub Implementation

Among the wide range of apps and services that have adopted PubNub, two are worth highlighting: Insteon’s smart home concept and CURAGO’s family event planning mobile app.

Conclusion

Though our demo project kept things pretty simple, PubNub’s various capabilities can be used in potentially infinite ways. (Again, the full version of the code can be found on GitHub.) For yet another example of using PubNub to develop real-time messaging capabilities in an app, please check out this blog.

Would you like your app to integrate with PubNub? As a certified development partner, Distillery can help. Let us know!
 

About the Author

Sergei Prokopenko, Distillery’s Chief Information Officer, has been a member of Distillery’s technical staff since 2009. As CIO, Sergei provides leadership for the continued development of an innovative, robust, and secure information technology environment throughout the company. Prior to becoming CIO in 2015, he was one of Distillery’s lead software developers, responsible for developing and maintaining IT solutions of varying scale.

