Making a New Amazon Skill for The Echo

I was prompted to dive back into Amazon Alexa Skill development by the new Alexa Hackster.io contest.

I quickly classified Alexa Skill development into two major categories. Amazon doesn't break them down this way, but it makes more sense to think about development in terms of what it requires of the developer and of the setup.

  1. Not user specific – Generic information and interactions available to everyone
  2. User specific – Needs information from the user like configurations, links to their devices, or specific user instances.

The non-user-specific Alexa Skills are fairly simple to create, and for this guide I will be creating one using Node.js in an AWS Lambda function (Amazon's version of a cloud-run process) combined with an Alexa Skill.

  1. Set up a new AWS Lambda function (the region needs to be set to N. Virginia, selectable at the top right of your dashboard screen)
  2. Set up a new Amazon Skill
  3. Grab the Application Id from the newly created Alexa Skill and replace the ApplicationId value in the AlexaSkill.js file (see the sketch after this list).
  4. Then zip up the code and upload it to your Lambda function.
  5. Then define the Voice Interface using the two files in the speechAssets folder. (IntentSchema.json and SampleUtterances.txt)
  6. In your Lambda function, go to the Event Sources tab and add the “Alexa Skills Kit” event source.
  7. Then copy your Lambda function’s ARN (Amazon Resource Name) and paste it into the Endpoint textbox in the Configuration tab of your Alexa Skill – something like arn:aws:lambda:us-east-1:9081209381:function:hello-world
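
For reference, here is a minimal sketch of what the Lambda entry point can look like once the Application Id is wired in. It follows the structure of Amazon's Node.js samples (which include the AlexaSkill.js helper from step 3); the skill name "WindReport", the intent name "GetWindConditions", and the spoken response are hypothetical placeholders, not required values:

```javascript
// index.js – Lambda entry point (a sketch; "WindReport" and
// "GetWindConditions" are placeholder names of my own)
var AlexaSkill = require('./AlexaSkill');

// Step 3: paste the Application Id from your Alexa Skill here
var APP_ID = 'amzn1.echo-sdk-ams.app.[your-app-id]';

var WindReport = function () {
    AlexaSkill.call(this, APP_ID);
};
WindReport.prototype = Object.create(AlexaSkill.prototype);
WindReport.prototype.constructor = WindReport;

WindReport.prototype.intentHandlers = {
    "GetWindConditions": function (intent, session, response) {
        // Look up your data here, then speak the result.
        response.tell("The wind is currently ten miles per hour.");
    }
};

// AWS Lambda invokes this handler with the Alexa request
exports.handler = function (event, context) {
    new WindReport().execute(event, context);
};
```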

The user-specific Alexa Skills require an endpoint that lets users log in to your own authentication service. That in turn requires a web app and a database of stored per-user information. Stay tuned for more information on how to make that happen.

At the end of my investigation into non-user-specific Alexa Skills, I created and deployed a Wind Reporting service for Alexa that lets me use the Echo to find out the current wind conditions in my city before I take my drone out for a flight.


Check out the Hackster.io post for the Alexa Wind Reporting Service: https://www.hackster.io/anthony-ngu/alexa-wind-reporting-service-7aada2

Publishing an Alexa Skill

Here are some things to keep in mind if you are making your own Alexa Skill for global publishing:

  • You will need icons for your Alexa Skill
  • You will need to handle the generic Help intent
  • You will need to handle the generic Stop and Cancel intents (a sketch of these handlers follows)
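
Here is a minimal sketch of what those generic handlers can look like, again following the style of the AlexaSkill.js sample; the skill object name and the spoken responses are placeholders of my own:

```javascript
// Built-in intent handlers required for publishing (a sketch; "MySkill"
// and the response wording are illustrative placeholders).
MySkill.prototype.intentHandlers = {
    "AMAZON.HelpIntent": function (intent, session, response) {
        // ask() speaks, keeps the session open, and waits for a reply
        response.ask("You can ask me for a wind report.", "What would you like to do?");
    },
    "AMAZON.StopIntent": function (intent, session, response) {
        // tell() speaks and ends the session
        response.tell("Goodbye.");
    },
    "AMAZON.CancelIntent": function (intent, session, response) {
        response.tell("Goodbye.");
    }
};
```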

 

Starting Development with Amazon Echo

Here’s a simple guide on how to create a Node.js app hosted in Azure that will handle your Amazon Echo’s API calls.


  1. You will want to download and install Node.js if you haven’t already.
  2. Download the code from the repository here.
  3. Create an Azure account if you haven’t already and create a new web app.
  4. Using FTP, Git, or whichever method you would like, get the code into the location for your new Azure web app.
  5. Join the Amazon Developer program for the Echo and create a new Echo app. (Note: in order to use this on your Echo while in development, your developer account needs to be the same one that the Echo is linked to.)
  6. In your App information tab:
    1. Fill out your “App Name”. This will act as your official app name.
    2. Fill out your “Spoken Name”. You will want this to be short and simple to say so that it is as easy as possible for the Echo to recognize.
    3. Give your “App Version”, which will need to match the info you hand back through the API.
    4. Give your “App Endpoint” which will be your Azure webapp’s URL + the api endpoint. (Example: “https://echotest.azurewebsites.net/api/echo”)
  7. In your Interaction Model:
    1. Fill out your “Intent Schema”. The intent is the name of the function, slots are parameters, and a slot whose type is “literal” will give you back the speech-to-text recognized words. More info on this here.
    2. Fill out your “Spoken Utterances”. Each line should be tab-separated between the intent name and the sample phrase. Something interesting to note: Amazon suggests you provide a sample for every possible word count of a literal phrase, from min to max (in my case 1-3 words, thus the repetitions; see the example after this list). It also does not like it when the same literal appears more than once anywhere in the file. More info on this here.
  8. After this, set your app to be ready for testing and you are on your way!
  9. Call Alexa with your Spoken Name by saying “Alexa, open {YourSpokenAppNameHere}”
  10. Now you can say the commands that you’ve defined in your Amazon app declarations and have your Node.js web app respond!
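
To make steps 7.1 and 7.2 concrete, here is a small hypothetical example. The “TurnOnDevice” intent and its phrases are my own inventions, not required values; note the tab between the intent name and each phrase, and the {sample phrase|SlotName} syntax for literal slots:

```
Intent Schema:
{
  "intents": [
    {
      "intent": "TurnOnDevice",
      "slots": [
        { "name": "Device", "type": "LITERAL" }
      ]
    }
  ]
}

Spoken Utterances (one sample per phrase length, here 1-3 words):
TurnOnDevice	turn on {light|Device}
TurnOnDevice	turn on {desk light|Device}
TurnOnDevice	turn on {the desk light|Device}
```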

If you want to make it your own, you will need to modify the Node.js back-end to respond to the requests you allow, while also altering your intent schema and spoken utterances.
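
As a sketch of what that back-end can look like, here is a minimal /api/echo handler matching the endpoint example above. I'm assuming Express and the body-parser module here, and the “TurnOnDevice” intent is the same hypothetical one from the example after the list:

```javascript
// server.js – a minimal sketch of the Echo-facing endpoint (assumes Express;
// the "TurnOnDevice" intent and "Device" slot are hypothetical examples).
var express = require('express');
var bodyParser = require('body-parser');

var app = express();
app.use(bodyParser.json());

app.post('/api/echo', function (req, res) {
    var request = req.body.request;
    var speech = 'Sorry, I did not understand that.';

    if (request.type === 'IntentRequest' && request.intent.name === 'TurnOnDevice') {
        var device = request.intent.slots.Device.value; // the recognized literal
        speech = 'Turning on the ' + device;
        // Call into your actual device-control logic here.
    }

    // Respond in the format the Echo expects.
    res.json({
        version: '1.0',
        response: {
            outputSpeech: { type: 'PlainText', text: speech },
            shouldEndSession: true
        }
    });
});

app.listen(process.env.PORT || 3000);
```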

 

Amazon Echo: Should I buy?

The Amazon Echo is a home automation maker’s dream. It provides an easy way to use voice recognition to interact with your devices, but there are a couple of things you should know about developing for it before you buy one and start.

  1. Despite it being called an Echo “App”, your development will take place in a web service hosted in the cloud that answers the Echo’s calls. What the Echo does is translate what it hears into text and then hand it off to your service by calling your API with a package that contains the information.
  2. Creating an “app” with Amazon for the Echo requires you to fill out an “Interaction Model”, which consists of an “intent schema” and “sample utterances”, as well as to program your web service.
    • The “intent schema” is pretty straightforward: you create a JSON array of “intents”, each of which contains a name and “slots” that are used like parameters and whose types you must define.
    • The “sample utterances” are a list of pairs of the “intent name” and potential sample phrases.

Making it talk to a web service hosted in Azure using Node.js turned out to be fairly trivial, and I was able to get a basic implementation hooked into the OpenSmartHub that I have been developing in less than a couple of hours. I even created a sample in a GitHub repository for those who want simple instructions and an easy place to start.

It really is amazing to see it come together and interact with your voice commands in a custom scenario that you have developed, but it still has a long way to go in improving its voice recognition. It works really well with the pre-programmed functions, but there aren’t many that I find particularly useful in an everyday scenario, and it doesn’t do well with brands or non-dictionary words. For example, it recognizes “Pandora” because that’s a vital part of its pre-programmed functionality, but it doesn’t recognize “Yamaha” or “Wemo” well.

Another thing that I’ve noticed is that it can sometimes mix up the singular and plural versions of words when converting speech to text. (For example, mine would sometimes hear “lights” when I say “light”.)

Overall, I think it’s only going to improve from here, and I think it’s worth investing in to integrate voice recognition and voice commands into your homemade projects!

Open Smart Hub

Ever since February 22 when I entered the Hackster Hardware Weekend in Seattle, I’ve had a growing passion for the open source side of home automation. What started as a simple idea to automate the closing and opening of windows became something bigger than I ever imagined.

The Hackster.io Hardware Weekend was how the Open Smart Hub was born. I started with a hacked-together hub that could run on the Intel Edison and automate a servo acting as a window-opening mechanism, driven by WeatherUnderground API information or light/motion readings from a Spark.io Core (now named Particle.io). Once the event finished, I realized that my implementation couldn’t scale and was horribly confusing to recreate.

I began to research the implementations that were available to the public. What were the open source options? What were the professional products? How did they succeed or fail to solve the problem? My conclusion was that the home automation space was cluttered with different companies, organizations, products, and applications. What we as consumers, and I as a programmer, needed was a simple platform to expand, integrate, and customize a personalized home automation experience. IFTTT is a great alternative, but it is impossible to add your own devices, actions, functions, etc. There is no communal collaboration! If you added a device and someone else wanted to use the same sort of device, they would have to recreate it themselves.

That is when I began to reimplement the Open Smart Hub with a modular design. I chose Node.js as my platform because of its low barrier to entry for programmers, abundant tutorials, and large library of open source modules. The core of the new implementation is a configuration file that declares the available device types (think WeMo switches, Hue light bulbs, Nest, Weather Underground data, etc.) as well as a user’s stored scenarios and devices. I chose an implementation where you fully own and have the ability to control everything. After all, it’s your home!
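
As an illustration of that idea, here is a hypothetical sketch of what such a configuration file could look like. The field names, device types, and scenario shape below are my own inventions to show the concept, not the project's actual schema:

```javascript
// Hypothetical Open Smart Hub configuration (illustrative only; not the real schema).
module.exports = {
    // device types the hub knows how to talk to
    deviceTypes: ['WemoSwitch', 'HueLight', 'NestThermostat', 'WeatherUnderground'],

    // the user's registered devices
    devices: [
        { name: 'Living Room Light', type: 'HueLight', address: '192.168.1.20' }
    ],

    // the user's stored scenarios: a trigger paired with an action
    scenarios: [
        {
            name: 'Sunset Lights',
            trigger: { type: 'WeatherUnderground', event: 'sunset' },
            action: { device: 'Living Room Light', command: 'on' }
        }
    ]
};
```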

The implementation is split into two parts: a local hub running on a Raspberry Pi 2 within your home network, which handles all the interaction between your devices, and an online hub that gives you an accessible UI from anywhere.

Here’s a demo of a basic scenario implementation:

If you would like to see the details or learn more, check out the Hackster.io project: http://www.hackster.io/anthony-ngu/open-source-home-hub

Amazon Echo Has Promise for the Future

I got an invite to buy an Amazon Echo a while ago but didn’t want to purchase one because it didn’t seem particularly useful. Its only advantage to me was the SDK for speech that might be of use in the future.

After seeing the initial intro video and how scripted the commands had to be, I couldn’t justify the purchase.

If I wanted to get answers to my questions, I would just pull out my phone and type them out rather than deal with speech-to-text inadequacies when asking a long question. If I wanted to add something to a list, I would write it down in my notes or use my phone, for the same reason.

I could play music using a voice command, but be limited to my Amazon Music Library, Prime Music, or Pandora? No thanks; my audio receiver will do a better job with audio quality in my home anyway.

On top of that, waiting for a delivery date a couple months later? No thanks.

The Turning Point

That was it; I forgot all about the product until recently, when I saw a couple of YouTube videos showcasing hacks of the system that configure voice commands for other things! This is where it really gains some useful functionality.

Imagine using the mic array and speaker in the Echo to pick up your voice commands and give you audio feedback for commands you create yourself. As a developer, this would have unlimited possibilities in the home! My heart skipped a beat when I saw someone using it for these purposes despite the lack of an official SDK, and I immediately started imagining the improvements I could make to my current projects in the home automation space. I quickly came up with a couple of scenarios that I “need” it for.

There are the typical scenarios like turning on or off appliances and lights in your home, but then there are bigger home automation scenarios where you would communicate with the Echo like you would a personal assistant.

Imagine waking up in the morning and talking to Echo and having it relay specific things you care about like the weather, news, calendar updates, family updates, etc. while also having it turn on the shower so it’s running at your preferred temperature by the time you jump in. Have it make your coffee so that when you get out of the shower, it’s ready. No need to preset things the day before, or stick to a generic schedule. It’s all voice activated.

Now imagine coming home from a day at work and asking it to turn on a specific “mood” for your home, like “summer breeze” that would open your blinds, open your windows, put on some light music, turn on just the right amount of lighting, etc. Have your home work for you!

It looks like Amazon is starting to see the value of this use case with the Echo too, because they recently announced an update that allows their default voice commands to work with WeMo switches and Hue lights. But those are just basic scenarios.

After all my excitement about being able to create custom commands, I decided to purchase one (despite the couple months I’ll have to wait to finally receive it).

Still a Couple Faults

  • It works well for one room or an open-concept home, but you’ll need a separate one for each room if you want it to work everywhere. (Or maybe an extension of it in other rooms?)
  • From the demos online where people use it, it looks like its speech recognition system isn’t up to par with most other speech recognition systems yet.

Add Speech Recognition Easily!

Just found out about a cool HTML5 Web API that is built into Google Chrome.


Since it is vendor-prefixed, this API will only work if a user’s browser supports it (for now, Chrome). The cool thing about it, though, is that in order to use it on your site, all you need to do is write some JavaScript that checks for “webkitSpeechRecognition”, creates an instance of it, and then uses it. Combine this with your IoT devices and you’ve got voice-enabled commands for interacting with your devices!
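
Here is a minimal sketch of that check-and-use pattern; the transcript handling is just a placeholder for whatever your own devices need:

```javascript
// Feature-detect the prefixed Web Speech API (currently Chrome only).
if ('webkitSpeechRecognition' in window) {
    var recognition = new webkitSpeechRecognition();
    recognition.continuous = true;       // keep listening across pauses
    recognition.interimResults = false;  // only deliver final transcripts

    recognition.onresult = function (event) {
        var transcript = event.results[event.results.length - 1][0].transcript;
        console.log('Heard: ' + transcript);
        // Placeholder: forward the transcript to your IoT device endpoint here.
    };

    recognition.start();
} else {
    console.log('Speech recognition is not supported in this browser.');
}
```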

This tutorial walks you through the basics of how it gets integrated:
http://updates.html5rocks.com/2013/01/Voice-Driven-Web-Apps-Introduction-to-the-Web-Speech-API

This is the full sample: https://github.com/GoogleChrome/webplatform-samples