Building an Alexa Skill as a Web Service on Heroku

I love playing around with new technologies; I am truly a tinkerer at heart. As a consultant and custom software developer, I frequently have to answer questions like, “What does this latest iOS update mean for our app? Can we leverage any new features?” or, “What benefits can we gain from this new Bluetooth standard?” Keeping up with the latest languages, platforms, IoT gadgets, etc., can be a daunting task, but it can also be really fun if you take it in small bites.

One of the most important skills that we professionals can have, regardless of our profession, is the ability to teach ourselves new things. Honing that skill is never a waste of time.

A photo of Amazon's Echo Dot

I’ve been hearing a lot of buzz about Amazon’s Alexa, and I decided it was time to give it a try. I asked my wife for an Echo Dot for Christmas. She graciously complied, and within a couple hours of unboxing it, my mind started to churn through possible Alexa Skills that I could write. I started to think about some old web applications that I had written in the past and wondered how difficult it would be to make an Alexa Skill that could interact with one of them. I set out on my mission: to make an Alexa Skill that is implemented as a web service hosted on Heroku.

Since this was just an experiment, I didn’t want to mess with an existing application just yet. I decided to build something new, completely from scratch. My goal was to make a Skill that would read me the latest blog post published on Atomic Spin! It turned out to be much simpler than I thought.

Starting The Alexa Skill

Alexa Skill Checklist

The first step in making a custom Alexa Skill is creating an Amazon developer account. Once you’re signed in, click the Alexa tab at the top of the Developer Console, choose ‘Get Started’ under the Alexa Skills Kit, and click ‘Add New Skill.’ On the left side of the screen, you will see a checklist which shows you all of the configuration steps you need to publish your Skill. I am not going to cover every option here since Amazon’s documentation is quite good, but I will point out some of the interesting pieces.

Configuring your Skill

Skill information

One of the first decisions you’ll need to make is what to use for your Skill’s ‘invocation name.’ This is the keyword that people will have to say to Alexa to signal that they want to interact with your Skill. For example, I chose ‘Atomic Spin’ for my Skill’s invocation name so that people can say phrases like, “Alexa, ask Atomic Spin to read the latest post.” Every directive (a.k.a. utterance) that users give to your Skill must be preceded by your invocation name.

Interaction model

Intent schema

The intent schema is where you define the structure of the capabilities that your Skill will provide. In my case, the structure is really simple, but if your Skill will require users to specify options or choices, the schema will get more complex. My intent schema just defines a single intent for getting the latest post.


{
  "intents": [
    { "intent": "getLatestSpinPost" }
  ]
}

Sample utterances

Sample utterances are the actual phrases that people can speak to interact with your Skill. Since experimental discovery is a big part of using Alexa, people will be phrasing requests in many different ways. In order to maximize usability, you’ll want to document as many variations of utterances as you can think of. For my Skill, I added entries like:

getLatestSpinPost to read the latest post
getLatestSpinPost for the latest post
getLatestSpinPost today's post
getLatestSpinPost to read today's post

Configuration

Service endpoint type

AWS Lambda is Amazon’s preferred way of hosting Alexa Skills, but since we’re building our own web service instead, we’ll need to select the ‘HTTPS’ option. Amazon requires your service to support HTTPS and respond on the standard HTTPS port 443.

Once you choose your service’s location (North America or Europe), you must enter the full URL for the endpoint where you want requests to be sent. I’ll get into setting up the web service a little later, but for now, I set my Skill’s endpoint to https://alexa-atomic-spin.herokuapp.com/latest-post.

In hindsight, I really should have used something more generic than latest-post for the route because this same route will be used for all requests made to my Skill. If I later add another feature, like the ability to search for posts by title, those requests will go through this same endpoint.
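To make that concrete, here is a minimal sketch of how a single endpoint might branch on the incoming request. It assumes the standard Alexa request JSON, where the body carries a "request" object with a "type" and, for intent requests, an "intent" with a "name". The function name route_alexa_request, the returned symbols, and the searchPostsByTitle intent are all hypothetical; none of them come from my project.

```ruby
require "json"

# Hypothetical dispatcher: maps a raw Alexa request body to a symbol that
# names the action to take. The symbols are illustrative placeholders.
def route_alexa_request(raw_body)
  request = JSON.parse(raw_body).fetch("request", {})

  case request["type"]
  when "LaunchRequest"       then :welcome
  when "SessionEndedRequest" then :goodbye
  when "IntentRequest"
    case request.dig("intent", "name")
    when "getLatestSpinPost"  then :latest_post
    when "searchPostsByTitle" then :search_posts # hypothetical future intent
    else :unknown_intent
    end
  else :unknown_request
  end
end
```

With something like this in place, the route itself could stay generic (say, a path such as /alexa) and each new feature would only add another branch.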

SSL certificate

Every web service that supports HTTPS must have a valid SSL certificate. Where your certificate comes from depends on where you’re hosting your service and how the server is set up, but services like Heroku provide a wildcard certificate that you can use. If you wish to use the wildcard certificate, just choose the option “My development endpoint is a sub-domain of a domain that has a wildcard certificate from a certificate authority.”

Test

That’s basically all the configuration you need to set up a simple Alexa Skill. The last thing you need to do before publishing is to test it out. The Test tab allows you to submit example utterances which will be sent to your actual web service. Pretty cool, eh? Of course, we haven’t made the web service yet, so let’s go to that now.

Web Service Setup

Ruby + Sinatra

I love writing apps in Ruby, but of course you can use any language that is supported by Heroku. Sinatra is a really nice, lightweight web framework for Ruby that you can use to build web applications with very little code. I chose to use Sinatra for my Atomic Spin Alexa Skill. You can access the full source of this project at GitHub – alexa-atomic-spin.

All requests made to my service by Alexa will come through the same route: a POST request to ‘/latest-post.’ I defined a route in my application that looks like this:


post '/latest-post' do
  verification_success = settings.cert_verifier.verify!(
    request.env["HTTP_SIGNATURECERTCHAINURL"],
    request.env['HTTP_SIGNATURE'],
    request.body.read
  )
  raise "Cert validation failed" unless verification_success

  post = Spin.latest_post
  ssml = post_to_ssml(post)
  make_ssml_response(ssml)
end

Certificate verification

The first really important thing that happens here is the verification of the request signature. All requests that Alexa makes to your Skill will be signed with a valid signing certificate. In order to verify that the request is legitimately from Alexa, and not from a malicious attacker, you need to verify the certificate URL and the signature. I found a Ruby gem, alexa_verifier, that does just that, so I use it to perform the verification. If you are writing your service in Java, you can use a function that Amazon provides in the Alexa Skills Kit to do this verification. If you do not verify the certificate, Amazon will not accept your Skill submission.
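If you are curious what the certificate URL check involves, here is a rough sketch of the rules Amazon documents: the scheme must be https, the host must be s3.amazonaws.com, the port (if given) must be 443, and the path must start with /echo.api/ after dot segments are resolved. This is not the gem's implementation, the helper names normalize_path and valid_cert_chain_url? are my own, and it does not cover the signature check itself or edge cases such as percent-encoded path segments.

```ruby
require "uri"

# Collapse "." and ".." segments so a path like "/echo.api/../evil"
# cannot slip past the prefix check below.
def normalize_path(path)
  segments = []
  path.split("/").each do |segment|
    case segment
    when "", "." then next
    when ".."    then segments.pop
    else              segments << segment
    end
  end
  "/" + segments.join("/")
end

# Returns true only when the URL satisfies all four documented rules.
def valid_cert_chain_url?(url)
  return false unless url.is_a?(String)

  uri = URI.parse(url)
  uri.scheme.to_s.downcase == "https" &&
    uri.host.to_s.downcase == "s3.amazonaws.com" &&
    uri.port == 443 && # URI reports 443 for https when no port is given
    normalize_path(uri.path.to_s).start_with?("/echo.api/")
rescue URI::InvalidURIError
  false
end
```

In practice I would still lean on a maintained library for this, since the full verification also involves downloading the certificate chain and checking the signature against the request body.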

Constructing a post object

Once the certificate has been verified, I can actually do the work to get the latest Spin post. I made a class called "Spin" that encapsulates all the work of making the request to the WordPress API. It gets the content of the latest post and constructs a post object that includes the title, author, and an array of strings which contain sections of body text. You can see the full source code on GitHub.
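As a rough illustration of that last step, the sketch below turns one post, in the shape the WordPress REST API v2 returns from something like /wp-json/wp/v2/posts?per_page=1&_embed, into a hash with a title, an author, and an array of body sections. The payload shape is an assumption, the helper name build_post is hypothetical, and the tag stripping is deliberately naive (it does not decode HTML entities); the real Spin class on GitHub may work differently.

```ruby
require "json"

# Hypothetical helper: convert one WordPress post hash (REST API v2 shape
# assumed) into { title:, author:, body_sections: }.
def build_post(wp_post)
  # Naive tag stripper: drops anything that looks like an HTML tag and
  # collapses runs of whitespace.
  strip_tags = ->(html) { html.gsub(/<[^>]+>/, "").gsub(/\s+/, " ").strip }

  # Treat each closing </p> as the end of one spoken section.
  sections = wp_post.dig("content", "rendered").to_s
                    .split(%r{</p>}i)
                    .map { |chunk| strip_tags.call(chunk) }
                    .reject(&:empty?)

  {
    title: strip_tags.call(wp_post.dig("title", "rendered").to_s),
    author: wp_post.dig("_embedded", "author", 0, "name").to_s,
    body_sections: sections
  }
end
```

Keeping the transformation separate from the HTTP call makes it easy to test against a saved JSON sample without touching the network.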

Alexa responses

Amazon’s Alexa service expects the response our Skill returns to be a JSON object with a specific structure. There are a lot of options for what your response can contain. One option allows you to specify a response formatted with SSML (Speech Synthesis Markup Language), a W3C standard of which Alexa supports a subset. SSML allows you to specify things like phonetic pronunciation of words, spelling out words, pauses of various lengths, etc.

I made a function called post_to_ssml that takes the blog post information, formats it with proper SSML tags, and inserts breaks between the paragraphs so that it sounds more natural when it’s spoken to the user.


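def post_to_ssml(post)
  # SSML must be wrapped in <speak>; the pause length here is an assumed value.
  pause = '<break time="1s"/>'
  result = "<speak>"
  result << "#{post[:title]} by #{post[:author]}#{pause}"
  result = post[:body_sections].inject(result) do |memo, section|
    memo << "#{section} #{pause}"
  end
  result << "</speak>"
end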
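
One thing to watch for when building SSML from blog text: characters such as & and < are reserved in XML, so an unescaped one in a title or paragraph can make the whole response invalid. Here is a minimal sketch of an escaping helper that could be applied to the title, author, and each section before they are interpolated. The name ssml_escape is hypothetical and this helper is not part of the original project.

```ruby
# Hypothetical helper: replace the five characters that are reserved in XML
# with their entity equivalents so plain text is safe inside SSML.
SSML_ESCAPES = {
  "&" => "&amp;",
  "<" => "&lt;",
  ">" => "&gt;",
  '"' => "&quot;",
  "'" => "&apos;"
}.freeze

def ssml_escape(text)
  # gsub with a hash looks up each matched character's replacement.
  text.to_s.gsub(/[&<>"']/, SSML_ESCAPES)
end
```

Escaping has to happen on the raw text only, before any SSML tags are added, or the tags themselves would be escaped too.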

Finally, another function, make_ssml_response, constructs the actual JSON object containing the SSML text. See the Alexa Skill documentation for more information on constructing responses.


require 'json' # provides Hash#to_json

def make_ssml_response(text)
  {
    "version" => "1.0",
    "sessionAttributes" => { },
    "response" => {
      "outputSpeech" => {
        "type" => "SSML",
        "ssml" => text
      },
      # End the session once the post has been read.
      "shouldEndSession" => true
    }
  }.to_json
end

That’s it! Our web service is complete. All I had to do was create a free project on Heroku, push my application to it, and boom–I have a fully functioning Alexa Skill defined as a web service hosted on Heroku. I can test it in the Amazon Developer portal, and I can see that my Skill returns a valid response. I can even test it on my Echo Dot that is signed in with the same Amazon account I used to create my Skill!

Submission…Rejected

At the time of writing this, my Skill hasn’t actually been accepted by the Amazon review process. Because my Skill contains the Atomic Object logo, which is a registered trademark owned by “Atomic Object LLC,” Amazon wants proof that I have permission to use said trademark. I am currently working on that. I will update this post with my results.

[Update 01/23/17]

Submission Accepted!

Well, after jumping through some hoops, I was finally able to get this skill accepted. As I mentioned earlier, Amazon initially rejected my submission because I hadn't submitted proof that I had permission to use a trademark owned by Atomic Object: our logo. Props to Amazon for being responsible and doing their part to protect our intellectual property! Unfortunately, I found the process of submitting such proof rather frustrating. The rejection email indicated that I should submit my proof via the Contact form on the developer website, and that if I needed to upload an attachment, I should send it to a developer support email address.

First off, this is confusing: how could I possibly provide this proof without uploading a document? Would they accept proof in the form of plain text entered in the contact form? Additionally, having to send a document to someone via email creates a race condition in the submission process. After I send the email, how long do I need to wait before I can submit my app again? Here's how it went for me:

Day One

- Emailed PDF of permission to use IP
- Submitted comment in Contact form telling them I had sent them an email and listing my Alexa skill application ID and skill name
- Submitted my skill

Day Two

- Received rejection email for my submission again indicating that I needed to submit proof of permission to use trademark
- Received acknowledgement that my email had been received and the document was under review
- Received acknowledgement that my document had been accepted and that my skill should be approved soon
- I messaged them back telling them it had already been rejected
- Received response asking me to resubmit my skill
- I re-submitted my skill

Day Three

- App Approved!

Hopefully, in the future, Amazon will just provide a way to upload your proof of permission to use trademarks with the submission of your skill. That would have made this process much smoother. If you are hoping to use a registered trademark in your submission, I would suggest that you submit your proof early in the process so that you don't get held up with this dance at the end.

Try it out!

If you have an Echo or other Alexa-enabled device, please try out my skill! You can enable it by saying, "Alexa, enable Atomic Spin". Then, just say, "Alexa, ask Atomic Spin for the latest post", or "... today's post", to have Alexa read it to you.