Cloudbusting Content: 2013

Sunday, May 5, 2013

Idempotence

Idempotence. The subtle killer. The heartbreak. The agony. But it's not what you think.

It's a painfully simple principle that sounds, on its face, like no more than common sense. Like a health problem, it can be the source of a lot of pain, if ignored. And yet some web services violate it.

An idempotent operation is one that can be executed multiple times without changing the result of the initial execution. Examples of idempotent mathematical operations would be x * 0 = 0, or |(|x|)| = |x|.
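To make the definition concrete, here's a minimal sketch that checks whether applying a function twice gives the same result as applying it once. The helper names (is_idempotent, zero_out, increment) are my own, invented for illustration.

```python
def zero_out(x):
    """Multiply by zero: the x * 0 example from the text."""
    return x * 0


def increment(x):
    """Add one: a counter-example that is NOT idempotent."""
    return x + 1


def is_idempotent(func, samples):
    """Return True if func(func(x)) == func(x) for every sample value."""
    return all(func(func(x)) == func(x) for x in samples)


if __name__ == "__main__":
    samples = [-3, 0, 2.5, 10]
    for func in (abs, zero_out, increment):
        print(func.__name__, is_idempotent(func, samples))
```

A check over a handful of samples is only a spot check, not a proof, but it shows the idea: repeating abs or zero_out changes nothing, while repeating increment does.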

It seems rather obvious that a read-only RESTful web service request should behave in a similar fashion. For example, if you went to an ATM and requested a balance inquiry, you wouldn't expect the inquiry to change any data in your account. In "RESTful Web services: The basics," Alex Rodriguez maintains that a REST API that uses an HTTP method for something other than its intended purpose is poorly designed.

By "the intended purpose" Rodriguez means the appropriate CRUD operation effected by each HTTP method:

POST creates a resource.
GET retrieves a resource.
PUT updates a resource, that is, changes its state.
DELETE removes or deletes a resource.

We expect GET to be read-only and therefore idempotent. That is, it only retrieves resource data and never changes it. To be clear, "resource data" means data stored and maintained by a web service, for example, contact information stored in a database. It doesn't mean metadata--data used only by the web service to operate--for example, a counter that tracks how many times a resource was viewed, or the timestamp when the resource was last accessed. GET could change metadata and still be considered idempotent with respect to the underlying resources.
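Here's a rough sketch of that distinction, using a hypothetical in-memory address book (the class and field names are assumptions, not any real service). The get method bumps a view counter, which is metadata, but the resource data it returns never changes.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    first_name: str      # resource data: what clients store and retrieve
    view_count: int = 0  # metadata: used only by the service itself


class AddressBook:
    """Hypothetical in-memory stand-in for the address book service."""

    def __init__(self):
        self._contacts = {}

    def add(self, contact_id, first_name):
        self._contacts[contact_id] = Contact(first_name)

    def get(self, contact_id):
        """Behave like GET: return resource data, touch only metadata."""
        contact = self._contacts[contact_id]
        contact.view_count += 1
        return {"first_name": contact.first_name}

    def view_count(self, contact_id):
        return self._contacts[contact_id].view_count
```

However many times get is called, the caller sees the same contact; only the service's own bookkeeping moves.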

GET is meant to retrieve resource data, not to create objects or change their state. Here are a few ways in which it could be abused:

Abusing GET for POST

It's hard to imagine why someone would use GET to create a new resource--after all, that's what POST is for. But you could do it with a request like this:

GET /addressbook/contact?first_name=Arthur HTTP/1.1

An unintended consequence could be that someone would invoke this request thinking that it will query the database and return the contact record(s) having the first name 'Arthur'. Instead, the service will create a new record for 'Arthur'.

Instead of using GET to create a record, the appropriate request would use POST to send the data in the request body:

POST /addressbook HTTP/1.1
Host: address_server

<?xml version="1.0"?>
  <contact>
    <first_name>Arthur</first_name>
  </contact>
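For what it's worth, here's how a client might assemble that POST request with Python's standard library. This is only a sketch: the function name and the base URL are assumptions (address_server is just the placeholder host from the example), and the request is built but never sent.

```python
import urllib.request
from xml.sax.saxutils import escape


def build_create_contact_request(first_name, base_url="http://address_server"):
    """Build (but do not send) a POST that creates a contact."""
    body = (
        '<?xml version="1.0"?>\n'
        "<contact>\n"
        f"  <first_name>{escape(first_name)}</first_name>\n"
        "</contact>\n"
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/addressbook",
        data=body,  # the contact travels in the request body, not the URL
        headers={"Content-Type": "application/xml"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_create_contact_request("Arthur")
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Passing the finished request to urllib.request.urlopen would actually send it, which only makes sense against a real server.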

Abusing GET for PUT

You could also use GET to update an existing record--which you really should do using PUT. But let's look at how you might make the request with GET:

GET /addressbook/contact?first_name=Arthur&new_first_name=Michael HTTP/1.1

This looks like a strange GET request, but you might expect it to retrieve a record with fields matching the strings 'Arthur' and 'Michael'.

Instead of using GET to update a record, the appropriate request would use PUT to send the new data in the request body:

PUT /addressbook/Arthur HTTP/1.1
Host: address_server

<?xml version="1.0"?>
  <contact>
    <first_name>Michael</first_name>
  </contact>

Heresy and bad practice

Such unorthodox uses of HTTP requests constitute bad practice in the following ways:

They violate idempotence. You should be able to call GET an unlimited number of times, and expect that no record data will ever be changed.

They present misleading expectations. If you make a GET request, you expect it only to retrieve data without changing the state of the data. Creating a record or updating a record is not the expected function of a GET request.

They pass data as URL parameters instead of as structured data in the request body, and carrying structured data is a major advantage of REST. The example above, which uses GET to update an existing record, treats REST in an almost SOAP-like manner by passing data via parameters. Passing data in this way might be appropriate for some applications, such as query parameters in search engines, but it's a problem with hierarchically structured data. Instead, you should pass structured data (such as XML or JSON) in the request body of a POST or PUT method.

But there are cases where you might not use the "appropriate" HTTP method for its intended function. I've seen this implemented in a few APIs. For example, you can use POST to initiate a state-changing operation. I'm not a fan of doing this. Yes, it can work if implemented carefully. Perhaps I'm a purist, but it breaks the one-to-one mapping between each HTTP method and its intended CRUD operation. However, this implementation makes sense if:

You intend to create a unique new object each time POST is called, or

You know that the service will only be called once for each set of data sent in the request.

For example, you might have a service that authenticates a user, then sends the user a one-time passcode (nonce). Any objects created along the way are discarded. In this case, POST makes sense, because it creates "virtual" resource data that exists only for the duration of the process.

The Man from C.R.U.D.

While there are strong arguments for making it a best practice to hew to the "intended purpose" of the HTTP methods for each CRUD operation, inflexibly insisting on such rigidity might earn you the moniker "The Man from C.R.U.D." I believe there are exceptional cases in which an unorthodox implementation is sound, and possibly even more appropriate than the orthodox usage.

The guidelines are reasonable and clear: never implement methods in a way that their expected use might be misinterpreted, and certainly never implement requests that alter resource data where idempotence is expected.
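As a closing illustration, here's a small sketch that contrasts the methods on a hypothetical in-memory store (ContactStore and changes_state_on_repeat are names I made up). Repeating a GET or a PUT leaves the stored data as it was after the first call, while repeating a POST creates another record each time.

```python
import copy


class ContactStore:
    """Hypothetical in-memory store with CRUD-style methods."""

    def __init__(self):
        self.contacts = {}
        self._next_id = 1

    def post(self, data):
        """Create a new contact on every call and return its new ID."""
        contact_id = self._next_id
        self._next_id += 1
        self.contacts[contact_id] = dict(data)
        return contact_id

    def put(self, contact_id, data):
        """Replace the contact stored under contact_id."""
        self.contacts[contact_id] = dict(data)

    def get(self, contact_id):
        """Return a copy of the contact without modifying the store."""
        return dict(self.contacts[contact_id])


def changes_state_on_repeat(store, operation):
    """Run operation once, snapshot the data, run it again, and compare."""
    operation()
    snapshot = copy.deepcopy(store.contacts)
    operation()
    return store.contacts != snapshot


if __name__ == "__main__":
    store = ContactStore()
    cid = store.post({"first_name": "Arthur"})
    print("GET: ", changes_state_on_repeat(store, lambda: store.get(cid)))
    print("PUT: ", changes_state_on_repeat(
        store, lambda: store.put(cid, {"first_name": "Michael"})))
    print("POST:", changes_state_on_repeat(
        store, lambda: store.post({"first_name": "Arthur"})))
```

This is why a POST-like operation hidden behind GET is dangerous: a client that repeats the call, expecting nothing to change, quietly piles up records.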

Monday, April 15, 2013

Hosting a website on AWS

Did you know that you can use CloudFormation to create and host a website on AWS?

I've been playing with EC2 to get the feel for AWS and in the process stumbled on a fun demo you can try--plus a few gotchas you might want to know before you try it.

I found a cool video, How to Launch a Website in 10 Minutes, on the AWS channel on YouTube. Basically, you create an S3 bucket to hold the website's resources, then you use CloudFormation to create an EC2 and an RDS instance, and install WordPress (and content) in the bucket. Very slick. It was a nicely done quick intro; however, I ran into a few issues:

It's not under the free usage tier. The online docs (but not the video) warn you: this does cost money. The app also warns you when you create the EC2 and RDS instances. You're supposed to delete the instances as soon as you're done with the tutorial.
The video tells you to select Upload a Template File (for the WordPress installation), but doesn't say where to get the file. Apparently there are 2 or 3 different ways to specify a template. This can be confusing for a newbie. The online AWS tutorial is clear on this point, though.

It was a simple procedure--here are the basic steps:

In the S3 tab, create a bucket and name it appropriately for your blog, e.g. amazing-blog.
In the EC2 tab, if you do not already have an available Linux instance, create one.
Upload all image files or other resources your blog will need to the bucket you just created.
In the CloudFormation tab, click Create new stack. In the Create Stack dialog, provide the name of the bucket you created for the blog as the stack name.
In Template, select Provide a Template URL, then type or paste the URL for the sample WordPress template, then click Continue. (This procedure is from the AWS docs; it's not noted in the video.)

https://s3.amazonaws.com/cloudformation-templates-us-east-1/WordPress_Single_Instance_With_RDS.template

The template contains several AWS resources including a LoadBalancer, an Amazon Relational Database Service DB Instance, and an Auto Scaling group.

In the Parameters section of the dialog, provide the name of an existing EC2 key pair (the key name used for SSH access to the instance).
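If you'd rather script the stack creation than click through the console, the steps above could be sketched roughly as follows. Treat this as an untested outline under several assumptions: it uses the third-party boto3 library, the names build_stack_request, create_stack, and my-key-pair are hypothetical, and I'm assuming the template's key parameter is called KeyName. The template may well require other parameters (database credentials, for instance) that I haven't filled in.

```python
TEMPLATE_URL = (
    "https://s3.amazonaws.com/cloudformation-templates-us-east-1/"
    "WordPress_Single_Instance_With_RDS.template"
)


def build_stack_request(stack_name, key_name):
    """Assemble the arguments for a CloudFormation create_stack call."""
    return {
        "StackName": stack_name,
        "TemplateURL": TEMPLATE_URL,
        # Assumption: the template exposes the key pair as "KeyName".
        "Parameters": [
            {"ParameterKey": "KeyName", "ParameterValue": key_name},
        ],
    }


def create_stack(request):
    """Send the request to AWS. This creates billable resources."""
    import boto3  # third-party; needs AWS credentials configured

    client = boto3.client("cloudformation")
    return client.create_stack(**request)


if __name__ == "__main__":
    # Only prints the request; call create_stack() yourself when ready.
    print(build_stack_request("amazing-blog", "my-key-pair"))
```

Keeping the request-building separate from the call that talks to AWS means you can inspect exactly what would be sent before anything starts costing money.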

Cool idea, but is it worth it?

I left the WordPress website up for a few days to create fake content and mess with formatting. Then I noticed the meter was ticking on the EC2 and RDS instances. I did get charged a bit. To be fair, I was warned. The docs warn you about this, and instruct you to delete the resources as soon as you're done. The management console indicated an estimated monthly cost of $110. I'm not certain that the service comes to that much in real life usage, but it could be a very expensive blog!

IMO this demo was done more as a proof of concept--and it's a worthy idea. It could be worthwhile if your website required considerable web resources--for example, you had a large database of streaming audio files, or a catalog that would require a sizable relational database. But for your own personal blog, or even a small business website as demonstrated in the YouTube video, it's probably not a cost-effective option.

I admit I was taken with the idea of spinning up your own virtual hosting site, and I wish that AWS would offer a scaled down economy plan (not free but cheap...) for those who just want to spin up and discard experimental websites easily.

Wednesday, April 3, 2013

XMLHttpRequest2 in HTML5

I'm just now learning about HTML5; at first I shrugged it off as an inevitable evolutionary fusion of HTML 4 and XHTML 1.1. Yet it's not just a markup language; it's more an extended hybrid language that embraces and supports several application programming interfaces (APIs) for web applications. It's an open format that would allow development of cross-platform web or mobile applications.

For me personally, the most interesting inclusion is XMLHttpRequest support. Specifically, XMLHttpRequest Level 2.

As the HTML5 Rocks blog points out, it's being included to allow AJAX to work with HTML5 APIs such as the File System API and the Web Audio API, which support binary file transfer and streaming media. It's all spelled out in this excellent post: New Tricks in XMLHttpRequest2.

What's surprising to me is the lack of mention of the implication for RESTful web services. It seems that XMLHttpRequest support would be a major step forward in streamlining interoperability with web services, because--in theory--you'd be able to make RESTful requests directly from a web page. How would this implementation look in actual use? I'm thinking AJAX-like web page elements that can be updated by web services in real time.

Friday, March 15, 2013

Last.fm web services

This article is intended for readers with a programming background who are interested in REST web services. This text assumes the audience is familiar with the basics of REST web services programming.

Many web developers add streaming audio to websites using web services from one of the well-known streaming web players (AKA Internet radio), such as Spotify, Pandora, or Last.fm. Those who want to access the famous Pandora app will be disappointed, however. A bit of searching the web showed that Pandora once offered an open source API, but for whatever reason, deprecated it in 2012.

However, Last.fm offers a REST API that lets web developers access the functionality of its Scrobbler application. What's Scrobbler?

Scrobbler is the Last.fm client application, with which users can play Last.fm radio stations. It automatically fills users' libraries and updates them with what they've been listening to on their computers. Users can also rate music by "loving" and "banning" tracks, assemble playlists, and apply tags to tracks and albums.

What it does

The Last.fm REST API lets web developers use the same functionality in their own web or mobile applications, streaming music from Last.fm and using Last.fm data. The API goes a few steps further, offering tools for gathering statistics and accessing data by various collections (by group, by genre, by artist, etc).

It provides methods that let you operate on the following objects (which Last.fm calls "packages"):

Album - Access data for an album

Artist - Access data for an artist

Auth - Fetch a session key or request token

Chart - Access chart info for artists or tracks

Event - Attend or share a live audio event

Geo - ISO 3166-1 country and metro data for use in the other web services

Group - Access data for a group

Library - Maintains tracks/albums in a user's library

Playlist - Access a user's playlist data

Radio - Access metadata or streaming audio from a Last.fm radio station

Tag - Access search tags

Tasteometer - Access Tasteometer data (a rating score used on Last.fm)

Track - Access data for a track

User - Access user data

Venue - Access data for a venue for events

How it works

In the Last.fm API, there are two ways to make requests: write operations (using POST) and read operations (using GET).

Write operations

To access a write service, you submit a request as an HTTP POST request to the service endpoint. You send all parameters in the request body. In order to perform write requests, you first have to authenticate a user with the API (not discussed here).

Example: How to "love" a track

(This service is described in the Last.fm REST API documentation at http://www.last.fm/api/show/track.love.)

The track.love method is a write operation and uses HTTP POST. You should send parameters (including the method parameter) in the POST request body to the service endpoint at http://ws.audioscrobbler.com/2.0/.

Parameters

track (Required) : A track name (utf8 encoded)
artist (Required) : An artist name (utf8 encoded)
api_key (Required) : A Last.fm API key.
api_sig (Required) : A Last.fm method signature. See authentication for more information.
sk (Required) : A session key generated by authenticating a user via the authentication protocol.
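The api_sig parameter deserves a quick sketch, since it's the least obvious of the five. The following is my understanding of the signing scheme from the Last.fm authentication documentation, not something covered in this article, so verify it there before relying on it: sort the parameters by name, concatenate each name and value, append your shared secret, and take the MD5 hash. The function name sign_call and the secret value are hypothetical.

```python
import hashlib


def sign_call(params, shared_secret):
    """Compute an api_sig-style signature for a dict of call parameters.

    Assumed scheme: md5(name1value1name2value2...secret), with the
    parameters ordered alphabetically by name.
    """
    payload = "".join(f"{name}{params[name]}" for name in sorted(params))
    digest = hashlib.md5((payload + shared_secret).encode("utf-8"))
    return digest.hexdigest()


if __name__ == "__main__":
    call = {
        "method": "track.love",
        "track": "Ring of Fire",
        "artist": "Johnny Cash",
        "api_key": "AA25BB53CC45DD85EE22FF30GG67HH64",
        "sk": "CC12DD45EE67FF24GG56HH88II36ZZ00",
    }
    print(sign_call(call, "my-shared-secret"))  # placeholder secret
```

Note that api_sig itself is not part of the signed parameters; it's computed from the others and then added to the request.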

Request Body
In a write operation, params are sent in the request body of HTTP POST requests as named arguments using a structure as follows:

<methodCall>
<methodName>track.love</methodName>
<params>
<param>
   <value>
    <struct>

     <member>
      <name>track</name>
      <value>
       <string>Ring of Fire</string>
      </value>
     </member>

     <member>
      <name>artist</name>
      <value>
       <string>Johnny Cash</string>
      </value>
     </member>

     <member>
      <name>api_key</name>
      <value>
       <string>AA25BB53CC45DD85EE22FF30GG67HH64</string>
      </value>
     </member>

     <member>
      <name>api_sig</name>
      <value>
       <string>BB89CC55DD80EE11FF41GG07HH18II99</string>
      </value>
     </member>

     <member>
      <name>sk</name>
      <value>
       <string>CC12DD45EE67FF24GG56HH88II36ZZ00</string>
      </value>
     </member>

    </struct>
   </value>
</param>
</params>
</methodCall>

Read operations

To access a read service, submit a request as an HTTP GET request to the service endpoint. You send all parameters in the request URI. You don't have to authenticate a user with the API to perform read operations.

Example: How to get metadata info on a track

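The track.getInfo method is a read operation that uses HTTP GET. Parameters follow a '?' in the request URI, and values that contain spaces must be URL-encoded (the URI is split across lines here for readability):

http://ws.audioscrobbler.com/2.0/?method=track.getInfo
&track=Ring+of+Fire
&artist=Johnny+Cash
&username=MrHat
&api_key=AA25BB53CC45DD85EE22FF30GG67HH64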

Parameters

mbid (Optional) : The musicbrainz id for the track
track (Required, unless mbid is supplied) : The track name
artist (Required, unless mbid is supplied) : The artist name
username (Optional) : The username for the context of the request. If supplied, the user's playcount for this track and whether they have loved the track is included in the response.
autocorrect[0|1] (Optional) : Transform misspelled artist and track names into correct artist and track names, returning the correct version instead. The corrected artist and track name will be returned in the response.
api_key (Required) : A Last.fm API key.
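Rather than assembling the URI by hand, you can let a library do the encoding. Here's a minimal sketch using Python's standard library; the function name track_info_url is my own, and the API key shown is the placeholder from the example, not a working key.

```python
from urllib.parse import urlencode

ENDPOINT = "http://ws.audioscrobbler.com/2.0/"


def track_info_url(track, artist, api_key, username=None, autocorrect=None):
    """Build the GET URI for a track.getInfo request."""
    params = {
        "method": "track.getInfo",
        "track": track,
        "artist": artist,
        "api_key": api_key,
    }
    # Optional parameters are included only when supplied.
    if username is not None:
        params["username"] = username
    if autocorrect is not None:
        params["autocorrect"] = int(autocorrect)
    # urlencode escapes spaces and other reserved characters.
    return f"{ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(track_info_url(
        "Ring of Fire",
        "Johnny Cash",
        "AA25BB53CC45DD85EE22FF30GG67HH64",
        username="MrHat",
    ))
```

The resulting string can be pasted into a browser or handed to any HTTP client.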

Sample Response

<track>
<id>1019999</id>
<name>Ring of Fire</name>
<mbid/>
<url>http://www.last.fm/music/JohnnyCash/_/RingOfFire</url>
<duration>240000</duration>
<streamable fulltrack="1">1</streamable>
<listeners>69572</listeners>
<playcount>281445</playcount>
<artist>
    <name>Johnny Cash</name>
    <mbid>bfcc6d75-a6a5-4bc6-8282-47aec8531818</mbid>
    <url>http://www.last.fm/music/JohnnyCash</url>
</artist>
<album position="1">
    <artist>Johnny Cash</artist>
    <title>Ring of Fire</title>
    <mbid>61bf0388-b8a9-48f4-81d1-7eb02706dfb0</mbid>
    <url>http://www.last.fm/music/JohnnyCash/RingOfFire</url>
    <image size="small">http://userserveak.last.fm/serve/34/8674593.jpg</image>
    <image size="medium">http://userserveak.last.fm/serve/64/8674593.jpg</image>
    <image size="large">http://userserveak.last.fm/serve/126/8674593.jpg</image>
</album>
<toptags>
    <tag>
      <name>Country</name>
      <url>http://www.last.fm/tag/country</url>
    </tag>
    ...
</toptags>
<wiki>
    <published>Sun, 27 Jul 2008 15:44:58 +0000</published>
    <summary>...</summary>
    <content>...</content>
</wiki>
</track>
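Once you have a response like this, pulling out the interesting fields takes only a few lines. The sketch below parses a trimmed copy of the sample above with Python's standard library; the function name summarize_track is hypothetical, and I'm assuming a live response may wrap the track element in an outer element, so the code looks for it either way.

```python
import xml.etree.ElementTree as ET

SAMPLE_RESPONSE = """\
<track>
  <name>Ring of Fire</name>
  <listeners>69572</listeners>
  <playcount>281445</playcount>
  <artist>
    <name>Johnny Cash</name>
  </artist>
  <toptags>
    <tag>
      <name>Country</name>
    </tag>
  </toptags>
</track>
"""


def summarize_track(xml_text):
    """Extract a few fields from a track.getInfo-style XML response."""
    root = ET.fromstring(xml_text)
    # Assumption: the track element is either the root or its direct child.
    track = root if root.tag == "track" else root.find("track")
    return {
        "name": track.findtext("name"),
        "artist": track.findtext("artist/name"),
        "listeners": int(track.findtext("listeners")),
        "playcount": int(track.findtext("playcount")),
        "tags": [tag.findtext("name") for tag in track.findall("toptags/tag")],
    }


if __name__ == "__main__":
    print(summarize_track(SAMPLE_RESPONSE))
```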

Summary

Website: Last.fm Web Services - http://www.last.fm/api

Documentation: http://www.last.fm/api/intro

Service endpoint: http://ws.audioscrobbler.com/2.0/

Supported HTTP operations: GET, POST

Usage fees: Free; must register for an API account

Comments:
The API is well-conceived and full-featured. Perhaps the most appealing aspect is the simplicity of method calls. The difference in how parameters are transmitted in GET requests (in the URI) versus POST requests (in the request body) is a distinctive feature that developers need to understand for this particular API. The only criticism I have for the documentation is that it should explain exactly how parameters are transmitted in the request body. In particular, the documentation should provide explicit code examples of how parameters are specified in GET and POST requests.

Sunday, January 20, 2013

REST for the Rest of Us

This article is intended to be an explanation of REST for a non-technical audience. In particular, it's meant for people who perhaps have heard the ubiquitous phrase "software as a service" but who are otherwise not familiar with the working of the Internet.

I'd like to give credit for helping me sort out my thoughts on this article to a semi-famous essay How I Explained REST to My Wife. It takes a more roundabout conversational approach, but covers quite a few deep insights into REST.

The first thing you should know about the web is that there's no web. In reality there are millions of computers around the world connected by a sophisticated addressing system. All the content on these computers--web pages, images, digital video, streaming audio--as well as certain processing tasks, are resources.

The term "web" is simply a metaphor to emphasize that all these computers are connected and discoverable. The architecture of this "web" is based on REST, which stands for REpresentational State Transfer. It's a somewhat obscure way of saying that the state (like a snapshot) of the data that describes a resource can be represented and transferred in standard formats. It sounds obvious, but the web needs basic rules.

REST is simple, because it's not a programming language, library, or platform, but a design, an architecture. All it means is that if you can access a resource on the web with a request, it's RESTful. So the term RESTful can cover a large swath of exceedingly different web services. It's often hard to say exactly what makes an application a RESTful web service. This is how I'd explain it to a lay person.

Until fairly recently, the resources people wanted were typically files: web pages (HTML files), images (JPG and GIF files), audio (MP3 and WAV files). However, more recently, resources have been tasks such as reading activity feeds (e.g. Facebook posts), or database searches (finding products in online catalogs), or accessing local weather data. Resources can also be computing services, like file storage and backup, or performing large computations in the cloud. These latter services are what is commonly referred to as "software as a service."

However, thinking in terms of data or processing tasks can be a little abstract, so for the purpose of explanation, I like to think of resources as pizza. After all, if you can order a pizza on the web, it must be a resource.

Every resource on the web has an address called a URL. You can think of it as the street address or phone number of the pizzeria. It starts out specifying a more general area, then becomes more specific--just like a phone number: area code, local exchange code, and last four digits of the specific phone.

Here are the operations you can perform on resources on the web:

You can POST a pizza by ordering it--thus creating one. The purpose of POST is to create a new resource.

You can PUT pepperoni, or other toppings on the pizza, by changing an existing order. The purpose of PUT is to change or update data in an existing resource.

You can GET a pizza when your order is ready. The purpose of GET is to retrieve a resource.

You can DELETE a pizza by canceling the order. The purpose of DELETE is to remove a resource.

These four verbs are the core operations used in RESTful web services. REST services use one of these verbs plus the URL to access data or a processing service.

Whenever you click a link on a webpage, the web browser makes a GET request on the address contained in that link. The address is also called a URL or Uniform Resource Locator--because its purpose is to locate resources.

Here's a very simple example. Suppose a friend sends you an email with a link to a YouTube video. If your friend clicked "share," YouTube provides a short version of the URL:

http://youtu.be/o0MIFHLIzZY

Let's break this down: The first part of the URL (youtu.be) is a shorthand address that redirects the browser to the YouTube home. Then there is a slash, followed by a cryptic code (o0MIFHLIzZY). This is the unique resource ID of the video. (That is, the video is a resource, and every resource needs a unique ID.)

By clicking this link, you're directly accessing (that is, GETting) the resource by its direct address.

But there are many ways to access data on the web. YouTube also allows you to access the same video this way:

http://www.youtube.com/watch?v=o0MIFHLIzZY

Using this form, you provide the regular URL for YouTube (www.youtube.com), then a slash followed by 'watch'. This accesses the watch service, an application that plays a video if you provide an ID.

The way you provide parameters in a URL is to add a '?' followed by parameter_name=parameter_value. In this case it's 'v=o0MIFHLIzZY'; the parameter name is 'v' and the value is the resource ID.

Specifying parameters is one way to provide information to web services. It's the typical way to provide information to web search applications such as Google search.
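For the curious, here's a small sketch of how a program takes a URL apart into the pieces described above. It uses Python's standard library; the function name describe_url is just something I made up for the example.

```python
from urllib.parse import parse_qs, urlparse


def describe_url(url):
    """Split a URL into its host, path, and query parameters."""
    parts = urlparse(url)
    # parse_qs returns a list per name; keep the first value of each.
    params = {name: values[0] for name, values in parse_qs(parts.query).items()}
    return {"host": parts.netloc, "path": parts.path, "params": params}


if __name__ == "__main__":
    print(describe_url("http://youtu.be/o0MIFHLIzZY"))
    print(describe_url("http://www.youtube.com/watch?v=o0MIFHLIzZY"))
```

In the short form the video ID is part of the path; in the long form it arrives as the value of the 'v' parameter.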