2 Comments

Web Application Architecture from 10,000 Feet, Part 2 – Persistent Data & Relational Databases

In part 1 of this series, I briefly discussed the difference between the client and the server. In part 2, I’m going to focus on how to structure your server side code.

If you’re making a single-page app—which you should be, if you want an app that’s as responsive as your typical desktop application—your backend “only” has a few concerns (heh):

  • storing persistent data
  • manipulating the persistent data (creating, updating, and deleting records)
  • delivering the persistent data via HTTP in a way that can be understood by the code on your frontend
  • manipulating the data in response to HTTP requests
  • authentication: a topic that could and has filled several books in its own right. This includes making sure that the requests you’re receiving contain valid data, providing some kind of authentication for users, and limiting what users can see and do depending on their permissions in your app.

Relational Database 101

At their cores, most web applications are built around objects—data structures that are defined as having several named qualities. For example: a blog, at its most basic level, has Users and Posts. Each User has several properties: username, password, bio. A post has a body, a creation date, and an author.

Parts & Pieces

Generally, these objects are stored in tables by an SQL database (your framework of choice may or may not abstract the actual SQL away from you). Rails, which in my opinion is the most newbie-friendly of the current server side options, calls these Models and will allow you to define them by defining a class. But if you get your server off the ground without relying on an ORM (Object Relational Mapping, aka something that keeps you from writing SQL queries) you’ll need to learn the nuts and bolts of maintaining a database yourself.

Tables have a number of predefined columns. Columns are typed; think variable types like int, doubles, strings (specifics will vary based on the specific SQL database you choose). Columns can have indexes on them that further constrain what values are present there: whether or not the column can be null, whether or not the value in that column can appear in any other row in that same column in the entire table, whether or not the values in that column need to be present in another column of another table.

If it’s not already clear, each row in your table is a particular record. All tables should (this is going to be up to you to enforce) have a primary key column, usually named “id”, with a single, unique number that serves as an identifier for that particular row of data. Every column in the table can be mutable, except for that one.

Relationships

The key to relational databases is, not surprisingly, relations. Going back to our worn out blog example, let’s talk about posts. Each post has an author. An author can write multiple posts, but only one user can publish a post at a time. This is called a one-to-many relationship—one user for many posts—and is easy to represent with a relational database model. Simply create a column in your posts table named author_id. Store the id of the User who authored your post there.

For example, let’s say that Oscar is the only user of your blog (ouch!). He has an id of 1. He makes a couple of posts, each of which has its own id (preferably generated by the database itself). The posts also have a column called author_id, and for each of them the column contains 1. That makes it easy to connect posts back to their authors, and makes it easy to do things such as “get all the posts by Oscar” and “Find all the posts written by someone who registered this May”.

(In case the above wasn’t as clear as mud: there are alternatives to storing persistent data beyond the relational table/column methods of SQL. Google “noSQL” if you’re interested in more. There are also other types of relationships, such as many-to-many, that require more complex structures to relate row A to row B.)

Creating an API

Some quick background knowledge: It’s possible to make an HTTP request using different verbs—that is, it’s possible to make multiple different types of requests to a particular URL, all other things being equal. There are four HTTP verbs: GET, PUT, POST, and DELETE. All requests made via your browser’s address bar are GET requests, but it’s possible to use all four verbs using JavaScript.

A true RESTful API has a stringent definition that I’m not going to delve into here—I’d recommend starting your reading with this excellent StackOverflow answer if you’re interested in the details—but in general, an API is a server with endpoints structured around the resources in your web application’s data model that respond to each of the four HTTP verbs in predefined ways. The server itself is stateless—a given request will receive the same response, every time. (A true RESTful API should also return metadata revealing all of its endpoints and their locations. Ironically, most so-called “RESTful” APIs don’t meet the stringent definition of what RESTful means, because they don’t return any of that metadata—but that’s a bit out of scope for this post.)

The endpoints (URLs) are structured based around resources—those same models that you’ve distilled into your database:

<resource>/: A collection of a particular resource.
<resource>/<resource id>/: The resource with a particular id.

These resources can then be nested. For example, users/22/posts/ would refer to all the posts belonging to the user with an id of 22.

These endpoints are then combined with the various HTTP verbs to have the following meanings:

  • GET: Give me information about this resource.
  • POST: I’m creating a new resource of this type.
  • UPDATE: I’m modifying a resource.
  • DELETE: I’m deleting a resource.

It goes back to what I was talking about in the relational database section—in most cases you should be thinking of (and structuring) your backend in terms of your data model. Endpoints without an id (for example, posts/ for a typical microblog) refer to all of the objects in that collection. Whereas endpoints with an id, such as posts/42/, refer to the specific object in the collection with that id.

All of the HTTP verbs are interpreted by the API as acting upon the object(s) specified by the URL. So, an API would respond to GET posts/ with a collection of information about every post; GET posts/42/ would respond with information about post #42, specifically; PUT posts/42/ would update that post to have the new text described in the request; POST posts/ would create a new post entirely; and DELETE posts/42/ would delete that same item.

The benefit of structuring your server like this is that the same backend can support web and mobile simultaneously, and your backend is completely divorced from all of your UI. Separation of concerns makes maintaining your database exponentially easier. Most of the truly terrible web application code out there is terrible because the same function that adds users to the database also handles generating the HTML (and sometimes, even JavaScript). That kind of tight coupling makes testing near impossible, it makes the application harder to change, and it causes tons of hard-to-find bugs down the line.

Combine the two, API and your database, and you’ll have a functioning server. In part 3 I’ll touch more on front end development.


This is the second in a series on understanding SPA architecture:

  1. Client-Side vs. Server-Side
  2. Persistent Data & Relational Databases
  3. Frontend Development