HTTP

Notes

HTTP is everywhere! You may already know that the term is related to the Internet, but just how? That's what this presentation will explore.

First, note that HTTP is just an acronym, for HyperText Transfer Protocol. Let's start by breaking down the meaning of each part of this acronym.

HyperText is really just regular text taken to the next level. Ever clicked on a Google search result to visit that web page? Or seen a table displaying some information? These are all features of a webpage that go beyond what you could achieve with just regular text. And so when we talk about HTTP, that's the kind of content we're dealing with: the kinds of interactive elements you see on webpages.

On a side note, the most common way to specify this HyperText is HTML, or HyperText Markup Language. Check out the page on that for more information! A snippet of HTML is shown in the slideshow.

Transfer refers to the exchange of this HyperText between a client - like your web browser - and a server. When you type "cs50.net" into your browser's address bar, your web browser prepares a request, which it sends to CS50's web server. CS50's web server takes a look at this request, decides what it needs to do, and issues a response to your browser with the data for the webpage.

The exact structure of both the request from the client to the server and the response from the server to the client is specified by this protocol. A protocol is just "a set of conventions" [1]. If you've said "You're welcome" to somebody who thanked you, you've engaged in a protocol.

Putting this all together, HTTP is just a set of conventions for exchanging the kind of rich HyperText content that we saw earlier. It specifies how both clients, like your browser, and servers, like the one CS50 has, should talk to each other.

[1] http://www.merriam-webster.com/dictionary/protocol
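To make the request-response exchange concrete, here is a minimal sketch in Python using only the standard library. It is not how CS50's server works; the handler class, the page it returns, and the function name round_trip are all made up for illustration. It starts a tiny server on your own machine, plays the role of the client by sending one GET request, and returns what came back.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical server side: answers every GET with a tiny HTML page."""

    def do_GET(self):
        body = b"<h1>Hello from the server</h1>"
        self.send_response(200)  # status code: everything went well
        self.send_header("Content-Type", "text/html; charset=UTF-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress the default per-request logging


def round_trip():
    """Start a local server, send one GET request, return (status, body)."""
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # Client side: prepare a request and read the server's response.
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", "/")
        response = conn.getresponse()
        status, body = response.status, response.read()
        conn.close()
        return status, body
    finally:
        server.shutdown()
        server.server_close()
```

Calling round_trip() returns the status code together with the bytes of the page, which is roughly what your browser receives before it draws anything on screen.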

Let's look at some examples of requests and responses to get a feel for HTTP.

Here's an example of an HTTP request, with some labeled portions.

This HTTP request is a GET request - as specified by the

Important to point out are the lines that make up the set of HTTP headers for this request (starting at the line identifying the User-Agent). HTTP headers allow for further customization of the request; that is, telling the server more information. Here, the User-Agent header specifies a descriptive name for the client used to make the request. We made this request with a program called curl. If we made the request with Google Chrome, you would see something identifying that Chrome was being used. With that in mind, can you figure out how this website works?

The Host header identifies the domain name that we want to get a page from; here, we were aiming for Apple's website. This allows for a single server to host websites for multiple domain names - virtual hosting.

[1] For more information on these labeled portions, take a look at http://cs61.seas.harvard.edu/wiki/2012/HTTP.
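As a rough sketch of what such a request looks like as raw text, the Python function below assembles a GET request with a User-Agent and a Host header. The function name build_request, the default User-Agent string, the extra Accept header, and the host www.apple.com are assumptions for illustration, not values taken from the slide.

```python
def build_request(host, path="/", user_agent="curl/7.30.0"):
    """Return the bytes of a simple HTTP/1.1 GET request.

    The default user_agent is a made-up example value; a real client
    would send its own name and version.
    """
    lines = [
        f"GET {path} HTTP/1.1",       # request line: method, path, version
        f"User-Agent: {user_agent}",  # descriptive name for the client
        f"Host: {host}",              # which domain name we want a page from
        "Accept: */*",                # any content type is acceptable
        "",                           # blank line ends the headers
        "",
    ]
    # HTTP separates lines with a carriage return plus a line feed.
    return "\r\n".join(lines).encode("ascii")


if __name__ == "__main__":
    print(build_request("www.apple.com").decode("ascii"))
```

Because the Host header travels inside the request itself, a server that hosts several domain names can read it to decide which website to serve.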

Here's a sample HTTP response. You can also view such a response using curl, or Google Chrome. Notice that it's structured similarly to the HTTP request shown earlier.

The status code is just an integer that conveys information about how successfully the request was fulfilled. 200 means that everything went well. You may have seen a 404 error when you visit a website. That means that the page was not found. 403 means that you are forbidden from accessing the page you tried to access.

[1] See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for a list of status codes.
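If you'd like to poke at status codes in code, here is a small Python sketch. The helper names parse_status_line and describe are invented for this example; HTTPStatus comes from Python's standard library and carries the standard phrase for each code.

```python
from http import HTTPStatus


def parse_status_line(line):
    """Split a status line such as 'HTTP/1.1 404 Not Found' into its parts."""
    version, code, reason = line.strip().split(" ", 2)
    return version, int(code), reason


def describe(code):
    """Return the standard phrase for a status code, if Python knows it."""
    try:
        return HTTPStatus(code).phrase
    except ValueError:
        return "Unknown status"


if __name__ == "__main__":
    for code in (200, 403, 404):
        print(code, describe(code))
```

The first digit of the code gives the general category: codes in the 200s signal success, the 300s are redirections, the 400s mean the client's request had a problem, and the 500s mean the server ran into trouble.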

Telnet

Within the HTTP request-response paradigm, a web browser plays a very important role for the end user. Not only does the browser create HTTP requests, sending cookies and headers appropriately, but it also parses HTTP responses, displaying beautiful, interactive websites that were originally just sent as 1s and 0s over your network cable.

In a sense, the web browser provides a layer over HTTP for the user: the browser manages the communication so that the user doesn’t have to. But what if we had to do this ourselves?

Enter telnet, a command-line utility that, according to Wikipedia, "provide[s] a bidirectional interactive text-oriented communication facility." For our purposes, we can use telnet to send HTTP requests that we wrote ourselves. It will then show us the HTTP responses that the server sends back.

So let’s try this out with the HarvardFood API since, according to its documentation, "you can query it using simple HTTP GET requests." Cool! (For the remainder of this lab, refer to the API’s documentation.)

To start telnet and specify that you are connecting to food.cs50.net, enter telnet food.cs50.net 80 at your Appliance’s terminal. If all goes well, you should see something like this:

francis@appliance:~ telnet food.cs50.net 80
Trying 140.247.63.236...
Connected to hs.cs50.net.
Escape character is '^]'.

And now you are free to enter your request! Let’s start by entering a simple request, something like we saw in lecture:

GET / HTTP/1.1
Host: food.cs50.net

Type this at the prompt and press enter twice. You should see a response from the server like this:

HTTP/1.1 302 Moved Temporarily
Date: Tue, 18 Jun 2013 20:24:34 GMT
Server: Apache
Location: https://manual.cs50.net/HarvardFood_API
Content-Length: 0
Connection: close
Content-Type: text/html; charset=UTF-8

Connection closed by foreign host.
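If you are curious what telnet is doing for you here, the same exchange can be sketched in a few lines of Python with a plain TCP socket. This is only an illustration: the function names send_raw_request and split_response are invented, and the Connection: close header in the usage comment is an addition (it asks the server to hang up when it is done, so the read loop knows when to stop).

```python
import socket


def send_raw_request(host, request_text, port=80, timeout=10):
    """Open a TCP connection, send the request, return the raw reply as text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request_text.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("iso-8859-1")


def split_response(raw):
    """Split a raw response into (status_line, headers dict, body)."""
    head, _, body = raw.partition("\r\n\r\n")  # blank line ends the headers
    status_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return status_line, headers, body


# Example usage (needs network access, so it is left as a comment):
# raw = send_raw_request(
#     "food.cs50.net",
#     "GET / HTTP/1.1\r\nHost: food.cs50.net\r\nConnection: close\r\n\r\n",
# )
# status_line, headers, body = split_response(raw)
# print(status_line)
# print(headers)
```

Note that the request text ends with two line breaks in a row, which is exactly what pressing enter twice in telnet produces.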

Some questions:
  • Visit http://food.cs50.net in a browser. What kind of behavior is exhibited when you visit this website, and is it consistent with the response you get in telnet? What header(s) indicate(s) this?

  • Using the API documentation, figure out what URLs will provide nutrition facts for 1 portion of recipe 117003.

  • Request the data in CSV and JSON format using telnet.

    • What are the differences in the headers sent in the response, and how does this affect the behavior that you see when visiting these URLs in the browser?

    • Why is it necessary to enter the Host header in the request (as of HTTP/1.1)?

Videos

Monday, Week 7

An introduction to HTTP.

David on HTTP