HyperText Transfer Protocol (HTTP)

HyperText Transfer Protocol (HTTP) defines how resources on the web are transferred between applications.

The web, or World Wide Web, is a service that can be accessed via Internet. It’s a vast system of resources navigable by means of URL (Uniform Resource Locator). Applications primary interact with the resources which make up the web using HTTP.

HTTP is called a request response protocol because it follow the following model: a client make a request to a server and waits for a response.

Both request and response are text messages (or strings) that follow a standard format so that other machine can understand and interpret them. We often call the result of the response resources.

Statelessness

A stateless protocol is a protocol where each request/response pair are completely independent of the previous one. HTTP is a stateless protocol.

In HTTP context, it means the server does not need to know the last good state. If a request break, no cleanup is done; this make HTTP resilient, distributed and hard to control, but also difficult to build stateful application upon.

See Stateful Web Applications# for a way to make sateteful web applications out of HTTP.

Difference Between URI and URL

A web page address such as https://www.eff.org/pages/surveillance-self-defense is known as a Uniform Resource Locator, or URL. It’s like a physical address but for visiting web page.

A Uniform Resource Identifier, or URI, on the other hand specifies how resources are located. URL is the most frequently used part of the general concept of URI.

This is confusing, however, and the RFCs regarding the two terms are of no help. In short, though:

  • URI = URL + URN
  • URI identifies
  • URL locates
  • Locators (URL) are also identifiers (URI) but not necessary the other way around

The terms are not interchangeable but according to a W3C Note we should use URI when speaking about scheme and URL when speaking precisely about web page addresses.

See also this video for more information and examples.

URL Components

Let’s analyze the following URL and its components: https://www.eff.org:443/updates?type=case

  • https: the scheme. Written before a colon and two forward slashes (://). It tells the web client which protocol to use to access the resource. Often referred to as the protocol which is not totally wrong, but the correct term as an URL component is scheme.
  • www.eff.org: the host. Where the resources is at. Different from the domain, although this example uses a domain and a subdomain.
  • :443: the port. Can be omitted if it’s one of the default one (80 for HTTP and 443 for HTTPS, like in this case)
  • /updates: the path. Shows which resource within the host is being requested. Optional to make a complete HTTP request.
  • ?type=case: the query string made up of query parameters. Used to send data to the server. Also optional.

When accessing the URL above, we are redirected to https://www.eff.org/cases. This has nothing to do with how URL works; it’s simply the result of what the framework for this particular website decides to send back.

Unless something else is specified, port 80 will be assumed for every HTTP requests and port 443 will be assumed for every HTTPS request.

Query Strings/Parameters

Assume the following URL: https://www.eff.org/events/list?type=event&offset=0

Let’s break the query string down:

Query String Component Description
? Marks the start of the query string
type=event Parameter name/value pair
& Marks where another parameter will be added
offset=0 Another parameter name/value pair

Query strings are passed in through the URL. For this reason they are only used in HTTP GET requests. As they are visible in the URL, they should not be used for sensitive information such as username and password. Note also that query strings have a maximum length and space/special characters can’t be used (they need to be URL/percentage encoded).

URL Encoding

URL can only accept characters from the standard 128-character ASCII character set. Everything else need to be encoded. URL encoding replace non-conforming characters with a % symbol followed by two hexadecimal digits that represent the ASCII Code of the character.

Other than characters that are not in the ASCII character set, those considered unsafe, like characters used to write HTML tag < and > should also be encoded. Characters reserved for URL scheme that we already saw earlier, and characters for the query strings should also be encoded.

HTTP Request Method

HTTP Request Methods are a verb used to tell the server what action to perform on a resource.

GET Requests

GET requests are the most common requests. When entering an address in a browser bar or clicking a link, the browser is making a GET request on our behalf.

The response from a GET request can be anything. If it’s HTML and that HTML references other resources, the browser will automatically request those resources (a pure HTTP tool like curl will not).

POST Requests

GET are great to retrieve or ask for information from a server, but to send or submit data we often use POST.

POST is often used to send information through a form. Though most POST request could be done with GET, it is not wise to use the later with sensitive information as it will reveal the information in the query parameter. POST can also send bigger files, which GET cannot because of the size limitation of its requests.

Security

See HTTP Security#.