Web server protection: How the web works
As of January 2020, there are almost 4.54 billion active internet users around the world. This means that the internet reaches just over 59 percent of the world’s population.
Taking this data one level deeper, we can see that China, India and the United States lead the world when it comes to the number of internet users, while at least 95 percent of Northern Europe’s population is able to connect to the worldwide internet.
Although the internet as we know it today did not exist for many communities until the last 10 to 20 years, a world without it now seems foreign and unimaginable. This system connects billions of people and gives them nearly instantaneous access to the world’s ideas, art, music, science and everything in between.
Many credit the early development of the internet, on which the World Wide Web was later built, to Leonard Kleinrock, who wrote about packet-based networking and participated in the U.S. Defense Department’s Advanced Research Projects Agency Network (ARPANET). Beginning around 1969, ARPANET was the proving ground for many of the network systems (like packet switching) and, later, the protocols (such as TCP/IP) that are still used for the internet today. This was followed by the establishment of the first commercial internet service provider (ISP) in 1974.
Introduction to the web
So what did ARPANET do and how did it form the backbone for the internet that we know today? Similarly, how does the internet allow you to visit this website and read this article?
At a high level, the entire process begins with a computer, a device able to process our language and convert it into ones and zeros that can be transmitted to other computers. Information is displayed as pixels on a screen with the help of an operating system like Windows, Linux, or macOS and a browser like Google Chrome or Safari.
Broken down to its most basic components, the internet is a system through which one person, using a computer, can request information from another computer, usually set up by a person on the other end. Both the request and the information sent back are broken down into packets (groups of specifically ordered ones and zeros, known as bits, which are in turn grouped into bytes) and sent over wires or wireless links to be made visible on the computer of the person requesting it.
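To make the idea of ones, zeros and packets concrete, here is a minimal sketch in Python. The function names `to_bits` and `to_packets` are invented for this illustration, and the four-byte packet size is arbitrary; real packets are far larger and carry headers as well.

```python
def to_bits(text: str) -> str:
    """Encode text as UTF-8 bytes and render each byte as eight bits."""
    return " ".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def to_packets(data: bytes, size: int) -> list[bytes]:
    """Cut a byte string into consecutive chunks of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


print(to_bits("Hi!"))                  # 01001000 01101001 00100001
print(to_packets(b"hello world", 4))   # [b'hell', b'o wo', b'rld']
```

Joining the chunks back together in order recovers the original message, which is what the receiving computer has to do.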
ISPs facilitate this process by using standard languages, protocols and systems that are understood by other ISPs, which in turn connect people with one another around the world. Each ISP, using a system coordinated with other global ISPs, assigns each computer (or network of locally connected computers) an Internet Protocol (IP) address, allowing a request for information to be sent from one specific user and directed to a specific source of that information.
Now, scale this model up: one computer uses a network connected to many other networks, and its requests are routed to their destinations among many billions of computers and servers doing the same thing. The result is the internet as we know it today.
Client-server model overview
The structure defined above can also be described as the client-server model of network architecture. At its foundation, the client-server model is exactly what is described above: a person, through their computer (a client), requests information from another computer that holds information or files (a server). In other words, a client is a computer or workstation running applications and an operating system that allow individuals and other applications to interact with servers.
Servers are typically more powerful computers dedicated to serving other devices, performing functions that range from file access, email and printing to web browsing and creating new content through applications. Put another way, clients use servers to obtain access to information, files or other devices.
The client-server model describes how a server provides resources and other services to clients, often in a one-to-many arrangement, with a single server able to handle many clients simultaneously. In practice, when a client requests a connection to a server, the server accepts the connection request, establishes the connection over a certain protocol (e.g., SMTP for email) and then provides the requested information or service. The same can be true for internet browsing, streaming online music or internet gaming; applications providing these services use servers to connect to many users across the internet.
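The request-and-response exchange can be sketched with Python's standard `socket` module. This is a toy example under stated assumptions: both client and server run on the same machine (127.0.0.1), the server handles a single client, and the names `serve_once`, `ask_server` and `demo` are invented for illustration.

```python
import socket
import threading


def serve_once(listener: socket.socket) -> None:
    """Server side: accept one client, read its request, send a reply."""
    conn, _addr = listener.accept()
    with conn:
        request = conn.recv(1024)
        conn.sendall(b"server received: " + request)


def ask_server(port: int, request: bytes) -> bytes:
    """Client side: connect, send a request, return the server's reply."""
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(request)
        return client.recv(1024)


def demo() -> bytes:
    # Port 0 asks the operating system for any free port.
    with socket.create_server(("127.0.0.1", 0)) as listener:
        port = listener.getsockname()[1]
        worker = threading.Thread(target=serve_once, args=(listener,))
        worker.start()
        reply = ask_server(port, b"hello")
        worker.join()
    return reply


print(demo())  # b'server received: hello'
```

A production server would loop over `accept()` and serve many clients at once; the single-shot version keeps the roles of client and server easy to see.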
There is a point, however, at which a single server cannot handle the amount of traffic sent to it. This is when many online services choose to distribute client requests across many servers, a form of distributed computing facilitated through a practice called load balancing.
The client-server model is the opposite of the “peer-to-peer” or P2P model, where clients connect directly with one another to share information and resources.
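One of the simplest load-balancing strategies is round robin, where each new request goes to the next server in a fixed rotation. The sketch below assumes that strategy; the class name and the server labels are made up, and real load balancers also track server health and current load.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand each incoming request to the next server in rotation."""

    def __init__(self, servers: list[str]) -> None:
        if not servers:
            raise ValueError("at least one server is required")
        self._rotation = cycle(servers)

    def route(self, request: str) -> tuple[str, str]:
        """Return the (server, request) pairing chosen for this request."""
        return next(self._rotation), request


balancer = RoundRobinBalancer(["server-a", "server-b", "server-c"])
for number in range(5):
    print(balancer.route(f"request-{number}"))
```

With three servers, the fourth request wraps around to the first server again.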
The domain name system: Finding other computers
So how does your computer know how to send a request to another computer and where on the worldwide network to find its destination? As mentioned before, every computer linked to the internet is assigned a unique address that identifies it to the rest of the world: the IP address. Under the current IPv4 model, each IP address is made up of four numbers from 0 to 255, separated by dots, such as 192.0.2.1.
This sequence of numbers is fine for computers, but human users are used to web addresses such as google.com, espn.com or infosecinstitute.com. These are known as domain names. Our internet browsers use the Domain Name System (DNS), which maps domain names to IP addresses, to translate the domain names into their IP addresses so that browsers can load the information from the appropriate servers.
This process not only spares users from having to remember number sequences for specific websites but also ensures that the address returned for a domain comes from the servers responsible for it. DNS resolves a name through a chain of servers, ranging from a root name server to the domain’s authoritative name server, so that the IP addresses associated with a domain are accessible and reliable.
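Python's standard `ipaddress` module can show what an IPv4 address looks like to a computer. The address 192.0.2.1 used here comes from a range reserved for documentation, and the helper name `is_valid_ipv4` is invented for this sketch.

```python
import ipaddress


def is_valid_ipv4(text: str) -> bool:
    """Return True if `text` is four dot-separated numbers, each 0 to 255."""
    try:
        ipaddress.IPv4Address(text)
    except ValueError:
        return False
    return True


address = ipaddress.IPv4Address("192.0.2.1")
print(list(address.packed))        # [192, 0, 2, 1] (one byte per number)
print(int(address))                # 3221225985 (the same address as one 32-bit integer)
print(is_valid_ipv4("300.1.2.3"))  # False, because 300 does not fit in a byte
```

Each of the four numbers fits in one byte, so a full IPv4 address is just 32 bits.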
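The walk from a root name server down to an authoritative name server can be modeled with a few dictionaries. Everything in this sketch is hypothetical: the server labels, the table contents and the address 192.0.2.10 are made up, and real DNS adds caching, multiple record types and network round trips.

```python
# Hypothetical data: each level only knows which server to ask next.
ROOT_SERVER = {"com": "tld-com"}
TLD_SERVERS = {"tld-com": {"example.com": "ns-example"}}
AUTHORITATIVE_SERVERS = {"ns-example": {"www.example.com": "192.0.2.10"}}


def resolve(name: str) -> str:
    """Follow root -> top-level domain -> authoritative server for `name`."""
    name = name.lower().rstrip(".")
    labels = name.split(".")
    tld = labels[-1]                      # e.g. "com"
    domain = ".".join(labels[-2:])        # e.g. "example.com"
    tld_server = ROOT_SERVER[tld]         # root says who handles ".com"
    auth_server = TLD_SERVERS[tld_server][domain]    # TLD says who handles the domain
    return AUTHORITATIVE_SERVERS[auth_server][name]  # authoritative answer


print(resolve("www.example.com"))  # 192.0.2.10
```

In a real program, the operating system's resolver does this work for you; in Python it is reachable through `socket.getaddrinfo`.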
Connecting to other computers with TCP/IP
With an IP address in hand for your internet browser to direct its requests to, another ARPANET-era tool steps in to define how the exchange of data between the two systems should occur: the Transmission Control Protocol (TCP) and Internet Protocol (IP). The TCP/IP protocols are the communication mechanisms that allow for orderly and reliable information sharing. For example, a user, through the same device and network, can browse the internet and receive email at the same time, thanks to the routing, management and segmentation of internet services through protocols like TCP/IP.
Continuing the example, these two applications (the web browser and an email client) use different ports on your device, send separate traffic to different destinations and establish reliable connections with two different services, using the rules defined by TCP.
In the web browser part of the example, when your browser requests a TCP connection with a web server, it first asks the web server for permission to make a connection (a SYN message). The web server accepts this request with a SYN-ACK message, the browser acknowledges it in turn, and the connection is open. Only then does the browser send its HTTP request over that connection; the web server answers with a status such as 200 OK or, if the information requested is not there, a 404 Not Found message. Throughout this process, IP’s only job is to direct the data from device to device; TCP focuses on delivering the request and the response completely and in order so the information can be presented in your browser.
The larger TCP/IP model can be further broken down into four pieces, or layers, each with its own functions:
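The 200 and 404 responses are easy to observe with a throwaway web server from Python's standard library. In this sketch the server runs on the local machine only, the handler class `DemoHandler` and the path `/missing` are invented for illustration, and the TCP connection setup happens inside the library calls.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Answer "/" with 200 OK and every other path with 404 Not Found."""

    def do_GET(self):
        found = self.path == "/"
        body = b"hello" if found else b"not found"
        self.send_response(200 if found else 404)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch_status(port: int, path: str) -> tuple[int, str]:
    """Open a TCP connection, send one HTTP GET, return (status, reason)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        response.read()
        return response.status, response.reason
    finally:
        conn.close()


def demo() -> list[tuple[int, str]]:
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return [fetch_status(port, "/"), fetch_status(port, "/missing")]
    finally:
        server.shutdown()
        server.server_close()


print(demo())  # [(200, 'OK'), (404, 'Not Found')]
```

Both requests succeed at the TCP level; the difference between them only appears in the HTTP status that travels over the open connection.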
- Application layer: Allows access to network resources and presents information to users
- Transport layer: Provides reliable message delivery
- Internet layer: Moves packets of data from the source to its destination
- Network interface: Transmits the data between two devices
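One way to picture the four layers is as nested envelopes: each layer wraps what it receives from the layer above before handing it down, and the receiving side unwraps in reverse. The bracketed text format below is purely illustrative (real headers are binary), and the function names are invented for this sketch.

```python
LAYERS = ["application", "transport", "internet", "network interface"]


def encapsulate(payload: str) -> str:
    """Wrap the payload once per layer, from the top layer down."""
    unit = payload
    for layer in LAYERS:
        unit = f"[{layer}|{unit}]"
    return unit


def decapsulate(unit: str) -> str:
    """Strip the wrappers in reverse order, as a receiver would."""
    for layer in reversed(LAYERS):
        prefix = f"[{layer}|"
        if not (unit.startswith(prefix) and unit.endswith("]")):
            raise ValueError(f"missing {layer} wrapper")
        unit = unit[len(prefix):-1]
    return unit


wrapped = encapsulate("GET /")
print(wrapped)
# [network interface|[internet|[transport|[application|GET /]]]]
print(decapsulate(wrapped))  # GET /
```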
The application layer
The application layer of the TCP/IP model focuses on the applications that people and other systems interact with. This is where data is input or received by end users or processed by other applications to perform other functions. Examples can include file creation and transfer, email, log-in screens, browsers and so on.
A discussion about the application layer and internet browsing can’t go any further without defining HTTP, which stands for HyperText Transfer Protocol. HTTP is the protocol used by the World Wide Web to define how messages between clients and servers are created and transmitted, as well as the various responses and actions that web servers can take.
In other words, HTTP facilitates activity at the application layer, sending requests from clients (in this case web browsers) for web pages and images, for example, in a format that web servers can act upon. Once the transaction is complete, the HTTP connection is closed or, in newer versions of the protocol, kept open for reuse.
For example, when a user presses enter after typing a web address into a browser, the following occurs:
- The browser queries a local domain name server and retrieves the IP address associated with the domain’s web server
- The user’s web browser then connects to the web server and sends an HTTP request to pull back the code and files associated with the domain’s web page content
- If the connection is established and the server can find the requested information, then the web page will be presented in the user’s browser. If the web page cannot be found, the server will send an HTTP 404 “Not Found” error message
- The web browser then closes the initial HTTP connection or, in newer versions of HTTP, keeps it open for reuse. If additional files, images, links, content or other elements are required or requested by the user, follow-up requests are sent to the web server
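The steps above can be sketched as the text a browser would actually put on the wire. This example only builds and parses messages and does not contact any server; the helper names `build_get_request` and `parse_status_line` are invented for illustration, and a real browser sends many more headers.

```python
from urllib.parse import urlsplit


def build_get_request(url: str) -> tuple[str, int, bytes]:
    """Return (host, port, raw HTTP/1.1 request bytes) for a URL."""
    parts = urlsplit(url)
    host = parts.hostname
    # Use the port in the URL if present, else the scheme's default.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    return host, port, request


def parse_status_line(response: bytes) -> tuple[int, str]:
    """Extract (status code, reason phrase) from a raw HTTP response."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii")
    _version, code, reason = status_line.split(" ", 2)
    return int(code), reason


print(build_get_request("http://example.com/index.html?lang=en"))
print(parse_status_line(b"HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\n\r\n"))
# (404, 'Not Found')
```

The host name from the first step is what gets looked up in DNS, and the port decides which service on the server receives the request.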
The transport layer
The transport layer focuses on the mechanism by which data is sent from a source device to a destination system across one or many networks. Features that are managed at this level include the speed at which data is sent, how much data is sent between two devices, the accuracy and reliability of the data sent and the sequencing of events. Finally, this layer is also where successful data transmission is acknowledged, for user and application awareness.
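Sequencing and reliability can be illustrated with numbered segments. The sketch below is a simplification under stated assumptions: segments are numbered 0, 1, 2 and so on (TCP actually numbers bytes, not segments), and the function name `reassemble` is invented for this example.

```python
def reassemble(segments: list[tuple[int, bytes]], total: int) -> tuple[bytes, list[int]]:
    """Put segments back in order.

    Returns the data that can be delivered so far (everything before the
    first gap) and the sequence numbers that still need to be resent.
    """
    received: dict[int, bytes] = {}
    for seq, chunk in segments:
        received.setdefault(seq, chunk)  # a duplicate segment is ignored

    missing = [seq for seq in range(total) if seq not in received]

    delivered = bytearray()
    for seq in range(total):
        if seq not in received:
            break  # stop at the first gap; later data waits for a resend
        delivered += received[seq]
    return bytes(delivered), missing


arrived = [(2, b"ld"), (0, b"hello "), (1, b"wor"), (0, b"hello ")]
print(reassemble(arrived, total=3))                 # (b'hello world', [])
print(reassemble([(2, b"ld"), (0, b"hello ")], 3))  # (b'hello ', [1])
```

The list of missing numbers plays the role of the receiver telling the sender what it has not yet acknowledged.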
The internet layer
The internet or network layer is where specific units of data, known as packets, are addressed and managed. This layer handles the creation of packet source and destination information (also known as the packet header) as well as the division of larger data into smaller packets and their security and traceability.
At a more practical level, this layer routes each packet independently, so packets can take different routes across a network between the source and destination. It does not by itself guarantee delivery or ordering; the transport layer above it detects lost packets and restores the proper order.
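A packet header can be sketched with Python's `struct` module. The 10-byte layout below (source address, destination address, payload length) is a deliberately simplified, made-up format and not the real IPv4 header, which carries many more fields; the addresses are documentation examples.

```python
import ipaddress
import struct

# Simplified header: 4-byte source, 4-byte destination, 2-byte payload length.
# "!" selects network byte order with no padding.
HEADER = struct.Struct("!4s4sH")


def make_packet(source: str, destination: str, payload: bytes) -> bytes:
    """Prefix the payload with a header naming where it came from and goes."""
    header = HEADER.pack(
        ipaddress.IPv4Address(source).packed,
        ipaddress.IPv4Address(destination).packed,
        len(payload),
    )
    return header + payload


def read_packet(packet: bytes) -> tuple[str, str, bytes]:
    """Split a packet back into (source, destination, payload)."""
    source, destination, length = HEADER.unpack_from(packet)
    payload = packet[HEADER.size:HEADER.size + length]
    return (
        str(ipaddress.IPv4Address(source)),
        str(ipaddress.IPv4Address(destination)),
        payload,
    )


packet = make_packet("192.0.2.1", "198.51.100.7", b"hello")
print(len(packet))          # 15 (10 header bytes plus 5 payload bytes)
print(read_packet(packet))  # ('192.0.2.1', '198.51.100.7', b'hello')
```

Routers along the way only need the header to decide where to forward a packet; they do not have to understand the payload.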
The network interface layer
The network interface layer is the final layer of the TCP/IP model, and it defines how data is sent across the physical network infrastructure. These details are used by networking hardware devices to direct and organize communications, helping to ensure the reliability of the connection and making sure that data is sent and received in its proper sequence.
This layer also helps devices to control the order and flow of data so that connections are balanced, decreasing the probability that a portion of the network is overpowered and subsequently errors out. Finally, this layer also assists network managers with configuring network devices into a larger local infrastructure, which makes adding new devices easy.
Securing web traffic through HTTPS
As concerns about securing internet browsing and data sharing have risen, the use of HTTPS (a secure version of HTTP) has become the norm for even everyday online activities. Using HTTPS in the web address routes the request to the web server’s secure TCP port, 443, rather than the default HTTP port, 80.
When this occurs, the web connection (or session) and its data are encapsulated in an additional security protocol: originally Secure Sockets Layer (SSL), which has since been superseded by Transport Layer Security (TLS). This helps to prevent online activity from moving “in the clear” where eavesdroppers can intercept it.
HTTPS and its encryption begin with the user’s browser requesting the web server’s digital certificate, which is issued by a certificate authority (e.g., Verisign) that verifies the authenticity of the organization and its web server. Once the browser accepts this certificate as trustworthy, the client and the web server use the server’s public and private key pair to agree on a shared session key, and the data exchanged between the two parties is then encrypted and decrypted with that key.
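Python's standard `ssl` module shows what a client-side setup for this certificate check looks like. This is a minimal sketch that builds the settings without opening a connection; the function name `make_client_context` and the choice of TLS 1.2 as the minimum version are assumptions made for illustration.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build client settings that require a valid, matching certificate."""
    context = ssl.create_default_context()  # loads the system's trusted CAs
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse older protocol versions
    return context


context = make_client_context()
print(context.verify_mode == ssl.CERT_REQUIRED)  # True: the server must present a certificate
print(context.check_hostname)                    # True: the certificate must match the host name
```

To use these settings, a client would pass a connected TCP socket (typically to port 443) to `context.wrap_socket(sock, server_hostname=...)`, which performs the handshake and raises an error if the certificate cannot be verified.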
Bringing it all together
By no means is this a comprehensive look into the mechanics of how the internet works or the interfaces, policies or systems that make it up, but this article does provide a broad understanding of its various facets. However, as with everything else in technology, how long will this status quo last? For example, the current version of IP addressing, IPv4, is slowly giving way to IPv6, allowing for an almost unimaginable number of connected devices around the world.
And after that? Only time will tell. In just a few decades, the internet has come a long way from the ARPANET to a medium that billions rely on for information, entertainment, business and so much more.