Sunday, September 15, 2024

Increasing Web Resilience – Communications of the ACM


For almost as long as there has been a public Internet, there have been projects devoted to “Connecting the Unconnected.” This name supports the idea that connectivity is a binary condition: an individual is either “connected” or “unconnected.” The further assumption seems to be that being connected is a persistent condition; once connected, an individual can then participate fully in the online world.

Technically, this is correct: Internet connectivity is binary. The specification of the Internet Protocol (IP) does not include any discussion of successful delivery metrics or guarantees, so any Internet link that ever has a non-zero rate of successful datagram delivery is “connected.” However, from a practical point of view, almost all Internet traffic is carried using the Transmission Control Protocol (TCP) layered on top of IP, and implementations of TCP do include timers that can enforce minimum rates of delivery.

What is called “a TCP connection” is in fact a pair of data structures, one at each endpoint, that store the values of a number of variables characterizing the state of communication between them. These values include the sequence numbers that have been sent or received, enabling retransmission of lost datagrams. They also include the timers used to determine when retransmission is necessary and when the connection as a whole is no longer delivering data quickly enough to be useful. When this last timer reaches zero, a somewhat arbitrary decision can be made to declare the connection “broken” and return an error to any applications attempting to use it.
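As a rough illustration of the kind of state described above, the following Python sketch models what one endpoint might keep. It is not a real TCP implementation: the class name, the field names, the 30-second default, and the simplification that every call to on_time_passed counts as a retransmission timeout are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class EndpointState:
    """Illustrative state kept by ONE endpoint; the peer holds its own copy.

    Not a real TCP stack: names and the 30-second default are assumptions.
    """
    snd_una: int = 0             # oldest byte sent but not yet acknowledged
    snd_nxt: int = 0             # sequence number of the next byte to send
    rcv_nxt: int = 0             # next byte expected from the peer
    give_up_after: float = 30.0  # seconds without progress before giving up
    stalled_for: float = 0.0     # seconds since an acknowledgment made progress
    unacked: dict = field(default_factory=dict)  # seq -> bytes awaiting an ack

    def send(self, payload: bytes) -> int:
        """Record a segment as sent and return its sequence number."""
        seq = self.snd_nxt
        self.unacked[seq] = payload
        self.snd_nxt += len(payload)
        return seq

    def on_ack(self, ack: int) -> None:
        """The peer has received every byte below `ack`."""
        if ack <= self.snd_una:
            return  # duplicate or stale acknowledgment: no progress
        for seq in [s for s, data in self.unacked.items() if s + len(data) <= ack]:
            del self.unacked[seq]
        self.snd_una = ack
        self.stalled_for = 0.0

    def on_time_passed(self, seconds: float) -> list:
        """Advance the clock; return sequence numbers to retransmit."""
        if not self.unacked:
            return []
        self.stalled_for += seconds
        return [] if self.is_broken() else sorted(self.unacked)

    def is_broken(self) -> bool:
        """True once the give-up timer has run out."""
        return self.stalled_for >= self.give_up_after


state = EndpointState(give_up_after=10.0)
state.send(b"hello")
state.send(b"world")
print(state.on_time_passed(4.0))   # both segments are still unacknowledged
state.on_ack(5)                    # the peer confirms the first five bytes
print(state.on_time_passed(10.0), state.is_broken())
```

Nothing in the sketch forces the value of give_up_after. The moment at which the connection counts as “broken” is a policy choice recorded in the endpoint's own state, not a property of the network.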

TCP timers provide a very simple example of how the resilience of Internet applications is in fact reduced by making assumptions that do not always hold true. An Internet connection that fails to deliver packets for a length of time exceeding the TCP timer duration will “fail.” If the disconnection is intermittent, a timer with a longer duration might be able to maintain the connection. A short timer duration reflects a logically stronger assumption about the network than a longer one does. The weakest possible assumption would be to set the TCP timer to “infinity,” meaning that it never expires. In other words, the weaker the assumption made by TCP, the more resilient the connection.
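The relationship between timer duration and resilience can be sketched in a few lines of Python. The outage durations below are hypothetical values chosen for illustration, not measurements, and the function name is an assumption of this example.

```python
import math


def connection_survives(outages_s, give_up_after_s):
    """True if no single outage lasts longer than the give-up timer."""
    return all(outage <= give_up_after_s for outage in outages_s)


# Hypothetical outage durations (seconds) on an intermittent link.
outages = [2.0, 45.0, 8.0, 300.0]

for timer in (30.0, 120.0, 600.0, math.inf):
    print(f"timer={timer}: survives={connection_survives(outages, timer)}")
```

Each longer timer tolerates every outage that a shorter one tolerates, and possibly more. An infinite timer tolerates any finite outage, at the cost of never reporting failure to the application.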

LiteLoad is an undergraduate research project at the University of Tennessee’s EECS Department that seeks to leverage this principle more broadly, to increase the resilience of the World Wide Web by reducing the logical strength of the assumptions made by content and service developers. LiteLoad is initially focusing on assumptions made in HyperText Markup Language (HTML) authoring and in sessions implemented at the application layer. In later stages, it will turn to the implementation of Web browsers and servers and the way that they use the Hypertext Transfer Protocol (HTTP).

  • Object size. Many Web applications make assumptions that are associated with broadband connectivity. Broadband is currently defined by the U.S. Federal Communications Commission as 100 Mbps of capacity available for download (from the network) and 20 Mbps for upload (to the network). This assumption is reflected in the inclusion of large objects among the elements required for successful loading of a Web page or use of an application. Thus, a simple step that developers can take to avoid this strong assumption is to limit the size of required objects.
  • Number of servers. A less obvious assumption made in Web authoring is that required objects can be downloaded from a large number of different servers located in many different topological regions of the Internet. The browser must establish a TCP connection to every server that contributes an object to a Web page and initiate download using HTTP. TCP connection setup requires a complex protocol that goes through a number of intermediate states, and each connection has a nonzero chance of “hanging” during this process. Using a large number of different servers makes an implicit assumption that the network and all the servers will function smoothly and quickly. One simple step developers can take to avoid this strong assumption is to limit the number of servers; another is to avoid unnecessarily accessing backend services (e.g., databases) if they increase the chance of delay or failure.
  • Application sessions. Web applications create their own sessions in the form of state maintained by applications invoked through the Web server. These sessions have their own timeouts, and they also can require that the client continue to connect to the same server for the duration of the application session. These sessions may be characterized by user logins, by accumulation of information using Web forms, or by the creation of a shopping basket. Modern implementations of Web services often use replication of server functionality in many geographically distributed datacenters maintained by Content Delivery Networks or Distributed Cloud operators. Web “cookies,” pointers sent to the browser and returned with each successive HTTP request, often are used to maintain the link between browser and server state. If this link is lost, the application session can be lost along with its accumulated state, requiring the end user to start over from the beginning. Web applications that maintain application sessions in this way are making an assumption that the network (and perhaps the server infrastructure) will be stable enough to allow such links to be maintained. A step that developers can take to avoid this strong assumption is to limit the use of application state and work on developing a more resilient way to implement it.
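The first two assumptions above lend themselves to a quick back-of-the-envelope check. The following Python sketch is not LiteLoad's tooling: the function names, the example URLs and sizes, the 5% per-connection hang probability, and the premise that connections fail independently are all assumptions made for illustration. It also ignores latency, protocol overhead, and parallel downloads.

```python
from urllib.parse import urlparse


def transfer_seconds(size_bytes, link_mbps):
    """Ideal transfer time, taking 1 Mbps as 1,000,000 bits per second."""
    return size_bytes * 8 / (link_mbps * 1_000_000)


def chance_all_connect(n_servers, p_hang):
    """Chance that no connection hangs, ASSUMING independent failures."""
    return (1 - p_hang) ** n_servers


def audit(resources, link_mbps, max_seconds, p_hang):
    """Summarize how strongly a page's required objects lean on the network.

    resources: list of (url, size_in_bytes) pairs for required objects.
    """
    hosts = {urlparse(url).hostname for url, _ in resources}
    slow = [url for url, size in resources
            if transfer_seconds(size, link_mbps) > max_seconds]
    return {
        "servers": len(hosts),
        "slow_objects": slow,
        "chance_all_connect": chance_all_connect(len(hosts), p_hang),
    }


# Hypothetical page: URLs and sizes are made up for this example.
page = [
    ("https://example.org/index.html", 20_000),
    ("https://example.org/hero.jpg", 2_000_000),
    ("https://cdn.example.net/app.js", 400_000),
    ("https://fonts.example.com/font.woff2", 60_000),
]

print(transfer_seconds(2_000_000, 100))   # the image on a 100 Mbps link
print(audit(page, link_mbps=1.0, max_seconds=2.0, p_hang=0.05))
```

Under these assumptions, an object that is negligible at the broadband benchmark becomes the dominant cost on a 1 Mbps link, and each additional server multiplies in another chance for the page load to stall.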

Figure 1: Anecdotal experiments in the rural area around the Smoky Mountains show the number of major provider mobile Internet connections (vertical axis) achieved at various levels of access quality (horizontal axis). Colored bars indicate areas advertised as offering different levels of mobile connectivity. Note the occurrence of very poor or failed connections in areas advertised as offering 5G.

The LiteLoad project is measuring the impact on Web resilience of such strong assumptions in Web pages and applications used for delivering critical services. LiteLoad is working to quantify the impact of poor connectivity, using rural areas near the Great Smoky Mountains National Park in Tennessee as a testing ground (see Figure 1). The project is also collecting anecdotal information about the impact of connectivity challenges on the ability of end users to obtain critical services online. This has included stories about the lack of accurate civil defense information available during past wildfires in the Smoky Mountains area due to disruption of mobile phone and Internet connectivity.

Making weaker assumptions about Internet connectivity in the implementation of critical services will enable those services to reach the previously unconnected. It can also increase the utility of available connectivity during periods of disruption, high traffic, or intermittent connectivity. It could, in principle, enable new forms of connectivity that do not depend on strong assumptions of continual, high-quality connectivity. This may facilitate the use of a variety of alternative technologies to provide critical services, possibly including transmitters mounted on satellites, drone airplanes, or balloons, or even taking advantage of data carried in storage devices mounted on terrestrial vehicles. This would require a wider reassessment of the necessity and value of the strong assumptions currently made by the developers of broadband applications.

The LiteLoad project is supported by the National Science Foundation under Grant Number 2125288.

Micah D. Beck, University of Tennessee

Micah D. Beck (mbeck@utk.edu) is an associate professor at the Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN, USA.

Brenna Bentley

Brenna Bentley (bbentle4@vols.utk.edu) is an undergraduate researcher at the University of Tennessee, Knoxville, majoring in Computer Science.

Anika Roskowski

Anika Roskowski (anikakroskowski@gmail.com) is an undergraduate researcher at the University of Tennessee, Knoxville, majoring in Computer Engineering with a special focus in low-level systems and policy.
