Protocol Layering

In modern protocol design, protocols are "layered". Layering is a design principle that divides the protocol design into a number of smaller parts, each of which accomplishes a particular sub-task and interacts with the other parts of the protocol in only a small number of well-defined ways.
For example, one layer might describe how to encode text (with ASCII, say), while another describes how to exchange mail messages (with the Internet's Simple Mail Transfer Protocol, for example). Another layer may detect errors and retransmit lost data (with the Internet's Transmission Control Protocol), another handles addressing (say, with IP, the Internet Protocol), another handles the encapsulation of that data into a stream of bits (for example, with the Point-to-Point Protocol), and another handles the electrical encoding of the bits (with a V.42 modem, for example).
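The sketch below illustrates the layering idea in Python: each layer only knows how to wrap (encapsulate) and unwrap the data handed to it by its neighbors. The layer names and header formats are invented for illustration and do not correspond to any real wire format.

```python
# A minimal sketch of protocol layering. Each layer adds its own header on
# the way down the stack and removes it on the way up; no layer needs to
# know anything about the headers of the others.

class Layer:
    def send(self, payload: bytes) -> bytes:
        raise NotImplementedError

    def receive(self, data: bytes) -> bytes:
        raise NotImplementedError

class TransportLayer(Layer):
    """Stands in for something like TCP: adds a small header."""
    def send(self, payload: bytes) -> bytes:
        return b"TRANSPORT|" + payload

    def receive(self, data: bytes) -> bytes:
        assert data.startswith(b"TRANSPORT|")
        return data[len(b"TRANSPORT|"):]

class NetworkLayer(Layer):
    """Stands in for something like IP: adds an address header."""
    def send(self, payload: bytes) -> bytes:
        return b"NET:192.0.2.1|" + payload

    def receive(self, data: bytes) -> bytes:
        return data.split(b"|", 1)[1]

# Sending pushes the message down the stack; receiving pops it back up.
stack = [TransportLayer(), NetworkLayer()]
frame = b"Hello"
for layer in stack:                 # encapsulate, top to bottom
    frame = layer.send(frame)
for layer in reversed(stack):       # decapsulate, bottom to top
    frame = layer.receive(frame)
assert frame == b"Hello"
```

Because each layer touches only its own header, any one layer can be replaced without changing the others, which is exactly the property the next paragraph exploits.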
Layering allows the parts of a protocol to be designed and tested without a combinatorial explosion of cases, keeping each design relatively simple. Layering also permits familiar protocols to be adapted to unusual circumstances. For example, the mail protocol above can be adapted to send messages to aircraft simply by replacing the V.42 modem protocol with the Inmarsat LAPD data protocol used by the international maritime radio satellites.
The reference model usually used for layering is the OSI seven-layer model, which can be applied to any protocol, not just the OSI protocols. In particular, the Internet protocols can be analysed using the OSI model.
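As a rough illustration, the example stack described earlier can be mapped onto OSI layer numbers. The assignments below are the conventional ones; real protocol suites rarely fit the seven layers exactly, and this particular example has no distinct session layer.

```python
# Conventional (approximate) OSI layer assignments for the example stack.
# Layer 5 (session) is omitted: the example stack has no distinct session
# protocol.
osi_layers = {
    7: ("Application",  "SMTP - exchanging mail messages"),
    6: ("Presentation", "ASCII - text encoding"),
    4: ("Transport",    "TCP - error detection and retransmission"),
    3: ("Network",      "IP - addressing and routing"),
    2: ("Data link",    "PPP - encapsulating data into a bit stream"),
    1: ("Physical",     "V.42 modem - electrical encoding of bits"),
}
for number in sorted(osi_layers, reverse=True):
    name, example = osi_layers[number]
    print(f"Layer {number} ({name}): {example}")
```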
Error Detection and Correction

It is a truism that communication media are always faulty. The conventional measure of quality is the number of failed bits per bit transmitted. This has the useful property of being a dimensionless figure of merit that can be compared across any speed or type of communication medium.
In telephony, failure rates of 10⁻⁴ bits per bit are considered faulty (errors at that rate interfere with telephone conversations), while failure rates of 10⁻⁵ bits per bit or worse are dealt with by routine maintenance (they can be heard).
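As a worked example of this figure of merit (the bit counts below are hypothetical measurements, not real line data):

```python
# Bit error rate (BER) is failed bits divided by bits transmitted: a
# dimensionless ratio, so links of any speed or type can be compared
# directly. The sample counts here are hypothetical.

def bit_error_rate(failed_bits: int, total_bits: int) -> float:
    return failed_bits / total_bits

slow_link = bit_error_rate(failed_bits=3, total_bits=30_000)        # 1e-4
fast_link = bit_error_rate(failed_bits=150, total_bits=15_000_000)  # 1e-5

print(f"slow link BER: {slow_link:.0e}")  # 1e-04 -> interferes with calls
print(f"fast link BER: {fast_link:.0e}")  # 1e-05 -> routine maintenance
```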
Communication systems correct errors by selectively resending bad parts of a message. For example, in TCP (the Internet's Transmission Control Protocol), messages are divided into packets, each of which carries a checksum. When a checksum is bad, the packet is discarded. When a packet is lost, the receiver acknowledges all of the packets up to, but not including, the failed packet. Eventually, the sender sees that too much time has elapsed without an acknowledgement, so it resends all of the packets that have not been acknowledged. At the same time, the sender backs off its rate of sending, in case the packet loss was caused by saturation of the path between sender and receiver. (Note: this is an over-simplification; see TCP and congestion collapse for more detail.)
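The following sketch illustrates the cumulative-acknowledgement scheme just described, assuming a simplified packet format with a CRC-32 checksum. Real TCP, with byte-oriented sequence numbers, sliding windows, and congestion control, is considerably more elaborate.

```python
import zlib

# Simplified sketch: each packet carries a checksum, the receiver
# acknowledges the highest packet received in order and uncorrupted, and
# the sender retransmits everything past that point after a timeout.

def make_packet(seq: int, payload: bytes) -> dict:
    return {"seq": seq, "payload": payload, "checksum": zlib.crc32(payload)}

def receiver_ack(packets: list) -> int:
    """Return the next sequence number expected, i.e. a cumulative
    acknowledgement covering every in-order, uncorrupted packet."""
    expected = 0
    for pkt in sorted(packets, key=lambda p: p["seq"]):
        corrupted = zlib.crc32(pkt["payload"]) != pkt["checksum"]
        if pkt["seq"] != expected or corrupted:
            break                   # gap or bad checksum: stop acking here
        expected += 1
    return expected

message = [make_packet(i, bytes([i]) * 4) for i in range(5)]
message[2]["payload"] = b"XXXX"     # simulate corruption in transit

ack = receiver_ack(message)         # acknowledges packets 0 and 1 only
to_resend = [p["seq"] for p in message if p["seq"] >= ack]
print(f"ACK up to {ack}; sender times out and resends {to_resend}")
```

When the timeout fires, a real sender would also reduce its sending rate, since the loss may indicate a saturated path rather than a corrupted packet.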
In general, the performance of TCP is severely degraded in conditions of high packet loss (more than 0.1%), due to the need to resend packets repeatedly. For this reason, TCP/IP connections are typically run either over highly reliable fiber networks or over a lower-level protocol with added error-detection and correction features (such as modem links with ARQ). These connections typically have uncorrected bit error rates of 10⁻⁹ to 10⁻¹², ensuring high TCP/IP performance.
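To see why such low bit error rates matter, one can estimate the packet loss rate that a given bit error rate implies, assuming (as a simplification) that bit errors occur independently rather than in bursts:

```python
# Probability that a packet is corrupted, assuming independent bit errors.
# 12,000 bits corresponds to a common 1500-byte packet.

def packet_loss_rate(ber: float, packet_bits: int = 12_000) -> float:
    return 1.0 - (1.0 - ber) ** packet_bits

for ber in (1e-4, 1e-6, 1e-9, 1e-12):
    print(f"BER {ber:.0e}: packet loss ~{packet_loss_rate(ber):.2e}")

# A BER of 1e-4 corrupts roughly 70% of 1500-byte packets, while a BER of
# 1e-9 corrupts about 1.2e-5 of them: comfortably below the ~0.1% loss
# level at which TCP performance degrades.
```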
Resiliency

Another form of network failure is topological failure, in which a communications link is cut. Most modern communication protocols periodically send messages to test a link. For example, on the telephone network's T1 lines, one framing bit is sent with every 192 data bits, forming a 193-bit frame. In phone systems, when "sync is lost", fail-safe mechanisms reroute the signals around the failing equipment.
In packet-switched networks, the equivalent function is performed using router update messages, which detect the loss of connectivity to a neighbor.
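The sketch below illustrates the keepalive idea behind such tests: a link is declared down when periodic test messages stop arriving. The class name and interval values here are illustrative; real routing protocols such as OSPF use hello and dead timers in this spirit.

```python
import time

# Sketch of link-liveness detection via periodic test messages: if no
# "hello" has arrived within DEAD_INTERVAL, the link is declared down and
# the routing layer can reroute traffic around it.

HELLO_INTERVAL = 10.0   # seconds between test messages (illustrative)
DEAD_INTERVAL = 40.0    # silence threshold before declaring failure

class LinkMonitor:
    def __init__(self) -> None:
        self.last_heard = time.monotonic()

    def on_hello(self) -> None:
        """Called whenever a test message arrives from the neighbor."""
        self.last_heard = time.monotonic()

    def is_up(self) -> bool:
        return (time.monotonic() - self.last_heard) < DEAD_INTERVAL

monitor = LinkMonitor()
monitor.on_hello()
print("link up" if monitor.is_up() else "link down: reroute traffic")
```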