Network communication, sockets: Task D

WARNING: This information might be out of date. Last update: winter term 2016 The goal of this project is to try basic algorithms from network communication. In this project it is important to take care about error detection and resolve conflict situations.

The first two projects are targeting the protocol stack. The protocol stack is useful for dividing the complex task to more simple tasks. Layers in the protocol stack have well defined interface and each layer can completely rely on the other layers. This approach should be used also in other tasks where it is not explicitly stated (Task 1). The last two project focus on message routing and delivery.

A group of up to 2 students can work on an assignment.

In case your application is multithreaded (has any parallel parts) you have to take care of correct thread synchronization.

General requirements

The program must be well-arranged, easily readable, appropriately commented and of course correctly functioning.
Program terminating with 'Segmentation fault' or similar severe failure cannot be accepted.
When specified, your program must simulate certain network errors.
All your programs must print debugging information to standard error output: all relevant events – disconnection, routing info, error in received telegrams, etc. It is recommended to print all information about internal states of your program to the standard error output too. There must be an option to turn the verbose debugging off.
All programs must use POSIX standard for communication and must compile under an UNIX system. Do not use any other libraries.
Source files must be compilable by the make command (a Makefile is created).
It should be possible to correctly shutdown the application. This applies for clients reading data from stdin (an EOF should terminate the application). The daemons or server can run forever, however they should react correctly to signal (at least SIGINT).
Application should not be computationally demanding when not really necessary (not even in exceptional state). The waiting loops are unacceptable (in case the error condition is detectable). Use appropriate blocking function. The use of waiting functions (sleep, usleep) is not considered a correct solution and can result in point penalty. For the same reasons you should not use blocking functions with timeouts (unless special cases, such as message delivery acknowledgment timeout).
Deliver messages without any delay. Delay is allowed in case of (artificial) communication error.
When communicating via the TCP stream socket, do not make assumption that what you send by one call with send would be received by one call to recv function.
You should stick to good correct programming style: appropriate function length, constant values at one place, not repeating block of similar code, etc.
Remember to correctly format (indent) your codes. In case of low readability of your code you will be asked to correct it. A good editor can do it for you.

Specification notes

No specifications are exhausting and does not have to describe each detail. This is to mimic the specification for a project you get in the real world. It is up to you to come with an elegant and rigorous solution of missing specifications.

In case of any doubts do not hesitate to contact your teacher. A real-world client rarely accepts poorly functioning solution with an excuse about specification…

Implementation note

The goal of the tasks is to:

teach you how to create an application communicating via IP sockets,
make you try to implement some basic methods related to network communication (that are often implemented in protocol stacks, kernel drivers, etc.)

The socket you create should be considered as a communication channel (a pipe used for reading/writing data) with no security. Unless stated otherwise, you should treat an open socket as an unreliable channel (with data loss/corruption possible) – even in case it's a TCP channel.

It is advisable to strictly separate socket communication layer (containing the socket and ev. error generator) and the layer implementing the main task – ideally by implementing in separate files. See the ISO/OSI model for an inspiration.

Additonal implementation notes

Implement the tasks in C (or C+ +) language.
Do not use any other libraries (than basic system ones). In most cases you will need only the pthread library.
Create a Makefile that allows compilation of your project: If the task has client and server part, the parts should be compilable by calling make server or make client respectively. If there is only one executable binary, compile it by calling make.

Testing

The correct implementation is not a trivial task. Remember to design helper application or scripts and testing data that helps you test correct function of the application, even for the rare exceptions.

Implement your test algorithms so that you test more sophisticated input. Manual entering an input for debugging is never sufficient and does not cover majority of cases. Wrong implementation often shows with an excessive data (load).

Reservation

Use the Upload System for reservation. You can solve the task individually or with a colleague. The implementation has to be demonstrated in personal by both members of the group (if applicable).

You have to upload the final version of the source codes to the Upload System. If you don't you get no assessment. Upload ONLY the source codes (*.c, *.h, Makefile) in an archive. Uploading unnecessary binary files will result in point penalty.

Evaluation

Perfectly correctly functional task (that complies with all the points above and the specification and that correctly handles error states) should be rated by maximum number of points. Depending on the maturity of the application functionality (and the code), the points will be deducted.

In case of incorrectly functioning application (segfault termination, not complying with the requirements, …) the application might (and probably will) be rejected.

You can (and possibly will) be asked for showing the application on two (or more, if applicable) machines. Usually a redirection of a text file to stdin will be tested.

Variant 1: Established connection

Socket type	TCP (stream)
Communication	Bidirectional
Nodes	1 server, 1 client

Implement acknowledged connection based on the reference ISO/OSI model that creates the following protocol stack. It is an example of secured communication where client and server are exchanging data and must detect if the same client connected after a disconnection (that the same relation is maintained) and the data drop-out. The relation consists of the connection establishing phase when the identity of communication parties is verified and to synchronization of sequence marks. During connection establishment the server awaits connection and the client initiates the connection.

For connection (your physical layer) use a stream socket (TCP).

The description of protocol stack layer follows (overview and details):

Data link layer	Sends and receives 'telegrams' via the TCP stream socket.
Transport layer	Adds error protection.
Session layer	Establishes and maintains relation.
Application layer	Reads from `stdin` and passes the lines to the session layer.

Data link layer

Link layer uses TCP socket communication. In this communication only characters '0'–'9' and 'a'–'f' and comma character ',' are allowed. Each byte to be transported must be converted into a tuple representing its hex value. The bytes (tuples) are separated by the comma character ','. The start and the end of telegram is indicated by two comma characters: ',,'.

Example: telegram with size 6 bytes and values: 0x04, 0x0d, 0x55, 0x32, 0x32, 0x58 is represented as the following string:

,,04,0d,55,32,32,58,,

Transport layer

The transport layer performs an error protection. It adds two bytes at the beginning of the telegram:

length of the telegram, and
a checksum.

length	checksum	original telegram bytes

The checksum is computed as bitwise XOR of all hex values (excl. commas) in telegram. Receiving a new telegram makes the transport layer perform the check of the checksum and only the telegrams with correct checksum are passed to relation layer. Telegrams with incorrect checksum are discarded.

Session layer

The session layer establishes and maintains a relation (session). Within the session it checks for message loss. Both server and client have a name for identifying to the other party. When exchanging data packets, the parties exchange a sequence number that server increments when sending a message. This allows detection of lost telegrams. Sequence number is a 16-bit value (binary). The byte order is described in the telegram format figures (see below).

The session layer uses two types of telegrams:

T0 for connection establishment
T1 for data communication
T2 for connection termination

Telegram T0 uses the following format:

0x10	sequence_high	sequence_low	name [1st byte]	name [2nd byte]	name […]

The client establishes connection by sending the T0 telegram with a sequence value of 0 and it's name. Server responds with a telegram T0 with a starting sequence value that it randomly generates for each incoming connection.

In case the telegram is lost (client does not receive response from the server within 1 sec, the client re-sends the T0 telegram.

Telegram T1 uses the following format:

0x20	sequence_high	sequence_low	data [1st byte]	data [2nd byte]	data […]

The sequential number is incremented by the server with each telegram sent. The client sends the sequential number last received from the server. It is obvious that within the relation the telegrams from client to server and from server to client should alternate: no party can sent two telegrams in a row without receiving telegram from the other party.

In case of telegram error the client waits for 1 sec for a reply from the server. In case of no reply the session is cancelled and the clients tries to renew it by sending T0.

When a wrong sequential number is received or if the telegram T0 is received then the session is canceled (it means that 2 new telegrams T0 need to be transmitted with new sequential numbers). Remember that the socket is not closed physically – it is a part of the data link layer.

Telegram T2 uses the following format:

0x30	sequence_high	sequence_low

Either client or server can decide to end the connection. In such case client sends T2 and waits for T2 from the server. In case of timeout (1 s) re-sends the T2 once more and waits again for 1 s. The closes the socket even in case the T2 from the server did not arrive.

If a server wants to terminate the session it waits for a data telegram from the client and answers with T2. Client also responds with T2. If server gets no T2 response, waits for 1 s and re-sends the T2. Waits once-more for the response and closes the connection even in case the message did not arrive.

Example of a connection set-up and communication:

Client

Server

0x10

0x00

C

L

I

E

N

T

1

0x10

0x55

0x99

S

E

R

V

E

R

5

0x20

0x55

0x99

M

E

S

G

E

1

0x20

0x55

0x9A

M

E

S

G

E

5

0x20

0x55

0x9A

M

E

S

G

E

2

0x20

0x55

0x9B

M

E

S

G

E

6

0x20

0x55

0x9B

M

E

S

G

E

2

0x30

0x55

0x9C

0x30

0x55

0x9C

Application Layer

Reads from the standard input (stdin) and passes each line to the session layer as a data telegram. Data received from the session layer are output to the standard output (stdout). The input lines can be cut to the max length selected (32 characters at minimal).

The client sends telegram(s) only in case it has input data available. Otherwise it waits. The server responds to the incoming telegram with no delay. If it has input data from stdin it sends the data, otherwise it sends response (telegram) with zero data size.

General requirements for Variant 1

Correctly separate the layers. Separate them to different functions (and even better: files). If you do not follow this requirement, the task might be rejected.

Make sure the layer does what is it supposed to. Nothing more, nothing less.

The client has 3 mandatory arguments: IP_server, PORT_server, name The server has 2 mandatory arguments: PORT_server, name

The server accepts only one incoming connection. After client connection all other clients are rejected until the connection is closed. Then it waits for a new client. Client connects to a running server. If the server is not running,client exits and prints an error message.

Both server and client read data from standard input (separated by the EOL = 0x0A character) and if possible, send data to the other party. Otherwise waits. It is obvious that the application (both of them) waits for 2 types of events: stdin and socket. Therefore the implementation needs the use of threads or the use of select function.

In case of client input termination (EOF character) the client sends T2 telegram and closes the socket after acknowledgment (in case of an error, re-sends the T2 after 1 second and waits another 1 s for a reply; then closes). The server continues running and waits for a new connection. In case of EOF at the server side and running TCP connection the server also sends T2 telegram response to the next data request from the client. The process of re-sending and closing is the same as the client (resending after 1 s, waiting 1s and terminating).

Both client and server must be resistant to any error in connection. It has to behave correctly in case of the incorrect checksum, in case of data layer error and in case of session layer. Telegram with an error is always discarded. In such case it is not an acknowledged transport, therefore the lost data are not transmitted again. The problem of lost packet handles another (higher) layer.

Because the communication is realized via the stream (TCP) socket you have to simulate the communication channel error(s). We recommend implementing the error simulation below the data layer – in the physical layer, just before sending data to the socket (or right after the reception. The error probability has to be user selected. For the debugging (and testing purposes) we suggest to implement a switch to activate and a parameter to set the error probability.

Variant 2: Acknowledged connection

Socket type	UDP (datagram)
Communication	Unidirectional
Nodes	1 server, 1 client

Implement acknowledged connection using the following algorithm: Selective Repeat ARQ for N=8.

For better understanding please watch Video or use the interactive page http://www.ccs-labs.org/teaching/rn/animations/gbn_sr/ showing the algorithm simulation.

The client reads data from stdin and sends data packets using an unreliable channel. The server acknowledges acceptation of individual packets. Data that are not acknowledged within a time limit (or “acknowledged” as errorneous) are sent by the client again so that the server outputs the same data as there are on the input of the server.

The client application opens a socket connection to the server (using the IP and port_number arguments) and sends messages that are read from stdin (1 line of input = 1 message = 1 packet). The lines are separated by the end of line character (EOL=0x10). The message size is (for simplification) limited to 255 characters. After receiving the EOF character (End of File), the socket connection and client application is terminated (after receiving the last ACK) and the server continues running, waiting for a new client to connect. The server can be terminated by the SIGINT signal (Ctrl+C).

The client adds a header in from of each message. The header contains two bytes of 16-bit sequence number and one byte containing message length. At the end it adds 1-byte check sum (computed as an XOR of all bytes of the original message):

sequence_high	sequence_low	length	data (length bytes)	checksum

For message sending use the UDP (Datagram) protocol that realizes your physical and data layer. In this layer (right before sending data to the UDP socket) implement a mechanism of error simulation that with (a specified probability) changes a (random) byte in the datagram, or does not send (drops) the whole datagram.

The server application receives datagram and sends back an acknowledgment about their reception. The accepted messages are displayed (in correct order) to the standard output (stdout). Implement the dropping of acknowledgments in a way similar to the way the client drops datagrams.

The client is started with 2 parameters: server_IP, server_port The server has one parameter: server_port

Selective Repeat ARQ algorithm details

The algorithm works with a shifting window over a sequence of the datagrams to be sent:

…	D2	D3	D4	D5	D6	D7	D8	D9	D10	D11	D12	D13	D14	D15	D16	…
Acknowledged			Sent, waiting for Ack								Waiting

Data that get within the window can be sent. In case the acknowledgment for the first datagram delivery in the window is received, the windows shifts by one datagram (allowing the sending of the next datagram).

This approach allows higher data transmission efficiency when compared to the naive approach when waiting for acknowledgment for each datagram sent.

The client must remember for each datagram the time the datagram was sent. In case the acknowledgment is not received within a defined time interval, the indicated datagram must be transmitted again.

Server should use the same structure as the client that allows the incoming datagrams to be correctly ordered.

Variant 3: Mail server

Socket type	TCP (stream)
Communication	Bidirectional
Nodes	1 server, multiple clients

Your task is to implement two standalone programs: client and a server. The client connects to the server and logs in using it's name. The connected client can send messages to other clients (using their name). The messages are read from client's stdin in the following form:

name:message

(terminated by the newline character (EOL=0x0A). The messages read are with no delay sent to the server.

The server delivers the messages to the clients with no delay. If the message is for a client that was never logged in, the server drops the message and informs the sender. If the recipient is not available in the moment, the server saves the message and informs the sender. Server can remember only a limited number of messages (i.e. 16) and can drop messages above the limit (has to inform the client about it).

The maximum number of clients connected can be limited (for simplification), the server must allow simultaneous connection of 32 clients in minimum. When the client connects, all stored messages are delivered to the client. When the client terminates, it must send the server a message informing it about disconnection.

You should consider the communication channel unreliable: the messages can be lost. (Although the TCP protocol guarantees the delivery, for the purpose of this task you will simulate the failures.) Consider only the case when the message is lost (as a whole). The communication channel dropping will be simulated by a special layer that drops the message with a given probability. For debugging purposes it is a good practice to implement the probability input using a commandline argument. In case the message is lost (and retransfered), the delay in delivery is acceptable.

The users have the guarantee of message delivery: Implement the acknowledging mechanism and re-transmitting of unacknowledged messages.

Use the stream socket (TCP) for communication between client and server. Design the communication protocol yourself so that the protocol allows all necessary functions.

The client has three mandatory arguments: server_IP, server_port, name The server has one mandatory argument: server_port

Variant 4: Dynamic routing

This variant might be more difficult for implementation. This should be solved by advanced programmers.

Socket type	UDP (datagram)
Communication	Bidirectional
Nodes	multiple nodes

You have to implement an application that can be server and client in one time. Each running application (node) accepts connection from other nodes that try to connect so that the communication network allows message transmission between any (two) nodes. In this network no central point is present, all nodes are equal.

Each node in the network is identified by its unique ID (1-byte unsigned number: 1 to 255). The network topology is specified by the configuration file: a text file with two type of lines:

node: physical placement of the node in the network
link: definition of a link between two nodes

The node line structure is:

node <ID> <port> <ip_address>

where the ID is the node ID, port is the port where the node is running and the ip_address is an optional address of the node (for the case the nodes run on different machines).

The link line structure is:

link <ID_client> <ID_server>

where the ID_client is the node ID that initiates (establishes) the connection and ID_server is the node ID to which the client connects.

Each application instance is run with one mandatory parameter (a node ID). Using this ID the application can find relevant lines in the configuration file.

Each instance must allow at leas 4 connections as a server and 4 connections as a client. All the connections established are bi-directional and no longer is there any differentiation between client or server.

The program uses one of the known |routing algorithms, or your own approach.

The program reads messages from stdin in the following format:

ID:message

where ID is the ID of target node. The message is a text string that can be of limited size for simplification (e.g. 128 bytes). Your task is to design (and implement) a communication protocol for the nodes that allows message delivery and delivery of helper information for the routing.

To demonstrate the ability of dynamic routing it is crucial for the network to operate even when the nodes are connecting and disconnecting. This means when the node is not reachable, you need to try the connection in regular intervals (e.g. 1 s) for case the node appears online.

Use the datagram socket for communication (UDP). Using the UDP protocol you do not have to search for the starts/ends of datagrams sent between nodes. In addition, you need only one socket for communication with all nodes connected.

The system must guarantee the message delivery in case there exists a route between sender and recipient. One message should in no case be delivered multiple times (when sent by multiple paths for example). Messages that cannot be delivered are dropped. It is also allowed to drop messages that do not fit in the program buffer (in case of the network overload).

If there is no possibility to deliver the message (no route exists), the system should inform the sender about the undeliveable message (if you do not implement this, the task is acceptable, but point penalty will result).

For the parsing of configuration file we have prepared a function parseRouteConfiguration:

int parseRouteConfiguration(const char * fileName, int localId, int * connectionCount, TConnection * connections)

You can download it via the route_cfg_parser.zip file that contains the .h file with header and a demonstration program (with a sample cfg file).

The function reads a list of connections for the current node (specified by the localID parameter from a file. For each connection you get ID, port and eventually the IP address in the following form:

typedef struct {
  int id;
  int port;
  char ip_address[IP_ADDRESS_MAX_LENGTH];
} TConnection;

Note that the code can be compiled under both C and C++.

For testing your application prepare an interesting set of different configurations (with an appropriate size).

Variant 5: Backup server

Socket type	TCP (stream)
Communication	Bidirectional
Nodes	2 servers, multiple clients

Design a system with two identical servers allowing saving and retrieval of data. Both servers are synchronized so that they output the same data no matter to which server the client connects. In case of an inconsistency (different data on each server) the more recent data are valid. The client applications are designed so that they try to connect to one of the servers and in case the server is not available, connects to the other server. More clients can connect to any of the server in the same time.

The client reads from stdin two types of commands:

set for saving data
get for data retrieval

The command format is the following:

set <ID> <DATA>
get <ID>

where the ID is a number (1–32) and DATA is any text string (the first two spaces in the set command separate the command and the ID argument; the other spaces are part of the data). The commands are separated by the end of line (EOL=0x0A).

The set command sends the data to the server and the server saves them using the proper ID. The get command retrieves the data from the server and outputs them to the stdout. Data can be limited (for simplification) to the maximum of 255 chars.

The user must be notified in case the data cannot be stored: e.g. the server (the client is currently connected to) is not available.

The client reads commands coming to the stdin. In case of receiving the end of file (EOF), the client terminates.

In case the server is terminated (including abrupt termination, connection lost without correct termination), the client should try to connect to the second server (and finish the operation that was not successful due to the connection interruption). If no server is available, the client can terminate with an error.

Servers must save data to the disk for the case both servers are terminated. No data can be lost. You should also guarantee that the client always (if possible) receives the most recent data (no matter which server got the last update). This also means that no data saved can be overwritten by any older version.

Client-to-server and server-to-server communication uses TCP sockets. Design your own synchronization protocols. In case you need it you can assume that both servers have the clock synchronized.

The server has three parameters: port, other_server_port, second_server_ip (both servers need to know the IP and port of the other server)

The client has four parameters: server_port, server_ip, backup_server_port, backup_server_ip

Variant 6: TBA

Warning: May be updated without prior notice.

Not available for 2014.

Variant 7: FTP server/client

Socket type	TCP (stream)
Communication	Bidirectional
Nodes	1 server, multiple clients

Warning: May be updated without prior notice. Check the page later.

Implement a sub-part of the FTP protocol, both server and client, both active an passive mode. An advantage of this task is that you can refer to an existing standard and you have already tools to test against (both server/client).

In this task we require you to write a test suite to make sure you code works correctly (and that it covers all requirements). The output of the show should view the status of implementation of the requirements stated below. Implement the test(s) so that it can easily work with both:

Your implementation
Standard implementation (Linxux FTP client/server)

Task requirements

The server must work with standard tools (ftp and/or lftp command). Also, test with a web browser¹⁾.
The client must work with an existing ftp server (i.e.: ftp://ftp.kernel.org)
Implement anonymous authentication only
Implement both active and passive mode
Implement data transfer mode that does not alter the data in transfer.
Use existing FTP server return codes
Required functionality:
- Client
  - Store/retrieve a file
  - Rename/Delete file (on the server)
  - Show size/modification time of the file (on the server)
  - Change local directory
  - Change remote directory (on the server)
  - Open a connection, log-in (anonymous)
  - Create/delete/list directory (on the server)
  - Print help (available commands with one-line usage info)
  - Print working directory (on the server)
  - Get (a list of) supported commands from the server
  - Use active/passive mode transfer
- Server
  - Similar list to support client requests.
  - Server must support simultaneous connection of multiple clients (use threaded/forked server version)²⁾
  - Implement countermeasure(s) against the Directory traversal attack

Hint: An (incomplete) list of commands you should take a look at (implement) follows³⁾:

account (ACCT)
ascii
binary
bye
cd (CWD)
cdup (CDUP)
delete (DELE)
disconnect/close
get
help (HELP)
lcd
ls (LIST)
mkdir (MKD)
mode (MODE)
modtime (MDTM)
open
put (STOR)
pwd (PWD)
quit (QUIT)
recv (RETR)
rename (RNFR/RNTO)
rmdir (RMD)
send (STOR)
size (SIZE)
system (SYST)
user (USER)

verbose
NOOP
PASV
PORT
(?)SITE

Hint: man ftp

¹⁾

There are many web browsers out there, pick one that complies with the standard.

²⁾

or even better, a thread-pool-ed version

³⁾

taken from the manual page of the ftp command, FTP commands in parentheses

Table of Contents

Network communication, sockets: Task D

General requirements

Specification notes

Implementation note

Additonal implementation notes

Testing

Reservation

Evaluation

Variant 1: Established connection

Data link layer

Transport layer

Session layer

Application Layer

General requirements for Variant 1

Variant 2: Acknowledged connection

Selective Repeat ARQ algorithm details

Variant 3: Mail server

Variant 4: Dynamic routing

Variant 5: Backup server

Variant 6: TBA

Variant 7: FTP server/client

Task requirements