Introduction to Network Programming in UNIX & LINUX


Inter-Host Communication.
Berkeley Sockets.
Berkeley Sockets is an API (Application Programming Interface) for different communication protocol suites (TCP/IP, Unix Domain, XNS, etc.).
The Socket API contains a set of system calls for establishing a network connection and transferring data:
© D.Zinchin [[email protected]]
socket()                                    create endpoint
bind()                                      bind address
listen()                                    specify request queue
accept()                                    wait for connection
connect()                                   connect to peer
read(), write(), recv(), send(),
recvfrom(), sendto(), recvmsg(), sendmsg()  transfer data
close(), shutdown()                         terminate connection
5-Tuple Association and Socket Address
Generic Model
Association
= { Protocol, Port A, Address A, Port B , Address B }
Socket Address
= { Protocol, Port, Address }
TCP/IP Model
TCP Association
= {TCP, TCP Port A, IP Address A, TCP Port B, IP Address B }
UDP Association
= {UDP, UDP Port A, IP Address A, UDP Port B, IP Address B }
Socket Address
= {TCP/UDP, Port, IP Address }
Note:
In IPv4 and IPv6 the Socket Address has different formats because of the different length of the IP Address (32 bits for IPv4 and 128 bits for IPv6).
Unix Domain
The Unix Domain protocols are not an actual protocol suite, but a way of performing client / server communication on a single host using the same API that is used for clients and servers on different hosts.
The Unix Domain protocols are an alternative to other inter-process communication (IPC) mechanisms. There are two protocols:
- UNIXSTR  Stream Protocol (analog of TCP)
- UNIXDG   Datagram Protocol (analog of UDP)
A Unix Domain socket is bound to a file path.
Unix Domain Association
= { UNIXSTR / UNIXDG, File Path A, 0, File Path B, 0 }
Socket Address
= { UNIXSTR / UNIXDG, File Path }
# netstat -a
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address    Foreign Address   State
tcp        0      0 *:bootpc         *:*               LISTEN
tcp        0      0 *:x11            *:*               LISTEN
tcp        0      0 *:2055           *:*               LISTEN
tcp        0      0 Knoppix:2055     Knoppix:44992     ESTABLISHED
tcp        0      0 Knoppix:44992    Knoppix:2055      ESTABLISHED
udp        0      0 *:2055           *:*
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags    Type    State      I-Node Path
unix  2      [ ACC ]  STREAM  LISTENING  10226  /var/run/dbus/system_bus_socket
unix  2      [ ACC ]  STREAM  LISTENING  13345  /ramdisk/tmp/ksocket-knoppix/kdeinit__0
unix  2      [ ]      DGRAM              2318   @/org/kernel/udev/udevd
Socket Address Representation
Generic Socket Address Structure
A socket address structure is always passed by reference when passed as an argument to any socket functions.
All these functions are declared to accept the following generic socket address structure:
#include <sys/socket.h>
struct sockaddr {
    sa_family_t sa_family;   /* address family: AF_LOCAL, AF_INET, AF_INET6, … */
    char        sa_data[14]; /* protocol-specific address */
};
IPv4 Socket Address Structure
#include <netinet/in.h>
struct in_addr {
    in_addr_t s_addr;           /* 32-bit IPv4 address, network byte ordered */
};
struct sockaddr_in {
    sa_family_t    sin_family;  /* AF_INET */
    in_port_t      sin_port;    /* 16-bit TCP or UDP port number, network byte ordered */
    struct in_addr sin_addr;    /* 32-bit IPv4 address, network byte ordered */
    char           sin_zero[8]; /* unused */
};
Note:
According to the standard, the port and address fields of the Socket Address structure must be filled in Network byte order. This is big-endian order, in which the most significant byte is stored at the lowest memory address.
Unix Domain Address Structure
#include <sys/un.h>
struct sockaddr_un {
    sa_family_t sun_family;    /* AF_LOCAL */
    char        sun_path[108]; /* null-terminated pathname */
};
Socket Address Encoding Functions
Byte Ordering Functions
#include <netinet/in.h>
uint16_t htons(uint16_t host16bitvalue);  /* Host TO Network Short converter */
uint32_t htonl(uint32_t host32bitvalue);  /* Host TO Network Long converter */
uint16_t ntohs(uint16_t net16bitvalue);   /* Network TO Host Short converter */
uint32_t ntohl(uint32_t net32bitvalue);   /* Network TO Host Long converter */
Byte Manipulation Functions
#include <strings.h>
void bzero(void *dest, size_t nbytes);                        /* writes nbytes null bytes to dest */
void bcopy(const void *src, void *dest, size_t nbytes);       /* copies nbytes bytes from src to dest */
int bcmp(const void *ptr1, const void *ptr2, size_t nbytes);  /* returns 0 if the byte sequences are identical, nonzero otherwise */
Address Conversion Functions
These functions convert an IP Address from dotted-decimal text notation to its binary (integer) form and vice versa.
#include <arpa/inet.h>
int inet_aton(const char *strptr, struct in_addr *addrptr);  /* Address TO Network. Returns: 1 - success, 0 - error */
in_addr_t inet_addr(const char *strptr);                     /* Deprecated, returns INADDR_NONE on error */
int inet_pton(int af, const char *strptr, void *addrptr);    /* Presentation TO Network. Analog of inet_aton(), supporting different address families */
char *inet_ntoa(struct in_addr inaddr);                      /* Network TO Address. Returns: pointer to string */
const char *inet_ntop(int af, const void *addrptr, char *strptr, socklen_t strlen);
                                                             /* Network TO Presentation. Analog of inet_ntoa(), supporting different address families */
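The encoding functions above are typically combined when a socket address is prepared. The following is a minimal sketch, not part of the original slides; the helper names make_ipv4_addr() and ipv4_addr_to_text() are hypothetical and chosen only for illustration.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: fill an IPv4 socket address from a dotted-decimal
   string and a port given in host byte order.
   Returns 0 on success, -1 if the address string is not valid. */
int make_ipv4_addr(const char *ip, unsigned short port, struct sockaddr_in *out)
{
    memset(out, 0, sizeof(*out));
    out->sin_family = AF_INET;
    out->sin_port = htons(port);              /* host -> network byte order */
    if (inet_pton(AF_INET, ip, &out->sin_addr) != 1)
        return -1;                            /* not a valid IPv4 address */
    return 0;
}

/* Hypothetical helper: convert the binary address back to text.
   Returns buf on success, NULL on error. */
const char *ipv4_addr_to_text(const struct sockaddr_in *addr, char *buf, socklen_t len)
{
    return inet_ntop(AF_INET, &addr->sin_addr, buf, len);
}
```

A caller would pass a buffer of at least INET_ADDRSTRLEN bytes to ipv4_addr_to_text() and convert the port back with ntohs() before printing it.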
Create The Socket
#include <sys/types.h>
#include <sys/socket.h>
int socket (int family, int type, int protocol);
• Creates the Socket - the endpoint for communication.
• Parameter family specifies the Address (or Protocol) Family:
AF_INET                 IPv4
AF_INET6                IPv6
AF_LOCAL (or AF_UNIX)   Unix Domain
Note:
Each AF_... constant has a corresponding PF_... constant of the same value, which can be used in the same way.
• Parameter type specifies the type of socket:
SOCK_DGRAM    datagram socket
SOCK_STREAM   stream socket
SOCK_RAW      raw socket (direct access to the Network Layer)
• Parameter protocol defines the protocol.
If 0, the default protocol for the socket type is used: SOCK_DGRAM -> UDP, SOCK_STREAM -> TCP.
Other values:
IPPROTO_UDP
IPPROTO_TCP
IPPROTO_ICMP (used only with SOCK_RAW type)
IPPROTO_RAW (used only with SOCK_RAW type)
• Returns Socket Descriptor on success, -1 on error (errno specifies the error).
Bind: Assign the Local Address to Socket
#include <sys/types.h>
#include <sys/socket.h>
int bind (int sockfd, const struct sockaddr * addr, socklen_t addrlen);
• Assigns the local IP Address and Port, specified by the addr parameter, to the socket.
A non-specific IP Address is specified by the constant INADDR_ANY; a non-specific port is specified by the value 0.
• A Server binds a well-known IP Address and Port.
If INADDR_ANY and a specific Port are specified, the Server will listen on this Port on all of the host’s IP Addresses.
• A Client binds a specific or non-specific IP Address and Port.
• Returns 0 on success, -1 on error.
Example: Server Socket Binding
#define SERV_HOST_ADDR "145.9.112.75"
#define SERV_TCP_PORT 5678
struct sockaddr_in servAddr;
int sockfd;
/* create socket */
sockfd = socket(AF_INET, SOCK_STREAM, 0);
if (sockfd < 0) { perror(…); }
/* prepare Server binding */
bzero((char*)&servAddr, sizeof(servAddr));
servAddr.sin_family = AF_INET;
if (inet_aton(SERV_HOST_ADDR, &servAddr.sin_addr) == 0)
    { perror(…); }
servAddr.sin_port = htons(SERV_TCP_PORT);
/* bind the socket */
if (bind(sockfd, (struct sockaddr*)&servAddr, sizeof(servAddr)) < 0)
    { perror(…); }
Example: Client Socket Binding
struct sockaddr_in cliAddr;
int sockfd;
/* create socket */
sockfd = socket(AF_INET, SOCK_STREAM, 0);
if (sockfd < 0) { perror(…); }
/* prepare Client binding */
bzero((char*)&cliAddr, sizeof(cliAddr));
cliAddr.sin_family = AF_INET;
cliAddr.sin_addr.s_addr = htonl(INADDR_ANY);
cliAddr.sin_port = htons(0);
/* bind the socket */
if (bind(sockfd, (struct sockaddr*)&cliAddr, sizeof(cliAddr)) < 0)
    { perror(…); }
Connect: Assign the Foreign Address to Socket
#include <sys/types.h>
#include <sys/socket.h>
int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen);
• Used to specify the foreign Address.
• For connection-oriented protocols (TCP, socket type = SOCK_STREAM) this call is used only by the Client.
It establishes the actual connection with the Server, using the address specified by the addr parameter.
(TCP sends a SYN segment, waits for the SYN/ACK segment and reports the cause of a possible connection failure.)
• For connectionless protocols (UDP, socket type = SOCK_DGRAM) this call is optional.
It can be used by the Server or by the Client to store an already known foreign Address and to use it for subsequent datagram sending / receiving. In this case the foreign Address does not have to be specified for each sent datagram or extracted from each received datagram.
For a connectionless protocol no actual connection with the Server is established.
A “Connection refused” error can be detected only by a later system call that really transfers data.
• Before calling connect() the Client does not have to perform a bind() call. In this case the connect() call also assigns the local address to the socket (as is done by bind() called by the Client with INADDR_ANY and port 0).
• Connection-oriented sockets can successfully connect() only once.
Connectionless sockets can use connect() multiple times to change their association.
Connectionless sockets can dissolve the association by specifying the AF_UNSPEC family in the call to connect()
(or a NULL pointer to the address structure on some UNIX systems – see ‘man connect’).
• Returns 0 on success, -1 in case of error.
If connect() fails, the connection-oriented socket is no longer usable and must be closed.
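The implicit local binding performed by connect() can be observed with getsockname(). The sketch below is an illustration under assumptions: the function name udp_connect_local_port() is hypothetical, and the peer is assumed to be on the loopback interface so that no real server is needed.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: connect an unbound UDP socket to 127.0.0.1:peer_port
   and return the local port chosen by the kernel (host byte order),
   or -1 on error. No datagram is sent; connect() only stores the address. */
int udp_connect_local_port(unsigned short peer_port)
{
    struct sockaddr_in peer, local;
    socklen_t len = sizeof(local);
    int port = -1;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_port = htons(peer_port);
    peer.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    /* connect() without a prior bind(): the kernel assigns the local address */
    if (connect(fd, (struct sockaddr *)&peer, sizeof(peer)) == 0 &&
        getsockname(fd, (struct sockaddr *)&local, &len) == 0)
        port = ntohs(local.sin_port);

    close(fd);
    return port;
}
```

The returned port is an ephemeral one picked by the kernel, which matches the behaviour of bind() with INADDR_ANY and port 0 described above.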
Send Data Through Socket
#include <sys/types.h>
#include <sys/socket.h>
ssize_t write   (int sockdescr, const void *buf, size_t buflen);
ssize_t send    (int sockdescr, const void *buf, size_t buflen, int flags);
ssize_t sendto  (int sockdescr, const void *buf, size_t buflen, int flags, const struct sockaddr *to, socklen_t tolen);
struct msghdr {
    void         *msg_name;       /* optional address */
    socklen_t     msg_namelen;    /* size of address */
    struct iovec *msg_iov;        /* scatter/gather array – handles the list of memory fragments
                                     to be read/written by a single system call */
    size_t        msg_iovlen;     /* number of elements in msg_iov */
    void         *msg_control;    /* ancillary data */
    socklen_t     msg_controllen; /* ancillary data buffer len */
};
ssize_t sendmsg (int sockdescr, const struct msghdr *msg, int flags);
• These calls are used to transmit a message buf of length buflen to another transport end-point.
• Calls send() and write() may be used only when the socket is in a connected state.
(Note! On some UNIX systems an empty datagram cannot be sent on a connected UDP socket with the write() call. The send() call should be used instead.)
• Calls sendto() and sendmsg() may be used at any time.
• For sendto(), the target address is specified by parameter to of length tolen.
• If the message is too long to pass atomically through the underlying protocol, the error EMSGSIZE is returned and the message is not transmitted.
• If the socket does not have enough buffer space available to hold the message being sent, send() blocks, unless the socket has been placed in non-blocking I/O mode (see fcntl).
• The flags parameter is formed from the bitwise OR of zero or more of the following:
MSG_OOB         Send "out-of-band" data. Supported only by SOCK_STREAM sockets of the AF_INET (AF_INET6) families.
MSG_DONTROUTE   The SO_DONTROUTE option is turned on for the duration of the operation. It is used only by diagnostic or routing programs.
• The sendmsg() call uses a msghdr structure to minimize the number of directly supplied parameters.
• These calls return the number of bytes sent, or -1 if an error occurred.
Receive Data from Socket
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
ssize_t read     (int sockdescr, void *buf, size_t buflen);
ssize_t recv     (int sockdescr, void *buf, size_t buflen, int flags);
ssize_t recvfrom (int sockdescr, void *buf, size_t buflen, int flags, struct sockaddr *from, socklen_t *fromlen);
ssize_t recvmsg  (int sockdescr, struct msghdr *msg, int flags);
• These calls are used to receive a message from another socket into buffer buf of length buflen.
• Calls read() and recv() are normally used only on a connected socket.
• Calls recvfrom() and recvmsg() may be used to receive data on a socket whether it is in a connected state or not.
• If parameter from is not a NULL pointer, the source address of the message is filled in.
• Parameter fromlen is a value-result parameter, initialized to the size of the buffer from, and modified on return to indicate the actual size of the address stored there.
• If a message is too long to fit in the supplied buffer, excess bytes may be discarded depending on the type of socket the message is received from.
• If no messages are available at the socket, the receive call waits for a message to arrive, unless the socket is non-blocking, in which case -1 is returned with errno set to EWOULDBLOCK (= EAGAIN; see fcntl()).
• The flags parameter is formed by ORing zero or more of the following:
MSG_OOB    Read any "out-of-band" data present on the socket rather than the regular "in-band" data.
MSG_PEEK   "Peek" at the data present on the socket; the data is returned, but not consumed, so that a subsequent receive operation will see the same data.
• The recvmsg() call uses a msghdr structure to minimize the number of directly supplied parameters.
• These calls return the number of bytes received, or -1 if an error occurred.
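The effect of MSG_PEEK and of the value-result parameter fromlen can be seen in a small loopback experiment. This is a sketch under assumptions, not code from the slides: the function name udp_peek_demo() is hypothetical, and the socket simply sends a datagram to its own address so that no second process is required.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: send one datagram to ourselves over loopback, peek at it
   with recvfrom(..MSG_PEEK..), then consume it with recv().
   Returns 0 if both calls saw the same 4-byte payload, -1 otherwise. */
int udp_peek_demo(void)
{
    struct sockaddr_in addr, from;
    socklen_t len = sizeof(addr), fromlen = sizeof(from);
    char peeked[16], consumed[16];
    int result = -1;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);                      /* any free port */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
        getsockname(fd, (struct sockaddr *)&addr, &len) == 0 &&
        sendto(fd, "ping", 4, 0, (struct sockaddr *)&addr, sizeof(addr)) == 4) {
        /* the peeked datagram stays in the receive queue */
        ssize_t n1 = recvfrom(fd, peeked, sizeof(peeked), MSG_PEEK,
                              (struct sockaddr *)&from, &fromlen);
        /* the second call returns the same datagram and removes it */
        ssize_t n2 = recv(fd, consumed, sizeof(consumed), 0);
        if (n1 == 4 && n2 == 4 && memcmp(peeked, consumed, 4) == 0 &&
            from.sin_port == addr.sin_port)
            result = 0;
    }
    close(fd);
    return result;
}
```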
Get Local and Foreign Address of the Socket
#include <sys/socket.h>
int getsockname(int sockfd, struct sockaddr *localaddr, socklen_t *addrlen);
int getpeername(int sockfd, struct sockaddr *peeraddr, socklen_t *addrlen);
• These functions are used to extract the local and foreign IP Address and Port of the Socket.
• Return 0 on success, -1 on error.
Close the Socket
int close    (int sockdescr);
int shutdown (int sockdescr, int how);
• These calls close the socket connection.
• The shutdown() call shuts down all or part of a full-duplex connection.
If how is 0 (SHUT_RD), then further receives will be disallowed.
If how is 1 (SHUT_WR), then further sends will be disallowed.
If how is 2 (SHUT_RDWR), then further sends and receives will be disallowed.
• The system returns from these calls immediately, but in the case of TCP the kernel still tries to send already queued data (unless the SO_LINGER socket option is set).
• These calls return 0 on success, -1 in case of failure.
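A half-close with shutdown() can be illustrated on a connected pair of stream sockets. The following sketch is an assumption-laden illustration: the function name half_close_demo() is hypothetical, and socketpair() on the Unix Domain is used only to obtain two connected stream sockets inside one process.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: write some data, disallow further sends with
   shutdown(SHUT_WR), and read on the peer side.
   Returns the result of the peer's final read(): 0 means end-of-file. */
int half_close_demo(void)
{
    int sv[2];
    char buf[8];
    ssize_t n = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    if (write(sv[0], "bye", 3) == 3 && shutdown(sv[0], SHUT_WR) == 0) {
        n = read(sv[1], buf, sizeof(buf));      /* queued data is still delivered */
        if (n == 3)
            n = read(sv[1], buf, sizeof(buf));  /* then the peer sees end-of-file */
    }
    close(sv[0]);
    close(sv[1]);
    return (int)n;
}
```

After shutdown(SHUT_WR) the first socket could still receive data from the peer, since only the sending direction was closed.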
UDP Socket Example 1. Iterative Server
[Flowchart.
UDP Client: socket() -> sendto() (request) -> recvfrom() (reply) -> process reply -> repeat until the end of the session -> close() -> exit().
UDP Server: socket() -> bind() to the well-known port -> recvfrom(), which blocks until a datagram is received from a client -> process request -> sendto() (reply) -> repeat until the server is stopped -> close() -> exit().]
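The server side of the iterative UDP scheme can be sketched as follows. This is a minimal illustration under assumptions: the function names udp_server_socket() and udp_serve_one() are hypothetical, the server is bound to the loopback address only, and "processing" is reduced to echoing the request back.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a UDP socket bound to 127.0.0.1.
   If *port is 0 the kernel picks a free port; the port actually bound
   is stored back into *port. Returns the descriptor or -1. */
int udp_server_socket(unsigned short *port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(*port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port = ntohs(addr.sin_port);
    return fd;
}

/* One iteration of the server loop: block in recvfrom(), "process" the
   request (here: echo it), reply to the sender with sendto().
   Returns the number of bytes sent back, or -1 on error. */
ssize_t udp_serve_one(int fd)
{
    char buf[512];
    struct sockaddr_in cli;
    socklen_t clilen = sizeof(cli);
    ssize_t n = recvfrom(fd, buf, sizeof(buf), 0, (struct sockaddr *)&cli, &clilen);
    if (n < 0)
        return -1;
    return sendto(fd, buf, (size_t)n, 0, (struct sockaddr *)&cli, clilen);
}
```

An iterative server would call udp_serve_one() in a loop until it decides to stop, then close() the socket.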
UDP Socket Example 2. Using connect().
[Flowchart.
UDP Client: socket() -> connect() to the Server's well-known port -> send() (request) -> recv() (reply) -> process reply -> repeat until the end of the session -> close() -> exit().
UDP Server: socket() -> bind() to the well-known port -> recvfrom(), which blocks until a datagram is received from a client -> connect() to that client -> process request -> send() (reply) -> repeat until the end of the session -> connect(..AF_UNSPEC or NULL..) to dissolve the association -> if the server is not stopped, wait for the next client; otherwise close() -> exit().]
UDP Socket Example 3. Concurrent Server.
[Flowchart.
UDP Server: socket() -> bind() to the well-known port -> recvfrom(..MSG_PEEK..), which blocks until a datagram is received from a client -> fork().
Parent: if the server is not stopped, returns to waiting for requests; otherwise close() -> exit().
Child sub-server: creates its own socket with socket() -> connect() to the client -> closes the parent socket -> process request -> send() (1st reply and following replies) / recv() (following requests) until the end of the session -> close() -> exit().
UDP Client: socket() -> sendto() (request to the well-known port) -> recvfrom() (1st reply) -> connect() to the sub-server address taken from the 1st reply -> process reply -> send() / recv() until the end of the session -> close() -> exit().]
UDP Examples Synchronization Problems
Problem 1 End of response notification
The Client must know when the last portion of data has arrived from the Server.
Possible Solution
The last datagram sent by the Server is empty.
Problem 2 Connection timeout
The Client cannot always receive a “connection refused” error message, even if it uses the connect() system call.
Possible Solution
The connection timeout must be handled by the Client.
Problem 3 Lost and disordered datagrams
Some datagrams can be lost because of:
- Socket buffer overflow, when the Server sends more quickly than the Client is able to receive and process
- Network problems
- Unexpected termination of the peer process
The datagrams can be received by the Client in a different order than they were sent by the Server, because each datagram can take its own route.
Possible Solution
Implementing sequence and flow control by means of Acknowledgement with Retransmission algorithms.
TFTP – Trivial File Transfer Protocol
(Example of UDP-based standard application protocol)
TFTP is a standard UDP-based application protocol that provides a simple method of transferring files between two hosts.
It was developed and standardized in 1981. It is much smaller and simpler than FTP (File Transfer Protocol).
Unlike FTP, TFTP provides only file transfer and does not provide user authentication, directory listing, etc.
Because of its simplicity, TFTP can be used for bootstrapping LAN workstations.
TFTP Message Formats
Message                  Opcode (2 bytes)  Remaining fields
read request (RRQ)       1                 file name | 0 | transfer mode | 0
write request (WRQ)      2                 file name | 0 | transfer mode | 0
data                     3                 block # (2 bytes) | data (up to 512 bytes)
acknowledgement (ACK)    4                 block # (2 bytes)
error                    5                 errcode (2 bytes) | error message | 0
TFTP supports 2 transfer modes: “octet” and “netascii”.
All 2-byte fields (opcode, block #, errcode) are stored in Network Byte Order.
Error codes:
1 - File not found
2 - Access violation
3 - Disk full
4 - Illegal TFTP operation
5 - Unknown port (transfer ID)
6 - File already exists
7 - No such user
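The message layouts above translate directly into byte buffers. The following sketch shows how an RRQ and an ACK could be encoded; the function names tftp_build_rrq(), tftp_build_ack() and tftp_opcode() and the TFTP_* constants are hypothetical names used only for this illustration.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

#define TFTP_RRQ 1   /* opcode values from the message format table */
#define TFTP_ACK 4

/* Hypothetical helper: build RRQ = opcode(2) | file name | 0 | mode | 0.
   Returns the packet length, or 0 if it does not fit into cap bytes. */
size_t tftp_build_rrq(unsigned char *pkt, size_t cap, const char *filename, const char *mode)
{
    uint16_t opcode = htons(TFTP_RRQ);       /* Network Byte Order */
    size_t flen = strlen(filename) + 1;      /* include terminating 0 */
    size_t mlen = strlen(mode) + 1;
    if (2 + flen + mlen > cap)
        return 0;
    memcpy(pkt, &opcode, 2);
    memcpy(pkt + 2, filename, flen);
    memcpy(pkt + 2 + flen, mode, mlen);
    return 2 + flen + mlen;
}

/* Hypothetical helper: build ACK = opcode(2) | block #(2). pkt needs 4 bytes. */
size_t tftp_build_ack(unsigned char *pkt, uint16_t block)
{
    uint16_t opcode = htons(TFTP_ACK);
    uint16_t blk = htons(block);
    memcpy(pkt, &opcode, 2);
    memcpy(pkt + 2, &blk, 2);
    return 4;
}

/* Hypothetical helper: read the opcode of a received packet in host byte order. */
uint16_t tftp_opcode(const unsigned char *pkt)
{
    uint16_t opcode;
    memcpy(&opcode, pkt, 2);
    return ntohs(opcode);
}
```

memcpy() is used instead of pointer casts so that the code does not depend on the alignment of the packet buffer.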
TFTP Transfer Scenarios
• The TFTP Server is concurrent.
• The TFTP Server well-known port is 69/udp.
• The TFTP Client initiates the connection by sending RRQ (read from Server) or WRQ (write to Server) as the first datagram.
• The Server main process spawns a child Sub-Server process to handle the Client request. The main process then returns to listening on the well-known port.
• The Sub-Server creates a new socket and binds it to a unique local port. All following responses to the Client are then sent from this port.
• The Client reads the Sub-Server address from the 1st response and then uses it for the rest of the session.
• A data block shorter than 512 bytes is recognized as the last data portion.
Read scenario (Client is the receiver, Server is the sender of the File):
Client -> Server: RRQ
Server -> Client: data, block #1
Client -> Server: ACK, block #1
Server -> Client: data, block #2
Client -> Server: ACK, block #2
Server -> Client: data, block #3
. . . . . . . . .
Write scenario (Client is the sender, Server is the receiver of the File):
Client -> Server: WRQ
Server -> Client: ACK, block #0
Client -> Server: data, block #1
Server -> Client: ACK, block #1
Client -> Server: data, block #2
Server -> Client: ACK, block #2
. . . . . . . . .
The TFTP Server is a standard UNIX daemon.
The TFTP Client is implemented as the standard UNIX utility tftp.
“Sorcerer’s Apprentice Syndrome”
Sorcerer's Apprentice Syndrome (SAS) is a particularly bad network protocol flaw in the original versions of TFTP.
It occurred in the case of packet delay, which was not taken into account when the protocol was designed.
It was named after the “Sorcerer's Apprentice” segment of the Walt Disney motion picture Fantasia.
Packet delay led to a growing number of duplicated packets with a following “chain reaction”, congestive collapse of the network and transfer failure.
Original Bad Design:
• Both Sender and Receiver use timeout with retransmission
• Receiving a duplicated acknowledgement, the Sender retransmits the data
Sender                          Receiver
send DATA(n)
(time out)
retransmit DATA(n)
                                receive DATA(n)
                                send ACK(n)
receive ACK(n)
send DATA(n+1)
                                receive DATA(n) (duplicate)
                                send ACK(n) (duplicate)
receive ACK(n) (duplicate)
send DATA(n+1) (duplicate)
                                receive DATA(n+1)
                                send ACK(n+1)
receive ACK(n+1)
send DATA(n+2)
                                receive DATA(n+1) (duplicate)
                                send ACK(n+1) (duplicate)
receive ACK(n+1) (duplicate)
send DATA(n+2) (duplicate)
                                receive DATA(n+2)
                                send ACK(n+2)
...                             ...
Fixed Design:
• Receiver does not have a retransmission timer
• Sender ignores duplicated acknowledgements
Sender                          Receiver
send DATA(n)
(time out)
retransmit DATA(n)
                                receive DATA(n)
                                send ACK(n)
receive ACK(n)
send DATA(n+1)
                                receive DATA(n) (duplicate)
                                send ACK(n) (duplicate)
receive ACK(n) (duplicate)
(don’t send anything)
                                receive DATA(n+1)
                                send ACK(n+1)
receive ACK(n+1)
send DATA(n+2)
                                receive DATA(n+2)
                                send ACK(n+2)
...                             ...
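The rule of the fixed design, that the Sender ignores duplicated acknowledgements, comes down to a single comparison on the Sender side. The sketch below is an illustration only; the structure tftp_sender and the function sender_on_ack() are hypothetical names and do not come from any TFTP implementation.

```c
#include <stdint.h>

/* Hypothetical sender state: the number of the last DATA block sent
   and not yet acknowledged. */
struct tftp_sender {
    uint16_t last_sent;
};

/* Hypothetical handler for a received ACK.
   Returns 1 if the next DATA block should be sent now,
   0 if the ACK is a duplicate (or stale) and must be ignored. */
int sender_on_ack(struct tftp_sender *s, uint16_t ack_block)
{
    if (ack_block != s->last_sent)
        return 0;            /* duplicate ACK: don't send anything */
    s->last_sent++;          /* advance to block n+1 */
    return 1;
}
```

In the original bad design the duplicate ACK would instead trigger another transmission of the current block, which is what doubles every following packet.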
Listen, Accept - TCP Server-Specific System Calls
#include <sys/socket.h>
int listen (int sockfd, int backlog);
• Assigns the length of the queue for TCP connection requests.
• Parameter backlog specifies the number of pending requests queued by the system.
• If the queue is full, the Client retransmits the connection request.
• Returns 0 on success, -1 on error.
TCP connection queues:
- incomplete connection queue (SYN_RCVD state)
- completed connection queue (ESTABLISHED state)
The sum of both queues cannot exceed backlog.
#include <sys/socket.h>
int accept (int sockfd, struct sockaddr *cliaddr, socklen_t *addrlen);
• Returns the next completed connection from the front of the completed connection queue.
• If the completed connection queue is empty, the process is put to sleep.
• The cliaddr and addrlen parameters are used to return the address of the connected Client.
• Returns a new socket descriptor automatically created by the kernel. Returns -1 in case of error.
• Usually a Server creates only one listening socket, which then exists for the whole lifetime of the server.
The kernel creates one connected socket for each accept()-ed client connection.
When a Server (or concurrent Sub-Server) has finished serving a given client, the connected socket is closed.
[Diagram: 3-Way handshake and TCP connection queues. Client: connect called -> RTT (round-trip time) -> connect returns. Server: arriving SYN -> create entry on incomplete queue -> RTT (round-trip time) -> entry moved to completed queue -> accept returns.]
TCP Socket Examples. Iterative & Concurrent Server.
[Flowchart.
TCP Client: socket() -> connect() (connection establishment) -> send() (request) -> recv() (reply) -> process reply -> repeat until the end of the session -> close() -> exit().
TCP Server: listenSock = socket(...) -> bind(listenSock,…) -> listen(listenSock,…) -> acceptSock = accept(listenSock,…), which blocks until a connection request arrives from a client.
Concurrent server only: fork(); the parent calls close(acceptSock) and returns to accept(); the child sub-server calls close(listenSock) and serves the client.
Serving a client: recv(acceptSock,…), which blocks until data is received from the client -> process request -> send(acceptSock,…) (reply) -> repeat until the end of the session -> close(acceptSock).
Iterative server only: after the session, if the server is not stopped, return to accept(); otherwise close(listenSock) -> exit().
Concurrent sub-server only: exit() after the session.]
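The iterative variant of the TCP server can be sketched in a few lines. This is an illustration under assumptions: the function names tcp_listen_loopback() and tcp_serve_one() are hypothetical, the server listens on the loopback address only, and request processing is reduced to echoing a single message.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a listening TCP socket on 127.0.0.1.
   If *port is 0 the kernel picks a free port, which is stored back
   into *port. Returns the listening descriptor or -1. */
int tcp_listen_loopback(unsigned short *port, int backlog)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int listenSock = socket(AF_INET, SOCK_STREAM, 0);
    if (listenSock < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(*port);
    if (bind(listenSock, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(listenSock, backlog) < 0 ||
        getsockname(listenSock, (struct sockaddr *)&addr, &len) < 0) {
        close(listenSock);
        return -1;
    }
    *port = ntohs(addr.sin_port);
    return listenSock;
}

/* Serve one client iteratively: accept(), recv() one request, echo it
   back with send(), close the connected socket.
   Returns the number of bytes echoed, 0 if the client closed, -1 on error. */
ssize_t tcp_serve_one(int listenSock)
{
    char buf[512];
    ssize_t n;
    int acceptSock = accept(listenSock, NULL, NULL);
    if (acceptSock < 0)
        return -1;
    n = recv(acceptSock, buf, sizeof(buf), 0);
    if (n > 0)
        n = send(acceptSock, buf, (size_t)n, 0);
    close(acceptSock);       /* the listening socket stays open */
    return n;
}
```

A concurrent server would call fork() right after accept(), let the child do the recv() / send() part, and close acceptSock in the parent.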
UDP and TCP Server Comparison
Readiness to get request from Client
UDP Server: • Executes recvfrom()
TCP Server: • Executes listen()
            • Executes accept() to get an already established connection from the queue
Content of Message from Client
UDP Server: • All messages: Application data with Client Address
            • The same call recvfrom() for all the messages from the Client
TCP Server: • 1st message: Connection request. Next messages: Application data
            • Separate call recv() to read the Application data
Concurrent Server Implementation
UDP Server: • Creates a separate socket on another port
TCP Server: • Accepts from the kernel a new socket descriptor, already connected to the client through the same port
Reliability of service
UDP Server: • Must be provided at the application layer
TCP Server: • Provided by the kernel at the transport layer
Socket Options and Control Operations
There are various ways to get and set the options that affect a socket:
• System call fcntl() - provides control functions on files and sockets
• System call ioctl() - provides control functions on files, sockets, terminals and devices
• System calls getsockopt() and setsockopt()
#include <sys/socket.h>
int getsockopt(int sockfd, int level, int optname, void *optval, socklen_t *optlen);
int setsockopt(int sockfd, int level, int optname, const void *optval, socklen_t optlen);
• These system calls get / set the value of a Socket Option for the socket specified by descriptor sockfd.
• Parameter level specifies the subsystem responsible for the option, for example:
SOL_SOCKET – general socket code
IPPROTO_IP, IPPROTO_TCP – protocol-specific code
• Parameter optname specifies the ID of the specific option.
• Parameter optval is used to specify / extract the value of the option.
• Parameter optlen specifies the length of optval.
• The option can be a binary flag (0 or 1) or a more complex value.
• The following socket options are inherited by a connected TCP socket from the listening socket: SO_DEBUG,
SO_DONTROUTE, SO_KEEPALIVE, SO_LINGER, SO_OOBINLINE, SO_RCVBUF, SO_RCVLOWAT, SO_SNDBUF,
SO_SNDLOWAT, TCP_MAXSEG, and TCP_NODELAY. To ensure that one of these socket options is set for the
connected “accept” socket when the three-way handshake completes, we must set that option for the “listen”
socket.
Example: don't wait for TIME_WAIT state delay expiration before TCP Server restart.
int on = 1;
….
if (setsockopt(sockListen, SOL_SOCKET, SO_REUSEADDR, (char *)&on, sizeof(on)) < 0)
{
    perror(…);
}
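Whether an option has really been applied can be checked by reading it back with getsockopt(). The following is a minimal sketch; the function name enable_reuseaddr() is a hypothetical helper introduced for illustration.

```c
#include <sys/socket.h>

/* Hypothetical helper: set SO_REUSEADDR on fd and read the value back.
   Returns 1 if the option is enabled, 0 if it is not, -1 on error. */
int enable_reuseaddr(int fd)
{
    int on = 1;
    int value = 0;
    socklen_t len = sizeof(value);   /* value-result parameter for getsockopt() */

    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &value, &len) < 0)
        return -1;
    return value != 0;
}
```

For a TCP server the helper would be called on the listening socket after socket() and before bind().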
Socket Options
Level        Name            Settable     Data Type       Description
IPPROTO_IP   IP_OPTIONS      yes          char[]          options in IP header
IPPROTO_UDP  UDP_NOCHECKSUM  yes          int (bool)      enables sending of datagrams with checksum=0
IPPROTO_TCP  TCP_MAXSEG      no           int             get TCP maximum segment size
IPPROTO_TCP  TCP_NODELAY     yes          int (bool)      enables or disables the Nagle optimization algorithm for TCP sockets
SOL_SOCKET   SO_DEBUG        yes          int (bool)      enables recording of debugging information (for processes with EUID=0 only)
SOL_SOCKET   SO_REUSEADDR    yes          int (bool)      enables local address reuse
SOL_SOCKET   SO_KEEPALIVE    yes          int (bool)      enables sending of keep-alive messages on connections
SOL_SOCKET   SO_DONTROUTE    yes          int (bool)      enables routing bypass for outgoing messages (same as flag MSG_DONTROUTE)
SOL_SOCKET   SO_LINGER       yes          struct linger   linger if data present - block shutdown(), close() calls until queued data is sent or the timeout expires
SOL_SOCKET   SO_BROADCAST    yes          int (bool)      permits transmission of broadcast messages
SOL_SOCKET   SO_OOBINLINE    yes          int             enables reception of out-of-band data in band (else, OOB data is received with MSG_OOB flag only)
SOL_SOCKET   SO_SNDBUF       yes          int             buffer size for output
SOL_SOCKET   SO_RCVBUF       yes          int             buffer size for input
SOL_SOCKET   SO_SNDLOWAT     no in Linux  int             minimum byte count for output
SOL_SOCKET   SO_RCVLOWAT     no in Linux  int             minimum byte count for input
SOL_SOCKET   SO_SNDTIMEO     no in Linux  struct timeval  timeout value for output
SOL_SOCKET   SO_RCVTIMEO     no in Linux  struct timeval  timeout value for input
SOL_SOCKET   SO_TYPE         no           int             get the type of the socket
SOL_SOCKET   SO_ERROR        no           int             get and clear error on the socket
Socket Control Operations
#include <unistd.h>
#include <fcntl.h>
int fcntl(int fildes, int cmd, /*arg*/ ...);
cmd, arg               Description
F_SETFL, O_NONBLOCK    Non-blocking I/O.
F_SETFL, O_ASYNC       Signal-driven I/O. SIGIO is sent when the socket status changes.
F_SETOWN, F_GETOWN     Set / Get the socket owner. The owner is the recipient of SIGIO and SIGURG signals.
#include <unistd.h>
int ioctl(int fd, int request, … /* void *arg */);
• A common use of ioctl() by network programs (typically servers) is to obtain information on all the host's interfaces: the interface addresses, whether the interface supports broadcasting, multicasting, etc.
Category   request         Datatype        Description
Socket     SIOCATMARK      int             At out-of-band mark ?
Socket     SIOCSPGRP       int             Set process ID / group ID of socket
Socket     SIOCGPGRP       int             Get process ID / group ID of socket
File       FIONBIO         int             Set/clear nonblocking flag
File       FIOASYNC        int             Set/clear asynchronous I/O flag
File       FIONREAD        int             Get # of bytes in receive buffer
File       FIOSETOWN       int             Set process ID / group ID of file
File       FIOGETOWN       int             Get process ID / group ID of file
Interface  SIOCGIFCONF     struct ifconf   Get list of all interfaces
Interface  SIOCSIFADDR     struct ifreq    Set interface address
Interface  SIOCGIFADDR     struct ifreq    Get interface address
Interface  SIOCSIFFLAGS    struct ifreq    Set interface flags
Interface  SIOCGIFFLAGS    struct ifreq    Get interface flags
Interface  SIOCSIFDSTADDR  struct ifreq    Set point-to-point address
Interface  SIOCGIFDSTADDR  struct ifreq    Get point-to-point address
Interface  SIOCSIFBRDADDR  struct ifreq    Set broadcast address
Interface  SIOCGIFBRDADDR  struct ifreq    Get broadcast address
Interface  SIOCSIFNETMASK  struct ifreq    Set subnet mask
Interface  SIOCGIFNETMASK  struct ifreq    Get subnet mask
Interface  SIOCSIFMETRIC   struct ifreq    Set interface metric
Interface  SIOCGIFMETRIC   struct ifreq    Get interface metric
Interface  SIOCGIFMTU      struct ifreq    Get interface MTU
Interface  SIOCxxx                         (many more; implementation-dependent)
ARP        SIOCSARP        struct arpreq   Create/modify ARP entry
ARP        SIOCGARP        struct arpreq   Get ARP entry
ARP        SIOCDARP        struct arpreq   Delete ARP entry
Routing    SIOCADDRT       struct rtentry  Add route
Routing    SIOCDELRT       struct rtentry  Delete route
I/O Multiplexing
Input/Output Multiplexing is simultaneous handling of 2 or more different I/O channels.
Examples
1) A printer connected to the network waits simultaneously for requests:
- from local host processes - on a UNIX Domain STREAM socket
- from remote processes - on an IPv4 TCP socket
2) Superserver inetd – waits for requests for multiple different services on different ports and invokes the corresponding service as a separate sub-server process when a specific request is accepted.
[Diagram: a Process holds two socket descriptors, sd1 and sd2, which refer to the kernel sockets sock1 and sock2.]
I/O Multiplexing Solution 1. Fork child process per channel
[Diagram: the Process forks Child Process 1 and Child Process 2. Child Process 1 reads from sd1 (sock1) and Child Process 2 reads from sd2 (sock2); each child writes to a pipe descriptor fd, and the parent Process reads from the pipe.]
I/O Multiplexing Solution 2. Polling
• Set both sockets to non-blocking mode
fcntl (sd1, F_SETFL, fcntl(sd1, F_GETFL, 0) | O_NONBLOCK);
fcntl (sd2, F_SETFL, fcntl(sd2, F_GETFL, 0) | O_NONBLOCK);
• Read from both sockets and, if nothing is available, sleep for the polling timeout
while(…)
{
    /* try to read from 1st socket */
    len = read (sd1, buff, sizeof(buff));
    if (len >= 0) break;              /* go to data processing */
    if (errno != EWOULDBLOCK)
        { perror(…); }
    /* try to read from 2nd socket */
    len = read (sd2, buff, sizeof(buff));
    if (len >= 0) break;              /* go to data processing */
    if (errno != EWOULDBLOCK)
        { perror(…); }
    /* provide polling timeout sleep */
    sleep(TIMEOUT);
}
/* begin accepted data processing */
…
Note: On Linux the following two error codes are equal:
EWOULDBLOCK = EAGAIN
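The two building blocks of the polling solution, switching a descriptor to non-blocking mode and distinguishing "no data yet" from a real error, can be isolated as follows. This is a sketch under assumptions: set_nonblocking(), try_read() and the READ_WOULD_BLOCK constant are hypothetical names introduced for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define READ_WOULD_BLOCK (-2)   /* hypothetical sentinel, distinct from -1 (error) */

/* Hypothetical helper: add O_NONBLOCK to the descriptor's status flags.
   Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Hypothetical helper: one polling attempt.
   Returns the number of bytes read, 0 on end-of-file,
   READ_WOULD_BLOCK if no data is available right now, -1 on other errors. */
ssize_t try_read(int fd, void *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return READ_WOULD_BLOCK;
    return n;
}
```

A polling loop would call try_read() on each descriptor in turn and sleep only when every call returned READ_WOULD_BLOCK.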
I/O Multiplexing Solution 3. Signal-Driven I/O
• Establish handler for SIGIO signal.
void iohandler (int sig)
{…}
…
signal (SIGIO, iohandler); /* or using sigaction() */
• Declare process as sockets owner
fcntl (sd1, F_SETOWN, getpid( ) );
fcntl (sd2, F_SETOWN, getpid( ) );
• Enable signal-driven I/O
fcntl (sd1, F_SETFL, fcntl(sd1, F_GETFL, 0) | O_ASYNC);
fcntl (sd2, F_SETFL, fcntl(sd2, F_GETFL, 0) | O_ASYNC);
The Problems
1) The signal does not contain information about which descriptor the event occurred on.
2) This model works well with UDP – the signal is sent by the kernel when:
- a new datagram arrives
- an asynchronous error occurs
In the case of TCP the signal is sent on any socket status change:
- connect request accepted, connection established, disconnect request accepted, disconnected, data arrived, data sent, error occurred, etc.
As a result, it is difficult to recognize the proper event to handle the data.
I/O Multiplexing: System Call select
#include <sys/select.h> /* According to POSIX */
#include <sys/time.h> /* According to earlier standards */
#include <sys/types.h>
#include <unistd.h>
FD Set
Type fd_set is a bitmask. Each bit flag in the mask
corresponds to the file descriptor whose number is equal to the
index of this bit flag.
FD Set Modification Macros:
FD_ZERO (fd_set *set); /* empty the set */
FD_SET (int fd, fd_set *set); /* set one bit ON */
FD_CLR (int fd, fd_set *set); /* set one bit OFF */
FD_ISSET (int fd, fd_set *set); /* test one bit */
struct timeval {
long tv_sec; /* seconds */
long tv_usec; /* microseconds */
};
int select (int maxFDplus1, fd_set *readFDs, fd_set *writeFDs, fd_set *exceptFDs, struct timeval *timeout);
• This system call allows the process
to wait for a number of file descriptors to change status:
- Descriptors in set readFDs are watched to become ready
for non-blocking read
- Descriptors in set writeFDs are watched to become ready
for non-blocking write
- Descriptors in set exceptFDs are watched for exceptions
• A class of events is not watched if the corresponding FD Set parameter is NULL.
• Parameter maxFDplus1 is the highest descriptor number in any of the sets plus 1. It limits the width of the FD Sets,
so that the kernel does not check all the bit flags in each FD Set.
• Parameter timeout specifies the wait period as follows:
- NULL – wait until I/O event occurs.
- pointer to structure with positive values
– wait no more than specified time.
- pointer to structure with tv_sec=tv_usec=0
– don’t wait (non-blocking check, used by polling)
• Return value is the number of descriptors on which an event occurred.
The 3 FD Sets are modified to contain only the affected descriptors, so they must be filled again before each call.
• Return value is 0 if timeout expired, -1 if error occurred.
Example. Wait up to 5 seconds for input on standard input [fd 0]
#include <stdio.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>
…
fd_set rfds;
struct timeval tv;
int retval;
FD_ZERO(&rfds);
FD_SET(0, &rfds);
tv.tv_sec = 5;
tv.tv_usec = 0;
retval = select(1, &rfds, NULL, NULL, &tv);
if (retval == -1) {
perror("select()");
} else if (retval) {
printf("Data is available now.\n");
/* FD_ISSET(0, &rfds) will be true. */
}else{
printf("No data within five seconds.\n");
}
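The same call can watch several descriptors at once, which is the actual purpose of I/O multiplexing. Below is a minimal sketch; the helper name wait_readable2() and its return convention are assumptions made for this illustration.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: wait until one of two descriptors becomes readable.
   Returns fd1 or fd2 (fd1 is preferred if both are ready),
   -1 on timeout or error. */
int wait_readable2(int fd1, int fd2, int timeout_sec)
{
    fd_set rfds;
    struct timeval tv;
    int maxfd = (fd1 > fd2) ? fd1 : fd2;
    int n;

    /* select() can only handle descriptors below FD_SETSIZE */
    assert(fd1 >= 0 && fd1 < FD_SETSIZE && fd2 >= 0 && fd2 < FD_SETSIZE);

    FD_ZERO(&rfds);             /* the set is rebuilt on every call */
    FD_SET(fd1, &rfds);
    FD_SET(fd2, &rfds);
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;

    n = select(maxfd + 1, &rfds, NULL, NULL, &tv);
    if (n <= 0)
        return -1;              /* 0 = timeout, -1 = error */
    if (FD_ISSET(fd1, &rfds))
        return fd1;
    return fd2;
}
```

For the printer example, fd1 would be the UNIX Domain socket and fd2 the TCP socket.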
I/O Multiplexing: System Call pselect
#include <sys/select.h> /* According to POSIX */
#include <sys/time.h> /* According to earlier standards */
#include <sys/types.h>
#include <unistd.h>
struct timespec {
time_t tv_sec; /* seconds */
long tv_nsec; /* nanoseconds */
};
int pselect (int maxFDplus1, fd_set *readFDs, fd_set *writeFDs, fd_set *exceptFDs,
const struct timespec *timeout, const sigset_t *sigmask);
2 in 1: select() + sigsuspend() = pselect()
• This system call allows the process to wait simultaneously:
- for I/O events (for a number of file descriptors to change status)
- for signals
• It differs from select() only in the format of the timeout parameter and in the additional parameter sigmask,
which specifies the signal mask that atomically replaces the process signal mask for the duration of the call, so that the desired signals can be accepted while waiting.
I/O Multiplexing with simultaneous signal handling.
Example 1. Race condition.
void sighandler(int sig){
intr_flag=1;
}
…
if (intr_flag)
handle_intr(); /* handling before select */
/* critical region: if the signal occurs here,
its handling will be lost if select blocks forever */
if ( (nready = select( ... ) ) < 0) {
if (errno == EINTR) {
if (intr_flag)
handle_intr(); /* handling after select */
}
...
}
Example 2. Reliable solution.
sigset_t newmask, oldmask, zeromask;
…
sigemptyset(&zeromask);
sigemptyset(&newmask);
sigaddset(&newmask, SIGINT);
sigprocmask (SIG_BLOCK, &newmask, &oldmask); /* block SIGINT */
if (intr_flag)
handle_intr(); /* handle the signal */
if ( (nready = pselect ( ... , &zeromask) ) < 0) {
if (errno == EINTR) {
if (intr_flag)
handle_intr ();
}
...
}
I/O Multiplexing: System Call poll
#include <sys/poll.h>
struct pollfd {
int fd; /* file descriptor */
short events; /* requested events */
short revents; /* returned events */
};
int poll(struct pollfd *fdArray,
unsigned int fdArrayLen,
int timeout);
Event Mask Bits:
#define POLLIN 0x0001 /* There is data to read */
#define POLLPRI 0x0002 /* There is urgent data to read */
#define POLLOUT 0x0004 /* Writing now will not block */
#define POLLERR 0x0008 /* Error condition */
#define POLLHUP 0x0010 /* Hung up */
#define POLLNVAL 0x0020 /* Invalid request: fd not open */
#ifdef _XOPEN_SOURCE
#define POLLRDNORM 0x0040 /* Normal data may be read */
#define POLLRDBAND 0x0080 /* Priority data may be read */
#define POLLWRNORM 0x0100 /* Writing now will not block */
#define POLLWRBAND 0x0200 /* Priority data may be written */
#endif
#ifdef _GNU_SOURCE
#define POLLMSG 0x0400 /* Linux only */
#endif
• The system call poll() is a variation of the system call select()
• The parameter fdArray is an array of structures of length fdArrayLen.
• Each structure in fdArray corresponds to a single file descriptor to be watched for events.
• Requested and returned events are specified by separate event masks (events, revents), constructed
from Event Mask Bits.
• The timeout value is specified as a number of milliseconds and has the following meaning:
- timeout = -1 – wait until I/O event occurs.
- timeout > 0 – wait no more than specified time.
- timeout = 0 – don’t wait (non-blocking check, used by polling)
• Return value is the number of descriptors on which an event occurred.
The type of the event is described by the revents field of the fdArray element corresponding to the specific descriptor.
• Return value is 0 if timeout expired, -1 if error occurred.
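A minimal sketch of the same two-descriptor wait written with poll(). The helper name poll_readable2() and the bitmask return convention are assumptions made for this illustration.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: watch two descriptors for input.
   Returns a bitmask: bit 0 is set if fd1 is readable, bit 1 if fd2 is
   readable; 0 on timeout, -1 on error. */
int poll_readable2(int fd1, int fd2, int timeout_ms)
{
    struct pollfd fds[2];
    int n, mask = 0;

    assert(timeout_ms >= -1);   /* -1 means "wait without time limit" */

    fds[0].fd = fd1;
    fds[0].events = POLLIN;     /* requested events */
    fds[0].revents = 0;
    fds[1].fd = fd2;
    fds[1].events = POLLIN;
    fds[1].revents = 0;

    n = poll(fds, 2, timeout_ms);
    if (n < 0)
        return -1;
    if (fds[0].revents & POLLIN)    /* returned events */
        mask |= 1;
    if (fds[1].revents & POLLIN)
        mask |= 2;
    return mask;
}
```

Unlike select(), the events array is not overwritten by the call, so it can be reused between calls without being rebuilt.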
I/O Models: Blocking, Non-Blocking
1. Blocking I/O Model
[Figure: the application issues the recvfrom system call while no datagram is ready. The process blocks in the call to recvfrom while the kernel waits for data and then copies the datagram from kernel to user. When the copy is complete, recvfrom returns OK and the application processes the datagram.]
2. Non-Blocking I/O Model
[Figure: the process repeatedly calls recvfrom, waiting for an OK return (polling). While no datagram is ready, each call returns EWOULDBLOCK immediately. Once the datagram is ready, the call copies the data from kernel to user and returns OK when the copy is complete; the application then processes the datagram.]
I/O Models: Multiplexing, Signal-Driven
3. I/O Multiplexing Model
[Figure: the process blocks in the call to select, waiting for one of the sockets to become readable, while the kernel waits for data. When the datagram is ready, select returns "readable". The process then calls recvfrom and blocks while the data is copied into the application buffer; when the copy is complete, recvfrom returns OK and the application processes the datagram.]
4. Signal-Driven I/O Model
[Figure: the process sets the SIGIO handler with sigaction, which returns immediately, and continues executing while the kernel waits for data. When the datagram is ready, the kernel delivers SIGIO. The SIGIO handler calls recvfrom and the process blocks while the data is copied into the application buffer; when the copy is complete, recvfrom returns OK and the application processes the datagram.]
I/O Models: Asynchronous
5. Asynchronous I/O Model
[Figure: the process calls aio_read (setting a signal handler), which returns immediately, and continues executing while the kernel waits for data and copies the datagram from kernel to user. When the copy is complete, the kernel delivers the signal and the signal handler processes the datagram.]
POSIX defines a set of system calls
providing the API for implementation
of asynchronous I/O on a set of descriptors:
aio_read(), aio_write()
• These functions allow the calling process to initiate a single read (write) asynchronous I/O request.
aio_error()
• This function returns the error status associated with the single asynchronous I/O request.
It is equivalent to the errno value that would be set by the corresponding read() or write() system call.
• If the operation has not yet completed, then the error status will be equal to EINPROGRESS.
aio_return()
• This function returns the result associated with the single asynchronous I/O request after its completion.
lio_listio()
• This function allows the calling process to initiate a list of I/O requests within a single function call.
• Depending on the passed argument values, the function either waits until all I/O is complete or
returns immediately after request scheduling, with signal notification to follow.
I/O Models Comparison
Each input operation normally has two distinct phases:
•
Waiting for the data to be ready
•
Copying the data from the kernel to the process
[Figure reconstructed as a table]
Model               1st phase (wait for data)       2nd phase (copy data from kernel to user)
Blocking I/O        initiate, blocked               blocked, complete
Non-Blocking I/O    check, check, check, ...        blocked, complete
I/O Multiplexing    check, blocked, ready           initiate, blocked, complete
Signal-Driven I/O   notification                    initiate, blocked, complete
Asynchronous I/O    initiate                        notification
For the first four models the 1st phase is handled differently and the 2nd phase is handled the same (blocked in call to recvfrom).
The Asynchronous I/O model handles both phases.
POSIX gives the following definitions of Synchronous and Asynchronous Input / Output:
• A Synchronous I/O operation causes the requesting process to be blocked until that I/O operation completes.
• An Asynchronous I/O operation does not cause the requesting process to be blocked.
Using these definitions, the first four I/O models (Blocking, Non-Blocking, I/O Multiplexing, and Signal-Driven I/O)
are all Synchronous because the actual I/O operation blocks the process.
Only the Asynchronous I/O model matches the definition of Asynchronous I/O.
Daemon Process
A daemon is a process that runs in the background and is not associated with a controlling terminal.
Many standard UNIX network services (printer, remote login, file transfer, task scheduling (cron))
are provided by servers that run as daemons.
Who starts the daemons?
• Daemons started during system startup
These daemons are started by scripts /etc/rc… and have superuser privileges.
Traditionally the following daemons are started in this way:
Server cron (starts the specified tasks at the scheduled time)
Superserver inetd (listens on multiple sockets and spawns sub-servers per service request)
Web Server
Mail Server
Server syslogd (provides logging services for all running daemons)
• Daemons started by superserver inetd
These are FTP Server, TFTP Server, Telnet Server, Rlogin Server, etc.
These servers are spawned by inetd to handle a specific request and run as daemons.
• Daemons started by cron server
These are the programs configured (in system file /usr/lib/crontab ) to be started at a specific scheduled time.
A program can also be scheduled by means of the crontab and at UNIX commands.
All programs started at a specific moment of time by the cron server are executed as daemons.
• Programs started from user terminal
How can a process become a daemon?
• Pass to background
• Disassociate from process group
• Disassociate from control terminal
• Close stdin, stdout, stderr and all inherited unnecessary file descriptors
• Reset working directory and file creation mask.
Daemon syslogd
Since daemons do not have a controlling terminal, they can not use standard output and standard error streams.
To provide information output from daemons, the standard syslogd daemon is used.
The syslogd daemon life cycle:
• starts during UNIX startup;
• reads its configuration (file /etc/syslog.conf ), specifying where the accepted messages would be logged;
• opens Unix Domain socket and binds it to well-known name (/var/run/log or /dev/log);
• listens for messages from all other daemons and handles them according to specified configuration.
To communicate with syslogd daemon, other daemon processes could use the following functions:
#include <syslog.h>
void syslog(int priority, const char *message, ... );
void openlog(const char *ident, int options, int facility);
void closelog( );
• Function syslog() establishes the connection with daemon syslogd and logs the message.
• Parameter message is a format string (as in printf() ) extended with the %m pattern,
which is replaced with the error message corresponding to the current value of errno.
• Parameter priority is a combination of 2 values:
- level (severity): 0=LOG_EMERG (highest severity), …, 7=LOG_DEBUG (lowest severity)
- facility (functional area): LOG_CRON, LOG_FTP, LOG_MAIL, LOG_USER, etc.
The level and facility values are used in the configuration file /etc/syslog.conf to specify
where syslogd will forward specific messages. (See the full level and facility list in the man description)
• Function openlog( ) could be called in the beginning of a process to specify common prefix ident, default value
of facility and additional options (output to console, print PID, etc.) for all the upcoming messages.
• Function closelog( ) could be called when the application is finished sending log messages.
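A short sketch of the three calls used together. The function name open_or_log() and the messages are assumptions made for this illustration; where the messages end up depends on the local syslogd configuration.

```c
#include <assert.h>
#include <fcntl.h>
#include <syslog.h>
#include <unistd.h>

/* Hypothetical helper: try to open a file and report the outcome via syslog.
   Returns 0 if the file could be opened, -1 otherwise. */
int open_or_log(const char *ident, const char *path)
{
    int fd;

    assert(ident != NULL && path != NULL);

    /* common prefix ident, PID in every message, default facility LOG_USER */
    openlog(ident, LOG_PID, LOG_USER);

    fd = open(path, O_RDONLY);
    if (fd == -1) {
        /* priority = facility | level; %m expands to the errno message */
        syslog(LOG_USER | LOG_ERR, "cannot open %s: %m", path);
        closelog();
        return -1;
    }
    syslog(LOG_INFO, "opened %s", path);    /* default facility is used */
    close(fd);
    closelog();
    return 0;
}
```

The failing open() is placed directly before syslog() so that errno still holds its error code when %m is expanded.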
Daemon Initialization
This example shows a function which “daemonizes” the process.
Some UNIX / LINUX systems provide a daemon() function with the same functionality.
• The first fork passes the child process to the background.
• To disassociate from the control terminal, the process becomes a session (and group) leader.
• The second fork guarantees that the 2nd child is no longer a session leader, so it cannot
acquire a control terminal.
• The standard input, output and error are redirected to /dev/null to avoid errors when these
descriptors are unexpectedly assigned to files or sockets and then printf or perror is called.
• Working directory and file mask could be reset to specific values,
depending on process functionality.
#include <fcntl.h>
#include <signal.h>
#include <stdlib.h>
#include <syslog.h>
#include <unistd.h>
int closeAll() {…}
/* To find all actually open descriptors, navigate through /proc/self/fd/
or use system-dependent system calls like fcloseall(), closefrom() etc. */
int daemon_init(const char *prog_name, int facility)
{
pid_t pid;
if ( (pid = fork()) < 0){ /* 1st fork */
return (-1);
}else if (pid > 0){
exit(0); /* parent terminates */
}
/*--- child 1 continues in background... ----*/
if (setsid() < 0) { /* become session leader, disassociate from control terminal */
return (-1);
}
signal(SIGHUP, SIG_IGN);
if ( (pid = fork()) < 0) { /* 2nd fork */
return (-1);
}else if (pid > 0) {
exit(0); /* child 1 terminates */
}
/*--- child 2 continues, it is not session leader ... ----*/
chdir("/"); /* change working directory */
closeAll(); /* close all file descriptors */
open("/dev/null",O_RDONLY); /* redirect stdin, stdout, and stderr */
open("/dev/null",O_RDWR);
open("/dev/null",O_RDWR);
openlog(prog_name, LOG_PID, facility); /* pre-configure syslog output */
return (0); /* initialization success */
}
Superserver inetd
This server simultaneously waits for requests for multiple different services on different ports and invokes
corresponding service as separate sub-server process, when specific request is accepted.
The Superserver inetd has the following advantages:
1. It allows a single process inetd to be waiting for incoming client requests for multiple services,
instead of one process for each service. This reduces the total number of processes in the system.
2. It simplifies writing daemon processes since most of the startup details are handled by inetd.
The “price” for these advantages is the execution of fork() and exec() for every handled request.
The Superserver inetd life cycle:
• Starts during UNIX system startup
• Reads its configuration from file /etc/inetd.conf
• Opens all sockets specified by the configuration and waits for requests on all of them simultaneously
• On accepting a request, spawns ( fork() + exec() ) the specified sub-server to handle this request
and passes the actual server name as the first argument (argv[0]) to the spawned process.
The structure of configuration file /etc/inetd.conf :
FIELD                      DESCRIPTION
service-name               must be in /etc/services
socket-type                stream or dgram
protocol                   tcp or udp (must be in /etc/protocols)
wait-flag                  wait (iterative) or nowait (concurrent)
user-name                  from /etc/passwd, typically root
server-program             full pathname to be used for exec
server-program-arguments   arguments for exec
The example of configuration file /etc/inetd.conf :
ftp      stream   tcp   nowait   root     /usr/bin/ftpd      ftpd -l
telnet   stream   tcp   nowait   root     /usr/bin/telnetd   telnetd
login    stream   tcp   nowait   root     /usr/bin/rlogind   rlogind -s
tftp     dgram    udp   wait     nobody   /usr/bin/tftpd     tftpd -s /tftpboot
…
Superserver inetd work schema
1) On startup, inetd reads the /etc/inetd.conf file
and creates a socket of the appropriate type
(stream or datagram) for each of the specified
services. It binds the sockets and, for TCP
sockets, also performs listen.
2) Simultaneous waiting for requests on all open
sockets is performed with the system call select.
3) For each arrived request a sub-server process is
forked. It duplicates the socket descriptor onto stdin, stdout and stderr,
sets GID and UID and
performs the exec call to the actual server.
4) The parent process meanwhile continues to
wait for other requests. It also
accepts and handles SIGCHLD when sub-servers terminate.
5) For concurrent services (nowait) the corresponding
bits in the fd_set bitmask are always set back to ON
before the next select call.
For iterative services (wait) the corresponding
bit flags are restored only during handling of the
SIGCHLD signal after termination of the previous
sub-server of the same type.
Note: Configuring UDP services as nowait
can cause a race condition, where:
- the inetd program selects on the socket
- and the server program reads from the socket.
[Flowchart: For each service listed in file /etc/inetd.conf: socket( ), bind( ), listen( ) (TCP only), ..add to FD Set.. Then: select( ) (Read events), accept( ) (TCP only), fork( ).
Parent: close accept-ed socket (TCP only); if it is a “wait” service, temporarily remove the FD from the FD Set while the child is running; return to select( ).
Child: close all FDs other than socket; dup socket FD to FDs 0,1,2; close socket FD; setgid() setuid() (if user not root); exec( ) server.]
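Because inetd hands the connected socket to the sub-server as descriptors 0, 1 and 2, the service itself needs no socket code at all. A minimal sketch of such a service body follows; the function name echo_service() is an assumption made for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical echo service body: copy everything from in_fd to out_fd
   until end of file. Returns the number of bytes echoed, -1 on error. */
long echo_service(int in_fd, int out_fd)
{
    char buf[512];
    long total = 0;
    ssize_t n;

    assert(in_fd >= 0 && out_fd >= 0);

    while ((n = read(in_fd, buf, sizeof buf)) != 0) {
        ssize_t off = 0;

        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted by a signal: retry */
            return -1;
        }
        while (off < n) {               /* handle short writes */
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
    return total;
}
```

A server started by inetd for a nowait stream service would simply call echo_service(0, 1) from its main() and exit.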
Name and Address Conversions
There are four types of network information that an application might want to look up:
• Hosts
• Networks
• Protocols
• Services
Protocol and Service information is always obtained from static files (/etc/protocols, /etc/services).
Host and Network information could be obtained from different sources:
• Domain Name System (DNS)
• Static files (/etc/hosts, /etc/networks)
• Network Information System (NIS)
• Lightweight Directory Access Protocol (LDAP).
The type of name service used by a specific
host depends on the configuration provided by the administrator.
The user application, independently of the specific name service
configuration, obtains this information using the Resolver,
a standard functionality providing the interface to the name service.
Resolver Functionality
[Figure: the application code makes a function call to the resolver code, which reads the resolver configuration files and exchanges a UDP request and a UDP reply with the local name server; the local name server may contact other name servers. The result is passed back to the application code as the function return.]
The standard API provides the following conversion methods:
• Hosts information: gethostbyaddr(), gethostbyname()
• Networks information: getnetbyaddr(), getnetbyname()
• Protocols information: getprotobyname(), getprotobynumber()
• Services information: getservbyname(), getservbyport()
Host Information Utilities
#include <netdb.h>
struct hostent {
char *h_name; /* official (canonical) name of host */
char **h_aliases; /* pointer to array of pointers to alias names */
int h_addrtype; /* host address type: AF_INET */
int h_length; /* length of address: 4 */
char **h_addr_list; /* ptr to array of ptrs with IPv4 addrs */
};
struct hostent *gethostbyname (const char *hostname);
struct hostent *gethostbyaddr (const char *addr, socklen_t len, int family);
• These methods provide host information by Domain Name or by IP Address, respectively
• Return value is pointer to hostent structure on success, NULL on failure (h_errno specifies the error)
[Figure: layout of hostent{ }: h_name points to the official hostname string; h_aliases points to a NULL-terminated array of pointers to alias strings (alias #1, alias #2); h_addrtype is AF_INET; h_length is 4; h_addr_list points to a NULL-terminated array of pointers to in_addr{ } structures (IP addr #1, IP addr #2, IP addr #3), each of length h_length=4.]
Example. Extract IP Address by Host Name
…
char hostName[] = "www.google.com";
struct sockaddr_in servAddr;
struct hostent* pEntry;
…
pEntry=gethostbyname(hostName);
if (pEntry != NULL)
{
servAddr.sin_family = pEntry -> h_addrtype;
bcopy(pEntry -> h_addr_list[0], /* first address in the list */
(char*) &servAddr.sin_addr,
pEntry ->h_length);
}
…
Example of gethostbyname() usage.
#include <stdio.h>
#include <stdlib.h>
#include <netdb.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <arpa/inet.h>
int
main(int argc, char **argv)
{
char *ptr, **pptr;
char str [INET_ADDRSTRLEN];
struct hostent *hptr;
while (--argc > 0) {
ptr = *++argv;
if ( (hptr = gethostbyname (ptr) ) == NULL) {
fprintf (stderr, "gethostbyname error for host: %s, h_errno= %d\n",
ptr, h_errno );
continue;
}
printf ("official hostname: %s\n", hptr->h_name);
for (pptr = hptr->h_aliases; *pptr != NULL; pptr++)
printf ("\talias: %s\n", *pptr);
switch (hptr->h_addrtype) {
case AF_INET:
pptr = hptr->h_addr_list;
for ( ; *pptr != NULL; pptr++)
printf ("\taddress: %s\n",
inet_ntop (hptr->h_addrtype, *pptr, str, sizeof (str)));
break;
default:
fprintf (stderr, "unknown address type\n");
break;
}
}
exit(0);
}
Service Information Utilities
#include <netdb.h>
struct servent {
char *s_name; /* official service name */
char **s_aliases; /* alias list */
int s_port; /* port number, network-byte order */
char *s_proto; /* protocol to use */
};
struct servent *getservbyname (const char *servname, const char *protoname);
struct servent *getservbyport (int port, const char *protoname);
• Both these methods provide service information by service name or port number
• Some Internet services are provided using either TCP or UDP.
In this case parameter protoname could specify specific protocol of interest (“tcp”, “udp”)
• Return value is pointer to servent structure, NULL on failure.
Example. Extract Port Number by Service Name
…
struct servent* pEntry;
int tftpPort;
pEntry=getservbyname("tftp", "udp");
if (pEntry != NULL)
{
tftpPort = pEntry -> s_port; /* network byte order, can be assigned to sin_port as is */
}
…
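Since s_port is kept in network byte order, it has to be converted before it is printed or compared with an ordinary number. The sketch below shows the conversion; the helper name service_port() is an assumption made for this illustration, and the result depends on the contents of the local services database.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: look up a service and return its port number in
   HOST byte order, or -1 if the service is not known on this system. */
int service_port(const char *service, const char *proto)
{
    struct servent *entry;

    assert(service != NULL);
    entry = getservbyname(service, proto);
    if (entry == NULL)
        return -1;
    /* s_port holds a 16-bit port in network byte order */
    return (int)ntohs((uint16_t)entry->s_port);
}
```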
Unix Domain Sockets
The Unix Domain protocols are not an actual protocol suite,
but a way of performing Inter-Process Communication
on a single host using socket API.
The Unix Domain socket is bound to a file path.
To connect() to a Unix Domain socket, the process must
have the same permissions as required to open() the file.
Unix Domain Address Structure
#include <sys/un.h>
struct sockaddr_un {
sa_family_t sun_family; /* AF_LOCAL (AF_UNIX) */
char sun_path[108]; /* pathname\0 */
};
Example. Create and Bind Unix Domain Stream Socket.
#include <string.h>
#include <strings.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
…
int sockFd;
struct sockaddr_un servAddr;
char filePath[] = "/tmp/anyname";
/* create Unix Domain stream socket */
sockFd = socket (AF_LOCAL, SOCK_STREAM, 0);
/* file to be used as address
must not exist before binding */
unlink(filePath);
/* bind the socket */
bzero((char*)&servAddr, sizeof(servAddr));
servAddr.sun_family = AF_LOCAL;
strncpy(servAddr.sun_path,
filePath, sizeof(servAddr.sun_path) - 1);
bind(sockFd, (struct sockaddr *) &servAddr, sizeof(servAddr));
…
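The bind step is only half of the exchange. A sketch of both sides follows, assuming a stream socket; the helper names make_addr(), unix_listen() and unix_connect() and the backlog value are assumptions made for this illustration.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: fill a sockaddr_un for the given path.
   Returns 0 on success, -1 if the path does not fit into sun_path. */
static int make_addr(struct sockaddr_un *addr, const char *path)
{
    assert(path != NULL);
    if (strlen(path) >= sizeof(addr->sun_path))
        return -1;
    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    strcpy(addr->sun_path, path);       /* length was checked above */
    return 0;
}

/* Server side: create a listening stream socket bound to path. */
int unix_listen(const char *path)
{
    struct sockaddr_un addr;
    int fd;

    if (make_addr(&addr, path) == -1)
        return -1;
    if ((fd = socket(AF_UNIX, SOCK_STREAM, 0)) == -1)
        return -1;
    unlink(path);                       /* the file must not exist before bind */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1 ||
        listen(fd, 5) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Client side: connect to the server bound to path. */
int unix_connect(const char *path)
{
    struct sockaddr_un addr;
    int fd;

    if (make_addr(&addr, path) == -1)
        return -1;
    if ((fd = socket(AF_UNIX, SOCK_STREAM, 0)) == -1)
        return -1;
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The server then calls accept() on the listening descriptor exactly as it would for a TCP socket, and removes the socket file with unlink() when it is done.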
Unix Domain Socket Features
Socket-Based Pipe
#include <sys/socket.h>
int socketpair( int family, /* AF_LOCAL */
int type, /* SOCK_STREAM / SOCK_DGRAM */
int protocol, /* 0 */
int sockfd[2] ); /* (output) array of 2 descriptors */
• Creates a pair of already connected unnamed sockets.
• If the SOCK_STREAM type is used, a full-duplex stream pipe is created.
• On success returns 0 and fills sockfd[0] and sockfd[1] .
• On error returns -1, errno specifies the error.
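The full-duplex property can be seen by sending data in both directions over one pair. This is a sketch only; the function name stream_pipe_roundtrip() is an assumption made for this illustration.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo: send a request one way and a reply the other way
   over a single socket pair. Returns 0 on success, -1 on any failure. */
int stream_pipe_roundtrip(void)
{
    int sv[2];
    char c = 0;
    int ok = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    /* direction 1: sv[0] -> sv[1] */
    if (write(sv[0], "Q", 1) == 1 && read(sv[1], &c, 1) == 1 && c == 'Q') {
        /* direction 2: sv[1] -> sv[0], over the same pair */
        if (write(sv[1], "A", 1) == 1 && read(sv[0], &c, 1) == 1 && c == 'A')
            ok = 1;
    }
    close(sv[0]);
    close(sv[1]);
    return ok ? 0 : -1;
}
```

In a real program the pair is usually created before fork(), with the parent keeping one descriptor and the child the other.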
Unix Domain & Ancillary Data
• Ancillary Data is control information passed by means
of sendmsg(), recvmsg() system calls.
• The fields msg_control and msg_controllen of the
structure msghdr are used for Ancillary Data.
struct msghdr {
void * msg_name; /* address */
socklen_t msg_namelen; /* size of address */
struct iovec * msg_iov; /* scatter/gather array */
size_t msg_iovlen; /* msg_iov array length */
void * msg_control; /* ancillary data */
socklen_t msg_controllen; /* ancillary data length */
};
• The following types of Ancillary Data are used with
Unix Domain sockets:
Passing Descriptors:
• Sender opens resource (file) and “sends” the descriptor, allocated in its process.
• Receiver “receives” newly allocated descriptor, pointing to the same resource.
Passing Credentials:
• Sender “sends” standard structure
• Receiver “receives” it filled with sender credentials: PID, UID, EUID, GID, etc.
• To see the specific data structures for handling of Ancillary Data on specific Linux/Unix system,
see man pages for system calls recvmsg and sendmsg.
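Descriptor passing can be sketched with the SCM_RIGHTS control message. The helper names send_fd() and recv_fd() and the one-byte dummy payload are assumptions made for this illustration; the exact structures may differ between systems, as noted above.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer large enough for one descriptor, suitably aligned. */
union fd_ctl {
    struct cmsghdr align;
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Hypothetical helper: send descriptor fd_to_send over Unix Domain socket
   sock. Returns 0 on success, -1 on error. */
int send_fd(int sock, int fd_to_send)
{
    struct msghdr msg;
    struct iovec iov;
    struct cmsghdr *cmsg;
    union fd_ctl ctl;
    char dummy = 'F';           /* one byte of ordinary data is sent too */

    assert(fd_to_send >= 0);
    memset(&msg, 0, sizeof msg);
    memset(&ctl, 0, sizeof ctl);
    iov.iov_base = &dummy;
    iov.iov_len = 1;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;       /* "passing descriptors" */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive a descriptor sent by send_fd().
   Returns the newly allocated descriptor, -1 on error. */
int recv_fd(int sock)
{
    struct msghdr msg;
    struct iovec iov;
    struct cmsghdr *cmsg;
    union fd_ctl ctl;
    char dummy;
    int fd;

    memset(&msg, 0, sizeof msg);
    iov.iov_base = &dummy;
    iov.iov_len = 1;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The received number usually differs from the one that was sent, but both descriptors refer to the same open resource.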
Distributed Application
Distributed Application is an application made up of distinct components
that are physically located on different computer systems, connected by a network.
The components of Distributed Application are distributed across multiple computers on a network,
but seem to be running on the same user's computer.
[Figure: before distribution, a Non-Distributed Application on a single Host contains Component A and Component B. After distribution, the Distributed Application Client on Host 1 contains Component A and a Stub of Component B; the Distributed Application Server on Host 2 contains a Stub of Component A and Component B. Requests are passed between the stubs over the Network.]
A Distributed Application Service is a call to a remote component, with passing of input and output parameters.
The Distributed Application Server is responsible for accepting Client requests and providing the call Service.
The Distributed Application Client sends requests to the Server and accepts remote call results.
Distributed Application Design Tasks:
• Discover the desired Server Host
• Discover and connect to the desired Server Process
• Serialization / Deserialization of input/output
parameters passed over network.
Examples of Distributed Application Technologies:
• Sun RPC (Remote Procedure Call)
• CORBA (Common Object Request Broker Architecture)
• Java RMI (Remote Method Invocation)
• Microsoft DCOM (Distributed Component Object Model)
Sun RPC
Sun RPC (Remote Procedure Call) is a powerful technique for constructing distributed, client-server based
applications. It allows the execution of individual routines on remote computers across a network.
RPC isolates the application from the physical and logical elements of the data communications, and hides the call
to subroutine on remote server under the “traditional” local function call interface.
[Figure: in the Non-Distributed Application, Procedure A calls Procedure B directly. After RPC Conversion, the RPC Client contains Procedure A, the Client Stub of Procedure B (Client Interface) and Client Communication; the RPC Server contains Server Communication, the Server Stub of Procedure B (Server Interface) and Procedure B.]
To develop an RPC application the following steps are needed:
• Specify the protocol for client server communication
• Develop the client program
• Develop the server program
RPC Application Development
[Figure, Before Distribution: main.c (program main ( ) calls local procedures) and proc.c (local procedures called from program main()) are linked into the non-distributed program prog.
After Distribution: rpcgen compiles the RPC specification XDR file P.x into P_clnt.c (client stub), P_svc.c (server stub) and P.h (common include file). P_main.c (client main( ) calls client stub procedures), P_clnt.c and the RPC run-time library are linked into the client program client. P_svc.c, P_proc.c (server procedures called by server stub) and the RPC run-time library are linked into the server program server.]
• The remote procedure in RPC is identified by a triplet:
Program ID – unique hexadecimal id of Server Program
Version – version id of Server Program
Procedure ID – serial number of procedure under the specific Server Program
• The client-server communication interface is defined using the XDR (eXternal Data Representation) protocol.
XDR defines the set of serializable data types and the syntax for Program ID, Version, Procedure ID definition.
• The standard RPC compiler rpcgen compiles XDR interface definitions and builds C code for the interface parts of
Server Stub and Client Stub, and also an H file with common interface constants.
• The RPC run-time library is linked during the build of Server and Client applications and provides Client and Server
communication functionalities.
RPC Simple Example
main.c – before distribution
int printmessage(char* msg);
int main()
{
char msg[] = "test";
int result;
result = printmessage (msg);
return result;
}
proc.c – before distribution
int printmessage(char* msg)
{
/* print msg */
return 0;
}
P.x - the XDR definition of interface
program MESSAGEPROG {
version MESSAGEVERS {
int PRINTMESSAGE(string) = 1;
} = 1;
} = 0x20000099;
P.h – generated by rpcgen
#define MESSAGEPROG 0x20000099
#define MESSAGEVERS 1
#define PRINTMESSAGE 1
int * printmessage_1(char**, CLIENT*);
int * printmessage_1_svc(char **, struct svc_req *);
P_main.c – after distribution
#include <rpc/rpc.h>
#include "P.h"
int main()
{
char * msg = "test"; /* the stub takes the address of a char* */
int * pResult;
CLIENT* pClnt ; /* Connection Handle */
pClnt = clnt_create("MyHost", MESSAGEPROG, MESSAGEVERS, "udp");
if (pClnt == NULL) { clnt_pcreateerror("MyHost"); return 1; }
pResult = printmessage_1( &msg, pClnt);
clnt_destroy(pClnt);
return 0;
}
P_proc.c – after distribution
#include <rpc/rpc.h>
#include "P.h"
int * printmessage_1_svc(char ** args, struct svc_req * req)
{
static int result; /* must be static to return by pointer */
char * msg = *args; /* extract the argument passed by pointer */
/* print msg */
return &result; /* return result by pointer */
}
Connection to RPC Server
[Figure: on the Server Host the Port Mapper (Port 111) keeps the Registration Map; each RPC Server (Ephemeral Port) registers on startup. The Client Stub of the RPC Client on the Client Host sends (Program ID, Version) to the Port Mapper and receives the RPC Server Port Number. It then sends (Program ID, Version, Procedure ID, parameters) to the RPC Server Port, where the Server Stub calls the Procedure and returns the Result Data.]
• Port Mapper is a standard daemon, listening on port 111 (UDP and TCP)
and maintaining the map (Program ID, Version) -> (Port Number)
• Each RPC Server starts on an ephemeral port and registers with the Port Mapper
• Each RPC Client calls the Port Mapper on a specific host to obtain the port of the target RPC Server.
Then the RPC Client calls the RPC Server with a request containing Procedure ID and parameters
Note:
RPC does not provide automatic discovery of the Server Host.
To use an RPC service, the RPC Client is responsible for knowing the name of the target Server Host
where the proper RPC Server is running.
NFS: Network File System
NFS provides transparent file access for clients to files and filesystems on a remote server.
NFS accesses only the portions of a file that a process references, and a goal of NFS is to make this access
transparent (a user process accesses local and remote filesystems and files in the same way).
When a user process accesses a remote filesystem or file, the local NFS Client sends a request to the remote NFS Server,
which performs the requested operation and returns the requested information in its reply.
Before the local NFS Client can access files on the remote NFS Server's filesystem, this remote filesystem must be
mounted at a local mount point on the NFS Client's host via the NFS Mount Protocol.
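[Diagram: two directory trees, each rooted at /. An NFS mount attaches the remote directory /dir123 (containing file1) at the local mount point /mydir123, where file1 then appears. The trees also show /var and file0.]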
To reference a particular filesystem or file on the remote NFS Server, the NFS Client obtains a File Handle, an opaque
object generated by the NFS Server. To perform any subsequent operation on the remote file or filesystem, the Client sends
the corresponding File Handle back to the NFS Server.
NFS Client calls are performed by the client kernel, on behalf of client user processes.
NFS Servers, for efficiency, are implemented within the server kernel.
NFS implementation is based on RPC.
NFS was originally written to use RPC over UDP. Newer implementations, however, also support RPC over TCP.
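The File Handle idea can be sketched as follows. This is a toy model under stated assumptions, not the NFS implementation: the handle is a fixed 32-byte block (the size used by NFS version 2), and the encoding (a table index in the first byte) and all names such as server_issue_handle are hypothetical. The point is only that the client stores the bytes and sends them back unchanged, while the server alone interprets them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FH_SIZE 32       /* fixed handle size, as in NFS version 2 */
#define MAX_HANDLES 8    /* arbitrary limit for this sketch */

/* What the client sees: opaque bytes it never interprets. */
struct file_handle {
    unsigned char data[FH_SIZE];
};

/* Server-side state that gives the opaque bytes their meaning. */
struct handle_table {
    const char *paths[MAX_HANDLES];
    size_t count;
};

/* Server: issue a handle for a path. Toy encoding: (table index + 1) in
   the first byte, all other bytes zero. Returns 0, or -1 if the table is full. */
int server_issue_handle(struct handle_table *t, const char *path,
                        struct file_handle *out)
{
    assert(t != NULL && path != NULL && out != NULL);
    if (t->count == MAX_HANDLES)
        return -1;
    t->paths[t->count] = path;
    memset(out->data, 0, FH_SIZE);
    out->data[0] = (unsigned char)(t->count + 1);
    t->count++;
    return 0;
}

/* Server: resolve a handle sent back by the client.
   Returns NULL for a handle this server never issued. */
const char *server_resolve_handle(const struct handle_table *t,
                                  const struct file_handle *fh)
{
    assert(t != NULL && fh != NULL);
    if (fh->data[0] == 0 || fh->data[0] > t->count)
        return NULL;
    return t->paths[fh->data[0] - 1];
}
```

Because the client treats the handle as opaque, the server is free to change its internal encoding without any change on the client side.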
NFS
A. Mount remote file system
#mkdir /mydir123
#mount hostA:/dir123 /mydir123
(Remote file system /dir123 from host hostA is mounted to mount point directory /mydir123 on the local host)
Steps shown in the diagram:
1. The mountd daemon registers with the port mapper (111/udp,tcp) at start
2. The mount command (a user process) sends a "get port #" RPC request to the port mapper
3. The port mapper sends an RPC reply with the port #
4. The mount command sends a mount RPC request to the mountd daemon
5. The mountd daemon sends an RPC reply with the file handle of the remote filesystem
6. The mount command issues the mount system call to the NFS Client in the client kernel
B. Access remote file via NFS
#cat /mydir123/file1
(Transparent access from the local host to file1 on the mounted file system)
[Diagram: in the client kernel, a request from the user process goes either to local file processing (local file access, local disk) or to NFS file processing (NFS Client). The NFS Client sends RPC requests to the NFS Server (2049/udp,tcp) in the server kernel, which performs local file access on the server's local disk and returns RPC replies.]
Network Management
Network Management is the set of activities, methods, procedures, and tools that relate to the
operation, administration, maintenance, and provisioning of network systems.
Operation means keeping the network up and running, including monitoring for possible problems.
Administration means keeping track of network resources and their assignments.
Maintenance means performing repairs and upgrades of the software and hardware components of the network system.
Provisioning means configuring network system resources to support a given service.
Business Management
Service Management
Network Management
Element Management
Network Element
FCAPS: ISO Telecommunications Management Network Model
FCAPS is an abbreviation of Fault, Configuration, Accounting, Performance, Security: the areas of Network Management.
Example: FCAPS of Telecommunication System
Network Management in a “nutshell”
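[Diagram: a Manager and an Agent (on the Managed Object) communicate through Network Management Protocols. The Manager sends a Request, the Agent returns a Response, and the Agent may also send an Unsolicited Notification. Each side keeps a MIB in its own address space; one holds partial data and the other holds full data.]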
MIB (Management Information Base)
The database of information maintained by the Agent, which the Manager can query or set
Network Management Protocol
The protocol between the Manager and the Agent, describing:
- the common rules of addressing
- the basic data types
- the format of requests, responses and notifications
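A minimal sketch of the Agent side of this model is shown below: the MIB is a table of named variables, and the Agent answers query and set requests with a status code. The variable names, the status codes and the functions agent_get and agent_set are all hypothetical. A real protocol addresses variables by its own addressing rules rather than by plain strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes returned to the Manager in a response. */
enum mgmt_status { MGMT_OK = 0, MGMT_NO_SUCH_NAME, MGMT_READ_ONLY };

/* One MIB entry maintained by the Agent. */
struct mib_variable {
    const char *name;   /* address of the variable (a plain string here) */
    long value;         /* current value */
    int writable;       /* non-zero if the Manager may set it */
};

struct mib {
    struct mib_variable *vars;
    size_t count;
};

/* Find a variable by name; returns NULL if the MIB has no such entry. */
static struct mib_variable *mib_find(struct mib *mib, const char *name)
{
    for (size_t i = 0; i < mib->count; i++) {
        if (strcmp(mib->vars[i].name, name) == 0)
            return &mib->vars[i];
    }
    return NULL;
}

/* Agent: handle a query request from the Manager. */
enum mgmt_status agent_get(struct mib *mib, const char *name, long *value_out)
{
    struct mib_variable *var;
    assert(mib != NULL && name != NULL && value_out != NULL);
    var = mib_find(mib, name);
    if (var == NULL)
        return MGMT_NO_SUCH_NAME;
    *value_out = var->value;
    return MGMT_OK;
}

/* Agent: handle a set request from the Manager. */
enum mgmt_status agent_set(struct mib *mib, const char *name, long value)
{
    struct mib_variable *var;
    assert(mib != NULL && name != NULL);
    var = mib_find(mib, name);
    if (var == NULL)
        return MGMT_NO_SUCH_NAME;
    if (!var->writable)
        return MGMT_READ_ONLY;
    var->value = value;
    return MGMT_OK;
}
```

Unsolicited notifications are left out of the sketch. They would flow in the opposite direction, from the Agent to the Manager, without a preceding request.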
SNMP – Simple Network Management Protocol
SNMP is a UDP-based Application Layer (OSI Model, layer 7) network protocol.
An SNMP message is called a Protocol Data Unit (PDU). It is carried inside a UDP datagram, which is carried inside an IP packet:
IP Packet
  UDP Datagram
    SNMP PDU
      SNMP Common Header
      SNMP Get/Set Header
      SNMP Data
SNMPv1 provides 5 PDU types. SNMPv2 and SNMPv3 have 2 more PDU types:
1. GetRequest - Retrieve the value of a variable or list of variables.
2. SetRequest - Change the value of a variable or list of variables.
3. GetNextRequest - Retrieve the value of the lexicographically next variable in the MIB. (Walk through MIB)
4. GetBulkRequest - Optimized version of GetNextRequest (SNMPv2).
5. Response - Returns variable bindings and acknowledgement for all requests.
6. Trap - Asynchronous notification from agent to manager.
7. InformRequest - Acknowledged asynchronous notification from manager to manager (SNMPv2).
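On the wire, each PDU type is identified by a tag number, and these tag values do not follow the order in which the types are listed above. The sketch below uses the tag values assigned in RFC 1157 (SNMPv1) and RFC 3416 (SNMPv2). The enum and function names are hypothetical, and tag 7 (the SNMPv2 form of Trap) is included only for completeness.

```c
#include <assert.h>
#include <stddef.h>

/* PDU tag numbers from RFC 1157 and RFC 3416. The identifiers are
   illustrative names, not taken from any SNMP library. */
enum snmp_pdu_type {
    SNMP_PDU_GET_REQUEST      = 0,
    SNMP_PDU_GET_NEXT_REQUEST = 1,
    SNMP_PDU_RESPONSE         = 2,
    SNMP_PDU_SET_REQUEST      = 3,
    SNMP_PDU_TRAP_V1          = 4,
    SNMP_PDU_GET_BULK_REQUEST = 5,
    SNMP_PDU_INFORM_REQUEST   = 6,
    SNMP_PDU_TRAP_V2          = 7
};

/* Human-readable name for a PDU tag, or "unknown". */
const char *snmp_pdu_name(int tag)
{
    static const char *const names[] = {
        "GetRequest", "GetNextRequest", "Response", "SetRequest",
        "Trap", "GetBulkRequest", "InformRequest", "SNMPv2-Trap"
    };
    if (tag < 0 || (size_t)tag >= sizeof names / sizeof names[0])
        return "unknown";
    return names[tag];
}

/* First byte of the BER-encoded PDU: context-specific class,
   constructed form (0xA0) combined with the tag number. */
unsigned char snmp_pdu_ber_type(int tag)
{
    assert(tag >= 0 && tag <= 30);   /* single-byte tag form only */
    return (unsigned char)(0xA0 | tag);
}

/* Returns 1 if the sender of this PDU expects a Response, 0 otherwise.
   Traps are unacknowledged; InformRequest is acknowledged. */
int snmp_pdu_expects_response(int tag)
{
    switch (tag) {
    case SNMP_PDU_GET_REQUEST:
    case SNMP_PDU_GET_NEXT_REQUEST:
    case SNMP_PDU_SET_REQUEST:
    case SNMP_PDU_GET_BULK_REQUEST:
    case SNMP_PDU_INFORM_REQUEST:
        return 1;
    default:
        return 0;
    }
}
```

The difference between Trap and InformRequest shows up in snmp_pdu_expects_response: only the latter is confirmed by a Response from the receiver.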
SNMP MIB and OIDs
SNMP itself does not define which information (which variables) a managed system should offer.
The available information is defined by Management Information Bases (MIBs). MIBs describe the structure of the
management data of a device subsystem.
SNMP MIBs use a hierarchical namespace of Object Identifiers (OIDs). Each OID identifies a variable that
can be read or set via SNMP. The OID namespace is a hierarchical tree.
The top-level OIDs are maintained by the standardization organizations ITU-T and ISO, which
delegate responsibility for defining OID sub-trees to other organizations.
SNMP MIBs are described in the ASN.1 language (Abstract Syntax Notation One),
with definitions of OID aliases, Object Identifiers and data types.
OID Namespace tree
root
  ccitt(0)
  iso(1)
    org(3)
      dod(6)
        internet(1)
  joint-iso-ccitt(2)
    country(16)
      us(840)
        organization(1)
          motorola(113728)
            gss(1)
              cig(1)
                common(3)    2.16.840.1.113728.1.1.3
Fragment of MIB definition in ASN.1 language
CigASN1Module { joint-iso-ccitt(2) country(16) us(840) organization(1)
motorola(113728) gss(1) cig(1) common(3) asn1Module(2) 0}
-- alias
cigcom OBJECT IDENTIFIER ::=
{ joint-iso-ccitt(2) country(16) us(840) organization(1) motorola(113728) gss(1)
cig(1) common(3) }
-- object OIDs
cigModule OBJECT IDENTIFIER ::= {cigcom modules(0)}
cigAttribute OBJECT IDENTIFIER ::= {cigcom attributes(1)}
cigGroupAttribute OBJECT IDENTIFIER ::= {cigcom groupAttributes(2)}
-- data types
CigDisplayRadius ::= REAL
CigSiteConfiguration ::= ENUMERATED {
omni (0),
sixty (1),
onetwenty (2),
omnisixty (3) }
CigSiteConfigList ::= SEQUENCE OF CigSiteConfiguration
END
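Dotted OIDs such as 2.16.840.1.113728.1.1.3 can be handled in a program as arrays of numbers. The sketch below parses the dotted form, compares two OIDs in the lexicographic order that a MIB walk follows, and tests whether one OID lies in the sub-tree of another. The type and function names and the 32-arc limit are assumptions of this example, and overflow of a single arc is not checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define OID_MAX_LEN 32   /* arbitrary limit for this sketch */

struct oid {
    unsigned long arcs[OID_MAX_LEN];
    size_t len;
};

/* Parse a dotted OID such as "2.16.840.1" into its arcs.
   Returns 0 on success and -1 on malformed or too long input. */
int oid_parse(const char *text, struct oid *out)
{
    assert(text != NULL && out != NULL);
    out->len = 0;
    while (*text != '\0') {
        char *end;
        if (*text < '0' || *text > '9' || out->len == OID_MAX_LEN)
            return -1;
        out->arcs[out->len++] = strtoul(text, &end, 10);
        if (*end == '.') {
            text = end + 1;
            if (*text == '\0')
                return -1;           /* trailing dot */
        } else if (*end == '\0') {
            text = end;
        } else {
            return -1;               /* unexpected character */
        }
    }
    return out->len > 0 ? 0 : -1;
}

/* Lexicographic comparison: negative, zero or positive, like strcmp.
   A prefix sorts before any longer OID that extends it. */
int oid_compare(const struct oid *a, const struct oid *b)
{
    size_t n;
    assert(a != NULL && b != NULL);
    n = a->len < b->len ? a->len : b->len;
    for (size_t i = 0; i < n; i++) {
        if (a->arcs[i] != b->arcs[i])
            return a->arcs[i] < b->arcs[i] ? -1 : 1;
    }
    if (a->len == b->len)
        return 0;
    return a->len < b->len ? -1 : 1;
}

/* Returns 1 if oid lies in the sub-tree rooted at prefix, 0 otherwise. */
int oid_has_prefix(const struct oid *oid, const struct oid *prefix)
{
    assert(oid != NULL && prefix != NULL);
    if (prefix->len > oid->len)
        return 0;
    for (size_t i = 0; i < prefix->len; i++) {
        if (oid->arcs[i] != prefix->arcs[i])
            return 0;
    }
    return 1;
}
```

With these helpers, cigModule from the fragment above (cigcom followed by 0, that is 2.16.840.1.113728.1.1.3.0) compares greater than cigcom and has cigcom as its prefix.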