Transport Layer: UDP COMS W6998 Spring 2010 Erich Nahum Outline    UDP Layer Architecture Receive Path Send Path.


Transport Layer: UDP
COMS W6998
Spring 2010
Erich Nahum
Outline



UDP Layer Architecture
Receive Path
Send Path
Recall what UDP Does
UDP packet format:

  Bits 0-15               Bits 16-31
  Source Port (16)        Destination Port (16)
  Length (16)             Checksum (16)
  Data

RFC 768
IP Proto 17
Connectionless
Unreliable
Datagram
Supports multicast
Optional checksum
Nice and simple. Yet still 2187 lines of code!
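UDP Header
The UDP header: include/linux/udp.h
struct udphdr {
    __be16  source;  /* source port, network byte order */
    __be16  dest;    /* destination port, network byte order */
    __be16  len;     /* length of header plus data, in bytes */
    __sum16 check;   /* checksum */
};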
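To make the header layout concrete, here is a minimal user-space sketch that pulls the four 16-bit fields out of a raw byte buffer. The type `struct udp_header` and the helpers `read_be16()` and `parse_udp_header()` are illustrative names, not kernel APIs; the kernel's `struct udphdr` keeps the fields in network byte order, while this sketch converts them to host order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-order view of the UDP header (hypothetical helper type). */
struct udp_header {
    uint16_t source;
    uint16_t dest;
    uint16_t len;    /* header + data, in bytes */
    uint16_t check;
};

/* Read a big-endian (network order) 16-bit value from two bytes. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Parse the 8-byte header at buf. Returns 0 on success, -1 if the buffer is
 * too short or the length field does not fit the buffer. */
static int parse_udp_header(const uint8_t *buf, size_t buflen,
                            struct udp_header *out)
{
    assert(buf != NULL && out != NULL);
    if (buflen < 8)
        return -1;
    out->source = read_be16(buf);
    out->dest   = read_be16(buf + 2);
    out->len    = read_be16(buf + 4);
    out->check  = read_be16(buf + 6);
    if (out->len < 8 || out->len > buflen)
        return -1;
    return 0;
}
```

A caller would pass the bytes that follow the IP header, for example `parse_udp_header(pkt, pkt_len, &hdr)`.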
Sidebar: UDP-Lite
UDP-Lite packet format:

  Bits 0-15               Bits 16-31
  Source Port (16)        Destination Port (16)
  Checksum Coverage (16)  Checksum (16)
  Data

RFC 3828
Very similar to UDP
Difference is that the checksum covers part of the packet rather than all of it
Checksum coverage says how many bytes (starting from the header) are covered by the checksum
Idea is that certain apps would rather have a damaged packet than none
Examples are audio and video codecs
IP Protocol 136
Linux UDP-Lite implementation shares most code with UDP
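The effect of partial coverage can be sketched in a few lines of user-space C. This is a simplified illustration, not the kernel code: it sums only the first bytes of the packet and leaves out the pseudo-header that the real checksum also includes. The function names are hypothetical. Per RFC 3828, a coverage value of 0 means the whole packet is covered, and values 1 through 7 are illegal because the header must always be covered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement Internet checksum over len bytes (RFC 1071 style). */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((data[i] << 8) | data[i + 1]);
    if (i < len)                        /* odd trailing byte, zero padded */
        sum += (uint32_t)(data[i] << 8);
    while (sum >> 16)                   /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Number of bytes the checksum covers, or 0 if the coverage is illegal. */
static size_t udplite_covered_bytes(uint16_t coverage, size_t pkt_len)
{
    if (coverage == 0)
        return pkt_len;                 /* 0 means "cover everything" */
    if (coverage < 8 || coverage > pkt_len)
        return 0;
    return coverage;
}

/* Checksum over the covered prefix only (pseudo-header omitted here). */
static uint16_t udplite_partial_checksum(const uint8_t *pkt, size_t pkt_len,
                                         uint16_t coverage)
{
    size_t n = udplite_covered_bytes(coverage, pkt_len);

    assert(n != 0);
    return inet_checksum(pkt, n);
}
```

With a coverage of 8, damage to the payload leaves the checksum unchanged, so the receiver still accepts the packet; damage to the header does not.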
Sources of UDP Packets
1. Packets arrive on an interface and are passed to the udp_rcv() function.
2. UDP packets are packed into an IP packet and passed down to IP via ip_append_data() and ip_push_pending_frames().
UDP Implementation Design
[Figure: call graph of the UDP implementation]
Higher Layers: socket.c (sock_sendmsg), sock.c (sock_queue_rcv_skb)
udp.c, send side: udp_sendmsg, udp_push_pending_frames
udp.c, receive side: udp_rcv, __udp4_lib_rcv, __udp4_lib_lookup_skb, __udp4_lib_mcast_deliver, __udp_queue_rcv_skb
Neighboring subsystems: ICMP (icmp_send), ROUTING (ip_route_output_flow), MULTICAST
ip_output.c: ip_append_data, ip_push_pending_frames
ip_input.c: ip_local_deliver_finish
UDP Proto
struct proto udp_prot = {
    .name             = "UDP",
    .owner            = THIS_MODULE,
    .close            = udp_lib_close,
    .connect          = ip4_datagram_connect,
    .disconnect       = udp_disconnect,
    .ioctl            = udp_ioctl,
    .destroy          = udp_destroy_sock,
    .setsockopt       = udp_setsockopt,
    .getsockopt       = udp_getsockopt,
    .sendmsg          = udp_sendmsg,
    .recvmsg          = udp_recvmsg,
    .sendpage         = udp_sendpage,
    .backlog_rcv      = __udp_queue_rcv_skb,
    .hash             = udp_lib_hash,
    .unhash           = udp_lib_unhash,
    .get_port         = udp_v4_get_port,
    .memory_allocated = &udp_memory_allocated,
    .sysctl_mem       = sysctl_udp_mem,
    .sysctl_wmem      = &sysctl_udp_wmem_min,
    .sysctl_rmem      = &sysctl_udp_rmem_min,
    .obj_size         = sizeof(struct udp_sock),
    .slab_flags       = SLAB_DESTROY_BY_RCU,
    .h.udp_table      = &udp_table,
};
udp_table
/**
 * struct udp_table - UDP table
 *
 * @hash:  hash table, sockets are hashed on (local port)
 * @hash2: hash table, sockets are hashed on (local port, local address)
 * @mask:  number of slots in hash tables, minus 1
 * @log:   log2(number of slots in hash table)
 */
struct udp_table {
    struct udp_hslot *hash;
    struct udp_hslot *hash2;
    unsigned int     mask;
    unsigned int     log;
};
udp_table_init() allocates the hash tables and initializes them:
for (i = 0; i <= table->mask; i++) {
    INIT_HLIST_NULLS_HEAD(&table->hash[i].head, i);
    table->hash[i].count = 0;
    spin_lock_init(&table->hash[i].lock);
}
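The role of mask and log is easier to see in a stripped-down user-space model. The sketch below is an assumption-laden toy, not the kernel code: all `toy_` names are invented, there is only the port-keyed table, the slot index is simply `port & mask` (the kernel also mixes in the network namespace), and there is no locking or RCU.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct toy_sock {
    uint16_t port;               /* local port the socket is bound to */
    struct toy_sock *next;       /* chain within one hash slot */
};

struct toy_hslot {
    struct toy_sock *head;
    unsigned int count;
};

struct toy_udp_table {
    struct toy_hslot *hash;
    unsigned int mask;           /* number of slots, minus 1 */
    unsigned int log;            /* log2(number of slots) */
};

/* Allocate 2^log empty slots. Returns 0 on success, -1 on failure. */
static int toy_table_init(struct toy_udp_table *t, unsigned int log)
{
    assert(log < 16);
    t->log = log;
    t->mask = (1u << log) - 1;
    t->hash = calloc((size_t)t->mask + 1, sizeof(*t->hash));
    return t->hash ? 0 : -1;
}

/* Because the slot count is a power of two, masking replaces a modulo. */
static unsigned int toy_hashfn(const struct toy_udp_table *t, uint16_t port)
{
    return port & t->mask;
}

static void toy_hash_sock(struct toy_udp_table *t, struct toy_sock *sk)
{
    struct toy_hslot *slot = &t->hash[toy_hashfn(t, sk->port)];

    sk->next = slot->head;
    slot->head = sk;
    slot->count++;
}

static struct toy_sock *toy_lookup(const struct toy_udp_table *t, uint16_t port)
{
    struct toy_sock *sk;

    for (sk = t->hash[toy_hashfn(t, port)].head; sk != NULL; sk = sk->next)
        if (sk->port == port)
            return sk;
    return NULL;
}
```

With 128 slots, ports 53 and 181 land in the same slot, so the lookup has to walk the chain and compare ports; the per-slot count tracks how long that chain is.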
Outline



UDP Layer Architecture
Receive Path
Send Path
Receiving packets in UDP
From user space, you can receive UDP traffic with three system calls:
  - recv() (when the socket is connected)
  - recvfrom()
  - recvmsg()
All three are handled by udp_recvmsg() in the kernel (the .recvmsg entry of udp_prot). udp_rcv() is the handler that IP calls for arriving packets.
Recall IP’s inet_protos
[Figure: the inet_protos[MAX_INET_PROTOS] array, slots labeled 0, 1, ..., MAX_INET_PROTOS, each pointing to a net_protocol]
net_protocol fields: handler, err_handler, gso_send_check, gso_segment, gro_receive, gro_complete
Entries shown: handler = udp_rcv() with err_handler = udp_err(); handler = igmp_rcv() with err_handler = Null
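The idea behind the table is a plain array of handler pointers indexed by IP protocol number. The following is a minimal user-space sketch of that dispatch pattern, assuming one slot per 8-bit protocol number; every `toy_` name is hypothetical, and the real net_protocol carries the additional fields listed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_PROTOS  256   /* assumption: one slot per protocol number */
#define TOY_IPPROTO_UDP 17

typedef int (*toy_handler_fn)(const uint8_t *payload, size_t len);

struct toy_net_protocol {
    toy_handler_fn handler;   /* called for each arriving packet */
};

static const struct toy_net_protocol *toy_protos[TOY_MAX_PROTOS];

/* Register a protocol. Returns -1 if the slot is already taken. */
static int toy_add_protocol(const struct toy_net_protocol *prot, uint8_t num)
{
    if (toy_protos[num] != NULL)
        return -1;
    toy_protos[num] = prot;
    return 0;
}

/* Hand a payload to the registered handler, or -1 if there is none. */
static int toy_deliver(uint8_t num, const uint8_t *payload, size_t len)
{
    const struct toy_net_protocol *prot = toy_protos[num];

    if (prot == NULL || prot->handler == NULL)
        return -1;
    return prot->handler(payload, len);
}

/* Stand-in for udp_rcv(): 0 = accepted, 1 = dropped as too short. */
static int toy_udp_rcv(const uint8_t *payload, size_t len)
{
    assert(payload != NULL);
    return len >= 8 ? 0 : 1;
}

static const struct toy_net_protocol toy_udp_protocol = { toy_udp_rcv };
```

Registering `toy_udp_protocol` under number 17 and then calling `toy_deliver(17, ...)` reaches `toy_udp_rcv()`, which mirrors how the IP layer reaches udp_rcv() through the table.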
Receive Path: udp_rcv
Calls __udp4_lib_rcv(skb, &udp_table, IPPROTO_UDP);
  - Function is used by both UDP and UDP-Lite
[Figure: receive-side portion of the UDP Implementation Design call graph]
Receive: __udp4_lib_rcv
Looks up the route table from the skb
Checks that skb has a header
Checks that length is good
Calculates the checksum
Pulls out saddr, daddr
Checks if address is multicast
  - If so, calls __udp4_lib_mcast_deliver()
[Figure: receive-side portion of the UDP Implementation Design call graph]
Receive: __udp4_lib_rcv (cont)
Looks up the socket in the udptable
  - Via __udp4_lib_lookup_skb()
  - Increases refcount on the sk (socket)
If socket is found
  - Calls __udp_queue_rcv_skb()
  - Decrements refcount with sock_put(sk)
If not,
  - Sends ICMP_DEST_UNREACH (code ICMP_PORT_UNREACH) via icmp_send()
  - Drops the packet
[Figure: receive-side portion of the UDP Implementation Design call graph]
Recv: __udp_queue_rcv_skb
Calls sock_queue_rcv_skb
Increments some statistics
[Figure: receive-side portion of the UDP Implementation Design call graph]
Outline



UDP Layer Architecture
Receive Path
Send Path
Sending packets in UDP
From user space, you can send UDP traffic with three system calls:
  - send() (when the socket is connected)
  - sendto()
  - sendmsg()
All three are handled by udp_sendmsg() in the kernel.
  - udp_sendmsg() is much simpler than its TCP counterpart, tcp_sendmsg().
  - udp_sendpage() is called when user space calls sendfile() (to copy a file into a UDP socket).
sendfile() can also be used to copy data between one file descriptor and another.
udp_sendpage() invokes udp_sendmsg().
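For reference, the user-space side of these calls can be exercised over loopback. The sketch below is one possible arrangement under stated assumptions: the helper names `udp_bind_loopback()` and `udp_roundtrip()` are invented, the receiver binds to 127.0.0.1 on a kernel-chosen port, and a two-second receive timeout keeps a failure from blocking forever.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Bind a UDP socket to 127.0.0.1 on an ephemeral port and report the
 * chosen address in *addr. Returns the descriptor, or -1 on error. */
static int udp_bind_loopback(struct sockaddr_in *addr)
{
    socklen_t alen = sizeof(*addr);
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(fd, (struct sockaddr *)addr, sizeof(*addr)) < 0 ||
        getsockname(fd, (struct sockaddr *)addr, &alen) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Send one datagram with sendto() and read it back with recvfrom().
 * Returns the number of bytes received, or -1 on error. */
static ssize_t udp_roundtrip(const void *msg, size_t len,
                             void *out, size_t outlen)
{
    struct sockaddr_in dst, from;
    socklen_t fromlen = sizeof(from);
    struct timeval tv = { 2, 0 };
    ssize_t n = -1;
    int rx = udp_bind_loopback(&dst);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);

    assert(msg != NULL && out != NULL);
    if (rx >= 0 && tx >= 0 &&
        setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == 0 &&
        sendto(tx, msg, len, 0, (struct sockaddr *)&dst, sizeof(dst)) == (ssize_t)len)
        n = recvfrom(rx, out, outlen, 0, (struct sockaddr *)&from, &fromlen);
    if (rx >= 0)
        close(rx);
    if (tx >= 0)
        close(tx);
    return n;
}
```

The sender here is unconnected, so it must use sendto(); calling connect() first would allow plain send().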
UDP Socket Options
For the IPPROTO_UDP/SOL_UDP level, there is a socket option UDP_CORK.
Added in Linux kernel 2.5.44.
int state = 1;
setsockopt(s, IPPROTO_UDP, UDP_CORK, &state, sizeof(state));
for (j = 0; j < 1000; j++)
    sendto(s, buf1, ...);
state = 0;
setsockopt(s, IPPROTO_UDP, UDP_CORK, &state, sizeof(state));
UDP_CORK (cont)
The above code fragment will call udp_sendmsg() 1000 times without actually sending anything on the wire (in the usual case, without the UDP_CORK setsockopt(), 1000 packets would be sent).
Only after the second setsockopt() is called, with UDP_CORK and state=0, is one packet sent on the wire.
Kernel implementation: when using UDP_CORK, udp_sendmsg() passes MSG_MORE to ip_append_data().
If your glibc headers do not define UDP_CORK (older versions did not), add it to your program:
#define UDP_CORK 1
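A complete, Linux-only version of the corking experiment might look like the sketch below. It is an illustration under assumptions, not a reference implementation: `udp_cork_demo()` is an invented helper, the traffic goes over loopback between two sockets in the same process, and the pieces are kept small so that their total stays under the maximum UDP datagram size.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <netinet/udp.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef UDP_CORK
#define UDP_CORK 1   /* value from the slide, for headers that lack it */
#endif

/* Send `pieces` chunks of `piece_len` bytes through a corked UDP socket,
 * then uncork. Returns how many datagrams the receiver saw and stores the
 * size of the first one in *first_len. */
static int udp_cork_demo(int pieces, size_t piece_len, size_t *first_len)
{
    static char rxbuf[65536];
    char piece[256];
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);
    struct timeval tv = { 2, 0 };        /* do not block forever on failure */
    int on = 1, off = 0, datagrams = 0, i;
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);
    ssize_t n;

    assert(piece_len <= sizeof(piece));
    memset(piece, 'x', sizeof(piece));
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* port 0: kernel picks */

    if (rx < 0 || tx < 0 ||
        bind(rx, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getsockname(rx, (struct sockaddr *)&addr, &alen) < 0 ||
        connect(tx, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0 ||
        setsockopt(tx, IPPROTO_UDP, UDP_CORK, &on, sizeof(on)) < 0)
        goto out;

    for (i = 0; i < pieces; i++)         /* queued; nothing is sent yet */
        if (send(tx, piece, piece_len, 0) < 0)
            goto out;

    /* Clearing the cork flushes the queued data as one datagram. */
    if (setsockopt(tx, IPPROTO_UDP, UDP_CORK, &off, sizeof(off)) < 0)
        goto out;

    n = recv(rx, rxbuf, sizeof(rxbuf), 0);
    if (n >= 0) {
        *first_len = (size_t)n;
        datagrams = 1;
        while (recv(rx, rxbuf, sizeof(rxbuf), MSG_DONTWAIT) >= 0)
            datagrams++;                 /* count any further datagrams */
    }
out:
    if (rx >= 0)
        close(rx);
    if (tx >= 0)
        close(tx);
    return datagrams;
}
```

Calling `udp_cork_demo(10, 100, &len)` is expected to report a single datagram carrying all 1000 bytes, whereas the same ten send() calls without the cork would arrive as ten separate datagrams.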
Send Path: udp_sendmsg()
Checks length, MSG_OOB
Checks if there are frames pending
  - If so, jump to do_append_data
Gets the address
Checks if socket is connected
  - If so, pull routing info out of sk
  - Otherwise, look up via ip_route_output_flow()
Calls ip_append_data()
  - Handles fragmentation
Calls udp_push_pending_frames()
[Figure: send-side portion of the UDP Implementation Design call graph: sock_sendmsg (socket.c); udp_sendmsg and udp_push_pending_frames (udp.c); ip_route_output_flow (ROUTING); ip_append_data and ip_push_pending_frames (ip_output.c)]
udp_push_pending_frames()
Checks via skb_peek() that there is a pending skb on the socket's write queue
  - If not, goto out and bail
Creates UDP header
Checksums if necessary (or partially for UDP-Lite)
Calls ip_push_pending_frames()
  - Combines all pending IP fragments on the socket as one IP datagram and sends it out
[Figure: send-side portion of the UDP Implementation Design call graph: sock_sendmsg (socket.c); udp_sendmsg and udp_push_pending_frames (udp.c); ip_route_output_flow (ROUTING); ip_append_data and ip_push_pending_frames (ip_output.c)]
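The "creates UDP header, checksums if necessary" step can be illustrated in user space. The sketch below is a simplified stand-in rather than the kernel routine: it fills the length field and computes the full UDP checksum over the IPv4 pseudo-header plus the UDP segment, ignoring hardware offload and the UDP-Lite partial case. The function names are hypothetical, and addresses are passed as host-order 32-bit integers for convenience.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_IPPROTO_UDP 17

/* Add len bytes, read as big-endian 16-bit words, to a running sum. */
static uint32_t sum_be16(const uint8_t *p, size_t len, uint32_t sum)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)p[i] << 8) | p[i + 1];
    if (i < len)                          /* odd trailing byte, zero padded */
        sum += (uint32_t)p[i] << 8;
    return sum;
}

/* Checksum over pseudo-header (addresses, protocol, length) and segment.
 * Returns 0 when run over a segment whose checksum field is correct. */
static uint16_t udp_checksum(uint32_t saddr, uint32_t daddr,
                             const uint8_t *seg, size_t len)
{
    uint8_t pseudo[12] = {
        (uint8_t)(saddr >> 24), (uint8_t)(saddr >> 16),
        (uint8_t)(saddr >> 8),  (uint8_t)saddr,
        (uint8_t)(daddr >> 24), (uint8_t)(daddr >> 16),
        (uint8_t)(daddr >> 8),  (uint8_t)daddr,
        0, TOY_IPPROTO_UDP,
        (uint8_t)(len >> 8), (uint8_t)len
    };
    uint32_t sum;

    assert(len >= 8 && len <= 0xffff);
    sum = sum_be16(pseudo, sizeof(pseudo), 0);
    sum = sum_be16(seg, len, sum);
    while (sum >> 16)                     /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Fill in the length and checksum fields of a segment whose ports and
 * payload are already in place. */
static void udp_finish_header(uint32_t saddr, uint32_t daddr,
                              uint8_t *seg, size_t len)
{
    uint16_t csum;

    seg[4] = (uint8_t)(len >> 8);
    seg[5] = (uint8_t)len;
    seg[6] = seg[7] = 0;                  /* field is zero while summing */
    csum = udp_checksum(saddr, daddr, seg, len);
    if (csum == 0)
        csum = 0xffff;                    /* 0 on the wire means "no checksum" */
    seg[6] = (uint8_t)(csum >> 8);
    seg[7] = (uint8_t)csum;
}
```

A receiver can validate a segment by running `udp_checksum()` over it with the checksum field left in place; a result of 0 means the segment is intact.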
UDP Backup
Recall the sk_buff structure
[Figure: sk_buff layout, from linux-2.6.31/include/linux/skbuff.h]
sk_buff_head: heads the list of sk_buffs
sk_buff fields shown: next, prev, sk (struct sock), tstamp, dev (net_device), ...lots of stuff..., transport_header, network_header, mac_header, head, data, tail, end, truesize, users
Packet data: headroom, MAC-Header, IP-Header, UDP-Header, UDP-Data, tailroom
skb_shared_info: dataref: 1, nr_frags, ..., destructor_arg
Recall pkt_type in sk_buff
pkt_type: specifies the type of a packet
  - PACKET_HOST: a packet sent to the local host
  - PACKET_BROADCAST: a broadcast packet
  - PACKET_MULTICAST: a multicast packet
  - PACKET_OTHERHOST: a packet not destined for the local host, but received in promiscuous mode
  - PACKET_OUTGOING: a packet leaving the host
  - PACKET_LOOPBACK: a packet sent by the local host to itself