By default, Ethernet has a variable frame size of up to 1,500 bytes. The Maximum Transmission Unit (MTU) defines this upper bound and defaults to 1,500 bytes. When data is sent across the network, it is broken into pieces no larger than the MTU. Right away, we can see a problem with the MTU limitation for Oracle RAC's Cluster Interconnect. Many Oracle databases are configured with a database block size of 8KB. If one block needs to be transferred across the private network for Cache Fusion purposes, the 8KB block will be broken into six frames. Even with a 2KB block size, the block will be split into two frames. Those pieces need to be reassembled when they arrive at the destination. To make matters worse, the maximum amount of data Oracle will attempt to transmit is defined by multiplying the db_block_size initialization parameter by the db_file_multiblock_read_count parameter. A block size of 8KB taken 128 blocks at a time leads to 1 megabyte of data needing to be transferred.
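To see where these numbers come from on a given database, the two parameters can be checked from SQL*Plus; the commands below are a quick sketch, and the values on your system may of course differ.
SQL> show parameter db_block_size
SQL> show parameter db_file_multiblock_read_count
Multiplying the two (for example, 8,192 bytes × 128 = 1,048,576 bytes) gives the largest single transfer Oracle may request, which then has to be carved into MTU-sized frames on the wire.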
Jumbo Frames allows an MTU of up to 9,000 bytes. Unfortunately, Jumbo Frames is not available on all platforms. Not only does the OS need to support Jumbo Frames, but the network cards in the servers and the switch behind the private network must support it as well. Many of today's NICs and switches do support Jumbo Frames, but Jumbo Frames is not an IEEE standard, so different implementations exist that may not all work well together. Not all configurations will support the larger MTU size. When configuring the network pieces, it is important to remember that the smallest MTU of any component in the route is the maximum MTU from point A to point B. You can have the network cards configured to support 9,000 bytes, but if the switch is configured for an MTU of 1,500 bytes, then Jumbo Frames won't be used. InfiniBand supports Jumbo Frames up to 65,000 bytes.
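One quick way to test whether every component in the route honors a larger MTU is to ping the remote private address with the don't-fragment bit set. The following is a sketch using the host names from this chapter; the payload is 8,972 bytes because the IP and ICMP headers add another 28 bytes, bringing the packet to exactly 9,000 bytes. If any device in the path is still limited to 1,500 bytes, the ping will report that fragmentation is needed.
[root@host01 ~]# ping -M do -s 8972 -c 3 host02-priv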
It is out of the scope of this book to provide direction on how to enable Jumbo Frames in the network switch. You should talk with your network administrator, who may, in turn, have to consult the switch vendor's documentation for more details. On the OS network interface side, it is easy to configure the larger frame size. The following examples are from Oracle Linux 6. First, we need to determine which device is used for the Cluster Interconnect.
[root@host01 ~]# oifcfg getif
eth0 192.168.56.0 global public
eth1 192.168.10.0 global cluster_interconnect
The eth1 device supports the private network. Now we configure the larger MTU size.
[root@host01 ~]# ifconfig eth1 mtu 9000
[root@host01 ~]# vi /etc/sysconfig/network-scripts/ifcfg-eth1
In the ifcfg-eth1 file, one line is added that reads MTU=9000 so that the setting persists when the server is restarted.
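The resulting file might look something like the following. The device, protocol, and address entries are illustrative and should simply match whatever is already in your ifcfg-eth1 file; the only new line is MTU=9000.
[root@host01 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth1
# existing entries are kept as-is; only the MTU line is added
DEVICE=eth1
BOOTPROTO=static
IPADDR=192.168.10.1
NETMASK=255.255.255.0
ONBOOT=yes
MTU=9000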
The interface is verified to ensure the larger MTU is used.
[root@host01 ~]# ifconfig -a
eth0 Link encap:Ethernet HWaddr 08:00:27:98:EA:FE
inet addr:192.168.56.71 Bcast:192.168.56.255 Mask:255.255.255.0
inet6 addr: fe80::a00:27ff:fe98:eafe/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:3749 errors:0 dropped:0 overruns:0 frame:0
TX packets:3590 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:743396 (725.9 KiB) TX bytes:623620 (609.0 KiB)
eth1 Link encap:Ethernet HWaddr 08:00:27:54:73:8F
inet addr:192.168.10.1 Bcast:192.168.10.255 Mask:255.255.255.0
inet6 addr: fe80::a00:27ff:fe54:738f/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:9000 Metric:1
RX packets:268585 errors:0 dropped:0 overruns:0 frame:0
TX packets:106426 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1699904418 (1.5 GiB) TX bytes:77571961 (73.9 MiB)
Notice that device eth1 has the larger MTU setting. The traceroute utility can be used to verify the largest possible packet size.
[root@host01 ~]# traceroute host02-priv --mtu
traceroute to host02-priv (192.168.10.2), 30 hops max, 9000 byte packets
1 host02-priv.localdomain (192.168.10.2) 0.154 ms F=9000 0.231 ms 0.183 ms
Next, a 9,000 byte packet is sent along the route. The -F option ensures the packet is not broken into smaller frames.
[root@host01 ~]# traceroute -F host02-priv 9000
traceroute to host02-priv (192.168.10.2), 30 hops max, 9000 byte packets
1 host02-priv.localdomain (192.168.10.2) 0.495 ms 0.261 ms 0.141 ms
The route worked successfully.
Now a packet one byte larger is sent along the route.
[root@host01 ~]# traceroute -F host02-priv 9001
too big packetlen 9001 specified
The error from the traceroute utility shows that a packet of 9,001 bytes is too big. These steps verify that Jumbo Frames is working. Let's verify that the change improved the usable bandwidth on the Cluster Interconnect. To do that, the iperf utility is used. The iperf utility can send a specific buffer length with the -l parameter. The public interface is not configured for Jumbo Frames, and no applications are connecting to the nodes, so the public network can be used as a baseline.
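These tests assume the iperf server side is already listening on host01. If it is not, it can be started there before running the client commands shown below; by default, iperf listens on TCP port 5001, which matches the port seen in the client output.
[root@host01 ~]# iperf -s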
[root@host02 ~]# iperf -c host01 -l 9000
------------------------------------------------------------
Client connecting to host01, TCP port 5001
TCP window size: 22.9 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.56.72 port 18222 connected with 192.168.56.71 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 923 MBytes 774 Mbits/sec
The same test is repeated for the private network with Jumbo Frames enabled.
[root@host02 ~]# iperf -c host01-priv -l 9000
------------------------------------------------------------
Client connecting to host01-priv, TCP port 5001
TCP window size: 96.1 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.10.2 port 40817 connected with 192.168.10.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 1.28 GBytes 1.10 Gbits/sec
Here we see that the bandwidth increased from 774 Mbits/sec to 1.10 Gbits/sec, roughly a 42% increase! Over the same 10 second interval, the amount of data transferred rose correspondingly, from 923 megabytes to 1.28 gigabytes.
If the Oracle RAC system uses Ethernet (Gig-E or 10Gig-E) for the Cluster Interconnect, the recommendation is to leverage Jumbo Frames for the private network. It is less common to employ Jumbo Frames on the public network interfaces. Jumbo Frames requires that all network components, from end to end, support the larger MTU size. In some cases it can be tricky to diagnose why Jumbo Frames is not working in the system, but even then, the effort is well worth the cost.