Linux Network Troubleshooting
Internal
Overview
This page needs to be reviewed and re-organized.
Organizatorium
- TCP zero windows sent when application receive buffer is empty https://access.redhat.com/solutions/2480661
Packet Capture and Analysis
tcpdump -s 0 -i eno16780032 -w /tmp/$HOSTNAME.pcap
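With -s 0, tcpdump captures whole packets instead of truncating them to a shorter snapshot length. To confirm this on the resulting file, compare each record's captured length with its original length. The sketch below is a hypothetical helper (pcap_summary is not part of tcpdump or any library). It reads the classic libpcap file format using only the Python standard library and assumes the file is pcap, not pcapng.

```python
import struct

# Classic libpcap magic numbers (microsecond and nanosecond timestamp variants).
PCAP_MAGICS = {0xA1B2C3D4, 0xA1B23C4D}


def pcap_summary(data: bytes) -> dict:
    """Count packets in a classic pcap file and how many were truncated.

    A record is truncated when its captured length (incl_len) is smaller
    than the original on-the-wire length (orig_len).
    """
    if len(data) < 24:
        raise ValueError("too short for a pcap global header")
    # Detect the byte order the capture was written in from the magic number.
    for endian in ("<", ">"):
        (magic,) = struct.unpack(endian + "I", data[:4])
        if magic in PCAP_MAGICS:
            break
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    # Global header: magic, version major/minor, thiszone, sigfigs, snaplen, network.
    (snaplen,) = struct.unpack(endian + "I", data[16:20])
    packets = truncated = 0
    offset = 24
    while offset + 16 <= len(data):
        # Record header: ts_sec, ts_usec (or ts_nsec), incl_len, orig_len.
        _sec, _frac, incl_len, orig_len = struct.unpack(
            endian + "IIII", data[offset:offset + 16])
        offset += 16 + incl_len  # skip the record header and the packet bytes
        packets += 1
        if incl_len < orig_len:
            truncated += 1
    return {"snaplen": snaplen, "packets": packets, "truncated": truncated}
```

For example, pcap_summary(open("/tmp/node1.pcap", "rb").read()) on a hypothetical capture file returns the header snaplen, the packet count and the number of truncated records. A non-zero truncated count means full packets were not kept. For very large captures, read the file in chunks instead of loading it all into memory.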
Network Monitoring
On each node, run the monitor.sh script from https://access.redhat.com/articles/1311173. The script records OS network statistics at a fixed interval. You can then track how they change over time and correlate those changes with the packet capture data.
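monitor.sh is a shell script. The idea behind it is to sample cumulative counters at a fixed interval and look at the change per interval. That idea can be sketched in Python as below. This is not monitor.sh. The read callback is a hypothetical placeholder for whatever collects a snapshot, for example a parser for /proc/net/dev or netstat -s.

```python
import time
from typing import Callable, Dict, Iterator


def counter_rates(before: Dict[str, int], after: Dict[str, int],
                  interval_s: float) -> Dict[str, float]:
    """Per-second rate for every counter that increased between two samples.

    Counters that went down (reset, e.g. after a driver reload, or wrapped)
    are skipped instead of being reported as negative rates.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    rates = {}
    for name, old in before.items():
        new = after.get(name)
        if new is not None and new > old:
            rates[name] = (new - old) / interval_s
    return rates


def sample(read: Callable[[], Dict[str, int]], interval_s: float,
           count: int) -> Iterator[Dict[str, float]]:
    """Call `read` every `interval_s` seconds and yield the rates per interval.

    Rates use the measured elapsed time (time.monotonic), not the nominal
    interval, so a slow `read` does not inflate them.
    """
    prev, prev_t = read(), time.monotonic()
    for _ in range(count):
        time.sleep(interval_s)
        cur, cur_t = read(), time.monotonic()
        yield counter_rates(prev, cur, cur_t - prev_t)
        prev, prev_t = cur, cur_t
```

Only counters that moved appear in each result. Timestamping each yielded dict lets you line up bursts of drops with the same time window in the packet capture.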
Network Driver Error Messages
grep vmxnet3 sos_commands/kernel/dmesg
[ 5.731844] VMware vmxnet3 virtual NIC driver - version 1.1.30.0-k-NAPI
[ 5.731858] vmxnet3 0000:0b:00.0: # of Tx queues : 4, # of Rx queues : 4
[ 5.737730] vmxnet3 0000:0b:00.0: irq 72 for MSI/MSI-X
[ 5.737786] vmxnet3 0000:0b:00.0: irq 73 for MSI/MSI-X
[ 5.737860] vmxnet3 0000:0b:00.0: irq 74 for MSI/MSI-X
[ 5.737891] vmxnet3 0000:0b:00.0: irq 75 for MSI/MSI-X
[ 5.737916] vmxnet3 0000:0b:00.0: irq 76 for MSI/MSI-X
[ 5.738367] vmxnet3 0000:0b:00.0 eth0: NIC Link is Up 10000 Mbps
[ 8.186233] vmxnet3 0000:0b:00.0 eno16780032: intr type 3, mode 0, 5 vectors allocated
[ 8.187854] vmxnet3 0000:0b:00.0 eno16780032: NIC Link is Up 10000 Mbps
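The output above shows only normal initialisation and link-up messages. On a longer dmesg it can help to flag lines that look like problems. This is a minimal sketch. The keyword list in SUSPECT is an assumption for illustration, not a list of actual vmxnet3 error strings, and driver_messages is a hypothetical name.

```python
import re
from typing import Iterable, Iterator, Optional, Tuple

# A dmesg line: "[   seconds.fraction] message"; the timestamp is optional.
DMESG_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")

# Assumption: substrings worth a second look, not an official list.
SUSPECT = ("error", "fail", "timeout", "hang", "reset", "link is down")


def driver_messages(lines: Iterable[str], driver: str = "vmxnet3"
                    ) -> Iterator[Tuple[Optional[float], str, bool]]:
    """Yield (timestamp, message, suspect) for each line mentioning `driver`."""
    for line in lines:
        if driver not in line:
            continue
        line = line.strip()
        m = DMESG_RE.match(line)
        ts, msg = (float(m.group(1)), m.group(2)) if m else (None, line)
        # Case-insensitive keyword match on the message text only.
        yield ts, msg, any(word in msg.lower() for word in SUSPECT)
```

Pass it the lines of sos_commands/kernel/dmesg (or dmesg output on a live node), then review every entry whose third element is True.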
Kernel Network Parameters
cat etc/sysctl.conf
net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.conf.all.accept_redirects=0
net.ipv4.conf.default.accept_redirects=0
net.ipv4.conf.all.log_martians=1
net.ipv4.conf.default.log_martians=1
net.core.wmem_max = 12582912
net.core.rmem_max = 26214400
net.ipv4.tcp_rmem = 10240 87380 26214400
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_sack = 1
net.ipv4.tcp_no_metrics_save = 1
net.core.netdev_max_backlog = 5000
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0
net.ipv6.conf.all.disable_ipv6 = 1
kernel.sysrq = 0
kernel.core_uses_pid = 1
net.ipv4.tcp_syncookies = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.shmmax = 68719476736
kernel.shmall = 4294967296
fs.suid_dumpable = 0
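sysctl.conf shows the configured values, but the running kernel can differ. This can happen, for example, if the file was edited without being reloaded or if another sysctl file overrides a key. This sketch assumes it runs on the live node, not against sosreport data. It parses the file, maps each key to its /proc/sys path by replacing dots with slashes, and reports keys whose live value differs. The function names are hypothetical.

```python
from pathlib import Path
from typing import Dict, Tuple


def parse_sysctl_conf(text: str) -> Dict[str, str]:
    """Parse sysctl.conf-style 'key = value' lines into a dict.

    Blank lines and comments (# or ;) are skipped, spaces around '=' are
    optional (both forms appear above), and later assignments win.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line
        # Collapse internal whitespace so multi-value keys compare cleanly.
        settings[key.strip()] = " ".join(value.split())
    return settings


def proc_path(key: str) -> Path:
    """Map a sysctl key to its /proc/sys file.

    Simplification: keys containing dotted interface names (e.g. VLANs)
    are not handled.
    """
    return Path("/proc/sys") / key.replace(".", "/")


def drift(settings: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Return {key: (configured, running)} for keys whose live value differs."""
    out = {}
    for key, want in settings.items():
        try:
            # /proc/sys separates multiple values with tabs; normalise them.
            have = " ".join(proc_path(key).read_text().split())
        except OSError:
            continue  # key absent on this kernel, or not readable
        if have != want:
            out[key] = (want, have)
    return out
```

For example, drift(parse_sysctl_conf(open("/etc/sysctl.conf").read())) lists every key from the file whose running value differs, with both values side by side.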
Also see
Inspect packet loss
ethtool -S
awk '($NF !~ "^0$") {print}' sos_commands/networking/ethtool_-S_eno16780032 | grep -Ev "[umb]cast|(LRO|TSO) (bytes?|pkts)|pkts linearized"
NIC statistics:
     Tx Queue#: 1
     Tx Queue#: 2
     Tx Queue#: 3
     Rx Queue#: 1
     pkts rx OOB: 45
     drv dropped rx total: 29
        err: 29
     Rx Queue#: 2
     Rx Queue#: 3
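The awk filter drops every line whose last field is 0, and that includes the "Tx Queue#: 0" and "Rx Queue#: 0" headers. Any non-zero counters of a queue 0 would therefore appear under the previous visible header. For example, Rx queue 0 counters would appear under "Tx Queue#: 3". The sketch below keeps the headers as context instead. It assumes the vmxnet3 layout seen above; other drivers name per-queue counters differently. nonzero_by_queue is a hypothetical name, and unlike the grep above it keeps the cast/LRO/TSO counters.

```python
import re
from typing import Dict

# vmxnet3 prints a header such as "Tx Queue#: 0" before each queue's counters.
QUEUE_RE = re.compile(r"^\s*([TR]x) Queue#:\s*(\d+)\s*$")


def nonzero_by_queue(text: str) -> Dict[str, Dict[str, int]]:
    """Group non-zero `ethtool -S` counters under the queue they belong to.

    Each counter is attributed to the most recent queue header, so any
    driver-wide counters printed after the last queue land under that queue.
    """
    result: Dict[str, Dict[str, int]] = {}
    current = "global"
    for line in text.splitlines():
        m = QUEUE_RE.match(line)
        if m:
            current = f"{m.group(1)} queue {m.group(2)}"
            continue
        name, sep, value = line.rpartition(":")
        value = value.strip()
        # Skip the "NIC statistics:" title, blank lines and zero counters.
        if not sep or not value.isdigit() or int(value) == 0:
            continue
        result.setdefault(current, {})[name.strip()] = int(value)
    return result
```

Run against the unfiltered ethtool_-S_eno16780032 file, this should attribute the pkts rx OOB and drv dropped rx total counters to Rx queue 1.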
RX Drops
- ifconfig packet drops reported after upgrading to RHEL7 https://access.redhat.com/solutions/2073223
proc/net/dev
cat proc/net/dev
Inter-| Receive | Transmit
 face |bytes packets errs drop fifo frame compressed multicast|bytes packets errs drop fifo colls carrier compressed
lo: 727497676 2498032 0 0 0 0 0 0 727497676 2498032 0 0 0 0 0 0
eno16780032: 216193874050 702019404 0 2658277 0 0 0 87265004 195315249141 549883330 0 0 0 0 0 0
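To quantify the drops, parse the file and compare the receive drop count with received packets for each interface. This is a minimal sketch with hypothetical function names. Treating drop/packets as a drop rate is a rough approximation, not a kernel-defined metric.

```python
from typing import Dict


def parse_proc_net_dev(text: str) -> Dict[str, Dict[str, int]]:
    """Parse /proc/net/dev into {iface: {"rx_bytes": ..., "tx_drop": ...}}.

    Field names come from the second header line, which lists the receive
    columns and then the transmit columns, separated by '|'.
    """
    lines = text.splitlines()
    _, rx_cols, tx_cols = lines[1].split("|")
    fields = (["rx_" + c for c in rx_cols.split()]
              + ["tx_" + c for c in tx_cols.split()])
    stats = {}
    for line in lines[2:]:
        if ":" not in line:
            continue
        iface, data = line.split(":", 1)
        stats[iface.strip()] = dict(zip(fields, map(int, data.split())))
    return stats


def rx_drop_ratio(iface_stats: Dict[str, int]) -> float:
    """Fraction of received packets that were also counted as dropped."""
    packets = iface_stats["rx_packets"]
    return iface_stats["rx_drop"] / packets if packets else 0.0
```

On the sosreport data above, this gives roughly 0.38% for eno16780032 (2658277 drops against 702019404 received packets). The counters are cumulative since boot. Before treating them as loss, see the RHEL 7 article linked under RX Drops.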
IP and TCP Diagnostics
Review the kernel's IP and TCP protocol statistics.
Check whether IP fragmentation is occurring. Fragmentation is normal behaviour when an application sends a datagram larger than the MTU (1500 bytes).
Check the number of reassembly failures caused by lost fragments. TCP splits data into MSS-sized segments that fit within the MTU and should not need IP fragmentation, so fragmentation is most likely caused by UDP traffic.
netstat -s
Ip:
    702329554 total packets received
    0 forwarded
    0 incoming packets discarded
    699283912 incoming packets delivered
    550941269 requests sent out
    16 dropped because of missing route
    5 fragments dropped after timeout
    4372810 reassemblies required
    1336065 packets reassembled ok
    7 packet reassembles failed
    618990 fragments received ok
    1856970 fragments created
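To pull these numbers out of netstat -s output and relate reassembly failures to successful reassemblies, a sketch with hypothetical function names follows. The failure ratio fails/(ok + fails) is a rough indicator only. One caveat when reading the output above: netstat labels the kernel's FragOKs counter as "fragments received ok", but FragOKs counts outgoing datagrams this host fragmented successfully. Together with "fragments created", it shows the node also fragments traffic it sends.

```python
import re
from typing import Dict

# An indented netstat -s counter line: "    <number> <description>".
COUNTER_RE = re.compile(r"^\s+(\d+)\s+(.+?)\s*$")


def section_counters(netstat_s: str, section: str = "Ip") -> Dict[str, int]:
    """Return {description: count} for one section of `netstat -s` output."""
    counters: Dict[str, int] = {}
    inside = False
    for line in netstat_s.splitlines():
        if line and not line[0].isspace():
            # Section headers ("Ip:", "Icmp:", "Tcp:", ...) start in column 0.
            inside = line.rstrip() == section + ":"
            continue
        m = COUNTER_RE.match(line)
        if inside and m:
            counters[m.group(2)] = int(m.group(1))
    return counters


def reassembly_failure_ratio(ip: Dict[str, int]) -> float:
    """Failed reassemblies as a fraction of all completed attempts."""
    ok = ip.get("packets reassembled ok", 0)
    failed = ip.get("packet reassembles failed", 0)
    return failed / (ok + failed) if ok + failed else 0.0
```

For the counters above, 7 failures against 1336065 successful reassemblies works out to roughly 5 failures per million.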
Check the rate of TCP retransmissions. A low rate indicates a healthy network infrastructure. Packet loss or high latency anywhere in the environment, whatever the cause, shows up as a high retransmission rate.
Check for socket buffer overflows.
Check for listen queue overflows.
netstat -s | grep -E "pruned|collapsed|overflowed"
    542 packets pruned from receive queue because of socket buffer overrun
    4 packets pruned from receive queue
    1197 packets collapsed in receive queue due to low socket buffer
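netstat -s reads these counters from /proc/net/snmp (Tcp section) and /proc/net/netstat (TcpExt section). The sketch below reads the raw files instead to cover the three checks listed above. It uses RetransSegs divided by OutSegs as a commonly used approximation of the retransmission rate. It also reads the counters that, as far as we know, back the grepped lines: PruneCalled, RcvPruned, TCPRcvCollapsed and ListenOverflows. Function names are hypothetical.

```python
from typing import Dict


def parse_proc_snmp(text: str) -> Dict[str, int]:
    """Parse /proc/net/snmp or /proc/net/netstat into {"Proto.Field": value}.

    Both files are made of line pairs: "Proto: Name1 Name2 ..." followed by
    "Proto: value1 value2 ...".
    """
    lines = [line for line in text.splitlines() if line.strip()]
    counters: Dict[str, int] = {}
    for header, values in zip(lines[0::2], lines[1::2]):
        proto, names = header.split(":", 1)
        _, numbers = values.split(":", 1)
        for name, number in zip(names.split(), numbers.split()):
            counters[f"{proto}.{name}"] = int(number)
    return counters


def tcp_health(snmp: Dict[str, int], netstat: Dict[str, int]) -> Dict[str, float]:
    """Retransmission ratio plus receive-queue and listen-queue counters."""
    out_segs = snmp.get("Tcp.OutSegs", 0)
    retrans = snmp.get("Tcp.RetransSegs", 0)
    return {
        # Approximation: retransmitted segments per segment sent.
        "retransmit_ratio": retrans / out_segs if out_segs else 0.0,
        # "packets pruned from receive queue because of socket buffer overrun"
        "prune_called": netstat.get("TcpExt.PruneCalled", 0),
        # "packets pruned from receive queue"
        "rcv_pruned": netstat.get("TcpExt.RcvPruned", 0),
        # "packets collapsed in receive queue due to low socket buffer"
        "rcv_collapsed": netstat.get("TcpExt.TCPRcvCollapsed", 0),
        # "times the listen queue of a socket overflowed"
        "listen_overflows": netstat.get("TcpExt.ListenOverflows", 0),
    }


def live_tcp_health() -> Dict[str, float]:
    """Read the running kernel's counters (Linux only)."""
    with open("/proc/net/snmp") as f:
        snmp = parse_proc_snmp(f.read())
    with open("/proc/net/netstat") as f:
        netstat = parse_proc_snmp(f.read())
    return tcp_health(snmp, netstat)
```

All values are cumulative since boot. For a current picture, take two readings a known interval apart and compare the differences. On a live node, live_tcp_health() returns the dictionary directly.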