Linux Network Troubleshooting: Difference between revisions
Revision as of 13:45, 31 July 2017
Internal
Overview
This page needs to be reviewed and re-organized.
Organizatorium
- TCP zero windows sent when application receive buffer is empty https://access.redhat.com/solutions/2480661
Packet Capture and Analysis
tcpdump -s 0 -i eno16780032 -w /tmp/$HOSTNAME.pcap
Network Monitoring
On each node, run the monitor.sh script: https://access.redhat.com/articles/1311173. The script records OS network statistics at a set interval, which makes it possible to monitor changes over time and to correlate those changes with packet capture data.
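The idea of interval-based recording can be sketched as follows. This is not the Red Hat monitor.sh script, only a minimal illustration of the technique; the function name sample_counters, its arguments and the snapshot file naming are hypothetical.

```shell
# Hypothetical helper, NOT the Red Hat monitor.sh script.
# Usage: sample_counters OUTDIR INTERVAL_SECONDS COUNT [SOURCE_FILE]
# Writes COUNT snapshots of SOURCE_FILE (default /proc/net/dev) into OUTDIR.
sample_counters() {
    outdir=$1 interval=$2 count=$3 src=${4:-/proc/net/dev}
    mkdir -p "$outdir" || return 1
    i=0
    while [ "$i" -lt "$count" ]; do
        # One snapshot per iteration, named by sequence number and epoch time
        # so it can later be lined up with packet capture timestamps.
        cat "$src" > "$outdir/$(printf '%04d' "$i")-$(date +%s).txt" || return 1
        i=$((i + 1))
        # Sleep between snapshots, but not after the last one.
        [ "$i" -lt "$count" ] && sleep "$interval"
    done
    return 0
}
```

For example, sample_counters /tmp/netstats 10 360 would take a snapshot every 10 seconds for roughly one hour.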
Network Driver Error Messages
grep vmxnet3 sos_commands/kernel/dmesg
[ 5.731844] VMware vmxnet3 virtual NIC driver - version 1.1.30.0-k-NAPI
[ 5.731858] vmxnet3 0000:0b:00.0: # of Tx queues : 4, # of Rx queues : 4
[ 5.737730] vmxnet3 0000:0b:00.0: irq 72 for MSI/MSI-X
[ 5.737786] vmxnet3 0000:0b:00.0: irq 73 for MSI/MSI-X
[ 5.737860] vmxnet3 0000:0b:00.0: irq 74 for MSI/MSI-X
[ 5.737891] vmxnet3 0000:0b:00.0: irq 75 for MSI/MSI-X
[ 5.737916] vmxnet3 0000:0b:00.0: irq 76 for MSI/MSI-X
[ 5.738367] vmxnet3 0000:0b:00.0 eth0: NIC Link is Up 10000 Mbps
[ 8.186233] vmxnet3 0000:0b:00.0 eno16780032: intr type 3, mode 0, 5 vectors allocated
[ 8.187854] vmxnet3 0000:0b:00.0 eno16780032: NIC Link is Up 10000 Mbps
Kernel Network Parameters
cat etc/sysctl.conf
net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.conf.all.accept_redirects=0
net.ipv4.conf.default.accept_redirects=0
net.ipv4.conf.all.log_martians=1
net.ipv4.conf.default.log_martians=1
net.core.wmem_max = 12582912
net.core.rmem_max = 26214400
net.ipv4.tcp_rmem = 10240 87380 26214400
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_sack = 1
net.ipv4.tcp_no_metrics_save = 1
net.core.netdev_max_backlog = 5000
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0
net.ipv6.conf.all.disable_ipv6 = 1
kernel.sysrq = 0
kernel.core_uses_pid = 1
net.ipv4.tcp_syncookies = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.shmmax = 68719476736
kernel.shmall = 4294967296
fs.suid_dumpable = 0
Also see Kernel Runtime Configuration.
Inspect packet loss
ethtool -S
awk '($NF !~ "^0$") {print}' sos_commands/networking/ethtool_-S_eno16780032 | egrep -v "[u,m,b]cast|LRO pkts rx|[LR,TS]O byte(s)?|[LR,TS]O pkts|pkts linearized"
NIC statistics:
Tx Queue#: 1
Tx Queue#: 2
Tx Queue#: 3
Rx Queue#: 1
  pkts rx OOB: 45
  drv dropped rx total: 29
  err: 29
Rx Queue#: 2
Rx Queue#: 3
RX Drops
- ifconfig packet drops reported after upgrading to RHEL7 https://access.redhat.com/solutions/2073223
proc/net/dev
cat proc/net/dev
Inter-| Receive | Transmit
face |bytes packets errs drop fifo frame compressed multicast|bytes packets errs drop fifo colls carrier compressed
lo: 727497676 2498032 0 0 0 0 0 0 727497676 2498032 0 0 0 0 0 0
eno16780032: 216193874050 702019404 0 2658277 0 0 0 87265004 195315249141 549883330 0 0 0 0 0 0
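To put the drop counter in proportion, the receive drop count can be compared with the receive packet count for each interface. The following is a minimal sketch that assumes the standard proc/net/dev layout shown above (two header lines, then one line per interface); the function name rx_drop_pct is hypothetical.

```shell
# Hypothetical helper: print "<interface> <receive drop percentage>" for each
# interface listed in a /proc/net/dev-style file.
# Counters after the "iface:" prefix: rx bytes, packets, errs, drop, ...
rx_drop_pct() {
    awk 'NR > 2 {
        sub(/^[ \t]+/, "")              # strip leading padding before the name
        split($0, halves, ":")          # interface name : counters
        n = split(halves[2], c, " ")    # c[2] = rx packets, c[4] = rx drops
        if (n >= 4 && c[2] > 0)
            printf "%s %.4f\n", halves[1], 100 * c[4] / c[2]
    }' "${1:-/proc/net/dev}"
}
```

Run it against a sosreport copy with rx_drop_pct proc/net/dev, or without arguments on a live system. For the sample above, 2658277 drops out of 702019404 received packets is about 0.38%.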
IP and TCP Diagnostics
Review the operating system's IP and TCP protocol handler statistics.
Check whether IP fragmentation is occurring. Fragmentation is normal behaviour when an application sends a datagram that exceeds the MTU (1500).
Check the number of reassembly failures caused by fragment loss. TCP divides data into MSS-sized segments, which should not require IP fragmentation, so fragmentation is most likely caused by UDP traffic.
netstat -s
Ip:
702329554 total packets received
0 forwarded
0 incoming packets discarded
699283912 incoming packets delivered
550941269 requests sent out
16 dropped because of missing route
5 fragments dropped after timeout
4372810 reassemblies required
1336065 packets reassembled ok
7 packet reassembles failed
618990 fragments received ok
1856970 fragments created
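The fragmentation counters can be pulled out of the netstat -s output so they are easier to compare between samples. This is a rough sketch that assumes the net-tools wording shown above and one counter per line; the function name ip_frag_summary and the key names in its output are hypothetical.

```shell
# Hypothetical helper: read `netstat -s` output on stdin and print the
# fragmentation counters from the "Ip:" section on one line.
ip_frag_summary() {
    awk '
        BEGIN { ok = 0; failed = 0; timeout = 0; created = 0 }
        # Section headers look like "Ip:" or "Tcp:"; track whether we are in Ip.
        /^[A-Za-z0-9]+:$/ { in_ip = ($0 == "Ip:"); next }
        in_ip && /fragments dropped after timeout/ { timeout = $1 }
        in_ip && /packets reassembled ok/          { ok = $1 }
        in_ip && /packet reassembles failed/       { failed = $1 }
        in_ip && /fragments created/               { created = $1 }
        END {
            printf "reassembled_ok=%s reassembly_failed=%s timeout_drops=%s fragments_created=%s\n", ok, failed, timeout, created
        }'
}
```

Typical use is netstat -s | ip_frag_summary, or feeding it a saved copy of the netstat -s output. Counters that are absent from the input are reported as 0.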
Check the rate of TCP retransmissions. A low rate is a sign that the network infrastructure is healthy. Packet loss or high latency anywhere in the environment, whatever the cause, shows up as a high rate of TCP retransmissions.
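The retransmission rate can be estimated as retransmitted segments divided by segments sent, both taken from the Tcp section of netstat -s. That section is not shown on this page, so the counter wording is an assumption: net-tools prints "segments send out" and "segments retransmited" in older versions, and the patterns below also accept the corrected spellings. The function name tcp_retrans_pct is hypothetical.

```shell
# Hypothetical helper: read `netstat -s` output on stdin and print the
# percentage of sent TCP segments that were retransmitted.
# Assumed counter wording: "segments send out" / "segments retransmited"
# (older net-tools) or "segments sent out" / "segments retransmitted".
tcp_retrans_pct() {
    awk '
        /segments sen[dt] out/   { sent = $1 }
        /segments retransmit+ed/ { retrans = $1 }
        END {
            if (sent > 0) printf "%.4f\n", 100 * retrans / sent
            else          print "n/a"
        }'
}
```

Typical use is netstat -s | tcp_retrans_pct. The counters are cumulative since boot, so comparing two samples taken some time apart gives a better picture than a single reading.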
Check for socket buffer overflows.
Check for listen queue overflows.
netstat -s | egrep "pruned|collapsed|overflowed"
542 packets pruned from receive queue because of socket buffer overrun
4 packets pruned from receive queue
1197 packets collapsed in receive queue due to low socket buffer