I recently set up a new Kubernetes cluster with Rancher2. On one node I ran into intermittent connection problems. I could reproduce them simply with:
curl -v https://registry-1.docker.io/v2/jenkins/jenkins/manifests/lts
This always shows the IP address it tries to connect to (so DNS is working), but the connection itself only succeeds some of the time.
To check which route was used, I ran:
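Because the failure is intermittent, a single curl call says little. A minimal sketch for repeating the test and counting failures could look like the following. The helper name count_failures is hypothetical, and the attempt count and timeout in the usage line are arbitrary choices, not values from the original setup.

```shell
# count_failures is a hypothetical helper: it runs the given command
# N times and prints how many of those runs exited with a non-zero status.
count_failures() {
    attempts=$1
    shift
    failed=0
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # Discard the command's output; only its exit status matters here.
        if ! "$@" >/dev/null 2>&1; then
            failed=$((failed + 1))
        fi
        i=$((i + 1))
    done
    echo "$failed of $attempts attempts failed"
}

# Example usage (20 attempts and a 5 second timeout are assumptions):
# count_failures 20 curl -s -o /dev/null --connect-timeout 5 \
#     https://registry-1.docker.io/v2/jenkins/jenkins/manifests/lts
```

Without the -f option, curl exits with a non-zero status only on transport-level errors such as a connect timeout, so an HTTP error response from the registry is not counted as a failed attempt here.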
ip route get to 184.108.40.206
But it looked fine. A traceroute also always worked:
traceroute -n -I 220.127.116.11
So I started tcpdump in order to debug the connection:
ip link   <- retrieve the ethernet device used (here: ens3)
tcpdump -i ens3 -w tcpdump.out -s 1520 port 443
For a first analysis I used tcptrace:
tcptrace tcpdump.out
The output looked like:
1 arg remaining, starting with 'tcpdump.out'
Ostermann's tcptrace -- version 6.6.7 -- Thu Nov 4, 2004
72 packets seen, 72 TCP packets traced
elapsed wallclock time: 0:00:00.026699, 2696 pkts/sec analyzed
trace file elapsed time: 0:00:21.058367
TCP connection info:
  1: xxxx.xxxx.com:46356 - xxx1.xxxx.com.com:443 (a2b) 15> 13<
  2: xxxx.xxxx.com:25526 - xxx1.xxxx.com.com:443 (c2d) 8> 4<
  3: xxxx.xxxx.com:17942 - ec2-52-22-181-254.compute-1.amazonaws.com:443 (e2f) 3> 0< (unidirectional)
  4: xxxx.xxxx.com:64676 - lb-192-30-253-112-iad.github.com:443 (g2h) 3> 0< (reset) (unidirectional)
  5: xxxx.xxxx.com:19286 - ec2-52-54-216-153.compute-1.amazonaws.com:443 (i2j) 3> 0< (unidirectional)
  6: xxxx.xxxx.com:42942 - ec2-34-200-28-105.compute-1.amazonaws.com:443 (k2l) 12> 9< (complete) (reset)
  7: xxxx.xxxx.com:10764 - xxx1.xxxx.com.com:443 (m2n) 1> 1<
Connection 6 was a working request, and connection 5 was one that never got a proper connection: three packets went out and none came back.
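With a longer capture it gets tedious to spot the failed connections by eye. As a rough sketch, the connection lines marked (unidirectional) can be filtered out with awk. The helper name list_unidirectional is hypothetical, and it assumes tcptrace prints each connection on its own line in the layout shown above.

```shell
# list_unidirectional is a hypothetical helper: it reads tcptrace's
# brief output on stdin and prints the endpoints of every connection
# that tcptrace marked as unidirectional (packets sent, none received).
list_unidirectional() {
    # A connection line looks like:
    #   5: local:19286 - remote:443 (i2j) 3> 0< (unidirectional)
    # so field 2 is the local endpoint and field 4 the remote endpoint.
    awk '/\(unidirectional\)/ { print $2, "->", $4 }'
}

# Example usage:
# tcptrace tcpdump.out | list_unidirectional
```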
To get more detailed info about a request you can use:
tcptrace -o5 tcpdump.out
The output looked like:
1 arg remaining, starting with 'tcpdump.out'
Ostermann's tcptrace -- version 6.6.7 -- Thu Nov 4, 2004
171 packets seen, 171 TCP packets traced
elapsed wallclock time: 0:00:00.020314, 8417 pkts/sec analyzed
trace file elapsed time: 0:00:50.000579
TCP connection info:
For a more detailed graphical analysis you can use Wireshark and simply load the generated file.
Looking through the packets, I found that the problem was the firewall. The ports on which I accepted ACKs were limited to 32768-65535, but the node used ports between 10000 and 20000 to make the connection.
So I lowered the minimum port to 5000; after updating the firewall, everything worked fine.
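The mismatch between the source ports in the capture and the range the firewall accepts can also be checked mechanically. The following is only a sketch: ports_outside_range is a hypothetical helper, and the port list in the usage line is copied by hand from the local ports in the tcptrace listing above.

```shell
# ports_outside_range is a hypothetical helper: it reads one port number
# per line on stdin and prints every port outside the range MIN..MAX.
ports_outside_range() {
    awk -v min="$1" -v max="$2" '$1 < min || $1 > max { print $1 }'
}

# Example usage with the firewall's original range:
# printf '%s\n' 46356 17942 64676 19286 42942 | ports_outside_range 32768 65535
```

Any port printed here is one for which the firewall would have dropped the returning packets under the original rule.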