20.03.2020 Author: Mike

I am currently setting up a new Kubernetes cluster with Rancher 2. On one node I ran into intermittent connection problems. I could reproduce them simply with:

curl -v https://registry-1.docker.io/v2/jenkins/jenkins/manifests/lts

This always shows the IP address it tries to connect to (so DNS is working), but the connection itself only succeeds some of the time.
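Because the failure is intermittent, a single curl call says little. A minimal sketch of a loop that repeats a command and counts the failed attempts could look like this. The helper name `count_failures`, the 20 attempts and the 5 second connect timeout are assumptions for illustration and not part of the original setup.

```shell
# Hypothetical helper: run a command N times and print how many runs failed.
count_failures() {
  cf_total=$1
  shift
  cf_failed=0
  cf_i=0
  while [ "$cf_i" -lt "$cf_total" ]; do
    # A non-zero exit status (e.g. curl hitting its connect timeout) counts as a failure.
    "$@" >/dev/null 2>&1 || cf_failed=$((cf_failed + 1))
    cf_i=$((cf_i + 1))
  done
  echo "$cf_failed"
}

# Example (assumed values: 20 attempts, 5 second connect timeout):
# count_failures 20 curl -s --connect-timeout 5 -o /dev/null \
#   https://registry-1.docker.io/v2/jenkins/jenkins/manifests/lts
```

Without `--fail`, curl exits with 0 even when the registry answers with an HTTP error status, so only connection-level failures are counted here.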

To check which route was used, I ran:

ip route get to 35.169.231.249

But it looked fine. Doing a traceroute also always worked:

traceroute -n -I 35.169.231.249

So I started tcpdump in order to debug the connection:

ip link    # find the Ethernet device in use (here: ens3)
tcpdump -i ens3 -w tcpdump.out -s 1520 port 443

For a first analysis I used tcptrace:

tcptrace tcpdump.out

The output looks like:

1 arg remaining, starting with 'tcpdump.out'
Ostermann's tcptrace -- version 6.6.7 -- Thu Nov  4, 2004

72 packets seen, 72 TCP packets traced
elapsed wallclock time: 0:00:00.026699, 2696 pkts/sec analyzed
trace file elapsed time: 0:00:21.058367
TCP connection info:
  1: xxxx.xxxx.com:46356 - xxx1.xxxx.com.com:443 (a2b)                       15>   13<
  2: xxxx.xxxx.com:25526 - xxx1.xxxx.com.com:443 (c2d)                        8>    4<
  3: xxxx.xxxx.com:17942 - ec2-52-22-181-254.compute-1.amazonaws.com:443 (e2f)    3>    0<  (unidirectional)
  4: xxxx.xxxx.com:64676 - lb-192-30-253-112-iad.github.com:443 (g2h)             3>    0<  (reset)  (unidirectional)
  5: xxxx.xxxx.com:19286 - ec2-52-54-216-153.compute-1.amazonaws.com:443 (i2j)    3>    0<  (unidirectional)
  6: xxxx.xxxx.com:42942 - ec2-34-200-28-105.compute-1.amazonaws.com:443 (k2l)   12>    9<  (complete)  (reset)
  7: xxxx.xxxx.com:10764 - xxx1.xxxx.com.com:443 (m2n)                        1>    1<

Connection 6 was a working request, and connection 5 was one that never got a proper connection: three packets went out and none came back (unidirectional).
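To spot a pattern in the failing connections, it can help to pull out just their local ports. The following is a rough sketch that filters tcptrace's brief listing for lines marked (unidirectional). The helper name `unidirectional_ports` is made up for this example, and it assumes the listing format shown above, where the second field is `host:port`.

```shell
# Hypothetical helper: read tcptrace's brief output on stdin and print the
# local (source) port of every connection flagged as unidirectional.
unidirectional_ports() {
  awk '/\(unidirectional\)/ {
    n = split($2, parts, ":")   # $2 looks like host:port
    print parts[n]
  }'
}

# Example:
# tcptrace tcpdump.out | unidirectional_ports
```

For the listing above this would print 17942, 64676 and 19286, one per line.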

To get more detailed information about a single connection (here number 5), you can use:

tcptrace -o5 tcpdump.out

The output looked like:

1 arg remaining, starting with 'tcpdump.out'
Ostermann's tcptrace -- version 6.6.7 -- Thu Nov  4, 2004

171 packets seen, 171 TCP packets traced
elapsed wallclock time: 0:00:00.020314, 8417 pkts/sec analyzed
trace file elapsed time: 0:00:50.000579
TCP connection info:

For a more detailed graphical analysis you can use Wireshark and simply load the generated capture file (tcpdump.out).

Looking through the packets, I found that the problem was the firewall. The ports on which it accepted ACKs were limited to 32768-65535, but the failing connections used local ports between 10000 and 20000.

So I changed the minimum port to 5000. After applying the change to the firewall, everything worked fine.
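The effect of the change can be illustrated with a small range check on the local ports of connection 5 (failed) and connection 6 (worked) from the tcptrace listing. This is only a sketch: `port_in_range` is a hypothetical helper, and it assumes the upper bound of the firewall rule stayed at 65535.

```shell
# Hypothetical helper: succeed if PORT ($1) lies within MIN ($2)..MAX ($3), inclusive.
port_in_range() {
  [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

# Check the local ports of connection 5 (19286) and connection 6 (42942)
# against the old minimum (32768) and the new one (5000).
# Assumption: the upper bound of the rule is 65535 in both cases.
for port in 19286 42942; do
  for min in 32768 5000; do
    if port_in_range "$port" "$min" 65535; then
      echo "port $port: allowed with minimum $min"
    else
      echo "port $port: blocked with minimum $min"
    fi
  done
done
```

On Linux, the range from which the kernel picks local ports can be read with `sysctl net.ipv4.ip_local_port_range`, which may be worth comparing against the firewall rule. If a NAT device sits in between, the ports the firewall sees can differ from the ones on the node.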