
In this weekly post I explore how Cloudflare developers debug network applications, and look at IPFS as a solution to the hyper-centralized Internet.

Debugging UDP for Cloudflare Tunnel

These notes provide an easier understanding of this Cloudflare post.

What is cloudflared?

“Cloudflare Tunnel lets customers connect to their private services and networks through the Cloudflare network without having to expose their public IPs or ports through their firewall.”

When a user wants to reach a customer's service, they make a request to a server of the Cloudflare network, also known as the Cloudflare edge. (Here the user is, for instance, someone browsing the Internet, while the customer is the company offering the services and using Cloudflare's products.) cloudflared is a tool, running on the same network as the services, that creates a tunnel between the Cloudflare edge and the services.

How does cloudflared work?

cloudflared keeps alive many TCP connections, proxied over HTTP/2, to different servers on the Cloudflare edge. When a user makes a request to the hostname of a service, the Cloudflare edge proxies the request to the service through cloudflared.

Why use QUIC over UDP instead of TCP over HTTP/2?

TCP traffic sent over HTTP/2 is susceptible to Head-of-Line (HoL) blocking: a single lost TCP segment stalls every stream multiplexed on the connection, not just the stream it belongs to. Additionally, it is not possible to initiate communication from cloudflared's HTTP/2 server in an efficient way.

QUIC solves the HoL blocking issue, since its streams are delivered independently: a loss on one stream does not stall the others.

Setup for adding QUIC support to cloudflared

The servers on the edge are TCP-based listeners, so it was necessary to add a QUIC listener on the edge servers that could communicate with the QUIC version of cloudflared, which is based on UDP (a minimal sketch of the UDP side follows the list below). According to later tcpdump reports:

  • The Cloudflare edge server is running on 173.13.13.10:50152
  • The cloudflared process is running on 198.41.200.200:7844
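
As a rough idea of what the UDP side looks like, here is a minimal C sketch of a socket bound to port 7844 (the port from the report above). This is only an illustration, not the edge's actual listener code; note that binding to INADDR_ANY on a host with several addresses is exactly what makes the source-address question below interesting.

  #include <arpa/inet.h>
  #include <netinet/in.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/socket.h>

  int main(void) {
      /* QUIC runs over plain datagram sockets, so the listener is just UDP. */
      int fd = socket(AF_INET, SOCK_DGRAM, 0);
      if (fd < 0) { perror("socket"); return 1; }

      struct sockaddr_in addr;
      memset(&addr, 0, sizeof(addr));
      addr.sin_family = AF_INET;
      addr.sin_port = htons(7844);              /* port taken from the tcpdump report */
      addr.sin_addr.s_addr = htonl(INADDR_ANY); /* listen on all local addresses */

      if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
          perror("bind");
          return 1;
      }
      printf("UDP listener ready on :7844\n");
      return 0;
  }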

However, cloudflared didn't even establish a connection to the servers on the edge. The first checks were:

  • Are the firewall rules allowing traffic to this port? Yes
  • Are the iptables rules accepting, rather than dropping, the traffic for this port? Yes

To analyze the transmitted packets, the tcpdump command used was sudo tcpdump -n -i eth0 port 7844 and udp. Something strange was happening:

  • packets from the edge servers to cloudflared had 198.41.200.200:7844 as their destination address
  • packets from cloudflared to the edge servers had 203.0.113.0:7844 as their source address

These two commands were useful for troubleshooting:

  • ip addr list outputs the addresses attached to the network interfaces. Here the output showed two addresses, 198.41.200.200 and 203.0.113.0, on eth0
  • ip route show displays the routing rules of the host

In the latter case, the output was:

  ip route show

  default via 203.0.113.0 dev eth0

It is clear that the address associated with the default route, 203.0.113.0, is what cloudflared was sending back as the source address of its packets.

Why does TCP work while UDP doesn't?

TCP is a connection-oriented protocol in which the two parties establish a connection through the 3-way handshake (connect()). The kernel stores state for the connection, which includes the source address.

UDP is a connectionless protocol, so the kernel stores no connection state. The cloudflared server invokes the recvfrom() system call, which reports a packet's source address but not the local destination address it was sent to. When the cloudflared server responds to the client by invoking the sendto() system call, it can specify the destination address but not the source address. The kernel has to pick a source address itself, and in this case it picks the address of the default route.
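
The problem is easy to see in a naive UDP echo loop. The sketch below (plain C, purely illustrative) works on a single-address host, but on a multi-address host the kernel is free to choose the source address of each reply:

  #include <netinet/in.h>
  #include <stdio.h>
  #include <sys/socket.h>
  #include <sys/types.h>

  /* Echo one datagram back to its sender. */
  void echo_once(int fd) {
      char buf[1500];
      struct sockaddr_in peer;
      socklen_t peerlen = sizeof(peer);

      /* recvfrom() tells us who sent the datagram... */
      ssize_t n = recvfrom(fd, buf, sizeof(buf), 0,
                           (struct sockaddr *)&peer, &peerlen);
      if (n < 0) { perror("recvfrom"); return; }

      /* ...but not which of our local addresses it was sent to, and
       * sendto() gives us no way to choose the source address:
       * the kernel fills it in, using the routing table. */
      if (sendto(fd, buf, (size_t)n, 0,
                 (struct sockaddr *)&peer, peerlen) < 0)
          perror("sendto");
  }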

Forcing the correct source address

Linux has more generic I/O system calls, sendmsg() and recvmsg(), that allow passing additional control information (also called ancillary or out-of-band (oob) data), including the source address, through the msg_control field of the msghdr struct. To fix the problem, the control information filled in by recvmsg() is passed back to sendmsg().
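
Here is a minimal C sketch of that pattern using IP_PKTINFO (cloudflared itself is written in Go, so this only illustrates the underlying Linux API; error handling is reduced to the essentials):

  #define _GNU_SOURCE
  #include <netinet/in.h>
  #include <stdio.h>
  #include <sys/socket.h>
  #include <sys/types.h>

  /* Echo one datagram, replying with the same source address the
   * request was sent to, by copying the IP_PKTINFO control message
   * from recvmsg() into sendmsg(). */
  void echo_with_pktinfo(int fd) {
      char buf[1500];
      char ctrl[CMSG_SPACE(sizeof(struct in_pktinfo))];
      struct sockaddr_in peer;

      /* Ask the kernel to attach IP_PKTINFO control messages to
       * received datagrams (needed once per socket). */
      int on = 1;
      setsockopt(fd, IPPROTO_IP, IP_PKTINFO, &on, sizeof(on));

      struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
      struct msghdr msg = {
          .msg_name = &peer, .msg_namelen = sizeof(peer),
          .msg_iov = &iov, .msg_iovlen = 1,
          .msg_control = ctrl, .msg_controllen = sizeof(ctrl),
      };

      ssize_t n = recvmsg(fd, &msg, 0);
      if (n < 0) { perror("recvmsg"); return; }

      /* msg_control now holds an in_pktinfo with the local address the
       * packet arrived on; handing the same msghdr to sendmsg() makes
       * the kernel use that address as the source of the reply. The
       * control message lives at cmsg_level IPPROTO_IP. */
      iov.iov_len = (size_t)n;
      if (sendmsg(fd, &msg, 0) < 0)
          perror("sendmsg");
  }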

To track down additional bugs, the strace tool was used to inspect the parameters of the system calls. Eventually it turned out that the cmsg_level (control message level) of the control message was being set to IPPROTO_TCP instead of IPPROTO_IP, the level required for IPv4 control messages like IP_PKTINFO.
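
For instance, one can watch those arguments go by with an strace invocation along these lines (the pid is of course a placeholder):

  sudo strace -f -e trace=sendmsg,recvmsg -p <cloudflared-pid>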

IPFS as a solution to the hyper-centralized Internet

An old-but-gold article that explains why HTTP is not enough anymore.

IPFS is a distributed file system that seeks to connect all computing devices with the same system of files.

The aim of IPFS is similar to that of HTTP, but with IPFS peers are consumers of content as well as providers of it.

HTTP cut the cost of publishing content, but that content's lifetime depends on a continuously running server. If the machine fails or the chain of links breaks, the content is lost. How many times have you clicked a link only to get 404 Page Not Found? Centrally managed web servers inevitably shut down, domains change ownership, and companies go out of business.

The web was originally intended to be decentralized, but today it is becoming more and more centralized, with a few hosting providers storing the majority of content. This hyper-centralization carries privacy and security risks: governments could intercept all the traffic passing through these hyper-centralized servers, or communications could be cut off by a DDoS attack against them.

Nowadays the web depends heavily on the Internet backbone that connects datacenters. Even though data is stored redundantly, backbone failures are not such a rare event, and when one happens we are completely cut off from those resources.

IPFS addresses this by storing content among the users of the network itself. Instead of querying a server for the location of a file, we ask the entire network for the content itself. This is possible thanks to hashing, which turns a file's content into a fixed-length string that identifies it. It is intrinsically secure, because if the content is modified, the hash changes as well. For content that updates frequently, like a website, IPFS lets you use IPNS, which associates a pubkeyhash (the hash of a public key) with a piece of content. The content can change while the pubkeyhash stays the same.
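
To make this concrete, this is roughly how it looks with the ipfs command-line client (the file name is a placeholder, and <cid> stands for the content hash printed by ipfs add):

  ipfs add page.html              # store the file locally and print its content hash (CID)
  ipfs cat <cid>                  # retrieve the content from whichever peers have it
  ipfs name publish /ipfs/<cid>   # IPNS: point this node's pubkeyhash at the content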

A Distributed Hash Table (DHT) is at the core of IPFS: it is what finds the nodes that store a specific piece of content. Larger files are split into smaller chunks and distributed among nodes, so when we request a piece of content we can download it from many nodes at the same time instead of from a single server.

However, a node doesn't need to store every possible file: it can pin only those it wants to keep available to other users.

Hashes are not human-friendly to remember. IPFS allows the existing Domain Name System (DNS) to provide human-readable links to IPFS/IPNS content: it just takes adding a TXT record to the domain's nameserver, for example as shown below.
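
A sketch of such a record (domain and hash are placeholders; current IPFS resolvers look it up under the _dnslink subdomain, and a dnslink=/ipfs/<cid> value works the same way for immutable content):

  _dnslink.example.com.  TXT  "dnslink=/ipns/<pubkeyhash>"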
