Trickles: A Stateless Network Stack for Improved Scalability, Resilience and Flexibility

Trickles: A Stateless Network Stack for Improved Scalability, Resilience and Flexibility (PDF) by Alan Shieh, Andrew C. Myers, Emin Gun Sirer (2005)

Abstract: Traditional operating system interfaces and network protocol implementations force system state to be kept on both sides of a connection. Such state ties the connection to an endpoint, impedes transparent failover, permits denial-of-service attacks, and limits scalability. This paper introduces a novel TCP-like transport protocol and a new interface to replace sockets that together enable all state to be kept on one endpoint, allowing the other endpoint, typically the server, to operate without any per-connection state. Called Trickles, this approach enables servers to scale well with increasing numbers of clients, consume fewer resources, and better resist denial-of-service attacks. Measurements on a full implementation in Linux indicate that Trickles achieves performance comparable to TCP/IP, interacts well with other flows, and scales well. Trickles also enables qualitatively different kinds of networked services. Services can be geographically replicated and contacted through an anycast primitive for improved availability and performance. Widely-deployed practices that currently have client-observable side effects, such as periodic server reboots, connection redirection, and failover, can be made transparent, and perform well, under Trickles. The protocol is secure against tampering and replay attacks, and the client interface is backwards-compatible, requiring no changes to sockets-based client applications.

What you get when you combine continuations and networking. The idea should be obvious to most LtUers.

I really can't believe this hasn't been mentioned on LtU before.

The site is here.

PDF link broken

The PDF link appears to point straight back here to LtU. The PDF can be found here.

fixed


Neat

I like it, although there are some downsides:

* The network continuation is 75+12m bytes, where m is the number of "loss events" and is usually 1 (or perhaps at most 1). A TCP header is 20 bytes, so the difference is significant but not terribly so.

* CPU overhead is higher but, they say, "does not pose a server bottleneck even at gigabit speeds." When copying a file from memory, TCP's CPU utilization is shown at ~55%, while Trickles' is ~78%.
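
To put the size comparison in numbers, here is a small sketch that just evaluates the 75+12m figure quoted above against a 20-byte TCP header. The function name is made up for illustration; nothing here comes from the Trickles implementation.

```python
TCP_HEADER_BYTES = 20  # minimum TCP header, without options


def continuation_bytes(loss_events):
    """Continuation size as quoted above: 75 + 12*m bytes."""
    return 75 + 12 * loss_events


for m in range(3):
    size = continuation_bytes(m)
    extra = size - TCP_HEADER_BYTES
    print(f"m={m}: {size} bytes, {extra} more than a bare TCP header")
```

With m = 1 the continuation is 87 bytes, a little over four times the size of a bare TCP header.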

On the other hand, strangely, they do not reference any of the other work on continuations and networking, particularly the webby stuff.

I do wonder how hard it would be to make it stateless on both sides....

Stateless both sides

I do wonder how hard it would be to make it stateless on both sides....

I suppose it could be done, but then the connection would drop as soon as a packet was lost. As it stands, it survives packet loss because the client has enough state to retransmit.
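
A toy sketch of that division of labour, assuming the server hands its per-connection state to the client as a MAC-protected blob and forgets it. The key, the JSON encoding and the function names are all invented for illustration; this is not the Trickles wire format, and it does not attempt the replay protection the paper describes.

```python
import hashlib
import hmac
import json

SERVER_KEY = b"demo-key-known-only-to-the-server"  # hypothetical secret


def seal(state):
    """Serialize server state and prepend a MAC so the client cannot alter it."""
    body = json.dumps(state, sort_keys=True).encode()
    tag = hmac.new(SERVER_KEY, body, hashlib.sha256).digest()
    return tag + body


def unseal(blob):
    """Verify the MAC and recover the state the server previously sealed."""
    tag, body = blob[:32], blob[32:]
    expected = hmac.new(SERVER_KEY, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("continuation was tampered with")
    return json.loads(body)


def serve(blob, data, chunk=4):
    """Handle one request using only what the client sent; keep nothing."""
    state = unseal(blob)
    start = state["offset"]
    payload = data[start:start + chunk]
    return payload, seal({"offset": start + len(payload)})


# Client side: it holds the latest continuation, so it can retransmit.
DATA = b"hello, trickles"
blob = seal({"offset": 0})
received = b""
lost_once = False
while len(received) < len(DATA):
    payload, next_blob = serve(blob, DATA)
    if not lost_once and len(received) == 4:
        lost_once = True  # pretend this reply was dropped in the network
        continue          # the client resends the same continuation
    received += payload
    blob = next_blob
```

The dropped reply costs nothing on the server side: the client simply presents the same continuation again. If the client kept no state either, there would be nothing left to resend.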

Soft state in routers?

Well, what if the routers take the responsibility of keeping this state to some extent (as they already do for various things, especially in multicast, and even for TCP connections)?
Yes, the routers then become the bottleneck, but in some cases this shift of responsibility might be useful.

The routers themselves then

The routers themselves then become vulnerable, and IP never guarantees that the same route will be used, so this doesn't seem viable. No, I think the incentives against misuse are properly aligned if the client must maintain the state.

Egress routers only

No, I think the incentives against misuse are properly aligned if the client must maintain the state.

I agree. But "the client" may have a broader meaning than "the software running on the specific chunk of hardware that person is holding on his lap". The client may be an organization, or a unit within it. As long as this organization properly redistributes incentives internally (e.g., by firing abusers :) ), I believe it's perfectly ok to hold the whole organization responsible to the outside server. If the router we are talking about belongs to the same organization as the end user, then it is not vulnerable. E.g., I could envision egress routers maintaining TCP state for clients residing in their organization. This would factor network state out of the hardware nodes running applications. I am not sure whether the benefits are worth the effort, but from a mechanism-design point of view it looks doable.

Too bad, it starts with SYN, ACK

IMHO, this new protocol should have two variants:
1- a UDP-like question-answer variant. The client can send request data with the first "SYN" packet, and the server application is notified of the SYN packet (instead of having it processed only by the network stack), so the server can choose either:
* to answer immediately: less latency (only one RTT), but the server is vulnerable to DoS attacks using source IP spoofing, so this is useful only in protected networks.
* or to answer only with an ACK, while in the background the server also starts processing the request (at low priority) to fill a cache. When the client's follow-up request, sent in reply to the ACK, arrives, the server can hopefully respond immediately because the answer is already in the cache.

This may require a new API for the client, though I'm not sure.

2- the TCP-like protocol described in the article.

Sure, the cache in (1) can be seen as state on the server, but since it's only a cache, it doesn't really matter.
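
A rough sketch of the cache idea in (1), to show why losing an entry is harmless. The class and method names are hypothetical, and the "background, low priority" work is done synchronously here to keep the example short.

```python
from collections import OrderedDict


class SpeculativeServer:
    """Answers a SYN with a bare ACK but warms a small cache on the side."""

    def __init__(self, handler, max_entries=2):
        self.handler = handler
        self.cache = OrderedDict()
        self.max_entries = max_entries
        self.computed = 0  # how many times the handler actually ran

    def _compute(self, request):
        self.computed += 1
        return self.handler(request)

    def on_syn(self, request):
        """First packet: compute speculatively, reply only with an ACK."""
        self.cache[request] = self._compute(request)
        while len(self.cache) > self.max_entries:
            self.cache.popitem(last=False)  # evict the oldest entry
        return "ACK"

    def on_request(self, request):
        """Confirmed request: use the cached answer if it is still there."""
        if request in self.cache:
            return self.cache.pop(request)
        return self._compute(request)  # cache miss, so just redo the work


server = SpeculativeServer(str.upper)
for req in ("a", "b", "c"):
    server.on_syn(req)          # "a" is evicted once "c" arrives
hit = server.on_request("c")    # served from the cache
miss = server.on_request("a")   # evicted earlier, recomputed on demand
```

An evicted entry only costs a recomputation, never correctness, which is the sense in which the cache is not really connection state.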

Actually, it does support #1 already.

In the Trickles API, the client can write to the socket before connecting. Queued-up data is sent with the very first packet. This usage is of course not backward compatible with TCP sockets.