RPC Under Fire

Steve Vinoski, "RPC Under Fire," IEEE Internet Computing.

Nice discussion of the problems associated with the RPC model, which abstracts away the network so that remote calls look like local calls, even though the two exhibit very different failure modes.

Web services, JAX, and Cw are also mentioned.



RPC vs Object/XML-mapping problems

A lot of the article discusses the difficulty of translating a language's native structures to and from XML. He calls this Object/XML mapping, reminding us of the problems with Object/Relational mapping.

However, I'm more interested in this quote from the article:

What seems like a good, simple idea on the surface — hiding networks and messages behind a more familiar application development idiom — often causes far more harm than good. Worse still is that it’s harm that, even 30 years later, we’re still learning about — usually, the hard way.

Does anyone have any more information about the harm of making remote procedure calls look like local ones? The abstraction seems useful. I guess the question is how it should deal with distribution problems.

Are people moving away from RPCs and exposing more of the network layer? Are there any alternative abstractions that are worth looking at? e.g. message passing.

The XMLHTTPRequest/ActiveX.XMLHTTP model

The XMLHTTPRequest/ActiveX.XMLHTTP model in browsers (what they call Ajax now) is actually pretty good: you have to deal with asynchrony, idempotence (GET vs POST) and response codes, and it's _still_ very simple to program against.

However, it's rather coarsely grained (chunks of text and XML are the basic unit) and one-way (browser always initiates).

Re: RPC vs Object/XML-mapping problems

Check out A Note on Distributed Computing. I first read this ten years ago and it changed the way I look at things.

it's mostly just common sense

When I first heard of RPC (back when I was a kid), my first reaction was: "wow, so my program counter goes over to that other machine, while my machine sits idle? how stupid!"

When I first saw the idea of passing serialized objects over HTTP (SOAP and the like), my first reaction was: "I've seen this before; it was called CORBA, and it failed."

This is just common sense vs. the love of abstraction, no?

Different...

Now it uses XML! ;-)

That is too true for comfort.

But like some annoying phoenix, bad but simple ideas will always return.

This is just common sense vs. the love of abstraction

This is just common sense vs. the love of abstraction, no?

I agree about the common sense, but not about the love of abstraction (naturally, since I love abstraction).

If you agree with the criticism regarding RPC, the conclusion to draw is that the RPC abstraction is the wrong abstraction. It is way too simple an abstraction for what you need. So you need a better abstraction (e.g., async queues or buffers, blackboards, tuple spaces, or whatever). The failure of one specific abstraction won't make us abandon our love for abstractions, just as the fact that we love abstraction doesn't compel us to like any specific abstraction (such as RPC).
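As a toy sketch of what such an alternative might look like, here is an in-process stand-in for an async request/reply channel, where waiting and failure stay explicit instead of hiding behind a call that pretends to be local (all names here are made up for illustration):

import queue
import uuid

requests = queue.Queue()   # stands in for an outgoing message channel
replies = {}               # stands in for an incoming reply store

def send_request(payload):
    # enqueue and return a handle; nothing blocks, nothing pretends to be local
    msg_id = str(uuid.uuid4())
    requests.put((msg_id, payload))
    return msg_id

def poll_reply(msg_id):
    # None means "no reply yet" (or "lost"); the caller decides how long to keep waiting
    return replies.get(msg_id)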

Eric Raymond on RPC

Eric Raymond on RPC (scroll down to about the middle)

He takes an "ecological" rather than "mathematical" view:

one of the functions of interfaces is as choke points that prevent the implementation details of modules from leaking into each other

...

the RPC model tends to encourage programmers to treat network transactions as cost-free

...

RPC seems to encourage the production of large, baroque, over-engineered systems with obfuscated interfaces, high global complexity, and serious version-skew and reliability problems

Of course, this is all rather unverifiable...

One of my age-old questions...

What successful systems are based on RPC?

Off the top of my head, I only see NFS and SMB, with NIS and its derivatives barely registering...

There's got to be others, no? If not, that's rather few successes for RPC.

Not many large systems...

RPC doesn't scale particularly well. OTOH, RPC has long been used for many simple (and app-specific) client-server applications.

not so simple

wow, so my program counter goes over to that other machine, while my machine sits idle? how stupid!

Your problem is that you are thinking too simply. That is, you are assuming your Foobar1000 is running a function on someone else's Foobar1000 while yours sits idle. In that case, RPC is stupid. Consider the following situations, though:

The remote computer is some supercomputer with much better abilities than your desktop, and the function is something that needs a supercomputer, e.g. multiplying one 100x100 matrix by another: something that would slow your computer down greatly (particularly when you are doing it in a loop) but is trivial for the other machine.

Your program has forked just before calling this RPC; the other half of the program continues on, chewing up 100% of the local CPU, while the remote function does the other half of the problem, chewing up 100% of the remote CPU. Basically this is a variation on MPI (a sketch of this overlap follows below).

The remote computer has some resource that the local one doesn't. RPC is an abstraction not for CPU, but for some other hardware. NFS is designed like this (though implementations universally write their own RPC to save the overhead, the spec assumes this abstraction), and the X Window System is a form of this, using the X client libraries as the abstraction over the X server elsewhere.

If your local computer can do everything the remote can, then you are right, RPC is stupid. If your local computer cannot, either because it is overloaded, slow, or lacks some file/device, then you are wrong.
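As a rough Python sketch of the second scenario above (fork, keep the local CPU busy, and block only when the remote half is needed), assuming a hypothetical XML-RPC service at bignode.example.com that exposes a multiply() method:

import xmlrpc.client                     # the Python 3 name for xmlrpclib
from concurrent.futures import ThreadPoolExecutor

server = xmlrpc.client.ServerProxy("http://bignode.example.com/")  # hypothetical endpoint

def local_half(n):
    # stand-in for the half of the problem kept on the local CPU
    return sum(i * i for i in range(n))

with ThreadPoolExecutor(max_workers=1) as pool:
    remote = pool.submit(server.multiply, [[1, 2], [3, 4]], [[5, 6], [7, 8]])
    local = local_half(1000000)          # local CPU stays busy meanwhile
    combined = (local, remote.result())  # block only when the remote result is needed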

Nope, he's right on the money.

In the scenario you outlined, all you've really done is implement asynchronous messaging over RPC. What I think Vladimir was trying to get across is that RPC is a poor abstraction. Even if you fork or spawn another thread to call the remote procedure, you're still left with some thread/process sitting there doing nothing but waiting and wasting resources. From this perspective, RPC really is stupid.

That was the paper

Not really, because RPC is just an abstraction for messaging (which could be synchronous or asynchronous; I gave examples of both).

When working on a whiteboard, RPC is great, because on a whiteboard it is okay to say "some magic happens here" while ignoring all the errors. On real computers, though, there are far too many errors that you need to handle, so you cannot safely use RPC for anything that you wish to set and forget.

Vladimir seemed to be saying that RPC as a whole was bad. This is not the case, so long as you look at RPC as a simple abstraction for messaging. For trivial problems (that is, something where you die on error and restart by hand), RPC is an easy way to get your program running quickly, without having to write a protocol. SOAP/XML is much the same: an easy way around writing your own protocol. (Though writing the XML may be harder than writing your own protocol.)

I'm completely fascinated...

...with XML-RPC, SOAP, JAX and all of these technologies. Fascinated in the sense that watching train wrecks in slow motion is fun.

They treat what is essentially building an arbitrarily complex network protocol as a data representation problem. "Hm," they seem to say, "I need to get something done that involves moving a blob of data from where it is to where it needs to be, without deadlocking and while ensuring that I'll actually make forward progress. I know, I'll pack the data in XML! That'll fix everything!"

RPC schemes aren't entirely useless. They work pretty well for essentially trivial problems, which make up most of the situations that people who use them get into. But as soon as you go beyond a simple query/response, you are looking at stuffing 20 pounds of potatoes into a 10-pound bag, and fixing all the "impedance mismatch" problems in the world only expands the capacity to 12.5 pounds.

A fun experiment you can do at home: go ask your JAX guru what the phrase "Byzantine failure" means.

Good point

RPC schemes aren't entirely useless. They work pretty well for essentially trivial problems, which make up most of the situations that people who use them get into.

This is worth keeping in mind. I think RPC is a useful abstraction in these cases, since it allows you to solve the simple problems without knowing what the phrase "Byzantine failure" means.

Obviously, if you need to design robust distributed systems, you better know what it means, but then it's really not the responsibility of a programming construct to make sure you know this sort of thing...

A useful but fragile abstraction?

If you want to look at it as an abstraction, it is not a good one, in the sense that it is fragile. It simplifies solutions to some problems, sure, but not others. Also, there are no big warning signs when you go off into the deep end. ("idl2c foo.idl" -> "Warning: this looks like you have more than one server. Do you know what you are doing?")

As an abstraction, RPC abstracts the wrong things; it just covers the simplest part of network communications and hides the other issues.

I guess you haven't read my comment

I guess you haven't read my comment above about abstraction.

The problem with this whole concept

and it doesn't matter if you're talking about RPC or CORBA or RMI or SOAP or XML-RPC or what have you, is that you are in effect creating a wire protocol, not unlike SMTP. The fact that a lot of effort is being put into making the wire protocol look like just another function call doesn't change this fundamental truth.

The problem with wire protocols is that the software version on one side can be significantly different from the software version on the other side. Which means either you need a carefully defined protocol where new software can still talk to old software and vice versa, or you have a major configuration issue and have to make sure that when one half is upgraded, the other half is upgraded as well.

The easiest way to make a wire protocol that isn't vulnerable to change is to a) design the protocol separately from the implementation (see SMTP), and b) abstract as much of the implementation away as possible. But the whole point of RPC etc. is to encourage you not to view the wire protocol as separate from the code, and to reveal more of the implementation than strictly necessary.
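A toy illustration of point (b): the handler reads only the fields it understands and ignores the rest, so an older server can still accept messages from a newer client (the field names are made up):

def handle_submit(payload):
    # read only the fields this protocol version knows about
    name = payload.get("name", "")
    phone = payload.get("phone", "")
    # newer clients may send extra fields; ignore them instead of rejecting the request
    return {"status": "ok", "stored": {"name": name, "phone": phone}}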

This isn't to say they are totally worthless, especially not if they're used wisely. And there are situations where they are the cat's pajamas. It's just that they aren't the magic elixir they keep being sold as (on about a seven-year cycle, it appears).

RPC is pretty useful

RPC is fine, under certain conditions: (1) you have a tolerable RPC library, (2) you're willing to adapt to the realities of the network, and (3) your clients and servers are loosely coupled.

These days, the typical RPC user is hacking on an in-house script that needs to talk to a simple server process. The server may be local or across the backbone.

And modern RPC code is certainly a lot less offensive than CORBA:

import xmlrpclib
server = xmlrpclib.Server("http://myhost.com/path")
server.submit_record({"name": "Joe", "phone": "555-1212"})

Usually, somewhere around version 1.5 of the server, the author discovers several things: To reduce latency, batch requests. To achieve decent reliability, make server functions idempotent, and periodically retry failed requests. To support future expansion, use keyword arguments or Perl-style hash tables whenever possible.
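A hedged sketch of the idempotence-plus-retry and keyword-argument points, in the same xmlrpclib style as above (submit_record and the endpoint are, as before, hypothetical):

import time
import xmlrpclib                        # xmlrpc.client in Python 3

server = xmlrpclib.Server("http://myhost.com/path")

def call_with_retry(func, args, retries=3, delay=2.0):
    # retrying is safe only because the server-side function is idempotent:
    # replaying a request that actually succeeded does no harm
    for attempt in range(retries):
        try:
            return func(*args)
        except (IOError, xmlrpclib.ProtocolError):   # transport-level failures only
            if attempt == retries - 1:
                raise
            time.sleep(delay)

# keyword-style arguments (a dict) leave room for future fields
call_with_retry(server.submit_record, ({"name": "Joe", "phone": "555-1212"},))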

For many applications, the alternative to RPC is a line-oriented TCP protocol (a la SMTP) that takes too long to write, introduces buffer overflows in your server, and doesn't handle Unicode correctly.

RPC is pretty useful?

RPC is fine, under certain conditions: (1) you have a tolerable RPC library, (2) you're willing to adapt to the realities of the network, and (3) your clients and servers are loosely coupled.

Is using synchronous communication a good idea in a loosely coupled environment?

It can be...

Mind you, there's a world of difference between synchronous messaging (think HTTP) and RPC.

Synchronous messaging does not mean RPC

Synchronous messaging does not mean RPC, but RPC means synchronous messaging at least to some extent. In a loosely coupled environment, I would choose message oriented middleware over RPC any time.

Of course terms like loosely coupled mean practically nothing. Your loose could be very different from mine.

There are also different aspects to coupling. Components can be coupled because of shared data structures, communication protocols, assumptions about available services, etc. Because RPC is quite strongly tied to the underlying PL and paradigm, those things tend to leak out. That's why I have a hard time understanding how one could use it in a loosely coupled environment.

Modern RPC far less language-specific

Modern RPC protocols like XML-RPC and (to a certain extent) SOAP rely on a single cross-language data model. This data model is roughly equivalent to that used in most dynamic scripting languages: strings, integers, floating-point numbers, booleans, binary blobs, lists, and sets of key/value pairs.

Such a straightforward data model sharply reduces the leakage of underlying-language crud onto the wire.
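Concretely, an XML-RPC payload is limited to roughly this shape (a Python sketch of the value types the wire format can carry):

import xmlrpclib                             # xmlrpc.client in Python 3

record = {
    "name": "Joe",                           # string
    "age": 42,                               # integer
    "score": 3.14,                           # floating-point number
    "active": True,                          # boolean
    "photo": xmlrpclib.Binary(b"raw bytes"), # binary blob
    "tags": ["customer", "trial"],           # list
    "address": {"city": "Oslo"},             # nested key/value struct
}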

Coupling

An example of a loosely coupled system might be the original NFS file operations: you could read and write files, list directories, and so on, but the server maintained no per-client state. The server never knew whether a file was open or whether any locks existed. Up to a point, this worked fine.

The trouble appeared when the NFS architects tried to implement Unix deletion and locking semantics, which require per-client state. This required adding all sorts of buggy daemons and distributed protocols that still don't work quite right. As usual, tight coupling causes headaches.

I don't think these issues have much to do with synchronicity versus asynchronicity. All things being equal, I slightly prefer synchronous systems (unless scalability is required), because most programmers can get them kinda-sorta right, as opposed to asynchronous RPC, which is generally screwed up by all but the best programmers.

REST

You're advocating HTTP without realizing it :-)

"make server functions idempotent" -> the whole GET/POST thing

"use keyword arguments" -> GET query parameters (name=value), POST form fields (including binary attachments and such)

"Unicode" -> response encodings

"introduces buffer overflows in your server" -> use a standard HTTP server; there's lots of them, including embedded ones like Jetty
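A minimal Python 3 sketch of the same explicit style over plain HTTP (the /records endpoint is hypothetical): an idempotent GET, keyword-style query parameters, and an explicit response encoding:

import json
import urllib.parse
import urllib.request

params = urllib.parse.urlencode({"name": "Joe", "phone": "555-1212"})
with urllib.request.urlopen("http://myhost.com/records?" + params) as resp:
    charset = resp.headers.get_content_charset() or "utf-8"
    body = resp.read().decode(charset)
    print(resp.status, json.loads(body))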

REST is OK

REST (passing RPC messages as CGI parameters) is basically OK as long as your parameters fit a simple key-value model and your result is a file. This happens often enough in the real world.

However, as soon as you need to pass complex data structures across the wire, REST gets pretty hackish. If you find yourself doing funny encoding tricks to stuff your arguments into CGI parameters, or doing a lot of XML parsing on the returned document, it's easier to just use an off-the-shelf RPC library.

There are other problems with REST, too. For example, there's no standard, well-implemented way to package up multiple RPC calls into a single request (similar to XML-RPC multicall, for example). You'd think that HTTP pipelining would let you do this, but I've yet to test a standalone HTTP library that could avoid a gratuitous round-trip per request. Maybe that's improved in the last year or two?
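For comparison, the XML-RPC batching mentioned above looks roughly like this (same hypothetical server as earlier; the server has to implement system.multicall):

import xmlrpclib                         # xmlrpc.client in Python 3

server = xmlrpclib.Server("http://myhost.com/path")
batch = xmlrpclib.MultiCall(server)      # calls are queued on the client side
batch.submit_record({"name": "Joe", "phone": "555-1212"})
batch.submit_record({"name": "Jane", "phone": "555-1313"})
for result in batch():                   # one round trip via system.multicall
    print(result)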

Coupling?

RPC is fine, under certain conditions:...your clients and servers are loosely coupled.

But RPC pretty much ensures that your clients and servers are tightly coupled, due to versioning issues, procedure-call semantics, and a whole host of requirements put on both sides in order to make it look like a procedure call.

The RPC problem is a perfect example

The RPC problem is a perfect example of a more general issue in both software and engineering that I can't resist commenting on. As engineers and scientists we are trained to be analytic. Analysis is certainly powerful and beautiful, and we often try to extend analysis and abstraction into the environment in order to avoid the ugly consequences of dealing with reality. RPC is the perfect example, since it tries to extend the analytic environment of the computer into the network and the world.

Getting this strategy to work is always a matter of further and further extensions. Taken to its logical conclusion, it amounts to specifying, and thus controlling, the entire world! Is this really what we want to do? Why not just accept reality and deal with it? In the case of communication it is only a matter of adaptation: provide information on the capabilities of the various parts and configure as needed. This might be more work, but it is "realistic", and it doesn't necessarily prevent abstraction.

REST is best

REST is not a substitute for RPC, but a different model altogether.

It's no problem to send an XML document (or any other binary content) via POST, if the transfer of more complex data structures is required.

The semantics of GET are such that the URL by itself should be sufficient to identify the requested resource. Very complex search queries should therefore probably be performed by POSTing the search to some query handler, which uses it to create a resource representing the search and returns the URL of that resource; the client then GETs the query results from that URL. If the query will take time, the URL could alternatively be used to provide a query status indicator. That way the client can get on with something else while the search is being performed, polling the result URL periodically to see if it's been updated.

Because REST transfers aren't RPC calls, there isn't any sense in trying to package up multiple RPC calls into them. But, again, there's nothing to stop you sending a lengthy document representing a batch of work to be done, and getting a document representing a schedule (with a URL to query the status of each work item requested) in return. For multiple tasks, it's probably better to do things asynchronously anyway.
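A rough Python 3 sketch of the POST-then-poll pattern described above (the URLs and field names are hypothetical):

import json
import time
import urllib.request

# POST the search; the server creates a resource and names it in the Location header
req = urllib.request.Request(
    "http://example.com/searches",
    data=json.dumps({"query": "rpc failure modes"}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    result_url = resp.headers["Location"]

# poll the result URL until the search is done; the client is free to do other work meanwhile
while True:
    with urllib.request.urlopen(result_url) as resp:
        status = json.loads(resp.read().decode("utf-8"))
    if status.get("done"):
        break
    time.sleep(1.0)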

RPC is good in some cases, bad in some others.

It all depends on the nature of the application. If the application wants to drive other applications or be driven by other applications, then the RPC abstraction works. Examples of such applications are defense applications, where every module sends and receives commands in order for the whole system to be in a specific state at any moment in time.

If the application only wants to submit data to a server, then RPC is not good; mainly, it is slow and not very dynamic. There are other protocols that are much better, including plain SQL.