Languages and systems for edge computing?

In my work, I've started to think a lot about the role of edge computing (think Opera Mini) in applications (and potentially the role of CDNs). My guess is that it's a very constrained C/C++ world right now, outside of some projects like maybe Flux (though it is currently targeted at traditional data centers).

More of a shot in the dark than usual for me, but any suggestions of where to look for orchestrating and benefiting from these? A lot of it is closed/proprietary right now, which makes it a bit hard to google. One direction to look into is IDS implementations and smart routers, but these seem to be on a slightly different evolutionary path...


clarification

You put a lot of acronyms and broad terms into a very small space; it would help if you could clarify a few points so I can understand exactly what it is that you are asking:

From the title I infer that you are asking for us to maybe throw out a few specific topics and/or resources to help you familiarize yourself with edge computing, correct?

First point that I get stuck on with this is right at edge computing. From the Wikipedia article it seems like "edge computing" refers to essentially distributed cloud computing, with data and processing happening somewhere between the client and central server. On this topic Akamai seems to be the originator of the term, but there are also several established concepts that might be of interest: P2P architecture is an obvious parallel, and Content Centric Networking is an interesting related concept. If you would clarify what aspect of this scheme you are interested in (and maybe how that relates to programming languages), that would help in narrowing down the topic.

By "Opera Mini" I assume you are referring to the Opera browser? Does any particular aspect of that browser make it special within this context, or are you just referring to the general computing power available within the browser from JavaScript, Java, etc.?

CDN = Content Delivery Network?

C/C++: I am not sure why any language in particular would have a hold on this market, because I don't quite understand which projects we are referring to here.

Flux? I can only think of one Flux for this context, which I coincidentally interned for at one time. Are you referring to the job-scheduler/workflow company?

IDS? All I can find is (Intrusion Detection System) which does not sound on topic at all.

I agree, this is dense and

I agree, this is dense and not very readable. There was also the Flux project/team at Utah, where they've been going into test beds recently.

Ah, let me try again, and

Ah, let me try again, and with some motivation:

I've been designing a multicore web browser so that the web won't suck as much on future mobile devices. This is great if we really do the computations on the client (e.g., we've been successfully parallelizing parsing and various CSS layout components/languages). However, why compute in the client if you don't need to? Opera Mini and some others have begun to poke at this. In their case, they go one more step: some computation will occur near the phone/ISP provider (you don't want to hop all the way to a Norwegian proxy on every site!) that removes junk from the page before it's sent to your browser.

I don't see the need to give up correctness in exchange for performance (as an ex-designer, I find this horrible if I've already mobile-optimized my site!). For example, several of our parallel algorithms are speculative. If we speculate wrongly, we incur extra work on the client, which is bad: the ISP (which might have something stronger than a 1-watt machine) could figure out all of the values we're speculating on for us, and send those along with the page. The ISP is acting as an edge here, instead of a full-on proxy.
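To make the idea concrete, here is a toy sketch (every name and value is invented, not from our actual browser): the client normally guesses a value per token and pays rework on each miss, while an edge node that has already run the real computation ships the answers along with the page, so the client never mis-speculates.

```python
# Hypothetical sketch of edge-assisted speculation. The functions below are
# illustrative stand-ins, not a real parser.

def speculate(token):
    # Client-side guess (cheap but fallible), e.g. a predicted parse state.
    return hash(token) % 4

def compute_actual(token):
    # The true value, assumed expensive to derive on a weak client.
    return len(token) % 4

def parse(tokens, hints=None):
    """Process tokens; count how often speculation misses and rework is needed."""
    rework = 0
    for i, tok in enumerate(tokens):
        guess = hints[i] if hints else speculate(tok)
        if guess != compute_actual(tok):
            rework += 1  # mis-speculation: client pays for extra work
    return rework

tokens = ["html", "body", "div", "span", "p"]
# The edge runs the real computation ahead of time and sends the hints along.
edge_hints = [compute_actual(t) for t in tokens]
assert parse(tokens, edge_hints) == 0  # with edge help, zero mis-speculations
```

The point of the sketch is only the shape of the contract: the edge does the heavy deterministic work, and the client's speculative path degenerates into a lookup.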

One way to think of this: I want to write not only a parallel browser, but a distributed one. Traditionally, we only see content delivery networks or intrusion detection systems working here. The thing is, you *really* don't want this layer to be a bottleneck... and most systems I've heard about here are closed source / proprietary.

And Flux... I meant the MSR dataflow language for data centers. It's still very much a prototype so that probably wasn't a good reference :)

Edit: so what I'm wondering about is how we write programs for edges or contain edge components. My intuition is "very carefully and lightly."

So what I'm wondering about

So what I'm wondering about is how we write programs for edges or contain edge components.

Honestly, I'd ask the people who do it what they do, first -- I don't think you can really design a domain-specific language without getting to know the domain and what the actual pain points specific to it are. (Plus there may actually be several divergent architectures getting bundled under a single umbrella.)

You're at Berkeley, so you might consider taking advantage of the vast social network at your disposal: you almost certainly already know other grad students or professors who can introduce you to people who either do research on or build these kinds of systems for a living.

I realize this isn't really technical advice, but hey, it's (a) outside my expertise, but (b) I still like giving advice anyway. :)

Just putting out feelers :)

Just putting out feelers :) I have been watching out for folks in this area, but it's pretty deep in the trenches (despite all the mesh, cloud, IDS, etc. stuff going on near me). It's also still self-serving at this point -- I want to play around in this area, and therefore want a representative setup, in order to see how it interacts with my own problems. I think some LTU'ers work for telecoms, so it was worth a shot :)

I agree with your first point -- I've learned to better identify and (largely) ignore work that jumped right to the let's-make-a-language-or-framework stage. Conversely, while the PL'er inside me often cringes when non-PL people design languages, I know to listen when it's a language to address a problem they understand.

Edit: I should also add that I've found people to be very secretive in this area

I implemented a distributed

I implemented a distributed Java virtual machine for the Palm Pilot once (back when the heap was limited to 32 or 64 KB, with 1 MB of flash in total). The idea was to use a server to pre-process classfiles into something that could be loaded/interpreted efficiently on the Palm. This was pretty simple of course, and the distribution was fixed/explicit. One related problem is imaging: how much work to do on the server vs. how much to do on the client, given bandwidth and scalability trade-offs.

A pure transparent parallel system is an interesting thought, and especially relevant with respect to new-again cloud computing. Perhaps what we need is more inference, automatically deciding how to partition code between remote and local environments.
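The server-vs-client question can be framed as a simple cost comparison. This is a deliberately naive sketch with invented constants and task shapes, not a real placement algorithm; an actual system would also weigh scalability, energy, and server load:

```python
# Toy cost model for deciding where a computation runs: locally on a weak
# client, or remotely with inputs/outputs shipped over the network.

def place(task, client_speed, server_speed, bandwidth):
    """Return 'client' or 'server' by comparing rough end-to-end latency."""
    local = task["work"] / client_speed
    remote = ((task["input_bytes"] + task["output_bytes"]) / bandwidth
              + task["work"] / server_speed)
    return "client" if local <= remote else "server"

# A small computation over a large input is better kept local...
assert place({"work": 10, "input_bytes": 1e6, "output_bytes": 10},
             client_speed=1, server_speed=100, bandwidth=1e4) == "client"
# ...while heavy computation on small data pays for the round trip.
assert place({"work": 1e4, "input_bytes": 100, "output_bytes": 100},
             client_speed=1, server_speed=100, bandwidth=1e4) == "server"
```

The interesting research question is exactly the one raised above: inferring these costs automatically, per program region, instead of fixing the partition by hand.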

sounds interesting...

I almost understand where you're coming from. I might be someone you want to hear from, but I'm not sure. Can you say exactly what class of problem you might want to tackle? When you talk about moving where computation occurs, so the boundary between client and server is more fluid than it is now (more than just serving up web pages, etc.), are you thinking of something like a distributed operating system (that is: very broad), or something like a server-side markup language where parts execute in different places? Or something else?

lmeyerov wrote: I've been designing a multicore web browser so that the web won't suck as much on future mobile devices. This is great if we really do the computations on the client (e.g., we've been successfully parallelizing parsing and various CSS layout components/languages). However, why compute in the client if you don't need to?

I've been working (on and off) in commercial/closed source edge computing and CDNs (content delivery networks) for several years. Yes, industry folks tend to be secretive. What's more, systems tend to be complex, so advantage goes to folks who don't publish designs for complex parts, leading to secretive behavior.

Also, to reduce problem scope so complexity isn't as bad, folks tend to solve narrow problems, as opposed to the whole distributed ball of wax. So if a big distributed code picture is a whole alphabet of problems, A to Z, most everyone tackles just one or a few of these. So you see a solution for X or Y in isolation, without any regard for how they might work with other components in a suite of solutions. The space is therefore very balkanized. For example, everyone has a different caching solution, specific to particular narrow problems. Thus caching for storage is different from caching for networking, even if one might think leverage can be had in solving both at once.

lmeyerov wrote: One way to think of this, I want to not only write a parallel browser, but a distributed one. Traditionally, we only see content delivery networks or intrusion detection systems working here. The thing is you *really* don't want this layer to be a bottleneck... and most systems I've heard about here are closed source / proprietary.

If that's your main point, I think I know what you mean. What you want to do is very hard. It's doable, but you have a mountain of human-factors problems in the way of making progress. For example, you'd need a different protocol between clients and servers besides HTTP. (You'd need a lot of new things, but I only need one example to make my point.) This part alone would absorb all your energy if you wanted consensus with anyone else, instead of just blazing a trail on your own. Collectively, crowds of opinionated geeks are as easy to collaborate with as lunatics, with similar results.

I've mostly worked on optimizing results when protocols are not changed. The closest thing I know to what you're talking about is the ESI (edge side includes) spec that Oracle and Akamai published a few years ago. I designed, wrote, and deployed a version of ESI in a caching/reverse-web-proxy CDN. That server-side markup system almost has the makings of expressing composable remote computation, but only in the form of included fragments that can have independent identity and caching behavior. It had no general expression of computation per se.
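For flavor, here is a deliberately simplified sketch of the fragment-assembly idea: a page template names fragments, each fragment has its own cache lifetime, and the edge stitches cached and freshly fetched fragments into one response. The marker syntax and cache policy below are invented stand-ins, not real ESI markup.

```python
# Hypothetical edge-side fragment assembly with per-fragment caching.
import time

CACHE = {}  # fragment name -> (body, expiry timestamp)

def fetch_fragment(name, origin, ttl):
    body, expiry = CACHE.get(name, (None, 0.0))
    if time.time() >= expiry:      # expired or never fetched
        body = origin(name)        # go back to the origin server
        CACHE[name] = (body, time.time() + ttl)
    return body

def assemble(template, origin):
    """Replace 'include:NAME:TTL' markers with (possibly cached) fragments."""
    out = []
    for part in template.split("|"):
        if part.startswith("include:"):
            _, name, ttl = part.split(":")
            out.append(fetch_fragment(name, origin, float(ttl)))
        else:
            out.append(part)
    return "".join(out)

origin_hits = []
def origin(name):
    origin_hits.append(name)   # record every trip back to the origin
    return f"[{name}]"

page = assemble("header|include:news:60|footer", origin)
page2 = assemble("header|include:news:60|footer", origin)
assert page == page2 == "header[news]footer"
assert origin_hits == ["news"]  # second request served from the edge cache
```

The design point ESI got right is visible even in this toy: fragments carry independent identity and lifetime, so the edge can reuse stable parts of a page while refetching only the volatile ones.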

I'm just trying to establish some context; I'm not saying you want to do something like ESI. (Actually, I assume you don't because ESI was a narrow solution to a web caching problem.)

If you want blue-sky discussion, I might contribute, but I think the problem is hopelessly large. If you want ready-made solutions out there to apply, I think they mostly don't exist. I know vaguely what competitors are doing in various spaces touching this area. Most of it is turnkey solutions to specific performance scenarios; there's no public infrastructure to leverage.

I speak Meyerovian

There was also the Flux project/team at Utah, where they've been going into test beds recently.

Well, then, good to see Utah people being polemical about monogamy. ;)

Responding to andrew...

CDN = Content Delivery Network; see Steve Souders' books on building fast web sites for an explanation. Alternatively, Phil Greenspun's PANDA and SEIA books explain it, too (both free, online).

Opera Mini = Opera's embedded mobile web browser

Flux refers to Cumulo Flux's cloud computing workflow tool:

Cumulo Flux is a tool to construct and execute pipelines or workflows on the internet. Flux can be thought of as a way of executing and coordinating arbitrary computations in the cloud. The services that Flux connects together can be running on any computer and written in any programming language. You can even write a service that runs from a browser client. Flux is a drag and drop web based tool that anyone, even non-programmers, can start using immediately. Flux leverages Google App Engine to allow you to do serious globally distributed computing from anywhere.

Edit: apparently I speak a broken dialect of Meyerovian - I had the wrong Flux:

And Flux... I meant the MSR dataflow language for data centers.

MSR = Microsoft Research... but I'm not aware of this work being done at MS Research. See: Flux: An Adaptive Partitioning Operator for Continuous Query Systems