Question: do you have to climb the tower of interpreters?

You know what is interesting about meta-circular interpreters? The infinite recursion of eval-apply is always broken in the end by the application of some primitives. So in the end, no matter what your computational model is, you are still executing a stream of primitives in your "ground" language. And, usually, "lambda" is all you'll ever need as a primitive. But looking at so many toy interpreters you'd see that "lambda" is not quite so "primitive" and requires a lot of machinery to work.
So instead of providing some means of abstraction directly as a ground mechanism to build everything else around it, I wanted to do a simple thought exercise and to go from another direction: if we have a fixed set of primitives, without means of abstraction, what are the ways to compose/combine them?

Imagine we have a really old CPU without a support for a call instruction, but it has a conditional jump, ALU instructions, loads/stores. It's a simple exercise to define how would a call instruction would look like (in pseudo-assembler):

define CALL(target_addr):
load stack_pointer, tmp
store tmp, the-address-of-the-next-instruction-after-call //don't care now how we can get this
dec tmp
store stack_pointer, tmp
//skipped the frame pointer manipulation
branch target_addr

So, in general, our CPU IS capable of doing necessary steps to create an abstraction mechanism, but there is no way to "plug it in" by the CPU itself unless our preprocessor rewrites all "CALL" tokens into these primitives. Not good.
The only other way I could think of is to interpret another language, which has this CALL as primitive. I.e. the CPU itself becomes such a "preprocessor". But does that mean that the only way to escape deficiencies of your current computational model is to climb the tower of interpreters? (aka interpret another language that has that capability. My favorite examples: a Scheme interpreter with first-class macros in scheme by Matt Might, Kernel interpreter in Scheme by J.Shutt, 3-LISP by Brian Smith)

Are there any other ways, besides interpreting another language? Maybe there is a way to build such a reflexive virtual machine that can modify and extend itself? I haven't seen such yet.

I've recently stumbled upon Ian Piumarta's "Open,extensible composition models" VPRI Technical Report TR-2011-002 which describes a meta-circular evaluator, where instead of your usual big cond-statement in eval you select your evaluator based on the type of expression you are evaluating (same for applicators) and this mapping is available for the programmer. I've found this a very interesting read, maybe someone could comment on this paper, as I've surely missed a lot.

I have surely asked a lot of questions in a very unstructed form, but if anyone would maybe give some useful pointers (something to read on the topic, maybe) it would be greatly appreciated.