Supporting a spectrum from whole program to separate compilation to aid in efficient program generation

There are certainly whole program compilers with the aim of making higher level languages compile far more efficient runtime executables. But as I currently understand these compilers, their practical usefulness for large scale program development is limited due to efficiency of the compilation process itself.

So, I was wondering if any languages - more specifically, the types of higher level languages that feature language abstractions that might benefit most whole program compilation - supported a compilation model and associated language abstractions that tried to encompass a wider spectrum of optional efficiency oriented language abstractions and features between whole program model compilation and separate compilation in the name of runtime efficiency.

As for "whole program compilation," what I have in mind instead is some kind of limited "library" - or "package" or "unit" (etc.), let's use "library" for now - abstraction comprised of multiple source files that are subject to "whole library compilation. This would allow programming teams to decide when to apply a more limited scope of "whole program compilation" just to performance critical libraries in their programs.

Between these "libraries" (however they are compiled), we have "separate compilation" made safe via traditional module "signatures," or unit "interfaces" or whatever (pick your lingo and preferred module header styled feature).

But additionally, within the "separate compilation model," we might also support other language features that assist the compiler in generating more efficient code. Some low hanging fruit might be: annotations and/or compiler driven function inlining, annotation or compiler driven loop unrolling, C++ or D style template data types and functions (as an alternative to "normal" generic, parametric polymorphic functions and data types), C (and Smalltalk, interestingly)-styled variable length data structures, full blown Scheme styled macros (no, not always or even primarily an efficiency tool, but still...), annotations for "destructured" function argument data types (potentially avoiding memory allocation for passing many function arguments), and so on and so forth.

Once could write entire programs in the language without using these features, but one could also write entire libraries (say, just for example, a performance oriented OpenGL or encryption library) using these directives and alternative language level abstractions - and then pull out all the stops by subjecting the "library" to whole program compilation. Aside from segregation of the program into discrete libraries (yes, not always an easy task in the general case), most of these efficiency oriented features could be utilized incrementally in modifying a program in response to profiling.

Skipping the not-so-high-level C++ (simple macros, templates), are there any higher level languages that support such a "spectrum of efficiency oriented language abstractions and features." I'm particularly interested in the concept of a more limited scope for "whole program compilation" via some kind of more selective "library" abstraction.

I'd welcome any wisdom on efficacy, usability and implementation challenges.