A Logic File System. Yoann Padioleau and Olivier Ridoux, IRISA / University of Rennes. General Track, pp. 99-112 of the Proceedings http://www.usenix.org/events/usenix03/tech/padioleau.html
This is the most interesting, insightful, and innovative talk at this
conference.
The goal of the talk is to combine the navigation and querying in a
file system. Roughly speaking, the talk attempted to bring filesystems
from the age of hierarchical and network databases into the age of
relational -- and, moreover, deductive databases -- and yet maintain
the familiar cd/ls/mkdir/mv interface, file paths and shell globs.
The current filesystems are built around the hierarchical model
(links, if present, are only uni-directional. Therefore filesystems do
not actually follow the network database model). Unfortunately, a
hierarchy is a tree rather than a lattice: a hierarchy decomposes the
system around one aspect disregarding the others. For example, the
typical layout of the filesystem emphasizes the scope: /, /usr,
/usr/local, /usr/local/share/ghostscript/7.05. Other aspects, e.g.,
library code, are smeared all over the place: /lib, /usr/lib,
/usr/local/lib and /usr/local/share/ghostscript/7.05/lib. Locating
files that are local to an installation is easy (/usr/local), locating
all the library files is hard. Many interesting queries are difficult
to execute or even to formulate. For example, to list all directories
that contain only empty subdirectories, to "cd" to a directory with
files foo.h or foo.c (disjunctive query), or to list all the files *.h
or *.c in the given tree that are not in (sub)directories named
Attic. To answer such queries, we have to write quite complex logical
formulas, made more complex by the fact that me must keep the
navigational structure in mind.
Many operating systems have tools that impose a relational layer on
the file system -- glimpse, locate, find -- and offer a keyword-based
search. However these tools often return a great number of results,
which are not grouped in any particular way. For example, "locate
sendmail.cf" may return a couple of file names. But "locate
!sendmail.cf" (where ! stands for negation), if it were allowed, would
list almost any file on the system. Clearly the user will find such a
result unmanageable. The user would much prefer if the output were
automatically grouped according to the most general and informative
categories: the most general keywords that _will_ refine the
search. Furthermore, the user should be able to navigate such a
virtual directory of results using the familiar 'cd' and 'ls'
commands, or reclassify a file with a 'mv' command. Finally, if the
user adds or deletes a file, the virtual file tree should be updated
automatically.
We would like to "cd /lib" and descend into the Directory that
contains only library files, grouped into subdirectories "root",
"usr", "local". We would like to "cd /bin|/sbin" and get to the root
of the "filesystem tree" that contains only executable files (again,
grouped into "subdirectories"). We may even do "cd !src" to get the
view of the filesystem without any source files. The Logical File
System described in this talk (LISFS for short, not to be confused
with the Log File system) can accomplish all that -- now. The prototype
implementation is fully functional and reasonably fast. An
ls-benchmark shows hardly any performance overhead of far more
expressive filesystem queries.
The logical filesystem LISFS takes a relational view: files are
entities that are described by their properties (attributes). To
locate a file, we execute a query. LISFS lets a user compose a query
incrementally: "cd /query" executes the query and makes the
corresponding collection current. "cd query1" refines the original
query. Incidentally, this means that "cd a/b" and "cd b/a" are the
same: the path separator stands for a conjunction, and therefore, is
commutative. The closed-world assumption makes negation queries
possible (e.g., "cd !/lib"). LISFS goes farther than the naive
relational model. A user may create taxonomies of attribute
values. For example, the user may create a category "lib" with
sub-categories "shared-lib" and "static-lib" -- using the mkdir
command. LISFS will infer that anything classified as a shared library
is also a library. The inference rules can be more complex: we can
define a category "dual-lib" for project's libraries that can be
linked both statically and dynamically. When LISFS reifies the result
of a query as a filesystem directory, the most general categories that
partition the result become the names of subdirectories. LISFS thus
lets us navigate the inference tree. The complex inference requires a
logical engine -- and indeed the current version of LISFS includes a
Prolog interpreter, as a part of the filesystem code! Truly a file
path is a logical formula.
Like the semantic file system, LISFS has intrinsic attributes, whose
values are derived from the content of a file (e.g., file's size or
the resolution of a JPEG image). Unlike the semantic file system, a
user can define his own attributes and organize them into arbitrary
taxonomies.
The paper describes the algorithms, the implementation of directory
operations, and the obvious optimizations (e.g., caching).
LISFS is implemented truly at the low, file system level, as a
user-level NFS daemon. Therefore, none of the applications need to be
modified in any way to take advantage of LISFS. For example, the
wildcard expansion in a shell (glob) "just works".
LISFS is written with the help of a PerlFS toolkit. However, Yoann
told me that the implementation language was not truly Perl. Yoann
designed his own language with types and an ability to express
post-conditions and invariants. That source language was then
mechanically translated into Perl. Perhaps Yoann might wish to supply
more details?
The only filesystem that comes close to LISFS in terms of integration
of query and navigation was the file system of Apple Newton (so-called
'soup'). But NewtonScript never had any logical inference.
In the future research, Yoann plans to implement views of a file. We
could then edit selected sections from a TeX document as if they were
a separate file (Emacs' all-mode is a glimpse of that
functionality). It seems that LISFS can be meaningfully integrated
with Semantic Web.
Yoann was asked a question if a common person (without the knowledge
of logic, etc) would be able to meaningfully navigate LISFS. Yoann
gave an excellent answer: a common person can use Google. LISFS is not
much different.
The prototype and more information are available at http://www.irisa.fr/LIS
|