archives

UpgradeJ: Incremental Typechecking for Class Upgrades

UpgradeJ: Incremental Typechecking for Class Upgrades, Gavin Bierman, Matthew Parkinson and James Noble.

One of the problems facing developers is the constant evolution of components that are used to build applications. This evolution is typical of any multi-person or multi-site software project. How can we program in this environment? More precisely, how can language design address such evolution? In this paper we attack two significant issues that arise from constant component evolution: we propose language-level extensions that permit multiple, co-existing versions of classes and the ability to dynamically upgrade from one version of a class to another, whilst still maintaining type safety guarantees and requiring only lightweight extensions to the runtime infrastructure. We show how our extensions, whilst intuitive, provide a great deal of power by giving a number of examples. Given the subtlety of the problem, we formalize a core fragment of our language and prove a number of important safety properties.

There has been an energetic discussion of API evolution in the forum, so when I saw this paper I thought it might be of interest to LtU readers.

What makes backreferences impossible in a DFA regular expressions evaluator?

Hello all,

I've been lurking for a while but this is my first post, so I apologize if I'm off-topic. I have read in papers such as this one that matching regular expressions with a "Thompson's NFA" or DFA makes matching backreferences inefficient (the above paper says it's np complete). Since I've written a regex engine that uses the "Thompson's NFA" algorithm as I understand it that matches backreferences efficiently in most cases, I'm assuming I am misunderstanding either Thompson's NFA or the use cases that can cause exponential runtime.

I've made the entire code available here, and I'm pasting the relevant code at the bottom of my post (it is short).

Why are backreferences considered impossible with a dfa regex engine?

BEGIN RELEVANT CODE

-- the compiler compiles a regex to a single MatchSequence
data RegexOp where
        MatchEndAnchor  :: RegexOp
        MatchChar       :: Char -> RegexOp
        MatchAny        :: RegexOp
        MatchSequence   :: [RegexOp] -> RegexOp
        MatchGreedy     :: Int -> Int -> RegexOp -> RegexOp -- min, max, operation
        MatchNonGreedy  :: Int -> Int -> RegexOp -> RegexOp
        MatchClass      :: [RegexOp] -> RegexOp -- used for character classes and (..|..)
        MatchLookAhead  :: RegexOp -> RegexOp
        MatchNegative   :: RegexOp -> RegexOp
        MatchBackRef    :: Int -> RegexOp
        BeginCapture    :: RegexOp
        EndCapture      :: RegexOp
    deriving (Eq, Show)

...

-- a possible state, our runtime keeps a list of all possible states by using the List Monad
data ReState = ReState {
    stream              :: String
    ,current_groups     :: [String]
    ,groups             :: [String]
    ,current_match      :: String
} deriving (Show, Eq)

...

-- start a new capture
--
push_capture :: ReState -> ReState
push_capture state = ReState (stream state) ([] : current_groups state) (groups state) (current_match state)

...

-- finish the newest capture and add it to our groups
--
pop_capture :: ReState -> ReState
pop_capture state = ReState (stream state) (tail $ current_groups state) (groups state ++ [head $ current_groups state]) (current_match state)

...

-- dfa_mkop is our main execution function, it takes an op and a state
-- and returns a list of all possible states
--
dfa_mkop :: RegexOp -> ReState -> [ReState]

...

dfa_mkop (MatchBackRef refnum) state =
        do
            g <- return $ groups state
            if length g < refnum
                then []
                else do
                    gval <- return $ g !! (refnum - 1)
                    st <- return $ stream state
                    if length st < length gval
                        then []
                        else if take (length gval) st == gval then [matched (length gval) state] else []

dfa_mkop BeginCapture state = [push_capture state]
dfa_mkop EndCapture state = [pop_capture state]