User loginNavigation |
archivesUpgradeJ: Incremental Typechecking for Class UpgradesUpgradeJ: Incremental Typechecking for Class Upgrades, Gavin Bierman, Matthew Parkinson and James Noble.
There has been an energetic discussion of API evolution in the forum, so when I saw this paper I thought it might be of interest to LtU readers. By neelk at 2008-08-31 12:55 | OOP | Semantics | Software Engineering | 2 comments | other blogs | 7084 reads
What makes backreferences impossible in a DFA regular expressions evaluator?Hello all, I've been lurking for a while but this is my first post, so I apologize if I'm off-topic. I have read in papers such as this one that matching regular expressions with a "Thompson's NFA" or DFA makes matching backreferences inefficient (the above paper says it's np complete). Since I've written a regex engine that uses the "Thompson's NFA" algorithm as I understand it that matches backreferences efficiently in most cases, I'm assuming I am misunderstanding either Thompson's NFA or the use cases that can cause exponential runtime. I've made the entire code available here, and I'm pasting the relevant code at the bottom of my post (it is short). Why are backreferences considered impossible with a dfa regex engine? BEGIN RELEVANT CODE -- the compiler compiles a regex to a single MatchSequence data RegexOp where MatchEndAnchor :: RegexOp MatchChar :: Char -> RegexOp MatchAny :: RegexOp MatchSequence :: [RegexOp] -> RegexOp MatchGreedy :: Int -> Int -> RegexOp -> RegexOp -- min, max, operation MatchNonGreedy :: Int -> Int -> RegexOp -> RegexOp MatchClass :: [RegexOp] -> RegexOp -- used for character classes and (..|..) MatchLookAhead :: RegexOp -> RegexOp MatchNegative :: RegexOp -> RegexOp MatchBackRef :: Int -> RegexOp BeginCapture :: RegexOp EndCapture :: RegexOp deriving (Eq, Show) ... -- a possible state, our runtime keeps a list of all possible states by using the List Monad data ReState = ReState { stream :: String ,current_groups :: [String] ,groups :: [String] ,current_match :: String } deriving (Show, Eq) ... -- start a new capture -- push_capture :: ReState -> ReState push_capture state = ReState (stream state) ([] : current_groups state) (groups state) (current_match state) ... -- finish the newest capture and add it to our groups -- pop_capture :: ReState -> ReState pop_capture state = ReState (stream state) (tail $ current_groups state) (groups state ++ [head $ current_groups state]) (current_match state) ... -- dfa_mkop is our main execution function, it takes an op and a state -- and returns a list of all possible states -- dfa_mkop :: RegexOp -> ReState -> [ReState] ... dfa_mkop (MatchBackRef refnum) state = do g <- return $ groups state if length g < refnum then [] else do gval <- return $ g !! (refnum - 1) st <- return $ stream state if length st < length gval then [] else if take (length gval) st == gval then [matched (length gval) state] else [] dfa_mkop BeginCapture state = [push_capture state] dfa_mkop EndCapture state = [pop_capture state] |
Browse archivesActive forum topics |
Recent comments
22 weeks 2 days ago
22 weeks 2 days ago
22 weeks 2 days ago
44 weeks 3 days ago
48 weeks 5 days ago
50 weeks 2 days ago
50 weeks 2 days ago
1 year 6 days ago
1 year 5 weeks ago
1 year 5 weeks ago