User loginNavigation |
What makes backreferences impossible in a DFA regular expressions evaluator?Hello all, I've been lurking for a while but this is my first post, so I apologize if I'm off-topic. I have read in papers such as this one that matching regular expressions with a "Thompson's NFA" or DFA makes matching backreferences inefficient (the above paper says it's np complete). Since I've written a regex engine that uses the "Thompson's NFA" algorithm as I understand it that matches backreferences efficiently in most cases, I'm assuming I am misunderstanding either Thompson's NFA or the use cases that can cause exponential runtime. I've made the entire code available here, and I'm pasting the relevant code at the bottom of my post (it is short). Why are backreferences considered impossible with a dfa regex engine? BEGIN RELEVANT CODE -- the compiler compiles a regex to a single MatchSequence data RegexOp where MatchEndAnchor :: RegexOp MatchChar :: Char -> RegexOp MatchAny :: RegexOp MatchSequence :: [RegexOp] -> RegexOp MatchGreedy :: Int -> Int -> RegexOp -> RegexOp -- min, max, operation MatchNonGreedy :: Int -> Int -> RegexOp -> RegexOp MatchClass :: [RegexOp] -> RegexOp -- used for character classes and (..|..) MatchLookAhead :: RegexOp -> RegexOp MatchNegative :: RegexOp -> RegexOp MatchBackRef :: Int -> RegexOp BeginCapture :: RegexOp EndCapture :: RegexOp deriving (Eq, Show) ... -- a possible state, our runtime keeps a list of all possible states by using the List Monad data ReState = ReState { stream :: String ,current_groups :: [String] ,groups :: [String] ,current_match :: String } deriving (Show, Eq) ... -- start a new capture -- push_capture :: ReState -> ReState push_capture state = ReState (stream state) ([] : current_groups state) (groups state) (current_match state) ... -- finish the newest capture and add it to our groups -- pop_capture :: ReState -> ReState pop_capture state = ReState (stream state) (tail $ current_groups state) (groups state ++ [head $ current_groups state]) (current_match state) ... -- dfa_mkop is our main execution function, it takes an op and a state -- and returns a list of all possible states -- dfa_mkop :: RegexOp -> ReState -> [ReState] ... dfa_mkop (MatchBackRef refnum) state = do g <- return $ groups state if length g < refnum then [] else do gval <- return $ g !! (refnum - 1) st <- return $ stream state if length st < length gval then [] else if take (length gval) st == gval then [matched (length gval) state] else [] dfa_mkop BeginCapture state = [push_capture state] dfa_mkop EndCapture state = [pop_capture state] By Brandon Niemczyk at 2008-08-31 20:02 | LtU Forum | previous forum topic | next forum topic | other blogs | 12895 reads
|
Browse archives
Active forum topics |
Recent comments
22 weeks 6 days ago
22 weeks 6 days ago
22 weeks 6 days ago
45 weeks 19 hours ago
49 weeks 2 days ago
50 weeks 6 days ago
50 weeks 6 days ago
1 year 1 week ago
1 year 6 weeks ago
1 year 6 weeks ago