User loginNavigation |
What makes backreferences impossible in a DFA regular expressions evaluator?Hello all, I've been lurking for a while but this is my first post, so I apologize if I'm off-topic. I have read in papers such as this one that matching regular expressions with a "Thompson's NFA" or DFA makes matching backreferences inefficient (the above paper says it's np complete). Since I've written a regex engine that uses the "Thompson's NFA" algorithm as I understand it that matches backreferences efficiently in most cases, I'm assuming I am misunderstanding either Thompson's NFA or the use cases that can cause exponential runtime. I've made the entire code available here, and I'm pasting the relevant code at the bottom of my post (it is short). Why are backreferences considered impossible with a dfa regex engine? BEGIN RELEVANT CODE
-- the compiler compiles a regex to a single MatchSequence
data RegexOp where
MatchEndAnchor :: RegexOp
MatchChar :: Char -> RegexOp
MatchAny :: RegexOp
MatchSequence :: [RegexOp] -> RegexOp
MatchGreedy :: Int -> Int -> RegexOp -> RegexOp -- min, max, operation
MatchNonGreedy :: Int -> Int -> RegexOp -> RegexOp
MatchClass :: [RegexOp] -> RegexOp -- used for character classes and (..|..)
MatchLookAhead :: RegexOp -> RegexOp
MatchNegative :: RegexOp -> RegexOp
MatchBackRef :: Int -> RegexOp
BeginCapture :: RegexOp
EndCapture :: RegexOp
deriving (Eq, Show)
...
-- a possible state, our runtime keeps a list of all possible states by using the List Monad
data ReState = ReState {
stream :: String
,current_groups :: [String]
,groups :: [String]
,current_match :: String
} deriving (Show, Eq)
...
-- start a new capture
--
push_capture :: ReState -> ReState
push_capture state = ReState (stream state) ([] : current_groups state) (groups state) (current_match state)
...
-- finish the newest capture and add it to our groups
--
pop_capture :: ReState -> ReState
pop_capture state = ReState (stream state) (tail $ current_groups state) (groups state ++ [head $ current_groups state]) (current_match state)
...
-- dfa_mkop is our main execution function, it takes an op and a state
-- and returns a list of all possible states
--
dfa_mkop :: RegexOp -> ReState -> [ReState]
...
dfa_mkop (MatchBackRef refnum) state =
do
g <- return $ groups state
if length g < refnum
then []
else do
gval <- return $ g !! (refnum - 1)
st <- return $ stream state
if length st < length gval
then []
else if take (length gval) st == gval then [matched (length gval) state] else []
dfa_mkop BeginCapture state = [push_capture state]
dfa_mkop EndCapture state = [pop_capture state]
By Brandon Niemczyk at 2008-08-31 20:02 | LtU Forum | previous forum topic | next forum topic | other blogs | 13601 reads
|
Browse archives
Active forum topics |
Recent comments
3 days 20 hours ago
3 days 20 hours ago
3 days 20 hours ago
3 weeks 4 days ago
4 weeks 2 days ago
4 weeks 3 days ago
4 weeks 4 days ago
4 weeks 4 days ago
4 weeks 4 days ago
5 weeks 5 hours ago