How is error information combined when parsers are combined? For example, using <|> to combine parsers, I would expect the set of expected characters for an error to be the union of the sets of expected characters from the individual parsers. (I’m finding it hard to pin down the behaviour of <|> or even to find the relevant source code.)

  • jaror@kbin.social
    1 year ago

    To understand the behavior of megaparsec’s <|> operator, it helps to know about “consuming” and “non-consuming” (or “empty”) parses. To illustrate the concept, I’ll compare a literal string parser with a parser that parses each character separately:

    > let p = string "abc"
    > let q = sequence [char 'a', char 'b', char 'c']
    > parseMaybe (p <|> string "abd") "abd"
    Just "abd"
    > parseMaybe (q <|> string "abd") "abd"
    Nothing
    
    

    So, what happened? When string "abc" tries to parse the input "abd", it fails without consuming any input; you can think of it as backtracking to the beginning of the string. In contrast, the parser sequence [char 'a', char 'b', char 'c'] has already consumed the 'a' and 'b' characters by the time it fails on 'd'. In that case, <|> will not even try the string "abd" parser.

    You can manually force the parser to backtrack by using the try function as follows:

    > parseMaybe (try q <|> string "abd") "abd"
    Just "abd"
    
    

    But note that heavy use of try, especially when wrapped around large or nested parsers, can cause exponential running time in the worst case, so avoid it where you can.
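
    One common way to avoid try is to factor out the shared prefix, so the choice is made before either branch has consumed anything. Here is a minimal sketch of that idea for the "abc"/"abd" example; the Parser alias and the name abcOrAbd are just mine for illustration, and I'm assuming plain String input:

    ```haskell
    import Data.Void (Void)
    import Text.Megaparsec
    import Text.Megaparsec.Char

    -- Assumed setup: no custom error component, String input.
    type Parser = Parsec Void String

    -- Parse the shared prefix "ab" once, then choose on the last character.
    -- Both alternatives of <|> start at the same position and neither has
    -- consumed input when it fails, so no try is needed.
    abcOrAbd :: Parser String
    abcOrAbd = do
      prefix <- string "ab"
      c <- char 'c' <|> char 'd'
      pure (prefix ++ [c])

    main :: IO ()
    main = do
      print (parseMaybe abcOrAbd "abd") -- Just "abd"
      print (parseMaybe abcOrAbd "abc") -- Just "abc"
      print (parseMaybe abcOrAbd "abe") -- Nothing
    ```

    This only works when the alternatives really do share a prefix you can pull out; when they don't, try is still the right tool.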

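    To answer your question given this information: the error information is combined, but only if both arguments of <|> fail without consuming any input. If the first argument fails after consuming input, the second is never tried and only the first error is reported. If the first fails without consuming and the second fails after consuming input, then in effect only the error from the second branch is used, because (as far as I can tell) errors are only merged when they point at the same input position, and otherwise the one further into the input wins.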
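
    If you want to see this for yourself, here is a small sketch that prints the error messages for both situations. The names bothEmpty, firstConsumes and report are made up for this example, and I'm again assuming String input with no custom error type:

    ```haskell
    import Data.Void (Void)
    import Text.Megaparsec
    import Text.Megaparsec.Char

    -- Assumed setup: no custom error component, String input.
    type Parser = Parsec Void String

    -- Both alternatives fail on "x" without consuming input,
    -- so their "expected" items should end up in the same error.
    bothEmpty :: Parser Char
    bothEmpty = char 'a' <|> char 'b'

    -- On "ax" the first alternative consumes 'a' before failing,
    -- so the second alternative is never tried.
    firstConsumes :: Parser String
    firstConsumes = sequence [char 'a', char 'b'] <|> string "cd"

    -- Run a parser and print either the result or the rendered error.
    report :: Show a => Parser a -> String -> IO ()
    report p input = case parse p "" input of
      Left err -> putStr (errorBundlePretty err)
      Right x -> print x

    main :: IO ()
    main = do
      report bothEmpty "x"
      report firstConsumes "ax"
    ```

    I'd expect the first message to list both 'a' and 'b' as expected, and the second to point at the second character and mention only 'b', with nothing about "cd".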