Problems parsing a string with pyparsing
I was trying to parse a string with pyparsing so that all the words are separated from the punctuation signs. I was using this expression to do it:
OneOrMore(Word(alphanums)) + OneOrMore(Char(printables))
But when I parse the following string with this expression:
return abc(1, ULLONG_MAX)
All the words inside the parentheses get split:
['return', 'abc', '(', '1', ',', 'U', 'L', 'L', 'O', 'N', '_', 'M', 'A', 'X', ')', ';']
But if I use this expression:
OneOrMore(Word(alphanums)) + OneOrMore(Char(string.punctuation))
Only a part of the string gets parsed:
['return', 'abc', '(']
What is wrong with those expressions?
Personally, I would recommend using a regex for this instead, which would also make it easier to test your expressions. A single re.findall call can then return the token list directly.
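The original regex was not preserved here, so the following is only a minimal sketch of the idea. The pattern and the tokenize helper are assumptions, not necessarily what was originally suggested. It treats a run of letters, digits, or underscores as one token and every other non-whitespace character as a token of its own.

```python
import re

# Hypothetical pattern: a run of word characters (letters, digits, "_"),
# or else one character that is neither a word character nor whitespace.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Return the words and punctuation signs of text as separate tokens."""
    return TOKEN_RE.findall(text)


print(tokenize("return abc(1, ULLONG_MAX)"))
```

Because \w includes the underscore, this variant keeps ULLONG_MAX in one piece. If you would rather have the underscore split off, replace \w with an explicit character class.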
As for what's wrong with your expressions:
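First expression: once you hit the opening parenthesis, OneOrMore(Char(printables)) takes over and keeps matching every printable character one at a time. Since printables also contains letters and digits, the remaining words are consumed character by character. Instead, use OR (|) with the alphanumeric word first so that it takes priority:
OneOrMore(Word(alphanums) | Char(printables))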
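As a rough, runnable sketch of that first fix (the split_tokens helper is just an illustrative name, and parseString is the older camelCase spelling, which newer pyparsing releases also offer as parse_string):

```python
from pyparsing import Char, OneOrMore, Word, alphanums, printables

# Word(alphanums) is tried first, so a run of letters and digits becomes a
# single token; Char(printables) only picks up whatever Word cannot match.
token = Word(alphanums) | Char(printables)
expr = OneOrMore(token)


def split_tokens(text):
    """Parse text and return the matched tokens as a plain list."""
    return expr.parseString(text).asList()


print(split_tokens("return abc(1, ULLONG_MAX)"))
```

The order of the alternatives matters here: | stops at the first alternative that matches, so putting Char(printables) first would bring back the character-by-character splitting.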
Second expression: you're running into the same issue with your use of +. Once Char(string.punctuation) takes over, it keeps matching until it encounters a character that is not punctuation, and then the matching stops. That is why you only get the tokens up to the opening parenthesis. Instead, combine the two alternatives with | inside a single OneOrMore, as in the first fix. Do note that the underscore is considered punctuation, so ULLONG_MAX will be split; not sure if that's what you want or not.
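The code for this second fix did not survive in the answer, so here is a sketch that assumes it mirrors the first one. The ident_expr variant is an extra assumption on my part, in case you want the underscore to stay inside the word.

```python
import string

from pyparsing import Char, OneOrMore, Word, alphanums

# Same idea as before: try a whole word first, then a single punctuation sign.
punct_expr = OneOrMore(Word(alphanums) | Char(string.punctuation))

# Assumed variant: treat "_" as a word character so that identifiers
# such as ULLONG_MAX are not split at the underscore.
ident_expr = OneOrMore(Word(alphanums + "_") | Char(string.punctuation))

text = "return abc(1, ULLONG_MAX)"
split_result = punct_expr.parseString(text).asList()
ident_result = ident_expr.parseString(text).asList()
print(split_result)
print(ident_result)
```

Word is tried before Char, so adding "_" to the word characters is enough; it does not need to be removed from string.punctuation.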