Miriam and I were debugging a regular expression, and it was an educational experience.
The platform is Java, and our problem is:
- Input is a long string
- We want to replace occurrences of "PRE[SEARCHSTRING]POST" with "PREreplacedPOST"
- "PRE" and "POST" are patterns
- "[SEARCHSTRING]" is a string that contains a lot of special regex characters, such as "[", "\", and "$".
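The special characters are the easy part, because Pattern.quote() turns the search string into a pattern that matches it literally:
String quotedPattern = Pattern.quote(searchString);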
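To see what the quoting buys us, here is a minimal sketch. The class and method names (QuoteSketch, matchesLiterally, matchesAsRegex) are made up for illustration; only Pattern.quote() and Pattern.matches() come from the JDK.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of our actual code.
class QuoteSketch {
    // True if 'candidate' equals 'searchString' when the search string
    // is quoted, i.e. treated as plain text rather than as a regex.
    static boolean matchesLiterally(String searchString, String candidate) {
        String quotedPattern = Pattern.quote(searchString);
        return Pattern.matches(quotedPattern, candidate);
    }

    // True if 'candidate' is matched by 'searchString' used as a raw regex.
    static boolean matchesAsRegex(String searchString, String candidate) {
        return Pattern.matches(searchString, candidate);
    }
}
```

With the search string "a.c", the raw regex also accepts "abc" because "." matches any character, while the quoted version only accepts "a.c" itself.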
Now, how do we match "PRE" and "POST" without losing them? They disappear if we try:
return input.replaceAll("PRE" + quotedPattern + "POST", "replaced");
After some fiddling we came up with:
return input.replaceAll("(PRE)" + quotedPattern + "(POST)", "$1replaced$2");
This still matches "PRE" and "POST", but wraps them in capturing groups and refers back to them with "$1" and "$2" in the replacement, so they don't disappear.
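A self-contained sketch of the capturing-group version might look like this. The class name, the method name, and the sample PRE and POST patterns in the usage note are assumptions for illustration. I also pass the replacement through Matcher.quoteReplacement(), which is not in our original code, because replaceAll() gives "$" and "\" a special meaning in the replacement string too.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of our actual code.
class CaptureGroupSketch {
    // Replaces searchString with replacement wherever it sits between
    // a match of the 'pre' regex and a match of the 'post' regex.
    // Assumes 'pre' and 'post' contain no capturing groups of their own;
    // otherwise the "$2" group number below would be wrong.
    static String replaceBetween(String input, String pre, String searchString,
                                 String post, String replacement) {
        String regex = "(" + pre + ")" + Pattern.quote(searchString) + "(" + post + ")";
        // quoteReplacement() keeps '$' and '\' in the replacement literal.
        return input.replaceAll(regex, "$1" + Matcher.quoteReplacement(replacement) + "$2");
    }
}
```

For example, with pre "key\\d+=" and post ";", the input "key1=a[$]b;x=a[$]b;" and the search string "a[$]b" become "key1=replaced;x=a[$]b;", since only the first occurrence has a matching prefix.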
But we then tried the following instead:
return input.replaceAll("(?<=PRE)" + quotedPattern + "(?=POST)", "replaced");
"(?=POST)" is a zero-width positive lookahead, which is an incredibly cool technical name. It matches a pattern that does appear ahead ("positive lookahead") but this pattern won't be regarded when checking what parts of the string matched the regular expression ("zero-width"). "(?<=PRE)" is, similarly, a zero-width positive lookbehind. There are also negative lookaheads and lookbehinds that make sure the pattern doesn't appear.
Is there a more elegant way to do this?


