Showing posts with label programming. Show all posts

Wednesday, April 30, 2008

Regular Expressions Are Fun

Miriam and I were debugging a regular expression, and it was an educational experience.

The platform is Java, and our problem is:

  • Input is a long string
  • We want to replace occurrences of "PRE[SEARCHSTRING]POST" with "PREreplacedPOST"
  • "PRE" and "POST" are patterns
  • "[SEARCHSTRING]" is a string that contains a lot of special regex characters like "[", "\", and "$".
We found java.util.regex.Pattern.quote to help with the special characters in "[SEARCHSTRING]":

String quotedPattern = Pattern.quote(searchString);

Now, how do we match "PRE" and "POST" without losing them? They disappear if we try:
return input.replaceAll("PRE" + quotedPattern + "POST", "replaced");

After some fiddling we came up with:

return input.replaceAll("(PRE)" + quotedPattern + "(POST)", "$1replaced$2");

This matches "PRE" and "POST", but references the captured subgroups with "$1" and "$2" so they don't disappear.
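Here is a minimal runnable sketch of the capturing-group version. The class name, the method name and the sample strings are made up for illustration, and "PRE" and "POST" are kept as literal text here even though in our real problem they are patterns.

```java
import java.util.regex.Pattern;

public class CaptureGroupReplace {
    // Replaces searchString only where it sits between "PRE" and "POST".
    // Both delimiters are consumed by the match, then written back
    // through the $1 and $2 group references in the replacement.
    static String replaceBetween(String input, String searchString) {
        String quotedPattern = Pattern.quote(searchString);
        return input.replaceAll("(PRE)" + quotedPattern + "(POST)", "$1replaced$2");
    }

    public static void main(String[] args) {
        // Hypothetical sample data: the search string contains "[", "\" and "$".
        String searchString = "[a\\b$c]";
        String input = "xPRE[a\\b$c]POSTy and [a\\b$c] alone";
        // Only the occurrence wrapped in PRE...POST is replaced.
        System.out.println(replaceBetween(input, searchString));
    }
}
```

One thing to watch: "$" and "\" are also special in the replacement string, so if "replaced" came from user input it would probably need java.util.regex.Matcher.quoteReplacement, and "$1" and "$2" would then have to be concatenated around the quoted text.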

But we then tried the following instead:

return input.replaceAll("(?<=PRE)" + quotedPattern + "(?=POST)", "replaced");

"(?=POST)" is a zero-width positive lookahead, which is an incredibly cool technical name. It asserts that the pattern does appear right after the current position ("positive lookahead"), but the text it matches is not counted as part of the match ("zero-width"), so it isn't replaced. "(?<=PRE)" is, similarly, a zero-width positive lookbehind. There are also negative lookaheads, "(?!...)", and negative lookbehinds, "(?<!...)", which assert that the pattern doesn't appear.
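To see the lookarounds in action, here is a small sketch under the same assumptions as our problem. The class, method names and sample input are invented for the example, and the second method is not something we needed; it is only there to show a negative lookahead.

```java
import java.util.regex.Pattern;

public class LookaroundReplace {
    // Positive lookbehind and lookahead: "PRE" and "POST" must surround
    // the search string, but they are not part of the match, so the
    // replacement text does not need to put them back.
    static String replaceBetween(String input, String searchString) {
        String quotedPattern = Pattern.quote(searchString);
        return input.replaceAll("(?<=PRE)" + quotedPattern + "(?=POST)", "replaced");
    }

    // Negative lookahead: replace the search string only where it is
    // NOT immediately followed by "POST".
    static String replaceUnlessFollowedByPost(String input, String searchString) {
        String quotedPattern = Pattern.quote(searchString);
        return input.replaceAll(quotedPattern + "(?!POST)", "replaced");
    }

    public static void main(String[] args) {
        // Hypothetical sample input with one PRE...POST occurrence and one without POST.
        String input = "PRE[x]POST PRE[x]END";
        System.out.println(replaceBetween(input, "[x]"));
        System.out.println(replaceUnlessFollowedByPost(input, "[x]"));
    }
}
```

A caveat if "PRE" is a real pattern rather than literal text: as far as I know, Java wants lookbehind patterns to have a bounded maximum length, so something like "(?<=a+)" may be rejected with a PatternSyntaxException. In that case the capturing-group version is the safer choice.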

Is there a more elegant way to do this?

Wednesday, April 23, 2008

Sorting time spans with jQuery Tablesorter

I've been working a little with Christian Bach's nifty Tablesorter plugin for jQuery, and I'd like to share some cool stuff I've learned.

I wanted to sort a column of time spans formatted like this: "2 days 15:11:06". I wrote a custom parser and ended up with this:

$.tablesorter.addParser({
id: 'timespan',
is: function(s) { return false; }, // don't auto detect
format: function(s) { return s.replace(/\D/g,""); },
type: 'numeric'
});

And the usage:
$("#mytable").tablesorter({
headers : {
2 : { sorter: 'timespan' } // the 3rd column (zero-based index 2) is a time span
}
});

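All it's doing is removing all non-digits and asking for a numeric comparison. This works as long as the hours, minutes and seconds are always zero-padded to two digits, so that a larger time span always produces a larger number. It should even work equally well for "0 days 02:30:00" and for "02:30:00", since leading zeros don't affect the sorting :-)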

Wednesday, February 13, 2008

Who's afraid of dynamic typing?

If you're afraid of dynamic typing, stop. Take some time to learn a good dynamically-typed language, then take some more time to learn how some people build big production systems with it, and how they manage to do it with less effort and increased safety and reliability vs. equivalent implementations in popular statically-typed languages. Regardless of your conclusions, I can guarantee you'll increase your breadth as a developer and that your new insights will improve your effectiveness in any programming language.

So why are people afraid of dynamic typing? Maybe it's the way most people approach dynamic languages:

  1. skim through some tutorials
  2. write some toy code
  3. reach a final verdict about their applicability to large-scale development
These verdicts are usually based on gut instinct and second-hand stories rather than on actual experience developing real systems with guidance on proper ways to use the language.

I'll finish with a story: One time a good C++ developer got excited about a dynamic language. He used it extensively for scripting and was familiar with all of its features, and he convinced the necessary parties to develop a real upcoming system in this language. But when writing the system he threw every one of his development practices out the window and half-assed the whole thing, writing incredible spaghetti code, not testing, and cramming in as many language features as he could. The project was not a complete failure, but the "obvious" conclusion was that this language wasn't appropriate for these tasks, whereas my conclusions are:
  1. Be responsible. You may also use a dynamic language for scripting, but when you're building a real system you're not scripting.
  2. Don't rely on static typing - it isn't there to rely on. You have to unit-test.
  3. Use your language's idioms and good shortcuts, avoid the evil ones. Seek experienced guidance on this issue.
  4. Don't get in over your head - before applying a language feature in production, become familiar with it. Don't worry, it'll wait for you!
Some articles I love on the topic:

Monday, February 11, 2008

Boo and TiddlyWiki

I'm now learning Boo (as part of learning Brail). One of the most helpful Boo sites I've found is The Book of Boo - and it's a TiddlyWiki, no less!