Zhaba Zhournal
Friday, October 10, 2003
Editing geekery

Okay, this is how much of a regular-expression geek I am: I have previously had dreams in which I did regular expression searches, but last night I had one that was so accurate I was able to use it, to good effect, at work this morning.

(Warning: This next bit probably doesn't make much/any sense if you don't use regular expressions. But I'm pleased with myself, so I'm writing it anyway.)

<geek>

I'd been searching the files I'm editing for HTML character entities: &eacute; instead of é, for instance. So I searched for this:

&[^&;]+;

Except that kept bringing up every line that had an ampersand followed somewhere by a semicolon, including the lines with bizarre hidden Word coding. (This, for instance:

T]âÞ?CæÍ@¨Ž¡T0æ?}GÕÕÚ¡øEtó;ðoKôv(úÄÔÜÜÃÞ>DÇP/·æi¼l?DsÅ&ÑÍÚyÂ+Ç;D¦}ú‚

What the hell is that?) And I wound up with the dismaying message "Found 109 occurrence(s) in 32 file(s)", which is a pain in the neck to individually examine. Feh.

But in my dream, it occurred to me to limit the size of the string:

&[^&;]{2,8};

I.e., to only catch strings with two to eight characters between the & and the ;. And I managed to remember it, and I tried it when I got to work this morning. And not only did it work, I was actually correct about the number of characters in a legitimate HTML entity. (In general, the longest ones have six characters, but in the Greek language set there's one with eight characters: &thetasym;. I don't know if that will show up if I try to put it in here, though. ϑ: does that work on anyone's browser?)

Anyway, I ran that search, and got this: "Found 5 occurrence(s) in 5 file(s)". Hey, I like those numbers a lot better...

</geek>

And now, back to my regularly scheduled editing...

[ at 10:02 AM • by Abby ]
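For anyone who wants to poke at the two searches from the geek section, here is a minimal sketch in Python. The post doesn't say which tool ran the original searches, so Python's `re` module is a stand-in, and the sample lines and the names `LOOSE`, `BOUNDED`, and `hits` are made up for illustration.

```python
import re

# The two patterns from the post, unchanged.
LOOSE = re.compile(r"&[^&;]+;")        # any number of characters between & and ;
BOUNDED = re.compile(r"&[^&;]{2,8};")  # only 2 to 8 characters between & and ;

# Hypothetical sample lines: two real entities, plus one line of junk
# with an ampersand and a semicolon that sits too far away to be an entity.
sample = [
    "caf&eacute; au lait",
    "the symbol &thetasym; is a script theta",
    "x&0123456789abcdef;y",
]

def hits(pattern, lines):
    """Return every match of pattern across lines, in order."""
    return [m.group(0) for line in lines for m in pattern.finditer(line)]

loose_hits = hits(LOOSE, sample)
bounded_hits = hits(BOUNDED, sample)

print(len(loose_hits), loose_hits)
print(len(bounded_hits), bounded_hits)
```

The bounded pattern only narrows the field: junk that happens to put two to eight characters between an ampersand and a semicolon will still match, so whatever turns up still needs a look by eye.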