
Saturday, June 23, 2012

Regular Expressions

w7r.blogspot.com

Regex Basics

  • Regex stands for regular expressions
  • Used to describe patterns in text files
  • Can detect dynamic patterns, such as phone numbers followed by a person's name
  • Supported in nearly every programming language, with only minor differences between them
  • A standard for efficient editing of text documents
  • A more powerful version of "Find and Replace"
  • Can save you a lot of time revising typed-up work
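
To make the phone number bullet concrete, here is a minimal sketch in Python. The number format (555-123-4567), the "First Last" name shape, the sample text, and the helper name find_contacts are all assumptions made up for illustration; real data would likely need a looser pattern.

```python
import re

# Assumed format: a 3-3-4 digit phone number, some whitespace,
# then a capitalized first and last name.
CONTACT = re.compile(r"(\d{3}-\d{3}-\d{4})\s+([A-Z][a-z]+ [A-Z][a-z]+)")

def find_contacts(text):
    """Return a list of (phone, name) pairs for every match in text."""
    return CONTACT.findall(text)

sample = "Call 555-123-4567 Jane Doe or 555-987-6543 John Smith today."
print(find_contacts(sample))
# [('555-123-4567', 'Jane Doe'), ('555-987-6543', 'John Smith')]
```

The parentheses in the pattern are capture groups, which is why each result comes back as a separate phone number and name.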

Like Find and Replace on Roids

If you have used the Find, Replace, or Find and Replace feature in Microsoft Word or any other software, then you probably know that it can save a lot of time on tedious and repetitive tasks, such as changing every occurrence of the word "he" to "she", or "Mr" to "Mrs" or "Ms".

Efficiency Is Key

No matter how large and complex a pattern becomes, the time spent finding or coding the correct regular expression stays roughly the same. Regex code can also be reused and altered to describe similar situations that do not match the original expression's criteria.

Regex is ultimately efficient because you let a computer repeat the pattern, instead of doing it manually with your mouse, your eyes, and your keyboard.

An Example of Regex's Efficiency

A blogger, such as myself, realizes that he has a lot of <b>bold text</b> tags instead of the preferred <strong>bold text</strong>. It was an easy mistake to make using the "Compose" tab in Blogger's post editor, but a very difficult problem to fix via the HTML tab in the post editor.

The blogger spends 30 minutes fixing the format for one of his many posts and realizes he can't waste so much time on such a small detail. So, the next time he fixes the format of a post, he uses the Find and Replace feature and spends only 8 minutes on a post of similar length. He's satisfied, but does not want to spend 8 minutes on each of his posts (8 minutes per post, or 800 minutes for 100 posts).

Using regex, the same blogger is able to apply the same edits to each post in 1 minute, because all he has to do is reuse his previously written expression (the same changes occur on each blog post). With practice, regex code for these changes can be written in under 5 minutes.

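Thus, the blogger could apply all the necessary format changes to his 100 blog posts in under 105 minutes (5 minutes to write the expression, plus 1 minute for each of the 100 posts), compared with 800 minutes using Find and Replace. That is 1 hour and 45 minutes well spent!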

Here are the regular expressions I used to change my <b> tags to <strong> tags:


Find: <(\s*)b((>)|(\s+[^>]*>))
Replace: <strong>

Find: <(\s*)/(\s*)b((>)|(\s+[^>]*>))
Replace: </strong>
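
The same Find and Replace pairs can also be run from a script, so a whole post is fixed in one step. Here is a minimal sketch in Python. The function name bold_to_strong and the sample post are made up for illustration, and the patterns are a slightly simplified variant of the ones above rather than an exact copy: they use [^>]* so that a match stops at the first > and cannot run into the text after the tag. Like the Replace lines above, it drops any attributes on the <b> tag.

```python
import re

# Opening <b> tag, allowing optional spaces and attributes.
# [^>]* stops at the first ">", so the match stays inside the tag.
OPEN_B = re.compile(r"<\s*b(?:\s[^>]*)?>", re.IGNORECASE)

# Closing </b> tag, allowing optional spaces around the slash and name.
CLOSE_B = re.compile(r"<\s*/\s*b\s*>", re.IGNORECASE)

def bold_to_strong(html):
    """Swap <b>...</b> tags for <strong>...</strong> in an HTML string."""
    html = OPEN_B.sub("<strong>", html)
    return CLOSE_B.sub("</strong>", html)

post = 'A <b>bold</b> word and <b class="x">another</b>.'
print(bold_to_strong(post))
# A <strong>bold</strong> word and <strong>another</strong>.
```

Because the pattern requires either > or whitespace right after the b, tags such as <br> and <body> are left alone.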
