Stripping Styles and Deprecated HTML Elements and Attributes

Raven

One of the most frustrating and tedious tasks for web designers is stripping out improper characters and deprecated HTML elements and attributes. Improper characters include the curly apostrophes, curly quotation marks, and em-dashes that come along when text is pasted from Word or Pages documents. Deprecated HTML elements and attributes include <font>, target="", and align="".
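To make the character cleanup concrete, here is a minimal Python sketch of that kind of replacement. It is not Scott's Perl script. The character list and the choice of plain ASCII replacements are assumptions for illustration, and `normalize_punctuation` is a hypothetical name; you may prefer HTML entities as the replacement values instead.

```python
# Typographic characters that word processors tend to insert, mapped to
# plain ASCII stand-ins. Both the list and the replacements are assumptions;
# swap in HTML entities (e.g. "&rsquo;") if that suits your markup better.
SMART_CHARS = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark (curly apostrophe)
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # horizontal ellipsis
}

# str.maketrans accepts a dict of single characters to replacement strings.
_TABLE = str.maketrans(SMART_CHARS)


def normalize_punctuation(text: str) -> str:
    """Return text with the characters in SMART_CHARS replaced."""
    return text.translate(_TABLE)
```

Calling `normalize_punctuation` on a pasted paragraph does in one pass what would otherwise take a separate Find & Replace for each character.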

The most common scenario involves getting a Word document from a client or reengineering the HTML of an old-school website. Usually a web designer has to run a series of Find & Replace passes. If there’s a lot to find and replace, that eats up valuable time that could be spent designing or reengineering the HTML.

To streamline this unavoidable process, I asked Scott to write me a script that I could use in BBEdit (or TextWrangler) that will strip all unnecessary information from HTML documents. For me, that meant stripping out all inline styles, classes, deprecated HTML elements and attributes, and replacing improper characters with proper ones.
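For readers who want a feel for what such a filter does, here is a rough sketch in Python using the standard library's `html.parser`. It is not Scott's Perl script, and the original's exact rules are not documented here. The `DROP_TAGS` and `DROP_ATTRS` lists, and the names `Stripper` and `strip_html`, are assumptions made for illustration.

```python
from html import escape
from html.parser import HTMLParser

# Illustrative lists only; adjust them to your own definition of "unnecessary".
DROP_TAGS = {"font", "center"}  # deprecated elements: tags removed, contents kept
DROP_ATTRS = {"style", "class", "align", "target", "border", "bgcolor"}


class Stripper(HTMLParser):
    """Re-emit HTML, omitting the tags and attributes listed above."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &amp; intact in text.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _open(self, tag, attrs, close=""):
        if tag in DROP_TAGS:
            return
        # The parser unescapes attribute values, so escape them again on output.
        kept = "".join(
            f" {name}" if value is None else f' {name}="{escape(value)}"'
            for name, value in attrs
            if name not in DROP_ATTRS
        )
        self.out.append(f"<{tag}{kept}{close}>")

    def handle_starttag(self, tag, attrs):
        self._open(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._open(tag, attrs, " /")

    def handle_endtag(self, tag):
        if tag not in DROP_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def strip_html(source: str) -> str:
    """Return source with deprecated tags and presentational attributes removed."""
    parser = Stripper()
    parser.feed(source)
    parser.close()
    return "".join(parser.out)
```

With these lists, `strip_html('<p class="intro" style="color:red"><font face="Arial">Hi</font> &amp; bye</p>')` returns `<p>Hi &amp; bye</p>`. Note that the parser lowercases tag and attribute names and always writes attribute values in double quotes, so the output is normalized rather than byte-for-byte identical to the kept parts of the input.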

Scott was able to create a Perl script that strips the HTML down to its bare essentials – the way God intended. Now, all I have to do is open the HTML document in BBEdit, and then run the Unix Filter script. Here’s a download link for the script, and instructions on how to install and use it.

  1. Download HTML Stripper. (You will want to right-click this link to save it.)
  2. Copy the file to /Users/[home]/Library/Application Support/BBEdit/Unix Support/Unix Filters
  3. There are two ways to run the script in BBEdit:
    1. From BBEdit’s menu, click on Window – Palettes – Unix Filters. The Unix Filters palette will appear with HTML Stripper as one of the options. Open the HTML document that you would like to strip, click on HTML Stripper in the palette, and then click on the Run button.
    2. Open the HTML document that you would like to strip. From BBEdit’s menu, click on Unix Filters – HTML Stripper.

Update (05/18/2006) – The HTML Stripper is now available for TextMate

Update by @RavenJeremy (07/2012) – We’ve removed the links to those tools because they don’t exist anymore, but here’s a link to one that’s still around.

Tell us what you think

  • Allan

    YOU RULE!!! Thanks!!

  • Allan

    I’ve got something funky going on. I just tried a test and this:
    <h1><a href="/"><img src="/images/logo.png" id="logo" alt="Some Logo" border="0" align="left"></a></h1>

    I end up with:

    <h1><a href="/"><em></a></h1>

    Any ideas? I’m running this through TextWrangler 2.1 on OS X.4.4.

    This would be tooooo sweet if it ends up working for me. As much as I’ve come to appreciate the finer points of find & replace, this seems more logical and HTMLTidy is laborious to figure out. If only I knew more perl or even some regex I could have more valuable input …

  • http://www.sitening.com/ Jon Henshaw

    Allan, I’ll check it out and get it fixed. I’ll repost it here once I get it figured out.

  • http://www.sitening.com/ Jon Henshaw

    Okay, Scott fixed your problem. The download link has been updated with the latest script.

  • Allan

    Radical. All is well with the cosmos :) Thanks again!!

  • http://www.tylerhall.ws Tyler

    Heck yeah! That script saved me twenty minutes at work yesterday. I even got to go to lunch twenty minutes early :)

  • Diane

    Nice tool, Jon, too bad it only works on a Mac or a UNIX platform.

    For those of us on PC’s and Windows, HTML Tidy does the same job nicely ;)