Java Regular Expression

Some client requests may seem odd, but they serve their purpose. We have been using WebSphere Portal, and we were given the task of filtering the output rendered by the Portal. So I created a custom HttpServletResponseWrapper and took a copy of the HTML before it was sent to the browser. This part was easy.
A filter can be configured in the web.xml of wps.war, since the Portal itself is a WAR deployed on WAS (WebSphere Application Server). Stripping certain tags from the HTML and filtering the rest was more difficult.
I tried Java regex. It worked in some cases, but it was very hard to use it to massage nested HTML.
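To see why nesting is the sticking point, consider removing an element such as a div. The div is only a hypothetical example, because the post does not say which tags were stripped. The sketch below compares two approaches. The first is a lazy regex, which pairs an opening tag with the first closing tag it meets. The second is a small depth counter that still uses a regex to find tags but tracks nesting itself. The names NestedDivDemo, naiveStrip and stripDivs are illustrative assumptions and are not the code used in the Portal filter. The sketch also assumes reasonably well-formed markup.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NestedDivDemo {

    // Naive approach: (?s) lets '.' cross newlines, but the lazy .*? stops
    // at the FIRST </div>. With nested divs that closing tag belongs to the
    // inner element, so the outer element is cut in half.
    static String naiveStrip(String html) {
        return html.replaceAll("(?s)<div\\b[^>]*>.*?</div>", "");
    }

    // Depth-counting alternative: the regex only finds <div ...> and </div>
    // tokens; a counter tracks nesting so the outermost element is removed whole.
    // Assumption: "<div" does not appear inside comments or scripts.
    static String stripDivs(String html) {
        Matcher m = Pattern.compile("(?i)<(/?)div\\b[^>]*>").matcher(html);
        StringBuilder out = new StringBuilder();
        int depth = 0;
        int keepFrom = 0; // start of the text not yet copied to the output
        while (m.find()) {
            boolean closing = !m.group(1).isEmpty();
            if (!closing) {
                if (depth == 0) {
                    out.append(html, keepFrom, m.start()); // text before an outer <div>
                }
                depth++;
            } else if (depth > 0) {
                depth--;
                if (depth == 0) {
                    keepFrom = m.end(); // resume copying after the matching </div>
                }
            }
            // A stray </div> at depth 0 stays in place as ordinary text.
        }
        if (depth == 0) {
            out.append(html, keepFrom, html.length()); // trailing text
        }
        // If depth > 0, an unclosed <div> swallowed the rest of the input.
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<p>a</p><div><div>x</div>y</div><p>b</p>";
        System.out.println(naiveStrip(html)); // <p>a</p>y</div><p>b</p>
        System.out.println(stripDivs(html));  // <p>a</p><p>b</p>
    }
}
```

The naive version leaves the orphaned "y</div>" behind, which is exactly the kind of breakage nested markup causes. An HTML parser would handle comments, scripts and malformed markup more robustly. The counter only shows that nesting needs state that a single regex match does not keep.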

Regex is cool. The following code closes the ‘img’ tag. Regex can close HTML tags easily as long as they are not nested too deeply and do not span multiple lines. Nested HTML tags broken across separate lines are much harder to manipulate with regex.

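import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegularExpression {

    // Matches "<img", then one or more characters other than '>', then '>'.
    private static final Pattern IMG_TAG = Pattern.compile("<img[^>]+>");

    // A sample img tag of the kind rendered by the Portal.
    private String source =
        "<img width=\"30\" height=\"18\" border=\"0\" src='/wps/images/dot.gif' alt=\"\">";

    public static void main(String[] argv) {
        RegularExpression regex = new RegularExpression();
        regex.changePattern();
    }

    public void changePattern() {
        // $0 refers to the whole matched tag; a closing </img> is appended after it.
        Matcher matcher = IMG_TAG.matcher(source);
        String content = matcher.replaceAll("$0</img> ");
        System.out.println(content);
    }
}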
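The pattern above already copes with line breaks inside a single tag, because a negated character class such as [^>] also matches newline characters. The variant below goes a step further, assuming the goal is well-formed output. It leaves alone tags that are already self-closed (<img ... />) or already followed by </img>, so running the filter twice does not add a second closing tag. The class name CloseImgTags is hypothetical, and the lookarounds are one possible way to express these rules.

```java
import java.util.regex.Pattern;

class CloseImgTags {

    // <img, a word boundary, then any run of characters other than '>'.
    // [^>] matches '\n' too, so attributes split across lines are consumed.
    // (?<!/)          : skip tags already self-closed, e.g. <img ... />
    // (?!\s*</img>)   : skip tags that already have a closing tag
    private static final Pattern IMG = Pattern.compile(
            "<img\\b[^>]*(?<!/)>(?!\\s*</img>)", Pattern.CASE_INSENSITIVE);

    static String closeImgTags(String html) {
        // $0 re-inserts the whole matched tag before the appended </img>.
        return IMG.matcher(html).replaceAll("$0</img>");
    }

    public static void main(String[] args) {
        String html = "<div>\n  <img width=\"30\"\n       src='/wps/images/dot.gif' alt=\"\">\n"
                + "  <img src='a.gif'/>\n</div>";
        // The first img is closed even though it spans two lines;
        // the self-closed second img is left unchanged.
        System.out.println(closeImgTags(html));
    }
}
```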
