Java Regular Expression
January 9, 2007
Some client requests can seem weird, but they serve a purpose. We have been using WebSphere Portal, and we were given the task of filtering the output rendered by the Portal. So I created a custom HttpServletResponseWrapper and took a copy of the HTML before sending it to the browser. This part was easy.
A filter can be configured in the web.xml of wps.war, since the Portal code itself is a WAR deployed on WAS (WebSphere Application Server). Stripping certain tags from the HTML and filtering it was more difficult.
I tried to use Java regular expressions. They worked for some cases, but it was very hard to use them to massage nested HTML.
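To show what goes wrong with nesting, here is a minimal sketch. It is not the Portal filter code: the class name NestedTagDemo, the helper stripDivs and the sample markup are made up for illustration. It tries to remove div elements with a reluctant match, which is fine for flat markup but pairs the outer opening tag with the inner closing tag as soon as the elements are nested.

```java
import java.util.regex.Pattern;

class NestedTagDemo {
    // Reluctant match from an opening <div ...> to the NEAREST </div>.
    // DOTALL lets '.' cross line breaks.
    static final Pattern DIV =
        Pattern.compile("<div[^>]*>.*?</div>", Pattern.DOTALL);

    // Hypothetical helper: removes whatever the pattern takes to be a div element.
    static String stripDivs(String html) {
        return DIV.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        // Flat markup: the single div is removed cleanly.
        System.out.println(stripDivs("<p>a</p><div>x</div><p>b</p>"));

        // Nested markup: the match ends at the inner </div>,
        // so " tail</div>" is left behind as broken HTML.
        System.out.println(stripDivs("<div>outer <div>inner</div> tail</div>"));
    }
}
```

A greedy `.*` does not help either, because it would swallow everything up to the last closing tag on the page. Regular expressions of this kind cannot count how deep the nesting goes.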
Regex is cool. The following code actually closes the ‘img’ tag by appending a closing tag right after it. Regex can be used to close HTML tags easily if they are not nested too deeply and there are no newlines. Nested HTML tags broken into separate lines are pretty hard to manipulate using regex.
// String.replaceAll compiles the regex itself, so no java.util.regex imports are needed.
public class RegularExpression {

    // Sample img tag that has no closing tag.
    private String source =
        "<img width=\"30\" height=\"18\" border=\"0\" src='/wps/images/dot.gif' alt=\"\">";

    public static void main(String[] argv) {
        RegularExpression regex = new RegularExpression();
        regex.changePattern();
    }

    // Appends </img> after every img tag. [^>]+ consumes the attributes up to
    // the first '>', and $0 in the replacement stands for the whole match.
    public void changePattern() {
        String content = source.replaceAll("<img[^>]+[>]", "$0</img> ");
        System.out.println(content);
    }
}
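The pattern above also appends a closing tag to img tags that are already closed. Below is a hedged variant, not what we deployed: the class name ImgTagCloser and the method closeImgTags are hypothetical. It skips self-closed tags and tags already followed by a closing tag, and it ignores case. Note that a negated class such as [^>] matches line breaks too, so a single tag split across lines is still picked up; it is the nested elements that stay hard.

```java
import java.util.regex.Pattern;

class ImgTagCloser {
    // (?i)           case-insensitive, so <IMG ...> is handled as well
    // <img\b[^>]*    the tag name and its attributes, line breaks included
    // (?<!/)>        the closing '>' only when it is not part of "/>"
    // (?!\s*</img>)  skip tags that already have a closing tag
    private static final Pattern OPEN_IMG =
        Pattern.compile("(?i)<img\\b[^>]*(?<!/)>(?!\\s*</img>)");

    // Hypothetical helper: appends </img> to every unclosed img tag.
    static String closeImgTags(String html) {
        return OPEN_IMG.matcher(html).replaceAll("$0</img>");
    }

    public static void main(String[] args) {
        System.out.println(closeImgTags("<img src=\"a.gif\">"));
        System.out.println(closeImgTags("<img src=\"a.gif\" />"));
        System.out.println(closeImgTags("<img\n src=\"a.gif\"\n alt=\"\">"));
    }
}
```

Running the method twice gives the same result as running it once, which matters in a filter that might see its own output again. It still assumes that no attribute value contains a '>' character.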