18 | July | 2013 | MindSpace

Parse JSP using BeautifulSoup

July 18, 2013

I had to parse a tangle of JSPs to find out how many HTML controls call JavaScript
functions that make AJAX calls back to the application.

For example, if a ‘key press’ event fires when a user ‘tabs out’ of a textbox or presses
‘Enter’ in it, I wanted the scan to find that control:

<html:text maxlength="30" onblur="blurAction(this)" onfocus="displayFieldMsg(this)" onkeydown="keyDownEvents(this)" onkeypress="keyPressEvents(this)" onkeyup="convertUCase(this)" property="txtExtCredit" size="40" style="text-align:left;" styleclass="inputfld"></html:text>
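To make the target concrete, here is a minimal sketch, separate from the script further down, of how BeautifulSoup exposes the event handlers on a tag like this one. It assumes the built-in "html.parser" backend and uses a shortened copy of the sample tag; `SAMPLE` and `event_handlers` are names made up for this illustration.

```python
from bs4 import BeautifulSoup

# Shortened copy of the sample tag (illustrative only).
SAMPLE = ('<html:text maxlength="30" onblur="blurAction(this)" '
          'onkeypress="keyPressEvents(this)" property="txtExtCredit" '
          'size="40"></html:text>')


def event_handlers(tag):
    """Return {attribute: handler} for every on* attribute of a tag."""
    return {name: value for name, value in tag.attrs.items()
            if name.startswith("on")}


# Assumption: the pure-Python "html.parser" backend, which keeps the
# "html:text" name as written.
soup = BeautifulSoup(SAMPLE, "html.parser")
tag = soup.find("html:text")
handlers = event_handlers(tag)
for name in sorted(handlers):
    print("{0} -> {1}".format(name, handlers[name]))
```

Each handler string can then be checked against the names of the JavaScript functions known to make AJAX calls.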

My Python skills are rudimentary, but the script below scans the JSPs in a folder and prints every ‘html:text’ Struts tag whose ‘onkeypress’ handler calls keyPressEvents. The PyDev Eclipse plugin comes in handy for Python development.

The code can be extended for more complex scans, which I plan to do.


from bs4 import BeautifulSoup
import fnmatch
import sys
import re
import os
import glob

class Parse:

    def __init__(self):
        print('parsing')
        self.parse()
        #self.folderwalk()

    def parse(self):
        path = "D:\\path"
        # Current file; defined up front so the error handler can always print it
        markup = None

        try:
            for infile in glob.glob(os.path.join(path, "*.jsp")):
                markup = infile
                print(markup)
                # Close each file promptly. Naming the parser keeps results the
                # same whether or not lxml is installed.
                with open(markup, "r") as jsp:
                    soup = BeautifulSoup(jsp.read(), "html.parser")

                # html:text tags whose onkeypress handler starts with keyPressEvents.
                # The tag pattern has no trailing '$', so html:textarea would match too.
                data = soup.find_all(re.compile('^html:text'),
                                     attrs={'onkeypress': re.compile('^keyPressEvents')})
                for i in data:
                    print(i)

        except IOError as e:
            print("I/O error({0}): {1}".format(e.errno, e.strerror))
        except Exception:
            print("Unexpected error: {0}".format(sys.exc_info()[0]))
            print("Unexpected error: {0}".format(markup))


    # Not used at this time
    def folderwalk(self):

        rootdir = "D:\\path"
        folderlist = []

        #Pattern to be matched
        includes = ['*.jsp']

        try:
            # Walk every subfolder and print the names of matching files
            for root, subFolders, files in os.walk(rootdir):
                for extensions in includes:
                    for filename in fnmatch.filter(files, extensions):
                        print(filename)
                        #folderlist.append()
        except IOError as e:
            print("I/O error({0}): {1}".format(e.errno, e.strerror))
        except Exception:
            print("Unexpected error: {0}".format(sys.exc_info()[0]))


if __name__ == '__main__':
    instance = Parse()
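
One possible direction for the "more complex scans" mentioned earlier is to make the unused folderwalk idea return full paths, so that JSPs in subfolders can be fed to the same BeautifulSoup scan. This is only a sketch under that assumption; `find_jsp_files` is a hypothetical helper, not part of the script above.

```python
import fnmatch
import os


def find_jsp_files(rootdir, includes=("*.jsp",)):
    """Walk rootdir recursively and return the matching file paths, sorted."""
    matches = []
    for root, _subfolders, files in os.walk(rootdir):
        for pattern in includes:
            for filename in fnmatch.filter(files, pattern):
                # Keep the full path so the file can be opened later.
                matches.append(os.path.join(root, filename))
    return sorted(matches)


if __name__ == "__main__":
    # "D:\\path" is the same placeholder directory used in the script above.
    for jsp_path in find_jsp_files("D:\\path"):
        print(jsp_path)
```

The loop in parse() could then iterate over these paths instead of the single-folder glob.glob() result.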

Filed under Python
