Greg's Blog

helping me remember what I figure out

CFMX and jTidy

While writing the comment code I was hoping to use jTidy to parse each submitted comment and tidy it up, so that any HTML provided would be valid. It was also a test for integrating jTidy into the CMS for this site, and it passed with flying colours on CFMX. Sadly, with the FREE version of BD (BlueDragon, which this site runs on) you can't deploy additional jar files [UPDATE: have a read of this entry and you'll find out how to load java files on the fly]. To use it, you'll need to deploy Tidy.jar to your {installPath}/WEB-INF/lib/ (CFMX) or {installPath}/BlueDragon_Server_61/lib/ (BD) folder and restart your server.

I find the approach I adopted far from ideal, and maybe somebody out there with more experience of jTidy and Java can provide a few hints, but here's how it works in principle. I created a method, makeXHTMLValid(), that expects three arguments: strToParse, thisUrl and tmpPath. The first is the string to be cleaned. The second is the URL from which the temporary file holding that string will be read. The third is the physical path where that temporary file is generated and held for the duration of the parsing.

It seems very laborious, and it is. To clarify: the string is written to a file, which jTidy then reads in over an HTTP connection. jTidy then writes out a second file containing the cleaned string, and the function finishes by reading that file back in and deleting both temp files before returning the cleaned string. The only implementation examples I could find dealt with reading in StringBuffers using the above outline. I'd be delighted to hear of examples of converting a String variable into a StringBuffer and back again.
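A possible answer to that last question, offered as a rough sketch rather than tested jTidy code: in plain Java a String becomes a StringBuffer with new StringBuffer(text) and comes back with toString(), but what the parse call in the function below takes is an input stream and an output stream. Those can be backed by memory instead of temp files. The names InMemoryStreams, StreamTransform, runInMemory and copy are made up for this illustration, Java 8 or later is assumed, and the copy transform only stands in for jTidy so that the example runs without Tidy.jar.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class InMemoryStreams {

    /** Hypothetical stand-in for any stream-based parser call, such as jTidy's parse(in, out). */
    interface StreamTransform {
        void apply(InputStream in, OutputStream out) throws IOException;
    }

    /** Feeds a String to a stream-based transform and returns whatever it wrote as a String. */
    static String runInMemory(String input, StreamTransform transform) {
        // String -> bytes -> InputStream: takes the place of the temporary input file and the HTTP fetch.
        InputStream in = new ByteArrayInputStream(input.getBytes(StandardCharsets.UTF_8));
        // Growable in-memory buffer: takes the place of the temporary output file.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            transform.apply(in, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // OutputStream -> bytes -> String.
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Copies input to output unchanged; used only so this sketch runs without jTidy. */
    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[4096];
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
    }

    public static void main(String[] args) {
        String comment = "<p>Hello <b>world</p>";

        // The literal String <-> StringBuffer conversion asked about above.
        StringBuffer buffer = new StringBuffer(comment);
        String back = buffer.toString();

        // The stream round trip, with copy standing in for the parser.
        System.out.println(runInMemory(back, InMemoryStreams::copy));
    }
}
```

With Tidy.jar on the classpath, the transform passed to runInMemory would presumably be (in, out) -> tidy.parse(in, out). If that works as expected, thisUrl and tmpPath would no longer be needed.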
Calling the function looks like this:

    // Both the URL and the expanded path must end with a slash,
    // because the function appends the temp file names directly to them.
    pathToTempFile = "/relativePath/toYourFile/";
    cleanedString = makeXHTMLValid(yourStringToParse, "http://" & cgi.SERVER_NAME & pathToTempFile, ExpandPath(pathToTempFile));

The function is as follows:

    <cffunction name="makeXHTMLValid" displayname="Tidy parser" hint="Takes a string and a URL as arguments and returns parsed and valid xHTML" output="false">
        <cfargument name="strToParse" required="true" type="string" />
        <cfargument name="thisUrl" required="true" type="string" />
        <cfargument name="tmpPath" required="true" type="string" />
        <cfscript>
        /**
         * This function reads in a string, checks and corrects any invalid HTML. It creates two
         * temporary files, because as far as I can tell jTidy relies on files for parsing
         * By Greg Stewart
         *
         * @param strToParse The string to parse (will be written to file).
         * @param thisUrl The Url to parse
         * @param tmpPath The location where the tmp files will be written to, must be
         *                accessible from the web browser
         * @return returnPart
         * @author Greg Stewart (gregs(at)tcias.co.uk)
         * @version 1, August 22, 2004
         */
        var fileReadIn = ""; // xHTML output
        var returnPart = ""; // return variable
        var pageIn = "tmpIn." & CreateUUID() & ".html";
        var pageOut = arguments.tmpPath & "tmpOut." & CreateUUID() & ".html";
        var filename = arguments.tmpPath & pageIn;
        var writeData = "";
        // working variables, declared here so they stay local to the function
        var jFile = "";
        var jStream = "";
        var jWriter = "";
        var jTidy = "";
        var theUrl = "";
        var u = "";
        var inP = "";
        var outx = "";
        var fileReader = "";
        var lineReader = "";
        var line = "";
        var startPos = 0;
        var endPos = 0;

        // create the temporary input file
        jFile = createObject("java", "java.io.File").init(filename);
        jFile.createNewFile();
        writeData = toString(trim(arguments.strToParse));
        jStream = createObject("java", "java.io.FileOutputStream").init(jFile);
        // create the file writer (it uses the platform default encoding) and write the file contents
        jWriter = createObject("java", "java.io.OutputStreamWriter").init(jStream);
        jWriter.write(writeData);
        // flush the output, clean up and close
        jWriter.flush();
        jWriter.close();
        jStream.close();

        // jTidy part
        jTidy = createObject("java", "org.w3c.tidy.Tidy").init();
        jTidy.setQuiet(false);
        jTidy.setIndentContent(true);
        jTidy.setSmartIndent(true);
        jTidy.setIndentAttributes(true);
        jTidy.setWraplen(1024);
        jTidy.setXHTML(true);

        // build the Url to parse
        theUrl = arguments.thisUrl & pageIn;
        // create the in and out streams for jTidy
        u = createObject("java", "java.net.URL").init(theUrl);
        inP = createObject("java", "java.io.BufferedInputStream").init(u.openStream());
        outx = createObject("java", "java.io.FileOutputStream").init(pageOut);
        // do the parsing
        jTidy.parse(inP, outx);
        // close both streams
        inP.close();
        outx.close();

        // read in the validated file
        if (fileExists(pageOut)) {
            fileReader = createObject("java", "java.io.FileReader").init(pageOut);
            lineReader = createObject("java", "java.io.LineNumberReader").init(fileReader);
            line = lineReader.readLine(); // read the first line, if any
            // readLine() returns null at the end of the file, which leaves "line" undefined
            while (isDefined("line")) {
                // keep the line breaks so that preformatted content survives
                fileReadIn = fileReadIn & line & Chr(10);
                line = lineReader.readLine(); // read the next line, if any
            }
            // closing the line reader also closes the underlying file reader
            lineReader.close();
        }

        // ok now strip all the header/body stuff
        startPos = Find("<body>", fileReadIn) + 6;
        endPos = Find("</body>", fileReadIn);
        // only cut when both tags were found, otherwise Mid() would throw
        if (startPos gt 6 and endPos gt startPos) {
            returnPart = Mid(fileReadIn, startPos, endPos - startPos);
        }

        // delete the temp files
        jFile.delete(); // jFile still points at the input file
        jFile = createObject("java", "java.io.File").init(pageOut);
        jFile.delete();
        </cfscript>
        <cfreturn returnPart />
    </cffunction>
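The last step of the function, stripping everything outside the body element, relies on Tidy emitting a bare, lower-case <body> tag. A slightly more defensive variant is sketched here in Java (Java 8 or later assumed). BodyExtractor and extractBody are made-up names, not part of jTidy, and the regular expression is a simplification that does not cope with a ">" inside an attribute value. It tolerates attributes on the body tag and falls back to the trimmed input when no complete body element is found.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BodyExtractor {

    // Matches <body> or <body ...attributes>, then captures everything up to the first </body>.
    private static final Pattern BODY = Pattern.compile(
            "<body(?:\\s[^>]*)?>(.*?)</body\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /**
     * Returns the trimmed markup between the opening and closing body tags,
     * or the trimmed input itself when no complete body element is present.
     */
    static String extractBody(String html) {
        Matcher matcher = BODY.matcher(html);
        if (matcher.find()) {
            return matcher.group(1).trim();
        }
        return html.trim();
    }

    public static void main(String[] args) {
        String tidied = "<html><head><title></title></head>\n"
                + "<body>\n  <p>Hello <b>world</b></p>\n</body></html>";
        System.out.println(extractBody(tidied));
    }
}
```

Falling back to the input is only one possible choice. Returning an empty string, as the CFML function does when the tags are missing, may be safer if unparsed HTML should never reach the page.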