More than once I have wanted to pull an HTML text field from a database to populate part of a web page. This is common practice and really handy, because it lets you update pages through forms.
This works well until you want only a portion of the HTML rather than the whole thing, for example the first 100 or 200 characters to display as a summary. If the truncated HTML contains a tag that was opened but never closed, it can really screw up the layout of the page.
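To make the problem concrete, here is a small sketch of a naive truncation. The sHTML value is a made-up stand-in for whatever your database field holds.

```vbscript
' Hypothetical HTML field value; in practice this would come from the database.
sHTML = "<p>Our <strong>summer sale</strong> starts today.</p>"

' Naive summary: keep only the first 20 characters.
sSummary = Left(sHTML, 20)

' sSummary is now: <p>Our <strong>summe
' Both <p> and <strong> are still open, so whatever follows the summary
' on the page ends up inside this paragraph and in bold.
```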
Until now you really had two options for dealing with this. You could strip all HTML formatting from the text using a regular expression (which sucked, because you’d lose any embedded links, images, or styling), or you could simply display the text as-is and hope that no tags were left open to screw up the layout of the rest of the page.
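For reference, the first option can be sketched with the built-in VBScript RegExp object. The function name StripHTMLTags is my own placeholder, and the pattern is deliberately simple: it removes anything between a "<" and the next ">".

```vbscript
' Hypothetical helper: remove every tag from an HTML string.
Function StripHTMLTags(sHTML)
  Dim oRegExp
  Set oRegExp = New RegExp
  oRegExp.Pattern = "<[^>]*>"   ' a "<", then anything up to the next ">"
  oRegExp.Global = True         ' replace every match, not just the first
  StripHTMLTags = oRegExp.Replace(sHTML, "")
  Set oRegExp = Nothing
End Function

' The link text survives, but the link itself is gone.
sPlain = StripHTMLTags("<p>See <a href=""page.asp"">this page</a></p>")
' sPlain is now: See this page
```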
Since neither of those is really a solution, I decided to write a function that parses the truncated HTML you pulled from the database and returns it with closing tags appended for every tag the truncation left open. (I actually use this script on the front page of my blog.)
This is easier said than done. For one, there are tags in HTML that don’t have closing tags. The ones that I could think of off the top of my head (br, img, input, meta, link, and hr) are all included in the code below, along with comments (!--) and !doctype. It is pretty easy to add any that I missed by appending them to “sTagTypesToIgnore”. If you find any that I missed, please let me know and I will update the code below.
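One thing to watch when extending the list: by my reading, the code below removes ignored names with Replace, which matches substrings, so a short name such as "br" also alters "abbr", and adding "area" would alter "textarea". If that matters for your content, an exact-match check could look like the sketch below. The helper name IsIgnoredTag is hypothetical and is not part of the function as posted.

```vbscript
' Hypothetical helper: True only when sTagName is a whole entry in the
' comma-separated ignore list, never a fragment of a longer tag name.
Function IsIgnoredTag(sTagName, sIgnoreList)
  ' Wrapping both sides in commas forces a whole-entry match.
  IsIgnoredTag = (InStr("," & LCase(sIgnoreList) & ",", _
                        "," & LCase(Trim(sTagName)) & ",") > 0)
End Function

sTagTypesToIgnore = "br,img,input,meta,link,hr,!--,!doctype"
bSkipBr   = IsIgnoredTag("BR", sTagTypesToIgnore)    ' True
bSkipAbbr = IsIgnoredTag("abbr", sTagTypesToIgnore)  ' False
```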
Also, blank spaces and extra information inside a tag (such as attributes) make it more of a challenge to figure out which tag is which.
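As an illustration of that challenge, here is a simplified sketch of pulling just the tag name out of the text that follows a "<". It is not the exact approach used in the function below, and the name GetTagName is my own placeholder.

```vbscript
' Hypothetical helper: sTagText is everything after a "<",
' for example: a href="page.asp" >link text
Function GetTagName(sTagText)
  Dim sName, vEnd, vSpace
  sName = Trim(sTagText)

  ' Drop the ">" and everything after it.
  vEnd = InStr(sName, ">")
  If vEnd > 0 Then sName = Left(sName, vEnd - 1)

  ' Drop attributes: keep only what comes before the first space.
  vSpace = InStr(sName, " ")
  If vSpace > 0 Then sName = Left(sName, vSpace - 1)

  GetTagName = LCase(sName)
End Function

sTag1 = GetTagName("A  HREF=""page.asp"" >link text")  ' a
sTag2 = GetTagName("/p>")                              ' /p
sTag3 = GetTagName("br />")                            ' br
```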
I have removed the comments from the code below because they made it twice as long. For those who are really interested in how it works, I will be posting the exact same code with all of the commenting later. If you use this code, please give credit where credit is due and leave the copyright information intact. Thanks.
Function CloseOpenHTMLTags(sHTMLString)
  'This script is provided under the Creative Commons license located
  'at http://creativecommons.org/licenses/by-nc/2.5/ . It may not
  'be used for commercial purposes with out the expressed written consent
  'of NateRice.com

  sTagTypesToIgnore = "br,img,input,meta,link,hr,!--,!doctype"

  If Instr(Right(sHTMLString,Len(sHTMLString) - InStrRev(sHTMLString,"<")),">") = 0 _
  Then
    sHTMLString = Left(sHTMLString,InStrRev(sHTMLString,"<")-1)
  End IF

  sHTMLStringOrig = sHTMLString

  aHTMLTags = Split(sHTMLString,"<script")
  sHTMLString = aHTMLTags(0)
  For vA = 1 To UBound(aHTMLTags)
    'WScript.Echo aHTMLTags(vA)
    aNoScripts = Split(aHTMLTags(vA),"</script>")
    If Ubound(aNoScripts) > 0 Then sHTMLString = sHTMLString & aNoScripts(1)
  Next

  ReDim aHTMLTags(0)
  aHTMLTags = Split(sHTMLString,"<")

  For vA = 0 To UBound(aHTMLTags)
    If left(trim(sHTMLString),1) <> "<" And vA = 0 Then
    Else
      aIndividualHTMLTag = Split(aHTMLTags(vA))
      If Len(aHTMLTags(vA)) > 0 Then
        vTagFoundLoop = 0
        TagFound = False
        sClosingTag = ""
        Do Until TagFound
          If Len(aIndividualHTMLTag(vTagFoundLoop)) > 0 And _
             aIndividualHTMLTag(vTagFoundLoop) <> "/" Then
            aHTMLTags(vA) = sClosingTag & aIndividualHTMLTag(vTagFoundLoop)
            aJustTag = Split(aIndividualHTMLTag(vTagFoundLoop),">")
            aHTMLTags(vA) = trim(lcase(aJustTag(0)))
            TagFound = True
            Exit Do
          ElseIf Len(aIndividualHTMLTag(vTagFoundLoop)) > 0 And _
             aIndividualHTMLTag(vTagFoundLoop) = "/" Then
            sClosingTag = "/"
          End If
          vTagFoundLoop = vTagFoundLoop + 1
        Loop
      End If
      If Len(trim(aHTMLTags(vA))) > 0 And Len(trim(sHTMLTags)) > 0 Then _
        sHTMLTags = sHTMLTags & "," & aHTMLTags(vA)
      If Len(trim(aHTMLTags(vA))) > 0 And Len(trim(sHTMLTags)) = 0 Then _
        sHTMLTags = aHTMLTags(vA)
    End If
  Next

  aTagTypesToIgnore = Split(sTagTypesToIgnore,",")
  For Each vTagToIgnore In aTagTypesToIgnore
    sHTMLTags = Replace(sHTMLTags,vTagToIgnore & ",","")
    sHTMLTags = Replace(sHTMLTags,vTagToIgnore,"")
  Next

  ReDim aHTMLTags(0)
  aHTMLTags = Split(sHTMLTags,",")

  For vA = 0 To UBound(aHTMLTags)
    If InStr(aHTMLTags(vA), "/") = 0 And Len(aHTMLTags(vA)) > 0 Then
      For vB = vA+1 To UBound(aHTMLTags)
        If InStr(aHTMLTags(vB), "/") > 0 And _
           InStr(aHTMLTags(vB), aHTMLTags(vA)) > 0 Then
          aHTMLTags(vA) = ""
          aHTMLTags(vB) = ""
          Exit For
        End If
      Next
    Else
      aHTMLTags(vA) = ""
    End If
    If Len(Trim(aHTMLTags(vA))) > 0 And Len(sOpenHTMLTags) > 0 Then _
      sOpenHTMLTags = sOpenHTMLTags & "," & aHTMLTags(vA)
    If Len(Trim(aHTMLTags(vA))) > 0 And Len(sOpenHTMLTags) = 0 Then _
      sOpenHTMLTags = aHTMLTags(vA)
  Next

  ReDim aHTMLTags(0)
  aHTMLTags = Split(sOpenHTMLTags,",")
  For vA = UBound(aHTMLTags) To 0 Step -1
    sClosingTags = sClosingTags & "</" & aHTMLTags(vA) & ">"
  Next

  CloseOpenHTMLTags = sHTMLStringOrig & sClosingTags
End Function
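A minimal usage sketch, assuming the CloseOpenHTMLTags function above is in the same script and that you run it under Windows Script Host (hence WScript.Echo; in an ASP page you would use Response.Write instead). The sArticleHTML value is a made-up stand-in for a database field.

```vbscript
' Hypothetical value standing in for an HTML field read from the database.
sArticleHTML = "<p>Hello <b>world</b>, this is a <a href=""page.asp"">link</a>.</p>"

' Keep only the first 15 characters; this cuts the string inside the <b> element.
sSummary = Left(sArticleHTML, 15)

' CloseOpenHTMLTags is the function listed above; it appends closing tags
' for whatever the truncation left open.
sSummary = CloseOpenHTMLTags(sSummary)

WScript.Echo sSummary
```

Tracing the function by hand, the truncated string is <p>Hello <b>wor and the script should print <p>Hello <b>wor</b></p>, with the innermost tag closed first.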