Close Open HTML Tags in a String with VBScript (Without Using a Regular Expression) – CloseOpenHTMLTags.vbs

On more than one occasion I have wanted to pull an HTML text field from a database to populate part of a webpage. This is a common practice and a handy one, because it lets you update pages through forms.

This works well until you want only a portion of the HTML rather than the whole field, for example the first 100 or 200 characters to display as a summary. If the truncated HTML ends inside a tag, or leaves an element open, it can wreck the layout of the rest of the page.
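
To make the problem concrete, here is a small sketch of what a naive summary looks like. The field value is made up for illustration (in practice it would come from a recordset), and WScript.Echo assumes the snippet runs under cscript; in classic ASP you would use Response.Write instead.

```vbscript
' Hypothetical field value; in practice this would come from the database.
sBody = "<p>Read the <a href=""/docs/setup.html"">setup guide</a> first.</p>"

' Naive summaries: keep only the first N characters.
sCutInsideTag = Left(sBody, 25)   ' stops in the middle of the <a ...> tag
sCutAfterTag = Left(sBody, 45)    ' stops after the <a ...> tag, so <p> and <a> stay open

WScript.Echo sCutInsideTag
WScript.Echo sCutAfterTag
```

The first summary ends with a half-written tag and the second leaves two elements open, so everything that follows on the page would end up inside the link.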

Until now there were really two ways to deal with this. You could remove all HTML formatting from the text using a regular expression (which sucked because you’d lose any embedded links, images, or styling), or you could display the text as-is and hope that no tags were left open to break the layout of the rest of the page.
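
For comparison, the regular expression route might look roughly like the sketch below. StripAllTags is a hypothetical helper name, not part of my script, and the pattern is the simplest one that works for well-formed tags.

```vbscript
' Hypothetical helper: removes everything that looks like an HTML tag.
Function StripAllTags(sHTML)
  Dim oRegEx
  Set oRegEx = New RegExp
  oRegEx.Pattern = "<[^>]*>"   ' a "<", any run of non-">" characters, then ">"
  oRegEx.Global = True         ' replace every match, not only the first
  StripAllTags = oRegEx.Replace(sHTML, "")
End Function

' Prints: See about
WScript.Echo StripAllTags("<p>See <a href=""/about.html"">about</a></p>")
```

The link is gone along with the markup, which is exactly the loss described above.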

Since neither of those is really a solution, I decided to write a function that parses the HTML you pulled from the database and returns the string with a closing tag appended for every element the truncation left open. (I actually use this script on my blog for the front page.)

This is easier said than done. For one, some HTML tags don’t have closing tags. The ones I could think of off the top of my head (br, img, input, meta, link, and hr) are all included in the code below. It is easy to add other tags that I missed by appending them to the “sTagTypesToIgnore” string. If you find any that I missed, please let me know and I will update the code below.
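
As a rough sketch of extending the list, the snippet below appends the remaining void elements from the HTML specification (area, base, col, embed, source, track, wbr). IsIgnoredTag is a hypothetical helper that is not part of my function; it is only here to show why a list like this should be checked by whole name, since a short entry such as "area" is also a substring of "textarea".

```vbscript
' Hypothetical helper: True when sTagName is a whole entry in the comma-separated list.
Function IsIgnoredTag(sTagName, sTagList)
  IsIgnoredTag = (InStr("," & LCase(sTagList) & ",", "," & LCase(sTagName) & ",") > 0)
End Function

' The list from the function, plus the other void elements.
sTagTypesToIgnore = "br,img,input,meta,link,hr,!--,!doctype"
sTagTypesToIgnore = sTagTypesToIgnore & ",area,base,col,embed,source,track,wbr"

If IsIgnoredTag("area", sTagTypesToIgnore) Then WScript.Echo "area needs no closing tag"
If Not IsIgnoredTag("textarea", sTagTypesToIgnore) Then WScript.Echo "textarea still needs one"
```

Wrapping both the list and the name in commas is what keeps "area" from matching inside "textarea".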

Also, blank spaces and extra information inside a tag, such as attributes, make it more of a challenge to figure out which tag is which.

I have removed most of the comments from the code below because they made it twice as long. For those who are really interested in how it works, I will be posting the exact same code with all of the commenting later. If you use this code, please give credit where credit is due and leave the copyright information. Thanks.

Function CloseOpenHTMLTags(sHTMLString)
  'This script is provided under the Creative Commons license located
  'at http://creativecommons.org/licenses/by-nc/2.5/ . It may not
  'be used for commercial purposes with out the expressed written consent
  'of NateRice.com
  
  sTagTypesToIgnore = "br,img,input,meta,link,hr,!--,!doctype"
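  vLastOpen = InStrRev(sHTMLString,"<")
  'Drop a trailing tag that the truncation cut off before its ">".
  'Skip this when the string holds no "<" at all, or Left() would fail.
  If vLastOpen > 0 Then
    If InStr(Mid(sHTMLString,vLastOpen),">") = 0 Then
      sHTMLString = Left(sHTMLString,vLastOpen-1)
    End If
  End If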
  sHTMLStringOrig = sHTMLString
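  'Leave <script>...</script> blocks out of the parse (in any letter case)
  'so that a "<" inside a script is not mistaken for a tag
  aHTMLTags = Split(sHTMLString,"<script",-1,vbTextCompare)
  If UBound(aHTMLTags) >= 0 Then sHTMLString = aHTMLTags(0)
  For vA = 1 To UBound(aHTMLTags)
    aNoScripts = Split(aHTMLTags(vA),"</script>",-1,vbTextCompare)
    If UBound(aNoScripts) > 0 Then sHTMLString = sHTMLString & aNoScripts(1)
  Next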
  ReDim aHTMLTags(0)
  aHTMLTags = Split(sHTMLString,"<")
  For vA = 0 To UBound(aHTMLTags)
    'The first piece is plain text unless the string starts with a tag
    If vA > 0 Or Left(Trim(sHTMLString),1) = "<" Then
      aIndividualHTMLTag = Split(aHTMLTags(vA))
      If Len(aHTMLTags(vA)) > 0 Then
        vTagFoundLoop = 0
        TagFound = False
        sClosingTag = ""
        'Stop at the end of the token list so a stray "<" cannot run the index out of range
        Do Until TagFound Or vTagFoundLoop > UBound(aIndividualHTMLTag)
          If Len(aIndividualHTMLTag(vTagFoundLoop)) > 0 And _
          aIndividualHTMLTag(vTagFoundLoop) <> "/" Then
            aJustTag = Split(aIndividualHTMLTag(vTagFoundLoop),">")
            'Keep a "/" that was separated from the name by a space, as in "< / p>"
            aHTMLTags(vA) = trim(lcase(sClosingTag & aJustTag(0)))
            TagFound = True
          ElseIf Len(aIndividualHTMLTag(vTagFoundLoop)) > 0 And _
          aIndividualHTMLTag(vTagFoundLoop) = "/" Then
            sClosingTag = "/"
          End If
          vTagFoundLoop = vTagFoundLoop + 1
        Loop
        If Not TagFound Then aHTMLTags(vA) = ""
      End If
      If Len(trim(aHTMLTags(vA))) > 0 And Len(trim(sHTMLTags)) > 0 Then _
      sHTMLTags = sHTMLTags & "," & aHTMLTags(vA)
      If Len(trim(aHTMLTags(vA))) > 0 And Len(trim(sHTMLTags)) = 0 Then _
      sHTMLTags = aHTMLTags(vA)
    End If
  Next
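  aTagTypesToIgnore = Split(sTagTypesToIgnore,",")
  ReDim aHTMLTags(0)
  aHTMLTags = Split(sHTMLTags,",")
  'Blank out ignored tags by exact name so that "br" does not also eat part of "abbr".
  'Anything starting with "!" (comments, doctype) never needs a closing tag.
  For vA = 0 To UBound(aHTMLTags)
    sTagName = Replace(aHTMLTags(vA),"/","")
    If Left(sTagName,1) = "!" Then aHTMLTags(vA) = ""
    For Each vTagToIgnore In aTagTypesToIgnore
      If sTagName = vTagToIgnore Then aHTMLTags(vA) = ""
    Next
  Next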
  For vA = 0 To UBound(aHTMLTags)
    If InStr(aHTMLTags(vA), "/") = 0 And Len(aHTMLTags(vA)) > 0 Then
      For vB = vA+1 To UBound(aHTMLTags)
        'Match the exact closing tag so that "b" is not closed by "/body"
        If aHTMLTags(vB) = "/" & aHTMLTags(vA) Then
          aHTMLTags(vA) = ""
          aHTMLTags(vB) = ""
          Exit For
        End If
      Next
    Else
      aHTMLTags(vA) = ""
    End If
    If Len(Trim(aHTMLTags(vA))) > 0 And Len(sOpenHTMLTags) > 0 Then _
    sOpenHTMLTags = sOpenHTMLTags & "," & aHTMLTags(vA)
    If Len(Trim(aHTMLTags(vA))) > 0 And Len(sOpenHTMLTags) = 0 Then _
    sOpenHTMLTags = aHTMLTags(vA)
  Next
  ReDim aHTMLTags(0)
  aHTMLTags = Split(sOpenHTMLTags,",")
  For vA = UBound(aHTMLTags) To 0 Step -1
    sClosingTags = sClosingTags & "</" & aHTMLTags(vA) & ">"
  Next
  CloseOpenHTMLTags = sHTMLStringOrig & sClosingTags
End Function
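
Here is a short usage sketch, assuming the function above sits in the same .vbs file and the script runs under cscript. The sample text is made up; in practice it would be the field you pulled from the database.

```vbscript
' Hypothetical field value standing in for a database column.
sBody = "<p>Read the <a href=""/docs/setup.html"">setup guide</a> first.</p>"

' Cut after the <a ...> tag: both open elements get closed, innermost first.
' Prints: <p>Read the <a href="/docs/setup.html">setup </a></p>
WScript.Echo CloseOpenHTMLTags(Left(sBody, 45))

' Cut inside the <a ...> tag: the half-written tag is dropped, then <p> is closed.
' Prints: <p>Read the </p>
WScript.Echo CloseOpenHTMLTags(Left(sBody, 25))
```

Note that the closing tags come out in reverse order of opening, which keeps the nesting valid.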

 
