Saturday, February 11, 2012

The Hidden Gem: SPHttpUtility.ConvertSimpleHtmlToText()

I had a requirement to extract only the first 200 characters from a rich text box in a custom web part. The rich text box may contain tables, images, extra white space and other HTML. The approach was to first strip all the HTML from RichTextBox.Text and then collapse the extra white space.
           
                 // Add the following namespaces
                 using Microsoft.SharePoint.Utilities;
                 using System.Text.RegularExpressions;

                 // Convert the HTML to plain text; the second argument is the maximum length of the result.
                 // The length of the original HTML is used here so that nothing is cut off at this stage.
                 string convertHtmlToText = SPHttpUtility.ConvertSimpleHtmlToText(RichTextBox.Text, RichTextBox.Text.Length);

                 // Replace each run of white space (spaces, tabs, line breaks) with a single space
                 string removedExtraWhiteSpace = Regex.Replace(convertHtmlToText, @"\s+", " ");
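
The snippet above stops before the last step of the requirement, which is keeping only the first 200 characters. A minimal sketch of that step might look like the following. The class name TextPreview and the method FirstCharacters are hypothetical names made up for illustration, not part of SharePoint, and the helper uses only plain .NET so it can be tried outside a web part.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TextPreview
{
    // Hypothetical helper (not a SharePoint API): collapses white space,
    // trims the ends, and keeps at most maxLength characters.
    public static string FirstCharacters(string plainText, int maxLength)
    {
        if (string.IsNullOrEmpty(plainText) || maxLength <= 0)
        {
            return string.Empty;
        }

        // Same regular expression as in the post: any run of white space becomes one space
        string collapsed = Regex.Replace(plainText, @"\s+", " ").Trim();

        // Substring throws if the requested length exceeds the string, so check first
        return collapsed.Length <= maxLength
            ? collapsed
            : collapsed.Substring(0, maxLength);
    }
}
```

Assuming the variables from the snippet above, it could be called as string preview = TextPreview.FirstCharacters(removedExtraWhiteSpace, 200); Truncating after the white space is collapsed means the 200 characters are not wasted on padding, although the cut can still land in the middle of a word.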
