public final class HTMLHelper extends Object
| Modifier and Type | Field and Description |
|---|---|
| static String | LINE_TERMINATORS_REGEX_CLASS: Regex class of line terminators. |
| static Set<String> | NODES_TO_REMOVE_WHEN_EMPTY: Names of HTML tags whose nodes should be removed when they do not have a body. |
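The value of LINE_TERMINATORS_REGEX_CLASS is not shown on this page. As a rough illustration of what a regex class of line terminators can look like and how it might be used, the sketch below defines a hypothetical constant and uses it to split text into lines. The class name, the constant name, its value (the single-character line terminators documented for java.util.regex.Pattern) and the splitLines helper are all assumptions, not part of HTMLHelper.

```java
import java.util.Arrays;
import java.util.List;

public class LineTerminatorSketch {
    // Hypothetical value, NOT taken from HTMLHelper: a character class with the
    // line feed, carriage return, next-line, line-separator and
    // paragraph-separator characters.
    static final String LINE_TERMINATORS_CLASS = "[\\n\\r\\u0085\\u2028\\u2029]";

    // Splits text on runs of line terminators, so a CR LF pair counts as one
    // break. Blank lines are collapsed as a side effect.
    static List<String> splitLines(String text) {
        return Arrays.asList(text.split(LINE_TERMINATORS_CLASS + "+"));
    }

    public static void main(String[] args) {
        // prints [first, second, third]
        System.out.println(splitLines("first\r\nsecond\nthird"));
    }
}
```

Keeping the terminators in a reusable character class lets callers append their own quantifier, as the `+` above does.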
| Modifier and Type | Method and Description |
|---|---|
| static String | closeBrTags(String htmlContent): Converts br tags to the XHTML compatible form. |
| static boolean | containsNonEmptyTextNodes(String htmlContent): Tests whether the provided HTML contains non-empty text nodes. |
| static String | convertHTMLListToText(String htmlContent): Converts HTML lists to readable plain text representations. |
| static String | convertXmlSerializedHtmlToLegalHtml(String xmlSerializedHtml): Does some cleanup to convert HTML generated by serializing an HTML DOM with an XML processor to legal HTML. |
| static String | encapsulateAttributesInQuotes(String htmlContent): Some tools (e.g. |
| static String | htmlToPlaintext(String htmlContent): A simple helper which converts HTML to plain text. It works similarly to StringEscapeHelper.removeHtmlMarkup(String), but tries to preserve line breaks by replacing closing paragraphs and br elements with newlines. |
| static String | htmlToPlaintextExt(String htmlContent, boolean convertLists): Helper method similar to htmlToPlaintext(String), with the difference that it uses a more advanced approach to replace HTML lists (namely by invoking convertHTMLListToText(String)). |
| static String | plaintextToHTML(String plainText): A simple helper which converts a plain text string to HTML by wrapping each line in a paragraph and escaping characters reserved in XML. |
| static String | removeComments(String htmlContent): Removes all HTML comments. |
| static String | removeHeadElement(String htmlContent): Strips the head element with all children from an HTML string. |
| static String | removeNamespaces(String htmlContent): Removes all namespace definitions and the nodes using these namespaces. |
| static String | removeUnclosedMetaElements(String htmlContent): Removes open (unclosed) meta elements. |
| static String | stripLinebreaks(String htmlContent) |
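To make the summaries above more concrete, the following sketch approximates four of the described behaviors with plain regular expressions: closing br tags, removing comments, and converting between plain text and HTML. It is not the HTMLHelper implementation. The class HtmlHelperSketch, its patterns and the entity unescaping and trimming in htmlToPlaintext are assumptions made for illustration, and regex-based processing like this is only suitable for simple markup.

```java
import java.util.regex.Pattern;

public class HtmlHelperSketch {
    private static final Pattern OPEN_BR =
            Pattern.compile("<br\\s*>", Pattern.CASE_INSENSITIVE);
    private static final Pattern COMMENT =
            Pattern.compile("<!--.*?-->", Pattern.DOTALL);
    private static final Pattern BREAKING_TAG =
            Pattern.compile("</p\\s*>|<br\\s*/?>", Pattern.CASE_INSENSITIVE);
    private static final Pattern ANY_TAG = Pattern.compile("<[^>]+>");

    // Rewrites <br> as <br />; tags that are already closed are left untouched.
    static String closeBrTags(String html) {
        return OPEN_BR.matcher(html).replaceAll("<br />");
    }

    // Removes comments, including those spanning several lines.
    static String removeComments(String html) {
        return COMMENT.matcher(html).replaceAll("");
    }

    // Escapes characters reserved in XML; '&' goes first so that the entities
    // produced by the later replacements are not escaped again.
    static String escapeXml(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Wraps every line of the plain text in a paragraph element.
    static String plaintextToHtml(String plainText) {
        StringBuilder html = new StringBuilder();
        for (String line : plainText.split("\\r?\\n")) {
            html.append("<p>").append(escapeXml(line)).append("</p>");
        }
        return html.toString();
    }

    // Replaces closing paragraphs and br elements with newlines, drops all
    // other tags, then unescapes entities and trims (both are assumptions).
    static String htmlToPlaintext(String html) {
        String text = BREAKING_TAG.matcher(html).replaceAll("\n");
        text = ANY_TAG.matcher(text).replaceAll("");
        return text.replace("&lt;", "<").replace("&gt;", ">")
                .replace("&quot;", "\"").replace("&amp;", "&").trim();
    }

    public static void main(String[] args) {
        String html = plaintextToHtml("1 < 2\nA & B");
        System.out.println(html);                 // <p>1 &lt; 2</p><p>A &amp; B</p>
        System.out.println(htmlToPlaintext(html)); // the two original lines
        System.out.println(closeBrTags("a<br>b")); // a<br />b
    }
}
```

In this sketch tags are stripped before entities are unescaped, so that an escaped `&lt;` in the text is never mistaken for the start of a tag.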
public static final Set<String> NODES_TO_REMOVE_WHEN_EMPTY
public static final String LINE_TERMINATORS_REGEX_CLASS
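The contents of NODES_TO_REMOVE_WHEN_EMPTY are not listed on this page. The sketch below shows one way such a set of tag names might drive the removal of elements without a body. The class EmptyNodeSketch, the tag names in its set, the removeEmptyNodes helper and the choice to treat a whitespace-only body as empty are all hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Pattern;

public class EmptyNodeSketch {
    // Hypothetical tag names; the real contents of the HTMLHelper set are
    // not documented here.
    static final Set<String> TAGS_TO_REMOVE_WHEN_EMPTY =
            new LinkedHashSet<>(Arrays.asList("p", "span", "div"));

    // Removes elements such as <p></p> whose body is empty or whitespace only
    // (the whitespace rule is an assumption). Regex-based, so only suitable
    // for simple, well-formed markup.
    static String removeEmptyNodes(String html) {
        String result = html;
        for (String tag : TAGS_TO_REMOVE_WHEN_EMPTY) {
            String name = Pattern.quote(tag);
            // Opening tag with optional attributes, optional whitespace,
            // then the matching closing tag.
            String regex = "<" + name + "(\\s[^>]*)?>\\s*</" + name + ">";
            result = Pattern.compile(regex, Pattern.CASE_INSENSITIVE)
                    .matcher(result).replaceAll("");
        }
        return result;
    }

    public static void main(String[] args) {
        // prints <div>text</div>
        System.out.println(removeEmptyNodes("<div><p></p><span> </span>text</div>"));
    }
}
```

A single pass handles each tag name once, so an element that only becomes empty after its children are removed may need a second pass.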
public static String closeBrTags(String htmlContent)
htmlContent - the content to process

public static boolean containsNonEmptyTextNodes(String htmlContent)
htmlContent - the HTML to test

public static String convertHTMLListToText(String htmlContent)
htmlContent - the HTML content to parse for lists

public static String convertXmlSerializedHtmlToLegalHtml(String xmlSerializedHtml)
xmlSerializedHtml - the HTML that was serialized by an XML processor

public static String encapsulateAttributesInQuotes(String htmlContent)
htmlContent - the content as HTML

public static String htmlToPlaintext(String htmlContent)
A simple helper which converts HTML to plain text. It works similarly to StringEscapeHelper.removeHtmlMarkup(String), but tries to preserve line breaks by replacing closing paragraphs and br elements with newlines. Moreover, list items are surrounded by a minus and a newline character.
htmlContent - the HTML content

public static String htmlToPlaintextExt(String htmlContent, boolean convertLists)
Helper method similar to htmlToPlaintext(String), with the difference that it uses a more advanced approach to replace HTML lists (namely by invoking convertHTMLListToText(String)). This method also converts hyperlinks into a text representation using UrlHelper.convertAnchorsToString(String).
htmlContent - the HTML content
convertLists - whether to convert HTML lists. If true, lists will be converted by invoking convertHTMLListToText(String). If false, list entries will only be separated by line breaks.

public static String plaintextToHTML(String plainText)
plainText - the plain text

public static String removeComments(String htmlContent)
htmlContent - the HTML content

public static String removeHeadElement(String htmlContent)
htmlContent - the content to process

public static String removeNamespaces(String htmlContent)
htmlContent - the content to process

public static String removeUnclosedMetaElements(String htmlContent)
htmlContent - the HTML content

Copyright © 2019 Communote team. All rights reserved.