public final class HTMLHelper extends Object
Modifier and Type | Field and Description |
---|---|
static String | LINE_TERMINATORS_REGEX_CLASS: Regex class of line terminators. |
static Set<String> | NODES_TO_REMOVE_WHEN_EMPTY: Names of HTML tags whose nodes should be removed when they do not have a body. |
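To illustrate how a set of tag names like NODES_TO_REMOVE_WHEN_EMPTY might be applied, here is a minimal sketch. The tag names, the class `EmptyNodeSketch` and the method `removeEmptyNodes` are hypothetical; the actual contents of the set and the way HTMLHelper uses it are not documented on this page.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

final class EmptyNodeSketch {
    // Hypothetical tag names; the real NODES_TO_REMOVE_WHEN_EMPTY values may differ.
    static final Set<String> TAGS_TO_REMOVE_WHEN_EMPTY =
            new HashSet<>(Arrays.asList("p", "span", "div"));

    // Removes elements of the listed tags whose body is empty or whitespace only.
    // Repeats until nothing changes, so that nodes which become empty after
    // their children were removed are dropped as well.
    static String removeEmptyNodes(String html) {
        String result = html;
        String previous;
        do {
            previous = result;
            for (String tag : TAGS_TO_REMOVE_WHEN_EMPTY) {
                Pattern empty = Pattern.compile(
                        "<" + tag + "(\\s[^>]*)?>\\s*</" + tag + ">",
                        Pattern.CASE_INSENSITIVE);
                result = empty.matcher(result).replaceAll("");
            }
        } while (!result.equals(previous));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(removeEmptyNodes("<div><p> </p></div><p>Hi</p>"));
    }
}
```

A regex-based approach like this is only suitable for simple markup; a DOM-based implementation would handle nesting and malformed input more reliably.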
Modifier and Type | Method and Description |
---|---|
static String | closeBrTags(String htmlContent): Converts br tags to the XHTML compatible form. |
static boolean | containsNonEmptyTextNodes(String htmlContent): Tests whether the provided HTML contains non-empty text nodes. |
static String | convertHTMLListToText(String htmlContent): Converts HTML lists to readable plain text representations. |
static String | convertXmlSerializedHtmlToLegalHtml(String xmlSerializedHtml): Does some cleanup to convert HTML generated by serializing an HTML DOM with an XML processor to legal HTML. |
static String | encapsulateAttributesInQuotes(String htmlContent): Some tools (e.g. |
static String | htmlToPlaintext(String htmlContent): A simple helper which converts HTML to plain text which works similarly to StringEscapeHelper.removeHtmlMarkup(String), but tries to preserve linebreaks by replacing closing paragraphs and br elements with newlines. |
static String | htmlToPlaintextExt(String htmlContent, boolean convertLists): Helper method similar to htmlToPlaintext(String) with the difference that it uses a more advanced approach to replace HTML lists (namely by invoking convertHTMLListToText(String)). |
static String | plaintextToHTML(String plainText): A simple helper which converts a plain text string to HTML by encapsulating all lines with paragraphs and escapes characters reserved in XML. |
static String | removeComments(String htmlContent): Removes all HTML comments. |
static String | removeHeadElement(String htmlContent): Strips the head element with all children from an HTML string. |
static String | removeNamespaces(String htmlContent): Removes all namespace definitions and the nodes using these namespaces. |
static String | removeUnclosedMetaElements(String htmlContent): Remove open meta elements. |
static String | stripLinebreaks(String htmlContent) |
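The conversions described for htmlToPlaintext(String) and plaintextToHTML(String) can be sketched roughly as follows. This is not the Communote implementation: the class `HtmlTextSketch`, its method names and the regular expressions are assumptions chosen only to illustrate the documented behaviour (closing paragraphs and br elements become newlines; plain text lines are wrapped in paragraphs with XML-reserved characters escaped).

```java
import java.util.regex.Pattern;

final class HtmlTextSketch {
    // Closing paragraph tags and br elements (open or self-closed) mark line breaks.
    private static final Pattern LINE_BREAKING_TAGS =
            Pattern.compile("(?i)</p\\s*>|<br\\s*/?>");
    // Any remaining tag is treated as markup to drop.
    private static final Pattern ANY_TAG = Pattern.compile("<[^>]+>");

    // Assumed behaviour: replace line-breaking tags with newlines, then strip all other tags.
    static String htmlToPlaintext(String html) {
        String text = LINE_BREAKING_TAGS.matcher(html).replaceAll("\n");
        text = ANY_TAG.matcher(text).replaceAll("");
        return text.trim();
    }

    // Assumed behaviour: wrap every line in a paragraph and escape reserved characters.
    static String plaintextToHtml(String plainText) {
        StringBuilder html = new StringBuilder();
        for (String line : plainText.split("\\r?\\n", -1)) {
            html.append("<p>").append(escapeXml(line)).append("</p>");
        }
        return html.toString();
    }

    // The ampersand must be escaped first so that later replacements are not escaped twice.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        System.out.println(htmlToPlaintext("<p>Hello</p><p>World<br/>again</p>"));
        System.out.println(plaintextToHtml("a < b\nc & d"));
    }
}
```

The sketch does not unescape entities or handle lists and hyperlinks; according to the method descriptions, the real helpers cover more cases.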
public static final Set<String> NODES_TO_REMOVE_WHEN_EMPTY
public static final String LINE_TERMINATORS_REGEX_CLASS
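The value of LINE_TERMINATORS_REGEX_CLASS is not shown on this page. As a rough illustration of what a regex class of line terminators can look like, the sketch below uses the characters that java.util.regex treats as line terminators. The class `LineTerminatorSketch`, the constant value and the helper `removeLineTerminators` are assumptions, not the actual HTMLHelper definitions.

```java
final class LineTerminatorSketch {
    // Assumed value: LF, CR, NEL (U+0085), line separator (U+2028)
    // and paragraph separator (U+2029) as a regex character class.
    static final String LINE_TERMINATORS = "[\\n\\r\\u0085\\u2028\\u2029]";

    // Hypothetical helper: drops every run of line terminators from the content.
    static String removeLineTerminators(String content) {
        return content.replaceAll(LINE_TERMINATORS + "+", "");
    }

    public static void main(String[] args) {
        System.out.println(removeLineTerminators("first\r\nsecond\nthird"));
    }
}
```

Keeping the class as a string constant lets it be embedded in larger patterns, for example followed by a quantifier as done here.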
public static String closeBrTags(String htmlContent)
htmlContent - the content to process

public static boolean containsNonEmptyTextNodes(String htmlContent)
htmlContent - the HTML to test

public static String convertHTMLListToText(String htmlContent)
htmlContent - the HTML content to parse for lists

public static String convertXmlSerializedHtmlToLegalHtml(String xmlSerializedHtml)
xmlSerializedHtml - the HTML that was serialized by an XML processor

public static String encapsulateAttributesInQuotes(String htmlContent)
htmlContent - The content as HTML.

public static String htmlToPlaintext(String htmlContent)
A simple helper which converts HTML to plain text which works similarly to StringEscapeHelper.removeHtmlMarkup(String), but tries to preserve linebreaks by replacing closing paragraphs and br elements with newlines. Moreover, list items are surrounded by a minus and a newline character.
htmlContent - the HTML content

public static String htmlToPlaintextExt(String htmlContent, boolean convertLists)
Helper method similar to htmlToPlaintext(String) with the difference that it uses a more advanced approach to replace HTML lists (namely by invoking convertHTMLListToText(String)). This method also converts hyperlinks into a text representation using UrlHelper.convertAnchorsToString(String).
htmlContent - the HTML content
convertLists - whether to convert HTML lists. If true, lists will be converted by invoking convertHTMLListToText(String). If false, list entries will only be separated by line breaks.

public static String plaintextToHTML(String plainText)
plainText - the plain text

public static String removeComments(String htmlContent)
htmlContent - the HTML content

public static String removeHeadElement(String htmlContent)
htmlContent - the content to process

public static String removeNamespaces(String htmlContent)
htmlContent - the content to process

public static String removeUnclosedMetaElements(String htmlContent)
htmlContent - the HTML content

Copyright © 2019 Communote team. All rights reserved.