Extracting text from another item sometimes produces formatting issues such as extra blank lines or leading/trailing whitespace on each line.
This commonly occurs when extracting from HTML elements or XML documents.
The following regular expressions, applied with String.replaceAll, can fix these issues.
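(?m) = multi-line mode: ^ and $ match at the start and end of each line, not only at the start and end of the whole string.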
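To see what the (?m) flag changes, here is a minimal sketch that runs the same pattern with and without it. The class name, the method name, and the sample text are made up for illustration; only String.replaceAll and the (?m) flag come from the text above.

```java
class MultilineModeDemo {
    // Replaces the word at each position where ^ matches with "#".
    // With multiline == false, ^ only matches at the start of the input.
    // With multiline == true, (?m) lets ^ match at the start of every line.
    static String markLineStarts(String text, boolean multiline) {
        String regex = (multiline ? "(?m)" : "") + "^\\w+";
        return text.replaceAll(regex, "#");
    }

    public static void main(String[] args) {
        String text = "one two\nthree four";
        System.out.println(markLineStarts(text, false));
        // # two
        // three four
        System.out.println(markLineStarts(text, true));
        // # two
        // # four
    }
}
```

Putting (?m) at the start of the pattern has the same effect as compiling it with Pattern.MULTILINE, which is handy here because String.replaceAll takes no flags argument.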
The following removes the leading/trailing whitespace from each line in the string.
node.getTextContent().replaceAll(
"(?m)^[\\s&&[^\\n]]+|[\\s&&[^\\n]]+$", "");
Example:
    The quick brown fox
        jumps over
  the lazy dog.
Result:
The quick brown fox
jumps over
the lazy dog.
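The trimming step can be tried on its own with a plain String. The sketch below assumes the text has already been pulled out of the node, so it does not touch the DOM at all; the class and method names are hypothetical.

```java
class TrimLinesDemo {
    // Strips leading and trailing whitespace from every line.
    // [\s&&[^\n]] means "any whitespace character except a newline",
    // so the line breaks themselves are left in place.
    static String trimEachLine(String text) {
        return text.replaceAll("(?m)^[\\s&&[^\\n]]+|[\\s&&[^\\n]]+$", "");
    }

    public static void main(String[] args) {
        String raw = "   The quick brown fox  \n\tjumps over \n  the lazy dog.   ";
        System.out.println(trimEachLine(raw));
        // The quick brown fox
        // jumps over
        // the lazy dog.
    }
}
```

A line that contains only whitespace becomes an empty line rather than disappearing, so this step is usually paired with blank-line removal.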
The following removes extra blank lines from the string.
node.getTextContent().replaceAll("(?m)^[ \\t]*\\r?\\n", "");
Example:
The quick brown fox


jumps over

the lazy dog.
Result:
The quick brown fox
jumps over
the lazy dog.
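Blank-line removal can be checked the same way. This is a small sketch on a plain String rather than on node.getTextContent(), with hypothetical class and method names.

```java
class RemoveBlankLinesDemo {
    // Deletes every line that is empty or holds only spaces and tabs,
    // including its line break (\n or \r\n).
    static String removeBlankLines(String text) {
        return text.replaceAll("(?m)^[ \\t]*\\r?\\n", "");
    }

    public static void main(String[] args) {
        String raw = "The quick brown fox\n\n\njumps over\n   \nthe lazy dog.";
        System.out.println(removeBlankLines(raw));
        // The quick brown fox
        // jumps over
        // the lazy dog.
    }
}
```

Because the pattern requires a line break after the optional spaces and tabs, a whitespace-only last line with no trailing newline is left untouched.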
References
Regular-Expressions.info – Specifying Modes Inside The Regular Expression