Remove Extra New Lines and Whitespace Using Java Regex

Text extracted from another source can pick up formatting problems, such as extra blank lines or leading/trailing whitespace on each line.

This commonly occurs when extracting from HTML elements or XML documents.

The regular expressions below, passed to String.replaceAll, fix both problems.

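(?m) = multi-line mode, in which ^ and $ match at the start and end of every line instead of only at the start and end of the whole string.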
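
As a rough illustration of what the flag changes, the sketch below counts matches of the same pattern with and without (?m). The class name MultilineModeDemo and the helper countMatches are made up for this example and are not part of any library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MultilineModeDemo {

    // Hypothetical helper: counts how many times the regex matches in the input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "one\ntwo\nthree";
        // Without (?m), ^ matches only at the very start of the input.
        System.out.println(countMatches("^\\w+", text));     // prints 1
        // With (?m), ^ also matches right after each line terminator.
        System.out.println(countMatches("(?m)^\\w+", text)); // prints 3
    }
}
```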

The following removes the leading/trailing whitespace from each line in the string.

// [\s&&[^\n]] matches any whitespace character except a line feed
node.getTextContent().replaceAll(
        "(?m)^[\\s&&[^\n]]+|[\\s&&[^\n]]+$", "");
Example:
   The quick brown fox
      jumps over
         the lazy dog.
 
Result:
The quick brown fox
jumps over
the lazy dog.
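
For trying the trimming expression outside of a DOM, here is a minimal sketch that applies it to a plain String. The class name LineTrimmer and the method trimEachLine are assumptions made for this example. The character class is written as [\s&&[^\n]], without a + inside the brackets, because a + inside a character class is a literal plus sign and would cause trailing + characters to be stripped as well.

```java
public class LineTrimmer {

    // Hypothetical helper: strips leading and trailing whitespace from every line.
    // (?m) lets ^ and $ match at each line boundary;
    // [\s&&[^\n]] is "any whitespace except a line feed", so line breaks survive.
    static String trimEachLine(String text) {
        return text.replaceAll("(?m)^[\\s&&[^\n]]+|[\\s&&[^\n]]+$", "");
    }

    public static void main(String[] args) {
        String raw = "   The quick brown fox  \n"
                   + "      jumps over\t\n"
                   + "         the lazy dog.";
        // Each line is printed without its surrounding spaces and tabs.
        System.out.println(trimEachLine(raw));
    }
}
```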

The following removes extra blank lines from the string.

node.getTextContent().replaceAll("(?m)^[ \t]*\r?\n", "");
Example:
The quick brown fox
 
 
jumps over
 
 
 
the lazy dog.
 
Result:
The quick brown fox
jumps over
the lazy dog.
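
The same expression can be tried on a plain String, as in the sketch below. The class name BlankLineRemover and the method removeBlankLines are invented for this example.

```java
public class BlankLineRemover {

    // Hypothetical helper: deletes every line that holds only spaces or tabs,
    // together with its line terminator (\n or \r\n).
    static String removeBlankLines(String text) {
        return text.replaceAll("(?m)^[ \t]*\r?\n", "");
    }

    public static void main(String[] args) {
        String raw = "The quick brown fox\n \n \njumps over\n\n \t\nthe lazy dog.";
        // The three text lines are printed with no blank lines between them.
        System.out.println(removeBlankLines(raw));
    }
}
```

Because the pattern requires a line terminator, a whitespace-only final line that is not followed by a newline is left in place; calling trim() on the result is one way to handle that case.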

References
Regular-Expressions.info - Specifying Modes Inside The Regular Expression