This tutorial shows:
1. How to remove HTML tags from a String in Java.
2. How to convert HTML to plain text by filtering out tags in Java.
3. How to selectively remove all HTML tags excluding some specific tags in Java.
Step 1:
Download the jar file from the link below and place it in your project's build path.
http://central.maven.org/maven2/org/jsoup/jsoup/1.8.3/jsoup-1.8.3.jar
OR,
Add jsoup as a dependency in your build tool (Maven coordinates: group org.jsoup, artifact jsoup, version 1.8.3).
Step 2:
Try the org.jsoup.Jsoup code samples given below and adapt them as needed.
File: ConvertHTML.java
package net.codermag.example;

import org.jsoup.Jsoup;

public class ConvertHTML {
    public static void main(String[] args) {
        String text = "<div><span><b style='color:blue;'>CoderMagnet:</b>The Developer's playground.</span></div>";

        // Parse the HTML and print only its text content, with every tag removed.
        System.out.println(Jsoup.parse(text).text());
    }
}
Output:
CoderMagnet:The Developer's playground.
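For comparison, here is a minimal sketch of a naive regex-based approach that uses only the standard library. It is not part of the jsoup example above, and the class and method names (NaiveTagStripper, stripTags) are hypothetical. It simply deletes anything that looks like a tag, so it gives the same result for the sample string but leaves HTML entities such as &amp; undecoded, whereas Jsoup.parse(...).text() decodes them.

```java
class NaiveTagStripper {

    // Hypothetical helper: removes every "<...>" sequence with a regular expression.
    // It does not decode entities and does not understand malformed markup.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        String text = "<div><span><b style='color:blue;'>CoderMagnet:</b>The Developer's playground.</span></div>";

        // prints: CoderMagnet:The Developer's playground.
        System.out.println(stripTags(text));

        // prints: Tom &amp; Jerry   (the entity is left as-is)
        System.out.println(stripTags("<p>Tom &amp; Jerry</p>"));
    }
}
```

For anything beyond simple, well-formed snippets, a real parser such as jsoup is usually the safer choice.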
In certain cases we may want to remove all HTML tags but keep a few of them intact.
To do that, we pass a Whitelist of allowed tags to Jsoup.clean().
The program below shows how to remove all HTML tags except "<span>" and "<br>".
package net.codermag.example;

import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

public class ConvertHTML {
    public static void main(String[] args) {
        String text = "<div><span><b style='color:blue;'>CoderMagnet:</b><br>The Developer's playground.</span></div>";

        // Plain text only: every tag is removed.
        System.out.println(Jsoup.parse(text).text());

        // Allow only <span> and <br>; all other tags and all attributes are stripped.
        Whitelist whitelist = new Whitelist();
        whitelist.addTags("span", "br");
        System.out.println(Jsoup.clean(text, whitelist));
    }
}
Output:
CoderMagnet: The Developer's playground.
<span>CoderMagnet:<br>The Developer's playground.</span>
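The same Whitelist mechanism can also strip everything or keep selected attributes. The sketch below is an illustration rather than part of the original program: the class and method names (WhitelistSketch, stripAll, keepLinks) and the sample HTML are made up, and it assumes the jsoup 1.8.3 jar from Step 1 is on the classpath. Note that newer jsoup releases renamed Whitelist to Safelist, so the import would need to change there.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

class WhitelistSketch {

    // Removes every tag by cleaning against an empty whitelist.
    static String stripAll(String html) {
        return Jsoup.clean(html, Whitelist.none());
    }

    // Keeps only <a> tags and their href attribute (http/https links only);
    // other tags and attributes such as onclick are dropped.
    static String keepLinks(String html) {
        Whitelist whitelist = new Whitelist()
                .addTags("a")
                .addAttributes("a", "href")
                .addProtocols("a", "href", "http", "https");
        return Jsoup.clean(html, whitelist);
    }

    public static void main(String[] args) {
        // Hypothetical sample input for illustration.
        String html = "<p>Visit <a href='http://example.com' onclick='x()'>our <b>site</b></a></p>";

        System.out.println(stripAll(html));
        System.out.println(keepLinks(html));
    }
}
```

Unlike Jsoup.parse(...).text(), Jsoup.clean() returns HTML, so special characters in the result stay escaped as entities.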