Friday, 5 February 2016

Remove HTML from String using JSOUP


jsoup is a Java library that provides a flexible set of tools for parsing and manipulating HTML.
This post shows:
1. How to remove HTML tags from a String in Java.
2. How to convert HTML to plain text by filtering out tags in Java.
3. How to selectively remove all HTML tags except a few specific ones in Java.

Step 1:
Download the jar file from the link below and place it on your project's build path.
http://central.maven.org/maven2/org/jsoup/jsoup/1.8.3/jsoup-1.8.3.jar

OR,
Add jsoup 1.8.3 (group org.jsoup, artifact jsoup) as a dependency through your build tool.
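With Maven, for example, the coordinates implied by the jar URL above would be declared roughly like this. This is a sketch that assumes a standard pom.xml; adjust the version if you need a different release.

```xml
<!-- Coordinates taken from the jar URL above: org/jsoup/jsoup/1.8.3 -->
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.8.3</version>
</dependency>
```

Build tools that accept the short coordinate form can use org.jsoup:jsoup:1.8.3 instead.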


Step 2:
Try out the org.jsoup.Jsoup code samples below and adapt them as needed.

File: ConvertHTML.java

package net.codermag.example;

import org.jsoup.Jsoup;

public class ConvertHTML {
    public static void main(String[] args) {
        String text = "<div><span><b style='color:blue;'>CoderMagnet:</b>The Developer&apos;s playground.</span></div>";

        // Parse the HTML and keep only its text content; entities such as &apos; are decoded.
        System.out.println(Jsoup.parse(text).text());
    }
}

Output:

CoderMagnet:The Developer's playground.


In certain cases we may want to remove all HTML tags but keep a few of them intact.
To do that, we pass Jsoup.clean a whitelist that names the tags to keep.
The program below shows how to remove all HTML tags except "<span>" and "<br>".

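package net.codermag.example;

import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

public class ConvertHTML {
    public static void main(String[] args) {
        // The input contains a <br> so that the whitelist has something to preserve.
        String text = "<div><span><b style='color:blue;'>CoderMagnet:</b><br>The Developer&apos;s playground.</span></div>";

        // Plain text only: every tag is removed.
        System.out.println(Jsoup.parse(text).text());

        // Start from an empty whitelist and allow only <span> and <br>;
        // all other tags, and all attributes, are dropped.
        Whitelist whitelist = new Whitelist();
        whitelist.addTags("span", "br");
        System.out.println(Jsoup.clean(text, whitelist));
    }
}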


Output:

CoderMagnet: The Developer's playground.
<span>CoderMagnet:<br>The Developer's playground.</span>
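
The two techniques above can be wrapped in small reusable helpers. The following is only a minimal sketch, assuming jsoup 1.8.3 is on the classpath as set up in Step 1. The class name HtmlStripper and the method names stripTags and keepOnly are hypothetical and are not part of jsoup. Newer jsoup versions rename Whitelist to Safelist, so the import may need adjusting there.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

// Hypothetical helper class (not part of jsoup) wrapping the two calls shown above.
public class HtmlStripper {

    // Removes every tag and returns the decoded plain text.
    public static String stripTags(String html) {
        return Jsoup.parse(html).text();
    }

    // Removes every tag except those named in allowedTags.
    // Attributes are dropped because none are added to the whitelist.
    public static String keepOnly(String html, String... allowedTags) {
        Whitelist whitelist = new Whitelist().addTags(allowedTags);
        return Jsoup.clean(html, whitelist);
    }

    public static void main(String[] args) {
        String html = "<p>Hello <b>world</b></p>";
        System.out.println(stripTags(html));
        System.out.println(keepOnly(html, "b"));
    }
}
```

Keeping the whitelist construction inside keepOnly means each call starts from an empty whitelist, so tags allowed in one call never leak into another.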


