02 April 2010

TRIE data structure Part 2 : Node and TRIE class in Java




In the first part of the TRIE ADT, we saw the basics of the TRIE data structure. In this section, lets get our hands dirty by directly looking at the TRIE data structure implemented in Java.

We already saw the Node structure of the TRIE ADT had a content (char), a marker (boolean) and collection of child nodes (Collection of Node). It now has one more method called as subNode(char). This method takes a character as argument would return the child node of that character type should that be present.

package dsa.tries;

import java.util.Collection;
import java.util.LinkedList;

/**
 * @author Braga
 */
public class Node {
 char content;
 boolean marker; 
 Collection<Node> child;
 
 public Node(char c){
  child = new LinkedList<Node>();
  marker = false;
  content = c;
 }
 
 public Node subNode(char c){
  if(child!=null){
   for(Node eachChild:child){
    if(eachChild.content == c){
     return eachChild;
    }
   }
  }
  return null;
 }
}

Now that we've defined our Node, lets go ahead and look at the code for the TRIE class. Fortunately, the TRIE datastructure is insanely simple to implement since it has two major methods insert() and search(). Lets look at the elementary implementation of both these methods.

package dsa.tries;

public class Trie{
 private Node root;

 public Trie(){
  root = new Node(' '); 
 }

 public void insert(String s){
  Node current = root; 
  if(s.length()==0) //For an empty character
   current.marker=true;
  for(int i=0;i<s.length();i++){
   Node child = current.subNode(s.charAt(i));
   if(child!=null){ 
    current = child;
   }
   else{
    current.child.add(new Node(s.charAt(i)));
    current = current.subNode(s.charAt(i));
   }
   // Set marker to indicate end of the word
   if(i==s.length()-1)
    current.marker = true;
  } 
 }

 public boolean search(String s){
  Node current = root;
  while(current != null){
   for(int i=0;i<s.length();i++){    
    if(current.subNode(s.charAt(i)) == null)
     return false;
    else
     current = current.subNode(s.charAt(i));
   }
   /* 
    * This means that a string exists, but make sure its
    * a word by checking its 'marker' flag
    */
   if (current.marker == true)
    return true;
   else
    return false;
  }
  return false; 
 }
}

We shall look into the detailed working of the insert() and search() methods in separate posts, but I will tell you the gist of both those methods here. The insert() methods adds a new words to the already existing TRIE data structure. The search() method would return true or false based on whether the search string we specified exist or not.

Take a quick look at the java code above because we are going to use this in the posts that follow.

Cheers,
Bragaadeesh.

11 comments:

cutelyaware said...

Your implementation does not allow for deletion of words. IOW, your trees can only grow and will never shrink which could be a problem for some applications.

BragBoy said...

@cutelyaware: Yes, i have not implemented the delete() method, the simple reason is to keep things simple. I don't want to put in a lot of methods in one stretch in this introductory tutorial.

Anindya said...

import java.util.HashMap;
import java.util.Map;

public class Trie {
private static class Node {
private boolean isWord = false; // indicates a complete word
private int prefixes = 0; // indicates how many words have the prefix
private Map children = new HashMap(); // references to all possible children
}

private Node root = new Node();

/**
* Inserts a new word into the trie
* @param word
*/
public void insertWord(String word){
if(searchWord(word) == true) return;

Node current = root;
for(char c : word.toCharArray()){
if(current.children.containsKey(Character.valueOf(c))){
Node child = current.children.get(Character.valueOf(c));
child.prefixes++;

current = child;
}else{
Node child = new Node();
child.prefixes = 1;

current.children.put(Character.valueOf(c), child);
current = child;
}
}
// we have reached the endof the word, hence mark it true
// if during a search we reach the end of the search string and this
// flag is still false, then the search string is not registered as a valid
// word in the trie but is a prefix
current.isWord = true;
}

/**
* Searches for a word in the trie
* @param word
*/
public boolean searchWord(String word){
Node current = root;
for(char c : word.toCharArray()){
if(current.children.containsKey(Character.valueOf(c))){
current = current.children.get(Character.valueOf(c));
}else{
return false;
}
}
// if at the end of the search of entire word the boolean variable is
// still false means that the given word is not regitered as a valid
// word in the trie, but is a prefix
return current.isWord;
}

/**
* Deletes a word from the trie
* @param word
*/
public void deleteWord(String word){
if(searchWord(word) == false) return;

Node current = root;
for(char c : word.toCharArray()){
Node child = current.children.get(Character.valueOf(c));
if(child.prefixes == 1){
current.children.remove(Character.valueOf(c));
return;
}else{
child.prefixes--;
current = child;
}
}
// since the word is removed now, set the flag to false
current.isWord = false;
}

public static void main(String[] args) {
Trie trie = new Trie();
trie.insertWord("ball");
trie.insertWord("balls");
trie.insertWord("bat");
trie.insertWord("doll");
trie.insertWord("dork");
trie.insertWord("dorm");
trie.insertWord("send");
trie.insertWord("sense");

// testing deletion
//System.out.println(trie.searchWord("ball"));
//trie.deleteWord("ball");
//System.out.println(trie.searchWord("ball"));
//System.out.println(trie.searchWord("balls"));

// testing insertion
//System.out.println(trie.searchWord("dumb"));
//trie.insertWord("dumb");
//System.out.println(trie.searchWord("dumb"));
//trie.deleteWord("dumb");
//System.out.println(trie.searchWord("dumb"));
}
}

Anindya said...

Above is a complete implementation inspired from Bragaadeesh introductory tutorial. It helped me a lot in undertsanding the concepts hence I thought of posting the entire code myself. If anyone is interested practising problem, have a look at the following Topcoder problems

http://www.topcoder.com/stat?c=problem_statement&pm=7411

http://www.topcoder.com/stat?c=problem_statement&pm=6215

BragBoy said...

@Anindya : great job!

john said...

That's neat stuff. How would a the basic implementation of a delete method look for this program?

Varun said...
This comment has been removed by the author.
Sushma said...

Thanks for the code. A few questions about Anindya's delete code :-
1)if(child.prefixes == 1){
current.children.remove(Character.valueOf(c));

lets say the words cart, can, camera are inserted. Now camera is deleted. The above code removes "m" from the children list of "a". How to delete the memory allocated for "e,r,a" nodes.

2) In delete function
// since the word is removed now, set the flag to false
current.isWord = false;

Lets say do, door, does are inserted. Now "does" is deleted. So this sets "o" node a not a word. But "do" is a word.

Vaibhav Sawant said...

I completely agree with sushma .... deletion code requires some correction

Unknown said...

Space complexity of Anindya's code is too much,there are many empty cells in this implementation using hashmaps. Hence, not a desired code by many

stralo said...

Thanks a lot for the tutorial, Bragaadeesh. It's helped me a lot. However, I think there is something wrong with the search() method. I understand the need to ensure the current node isn't null, but I don't think the for-loop should be within a while loop. Please take a look at it again and let me know if there's something I'm missing.