Showing posts with label TRIE. Show all posts
Showing posts with label TRIE. Show all posts
10 September 2010
Trie data structure - In C++
Having had a comprehensive coverage of the TRIE data structure in Java, me and my roommate thought it would achieve completion if we have the same implemented in C++. Don't get carried away by the length of the code. Its as simple and easy as the equivalent one in Java. You may go through the comprehensive tutorial here. Trust me, it takes only 10 minutes!!.
Please feel free to ask any questions if you face difficulties in understanding any part of the resource. I would respond to you immediately.
Demonstration of trie operations
Cheers!!
Jack.
24 July 2010
Google CodeJam 2010 : "File Fix-it" Problem with solution
Problem
On Unix computers, data is stored in directories. There is one root directory, and this might have several directories contained inside of it, each with different names. These directories might have even more directories contained inside of them, and so on.A directory is uniquely identified by its name and its parent directory (the directory it is directly contained in). This is usually encoded in a path, which consists of several parts each preceded by a forward slash ('/'). The final part is the name of the directory, and everything else gives the path of its parent directory. For example, consider the path:
/home/gcj/finals
This refers to the directory with name "finals" in the directory described by "/home/gcj", which in turn refers to the directory with name "gcj" in the directory described by the path "/home". In this path, there is only one part, which means it refers to the directory with the name "home" in the root directory.To create a directory, you can use the mkdir command. You specify a path, and thenmkdir will create the directory described by that path, but only if the parent directory already exists. For example, if you wanted to create the "/home/gcj/finals" and "/home/gcj/quals" directories from scratch, you would need four commands:
mkdir /home
mkdir /home/gcj
mkdir /home/gcj/finals
mkdir /home/gcj/quals
Given the full set of directories already existing on your computer, and a set of new directories you want to create if they do not already exist, how many mkdir commands do you need to use?
Input
The first line of the input gives the number of test cases, T. T test cases follow. Each case begins with a line containing two integers N and M, separated by a space.The next N lines each give the path of one directory that already exists on your computer. This list will include every directory already on your computer other than the root directory. (The root directory is on every computer, so there is no need to list it explicitly.)
The next M lines each give the path of one directory that you want to create.
Each of the paths in the input is formatted as in the problem statement above. Specifically, a path consists of one or more lower-case alpha-numeric strings (i.e., strings containing only the symbols 'a'-'z' and '0'-'9'), each preceded by a single forward slash. These alpha-numeric strings are never empty.
Output
For each test case, output one line containing "Case #x: y", where x is the case number (starting from 1) and y is the number of mkdir you need.Limits
1 ≤ T ≤ 100.No path will have more than 100 characters in it.
No path will appear twice in the list of directories already on your computer, or in the list of directories you wish to create. A path may appear once in both lists however. (See example case #2 below).
If a directory is listed as being on your computer, then its parent directory will also be listed, unless the parent is the root directory.
The input file will be no longer than 100,000 bytes in total.
Small dataset
0 ≤ N ≤ 10.1 ≤ M ≤ 10.
Large dataset
0 ≤ N ≤ 100.1 ≤ M ≤ 100.
Sample
Input | Output |
3 |
Solution
To attack this problem we can use the famous TRIE data structure. We have already seen enough about this data structure in steps and this becomes so handy in this particular problem.
A TRIE data structure simply stores the data that becomes easy for retrieval. But lets not bother about the retrieval part in this particular problem. We will simply increase our counter whenever we add a new folder to the existing directory structure. The place where we optimize this problem involves in its running time. Whenever we see a new directory structure, we simply pass through it in the length times instead which is linear in time jargons. The solution is given below.
private int solve(String[] already, String[] fresh) {
Trie trieDSA = new Trie();
for(int i=0;i<already.length;i++){
trieDSA.insert(already[i],false);
}
for(int i=0;i<fresh.length;i++){
trieDSA.insert(fresh[i], true);
}
return trieDSA.counter;
}
So clean, isnt it? The TRIE and the Node classes are given below. Remember, please go through my TRIE data structure introduction for few minutes, its definitely a knowledgeable one! Trust me :)
class Node {
String currentPath;
boolean marker;
Collection<Node> child;
public Node(String path){
child = new HashSet<Node>();
marker = false;
currentPath = path;
}
public Node subNode(String path){
if(child!=null){
for(Node eachChild:child){
if(eachChild.currentPath.equals(path)){
return eachChild;
}
}
}
return null;
}
}
class Trie{
private Node root;
public int counter = 0;
public Trie(){
root = new Node("");
}
public void insert(String pathArray, boolean shouldTrack){
Node current = root;
String[] paths = pathArray.substring(1).split("\\/");
if(paths.length==0){
//DO NOTHING
current.marker=true;
}
for(int i=0;i<paths.length;i++){
Node child = current.subNode(paths[i]);
if(child!=null){
current = child;
}
else{
current.child.add(new Node(paths[i]));
current = current.subNode(paths[i]);
if(shouldTrack){
counter++;
}
}
// Set marker to indicate end of the word
if(i==paths.length-1)
current.marker = true;
}
}
}
Complete source code : here
Sample Input : here
Cheers!!
Bragaadeesh.
10 April 2010
TRIE data structure Part 6 : A Sample UI
Lets have a look at a sample application in Swing. We load the TRIE data structure using a dictionary word list. And using the application we can perform a search. This application is a very much prototypical just to check the insert() and search() operations.
Input file : words.txt
Sample Application
There are two possible outcomes to our search criteria, one true and another false. The following is the dialog shown when a word "article" is entered.
And when a word something like "dasdsa" is entered the following dialog is shown.
The Java code for the Trie Loader,
package dsa.tries;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
public class TrieLoader {
public static Trie trieDSA;
public static void main(String[] args) {
TrieLoader trieLoader = new TrieLoader();
trieLoader.load();
new TrieTestFrame();
}
public void load(){
trieDSA = new Trie();
BufferedReader br = null;
try {
br = new BufferedReader(new FileReader(new File("src/properties/words.txt")));
String eachLine = null;
while((eachLine=br.readLine())!=null){
trieDSA.insert(eachLine);
}
} catch (Exception e) {
e.printStackTrace();
} finally{
if(br!=null){
try {
br.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
}
}
The TrieTestFrame,
package dsa.tries;
import java.awt.Dimension;
import java.awt.Toolkit;
import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;
import javax.swing.JButton;
import javax.swing.JFrame;
import javax.swing.JOptionPane;
import javax.swing.JPanel;
import javax.swing.JTextField;
public class TrieTestFrame extends JFrame {
private static final long serialVersionUID = 1L;
private JPanel basePanel = new JPanel();
private JTextField textField = new JTextField(20);
private JButton button = new JButton("Check");
public TrieTestFrame(){
designUI();
}
private void designUI(){
this.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
this.setTitle("TRIE Test");
button.addActionListener(new ActionListener() {
@Override
public void actionPerformed(ActionEvent e) {
if(TrieLoader.trieDSA.search(textField.getText())){
JOptionPane.showMessageDialog(basePanel, "The word you entered exist!!");
}else{
JOptionPane.showMessageDialog(basePanel, "The word you entered does not exist!!","Error",JOptionPane.ERROR_MESSAGE);
}
}
});
basePanel.add(textField);
basePanel.add(button);
add(basePanel);
this.setSize(400,100);
Toolkit tk = Toolkit.getDefaultToolkit();
Dimension screenSize = tk.getScreenSize();
int screenHeight = screenSize.height;
int screenWidth = screenSize.width;
setLocation(screenWidth / 2 - 200, screenHeight / 2 - 50);
this.setVisible(true);
}
}
For the Node and Trie class please click here.
Cheers,
Bragaadeesh.
TRIE data structure Part 5 : Complexity Analysis
Now that we've seen the basic operations on how to work with a TRIE, we shall now see the space and time complexities in order to get a real feel of how good a TRIE data structure is. Lets take the two important operations INSERT and SEARCH to measure the complexity.
INSERT operation first. Lets always take into account the worst case timing first and later convince ourselves of the practical timings. For every Node in the TRIE we had something called as Collection
Having these in mind, for inserting a word of length 'k' we need (k * 26) comparisons. By Applying the Big O notation it becomes O(k) which will be again O(1). Thus insert operations are performed in constant time irrespective of the length of the input string (this might look lik an understatement, but if we make the length of the input string a worst case maximum, this sentence holds true).
Same holds true for the search operation as well. The search operation exactly performs the way the insert does and its order is O(k*26) = O(1).
TRIE data structure Part 4 : The SEARCH Algorithm
In the previous section we saw how to insert string into TRIE. In this section, we shall see how to perform a search in the TRIE when a string or key is passed.
Consider the following TRIE as usual.
The search alogirthm involves the following steps
- For each character in the string, see if there is a child node with that character as the content.
- If that character does not exist, return false
- If that character exist, repeat step 1.
- Do the above steps until the end of string is reached.
- When end of string is reached and if the marker of the current Node is set to true, return true, else return false.
- See whether "d" is present in the current node's children. Yes its present, so set the current node to child node which is having character "d".
- See whether "o" is present in the current node's children. Yes its present, so set the current node to child node which is having character "o".
- Since "o" is the end of the word, see whether marker is set to true or false. Marker is set to false which means that "do" is not registered as a word in the TRIE. So, return false.
Using the same algorithm, lets perform a search for the key "ball"
- See whether "b" is present in the current node's children. Yes its present, so set the current node to child node which is having character "b".
- See whether "a" is present in the current node's children. Yes its present, so set the current node to child node which is having character "a".
- See whether "l" is present in the current node's children. Yes its present, so set the current node to child node which is having character "l".
- See whether "l" is present in the current node's children. Yes its present, so set the current node to child node which is having character "l".
- Since "l" is the end of the word, see whether marker is set to true or false. Marker is set to true which means that "ball" is registered as a word in the TRIE. So, return true
public boolean search(String s){
Node current = root;
while(current != null){
for(int i=0;i<s.length();i++){
if(current.subNode(s.charAt(i)) == null)
return false;
else
current = current.subNode(s.charAt(i));
}
/*
* This means that a string exists, but make sure its
* a word by checking its 'marker' flag
*/
if (current.marker == true)
return true;
else
return false;
}
return false;
}
The above java code does the search operation in the TRIE data structure. We shall look more into the running time and efficiency of TRIE in the next part.
Cheers,
Bragaadeesh.
02 April 2010
TRIE data structure Part 3 : The INSERT Algorithm
In this section we shall see how the insert() method on the TRIE data structure works. We shall take a specific case and analyze it with pictorial representation.
Before we begin, assume we already have an existing TRIE as shown below.
Lets see the steps on how to insert a word "bate". Any insertion would ideally be following the below algorithm.
- If the input string length is zero, then set the marker for the root node to be true.
- If the input string length is greater than zero, repeat steps 3 and 4 for each character
- If the character is present in the child node of the current node, set the current node point to the child node.
- If the character is not present in the child node, then insert a new node and set the current node to that newly inserted node.
- Set the marker flag to true when the end character is reached.
public void insert(String s){
Node current = root;
if(s.length()==0) //For an empty character
current.marker=true;
for(int i=0;i<s.length();i++){
Node child = current.subNode(s.charAt(i));
if(child!=null){
current = child;
}
else{
current.child.add(new Node(s.charAt(i)));
current = current.subNode(s.charAt(i));
}
// Set marker to indicate end of the word
if(i==s.length()-1)
current.marker = true;
}
} Now lets see how the word "bate" is getting inserted. Since the word "bate" is having length greater than zero, we can start inspecting each word.
- See whether "b" is present in the current node's children (which is root). Yes its present, so set the current node to the child node which is having the character "b".
- See whether "a" is present in the current node's children. Yes its present, so set the current node to the child node which is having the character "a".
- See whether "t" is present in the current node's children. Yes its present, so set the current node to the child node which is having the character "t".
- See whether "e" is present in the current node's children. No, its not present, so create a new node with character set to "e". Since "e" is the end of the word, set the marker flag to true.
The above picture shows how the word "bate" is inserted into the existing TRIE data structure. This example clearly shows how the insertion in a TRIE happens.
We shall take a look on the Search operation in the next part.
Cheers,
Bragaadeesh.
TRIE data structure Part 2 : Node and TRIE class in Java
In the first part of the TRIE ADT, we saw the basics of the TRIE data structure. In this section, lets get our hands dirty by directly looking at the TRIE data structure implemented in Java.
We already saw the Node structure of the TRIE ADT had a content (char), a marker (boolean) and collection of child nodes (Collection of Node). It now has one more method called as subNode(char). This method takes a character as argument would return the child node of that character type should that be present.
package dsa.tries;
import java.util.Collection;
import java.util.LinkedList;
/**
* @author Braga
*/
public class Node {
char content;
boolean marker;
Collection<Node> child;
public Node(char c){
child = new LinkedList<Node>();
marker = false;
content = c;
}
public Node subNode(char c){
if(child!=null){
for(Node eachChild:child){
if(eachChild.content == c){
return eachChild;
}
}
}
return null;
}
}
Now that we've defined our Node, lets go ahead and look at the code for the TRIE class. Fortunately, the TRIE datastructure is insanely simple to implement since it has two major methods insert() and search(). Lets look at the elementary implementation of both these methods.
package dsa.tries;
public class Trie{
private Node root;
public Trie(){
root = new Node(' ');
}
public void insert(String s){
Node current = root;
if(s.length()==0) //For an empty character
current.marker=true;
for(int i=0;i<s.length();i++){
Node child = current.subNode(s.charAt(i));
if(child!=null){
current = child;
}
else{
current.child.add(new Node(s.charAt(i)));
current = current.subNode(s.charAt(i));
}
// Set marker to indicate end of the word
if(i==s.length()-1)
current.marker = true;
}
}
public boolean search(String s){
Node current = root;
while(current != null){
for(int i=0;i<s.length();i++){
if(current.subNode(s.charAt(i)) == null)
return false;
else
current = current.subNode(s.charAt(i));
}
/*
* This means that a string exists, but make sure its
* a word by checking its 'marker' flag
*/
if (current.marker == true)
return true;
else
return false;
}
return false;
}
}
We shall look into the detailed working of the insert() and search() methods in separate posts, but I will tell you the gist of both those methods here. The insert() methods adds a new words to the already existing TRIE data structure. The search() method would return true or false based on whether the search string we specified exist or not.
Take a quick look at the java code above because we are going to use this in the posts that follow.
Cheers,
Bragaadeesh.
TRIE data structure Part 1: The TRIE ADT in Java
TRIE is an interesting data-structure used mainly for manipulating with Words in a language. This word is got from the word retrieve. TRIE (pronounced as 'try') has a wide variety of applications in
- Spell checking
- Data compression
- Computational biology
- Routing table for IP addresses
- Storing/Querying XML documents etc.,
The main abstract methods of the TRIE ADT are,
public void insert(String s); public boolean search(String s);
In this data-structure, each node holds a character instead of a String. Each node has something called as 'marker' to mark the end of a word. And each node has a Collection of child nodes. This Collection can be either a Set or a List based on the speed vs space criterion.
The basic element - Node of a TRIE data structure looks like this,
char content; boolean marker; Collection<Node> child;
A TRIE tree would typically look like the following
The above TRIE is constructed by inserting the words ball, bat, doll, dork, do, dorm, send, sense. The markers are denoted on the Node using a red star(*). We shall look into more about the TRIE data structure in depth in the next post.
Subscribe to:
Posts (Atom)








