Friday, January 17, 2014

TextAnalyzer: Adding a Lexicon Using Java's HashMap

This will be part 2 of the TextAnalyzer post.  In this one, we will create a second analysis method.

TextAnalyzer

We will be adding a lexicon method to the TextAnalyzer class.  The lexicon is a useful and interesting tool: it analyzes the text and reports how many times each word is used.  For instance, if we called the lexicon method on a file containing "hello hello world", it would return the following:
hello: 2
world: 1

To produce this information we will use Java's HashMap.  A HashMap lets you associate a key (the word) with a value (the number of times the word is used).  We will traverse the contents ArrayList, and for each item in the list we will check whether it is already in the HashMap.  If the word is in the HashMap, we update its associated value to the value + 1 (because we have found it one more time).  If it is not in the HashMap, we add it with a value of 1, because it is the first time we have seen it.  The code for the lexicon method is as follows:

public String lexicon(boolean writeOut) {
 //needs: import java.util.HashMap; import java.util.Iterator;
 //output will be stored in this StringBuffer
 StringBuffer lexStringBuf = new StringBuffer("Lexicon: \n \n");
 HashMap<String, Integer> lexMap = new HashMap<String, Integer>();
 //following for loop iterates through
 //each item in the ArrayList
 for (String s : contents) {
  //punctuation was already removed by FileTextReader;
  //convert to lowercase so "The" and "the" match
  s = s.toLowerCase();
  if (lexMap.containsKey(s)) {
   lexMap.put(s, lexMap.get(s) + 1);
  } else {
   lexMap.put(s, 1);
  }
 }
 //now lexMap contains lexicon
 //use an iterator to traverse the map
 //and add to the StringBuffer
 Iterator<String> iterator = lexMap.keySet().iterator();

 while (iterator.hasNext()) {
  String key = iterator.next();
  String value = lexMap.get(key).toString();
  lexStringBuf.append(key + ": " + value + "\n");
 }
 //stringbuffer now has full output
 //create new file if writeOut is true
 if (writeOut) {
  try {
   writer = new PrintWriter(fileName + "-lexicon.txt", "UTF-8");
   //only write once the PrintWriter was created successfully,
   //otherwise writer could still be null here
   writer.print(lexStringBuf.toString());
   writer.close();
  } catch (FileNotFoundException | UnsupportedEncodingException e) {
   System.out.println("Error: " + fileName + " lexicon failed");
   e.printStackTrace();
  }
 }
 //return string
 return lexStringBuf.toString();
}

The first portion creates a StringBuffer (which makes it easy to build a large string by repeatedly calling append()) and a HashMap that maps Strings (the words) to Integers (their counts).

The for loop goes through each String in the contents ArrayList (from the first TextAnalyzer post), converts it to lowercase (so that "The" and "the" are counted as the same word), and then adds it to the HashMap.  If the word is already in the map, it updates the value to value + 1; otherwise it adds it to the map with a value of 1.

With the HashMap filled out, the Iterator portion traverses the HashMap to get all of the key/value pairs and add them to our StringBuffer.  The Iterator comes from the key set of our HashMap, which is the set of all of its keys; in this case, the set of distinct words.  While the Iterator has another key, the loop grabs that key and its associated value and appends them to our StringBuffer.  Note that a HashMap does not keep its keys in any particular order, which is why the words in the sample output below appear in no apparent order.
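To try the count-then-traverse idea in isolation, here is a minimal sketch. The class and method names (LexiconSketch, count, format) are hypothetical and not part of the TextAnalyzer project. Counting uses the same containsKey/put approach as lexicon(), but the traversal swaps the keySet() iterator for a loop over entrySet(), which hands back each key together with its value so no second get() lookup is needed. It also uses StringBuilder, the unsynchronized equivalent of StringBuffer.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LexiconSketch {

    // Count how many times each word appears, ignoring case.
    static Map<String, Integer> count(List<String> words) {
        Map<String, Integer> lexMap = new HashMap<String, Integer>();
        for (String w : words) {
            String s = w.toLowerCase();
            if (lexMap.containsKey(s)) {
                lexMap.put(s, lexMap.get(s) + 1);
            } else {
                lexMap.put(s, 1);
            }
        }
        return lexMap;
    }

    // Build the "word: count" lines from each key/value pair in one pass.
    static String format(Map<String, Integer> lexMap) {
        StringBuilder sb = new StringBuilder("Lexicon: \n \n");
        for (Map.Entry<String, Integer> entry : lexMap.entrySet()) {
            sb.append(entry.getKey()).append(": ").append(entry.getValue()).append("\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Hello", "hello", "world");
        System.out.print(format(count(words)));
    }
}
```

Because the map is a HashMap, "hello: 2" and "world: 1" may be printed in either order.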

The format of writing to the file will be improved in a later post to list the most common words first, and to give the percentage of the total word count that each word makes up.  But for now we will stick with the simple example.
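As a rough sketch of what that improvement could look like (an assumption, not the code of the later post), the hypothetical LexiconRanking class below copies the map's entries into a list, sorts them with a Comparator so the highest counts come first, and prints each word's share of the total word count. Ties are broken alphabetically so the order is predictable. It sticks to Java 7 syntax (an anonymous Comparator) to match the rest of the project.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class LexiconRanking {

    // Return "word: count (percent%)" lines, most frequent word first.
    static List<String> ranked(Map<String, Integer> lexMap) {
        int total = 0;
        for (int c : lexMap.values()) {
            total += c;
        }
        List<Map.Entry<String, Integer>> entries =
                new ArrayList<Map.Entry<String, Integer>>(lexMap.entrySet());
        // Sort by count, descending; break ties alphabetically.
        Collections.sort(entries, new Comparator<Map.Entry<String, Integer>>() {
            @Override
            public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
                int byCount = b.getValue().compareTo(a.getValue());
                return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
            }
        });
        List<String> lines = new ArrayList<String>();
        for (Map.Entry<String, Integer> e : entries) {
            double percent = 100.0 * e.getValue() / total;
            // Locale.ROOT keeps the decimal point a "." on every system
            lines.add(String.format(Locale.ROOT, "%s: %d (%.2f%%)", e.getKey(), e.getValue(), percent));
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, Integer> lexMap = new HashMap<String, Integer>();
        lexMap.put("world", 1);
        lexMap.put("hello", 2);
        for (String line : ranked(lexMap)) {
            System.out.println(line); // hello: 2 (66.67%), then world: 1 (33.33%)
        }
    }
}
```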

After creating the string, we again face the choice of simply returning it or writing it to a file and returning it, which is again handled by the if (writeOut) statement.  This will complete the lexicon() method for now!


Tester

It's now time to update our Tester.  In the main method, add the following line after the creation of myAnalyzer:

myAnalyzer.lexicon(true);

Now go ahead and run your program and see what you get!
For the Moby Dick example, the output should look something like:

Lexicon: 
brave: 18
morgana: 1
approving: 1
champions: 1
unaccountable: 16
cripple: 3
writerof: 1
jew: 1
...... and so on!

There you have it!
Again, the lexicon will be improved in a later post, which will focus on using Comparators for custom classes.

TextAnalyzer: Reading and Writing Text Files in Java

This will be a quick tutorial on text file input/output in Java.  For this project I will be creating a short program that will take text files and return a file of statistics such as number of times each word occurs, number of words, etc.  The text files I will be using come from Project Gutenberg, an online project which allows you to download many books as text files.

This project will be constructed with three classes: Tester, FileTextReader, and TextAnalyzer.
- Tester will be the class in which we run all of the tests on the other classes.
- FileTextReader will be responsible for reading the file and returning an ArrayList of strings of the contents.
- TextAnalyzer will take the file name and call on FileTextReader to get the ArrayList of file contents.  It will then perform the analysis on them, creating a new file summarizing the findings.


FileTextReader

First we will begin with the constructor. The FileTextReader will need a File and a Scanner.  The Scanner will be used to parse the text in the given File.  The File will be initialized in the constructor from a file name like so:

//needs: import java.io.File; import java.util.Scanner;
private File fileToRead = null;
private Scanner myScanner = null;

public FileTextReader(String fileName) {
 fileToRead = new File(fileName);
}

Next we will write the readText() method of the FileTextReader class, which will split the contents of the file up by whitespace and return them in an ArrayList of strings:

public ArrayList<String> readText() {
 //needs: import java.io.FileNotFoundException; import java.util.ArrayList;
 ArrayList<String> textList = new ArrayList<String>();
 String toAdd = null;
 try {
  //get scanner
  myScanner = new Scanner(fileToRead);
  //while there is a token to take
  while(myScanner.hasNext()) {
   //strip every char that is not a letter, apostrophe or space
   toAdd = myScanner.next().replaceAll("[^a-zA-Z\' ]", "");
   //skip tokens that had no legal chars at all
   if (toAdd.length() > 0) {
    //add it!
    textList.add(toAdd);
   }
  }
 } catch (FileNotFoundException e) {
  System.out.println("Error: File not found");
  e.printStackTrace();
 } finally {
  //release the file once we are done reading it
  if (myScanner != null) {
   myScanner.close();
  }
 }
 return textList;
}

readText() uses the Scanner to parse the input file and save the contents to an ArrayList.  The Scanner's next() method returns the next token: the next run of characters up to a whitespace character (a space, tab, or line break).  For instance, if the input file was "foo bar", the first call to next() would return "foo", and the second would return "bar".  The hasNext() method returns true if there is another token left in the input, and false otherwise.  That is why hasNext() is the condition of the while loop: it ensures that we grab every token in the file and place it into the ArrayList.
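To see exactly what hasNext() and next() hand back, the sketch below points a Scanner at a String instead of a File (Scanner has constructors for both). TokenDemo and tokens are illustrative names. It shows that a line break separates tokens just like a space does, and that punctuation stays attached to the word until the cleanup step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class TokenDemo {

    // Collect every whitespace-separated token from the text, in order.
    static List<String> tokens(String text) {
        List<String> result = new ArrayList<String>();
        // try-with-resources (Java 7+) closes the Scanner automatically
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The newline separates tokens like a space; the period stays attached.
        System.out.println(tokens("foo bar.\nbaz")); // [foo, bar., baz]
    }
}
```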

replaceAll() is a String method that replaces every substring matching the regular expression in its first parameter with the string in its second parameter.  Here, the pattern [^a-zA-Z' ] matches any character that is not a letter, an apostrophe, or a space, and the replacement is the empty string, so those characters are simply deleted.  We then check that the remaining string is longer than 0 characters.  This ensures that stray bits such as "----" or "1." do not get added to the list, which keeps them from producing false word counts.
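Running a few sample tokens through the same regular expression shows which ones survive. The clean helper below is a hypothetical wrapper around the exact replaceAll() call used in readText(). One side effect worth knowing: a hyphenated token such as "writer-of" collapses into "writerof", which is how entries like that can show up in the lexicon output.

```java
class CleanDemo {

    // Same pattern readText() uses: delete every character that is not
    // a letter, an apostrophe, or a space.
    static String clean(String token) {
        return token.replaceAll("[^a-zA-Z' ]", "");
    }

    public static void main(String[] args) {
        String[] samples = {"don't,", "whale.", "----", "1.", "writer-of"};
        for (String s : samples) {
            String cleaned = clean(s);
            // readText() would skip any token whose cleaned form is empty
            String note = cleaned.isEmpty() ? "  (skipped)" : "";
            System.out.println(s + " -> " + cleaned + note);
        }
    }
}
```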

This will complete the FileTextReader class.


TextAnalyzer

Next we will write the TextAnalyzer class.  This class will be responsible for analyzing the ArrayList generated in the FileTextReader class.  To begin, we will create the following constructor:

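//needs: import java.io.FileNotFoundException; import java.io.PrintWriter;
//import java.io.UnsupportedEncodingException; import java.util.ArrayList;
private ArrayList<String> contents = null;
private String fileName = null;
private PrintWriter writer = null;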

public TextAnalyzer(String fName) {
 contents = (new FileTextReader(fName)).readText();
 fileName = fName;
}

The TextAnalyzer takes a file name, and then stores the ArrayList of the contents of that file by creating a FileTextReader object and calling readText() on it.  This saves time later, because if we store this information at the beginning, we will not have to recalculate it every time we want to perform a different analysis.  Next, it saves the file name, which will be used to name the text files we will output.

With the constructor complete, it's time to write our first analysis method.  For now, we will make a simple word count.

public String wordCount(boolean writeOut) {
 String count = "Word count: " + contents.size();
 if (writeOut) {
  try {
   writer = new PrintWriter(fileName + "-word-count.txt", "UTF-8");
   //only write once the PrintWriter was created successfully,
   //otherwise writer could still be null here
   writer.println(count);
   writer.close();
  } catch (FileNotFoundException | UnsupportedEncodingException e) {
   System.out.println("Error: " + fileName + " word count failed");
   e.printStackTrace();
  }
 }
 return count;
}

This method begins by storing a string with the current word count, which it gets from the size() method of the contents ArrayList initialized in the constructor.  The portion that creates a text file of the word count is wrapped in an if statement because we may not always want to produce an output file.  For instance, later we will write a method that performs all of the analysis methods we have created on a given file.  For that use, we will not want a separate file for each analysis; instead, we will want all of the information placed in a single output file.

To create the output file, we create a PrintWriter.  The constructor we use takes the name of the output file (be sure to include the extension) and the name of the character encoding to write with, here "UTF-8".  Once the PrintWriter has been successfully initialized, we can write to the output file using methods such as print() and println().  The difference between these methods is that println() ends the line after printing the string, whereas print() does not.
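Since the project already relies on Java 7 (the multi-catch in the code above), the same write could also use try-with-resources, which closes the PrintWriter automatically and never touches a writer that failed to open. This is a minimal sketch: WriteDemo, writeReport, and the temporary file are illustrative, and the main method reads the file back only to show what println() wrote.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class WriteDemo {

    // Write the text to outName as UTF-8; return true on success.
    static boolean writeReport(String outName, String text) {
        try (PrintWriter out = new PrintWriter(outName, "UTF-8")) {
            out.println(text); // println() ends the line; print() would not
            return true;
        } catch (FileNotFoundException | UnsupportedEncodingException e) {
            System.out.println("Error: could not write " + outName);
            e.printStackTrace();
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("word-count", ".txt");
        writeReport(tmp.toString(), "Word count: 2");
        String back = new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8);
        System.out.print(back);
        Files.delete(tmp);
    }
}
```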

Now that we have a basic method to analyze the text, it's time to create the Tester and make sure everything is good so far.


Tester

The tester will be a simple class that runs our code.  It does not need a constructor; we will only write a main() method for it.  We will also need a short test text file.  For the initial test, I made a small file 'test.txt' containing the text "hello world".  Save this file in the folder the program runs from (when running from Eclipse, that is the project's root folder, not the src folder).  The Tester class will only need this in it for now:

public class Tester {
 public static void main(String[] args) {
  TextAnalyzer myAnalyzer = new TextAnalyzer("test.txt");
  myAnalyzer.wordCount(true);
 }
}

Now when you run the Tester file, you should see a new file 'test.txt-word-count.txt' in your directory with the contents:
Word count: 2

Running this program on a download of Moby Dick from Project Gutenberg gives a word count of 214872.

Congratulations! Our initial text analyzer works.  In following posts more analysis tools will be added, along with the option of analyzing entire folders at a time instead of single files.

Thursday, January 16, 2014

Enable Debugging Settings on Android 4.2 and later

In Android 4.2 and later, the developer options, which include the debugging settings, are hidden by default.  To unlock them, follow these steps:

  1. Go to the settings screen.
  2. Under the 'More' tab, select 'About Device'.
  3. Find 'Build number' and tap it 7 times (after a few taps, a message will pop up telling you how many taps are left).
  4. Go back to the settings menu, and 'Developer Options' will be right above 'About Device'.

Wednesday, January 15, 2014

Setting Up LuaJava to Integrate Lua and Java in Windows

LuaJava is a library that lets Lua and Java code work together.  Once it is installed, Lua scripts can call Java code, and vice versa.  This can be useful for scripting and for integrating code written in the two languages.  Installing LuaJava for Windows is very simple.  This tutorial assumes you already have Java and Eclipse installed. I used a 32-bit version of Java (jdk 1.7.0_45), which matches the win32 build of LuaJava used below.

First, download LuaJava from here.  Select luajava-1.1-win32-lua51.zip.  Once it has been downloaded, extract everything to a new folder.

Next, open Eclipse and create a new Java project.  Name the project whatever you like; I went with 'LuaJavaTest'.  Once you have created your project, right-click on it in the 'Package Explorer' and select 'Properties' (the last item). This will bring up the Properties pane.  On the left side, select 'Java Build Path'.  Now you will have a box with 4 tabs on the right side.  Select the 'Libraries' tab.

There are two things we need to do here.  First, click 'Add External JARs'.  In the dialog that follows, navigate to the location where you extracted the download and select the 'luajava-1.1.jar' file.

Once you have added the JAR, we will set the Native Library Location.  Back on the Properties pane, click the triangle next to 'JRE System Library'.  When it expands, select 'Native library location', and click Edit.  In the dialog that follows, click External Folder, and choose the directory in which you extracted luajava-1.1.jar.  Confirm these changes and exit out of the Properties pane.


With everything set up, all that's left is to test it!
(This test comes from the LuaJava examples page.)

First things first, we'll make the Lua file.  Right-click on the project and create a new file.  Name the file 'hello.lua', and place the following in it:

print("Hello world from lua!")

Next, we will create the Java file.  Create a new class 'Hello' (the class name must match the file name, Hello.java).  In Hello, put this source code:

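import org.keplerproject.luajava.LuaState;
import org.keplerproject.luajava.LuaStateFactory;

public class Hello {

 public static void main(String[] args) {
  //create a new Lua state and load the standard Lua libraries
  LuaState L = LuaStateFactory.newLuaState();
  L.openLibs();

  //run the script (path is relative to the working directory)
  L.LdoFile("hello.lua");

  System.out.println("Hello World from Java!");
  //free the native Lua state
  L.close();
 }
}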

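Now run the project! If Eclipse asks which main class to run (the LuaJava JAR contains its own console class), choose your Hello class.  You should receive output like the following on your console in Eclipse.  The Lua line may show up after the Java line even though the script runs first, because Lua's print() writes through the native C library's own output buffer rather than through System.out.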

Hello World from Java!
Hello world from lua!


If you have any questions, feel free to email!

Monday, January 13, 2014

LoggerTest: Java Log Files

Logging is a way to track what is going on in a program throughout execution.  It can be extremely helpful in debugging by showing the order of execution and the values of variables at various points in the program.  Java's built-in java.util.logging package can write the log as XML or as plain text.  For this project, we will output the log to a text file.

First, we need to create a Formatter to tell Java how to output the log entries.  In this example, we will create a simple class LogFormatter and tell it that we want it formatted with the date, time, level (importance of message), and then the message.

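import java.io.IOException;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
//Formatter here is java.util.logging.Formatter, not java.util.Formatter;
//IOException, FileHandler, Level and Logger are used by the constructor
import java.util.logging.FileHandler;
import java.util.logging.Formatter;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class LoggerTest {
 private static class LogFormatter extends Formatter {
  private DateFormat dateFormat =
   new SimpleDateFormat("MM/dd/yyyy HH:mm");

  @Override
  public String format(LogRecord record) {
   StringBuffer sb = new StringBuffer();

   //Get date from record
   //and append to StringBuffer
   Date date = new Date(record.getMillis());
   sb.append(dateFormat.format(date));
   sb.append(" ");

   //Get level and append to StringBuffer
   sb.append(record.getLevel().getName());
   sb.append(": ");

   //format message and append to StringBuffer,
   //then add new line
   sb.append(formatMessage(record));
   sb.append("\n");

   return sb.toString();
  }
 }
}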

The StringBuffer holds the String that we build and return in the format method.  First, the date is retrieved from the record and formatted by our SimpleDateFormat.  Next, the level name is retrieved and appended onto the String.  Finally, the message is passed through Formatter's formatMessage() method, which localizes it if a resource bundle is set and fills in any {0}-style parameters, and is appended along with a newline.  The finished String is then returned.
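Because format() only needs a LogRecord, a Formatter can be checked without any Logger or file at all. The sketch below hands a hand-built record with a parameter straight to a cut-down formatter; ShortFormatter is a hypothetical stand-in for LogFormatter that leaves the date out so the result is predictable. It shows formatMessage() substituting {0} with the record's first parameter.

```java
import java.util.logging.Formatter;
import java.util.logging.Level;
import java.util.logging.LogRecord;

class FormatterDemo {

    // Level name, a colon, then the message with any {0}, {1}... filled in.
    static class ShortFormatter extends Formatter {
        @Override
        public String format(LogRecord record) {
            return record.getLevel().getName() + ": " + formatMessage(record) + "\n";
        }
    }

    public static void main(String[] args) {
        LogRecord record = new LogRecord(Level.INFO, "Warp factor {0} engaged");
        record.setParameters(new Object[] {7});
        // Prints "INFO: Warp factor 7 engaged"
        System.out.print(new ShortFormatter().format(record));
    }
}
```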

Now that we have a Formatter to format each of our entries, it's time to create our Logger.  We will create a Logger data field and initialize it in the constructor for the LoggerTest class.  This is demonstrated below.  The code is to be included in the LoggerTest class:

private Logger captainsLog = null;

public LoggerTest() {
 try {
  //create file
  FileHandler textLog =
   new FileHandler("Captain's-Log.txt");
  //set formatter
  textLog.setFormatter(new LogFormatter());
  //set levels which will be recorded in log
  textLog.setLevel(Level.ALL);
  //get logger
  captainsLog = Logger.getLogger("Captain's Log");
  //attach FileHandler
  captainsLog.addHandler(textLog);
  //set Logger level
  captainsLog.setLevel(Level.ALL);
  //log an info level message
  captainsLog.info("Initializing Captain's Log");
  //log a FINE level message
  captainsLog.log(Level.FINE, "First entry successful!");
 } catch (SecurityException | IOException e) {
  //logging inside the try avoids using captainsLog before it exists
  System.out.println("Log creation failed!");
  e.printStackTrace();
 }
}

The constructor creates the file we will be writing to and then tells it to use the Formatter we created above.  The setLevel call tells the FileHandler the minimum importance level to record; Level.ALL says that we want to record all messages in this file.  If the FileHandler receives a message whose level is below its own, it ignores the message.

Next, we create the Logger.  The Logger.getLogger() method creates a new Logger with the given name; if a Logger with that name already exists, it returns the existing one.  Once we have the Logger, we add our handler to it so that it knows where to write.  Next, we set what level of messages this Logger will report.  A Logger can have multiple Handlers and sends each message it accepts to all of them, so the Logger's level acts as a first filter on what reaches any Handler.  This is useful if you want multiple Handlers for different levels: you could have the Logger pass along all messages, and let each Handler filter which ones it records.  By default, a Logger also passes its messages on to the root logger's handlers, which is why INFO messages also appear in the console.

The final two lines log initial statements saying that the log has been initialized.
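To see the two layers of filtering side by side, the sketch below attaches a small in-memory Handler whose level is INFO to a Logger whose level is ALL. CollectingHandler and LevelDemo are hypothetical names, and the in-memory handler is used here instead of a FileHandler only so the results are easy to inspect. The Logger passes both messages on, but the handler's isLoggable() check keeps only the INFO one; setUseParentHandlers(false) stops the messages from also reaching the root logger's console handler.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

class LevelDemo {

    // Keeps "LEVEL message" for every record that passes this handler's level.
    static class CollectingHandler extends Handler {
        final List<String> messages = new ArrayList<String>();

        @Override
        public void publish(LogRecord record) {
            // isLoggable() compares the record's level with this handler's level
            if (isLoggable(record)) {
                messages.add(record.getLevel().getName() + " " + record.getMessage());
            }
        }

        @Override
        public void flush() { }

        @Override
        public void close() { }
    }

    static List<String> run() {
        Logger logger = Logger.getLogger("LevelDemo");
        logger.setUseParentHandlers(false);
        logger.setLevel(Level.ALL);      // the Logger lets everything through...
        CollectingHandler handler = new CollectingHandler();
        handler.setLevel(Level.INFO);    // ...but this handler keeps INFO and above
        logger.addHandler(handler);
        logger.fine("In foo method!");
        logger.info("Exiting foo...");
        logger.removeHandler(handler);
        return handler.messages;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [INFO Exiting foo...]
    }
}
```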


There are two quick steps left: first, create a main method in our LoggerTest class so that we can run it; second, add a few methods to demonstrate using the Logger from other methods.

First, we will create the main method.  The main method is the entry point that runs when you launch the Java application.  Our main method is as follows:

public static void main(String[] args) {
 LoggerTest myTester = new LoggerTest();
}

This is currently very simple.  Right now it just creates an object of type LoggerTest, but since the constructor initializes the Logger and creates the log file, this is enough to make sure everything is working so far.  Now when you run this file, a "Captain's-Log.txt" file should appear in the working directory (the project folder, if you run it from Eclipse), and should contain text like:
01/13/2014 18:16 INFO: Initializing Captain's Log
01/13/2014 18:16 FINE: First entry successful!


Now that we see our log file is working successfully, the final step is to update the log in several different methods so that we can ensure that our log works across the whole file.  After all, a log which only works in the constructor isn't much help!

We will create two quick methods: foo and bar.  They will be in the LoggerTest class as follows:

public void bar() {
 captainsLog.log(Level.FINE, "In bar method!");
 captainsLog.info("Exiting bar...");
}

public void foo() {
 captainsLog.log(Level.FINE, "In foo method!");
 captainsLog.info("Exiting foo...");
}

Next, we will add a call to these methods in the main method.  Add these lines in after the creation of myTester in the main method:

myTester.foo();
myTester.bar();

Now when you save and run the file, you will see the "Captain's-Log.txt" file in the same directory, and its contents will be:
01/13/2014 18:30 INFO: Initializing Captain's Log
01/13/2014 18:30 FINE: First entry successful!
01/13/2014 18:30 FINE: In foo method!
01/13/2014 18:30 INFO: Exiting foo...
01/13/2014 18:30 FINE: In bar method!
01/13/2014 18:30 INFO: Exiting bar...


There you have it! Creating simple text log files in Java.  Feel free to email me for the full source code.

-Dan


Friday, January 3, 2014

Compiling Java from Command Line with Multiple Jars

This blog is largely just a set of stuff to help me remember how I fixed problems in the past when working on my thesis and other projects.  If it helps out anyone else along the way, then that's super chill!

First up, compiling a Java file from the command line on Windows with multiple jar files.

1. Change directory to wherever your java file is located using the cd command like so:
  • >cd path\to\source\folder

2. Once you are in the source folder, use the following command to add the JDK's bin folder to the path for this command window.  This will allow your computer to find the Java commands in the next step.

  • >set "path=%path%;path\to\java\bin"
On my computer, the specific command is:

  • >set "path=%path%;C:\Program Files\Java\jdk1.7.0_45\bin"
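3. With the path set, all that's left is to compile the file! To compile the file, we will use the javac command and set the -classpath parameter to where our .jar files are (in this case, I have a folder of jar files).  To include all .jar files in the folder, the folder path ends in * (a classpath wildcard, supported since Java 6).  If your file also uses other classes from the current folder, add ;. to the end of the classpath so javac can still find them.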

  • >javac -classpath "path\to\jar\folder\*" FileToCompile.java
An example run of this would be:

  • >javac -classpath "C:\Users\owner\jars\*" HelloWorld.java


Hope this helps!