
The software on this page is released under the LGPL v3 or GPL v3 license, as indicated for each project.

Please read the LGPL version 3 and GPL version 3 licenses before downloading any software from my web site. Individual open source projects on this page are labeled as LGPL v3 or GPL v3. I consider the LGPL to be a "commercial friendly" open source license, while the GPL license does the most to promote software freedom. For either license, you are required to keep my copyright notices with the code and to acknowledge the use of my software, and if you modify (improve) any of my LGPL/GPL libraries, please let me know about your changes. You are allowed to mix Apache 2.0 licensed code with LGPL v3 and GPL v3 projects. If you have any questions, ask me. It is always better to use and write Free Software!

Mark hiking near his home office in Sedona, Arizona

If you need to use any of my GPL-licensed projects in a commercial setting where you simply cannot use the GPL, then please contact me: I can give you, either for free or at a low cost, the rights to use my code embedded in another specific project without distributing your code.

You accept that the software is available for use "as is" and that you accept responsibility for its use. I also ask that you email me any bug fixes or improvements so that everyone can share any improvements that you make to these software packages.

I recommend YourKit, which is kindly supporting open source projects with its full-featured Java Profiler. YourKit, LLC is the creator of innovative and intelligent tools for profiling Java and .NET applications. Take a look at YourKit's leading software products: YourKit Java Profiler and YourKit .NET Profiler.

I earn my living as a Java consultant - please consider using my consulting services to modify any of these LGPL projects for your development projects.

I cannot provide free support for any of my open source projects. If you need help, then consider hiring me for a one or two hour consulting task. I offer a 25% discount on all work for GPL open source projects and a 10% discount when using other open source licenses.


  Java natural language processing tools LGPL

New 6/11/2008: version 2 uses Java collection classes, adds unit tests, and includes some code cleanup.

FastTag v2, normal English lexicon. JAR file contains source code, lexicon data, and compiled code

FastTag v2, with smaller medical term lexicon. JAR file contains source code, lexicon data, and compiled code

Older version 1:
FastTag: a fast Java part-of-speech tagger. It includes a standard lexicon for normal use and the MEDPOST lexicon for medical applications.
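A common design for fast part-of-speech taggers is a lexicon lookup followed by a few contextual fix-up rules. The sketch below only illustrates that general idea under stated assumptions: the class `TinyTagger`, its four-word lexicon, the unknown-word heuristics, and the single fix-up rule are all hypothetical and are not FastTag's actual API, lexicon, or rule set.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal lexicon-lookup tagger; NOT FastTag's real API or rules.
public class TinyTagger {
    private final Map<String, String> lexicon = new HashMap<>();

    public TinyTagger() {
        // Tiny illustrative lexicon: most likely Penn Treebank tag for each word.
        lexicon.put("the", "DT");
        lexicon.put("dog", "NN");
        lexicon.put("runs", "VBZ");
        lexicon.put("fast", "RB");
    }

    public List<String> tag(String[] words) {
        List<String> tags = new ArrayList<>();
        for (String w : words) {
            String t = lexicon.get(w.toLowerCase());
            if (t == null) {
                // Assumed unknown-word heuristics, for illustration only.
                if (w.endsWith("ing")) t = "VBG";
                else if (!w.isEmpty() && Character.isUpperCase(w.charAt(0))) t = "NNP";
                else t = "NN";
            }
            tags.add(t);
        }
        // Assumed contextual rule: a verb tag right after a determiner becomes a noun.
        for (int i = 1; i < tags.size(); i++) {
            if (tags.get(i - 1).equals("DT") && tags.get(i).startsWith("VB")) {
                tags.set(i, "NN");
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        TinyTagger tagger = new TinyTagger();
        System.out.println(tagger.tag(new String[] {"The", "dog", "runs", "fast"})); // [DT, NN, VBZ, RB]
    }
}
```

A real lexicon holds tens of thousands of words, which is why the lookup table (a hash map here) dominates both the speed and the accuracy of this kind of tagger.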

  Pascal natural language processing tools LGPL

New 6/26/2008: FastTag, a fast Pascal part-of-speech tagger. This is version 0.011 (my second cut at converting the Java version) and it uses the FreePascal.org compiler.

  JRuby bindings for the PowerLoom AI reasoning and knowledge representation system LGPL

Everything you need in one zip file. I enclose the Java runtime for PowerLoom (GPL, LGPL, or Mozilla multiple license) and examples. I advise you to also download the entire PowerLoom distribution to get examples, source code, and support for C++, Common Lisp, and Java runtimes and development. I wrote about this on my AI blog.

  Ruby Utilities for part of speech tagging and text categorization LGPL

Ruby part of speech tagger and required data

Ruby text categorizer and required data

4/4/2005: Tagger updated to version 0.1.1 (incorporated donated code improvements)

  Java NLP utility to identify proper nouns (human names and places) in text LGPL

Download the JAR that contains source code, compiled code, and required data; it includes the FastTag library. It also performs anaphora resolution (mapping pronouns to the proper names they refer to).

  C++ and C# Utilities for part of speech tagging and text categorization LGPL

C++ NLP library includes a tagger and a simple categorization system that can also be trained with your own category data.
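The categorization algorithm used by the C++ library is not described here, so as a hypothetical illustration of a categorizer that you train with your own category data, here is a minimal Java sketch. It counts words per category during training and picks the category whose relative word frequencies best match a new text. The class `WordCountCategorizer` and its scoring scheme are assumptions, not the library's API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical word-frequency categorizer; NOT the C++ library's API or algorithm.
public class WordCountCategorizer {
    // category -> (word -> count)
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();
    // category -> total number of words seen during training
    private final Map<String, Integer> totals = new HashMap<>();

    private static String[] tokenize(String text) {
        return text.toLowerCase().split("[^a-z]+");
    }

    // Adds one training example for a category.
    public void train(String category, String text) {
        Map<String, Integer> m = counts.computeIfAbsent(category, k -> new HashMap<>());
        for (String w : tokenize(text)) {
            if (w.isEmpty()) continue;
            m.merge(w, 1, Integer::sum);
            totals.merge(category, 1, Integer::sum);
        }
    }

    // Scores each category by summing the relative frequencies of the text's words.
    // Returns null if nothing has been trained.
    public String categorize(String text) {
        String[] words = tokenize(text);
        String best = null;
        double bestScore = -1.0;
        for (Map.Entry<String, Map<String, Integer>> e : counts.entrySet()) {
            int total = totals.getOrDefault(e.getKey(), 0);
            if (total == 0) continue;
            double score = 0.0;
            for (String w : words) {
                score += e.getValue().getOrDefault(w, 0) / (double) total;
            }
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        WordCountCategorizer c = new WordCountCategorizer();
        c.train("sports", "the team won the football game");
        c.train("finance", "the stock market and bank interest rates");
        System.out.println(c.categorize("a football team")); // sports
    }
}
```

Dividing by each category's total word count keeps a category with a lot of training text from winning just because it has seen more words.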

C# part of speech tagger.

Note: these libraries are old code and I no longer use them myself. If you need help using these libraries, I can only help you as a paid consultant.

  Java framework for using the Reuters news story corpus LGPL

Reuters News Service makes about 2 gigabytes of its marked-up news stories available under a free license for research and other uses. This simple Java framework is code that I hacked up so that I could leave the news stories in their ZIP archives and still enumerate through them efficiently. It is a hack, but if you are a Java programmer and you use the Reuters corpus, you might find it useful. I wrote this in 2000.
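Reading stories straight out of their ZIP archives, without unpacking them to disk, can be done with the standard java.util.zip classes. This is a minimal sketch, not the framework's actual code: the class `ReutersZipReader`, the callback style, the assumption that each story is a separate `.xml` entry, and the UTF-8 decoding are all assumptions for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Enumeration;
import java.util.function.BiConsumer;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: streams each story out of a ZIP archive without extracting it.
public class ReutersZipReader {
    // Calls handler(entryName, text) for every .xml entry; returns the number of stories seen.
    public static int forEachStory(String zipPath, BiConsumer<String, String> handler) throws IOException {
        int count = 0;
        try (ZipFile zip = new ZipFile(zipPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                // Assumption: one story per .xml entry; skip directories and other files.
                if (entry.isDirectory() || !entry.getName().endsWith(".xml")) continue;
                try (InputStream in = zip.getInputStream(entry)) {
                    // Decoded as UTF-8 for simplicity; check each file's XML declaration
                    // for its real encoding.
                    String text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
                    handler.accept(entry.getName(), text);
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ReutersZipReader archive.zip");
            return;
        }
        int n = forEachStory(args[0], (name, text) -> System.out.println(name + ": " + text.length() + " chars"));
        System.out.println(n + " stories");
    }
}
```

Because ZipFile reads the archive's central directory, each entry can be opened on demand, so the whole corpus never has to be expanded on disk.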

  CIA World FactBook as a PostgreSQL database dump file LGPL

Download FactBook.sql and then load it with:

   psql -U postgres -f FactBook.sql

There are additional tables for data I spidered off of the web for companies and company directors. This data is several years old. An example: to find "overlapping directors" (i.e., people on the board of directors of more than one company) try:

	SELECT A.firstname, A.lastname, A.stockticker, B.stockticker
	  FROM companydirector A
	  JOIN companydirector B
	    ON A.firstname = B.firstname
	   AND A.lastname = B.lastname
	   AND A.stockticker < B.stockticker;

Note that this query may report false matches if two different people in the database have the same name. The data on countries is self-explanatory from the table column names.



  And, some (very) old Java open source projects:


  NLBean version 5: a natural language interface to databases! LGPL

A demo screen shot of the NLBean(tm).

The NLBean(tm) open source distribution is a ZIP file (198K) that contains the source code and a sample database. (I removed the UML class diagrams from the distribution to make the download smaller.) Please note that this is experimental code, not production code (you know what I mean: it is a hack!). I wrote it in early 1997 to try out some ideas, and I offer it in the hope that people will find it useful. It works fairly well, and it is packaged with InstantDB (with a small database already set up), so it only takes a minute to get it up and running.

There have been over 20000 downloads of the NLBean since its release.

Click here to read a short NLBean API document.

Some History

The first version of the NLBean contained JDBC code to access local databases and used RMI to implement a client/server mode. The resulting system was about 10,000 lines of Java code.

Version 2 eliminated the RMI client/server code and the JDBC code so that people could just use the NLP core.

Version 3 added the JDBC code back into the distribution. I also bundle Peter Hearty's excellent pure Java InstantDB database product. Note: InstantDB was acquired by Lutris Corporation and is now a commercial product. Thanks to Lutris for giving me permission to distribute an older version of InstantDB in a JAR file with NLBean. By editing the source file DBInterface.java, you can easily get NLBean to work with any database that has a JDBC driver; if anyone modifies NLBean to work with the HSQL open source database, please let me know.

Version 4 removed some cool but unused experimental code. The code is still a hack, but it is now a little easier to read.

  PicWeb Builder version 7 LGPL

Note: this is really old software that I stopped using about 5 years ago, but I am releasing it just in case someone finds it useful :-)

Thanks to Open Source contributors Achille Petrilli for adding directory search, font, and image size improvements, Reinout van Schouwen for HTML table generation, and Tony Allan for adding directory sorting and the use of a PicWeb.ini file!

There have been over 25000 downloads of PicWeb.

Certified 100% Pure Java by KeyLabs (www.keylabs.com)

This is a standalone Java program that creates a complete index of a disk directory containing GIF and JPEG pictures. A small JPEG thumbnail is generated for each picture, and sub-directories are processed recursively. The top-most directory has links to the HTML files generated in the sub-directories.
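The recursive indexing and thumbnail step can be sketched with the standard javax.imageio and java.awt.image classes. This is a hypothetical minimal version, not PicWeb's source. The class name `ThumbIndex`, the 120-pixel thumbnail width, the `thumb_` file-name prefix, and the `index.html` layout are all assumptions. File names are not HTML-escaped in this sketch.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Arrays;
import javax.imageio.ImageIO;

// Hypothetical PicWeb-style indexer; NOT the original PicWeb code.
public class ThumbIndex {
    static final int THUMB_WIDTH = 120; // assumed thumbnail width in pixels

    // Writes dir/index.html with thumbnail links for images and links to sub-directory indexes.
    public static void index(File dir) throws IOException {
        File[] files = dir.listFiles();
        if (files == null) return;
        Arrays.sort(files); // alphabetical order
        try (PrintWriter html = new PrintWriter(new File(dir, "index.html"), "UTF-8")) {
            html.println("<html><body>");
            for (File f : files) {
                String name = f.getName();
                String lower = name.toLowerCase();
                if (f.isDirectory()) {
                    index(f); // recurse so the sub-directory gets its own index.html
                    html.println("<p><a href=\"" + name + "/index.html\">" + name + "</a></p>");
                } else if (!lower.startsWith("thumb_")
                        && (lower.endsWith(".jpg") || lower.endsWith(".jpeg") || lower.endsWith(".gif"))) {
                    BufferedImage img = ImageIO.read(f);
                    if (img == null) continue; // not a readable image
                    // Every thumbnail is written as JPEG, e.g. a.gif -> thumb_a.gif.jpg.
                    String thumbName = "thumb_" + name + ".jpg";
                    ImageIO.write(scale(img), "jpg", new File(dir, thumbName));
                    html.println("<a href=\"" + name + "\"><img src=\"" + thumbName + "\"></a>");
                }
            }
            html.println("</body></html>");
        }
    }

    // Scales an image to THUMB_WIDTH pixels wide, keeping the aspect ratio.
    static BufferedImage scale(BufferedImage src) {
        int h = Math.max(1, src.getHeight() * THUMB_WIDTH / src.getWidth());
        BufferedImage dst = new BufferedImage(THUMB_WIDTH, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        g.drawImage(src, 0, 0, THUMB_WIDTH, h, null);
        g.dispose();
        return dst;
    }

    public static void main(String[] args) throws IOException {
        index(new File(args.length > 0 ? args[0] : "."));
    }
}
```

Skipping files that already start with `thumb_` lets the program be re-run on the same directory tree without making thumbnails of thumbnails.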

Download PicWeb Builder right now! The distribution only contains source code, so you must have a Java development environment installed to compile and run PicWeb. (This is easy: 'javac *.java' compiles everything and 'java PicWeb' runs the program.)

It is a great way to organize thousands of pictures if you have a digital camera.

Java, 100% Pure Java, and all Java-based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and other countries.