I finished the basic text-to-vector converter, but in its current form it is pretty much useless. It only prints to System.out, in a format that isn't useful yet. It also doesn't save the dictionary it uses to generate the vectors, so you can't generate vectors for new text later and compare them against the existing ones. I will be adding features to save the generation dictionary, to import a saved dictionary, and to output the feature word vectors in the SVM light format. I also recently found some cool Java stemming software that I will add to my project after I get the first usable version out. If you're interested in the stemming software, here are the links: Porter Stemming Algorithm and Lancaster Stemmer.
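To make the planned features a bit more concrete, here is a rough sketch of how saving a dictionary, loading it back, and writing an SVM light line could fit together. This is not the project's actual code: the class name `TextToVectorSketch`, all of its methods, the tab-separated dictionary file, and the simple lowercase tokenizer are assumptions made for illustration. The one fixed point is the SVM light input format, where each line is a target value followed by `index:value` pairs with indices starting at 1 and listed in increasing order.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; names and file layout are assumptions, not the real project.
public class TextToVectorSketch {

    // Assumed tokenizer: lowercase, then split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Give each previously unseen word the next feature index, starting at 1.
    static Map<String, Integer> buildDictionary(List<String> documents) {
        Map<String, Integer> dict = new LinkedHashMap<>();
        for (String doc : documents) {
            for (String token : tokenize(doc)) {
                dict.putIfAbsent(token, dict.size() + 1);
            }
        }
        return dict;
    }

    // Write one "word<TAB>index" line per dictionary entry (assumed file layout).
    static void saveDictionary(Map<String, Integer> dict, Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> e : dict.entrySet()) {
            lines.add(e.getKey() + "\t" + e.getValue());
        }
        Files.write(file, lines, StandardCharsets.UTF_8);
    }

    // Read the dictionary back so new text maps to the same feature indices.
    static Map<String, Integer> loadDictionary(Path file) throws IOException {
        Map<String, Integer> dict = new LinkedHashMap<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String[] parts = line.split("\t");
            dict.put(parts[0], Integer.parseInt(parts[1]));
        }
        return dict;
    }

    // Build one SVM light line: the label, then index:count pairs in increasing
    // index order. Words missing from the dictionary are skipped.
    static String toSvmLightLine(int label, String text, Map<String, Integer> dict) {
        TreeMap<Integer, Integer> counts = new TreeMap<>();
        for (String token : tokenize(text)) {
            Integer index = dict.get(token);
            if (index != null) counts.merge(index, 1, Integer::sum);
        }
        StringBuilder line = new StringBuilder(String.valueOf(label));
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            line.append(' ').append(e.getKey()).append(':').append(e.getValue());
        }
        return line.toString();
    }

    public static void main(String[] args) throws IOException {
        Map<String, Integer> dict =
                buildDictionary(Arrays.asList("the cat sat", "the dog sat down"));
        Path file = Files.createTempFile("dictionary", ".tsv");
        saveDictionary(dict, file);

        // Later run: reload the dictionary and vectorize text it has not seen before.
        Map<String, Integer> reloaded = loadDictionary(file);
        System.out.println(toSvmLightLine(1, "The dog sat, the dog barked", reloaded));
    }
}
```

The `TreeMap` is there because SVM light expects feature indices in increasing order, and reloading the saved dictionary is what keeps a word on the same index between runs. A stemmer such as the Porter or Lancaster one would slot into `tokenize` before the dictionary lookup.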

Welcome to Dan Mayer's development blog. I primarily write about Ruby development, distributed teams, and dev/PM process. The archives go back to my first CS classes during college when I was first learning programming. I contribute to a few OSS projects and often work on my own projects. You can find my code on GitHub.

Twitter @danmayer

Github @danmayer