Features:
* Go from an empty DB to a fully functional system (no manual data entry)
* No code should contain hard-coded values referencing categories
* An easy way to define where search results are placed by default (in a category or in Uncategorized)
* Ability for users to say they think a categorization is incorrect (if enough users say this, the page should be moved to the new category)
* Store all SVM results in the database
* A way to visualize the distances between the categories
* Track how many categorizations per category have been correct or incorrect
* A way to reset the correct and incorrect counts after generating new SVM models
* SVM weighting with C toward the positive examples
* A way to let users create their own accounts and their own categories to manage themselves
* When an administrator adds a URL to crawl, they should be able to pick the crawl depth and a default category in which all results will be placed
* Search the entire database or by category using Lucene
* Ability to add a single page to a category
* More administration features
* Ability to start or stop auto-categorization
* Caching the front page
* Text2SVM integration
* Text2SVM configuration file
* Ability to create the SVM categories from the web as administrator
* Create only the dictionary and store it as one function
* Use the stored dictionary to create all needed spaces for SVM models
* Ability to distinguish between multiple categories instead of a single boolean

Bugs:
* If the user chooses to move a page from one category to another but doesn't choose a new category, it should do nothing and give them an error.
* Crawling from the web doesn't work anymore
* SVM first-word spacing issue with the first character removal? Is this still a bug?
* Counts for the categories should be switched to be counted automatically
* Text2SVM runs out of memory on large examples
* Front page loads too slowly
* Categories are counted manually; let MySQL do the counting!
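The feature for flagging a wrong categorization and the first bug item could share one piece of logic: collect "this page is miscategorized" reports per page and proposed category, refuse a move when no category was chosen, and move the page once enough distinct users agree. This is a minimal in-memory sketch, assuming a threshold of three users; the class, field, and method names are hypothetical, and a real version would persist the votes in the database.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import DefaultDict, Dict, Optional, Set, Tuple


@dataclass
class RecategorizationVotes:
    """Hypothetical in-memory tracker for 'this page is in the wrong category' reports."""

    threshold: int = 3  # assumed number of distinct users needed to move a page
    categories: Dict[str, str] = field(default_factory=dict)  # page -> current category
    votes: DefaultDict[Tuple[str, str], Set[str]] = field(
        default_factory=lambda: defaultdict(set)
    )  # (page, proposed category) -> users who agreed

    def report(self, user: str, page: str, new_category: Optional[str]) -> bool:
        """Record one report; return True if the page was moved as a result."""
        if not new_category:
            # No category chosen: change nothing and surface an error to the user.
            raise ValueError("choose a new category before moving a page")
        if self.categories.get(page) == new_category:
            return False  # already in that category
        voters = self.votes[(page, new_category)]
        voters.add(user)  # a set, so repeated reports by one user count once
        if len(voters) < self.threshold:
            return False
        self.categories[page] = new_category
        # The page has moved, so drop every pending vote about it.
        for key in [k for k in self.votes if k[0] == page]:
            del self.votes[key]
        return True
```

Counting distinct users rather than raw reports keeps a single user from moving a page alone by reporting it repeatedly.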
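For the item on weighting C toward the positive examples, a common approach is to give each class its own penalty: positives get `positive_weight * C` and negatives get `C`, so errors on positive examples cost more. libsvm exposes this as the `-wi` option of `svm-train`, and scikit-learn's `SVC` exposes it as `class_weight`. The sketch below is a hypothetical, dependency-free illustration that trains a linear SVM by full-batch subgradient descent on the weighted hinge loss; it is not the solver the project used, and the values of `positive_weight`, `lr`, and `epochs` are assumptions.

```python
from typing import List, Sequence, Tuple


def train_weighted_linear_svm(
    xs: Sequence[Sequence[float]],
    ys: Sequence[int],
    c: float = 1.0,
    positive_weight: float = 5.0,  # assumed: positives get C_+ = 5 * C
    epochs: int = 500,
    lr: float = 0.01,
) -> Tuple[List[float], float]:
    """Minimize 0.5*||w||^2 + sum_i C_i * max(0, 1 - y_i * (w.x_i + b)),
    with C_i = positive_weight * c for y_i = +1 and C_i = c for y_i = -1."""
    dim = len(xs[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the 0.5*||w||^2 regularizer
        grad_b = 0.0
        for x, y in zip(xs, ys):
            c_i = c * positive_weight if y == 1 else c
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1:  # hinge term is active: add its subgradient
                for j in range(dim):
                    grad_w[j] -= c_i * y * x[j]
                grad_b -= c_i * y
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


if __name__ == "__main__":
    xs = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]]
    ys = [1, 1, -1, -1]
    w, b = train_weighted_linear_svm(xs, ys)
    print([predict(w, b, x) for x in xs])
```

On overlapping data, a larger `positive_weight` makes misclassified positives more expensive, so the boundary tends to move toward the negative class and more pages are accepted into the category.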
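The two dictionary items (build the dictionary once, store it, and reuse it to create every space the SVM models need) could look something like this minimal Python sketch. It assumes whitespace tokenization, raw term counts rather than a weighting such as TF-IDF, and the sparse `label index:value` text format that libsvm reads; the function names are hypothetical and are not taken from Text2SVM or the original project.

```python
from collections import Counter
from typing import Dict, Iterable, List


def build_dictionary(documents: Iterable[str]) -> Dict[str, int]:
    """Assign a stable 1-based feature index to every term in the corpus.

    Built once, then stored and reused for every SVM model.
    """
    terms = sorted({term for doc in documents for term in doc.lower().split()})
    # libsvm-style feature indices start at 1
    return {term: index for index, term in enumerate(terms, start=1)}


def to_sparse_vector(document: str, dictionary: Dict[str, int]) -> List[str]:
    """Map a document onto the shared feature space as 'index:count' pairs.

    Terms missing from the stored dictionary are ignored, so every model
    built from the same dictionary sees the same dimensions.
    """
    counts = Counter(
        dictionary[term] for term in document.lower().split() if term in dictionary
    )
    return [f"{index}:{count}" for index, count in sorted(counts.items())]


def to_training_line(label: int, document: str, dictionary: Dict[str, int]) -> str:
    """One line in the sparse 'label index:value ...' text format read by libsvm."""
    return " ".join([f"{label:+d}"] + to_sparse_vector(document, dictionary))


if __name__ == "__main__":
    dictionary = build_dictionary(["ruby gems ruby", "java lucene search"])
    print(to_training_line(1, "Ruby ruby gems unknown", dictionary))  # +1 1:1 4:2
```

Because each category's training file is generated from the same stored dictionary, a page's vector means the same thing in every binary model, which is also what a later multi-category classifier would need.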
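The counting bugs suggest replacing hand-maintained counters with a query, so the database derives each category's count from the pages themselves. This sketch uses Python's built-in `sqlite3` module as a stand-in for MySQL so it runs anywhere; the `categories`/`pages` schema is assumed, but the `LEFT JOIN ... GROUP BY` query is standard SQL that MySQL also accepts. The `LEFT JOIN` keeps empty categories in the result with a count of 0.

```python
import sqlite3
from typing import Dict

# Hypothetical schema: each page row points at the category it belongs to.
SCHEMA = """
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE pages (
    id INTEGER PRIMARY KEY,
    url TEXT NOT NULL,
    category_id INTEGER REFERENCES categories(id)
);
"""


def category_counts(conn: sqlite3.Connection) -> Dict[str, int]:
    """Let the database count pages per category instead of storing counters."""
    rows = conn.execute(
        """
        SELECT c.name, COUNT(p.id)
        FROM categories AS c
        LEFT JOIN pages AS p ON p.category_id = c.id
        GROUP BY c.id, c.name
        ORDER BY c.name
        """
    ).fetchall()
    return {name: count for name, count in rows}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO categories (id, name) VALUES (?, ?)",
        [(1, "ruby"), (2, "java"), (3, "empty")],
    )
    conn.executemany(
        "INSERT INTO pages (url, category_id) VALUES (?, ?)",
        [("a", 1), ("b", 1), ("c", 2)],
    )
    print(category_counts(conn))  # {'empty': 0, 'java': 1, 'ruby': 2}
```

With counts derived on read, moving a page between categories only updates its `category_id`, and there is no separate counter that can drift out of sync.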

Welcome to Dan Mayer's development blog. I primarily write about Ruby development, distributed teams, and dev/PM processes. The archives go back to my first CS classes in college, when I was first learning to program. I contribute to a few OSS projects and often work on my own projects; you can find my code on GitHub.

Twitter @danmayer

Github @danmayer