May 2004 Archives

May 1, 2004

Finals SUCK

I am in the lab trying to study for one class while printing a bunch of stuff out for another class. It sucks. I am not getting to give my full attention to either of the two tasks, so it seems to make it suck worse. I don't know how but it does. I have to study my butt off today and all tomorrow afternoon and night.... grrrr says I.

I should be done soon, and it will be great to be free. I am really looking forward to this summer even though it looks like I will be really busy with various things. I am sure I will have time to have some fun and relax though.

Either way finals are really dragging me down right now and I am hoping things will start to look up once they pass.... so good luck to everyone out there dealing with this as well.

May 2, 2004

Europe stuff

I have a bunch of stuff I need to get done before I leave for Europe. I am going to put my list here so that I will remember to take care of everything, and if I forget something obvious someone can make me aware.

-Purchase phone charger (DONE)
-Purchase Europe/American adapter
-Get beta spam filter and News Shaker online
-Install new Pocket PC OS (DONE)
-Pick and convert a bunch of music for the trip
-Order larger SD card for more music on trip (DONE)
-Get a decent digital camera that uses SD for the trip (DONE)
-Find out about T-Mobile internet in Europe; get back on internet plan
-Finish resume before I leave; let people look over it and work on it while gone
-Reinstall mobile blogging app so I can update blog from Europe
-Choose luggage and amount of stuff to bring
-Pay off all credit cards and get enough money in my account
-Make sure rent will be taken care of while I am gone
-Go to Scott's military graduation (DONE)
-Go to Matt and Jesse's graduation
-Find 1 book to bring and read
-Convert and bring a few TV shows on Pocket PC to watch
-Get many batteries of whatever type I will need (DONE)
-Take care of IL jury duty I was called for
-Have short list of things I want to do in Europe
-Make and pack travel grooming kit (toothbrush, razor, medicine,...)

Is there anything that I should be thinking about here?

May 4, 2004

Yeaaah I am done

I am done with school... I am at work already... arggg. Oh well, I figure that if I get a lot of work done at the beginning of the summer I will have more to show off and put on my resume as I begin searching for a job later in the summer. Also, I will be able to relax a little more towards the end of summer if I know that I have accomplished a lot here. I am really close to finishing up some cool projects so it is fun.

My current fun away message on AIM:
"I am off making a turtle race a hare to prove an old story wrong."

Perhaps it is not funny to you, but I love it.

I got an email from a girl in Munich who posted on my blog. How cool is that, how the web can really change everyone's interactions. I will get back to you soon, Su.

I celebrated my freedom well last night, and we will see how much more I end up partying the next week. I am sure far too much. Oh well, good times with friends.

May 7, 2004

it is funny

It is a funny thing when readers choose to post more on one post or another. Does that mean the post they comment on is better? Does it mean they just have more opinions about that topic? Does it mean you just sparked more of an interest in what you were saying? I don't know the answer to anything I asked, but I would like to know the answer to everything one day.

grrr technology

I am running into all sorts of problems with programming my work project right now. It is bothering me. Last night I was upgrading my phone and fried it for a bit so that it wouldn't even boot up or turn on. I fixed that this morning but I was without a phone for 12 hours, which was annoying. My phone is back online and I am slowly figuring more and more of the little bugs out on my project at work, but I am nervous. I thought I would have a lot more done by the time I left for Europe, but now it doesn't appear to be moving as fast as I thought it would. Part of it is the relatively long testing times. Since my program is working on such a large amount of data, it runs fairly slow for each test.

Anyways today was graduation, I watched people walking around in their outfits. I went to the CS afterparty lunch and talked with some professors. It made me feel a little sad that I wasn't graduating on time. Like I had messed up or something and should be done. Oh well, no worries. My roommate Scott graduated today so a congrats to him. Also my friend Dave walked today so congrats as well.

I have been in a very weird mood since last night. I am not really sure if it is good or bad, but hopefully I will figure things out soon.

SVMMail

I reached a great milestone with SVMMail today. I will be doing more tests and releasing more information next week. The initial results are a 97% accuracy on the filter. Also, with real world testing (so far a low number of 65 mails), there were only 2 errors (1 false positive) in prediction. I had training data of 550 real emails and 713 spam emails (all of which I collected in the 3 weeks or so that I have been working on this project). I am really excited that I am past the stumbling blocks that I was on the last 3 days, where I was actually getting a 0% accuracy because of a bug that was generating a pretty much random model.

There is currently no web interface and it is all just run directly from Java (JBuilder in my case). I will add features like that and the ability to track how many of each type of error my system makes later.

This is great. I am really happy with how this is working out.
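
Tracking each type of error could look roughly like the sketch below. It is only an illustration in Python, not the Java code of the filter itself. The label names and the `tally` and `accuracy` helpers are hypothetical, a "false positive" is assumed to mean a real email filed as spam, and the split of the 63 correct mails between spam and real is made up.

```python
from collections import Counter


def tally(results):
    """Count outcomes from (actual, predicted) label pairs.

    Labels are the strings "spam" and "real". A false positive is
    assumed to be a real email that was filed as spam.
    """
    counts = Counter()
    for actual, predicted in results:
        if actual == predicted:
            counts["correct"] += 1
        elif actual == "real":
            counts["false_positive"] += 1
        else:
            counts["false_negative"] += 1
    return counts


def accuracy(counts):
    """Fraction of all tallied mails that were classified correctly."""
    total = sum(counts.values())
    return counts["correct"] / total if total else 0.0


# Rebuild the 65-mail test: 2 errors, 1 of them a false positive.
# How the 63 correct mails split between spam and real is invented.
results = [("spam", "spam")] * 40 + [("real", "real")] * 23
results += [("real", "spam"), ("spam", "real")]

counts = tally(results)
print(counts["false_positive"], counts["false_negative"])
print(f"{accuracy(counts):.1%}")
```

Keeping the two error types apart matters for a spam filter because a real email lost to the spam folder usually costs more than a spam that slips into the inbox.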

May 9, 2004

Happy Mother's Day

I just taught my mom how to find my webpage. So hopefully she will see this. Happy Mother's Day, Mom. I love you. You have always been amazing and inspirational in my life.

I made this painting today, inspired by my mom's fiery spirit, and fiery red hair.

mdaypaint2.jpg

May 11, 2004

SVM Mail testing

Today I got a chance to let SVM mail sort 252 messages that it has never seen before. The messages were moved successfully from the inbox to either my real folder or my spam folder. Out of the 252 messages the system incorrectly categorized 8 emails. Three of the 8 were actually the same message sent out from the Friendster network. I have added those messages to the real training model. This means on the first really good real world test my filter achieved a 96.8% accuracy. This compares well to the 98% estimated accuracy of the model, which comes from categorizing the training data 98% correctly.

The system has been through about 400 emails, and my accuracy is now at 97.25%. I have only retrained the model once, including little bits of the new info I have collected. I placed all of the incorrectly identified emails in the proper categories and retrained the system. Then, just to see if that would make it correctly identify those emails, I recategorized them. It cut the misclassifications in half, but some were still misclassified. It seems that forwarded messages with attachments are what it will still misclassify. Other emails seem good.

I am currently not working on or extending this project because it was just a testing project for some of my code. I am now focusing back on my News Shaker project, which uses SVM classification to create a Google News-like site.

May 12, 2004

Punk Rock Show

Today I am heading out to a punk rock show. I have actually had a lot to say and write, but I just haven't written anything in the last couple days. I am finally starting to settle into summer I think, which is nice. Although I am sure I will get bored with the routine soon enough.

Random Dan Deep thought(tm):
So much to do, so little time. So many moments to spend with you, but I am never there by your side. I am always struggling to work towards the next unseen problem as if it is more important, than the life I live alone everyday.

May 14, 2004

I see the future...

Well I claimed it and it has begun. I said that Google going into email will make mail better for everyone... Yahoo claims that, unrelated to Gmail announcing 1 GB of email storage for users, it was planning on offering 100 MB over the summer. Anyways, it is a good thing whatever the REAL reason is. So how long before Hotmail is revamped with a better spam filter and more storage for its free users like Gmail and Yahoo? I expect not too long, but if not I think Hotmail will be dying out pretty quick.

Yahoo announces 100 MB of storage.

I am telling you all I should spend all my time predicting trends and investing in the stock market... I could do it... but then when would I get to program really cool stuff?

Why I hate our laws

Our laws are not set up to defend people or make things just. Most of them are set up to defend money. Keep money in the hands of the rich. Recently there was over 2 billion in tax cuts for gas and electric companies. That is 2 billion that corporations don't have to pay, and I am sure they are losing money, but paying their administration MILLIONS! Anyways, the most recent thing to bring my outrage is this: RIAA has man court ordered to pay $4,000 for downloading 5 songs on the internet. The entire music industry was caught in price fixing, to extort millions upon millions from its customers. It paid out a few million for getting caught. The eventual settlement was about $10 for each person that sent in info for the class action suit. This man was ordered to pay $750 for each of the copyrighted song violations. So wait, if a person breaks the law and takes a song they can pay $750 a song (about 700x the value of the song). If a corporation breaks the law they pay $10 (about 2/3 the cost of a new CD). Realize the CD cost about 6 cents to make, so even after paying you $10 back they still made 5 dollars off of you for a single CD you purchased (let alone the fact that you most likely purchased more than one CD). Can ANYONE tell me why in the hell a person can be sued for 700x the value of a stolen object and a company can only be sued for 2/3 the cost of an item? This is our laws in action. This is our government. This is corporate lobbying and control. This is money and greed's influence.

Yes there are more important problems going on in the world. It is harder to use them to show the direct problems with our government. The underlying problem with most of our issues is greed, corporate corruption, and lobbyist control. This is bullshit and I think the entire recording industry should be put out of business. Any business person that has ever profited from it should have their entire savings stripped from them.

May 17, 2004

Welcome Nicole everybody

Well Nicole moved in yesterday. 90% of her stuff is still in boxes. She is in the house and starting to get stuff unpacked. Her family is here with her as well until Tuesday. So I now am living with a girl, a situation I have been in once before, and she really didn't stay in the place at all... just kinda used it for storage, changing, and visiting. So I shall now welcome Nicole to our messed up little family. WELCOME NICOLE!

Also I should warn you: for everything that shows up and makes our house appear more girly, I shall retaliate by adding other images, posters, and such promoting the male (Football, Gun, and Pussy) perspective. So I shall find something soon to fight back against your weird hanging shower monstrosity that says a woman showers here all over it.

Random Dan Deep Thoughts(tm):
If two people had a chance to make something happen, but then it was cut short for some reason, do they still have a chance when they meet again later? I don't know if they really do. I would like to believe it is possible, but timing in relationships is really everything. You could meet the perfect person for you but be in a bad relationship that was coming to an end, and you would miss your chance. You could meet the perfect person, who at that time has a strong crush on someone they won't even remember later in the future. I guess if timing is everything there might still be a chance the second time around?

May 18, 2004

News Shaker Update

After doing some initial work with the 8 category problem, I have run into some problems. Nothing that can’t be solved, just some initial hiccups as expected. On the very first run through I was getting approximately 30% accuracy on my categorizations. Better than random guessing but still pretty worthless. After changing to a different layout of the model, I am now getting around 43%. That 43% (an average over the 8 models; some are higher) also sucks. After talking with a professor at CU, I now have about 5 different ideas on how to improve my overall percents. I am trying to get over 75% accuracy. Once I am at about that level (which isn’t that high), I am hoping that with some user feedback on the site the model will train and improve itself. That would be really cool, and it is possible since pretty much the whole process is automated now.

At first, for each category I was creating a positive and a negative vector. The positive was all of the data categorized in that category. The negative was all of the data in all the other categories. This wasn’t doing so well, so I removed the general category from the negative vector. I also removed the uncategorized data from the negative vector since it is possible these could fit in the category. Doing this increased my model accuracy from the 30% to the 43%.
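
The positive and negative split described above can be sketched as follows. This is an illustration in Python rather than the project's own code. The category names and the `build_training_sets` helper are hypothetical, and the sketch assumes the general and uncategorized buckets do not get a model of their own.

```python
def build_training_sets(docs_by_category,
                        skip_as_negative=("general", "uncategorized")):
    """Return {category: (positives, negatives)} for one-vs-rest training.

    Documents in the skipped buckets are never used as negatives,
    since some of them might really belong to the category.
    """
    sets = {}
    for category, positives in docs_by_category.items():
        if category in skip_as_negative:
            continue  # assumption: no separate model for these buckets
        negatives = [
            doc
            for other, docs in docs_by_category.items()
            if other != category and other not in skip_as_negative
            for doc in docs
        ]
        sets[category] = (list(positives), negatives)
    return sets


# Toy data with made-up category names.
docs = {
    "funding": ["f1", "f2"],
    "legislation": ["l1"],
    "general": ["g1"],
    "uncategorized": ["u1", "u2"],
}

training_sets = build_training_sets(docs)
print(training_sets["funding"])
```

Leaving the ambiguous buckets out shrinks the negative set, but it stops the model from being trained against examples that may actually be positives.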

I am now considering other things I could do to improve the accuracy. One of the things I am considering is a two level model. The first level would only say whether an item relates to special education; the second level would then categorize within the special education category. This would allow me to quickly dump anything I know isn’t related to special education at all. It would also allow me on the final site to have users help with the categorization process. Anything that couldn’t be categorized better than just special education related could be placed in a general category. Users could view the general category and then place items in the proper category, which would in turn help train the system.
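
A minimal sketch of the two level idea, in Python for illustration only. The `classify_two_level` function, the threshold, and the category names are hypothetical, and the two toy functions are keyword checks standing in for trained SVM models.

```python
def classify_two_level(doc, is_relevant, category_scores, threshold=0.0):
    """Level 1 filters out unrelated items; level 2 picks a category.

    Returns None for unrelated items, a category name when one model
    scores above the threshold, and "general" otherwise.
    """
    if not is_relevant(doc):
        return None
    scores = category_scores(doc)
    if not scores:
        return "general"
    best = max(scores, key=scores.get)
    return best if scores[best] > threshold else "general"


# Toy stand-ins for the trained models (keyword checks, not real SVMs).
def toy_is_relevant(doc):
    return "special education" in doc.lower()


def toy_category_scores(doc):
    text = doc.lower()
    return {
        "funding": 1.0 if "budget" in text else -1.0,
        "legislation": 1.0 if "bill" in text else -1.0,
    }


for headline in ("Special education budget grows",
                 "Special education week begins",
                 "Local team wins game"):
    print(classify_two_level(headline, toy_is_relevant, toy_category_scores))
```

Anything that lands in "general" is the pool users would sort by hand, and those hand-sorted items could then be fed back in as new training data.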

I am also now considering a move from SVMlight to libSVM. Apparently libSVM offers some better options and optimizations, but still uses the same input format. This is important because text2SVM took a while to write and was written with SVMlight in mind. I have done some other optimizations on text2SVM which aren’t included in the released source because the project has become less general and more specific to my project. It has improved and become far faster though. If I move to libSVM, this would allow me to get the results of a categorization attempt as a percent. If I had percents I could compare the results of different models better, which would be useful since the values the different models output aren’t on the same scale.
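
The shared input format is one example per line: a label followed by `index:value` pairs with the feature indices in increasing order. The sketch below writes such lines from plain text. It is a toy stand-in in Python, not text2SVM itself. The helper names are hypothetical, and using raw word counts as the feature values is an assumption.

```python
from collections import Counter


def build_vocabulary(texts):
    """Assign each distinct word a 1-based feature index."""
    vocab = {}
    for text in texts:
        for word in text.lower().split():
            vocab.setdefault(word, len(vocab) + 1)
    return vocab


def to_svm_line(label, text, vocab):
    """Format one example as '<label> <index>:<value> ...'.

    Words outside the vocabulary are dropped, and the pairs are
    sorted so that feature indices appear in increasing order.
    """
    counts = Counter(w for w in text.lower().split() if w in vocab)
    pairs = sorted((vocab[w], c) for w, c in counts.items())
    features = " ".join(f"{i}:{c}" for i, c in pairs)
    return f"{label:+d} {features}".rstrip()


texts = ["cheap pills cheap", "meeting notes"]
vocab = build_vocabulary(texts)
print(to_svm_line(1, texts[0], vocab))
print(to_svm_line(-1, texts[1], vocab))
```

Because the vectors are sparse, only the words that actually occur in a message take up space on its line.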

One of the problems I am running into is testing time. It takes about 2 1/2 hours or so to create and run a new test. It requires a few different steps. If I run them all at once my machine runs out of memory and crashes. So I have to run the steps one at a time. Even though the code is completely automated, it can’t run as such without time to dump the memory. Perhaps I will have to start looking around CU for a gigantic machine that I can use to do testing much faster.

The spam filter has gone through over 500 emails now and has an accuracy of 97.5% on unseen new email. This is great. If it weren’t so specialized to my mail I would make the filter available to everyone.
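
One way to keep the steps automated while still releasing memory between them is to run each step as its own process, since the operating system reclaims a process's memory when it exits. The sketch below shows the idea in Python. The `run_steps` helper is hypothetical, and the three commands are placeholders, not the real steps of this project.

```python
import subprocess
import sys


def run_steps(commands):
    """Run each command in a fresh process, one after another.

    Stops with an exception at the first step that fails and returns
    the number of steps that completed.
    """
    completed = 0
    for cmd in commands:
        subprocess.run(cmd, check=True)
        completed += 1
    return completed


# Placeholder commands standing in for the real pipeline steps.
steps = [
    [sys.executable, "-c", "print('step 1: convert text to vectors')"],
    [sys.executable, "-c", "print('step 2: train the models')"],
    [sys.executable, "-c", "print('step 3: classify the test set')"],
]

print(run_steps(steps), "steps finished")
```

The trade-off is that each step has to write its results to disk for the next one to read, instead of handing them over in memory.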

That’s it for now. The good news is I think I am still headed in the right direction and I think I will end up with a capable system. The bad news is that I think it is going to be harder and more time consuming than originally planned. I will be busy with some other stuff and out of town over the next 3 weeks so there will probably be little updated information available on the project.

argg one night

Steve,
One night with MY mom and I am upset. I am sure that it wouldn't come off as especially new or original so I am not going to really get into any of my points. I am just going to say you have forgotten what it is like to be alone. You skipped that step taking the easy way out. It is hard especially when you thought you were done being alone. That is all I really needed to say tonight, so I am going to sleep. I just thought that I should keep you up to date so nothing comes as a shock to you.
peace,
Dan Mayer

May 19, 2004

SVMMail vs Apple Mail

Well it looks like my SVM mail is very similar to Apple's Mail. Apple's Mail also uses a vector representation and training to achieve 98%+ accuracy (the claim). After reading through this, it makes sense why my filter is achieving such good accuracy. They are using a slightly more complicated vector analysis tool than I am. I use SVMlight, while Apple is using LSA (Latent Semantic Analysis), which I used to work with, but I found the tools far less developed and harder to work with. It was causing me all sorts of problems to tell LSA to do the simple clustering I was doing very simply with SVMlight. The main reason it looks like they are using LSA is that they first reduce the vector space, and then, using LSA on the reduced vectors, they are claiming a major performance increase. This is really believable, especially since LSA offers quick tools for folding new information into a model instead of recreating it. Anyways, I am happy to learn that the approach I took with my mail filter appears to be very similar to that of one of the best computer companies out there. It gives me even more reason to believe that I am on the right trail with all of my various vector-based learning algorithms.

Explaining the Apple Mail Filter

P.S. This means that my filter, which has barely been developed and has many enhancements possible, is achieving about 0.5% less than Apple's filter!

May 20, 2004

SVM Mail feature vector

I have gotten a few emails and questions from other researchers in the community, and I decided that I would begin to answer questions on my site rather than through email so others could also benefit from the answers. So here is the first response I am posting on the web. Feel free to contact me if you have any other questions and I can try to respond.

I found your project while googling for various alternatives to spam
filtering; I've been thinking about trying SVM for mail filtering myself,
but I'm slightly at loss as to what features to use.

A bag-of-words model comes naturally to mind, but it is not the most
efficient computationally; are you perhaps using it or something similar?

Continue reading "SVM Mail feature vector" »

May 21, 2004

Dreams against me

After a rough night of dreams I am not feeling all that great today. It is 10:50 in the morning and it has already been a rough day. I feel entirely drained with nothing really here to recharge me. What at this point am I supposed to look towards for strength? Hopefully, I will still manage to have a good afternoon. I feel as if everything has just passed me by, and half of it I even let go past. Now, when I am working to keep hope for myself, I have to help keep hope for others. It is difficult when everyone tells you things will work out, you will find what you're searching for... and you know it really isn't true and that everyone saying that really has found what they're looking for.

It is impressive how much dreams can affect you. I mean I was fine last night and then my dreams were so real and so amazing that waking up to reality was a disappointment. A realization that your mind can create and want things that you will never find.

Sorry for the depressing post, but if you saw the world in my dreams last night that seemed so real you might understand.

May 24, 2004

SVMMail testing update

I have now gone through about 800 emails with my SVM / SMO mail filter. I am still getting about 97.5% accuracy. I have blocked over 700 spams from my mail account. I am taking off (to Europe) and I am sure my mail will fill up with about 600 emails while I am gone, so I will have a much larger set of test results when I return. Good luck to anyone else out there working with or thinking about SVMs / SMOs for spam filters. It looks like it is a winning combination. I hope my various code and articles can help you on your way.

peace,
Dan Mayer

Off to Europe

Well tomorrow is my last day. I am pretty much all ready to go with one small problem. I have no way to charge my phone/Pocket PC/ALL of my music and games for the trip! They sent the wrong thing from T-Mobile and I didn't look at it until today, and now they can't get a new piece here before I take off Wednesday. So I might be out of music the entire trip. This doesn't make me happy. I am hoping to fix this tomorrow by soldering together some stuff and building a charging cable, but I don't know how that will go. I could also look into charging via putty on my external battery as I did to extend the life of my previous Pocket PC.

Off to Europe to see London, Paris, Amsterdam, and Munich. I am really excited, it should be a great time. I hope everything turns out perfect. I am sure we will have a blast. Who knows if I will have net access there. If I do, great, I will update my blog a few times. If I don't, I will have a journal that I write in that I will put little blurbs about Europe in. I am really looking forward to the trip. I hope I am not too exhausted when I return but I probably will be.

peace, love, and have a great time everyone... see you in about 2 weeks.

PHONE FIXED

I am so happy. It took over an hour of sitting around and talking my way through hoops at the T-Mobile store in Boulder, but they gave me the piece I need for free. I just have to return it when I get back from Europe. This rules because it is a $20 piece that just couldn't be shipped to me in time. Thanks so much, people of the T-Mobile store. They hooked me up big time and I owe them. I guess that is T-Mobile being nice and paying me back for my HUGE phone bill, as I went WAY over last month with calls to my mom. I have music and fun back in my Europe trip. Time to celebrate... where's the beer! hehe

Now everything is perfect!!! This trip is going to be great!!!! Like Tony the Tiger says.... GREEEAAATT!

peace out happy as can be,
Dan "with music" Man

May 30, 2004

London Calling

London Calling
It is a far away land.
It frees your mind,
and makes you take a stand.


Or something like that. Hey everybody back home. Hope you're all doing well and having fun. Anyways, today is my last full day in London; tomorrow I am headed towards Paris.

London has been awesome. We have seen basically everything (Westminster Abbey, Dali museum, Tower Bridge, History museum, Big Ben, Buckingham Palace, guards in big fuzzy hats, the Underground, pubs, etc.). We have partied at pubs a lot, trying all sorts of crazy beers. I even found an apple-based beer that tastes good. Everything is so expensive here though. Man, it has just cost so much for everything. Sadly the food sucks, as you have always heard. The best meal I have had in London is our hotel's free breakfast, all you can drink OJ and eggs and hashbrowns. Mmmm breakfast.

Nothing really crazy has happened yet so I don't have any really insane stories. Last night was Jesse's birthday so we bought a round of shots for us and for these locals we were talking to. We all did that and then the locals took off to go to another pub. Then some crazy guy started talking to Jesse and was saying how he is Jewish (which he isn't) and that his mom is from the same town as he is (in Israel). So they were family. This guy was scary and a little violent, so when he went to get another drink we all ran out an emergency exit, jumped over a roped off thing, and ran off down the street. Very odd it was... crazy bastard.

Mass public transportation rules in London. It is seriously amazing, I never want to own a car again. We should all be able to use underground train systems to get everywhere. The problem there is the very frequent £3 beer, which is 6 bucks... So we have probably spent an arm and a leg drinking. Oh well... Rock on and talk to you all later.

Well I guess that is all I can report for now.
peace, fleece in Greece,
Dan "Wanker" Man