Congrats to Hawaii for Winning the LLWS

Congratulations to the team from Hawaii for winning the Little League World Series yesterday. The tournament takes almost two months, starting out in early July with league all-star teams from around the world playing local, regional and national tournaments, culminating with the WS tournament that finished this weekend. There is something magical about the LLWS for me. I love the international flavor. I love the raw emotions, the joy and the tears. I love that these kids seem pretty normal, that most of them will never play professionally, but for this one summer they are the best in the world.

I played in the tournament twice but my teams got knocked out the first week after winning just one game. League policy prohibited us from throwing curveballs during league play. This protected our arms but also meant we didn’t hit the curve very well. We saw tons of curveballs during those games, and I’m sure the fans enjoyed some mighty whiffs and embarrassing, unnecessary batter’s box bailouts. But we had the time of our lives and loved every minute of it.

Backing Up Delicious With Cron and Subversion

Delicious has fundamentally changed the way I browse the web. Pre-delicious, if I found a web site I wanted to come back to later, I’d add it to my Favorites in an appropriate sub-folder, and then proceed never to find the link again. Favorites as folders and links just doesn’t work for me. It’s too hard to find things even when I keep them decently organized. That’s where delicious comes in – it’s an online replacement for Favorites/Bookmarks that not only stores your links online but lets you "tag" them so you can find them again by keyword/topics/meme/whatever. Not only that, but you can see how other people have tagged links and find related sites. I have just a couple of sites on my links toolbar now, where before I had hundreds.

I’m so addicted to delicious that I would be in "a world of hurt" if the site ever disappeared. So, I back up my data every day. Here’s how I do it.

My main development box runs Windows XP Pro but I also have a local Ubuntu Linux server that I use for file sharing, backups, development web servers, and version control using subversion. My subversion repository gets backed up nightly and stored offsite weekly, so it’s a natural place to back up my delicious links. Plus I get the added bonus of being able to recover my links for any date in time and do diffs/comparisons (yes – that’s incredibly geeky).

Delicious has an XML api that lets you download your links as a single xml file. You can use wget to save all your links to a local file like this:

To set this up, I created a project in subversion called delicious (no trunk, branches, tags subdirs necessary), and imported the delicious.xml file.

On sidebar (my ubuntu box) I created this script in my home directory. It creates a temporary working directory, checks out the delicious project with its one delicious.xml file, grabs the new delicious xml file, commits it to subversion, and then cleans up the temp directory.

# make our temporary work directory
mkdir dtmp

# check out delicious to temp work directory
svn co svn://sidebar/stilesoft/delicious dtmp

# download our backup file
wget -O dtmp/delicious.xml

# move into temp and checkin new file
cd dtmp
svn commit -m "Daily backup"

# clean up temp work directory
cd ..
rm -Rf dtmp

The script could have been simpler if we left a permanent working directory on the box and just committed the file each day, but this keeps things cleaner.
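For comparison, here is a rough sketch of that simpler variant, assuming the project was checked out once into a permanent working copy. The `WORKDIR` path and the `DELICIOUS_API_URL` variable are placeholder names of mine, not part of my actual setup; the variable has to hold the API URL that wget should fetch.

```shell
#!/bin/sh
# Sketch only: nightly backup using a permanent working copy.
# WORKDIR and DELICIOUS_API_URL are hypothetical names; set them for your setup.
WORKDIR="${WORKDIR:-$HOME/delicious-wc}"

# The body runs in a subshell, so the cd does not leak into the caller.
backup_delicious() (
    # refuse to run without the API URL rather than guess it
    [ -n "${DELICIOUS_API_URL:-}" ] || { echo "DELICIOUS_API_URL is not set" >&2; exit 1; }

    # each step only runs if the previous one succeeded,
    # so a failed download never replaces or commits the old file
    cd "$WORKDIR" &&
    svn update &&
    wget -O delicious.xml.new "$DELICIOUS_API_URL" &&
    mv delicious.xml.new delicious.xml &&
    svn commit -m "Daily backup"
)
```

Calling `backup_delicious` at the end of the script runs the backup. The trade-off is a working copy that sits on the box permanently, and a failed download can leave a stray `delicious.xml.new` file behind.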

Make sure the script is executable (chmod +x backup_delicious) and then set up crontab to run it every night at 3 AM.

$ crontab -e

And add this line, substituting your home directory, then save.

0 3 * * * /home/adam/backup_delicious
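Once a few nightly commits have piled up, the recovery and diffs I mentioned earlier might look something like this. It is a sketch that assumes the file lives at the same repository URL the backup script checks out; the function names are made up for illustration. Subversion accepts a date in curly braces wherever it accepts a revision number.

```shell
# Sketch only: look back through the nightly commits.
# REPO is the checkout URL from the backup script plus the file name.
REPO="svn://sidebar/stilesoft/delicious/delicious.xml"

# print the links file as it stood on a given date (YYYY-MM-DD)
links_on() {
    svn cat -r "{$1}" "$REPO"
}

# show what changed between two dates
links_diff() {
    svn diff -r "{$1}:{$2}" "$REPO"
}

# list the commits that touched the file
links_log() {
    svn log "$REPO"
}
```

For example, `links_diff 2005-08-01 2005-08-28` would show the links added or removed between those two dates. A bare date means midnight at the start of that day, so it resolves to the last commit made before then.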


Death of IE7 Phishing Filter Predicted

I have a prediction to make – the Phishing Filter, as currently implemented in IE7 beta 1, will be radically re-engineered (or even removed) from the final release of IE7. Keep reading and I’ll tell you why.

I haven’t installed IE7 beta, but I have read a blog post (and comments) on the IEBlog from the IE team that gives details about the IE7 Phishing Filter (PF).

A little background. The browser is the last line of defense against phishing attacks, and therefore the most important. Spam filters can be bypassed (and Bayesian filters actually help phishers get through to likely targets), and network filters are only as effective as their most recent block list update. Most network filters don’t understand JavaScript and are easily fooled.

Browsers can use two complementary methods to defend against phishing sites.

The first method requires a block list. Whenever you initiate a navigation (click a link, type a URL in the address bar, select a Favorite) the browser checks the host or url of the requested site against a block list. If found, the browser warns the user and blocks the navigation. The advantage to blocking sites pre-navigation is that you stop users from navigating to sites that could also dump malware/trojans/etc on their machine. Pre-navigation protection is only as effective as the block list it relies upon. If your block list is out-of-date, and phishing sites come and go quickly, or you are the first user to find a phishing site then you are not protected. As block lists become more widely deployed, phishers will find ways to thwart them: dynamically generating unique urls or obfuscating urls so that every combo must be added to the block list or risk misses or false positives.
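To make the pre-navigation check concrete, here is a minimal sketch in shell. It assumes the block list is a plain text file with one host name per line; that format and the function names are my own invention, not how any particular browser does it.

```shell
# Sketch only: pre-navigation check against a local block list.
# The file format (one lowercase host per line) is an assumption.

# extract the host part of a URL
url_host() {
    host="${1#*://}"      # drop the scheme, e.g. "http://"
    host="${host%%/*}"    # drop the path
    host="${host##*@}"    # drop any "user@" prefix
    host="${host%%:*}"    # drop the port
    printf '%s\n' "$host" | tr 'A-Z' 'a-z'
}

# succeed (exit 0) if the URL's host is on the block list in file $2
is_blocked() {
    grep -qxF -e "$(url_host "$1")" "$2"
}
```

Matching on the host rather than the full URL is one way to blunt the unique-URL trick, since every generated path on a listed host is still caught. The price is more false positives when a phishing page sits on an otherwise legitimate host.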

The second method is post-navigation heuristics and/or content analysis. Whenever a new page is loaded, the browser analyzes that page and tries to determine whether it is a phishing site. The earliest anti-phishing (AP) products used very simple rules to warn users based on things like geolocation or funny URL characteristics. These simple rules do a decent job of detecting phishing sites, but have a high false positive rate. Another downside is that the toolbar approach of these early products requires constant attention and never gives the user a definitive recommendation. Second generation anti-phishing products like ScamAlarm use content analysis and extensive rule sets to identify phishing sites. ScamAlarm heuristics currently identify over 99% of phishing sites. If you have a good heuristics implementation, you don’t need a block list.

So, where does the IE7 Phishing Filter fit in? It seems that the primary tech in IE7 is a real-time block list. A real-time block list gets around the my-block-list-is-out-of-date problem by requiring the browser to phone home on every navigation request. Don’t miss this. Every navigation will generate a request to an MS server, ask the MS server if the URL (or a hash of the URL or host) in question is OK, and then warn the user if there is a problem. That’s a ton of traffic! If the request is synchronous, it slows down your browsing experience. If it’s asynchronous, then the warning may not load until it’s too late. The IE team likes to talk about 400 million IE users. If that number is correct, we’re talking tens of billions of requests daily, all routing through an MS server. Not only does this create tons of extra traffic and a single point of failure susceptible to denial-of-service attacks, but the privacy problems are unprecedented – do you want MS knowing the URL of every site you visit? Even hashed URL/host data can be recovered by a determined data miner. I can imagine every reviewer of IE7 cautioning users to turn off the Phishing Filter because Microsoft is watching their every move on the internet.
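Two of those claims are easy to poke at with a few lines of shell. This is only a sketch: the number of navigations per user is my guess, and SHA-256 stands in for whatever hash IE7 might use, which the IEBlog post does not specify.

```shell
# Sketch only: traffic estimate and a dictionary attack on hashed hosts.
# The per-user navigation count and the choice of SHA-256 are assumptions.

# requests per day = users * navigations per user per day
daily_requests() {
    echo $(( $1 * $2 ))
}

# hash a host name the way a client might before sending it
hash_host() {
    printf '%s' "$1" | sha256sum | cut -d' ' -f1
}

# given an observed hash ($1) and a file of candidate hosts ($2, one per line),
# print the host that produces that hash, if any
recover_host() {
    while IFS= read -r candidate; do
        if [ "$(hash_host "$candidate")" = "$1" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done < "$2"
    return 1
}
```

With 400 million users and an assumed 50 navigations each, `daily_requests 400000000 50` comes to 20 billion requests a day. The recovery loop shows why hashing is weak protection here: host names are guessable, so anyone holding the hashes can simply hash a list of known sites and look for matches.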

The IEBlog also hints at a strategy to use some post-navigation rules:

[E]ven if a site hasn’t been reported yet, Internet Explorer will warn you about sites that might look a “little bit phishy” because they use some features commonly used on phishing sites.

How this works out remains to be seen, but the tone of the description leads me to think IE will rely on first generation rules (is the site hosted on an IP address, does it use a non-standard port, is it hosted in an axis-of-evil nation?). MS doesn’t sound confident that IE7 will be able to definitively identify phishing sites, just that it can recognize sites that look a "little bit phishy". This doesn’t sound like a trainable system that can detect specific sites and targets as much as something that recognizes generic characteristics – the characteristics that phishers are getting more and more adept at avoiding.
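For a sense of how simple first generation rules are, here is a sketch of two of them in shell: a host that is a raw IP address and an explicit port other than 80 or 443. This is my own illustration of the rule style, not anything taken from IE7, and the function name is made up.

```shell
# Sketch only: two "first generation" URL rules.
# Prints the rules that fired and exits 0, or prints nothing and exits 1.
looks_phishy() {
    rest="${1#*://}"                 # drop the scheme
    authority="${rest%%/*}"          # keep host[:port]
    authority="${authority##*@}"     # drop any "user@" prefix
    case "$authority" in
        *:*) port="${authority##*:}"; host="${authority%%:*}" ;;
        *)   port=""; host="$authority" ;;
    esac
    reasons=""
    # rule 1: host is made up only of digits and dots, like 192.0.2.7
    case "$host" in
        *[!0-9.]*|"") ;;
        *.*.*.*) reasons="ip-address-host" ;;
    esac
    # rule 2: an explicit port other than the usual web ports
    case "$port" in
        ""|80|443) ;;
        *) reasons="${reasons:+$reasons }non-standard-port" ;;
    esac
    [ -n "$reasons" ] && printf '%s\n' "$reasons"
}
```

A phisher who registers a plausible domain name and serves the page on port 80 sails past both rules, which is exactly the weakness of generic characteristics.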

The fact that MS has been working on a solution demonstrates that they think phishing is a huge problem. It’s too bad that their privacy-busting implementation (and ineffective rules?) may lead users to turn it off.

FBI Gets Spoofed

Here’s an interesting spoof we saw today. The target? The FBI. If this works, I’m sure we’ll see lots more like this one – the CIA, Homeland Security, NASA (have aliens stolen my identity?).