I have a prediction to make: the Phishing Filter, as currently implemented in IE7 beta 1, will be radically re-engineered before (or even removed from) the final release of IE7. Keep reading and I’ll tell you why.
I haven’t installed IE7 beta, but I have read a blog post (and comments) on the IEBlog from the IE team that gives details about the IE7 Phishing Filter (PF).
Browsers can use two complementary methods to defend against phishing sites.
The first method requires a block list. Whenever you initiate a navigation (click a link, type a URL in the address bar, select a Favorite), the browser checks the host or URL of the requested site against a block list. If it is found, the browser warns the user and blocks the navigation. The advantage of blocking sites pre-navigation is that you stop users from reaching sites that could also dump malware, trojans, etc. on their machines. But pre-navigation protection is only as effective as the block list it relies upon. If your block list is out of date (and phishing sites come and go quickly), or you are the first user to find a phishing site, then you are not protected. As block lists become more widely deployed, phishers will find ways to thwart them: dynamically generating unique URLs, or obfuscating URLs so that every combination must be added to the block list or the list risks misses or false positives.
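To make the pre-navigation check concrete, here is a minimal sketch in Python. It is not how any particular browser implements it; the list entries, the `normalize` helper, and the choice to ignore query strings and ports are all assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical block list entries; a real list would be downloaded and refreshed.
BLOCKED_HOSTS = {"phish.example.net"}
BLOCKED_URLS = {"http://shop.example.org/login/verify"}


def normalize(url):
    """Return (host, canonical_url) with the query string, fragment and port dropped."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()  # hostname has any port stripped
    path = parts.path or "/"
    return host, f"{parts.scheme.lower()}://{host}{path}"


def is_blocked(url):
    """Check a navigation target against the host list and the URL list."""
    host, canonical = normalize(url)
    return host in BLOCKED_HOSTS or canonical in BLOCKED_URLS


if __name__ == "__main__":
    for target in (
        "http://PHISH.example.net/anything?x=1",          # blocked by host
        "http://shop.example.org/login/verify?s=abc",     # blocked by URL
        "http://shop.example.org/login/verify/u123",      # unique path slips through
    ):
        print(target, "->", "BLOCK" if is_blocked(target) else "allow")
```

The last target shows the weakness described above: a phisher who generates a unique path per victim is missed by URL matching, and blocking the whole host instead risks a false positive if legitimate pages share that host.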
The second method is post-navigation heuristics and/or content analysis. Whenever a new page is loaded, the browser analyzes that page and tries to determine whether it is a phishing site. The earliest anti-phishing (AP) products used very simple rules to warn users based on things like geolocation or odd URL characteristics. These simple rules do a decent job of detecting phishing sites, but they have a high false positive rate. Another downside is that the toolbar approach requires constant attention from the user and never gives a definitive recommendation. Second-generation anti-phishing products like ScamAlarm use content analysis and extensive rule sets to identify phishing sites. ScamAlarm heuristics currently identify over 99% of phishing sites. If you have a good heuristics implementation, you don’t need a block list.
So, where does the IE7 Phishing Filter fit in? It seems that the primary tech in IE7 is a real-time block list. A real-time block list gets around the my-block-list-is-out-of-date problem by requiring the browser to phone home on every navigation request. Don’t miss this. Every navigation will generate a request to an MS server, ask the MS server if the URL (or a hash of the URL or host) in question is OK, and then warn the user if there is a problem. That’s a ton of traffic! If the request is synchronous, it slows down your browsing experience. If it’s asynchronous, then the warning may not load until it’s too late. The IE team likes to talk about 400 million IE users. If that number is correct, we’re talking tens of billions of requests daily, all routing through an MS server. Not only does this create tons of extra traffic and a single point of failure susceptible to denial-of-service attacks, but the privacy problems are unprecedented: do you want MS knowing the URL of every site you visit? Even hashed URL/host data can be recovered by a determined data miner. I can imagine every reviewer of IE7 cautioning users to turn off the Phishing Filter because Microsoft is watching their every move on the internet.
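Why doesn’t hashing solve the privacy problem? The set of popular hosts is small and public, so anyone holding the hashes can precompute a reverse lookup table. The sketch below is only an illustration: I don’t know what hash (if any) IE7 actually sends, so SHA-256, the function names, and the candidate host list are all my assumptions.

```python
import hashlib


def host_hash(host):
    """Hash a host name the way a hypothetical client might before phoning home."""
    return hashlib.sha256(host.lower().encode("utf-8")).hexdigest()


def build_reverse_index(candidate_hosts):
    """Precompute hash -> host for every host the data miner can guess."""
    return {host_hash(h): h for h in candidate_hosts}


def recover(observed_hashes, reverse_index):
    """Map logged hashes back to hosts; None means the host was not in the guess list."""
    return [reverse_index.get(h) for h in observed_hashes]


if __name__ == "__main__":
    # Hypothetical guess list; a real one could be built from any public list of sites.
    index = build_reverse_index(["example.com", "bank.example", "news.example"])
    log = [host_hash("bank.example"), host_hash("unlisted.example")]
    print(recover(log, index))
```

Only hosts missing from the guess list stay hidden, and those are exactly the obscure sites, not the banks, shops and news sites that make up most of a browsing history.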
The IEBlog also hints at a strategy to use some post-navigation rules:
[E]ven if a site hasn’t been reported yet, Internet Explorer will warn you about sites that might look a “little bit phishy” because they use some features commonly used on phishing sites.
How this works out remains to be seen, but the tone of the description leads me to think IE will rely on first-generation rules (is the site hosted on an IP address, does it use a non-standard port, is it hosted in an axis-of-evil nation?). MS doesn’t sound confident that IE7 will be able to definitively identify phishing sites, just that it can recognize sites that look a “little bit phishy”. This doesn’t sound like a trainable system that can detect specific sites and targets so much as something that recognizes generic characteristics: the characteristics that phishers are getting more and more adept at avoiding.
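For a sense of what first-generation rules look like, here is a minimal sketch of two of the checks mentioned above (IP-address host and non-standard port). This is my own illustration, not IE7’s rule set; the signal names and the `phishy_signals` function are hypothetical, and the geolocation rule is left out because it needs an external database.

```python
import ipaddress
from urllib.parse import urlsplit

STANDARD_PORTS = {"http": 80, "https": 443}


def host_is_ip(host):
    """True if the host is a literal IPv4 or IPv6 address rather than a name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def phishy_signals(url):
    """Return the names of the generic rules that this URL trips."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    signals = []
    if host_is_ip(host):
        signals.append("ip-address-host")
    port = parts.port  # None when the URL does not name a port
    if port is not None and port != STANDARD_PORTS.get(parts.scheme):
        signals.append("non-standard-port")
    return signals


if __name__ == "__main__":
    for target in (
        "http://192.0.2.10:8080/login",           # trips both rules
        "http://192.168.1.1/",                    # a home router: false positive
        "http://phish.example.net/bank/login",    # ordinary-looking URL: missed
    ):
        print(target, "->", phishy_signals(target))
```

The second and third targets show both failure modes: rules this generic flag harmless pages, and a phisher who registers a normal-looking host on port 80 trips none of them.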
The fact that MS has been working on a solution demonstrates that they think phishing is a huge problem. It’s too bad that their privacy-busting implementation (and ineffective rules?) may lead users to turn it off.