Fooling the MS Phishing Filter

About a month ago I predicted the death of the IE7 phishing filter based on sketchy details of the MS implementation. Since that time, MS has also released an antiphishing plugin for the MSN toolbar and the IE blog has released more details about how the phishing filter works. After analysing the blog posts I stand by my prediction.

My previous reservations still stand and I have several others I will cover later. Here’s the zinger for today: for sites that aren’t on the block list, the MS approach will be easy for phishers to circumvent.

A little background. In an attempt to "anonymize" the data it sends home, IE7 removes query strings from URLs. The query string is anything after the (?) in the URL. Instead of phoning home the full URL, IE7 sends home the URL with the query string stripped. It’s great that MS wants to protect privacy, but they’ve also opened up an easy way for phishers to beat the system.
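To make the stripping concrete, here is a minimal sketch in Python. It is not the IE7 code, just an illustration of the idea as the IE blog describes it; the function name `strip_query` and the `example.test` URLs are made up for the example.

```python
from urllib.parse import urlsplit, urlunsplit

def strip_query(url: str) -> str:
    """Return the URL with everything after the '?' (the query string) removed."""
    parts = urlsplit(url)
    # Assumption: only the query string is dropped; the rest of the URL is kept.
    return urlunsplit(parts._replace(query=""))

# Hypothetical URLs: what the user browses vs. a bare version of the same page.
visited = "http://example.test/login.php?id=12345"
bare = "http://example.test/login.php"

print(strip_query(visited))                       # http://example.test/login.php
print(strip_query(visited) == strip_query(bare))  # True
```

Both URLs collapse to the same reported address, so whatever was in the query string never reaches the server doing the check.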

A smart phisher will return different content to end users (browsing the URL with the query string intact) than to the MS phishbot. The end user gets a scam site and the MS phishbot gets innocuous content.

The fundamental flaw in the MS approach is that the analysis is performed on a server instead of on the client, and the client and server may be looking at entirely different content. Smart, client-side phishing detection engines like ScamAlarm don’t have this problem.


Death of IE7 Phishing Filter Predicted

I have a prediction to make – the Phishing Filter, as currently implemented in IE7 beta 1, will be radically re-engineered (or even removed) from the final release of IE7. Keep reading and I’ll tell you why.

I haven’t installed IE7 beta, but I have read a blog post (and comments) on the IEBlog from the IE team that gives details about the IE7 Phishing Filter (PF).

A little background. The browser is the last line of defense against phishing attacks, and therefore the most important. Spam filters can be bypassed (and Bayesian filters actually help phishers get through to likely targets), and network filters are only as effective as their most recent block list update. Most network filters don’t understand JavaScript and are easily fooled.

Browsers can use two complementary methods to defend against phishing sites.

The first method requires a block list. Whenever you initiate a navigation (click a link, type a URL in the address bar, select a Favorite) the browser checks the host or URL of the requested site against a block list. If found, the browser warns the user and blocks the navigation. The advantage of blocking sites pre-navigation is that you stop users from navigating to sites that could also dump malware/trojans/etc. on their machine. But pre-navigation protection is only as effective as the block list it relies upon. If your block list is out of date (and phishing sites come and go quickly), or you are the first user to find a phishing site, then you are not protected. As block lists become more widely deployed, phishers will find ways to thwart them: dynamically generating unique URLs or obfuscating URLs so that the block list must either include every combination or risk misses and false positives.
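A pre-navigation check can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not how any shipping browser does it: the block list entries, `is_blocked` and `navigate` are all hypothetical names.

```python
from urllib.parse import urlsplit

# Hypothetical block list: whole hosts, plus exact URLs on shared hosts.
BLOCKED_HOSTS = {"phish.example.test"}
BLOCKED_URLS = {"http://shared-host.example.test/~bob/paypal/login.html"}

def is_blocked(url: str) -> bool:
    """True if the URL's host, or the exact URL, is on the block list."""
    host = urlsplit(url).hostname or ""
    return host in BLOCKED_HOSTS or url in BLOCKED_URLS

def navigate(url: str) -> str:
    """Check the block list before loading anything from the site."""
    return "blocked" if is_blocked(url) else "allowed"

listed = "http://shared-host.example.test/~bob/paypal/login.html"
print(navigate("http://phish.example.test/anything"))  # blocked
print(navigate(listed))                                # blocked
print(navigate(listed + "?x=1"))                       # allowed
```

The last line shows the weakness: a trivially different URL on the same shared host slips past an exact-match entry, and blocking the whole host instead would produce false positives for every other site hosted there.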

The second method is post-navigation heuristics and/or content analysis. Whenever a new page is loaded, the browser analyzes that page and tries to determine whether it is a phishing site. The earliest anti-phishing (AP) products used very simple rules to warn users based on things like geolocation or funny URL characteristics. These simple rules do a decent job of detecting phishing sites, but have a high false positive rate. Another downside is that the toolbar approach requires constant attention and never gives the user a definitive recommendation. Second-generation anti-phishing products like ScamAlarm use content analysis and extensive rule sets to identify phishing sites. ScamAlarm heuristics currently identify over 99% of phishing sites. If you have a good heuristics implementation, you don’t need a block list.

So, where does the IE7 Phishing Filter fit in? It seems that the primary tech in IE7 is a real-time block list. A real-time block list gets around the my-block-list-is-out-of-date problem by requiring the browser to phone home on every navigation request. Don’t miss this. Every navigation will generate a request to an MS server, ask the MS server if the URL (or a hash of the URL or host) in question is OK, and then warn the user if there is a problem. That’s a ton of traffic! If the request is synchronous, it slows down your browsing experience. If it’s asynchronous, then the warning may not load until it’s too late. The IE team likes to talk about 400 million IE users. If that number is correct, we’re talking tens of billions of requests daily, all routing through an MS server. Not only does this create tons of extra traffic and a single point of failure susceptible to denial-of-service attacks, but the privacy problems are unprecedented – do you want MS knowing the URL of every site you visit? Even hashed URL/host data can be recovered by a determined data miner. I can imagine every reviewer of IE7 cautioning users to turn off the Phishing Filter because Microsoft is watching their every move on the internet.
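Here is the back-of-the-envelope arithmetic behind "tens of billions". The 400 million figure is the one the IE team quotes; the 50 navigations per user per day is purely my assumption, so treat the result as an order-of-magnitude estimate.

```python
IE_USERS = 400_000_000             # figure quoted by the IE team
NAVIGATIONS_PER_USER_PER_DAY = 50  # assumption, not a measured number
SECONDS_PER_DAY = 24 * 60 * 60

requests_per_day = IE_USERS * NAVIGATIONS_PER_USER_PER_DAY
requests_per_second = requests_per_day / SECONDS_PER_DAY

print(f"{requests_per_day:,} lookups per day")  # 20,000,000,000 lookups per day
print(f"{requests_per_second:,.0f} lookups per second on average")
```

Even if the real per-user number is half my guess, that is still ten billion lookups a day, and the average hides the daily peaks.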

The IEBlog also hints at a strategy to use some post-navigation rules:

[E]ven if a site hasn’t been reported yet, Internet Explorer will warn you about sites that might look a “little bit phishy” because they use some features commonly used on phishing sites.

How this works out remains to be seen, but the tone of the description leads me to think IE will rely on first generation rules (is the site hosted on an IP address, does it use a non-standard port, is it hosted in an axis-of-evil nation?). MS doesn’t sound confident that IE7 will be able to definitively identify phishing sites, just that it can recognize sites that look a "little bit phishy". This doesn’t sound like a trainable system that can detect specific sites and targets as much as something that recognizes generic characteristics – the characteristics that phishers are getting more and more adept at avoiding.
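For a feel of what first generation rules look like, here is a rough Python sketch of two of them: raw IP host and non-standard port. This is my guess at the style of rule, not anything MS has published; `phishy_signals` and the sample URLs are hypothetical, and the geolocation rule is left out because it needs an external lookup database.

```python
import ipaddress
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def host_is_ip(host: str) -> bool:
    """True if the host is a literal IP address rather than a domain name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False

def phishy_signals(url: str) -> list[str]:
    """Collect generic warning signs from the URL alone (no page content)."""
    parts = urlsplit(url)
    signals = []
    if host_is_ip(parts.hostname or ""):
        signals.append("host is a raw IP address")
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(parts.scheme):
        signals.append("non-standard port")
    return signals

print(phishy_signals("http://192.0.2.10:8080/paypal/"))
print(phishy_signals("https://www.example.test/"))
```

Rules like these never look at what the page actually says or who it imitates, which is why a phisher who registers a plausible domain on port 80 sails right through.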

The fact that MS has been working on a solution demonstrates that they think phishing is a huge problem. It’s too bad that their privacy-busting implementation (and ineffective rules?) may lead users to turn it off.

FBI Gets Spoofed

Here’s an interesting spoof we saw today. The target? The FBI. If this works, I’m sure we’ll see lots more like this one – the CIA, Homeland Security, NASA (have aliens stolen my identity?).


How Not To Deface a Phishing Site

My job developing anti-phishing software is an almost constant source of amusement. Clueless phishers provide the most enjoyment, but sometimes we see clueless vigilantes.

Here’s a screenshot of a PayPal spoof that looks like it’s been defaced by a vigilante or sysadmin. Most defacers will warn users and disable the phishing site so it can’t hurt anyone. In this case, the defacer posted a warning at the top (along with a phone number to call in case anyone wants to help catch the phisher) but then left the site intact, so it can still swipe user credentials. That’s like finding a hole in the road that someone could fall into and only putting up a warning sign. Fill in the hole with dirt too!

So here’s defacing-a-phishing-site law #1: when defacing a phishing site, make sure you break it so no one can get hurt.