May 16, 2006

The Database and the Telephone Company

Kim DuToit has the definitive smackdown on the whole NSA Telephone database hype. From The Other Side of Kim:
Database ClueBat
I don’t know much about a lot of stuff, but I know a great deal about databases and how to use them, and I especially know a great deal about how to manage usage of terabytes of data. In a past life, I ran a customer database of grocery purchases (those annoying little loyalty cards that most supermarkets use to collect your data).

Just so we’re all clear on this concept: the average supermarket carries about 40,000 different items (called stock-keeping units, or SKUs), and the average supermarket processes about one million transactions (sometimes called “baskets”) a year. The chain I last did this for on a full-time basis had just under 300 stores, and a database of about 3 million active customers (“active” defined as anyone who shopped with us at least once over the past six months).

A lot has been written about how these programs intrude on people’s privacy, and how this means that your shopping purchases can be tracked. Allow me to reassure you: almost nobody ever looks at a single customer’s item purchases—there are just too many items, and too many customers.

What I did was design ways to make data management easier—it’s what I still do—and I always operated on the 80:20 principle (that 20% of the people will account for 80% of the activity).

Which meant that I ignored 80% of all customers’ information. I was only interested in those people who spent a lot of money with us (the 20%), because the data showed that not only did those people account for 80% of sales, they accounted for about 98% of our profits.

And the reason I only looked at that group was that if I could effect a change in their behavior (get them to spend a little more each week, for instance), the effect on the entire business was disproportionate to the effort involved.
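To make the 80:20 arithmetic concrete, here is a minimal Python sketch. It is an illustration only, not code from Kim's post: the function name `top_share`, the customer IDs, and the spend figures are all invented for the example.

```python
def top_share(spend_by_customer, top_fraction=0.2):
    """Return the share of total spend that comes from the top
    `top_fraction` of customers, ranked by how much they spent."""
    if not spend_by_customer:
        return 0.0
    # Rank customers from biggest spender to smallest.
    amounts = sorted(spend_by_customer.values(), reverse=True)
    # Always look at at least one customer, even in a tiny data set.
    top_n = max(1, round(len(amounts) * top_fraction))
    total = sum(amounts)
    return sum(amounts[:top_n]) / total if total else 0.0


# Hypothetical data: ten loyalty-card holders, two of them heavy spenders.
spend = {"c01": 4000, "c02": 4000}
spend.update({f"c{i:02d}": 250 for i in range(3, 11)})

share = top_share(spend)
print(f"Top 20% of customers account for {share:.0%} of sales")
```

In this made-up data set the two biggest spenders are the only customers worth a second look; the other eight can be ignored in exactly the way the post describes.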

More to the point, in all that time, I can count on two hands the number of times each year that I ever looked at any single customer’s purchases—and even then, it was to check the data, or for a merchandising purpose. Here’s an example: suppose the buyers decided that a particular item wasn’t selling, and they decided to discontinue ("de-list") the item in favor of one which was selling more, or to give the slow item’s shelf space to an existing best-seller. Good, sound merchandising.

However, if that item was being bought by our best customers, then I would argue for the item to be kept in stock, because if the customer didn’t find it at our store, she would go and find it somewhere else and we could, potentially, lose that “best” customer to our competitor—which was our biggest nightmare.
Bona fides established, he then gets to the meat of the matter:
The reason they’ve been collecting this data since 9/11 was because someone at NSA was being really, really smart: if terrorists are communicating by phone, it’s possible to establish linkages between numbers, and install pattern-recognition software to collect those linkages. And the reason that this was a smart thing to do is a simple one: the phone company doesn’t store this data beyond (maybe) a few years—the amount is just too massive to hold forever—and lest we forget, we’re coming up on the 5th anniversary of 9/11 already.

Note that none of this requires any names, nor the content of the calls—that would be the privacy of the thing, and that’s where it seems that the NSA, if they’re telling the truth, has been quite circumspect.

But what this data gives the smart analyst is that when you establish that (357) 243-3006 belonged to Abdul El-Bomba, who received a call from his brother Aziz, a known member of Hezbollah in Syria, you now have the ability to focus only on all the calls Abdul made and received, to see who was calling him and whom he was calling. That would be a couple hundred calls, out of the (literally) tens of billions of records you’ve collected.

Here’s the Big Clue for the Clueless: if you don’t collect all the data, you can’t narrow the search at all. And it’s only once you’ve established that Abdul is a Bad Guy that you ascertain his number, and the numbers of his correspondents, and their names. Most of the calls will be innocent: the dry cleaners, the gas company, the liquor store, whatever.

But out of the couple hundred calls, you may find five that are to Mohamed Semmteks, and to Tariq Pilota, who are also terrorists, and whose calls you can now start investigating.

So from tens of billions to a couple hundred to five. And in these cases, it’s NOW when you, as the investigator, can get a warrant for a wiretap so you can start listening to actual content, which, out of all the data mentioned so far, is the only part protected by the First Amendment.
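The narrowing described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a description of any real system: the record layout (caller, callee pairs with no names and no content), the function names, the watchlist, and every number other than the one quoted in the text are hypothetical.

```python
from collections import Counter


def contacts_of(records, number):
    """Count the calls between `number` and each other number.
    `records` is an iterable of (caller, callee) pairs."""
    counts = Counter()
    for caller, callee in records:
        if caller == number:
            counts[callee] += 1
        elif callee == number:
            counts[caller] += 1
    return counts


def flagged_contacts(records, number, watchlist):
    """Keep only the contacts of `number` that are already on a watchlist."""
    return {n: c for n, c in contacts_of(records, number).items() if n in watchlist}


TARGET = "357-243-3006"  # the example number used in the text

# Hypothetical call records; all other numbers are made up.
records = [
    (TARGET, "555-0101"),      # say, the dry cleaners
    ("555-0102", TARGET),      # say, the gas company
    (TARGET, "555-0199"),
    ("555-0199", TARGET),
    ("555-0101", "555-0102"),  # unrelated call, never touches the target
]
watchlist = {"555-0199", "555-0150"}  # hypothetical known-suspect numbers

print(contacts_of(records, TARGET))
print(flagged_contacts(records, TARGET, watchlist))
```

The first step only works if the unrelated records were collected too, since there is no way to know in advance which rows will involve the target; the second step is where the few interesting numbers fall out of the couple hundred.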

That’s how to do it—and more importantly, that’s the only way to do it when you’re starting from scratch.
No violation of the 1st Amendment, and it is helping to keep this nation secure. What's not to love? If you feel outraged by this, go here now. Or emigrate, like many of you promised but so few of you actually did.

Posted by DaveH at May 16, 2006 12:27 AM
Comments

Not the point. The point is that we have a history of misusing every tool for political purposes. That is the nature of man, an untrustworthy creature. It is why the Founding Fathers designed the Constitution to protect us from our own government.

Remember your Benjamin Franklin: "They who would give up an essential liberty for temporary security, deserve neither liberty nor security."

Posted by: Don McArthur at May 16, 2006 6:26 AM