Book: Code 2.0
Architectures of Identification
Most who use the Internet have no real sense about whether their behavior is monitored, or traceable. Instead, the experience of the Net suggests anonymity. Wikipedia doesn’t say “Welcome Back, Larry” when I surf to its site to look up an entry, and neither does Google. Most, I expect, take this lack of acknowledgement to mean that no one is noticing.
But appearances are quite deceiving. In fact, as the Internet has matured, the technologies for linking behavior with an identity have increased dramatically. You can still take steps to assure anonymity on the Net, and many depend upon that ability to do good (human rights workers in Burma) or evil (coordinating terrorist plots). But to achieve that anonymity takes effort. For most of us, our use of the Internet has been made at least traceable in ways most of us would never even consider possible.
Consider first the traceability resulting from the basic protocols of the Internet, TCP/IP. Whenever you make a request to view a page on the Web, the web server needs to know where to send the packets of data that will appear as a web page in your browser. Your computer thus tells the web server where you are (in IP space at least) by revealing an IP address.
As I’ve already described, the IP address itself doesn’t reveal anything about who you are, or where in physical space you come from. But it does enable a certain kind of trace. If (1) you have gotten access to the web through an Internet Service Provider (ISP) that assigns you an IP address while you’re on the Internet and (2) that ISP keeps the logs of that assignment, then it’s perfectly possible to trace your surfing back to you.
How?
Well, imagine you’re angry at your boss. You think she’s a blowhard who is driving the company into bankruptcy. After months of frustration, you decide to go public. Not “public” as in a press conference, but public as in a posting to an online forum within which your company is being discussed.
You know you’d get in lots of trouble if your criticism were tied back to you. So you take steps to be “anonymous” on the forum. Maybe you create an account in the forum under a fictitious name, and that fictitious name makes you feel safe. Your boss may see the nasty post, but even if she succeeds in getting the forum host to reveal what you said when you signed up, all that stuff was bogus. Your secret, you believe, is safe.
Wrong. In addition to the identification that your username might, or might not, provide, if the forum is on the web, then it knows the IP address from which you made your post. With that IP address, and the time you made your post, using “a reverse DNS look-up[4]”, it is simple to identify the Internet Service Provider that gave you access to the Internet. And increasingly, it is relatively simple for the Internet Service Provider to check its records to reveal which account was using that IP address at that specified time. Thus, the ISP could (if required) say that it was your account that was using the IP address that posted the nasty message about your boss. Try as you will to deny it (“Hey, on the Internet, no one knows you’re a dog!”), I’d advise you to give up quickly. They’ve got you. You’ve been trapped by the Net. Dog or no, you’re definitely in the doghouse.
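The second half of that trace, the ISP matching an IP address and a time against its own records, can be sketched in a few lines. This is only an illustration under stated assumptions: the `Lease` record, the `LEASES` log, the account names, and the addresses (taken from a range reserved for documentation) are all hypothetical, and a real ISP's records would look different. The first half, the reverse DNS look-up, is available in Python as `socket.gethostbyaddr`, but it is left out here because it requires a live network.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Lease:
    """One hypothetical ISP log entry: which account held an IP, and when."""
    ip: str
    account: str
    start: datetime
    end: datetime


# Hypothetical assignment log; the same address passes between customers.
LEASES: List[Lease] = [
    Lease("203.0.113.7", "account-1001",
          datetime(2006, 1, 10, 8, 0), datetime(2006, 1, 10, 12, 0)),
    Lease("203.0.113.7", "account-2002",
          datetime(2006, 1, 10, 12, 0), datetime(2006, 1, 10, 18, 0)),
]


def account_for(ip: str, when: datetime,
                leases: List[Lease] = LEASES) -> Optional[str]:
    """Return the account that held `ip` at time `when`, or None if no record."""
    for lease in leases:
        if lease.ip == ip and lease.start <= when < lease.end:
            return lease.account
    return None  # no log kept, or the log has already been discarded


# The forum supplies the IP address and the time of the post.
print(account_for("203.0.113.7", datetime(2006, 1, 10, 13, 30)))
```

The sketch also shows why the trace depends on the logs being kept: once no matching record exists, the look-up returns nothing.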
Now again, what made this tracing possible? No plan by the NSA. No strategy of Microsoft. Instead, what made this tracing possible was a by-product of the architecture of the Web and the architecture of ISPs charging access to the Web. The Web must know an IP address; ISPs require identification before they assign an IP address to a customer. So long as the log records of the ISP are kept, the transaction is traceable. Bottom line: If you want anonymity, use a pay phone!
This traceability in the Internet raised some important concerns at the beginning of 2006. Google announced it would fight a demand by the government to produce one million sample searches. (MSN and Yahoo! had both complied with the same request.) That request was made as part of an investigation the government was conducting to support its defense of a statute designed to block kids from porn. And though the request promised the data would be used for no other purpose, it raised deep concerns in the Internet community. Depending upon the data that Google kept, the request showed in principle that it was possible to trace legally troubling searches back to individual IP addresses (and to individuals with Google accounts). Thus, for example, if your Internet address at work is a fixed-IP address, then every search you’ve ever made from work is at least possibly kept by Google. Does that make you concerned? And assume for the moment you are not a terrorist: Would you still be concerned?
A link back to an IP address, however, only facilitates tracing, and even then the traceability is imperfect. ISPs ordinarily don’t keep data for long; some don’t keep assignment records at all. And if you’ve accessed the Internet at an Internet café, then there’s no reason to believe anything could be traced back to you. So still, the Internet provides at least some anonymity.
But IP tracing isn’t the only technology of identification that has been layered onto the Internet. A much more pervasive technology was developed early in the history of the Web to make the web more valuable to commerce and its customers. This is the technology referred to as “cookies.”
When the World Wide Web was first deployed, the protocol simply enabled people to view content that had been marked up in a special markup language. This language (HTML) made it easy to link to other pages, and it made it simple to apply basic formatting to the content (bold, or italics, for example).
But the one thing the protocol didn’t enable was a simple way for a website to know which machines had accessed it. The protocol was “state-less.” When a web server received a request to serve a web page, it didn’t know anything about the state of the requester before that request was made.[5]
From the perspective of privacy, this sounds like a great feature for the Web. Why should a website know anything about me if I go to that site to view certain content? You don’t have to be a criminal to appreciate the value in anonymous browsing. Imagine libraries kept records of every time you opened a book at the library, even for just a second.
Yet from the perspective of commerce, this “feature” of the original Web is plainly a bug, and not because commercial sites necessarily want to know everything there is to know about you. Instead, the problem is much more pragmatic. Say you go to Amazon.com and indicate you want to buy 20 copies of my latest book. (Try it. It’s fun.) Now your “shopping cart” has 20 copies of my book. You then click on the icon to check out, and you notice your shopping cart is empty. Why? Well because, as originally architected, the Web had no easy way to recognize that you were the same entity that just ordered 20 books. Or put differently, the web server would simply forget you. The Web as originally built had no way to remember you from one page to another. And thus, the Web as originally built would not be of much use to commerce.
But as I’ve said again and again, the way the Web was is not the way the Web had to be. And so those who were building the infrastructure of the Web quickly began to think through how the web could be “improved” to make it easy for commerce to happen. “Cookies” were the solution. In 1994, Netscape introduced a protocol to make it possible for a web server to deposit a small bit of data on your computer when you accessed that server. That small bit of data — the “cookie” — made it possible for the server to recognize you when you traveled to a different page. Of course, there are lots of other concerns about what that cookie might enable. We’ll get to those in the chapter about privacy. The point that’s important here, however, is not the dangers this technology creates. The point is the potential and how that potential was built. A small change in the protocol for client-server interaction now makes it possible for websites to monitor and track those who use the site.
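The difference a cookie makes can be seen in a small sketch. It is not Netscape's protocol as specified, only an illustration under assumptions: the `handle` function, the `CARTS` table, and the cookie name `sid` are hypothetical, and the "server" is an ordinary function rather than a real web server. Python's standard `http.cookies` module does the parsing.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Dict, List, Optional, Tuple

# Hypothetical server-side memory: cookie value -> shopping cart contents.
CARTS: Dict[str, List[str]] = {}


def handle(cookie_header: Optional[str],
           add_item: Optional[str] = None) -> Tuple[List[str], Optional[str]]:
    """Serve one request. Returns (cart contents, Set-Cookie value or None)."""
    incoming = SimpleCookie()
    incoming.load(cookie_header or "")

    set_cookie = None
    if "sid" in incoming and incoming["sid"].value in CARTS:
        # The browser returned a cookie we issued: same machine as before.
        sid = incoming["sid"].value
    else:
        # No cookie: the server cannot tell this visitor from any other,
        # so it starts a fresh cart and deposits a new identifier.
        sid = secrets.token_hex(8)
        CARTS[sid] = []
        outgoing = SimpleCookie()
        outgoing["sid"] = sid
        set_cookie = outgoing["sid"].OutputString()

    if add_item is not None:
        CARTS[sid].append(add_item)
    return list(CARTS[sid]), set_cookie


# First page: add a book and receive a cookie.
cart, cookie = handle(None, add_item="book")
# Checkout page with the cookie sent back: the cart is remembered.
print(handle(cookie)[0])
# Checkout page without the cookie: the server has "forgotten" you.
print(handle(None)[0])
```

Without the returned cookie, every request looks like a first visit, which is the empty shopping cart described above.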
This is a small step toward authenticated identity. It’s far from that, but it is a step toward it. Your computer isn’t you (yet). But cookies make it possible for the computer to authenticate that it is the same machine that was accessing a website a moment before. And it is upon this technology that the whole of web commerce initially was built. Servers could now “know” that this machine is the same machine that was here before. And from that knowledge, they could build a great deal of value.
Now again, strictly speaking, cookies are nothing more than a tracing technology. They make it simple to trace a machine across web pages. That tracing doesn’t necessarily reveal any information about the user. Just as we could follow a trail of cookie crumbs in real space to an empty room, a web server could follow a trail of “mouse droppings” from the first entry on the site until the user leaves. In both cases, nothing is necessarily revealed about the user.
But sometimes something important is revealed about the user by association with data stored elsewhere. For example, imagine you enter a site, and it asks you to reveal your name, your telephone number, and your e-mail address as a condition of entering a contest. You trust the website, and do that, and then you leave the website. The next day, you come back, and you browse through a number of pages on that website. In this interaction, of course, you’ve revealed nothing. But if a cookie was deposited on your machine through your browser (and you have not taken steps to remove it), then when you return to the site, the website again “knows” all these facts about you. The cookie traces your machine, and this trace links back to a place where you provided information the machine would not otherwise know.
The traceability of IP addresses and cookies is the default on the Internet now. Again, steps can be taken to avoid this traceability, but the vast majority of us don’t take them. Fortunately, for society and for most of us, what we do on the Net doesn’t really concern anyone. But if it did concern someone, it wouldn’t be hard to track us down. We are a people who leave our “mouse droppings” everywhere.
This default traceability, however, is not enough for some. They require something more. That was Harvard’s view, as I noted in the previous chapter. That is also the view of just about all private networks today. A variety of technologies have developed that enable stronger authentication by those who use the Net. I will describe two of these technologies in this section. But it is the second of these two that will, in my view, prove to be the most important.
The first of these technologies is the Single Sign-on (SSO) technology. This technology allows someone to “sign-on” to a network once, and then get access to a wide range of resources on that network without needing to authenticate again. Think of it as a badge you wear at your place of work. Depending upon what the badge says (“visitor” or “researcher”) you get different access to different parts of the building. And like a badge at a place of work, you get the credential by giving up other data. You give the receptionist an ID; he gives you a badge; you wear that badge wherever you go while at the business.
The most commonly deployed SSO is a system called Kerberos. But there are many different SSOs out there — Microsoft’s Passport system is an example — and there is a strong push to build federated SSOs for linking many different sites on the Internet. Thus, for example, in a federated system, I might authenticate myself to my university, but then I could move across any domain within the federation without authenticating again. The big advantage in this architecture is that I can authenticate to the institution I trust without spreading lots of data about myself to institutions I don’t trust.
SSOs have been very important in building identity into the Internet. But a second technology, I believe, will become the most important tool for identification in the next ten years. This is because this alternative respects important architectural features of the Internet, and because the demand for better technologies of identification will continue to be strong. Forget the hassle of typing your name and address at every site you want to buy something from. You only need to think about the extraordinary growth in identity theft to recognize there are many who would be eager to see something better come along.
To understand this second system, think first about how credentials work in real space[6]. You’ve got a wallet. In it is likely to be a driver’s license, some credit cards, a health insurance card, an ID for where you work, and, if you’re lucky, some money. Each of these cards can be used to authenticate some fact about you — again, with very different levels of confidence. The driver’s license has a picture and a list of physical characteristics. That’s enough for a wine store, but not enough for the NSA. The credit card has your signature. Vendors are supposed to use that data to authenticate that the person who signs the bill is the owner of the card. If the vendor becomes suspicious, she might demand that you show an ID as well.
Notice the critical features of this “wallet” architecture. First, these credentials are issued by different entities. Second, depending upon their technology, they offer different levels of confidence. Third, I’m free to use these credentials in ways never originally planned or intended by the issuer of the credential. The Department of Motor Vehicles never coordinated with Visa to enable driver’s licenses to be used to authenticate the holder of a credit card. But once the one was prevalent, the other could use it. And fourth, nothing requires that I show all my cards when I can use just one. That is, to show my driver’s license, I don’t also reveal my health insurance card. Or to use my Visa, I don’t also have to reveal my American Express card.
These same features are at the core of what may prove to be the most important addition to the effective architecture of the Internet since its birth. This is a project being led by Microsoft to essentially develop an Identity Metasystem — a new layer of the Internet, an Identity Layer, that would complement the existing network layers to add a new kind of functionality. This Identity Layer is not Microsoft Passport, or some other Single Sign-On technology. Instead it is a protocol to enable a kind of virtual wallet of credentials, with all the same attributes of the credentials in your wallet — except better. This virtual wallet will not only be more reliable than the wallet in your pocket, it will also give you the ability to control more precisely what data about you is revealed to those who demand data about you.
For example, in real space, your wallet can easily be stolen. If it’s stolen, then there’s a period of time when it’s relatively easy for the thief to use the cards to buy stuff. In cyberspace, these wallets are not easily stolen. Indeed, if they’re architected well, it would be practically impossible to “steal” them. Remove the cards from their holder, and they become useless digital objects.
Or again, in real space, if you want to authenticate that you’re over 21 and therefore can buy a six-pack of beer, you show the clerk your driver’s license. With that, he authenticates your age. But with that bit of data, he also gets access to your name, your address, and in some states, your social security number. Those other bits of data are not necessary for him to know. In some contexts, depending on how creepy he is, these data are exactly the sort you don’t want him to know. But the inefficiencies of real-space technologies reveal these data. This loss of privacy is a cost of doing business.
The virtual wallet would be different. If you need to authenticate your age, the technology could authenticate that fact alone — indeed, it could authenticate simply that you’re over 21, or over 65, or under 18, without revealing anything more. Or if you need to authenticate your citizenship, that fact can be certified without revealing your name, or where you live, or your passport number. The technology is crafted to reveal just what you want it to reveal, without also revealing other stuff. (As one of the key architects for this metasystem, Kim Cameron, described it: “To me, that’s the center of the system.[7]”) And, most importantly, using the power of cryptography, the protocol makes it possible for the other side to be confident about the fact you reveal without requiring any more data.
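The idea of certifying one fact and nothing more can be illustrated with a rough sketch. It is not the Identity Metasystem's protocol, and it makes a large simplifying assumption: the issuer and the verifier share a secret key (`ISSUER_KEY`) and use an HMAC, whereas a real system would presumably rely on public-key signatures so that the verifier could check a claim without being able to forge one. The function names and the shape of the token are hypothetical.

```python
import hashlib
import hmac
import json
from datetime import date

# Assumption: a demo key shared by issuer and verifier (not how a real
# deployment would work; see the note above about public-key signatures).
ISSUER_KEY = b"demo-key-shared-with-verifier"


def _tag(claim: dict) -> str:
    """Compute the issuer's authentication tag over a claim."""
    payload = json.dumps(claim, sort_keys=True).encode()
    return hmac.new(ISSUER_KEY, payload, hashlib.sha256).hexdigest()


def issue_claim(birth_date: date, today: date, min_age: int) -> dict:
    """Issuer side: sees the birth date, but certifies only 'over min_age'."""
    had_birthday = (today.month, today.day) >= (birth_date.month, birth_date.day)
    age = today.year - birth_date.year - (0 if had_birthday else 1)
    claim = {"over": min_age, "holds": age >= min_age}
    return {"claim": claim, "tag": _tag(claim)}


def verify_claim(token: dict) -> bool:
    """Verifier side: learns one yes/no fact, and only if the tag checks out."""
    genuine = hmac.compare_digest(_tag(token["claim"]), token["tag"])
    return genuine and bool(token["claim"]["holds"])


token = issue_claim(date(1980, 5, 1), date(2006, 1, 1), min_age=21)
print(token["claim"])        # no name, address, or birth date inside
print(verify_claim(token))
```

The clerk in this picture receives the token alone. The birth date stays with the issuer, and a token altered after issuance fails verification.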
The brilliance in this solution to the problems of identification is first that it mirrors the basic architecture of the Internet. There’s no central repository for data; there’s no network technology that everyone must adopt. There is instead a platform for building identity technologies that encourages competition among different privacy and security providers — TCP/IP for identity. Microsoft may be leading the project, but anyone can build for this protocol. Nothing ties the protocol to the Windows operating system. Or to any other specific vendor. As Cameron wisely puts it, “it can’t be owned by any one company or any one country . . . or just have the technology stamp of any one engineer.[8]”
The Identity Layer is infrastructure for the Internet. It gives value (and raises concerns) to many beyond Microsoft. But though Microsoft’s work is an important gift to the Internet, the Identity Layer is not altruism. “Microsoft’s strategy is based on web services”, Cameron described to me. “Web services are impossible without identity.[9]” There is important public value here, but private interest is driving the deployment of this public value.
The Identity Layer would benefit individuals, businesses, and the government, but each differently. Individuals could more easily protect themselves from identity theft[10]; if you get an e-mail from PayPal demanding you update your account, you’ll know whether the website is actually PayPal. Or if you want to protect yourself against spam, you could block all e-mail that doesn’t come from an authenticated server. In either case, the technology is increasing confidence about the Internet. And the harms that come from a lack of confidence — mainly fraud — would therefore be reduced.
Commerce too would benefit from this form of technology. It too benefits from the reduction of fraud. And it too would benefit from a more secure infrastructure for conducting online transactions.
And finally, the government would benefit from this infrastructure of trust. If there were a simple way to demand that people authenticate facts about themselves, it would be easier for the government to insist that they do so. If it were easier to have high confidence that the person on the website was who he said he was, then it would be cheaper to deliver certain information across the web.
But while individuals, commerce, and government would all benefit from this sort of technology, there is also something that each could lose.
Individuals right now can be effectively anonymous on the Net. A platform for authenticated identity would make anonymity much harder. We might imagine, for example, a norm developing to block access to a website by anyone not carrying a token that at least made it possible to trace back to the user — a kind of driver’s license for the Internet. That norm, plus this technology, would make anonymous speech extremely difficult.
Commerce could also lose something from this design. To the extent that there are simple ways to authenticate that I am the authorized user of this credit card, for example, it’s less necessary for websites to demand all sorts of data about me — my address, my telephone numbers, and in one case I recently encountered, my birthday. That fact could build a norm against revealing extraneous data. But that data may be valuable to business beyond simply confirming a charge.
And governments, too, may lose something from this architecture of identification. Just as commerce may lose the extra data that individuals need to reveal to authenticate themselves, so too will the government lose that. It may feel that such data is necessary for some other purpose, but gathering it would become more difficult.
Each of these benefits and costs can be adjusted, depending upon how the technology is implemented. And as the resulting mix of privacy and security is the product of competition and an equilibrium between individuals and businesses, there’s no way up front to predict what it will be.
But for our purposes, the only important fact to notice is that this infrastructure could effectively answer the first question that regulability requires answering: Who did what where? With an infrastructure enabling cheap identification wherever you are, the frequency of unidentified activity falls dramatically.
This final example of an identification technology throws into relief an important fact about encryption technology. The Identity Layer depends upon cryptography. It thus demonstrates the sense in which cryptography is Janus-faced. As Stewart Baker and Paul Hurst put it, cryptography “surely is the best of technologies and the worst of technologies. It will stop crimes and it will create new crimes. It will undermine dictatorships, and it will drive them to new excesses. It will make us all anonymous, and it will track our every transaction.[11]”
Cryptography can be all these things, both good and bad, because encryption can serve two fundamentally different ends. In its “confidentiality” function it can be “used to keep communications secret.” In its “identification” function it can be “used to provide forgery-proof digital identities.[12]” It enables freedom from regulation (as it enhances confidentiality), but it can also enable more efficient regulation (as it enhances identification).[13]
Its traditional use is secrets. Encrypt a message, and only those with the proper key can open and read it. This type of encryption has been around as long as language itself. But until the mid-1970s it suffered from an important weakness: the same key that was used to encrypt a message was also used to decrypt it. So if you lost that key, all the messages hidden with that key were also rendered vulnerable. If a large number of messages were encrypted with the same key, losing the key compromised the whole archive of secrets protected by the key. This risk was significant. You always had to “transport” the key needed to unlock the message, and inherent in that transport was the risk that the key would be lost.
In the mid-1970s, however, a breakthrough in encryption technique was announced by two computer scientists, Whitfield Diffie and Martin Hellman[14]. Rather than relying on a single key, the Diffie-Hellman system used two keys — one public, the other private. What is encrypted with one can be decrypted only with the other. Even with one key there is no way to infer the other.
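The two-key property can be made concrete with a toy example. The sketch below uses textbook RSA, a later system (published in 1977) that realizes the public-key idea; it is not the scheme in Diffie and Hellman's own paper, and the primes are far too small to be secure. With numbers this small the private key can easily be inferred by factoring `n`; the claim that one key does not reveal the other holds only at realistic key sizes. All variable names are illustrative.

```python
# Toy textbook RSA with tiny primes: for illustration only, not secure.
p, q = 61, 53
n = p * q                      # modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                         # public exponent
d = pow(e, -1, phi)            # private exponent (Python 3.8+)


def apply_key(message: int, exponent: int, modulus: int = n) -> int:
    """Transform a number with one half of the key pair."""
    return pow(message, exponent, modulus)


message = 65

# Confidentiality: anyone encrypts with the public key,
# and only the private key recovers the message.
ciphertext = apply_key(message, e)
assert apply_key(ciphertext, d) == message

# Identification: the holder transforms with the private key,
# and anyone can check the result with the public key.
signature = apply_key(message, d)
assert apply_key(signature, e) == message

print("both directions round-trip")
```

The same pair of keys serves both ends described in this section: the first direction keeps a secret, the second shows that a message came from the holder of the private key.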
This discovery was the clue to an architecture that could build an extraordinary range of confidence into any network, whether or not the physical network itself was secure[15]. And again, that confidence could both make me confident that my secrets won’t be revealed and make me confident that the person using my site just now is you. The technology therefore works to keep secrets, but it also makes it harder to keep secrets. It works to make stuff less regulable, and more regulable.
In the Internet’s first life, encryption technology was on the side of privacy. Its most common use was to keep information secret. But in the Internet’s next life, encryption technology’s most important role will be in making the Net more regulable. As an Identity Layer gets built into the Net, the easy ability to demand some form of identity as a condition to accessing the resources of the Net increases. As that ability increases, its prevalence will increase as well. Indeed, as Shawn Helms describes, the next generation of the Internet Protocol — IPv6 — “marks each packet with an encryption ‘key’ that cannot be altered or forged, thus securely identifying the packet’s origin. This authentication function can identify every sender and receiver of information over the Internet, thus making it nearly impossible for people to remain anonymous on the Internet.[16]”
And even if not impossible, sufficiently difficult for the vast majority of us. Our packets will be marked. We — or something about us — will be known.