CHAPTER 1
Introduction

1.1 The fundamental premise

Technology does not exist in a social vacuum. The design and patterns of use of any particular technological artifact have implications both for the direct users of the technology, and for society at large. Decisions made by technology designers and implementors thus have political implications that are often ignored. If these implications are not made a part of the design process, the resulting effects on society can be quite undesirable.

The research advanced here therefore begins with a political decision: It is almost always a greater social good to protect personal information against unauthorized disclosure than it is to allow such disclosure. This decision is expressly in conflict with those of many businesses and government entities. Starting from this premise, a multi-agent architecture was designed that uses both strong cryptography and decentralization to enable a broad class of Internet-based software applications to handle personal information in a way that is highly resistant to disclosure. Further, the design is robust in ways that can enable users to trust it more easily: They can trust it to keep private information private, and they can trust that no single entity can take the system away from them. Thus, by starting with the explicit political goal of encouraging well-placed user trust, the research described here not only makes its social choices clear, it also demonstrates certain technical advantages over more traditional approaches.

We discuss the political and technical background of this research, and explain what sorts of applications are enabled by the multi-agent architecture proposed. We then describe a representative example of this architecture -- the Yenta matchmaking system. Yenta uses the coordinated interaction of large numbers of agents to form coalitions of users across the Internet who share common interests, and then enables both one-to-one and group conversations among them. It does so with a high degree of privacy, security, and robustness, without requiring its users to place unwarranted trust in any single point in the system.

The research advanced here attempts to break a false dichotomy, in which systems designers force their users to sacrifice some part of a fundamental right -- their privacy -- in order to gain some utility -- the use of the application. We demonstrate that, for a broad class of applications, which we carefully describe, this dichotomy is indeed false -- that there is no reason for users to have to make such a decision, and no reason for systems designers to force it upon them.

If systems architects understand that there is not necessarily a dichotomy between privacy and functionality, then they will no longer state a policy decision -- whether to ask users to give up a right -- as a technical decision -- one required by the nature of the technology. Casting decisions of corporate or government policy as technical decisions has confused public debate about a number of technologies. This work attempts to undo some of this confusion.

The research presented here is thus intended to serve as an exemplar. The techniques presented here, and the sample application which demonstrates them, are intended to serve as examples for other systems architects who design systems that must manipulate large quantities of personal information.

1.2 What's ahead?
In this chapter, we shall:
o Describe which type of privacy we are most interested in protecting (Section 1.3)
o Discuss the concept of privacy as a right, not a privilege (Section 1.4)
o Show some of the technical, social, and political problems with centralized manipulation of personal information (Section 1.5)
o Show some of the advantages of a decentralized solution (Section 1.6)
o Discuss the components of the work presented here, specifically its architecture, the sample application of that architecture, the implementation of that application, and issues of deployment and evaluation (Section 1.7)
o Briefly summarize the remaining chapters of this dissertation (Section 1.7)

Later chapters will:
o Chapter 2: Discuss the system architecture for the general case
o Chapter 3: Analyze user privacy and system security
o Chapter 4: Detail the sample application -- the matchmaking system Yenta
o Chapter 5: Discuss the evaluation of the architecture and of Yenta
o Chapter 6: Examine some related work
o Chapter 7: Draw some general conclusions

1.3 What are we protecting?

Privacy means different things to different people, and can be invoked in many contexts. We define privacy here as the protection of identifiable, personal information about a particular person from disclosure to third parties who are not the intended recipients of this information. This definition deserves explanation, and we shall explain it below. We shall also touch upon some related concepts, such as trust and anonymity, which are required in this explanation.

Protection
Protecting a piece of information means keeping it from being transmitted to certain parties. Which parties are not supposed to have the information is dependent upon the wishes of the information's owner. This process is transitive -- if party A willingly transmits some information about itself to party B, but party B then transmits this information to some party C, which A did not wish to know it, then the information has not been protected. Such issues of transitivity thus lead to issues of trust (see below) and issues of assignment of blame -- whether the fault lies with A (who trusted B not to disclose the information, and had this trust violated), or with B (who disclosed the information without authorization to C), or with both, depends on our goal in asking the question.

Identifiability and unlinkability
In many cases, disclosure of information is acceptable if the information cannot be traced to the individual to whom the information refers -- we refer to this as unlinkability. This is obvious in, for example, the United States Census, which, ideally, asks a number of questions about every citizen in the country. The answers to these questions are often considered by those who answer them to be private information, but they are willing to answer anyway for two reasons: the collection of the information is deemed to have utility for the country as a whole, and the collectors of the information make assurances that the information will not be identifiable, meaning that it will not be possible to know which individual answered any given question in any particular way -- the respondents are anonymous. Because the Census data is gathered in a centralized fashion, it leads to a concentration of value which makes trust an important issue: central concentrations of data are more subject to institutional abuse, and make more tempting targets for outsiders to compromise.
Particular person
Whether or not the information is about a particular person -- someone who is identifiable and is linkable to the information -- or is instead about an aggregate can make a large difference in its sensitivity to disclosure. Aggregate information is usually considered less sensitive -- although cross-correlation between separate databases which describe the same individuals can often be extremely effective at revealing individuals again in the data, and represents a serious threat to systems which depend for their security solely on aggregation of data [169].

Personal information
When we use the term personal information, we mean information that is known by some particular individual about himself, or which is known to some set of parties whom that individual considers to be authorized to know it. If no one else knows this information yet, the individual is said to control this information, since its disclosure to anyone else is presumably, at this moment, completely up to the individual himself. We are not referring to the situation whereby party A knows something about party B that B does not know about himself. Such situations might arise, for example, in the context of medical data which is known to a physician but has not yet (or, perhaps, is not ever) revealed to the patient. In this case, B cannot possibly protect this information from disclosure, for two reasons: B does not have it, and the information is known by someone who may or may not be under A's control.

Disclosure
If personal information about someone is not disclosed, then it is known only to the originator of that information. In this case, the information is still private. One of the central problems addressed by this dissertation is how to disclose certain information so that it may be used in an application, while still giving the subject control over it.

Third parties
Many existing applications which handle personal information do so by surrendering it, in one way or another, to a third party. This work attempts to demonstrate that this is not always required. In many instances, there is no need for the third party to know -- knowledge of this information by the third party will not benefit the person whom the information is about. We usually use the term third party to mean some other entity which does not have a compelling need to know.

Intended recipients
The intended recipient of some information is the party whom the subject desires to have some piece of personal information. If the set of intended recipients is empty, then the information is totally private, and, barring involuntary disclosures such as search and seizure, the information will stay private. The work presented here concerns cases where, for whatever reason, the set of intended recipients is nonempty.

Trust
Whenever private information is surrendered to an intended recipient, the subject trusts the recipient, to one degree or another, not to disclose this information to third parties. (If the subject has no trust in the recipient at all, but discloses anyway, either the subject is acting against his own best interests, or the information was not actually private to begin with -- in other words, if the information is public and it does not matter who knows it, then there is no issue of trust.) Trust can be misplaced. A robust design for any system, social or technological, that handles private information generally specifies that trust be extended to as few entities as possible, and in as minimal a way as possible to each one.
This minimizes the probability of disclosure and the degree of damage that can be done by disclosure due to a violation of the trust extended by the subject.

Anonymity and pseudonymity
In discussing unlinkability of information, such as that expected by respondents to the US Census, we mentioned that the respondents trust that they are anonymous. To be fully anonymous is to know that information about oneself cannot be associated with one's physical extension -- the actual individual's body -- or with any other anonymous individual -- all anonymous individuals, to a first approximation, might as well be the same person. This also means that the individual's real-world personal reputation, and any identities in the virtual world (such as electronic mail identification), are similarly dissociated from the information. Full anonymity is not always possible, or desired, in all applications -- for example, most participants in a MUD are pseudonymous [20][33][49][59][60][116]. This means that they possess one or more identities, which may be distinguished from other identities in the MUD (hence are not fully anonymous), but which may not be associated with the individual's true physical extension. The remailer operated at anon.penet.fi [77], for example, also used pseudonyms. There are even works of fiction whose primary focus is the mapping between pseudonyms and so-called true names in a virtual environment [176].

Reputations
The reason why the distinction between anonymity, pseudonymity, and true names matters has to do with reputations. In a loose sense, one's reputation is some collection of personally-identifiable information that is associated, across long timespans, with one's identity, and is known to a possibly-large number of others. In the absence of any sort of pseudonymous or anonymous identities, such reputations are directly associated with one's physical extension. This provides some degree of accountability for one's behavior, and can be either an advantage or a disadvantage, depending on that behavior -- those with good reputations in their community are generally afforded greater access to resources, be they social or physical capital, than those with poor reputations. Pseudonymous and anonymous identities provide a degree of decoupling between the actions of their owners and the public identity. Such decoupling can be invaluable in cases where one wishes to take an action that might land the physical extension in trouble. This decoupling has a cost: because a pseudonym, and, particularly, an anonym, is easier to throw away than one's real name or one's body, such identities are often afforded a lower degree of trust by others.

A legal definition
Another way to look at the question of what we are protecting is to examine legal definitions. For a US-centric perspective, consider this definition from Black's Law Dictionary [14]:

Privacy, Right of: The right to be let alone; the right of a person to be free from unwanted publicity; and right to live without unwarranted interference by the public in matters with which the public is not necessarily concerned. Term 'right of privacy' is generic term encompassing various rights recognized to be inherent in concept of ordered liberty, and such right prevents governmental interference in intimate personal relationships or activities, freedoms of individual to make fundamental choices involving himself, his family, and his relationship with others. Industrial Foundation of the South v. Texas Indus. Acc. Bd., Tex., 540 S.W.2d 668, 679.
The right of an individual (or corporation) to withhold himself and his property from public scrutiny, if he so chooses. It is said to exist only so far as its assertion is consistent with law or public policy and in a proper case equity will interfere, if there is no remedy at law, to prevent an injury threatened by the invasion of, or infringement upon, this right from motives of curiosity, gain, or malice. Federal Trade Commission v. American Tobacco Co., 264 U.S. 298, 44 S.Ct. 336, 68 L.Ed. 696. While there is no right of privacy found in any specific guarantees of the Constitution, the Supreme Court has recognized that zones of privacy may be created by more specific constitutional guarantees and thereby impose limits on governmental power. Paul v. Davis, 424 U.S. 693, 712, 96 S.Ct. 1155, 1166, 47 L.Ed.2d 405; Whalen v. Roe, 429 U.S. 589, 97 S.Ct. 869, 51 L.Ed.2d 64. See also Warren and Brandeis, The Right to Privacy, 4 Harv.L.Rev. 193. Tort actions for invasion of privacy fall into four general classes: Appropriation, consisting of appropriation, for the defendant's benefit or advantage, of the plaintiff's name or likeness. Carlisle v. Fawcett Publications, 201 Cal.App.2d 733, 20 Cal.Rptr. 405. Intrusion [...] Public disclosure of private facts, consisting of a cause of action in publicity, of a highly objectionable kind, given to private information about the plaintiff, even though it is true and no action would lie for defamation. Melvin v. Reid, 112 Cal.App. 285, 297 P. 91. [...] False light in the public eye [...]

1.4 The right to privacy

Why is personal privacy worth protecting? Is it a right, which cannot be taken away, or a privilege, to be granted or rescinded based on governmental authority?

Constitutional arguments
In the United States, there is substantial legal basis for treating personal privacy as a right, not a privilege. Consider the Fourth Amendment to the US Constitution, which reads:

The right of the people to be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures, shall not be violated, and no Warrants shall issue, but upon probable cause, supported by Oath or affirmation, and particularly describing the place to be searched, and the persons or things to be seized.

While this passage is the most obvious such instance in the Bill of Rights, it does not explicitly proclaim that privacy itself is a right. There are ample other examples from Constitutional law, however, which have extended the rights granted implicitly by passages such as the Fourth Amendment above. Supreme Court Justice Brandeis, for example, writing in the 1890s and later, virtually created the concept of a Constitutional right to privacy [180]. Consider, for example, this quote from his dissent in Olmstead v. United States [130], written about the then-new technology of telephone wiretapping:

The evil incident to invasion of the privacy of the telephone is far greater than that involved in tampering with the mails. Whenever a telephone line is tapped, the privacy of the persons at both ends of the line is invaded, and all conversations between them upon any subject, and although proper, confidential, and privileged, may be overheard. Moreover, the tapping of one man's telephone line involves the tapping of the telephone of every other person whom he may call, or who may call him.
As a means of espionage, writs of assistance and general warrants are but puny instruments of tyranny and oppression when compared with wire tapping.

Later examples supporting this view include Griswold v. Connecticut [71], in which the Supreme Court struck down a Connecticut statute making it a crime to use or counsel anyone in the use of contraceptives; and Roe v. Wade [147], which specified that there is a Constitutionally-guaranteed right to a personal sphere of privacy, which may not be breached by government intervention.

Moral and functional arguments
But the laws of the United States are not the only basis upon which one may justify a right to privacy -- for one thing, they are only valid in regions in which the United States government is sovereign. It is the author's contention that there is a moral right to privacy, even in the absence of law to that effect, and furthermore that, even in the absence of such a right, it is a social good that personal privacy exists and is protected -- in other words, that personal privacy has a functional benefit. Put another way, even if one were to state that there is no legal or moral reason to be supportive of personal privacy, society functions in a more productive manner if its members are assured that personal privacy can exist. For example, there are spheres of privacy surrounding doctor/patient and attorney/client information which are viewed as so important that they are codified into the legal system of many countries. Without such assurances of confidentiality, certain information might not be exchanged, which would lead to an impairment of the utility of the consultation. One might also argue that the fear of surveillance is itself destructive, and that privacy is a requirement for many sorts of social relations. For example, consider Fried [64]:

Privacy is not just one possible means among others to insure some other value, but . . . it is necessarily related to ends and relations of the most fundamental sort: respect, love, friendship and trust. Privacy is not merely a good technique for furthering these fundamental relations; rather without privacy they are simply inconceivable.

For the purposes of this work, we shall take such moral and social-good assertions as axioms, i.e., as not requiring further justification.

Implications for systems architects
Those who design systems which handle personal information therefore have a special duty: They must not design systems which unnecessarily require, induce, persuade, or coerce individuals into giving up personal privacy in order to avail themselves of the benefits of the system being designed. In other words, system architects have moral, ethical, and perhaps even -- in certain European countries, which have stronger data privacy laws than the US -- legal obligations to design such systems from a standpoint that is protective of individual privacy when it is possible to do so.

There may be strong motives not to design systems in such a fashion that they are protective of personal privacy. We shall investigate some of these motives, with examples, in the next section (Section 1.5), but overall themes include:

o It is often conceptually far simpler to design a system which centralizes information, yet such systems are often easily compromisable, whether through accident, malice, or subpoena.
o The architects of many systems often have an incentive to violate users' privacy, often on a large scale.
The business models of many commercial entities, especially in the United States, depend on the collection of personal information in order to obtain marketing or demographic data, and many entities, such as credit bureaus, exist solely to disseminate this information to third parties. (The European Union, by contrast, has data-protection laws forbidding this [47].)

o Government intervention may dictate that users' privacy be compromised on a large scale. CALEA [21] is a single, well-known example; it requires that US telephone switch manufacturers make their switches so-called tap-ready.

Hiding policy decisions under a veil of technological necessity
An example from the Intelligent Transportation System infrastructure
In many instances, the underlying motives which lead to a system design that is likely to compromise users' privacy are hidden from view. Instead of being clearly articulated as decisions of policy, they are presented as requirements of the particular technological implementation of the system. For example, consider most Intelligent Transportation Systems [18], such as automated tollbooths which collect fees for use of roads. These systems mount a transponder in the car, and a similar unit in the tollbooth. It is possible, using essentially the same hardware on both the cars and in the tollbooths, to have either a cash-based system or a credit-based system. A cash-based system works like Metrocards in many subways -- users fill up the card with cash (in this case, cryptographically-based electronic cash in the memory of the car's transponder), and tollbooths instruct the card to debit itself, possibly using a cryptographic protocol to ensure that neither the tollbooth nor the car can easily cheat. A credit-based system, on the other hand, assigns a unique identifier to each car, linked to a driver's name and address, and the car's transponder then sends this ID to the tollbooth. Bills are sent to the user's home at the end of the month.

Cash vs credit
In other words, a cash-based system works like real, physical cash, and can be easily anonymous -- users simply go somewhere to fill up their transponders, and do not need to identify themselves if they hand over physical cash as their part of the transaction. Even if they use a telephone link and a credit card to refill their transponders at home, a particular user is not necessarily linked to a particular transponder if the cryptography is done right. And even if there is such a linkage between users and transponders, there is no need for the system as a whole to know where any particular transponder has been -- once the tollbooth decides to clear the car, there is no reason for any part of the system to remember that fact. On the other hand, a credit-based system works like a credit card -- each tollbooth must report back to some central authority that a particular transponder went through it, and it is extremely likely that which tollbooth made this report will be recorded as well.

Same hardware either way; cash is actually simpler
Both cash- and credit-based systems can use the same hardware at both the car and the tollbooth; the difference is simply one of software. In fact, the cash-based system is simpler, because each tollbooth need not communicate in real time with a central database somewhere. (Tollbooths in either system must have a way of either detaining cars with empty or missing transponders, or logging license plates for later enforcement, but the latter need not require a real-time connection for the tollbooth to function.) Furthermore, a cash-based system obviates the need for printing and mailing bills, processing collections, and so forth.
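To make the difference concrete, the following sketch contrasts the two designs. It is only an illustration under assumed, simplified interfaces -- the transponder, booth, and central-database objects here are hypothetical stand-ins, not any deployed ITS protocol, and a real cash-based scheme would use cryptographic electronic cash rather than a plain counter. The point is structural: the cash-based booth settles the toll locally and remembers nothing about the car, while the credit-based booth must report every transit, with the car's identity, to a central database.

    # Illustrative sketch only: hypothetical interfaces, not a real ITS design.
    from dataclasses import dataclass
    from datetime import datetime, timezone

    @dataclass
    class CashTransponder:
        balance_cents: int          # prepaid, anonymous value held in the car

    class CashTollbooth:
        """Cash-style booth: debits the transponder locally; keeps no per-car record."""
        def collect(self, transponder: CashTransponder, toll_cents: int) -> bool:
            if transponder.balance_cents < toll_cents:
                return False        # insufficient funds: flag for enforcement; no identity needed
            transponder.balance_cents -= toll_cents
            return True             # nothing about this particular car is retained

    @dataclass
    class CreditTransponder:
        car_id: str                 # unique ID linked to the driver's name and address

    class CreditTollbooth:
        """Credit-style booth: must report every transit to a central database for billing."""
        def __init__(self, booth_id: str, central_db: list):
            self.booth_id = booth_id
            self.central_db = central_db    # shared, centralized log of movements
        def collect(self, transponder: CreditTransponder, toll_cents: int) -> None:
            # The billing record is also, unavoidably, a location-tracking record.
            self.central_db.append({
                "car": transponder.car_id,
                "booth": self.booth_id,
                "time": datetime.now(timezone.utc).isoformat(),
                "toll_cents": toll_cents,
            })

The hardware is identical in both cases; only the credit-based version accumulates a central, subpoenable history of who drove through which tollbooth, and when.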
ITS RFPs implicitly assume that drivers should be tracked
Yet it is almost invariably the case that requests for proposals, issued when such systems are in the preliminary planning stages, simply assume a credit-based system, and often disallow proposals which would enable a cash-based system. This means that such systems, from the very beginning, are implicitly designed to enable tracking the movements of all drivers who use them, since, after all, each tollbooth must remember this information for billing purposes. Furthermore, drivers are likely to demand itemized bills, so they can verify the accuracy of the data. (After all, it is no longer the case that they need worry only about the contents of their local transponder -- they must worry about the central database, too.) Yet such a system can easily be used, either by someone with access to the bill mailed to an individual, or via subpoena or compromise at the central database, to stalk someone or to misuse knowledge about where the individual has been, and when. Large-scale data mining of such systems can infringe on people's freedom of assembly, by making particular driving patterns inherently suspicious -- imagine the case whereby anyone taking an uncommon exit on a particular day and time is implicitly assumed to have been going to the nearby political rally. And even the lack of a record of a particular transit has already been used in court proceedings [18].

ITS RFPs are setting policy, not responding to technological necessity
The aim of the work presented in this dissertation is the demonstration that many, if not most, of these systems can be technically realized in forms that are as protective of users' individual privacy as one might wish. Therefore, designers of systems who fail to ensure their users' privacy are making a policy decision, not a technical one: they have decided that their users are not entitled to as much personal privacy as it is possible to provide, and are implementing this decision by virtue of the architecture of the system.

Unnecessary polarization of the terms of the debate
While it is the author's contention that most such decisions are, at best, misguided, and at worst unethical, the fact that they are often disguised as purely technical issues polarizes the debate unnecessarily and is not a social good. If some system, whose capabilities would improve the lives of its users, is falsely presented as necessarily requiring them to give up some part of a fundamental right in order to be used, then debate about whether or not to implement or use the system is likewise directed into a false dichotomy. By allowing debate to be thus polarized, and by requiring users to trade off capabilities against rights, it is the author's contention that the designers and implementors of such a system are engaging in unethical behavior.

Legitimate reasons against absolute personal privacy
There may be many legitimate reasons why absolute privacy for a system's users is undesirable.
It is not the aim of this work to assert that there are no circumstances under which personal privacy may be violated; indeed, the moral and legal framework of the vast majority of countries presupposes that there must be a balancing between the interests of the individual in complete personal privacy, and those of the state or sovereign in revealing certain information about an individual to third parties.

This work aims to decouple technical necessity from decisions of policy
However, we should be clear about the nature of this balancing. It should be dictated by a decision-making process which is one of policy -- in other words, what is the desired outcome? It should not instead be falsely driven by assertions about what the technology forces us to do. The aim of this research is to decouple these two issues, for a broad class of potential applications, and to demonstrate by example that technological issues need not force our hand when it comes to policy issues. Such a demonstration by example, it is hoped, will also make clearer the ethical implications of designing a system which is insufficiently protective of the personal privacy of its users.

1.5 The problems with centralized solutions

It is often the case that applications which must handle information from many sources choose a centralized system architecture to accomplish the computation. Using a single, central accumulation point for information can have a number of advantages for the developer:

Why centralized solutions are handy
o It is easy to know where the information is
o Many algorithms are easy to express when one may trivially map over all the data in a single operation
o There is no problem of coordination of resources -- all clients simply know where the central server is, and go there

Unfortunately, such a centralized organization has two important limitations, namely reliability and trust. Reliability is an issue in almost any system, regardless of the kind of information it handles, whereas trust is a more serious concern in systems which must handle confidential information.

Reliability
A single, central point also implies a single point of failure. If the central point goes down, so does the entire system. Further, central points can suffer overload, which means that all clients experience slowdown at best, or failure at worst. And in systems where, for example, answering any query involves mapping over all or most of the database in a linear fashion, increasing the number of clients tends to cause load on the server to grow as O(n²): each of the n clients issues queries, and each query must scan a database whose size itself grows with the number of clients contributing data.

Because of issues like this, actual large systems, be they software, business models, or political organizations, are often divided into a hierarchical arrangement, where substantial processing is done at nodes far from any center -- if there even is a center to the entire system. For example, while typical banks are highly centralized, single entities -- there is one master database of the value of each account-holder's assets -- there is not a single central bank for the entire world. Similarly, the Internet gets a great deal of its robustness from its lack of centralization -- for example, there is not a single, central packet router somewhere that routes all packets in the entire network.

Trust
Of greater importance for this work, however, is the issue of trust. We use the definition of trust advanced in Section 1.3, namely, trust that private information will not be disclosed. It is here that centralized systems are at their most vulnerable.
By definition, they require that the subject of the information surrender it to an entity not under the subject's direct control. The recipient of this information often makes a promise not to disclose this information to unauthorized parties, but this promise is rarely completely trustworthy. A simple taxonomy of ways in which the subject's trust in the recipient might be misplaced includes:

How might trust be violated?
o Deception by the recipient. It is often the case that the recipient of the information is simply dishonest about the uses to which the information will be put.
o Mission creep. Information is often collected for one purpose, but then used later for another, unforeseen purpose. In many instances, there is no notification to the original subjects that such repurposing has taken place, nor any method for the subjects to refuse such repurposing. For example, the US Postal Service sells address information to direct marketers and other junk-mailers -- it gets this information when people file change-of-address forms, and it neither mentions this on the form, nor provides any mechanism for users to opt out. Often, the organization itself fails to realize the extent of such creep, since it may take place slowly, or only in combination with other, seemingly-separate data-collection efforts that do not lead to creep except when combined. Indeed, the US Federal Privacy Act of 1974 [175] recognizes that such mission creep can and does take place, and explicitly forbids information collected by the US government for one purpose from being used for a different purpose -- how the US Postal Service is allowed to sell change-of-address data to advertisers is thus an interesting question. Note, of course, that this Act only forbids the government from doing this -- private corporations and individuals are not so enjoined.
o Accidental disclosure. Accidents happen all the time. Paper that should have been shredded is thrown away unshredded, where it is then extracted from the trash and read. Laptops are sold at auction with private information still on their disks. Computers get stolen. In one famous case in March 1998, it was revealed that GTE had inadvertently disclosed at least 50,000 unlisted telephone numbers in the southern California area -- an area in which half of all subscribers pay to have unlisted numbers. The disclosure occurred in over 9000 phonebooks leased to telemarketing firms, and GTE then attempted to conceal the mistake from its customers while it attempted to retrieve the books. The California Public Utilities Commission had the authority to fine GTE $20,000 per name disclosed -- an enormous, $1 billion penalty that was not actually imposed [9]. In March of 1999, AT&T accidentally disclosed the email addresses of 1800 customers to each other as part of an unsolicited commercial mailing; Nissan did likewise with 24,000 [26].
o Disclosure by malicious intent. Information can be stolen from those authorized to have it by those intent on disseminating it elsewhere. Examples from popular media reports include IRS employees poking through the files of famous people, and occasionally making the information public outside of the IRS [173]. Crackers, who break into others' computer systems, may also reveal information that the recipient tried to keep private.
There is often significant commercial value in the deliberate disclosure of other companies' data; industrial espionage and related activities can involve determined, well-funded, skilled adversaries whose intent is to compromise corporate secrets -- perhaps to do some stock manipulation or trading based on this -- or to reveal information about executives which may be deemed damaging enough to be used for blackmail or to force a resignation. Intelligence agencies may extract information by a variety of means, and entities which fail to exercise due diligence in strongly encrypting information -- or which are prevented from using strong-enough encryption by rule of law -- may have information disclosed while it is being transmitted or stored.
o Subpoenas. Even though an entity may take extravagant care to protect information in its possession, it may still be legally required to surrender this information via a subpoena. For example, Federal Express receives several hundred subpoenas a day for its shipping records [178] -- an unfortunate situation which is not generally advertised to its customers. This leads to a very powerful general principle: If you don't want to be subpoenaed for something, don't collect it in the first place. Many corporations have growing concerns about the archiving of electronic mail, for example, and are increasingly adopting policies dictating its deletion after a certain interval. The Microsoft antitrust action conducted by the US Department of Justice, for example, entered a great many electronic mail messages into evidence in late 1998, and these are serving as excellent examples of how too much institutional memory can be a danger to the institution.

This is hardly a complete list, and many more citations could be provided to demonstrate that these sorts of things happen all the time. The point here is not a complete itemization of all possible privacy violations -- such a list would be immense, and far beyond the scope of this work -- but simply to demonstrate that the issue of trusting third parties with private information can be fraught with peril.

Is this software, or a business model?
Note that the discussion above is not limited to software systems. Replace "algorithm" with "business practice," "client" with "customer," and "central server" with "vendor," and you have the system architecture of most customer/vendor arrangements. However, we shall not further investigate these structural similarities, except to point out that business models themselves often have a profound impact on the architecture of an application.

1.6 Advantages of a decentralized solution

Decentralized solutions can assist with both reliability and trust. Let us briefly examine reliability, and consider a system which does not contain a single, central, physical point whose destruction results in the destruction of the system. By definition, therefore, a single, physical point of failure cannot destroy this system. This says nothing about the system's ability to survive multiple points of failure, nor about its ability to survive a single architectural failure (which may have been replicated into every part of the resulting system), but it does tend to imply that particular, common failure modes of single physical objects -- theft, fire, breakdown, accidents -- are much less likely to lead to failure of the system as a whole. This is nothing new; it is simply good engineering common sense. The issue of trust takes more examination.
If we can build a system in which personal data is distributed, and in which, therefore, no single point in the system possesses all of the personal data being handled, then we limit the amount of damage -- disclosure -- that can be accomplished by any single entity, which presumably cannot control all elements of the system simultaneously. Systems which are physically distributed, for example, multiply the work factor required to accomplish a physical compromise of their security by the number of distinct locations involved. Similarly, systems which distribute their data across multiple administrative boundaries multiply the work factor required by an adversary to compromise all of the data stored. In the extreme case, for example, a system which distributes data across multiple sovereigns (e.g., governments) can help ensure that no single subpoena, no matter how broad, can compromise all data -- instead, multiple governments must collude to gain lawful access to the data.

Cypherpunk remailer chains
Cypherpunk remailer chains [10][23][66] are an example of using multiple sovereigns. A remailer chain operates by encrypting a message to its final recipient, but then handing it off to a series of intermediate nodes, ideally requiring transmission across multiple country boundaries. In one common implementation, each hop's address is only decodable by the hop immediately before it, so it is not possible to determine, either before or after the fact, the chain of hops that the message went through. Properly implemented, no single government could thereby compromise the privacy of even a single message in the system, because not all hops would be within the zone of authority of any single government.
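The layered addressing just described can be sketched as follows. This is only an illustration of the idea, not any real remailer's message format: encrypt_for() is a hypothetical placeholder for public-key encryption, and the dictionaries stand in for real wire formats. The sender wraps the message in one layer per hop, so each node can recover only the address of the next hop, never the whole path.

    # Illustrative sketch of layered ("onion") wrapping for a remailer chain.
    # encrypt_for() is a hypothetical stand-in for real public-key encryption.

    def encrypt_for(public_key: str, payload: dict) -> dict:
        # Placeholder: a real implementation would encrypt `payload` so that
        # only the holder of the matching private key could read it.
        return {"encrypted_for": public_key, "payload": payload}

    def build_chain_message(message: str,
                            recipient_addr: str, recipient_key: str,
                            hops: list) -> tuple:
        """hops: list of (address, public_key) pairs, in traversal order."""
        # Innermost layer: only the final recipient can read the message body.
        wrapped = encrypt_for(recipient_key, {"body": message})
        next_addr = recipient_addr
        # Wrap outward, from the last hop back to the first.  Each hop's layer
        # reveals only where to forward next, plus an opaque inner blob.
        for addr, key in reversed(hops):
            wrapped = encrypt_for(key, {"forward_to": next_addr, "inner": wrapped})
            next_addr = addr
        # The sender transmits `wrapped` to the first hop (now held in next_addr).
        return next_addr, wrapped

Each node peels exactly one layer: it decrypts with its own key, learns only the next address, and forwards the opaque remainder. No single node -- and thus no single government with authority over only some of the nodes -- learns both where the message came from and where it is ultimately going.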
Costs of a decentralized solution
Of course, as applied to the applications we examine in this dissertation, the advantages of a decentralized solution do not come for free. They require pushing intelligence to the leaves -- in other words, that the users whose information we are trying to protect have access to their own computers, under their own control. Decentralized systems are also somewhat more technically complicated than centralized solutions, particularly when it comes to coordination of multiple entities -- for example, how are the entities supposed to find each other in the first place? And such solutions may not work for all applications formerly handled by centralized solutions, but only for those that share particular characteristics. We will investigate each of these issues in later chapters.

1.7 A brief summary of this research

The purpose of the work in this dissertation is to demonstrate that, for a class of similar applications, useful work that requires knowledge of others' private information may nevertheless be accomplished without requiring any trust in a central point, and without requiring very much trust in any single point of the system. In short, such a system is robust against violations of trust, unlike most centralized systems. The work is therefore divided into several aspects, which will be discussed more fully in the chapters that follow, and which are summarized in this section:

o Chapters 2 and 3: An architecture which specifies the general class of applications for which we are proposing a solution -- what characteristics are common to those applications which we claim to assist? This architecture also includes our threat model -- what types of attacks against user privacy we expect, which of those attacks we propose to address, and how we will address them.
o Chapter 4: A sample implementation of this architecture -- the matchmaking system Yenta.
o Chapter 5: Evaluation of the sample application as deployed, an analysis of the risks that remain in the design and implementation, and some speculations on how certain other applications could be implemented using the architecture we describe.
o Chapter 6: An examination of related work, both with regard to privacy protection via architecture, and with regard to the sample application's domain of matchmaking.

1.7.1 The architecture and its sample application

We present a general architecture for a broad class of applications. The architecture is designed to avoid centralizing information in any particular place, while allowing multiple agents to collaborate using information that each of them possesses. This collaboration is designed to form groups of agents whose users all share some set of characteristics. The architecture we describe is particularly useful for protecting personal information from unauthorized disclosure, but it also has advantages in terms of robustness and avoidance of single points of physical failure. In the description below, the architecture and the sample application described in this dissertation -- Yenta -- are described together.

Such an architecture assumes several traits shared by the applications which make use of it, of which the most important are: the existence of a peer application for each user who wishes to participate, running on the user's own workstation; the availability of a network; the availability of good cryptography; and a similarity metric which can be used to compare some characteristic of users to each other and which enables a partial ordering of similarity. The architecture derives much of its strength from its completely decentralized nature -- no part of it need reside on a central server. Users are pseudonymous by default, and agents are assumed to be long-lasting, with permanent state that survives crashes and shutdowns. Individual agents participate in a hill-climbing, word-of-mouth exchange, in which they exchange messages pairwise -- with no central server participating in such exchanges. Agents which find themselves to be closely matched form clusters with other, similar agents. An agent which is not well-matched to a peer can ask the peer for a referral to some other agent which is a better match, hence using word-of-mouth, based on the above partial ordering of similarities, to aid in the search for a compatible group of other agents.

Once clusters have been formed, agents may send messages into the clusters, communicating either one-to-one or one-to-many. Yenta uses this capability to enable users to have both private and public conversations with each other. Particularly close matches can cause one of the participating agents to suggest that the two users be introduced, even if the users have not previously exchanged messages -- this helps those who never send public messages to participate.
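The referral process just described can be made concrete with a small sketch. This is a simplification under assumed interfaces -- the keyword-overlap similarity() function, the interest sets, and the threshold below are hypothetical illustrations, not Yenta's actual algorithm or data structures -- but it shows the hill-climbing, word-of-mouth search: an agent compares itself to a peer, clusters with the peer if the match is close enough, and otherwise asks that peer for a referral to whichever of its contacts looks like a better match.

    # Illustrative sketch of hill-climbing, word-of-mouth cluster formation.
    # similarity(), the interest sets, and the threshold are hypothetical.

    def similarity(interests_a: set, interests_b: set) -> float:
        # Toy similarity metric: overlap between two sets of interest keywords.
        if not interests_a or not interests_b:
            return 0.0
        return len(interests_a & interests_b) / len(interests_a | interests_b)

    class Agent:
        def __init__(self, name: str, interests: set):
            self.name = name
            self.interests = interests
            self.contacts = []      # peers encountered so far
            self.cluster = set()    # peers judged similar enough to cluster with

        def refer(self, seeker: "Agent"):
            # Word-of-mouth: suggest whichever known peer best matches the seeker.
            if not self.contacts:
                return None
            return max(self.contacts,
                       key=lambda peer: similarity(peer.interests, seeker.interests))

        def seek_cluster(self, start: "Agent", threshold: float = 0.5, max_hops: int = 10):
            current = start
            for _ in range(max_hops):
                if current is None or current is self:
                    break
                self.contacts.append(current)
                if similarity(self.interests, current.interests) >= threshold:
                    # Close match: join each other's clusters.
                    self.cluster.add(current)
                    current.cluster.add(self)
                # Hill-climb: ask this peer for a referral to a better match.
                referred = current.refer(self)
                if referred is None or referred in self.contacts:
                    break
                current = referred

The chapters that follow describe the real system, in which the comparison is richer, agents keep persistent state across restarts, and the clusters formed this way are then used to carry one-to-one and one-to-many messages.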
We carefully discuss the threat model facing the architecture and the sample application, describing which attacks are expected and the measures taken to defend against them. We also discuss what sorts of attacks are considered outside the scope of this research and for which we offer no solution. Strong cryptography is used in many places in the design, both to enable confidentiality and authenticity of communications, and as the infrastructure for a system designed to enable persistent personal reputations. Because public evaluation can make systems significantly more robust and more secure, a separate system, named Yvette, was created to make it easier for multiple programmers to publicly evaluate Yenta's implementation; Yvette is not specialized to Yenta and may be used to evaluate any system whose source code is public.

1.7.2 Evaluation

The architecture and the sample application have been evaluated in several ways, including via simulation and via a pilot deployment to real users. The qualitative and quantitative results obtained demonstrate that the system performs well and meets its design goals. In addition, several other applications might make use of the underlying architecture; speculations on how they might be implemented are briefly described. We also perform a risk analysis of Yenta and describe potential security risks, including some which are explicitly outside of our threat model.

Finally, we describe related work, which includes other types of matchmaking systems, other decentralized systems, and other systems and software that have been designed for explicitly political purposes. We then draw some general conclusions.

1.8 Summary

This chapter has presented the social and political motivations for this work: the protection of certain civil liberties, such as privacy, by starting from those motivations and then designing technology that can help. We have described what personal privacy and its protection mean, demonstrated some of the social, political, and technical problems with centralized solutions, and touched upon some of the advantages of decentralized solutions. We have then summarized, very briefly, the work that will be presented in later chapters.