CHAPTER 3
Privacy and Security

3.1 Introduction

This chapter addresses privacy and security concerns in the architecture we described in Chapter 2. It assumes knowledge of the contents of that chapter, but not necessarily in-depth knowledge of modern cryptography or computer security. We shall describe:

o Section 3.2: The nature of the problem, including:
  o Section 3.2.1: The threat model -- what attacks we expect, and the difference between passive and active attackers
  o Section 3.2.2: A discussion of how private we are trying to be
  o Section 3.2.3: Some desiderata for security design in general, and how our architecture makes use of them
  o Section 3.2.4: The problems we are not attempting to solve
o Section 3.3: Some useful cryptographic techniques, including:
  o Section 3.3.1: Symmetric encryption
  o Section 3.3.2: Public-key encryption
  o Section 3.3.3: Cryptographic hashes
  o Section 3.3.4: Some issues in key distribution
o Section 3.4: Some of the solutions we employ, including:
  o Section 3.4.1: How anonymity and pseudonymity help
  o Section 3.4.2: Various techniques against passive attackers
  o Section 3.4.3: Various techniques against active attackers
  o Section 3.4.4: Issues involved in protecting the distribution
o Section 3.5: Selected additional topics which tie up some loose ends

3.2 The problem

This section discusses the types of attacks the architecture is likely to see, as well as the problems we are not trying to solve.

3.2.1 The threat model: what attacks may we expect?

Given the architecture described in the previous chapter, there are a wide variety of potential attacks which may be mounted by malicious or curious third parties. They generally break down into passive attacks, in which communications are merely monitored, and active attacks, in which communications or the underlying agents themselves are subverted, via deletion, modification, or addition of data to the network.

Packet sniffing

Passive attacks. The most obvious attack is simple monitoring of packet data; such an attack is often accomplished with a packet sniffer, which simply records all packets transmitted between any number of sources. If such data includes users' mail messages or files, then two agents which are trading this information back and forth will leak information to an eavesdropper.

Traffic analysis

Even if the actual communications between agents are perfectly encrypted, however, passive attacks can still be quite powerful. The easiest such attack, in the face of encrypted communications, is traffic analysis, in which the eavesdropper monitors the pattern of packet exchange between agents, even if the actual contents of the packets are a mystery. This can be surprisingly effective: it was traffic analysis that alerted a pizza delivery service local to the Pentagon -- and thus the media -- when the United States was preparing a military action at the beginning of the Gulf War; when late-night deliveries of pizza suddenly jumped, it became obvious that something was up [181]. (Even though [179] points out that press coverage of the pizza effect tends to quote unnamed sources, a very small number of individuals with personal stakes -- such as a Domino's manager in the area -- and other press reports, the continuing press coverage [151] and even military recommendations [174] surrounding such effects make it clear that this threat is taken seriously.)

Spoofing and replays

Active attacks. Active attacks involve disrupting the communications paths between agents, or attacking the underlying infrastructure.
The most common such attack is a spoofing attack, in which one agent impersonates another, or some outside attacker injects packets into the communication system to simulate such an outcome. Often, spoofing is accomplished via a replay attack, in which prior communications between two agents are simply repeated by the outsider. Even if the plaintext of the encrypted contents of the communication is not known, such attacks can succeed so long as duplicate communications are allowed and the attacker can deduce the effect of such a repeat. For instance, if it is noticed that a cash-dispensing machine will always dispense money if a particular (encrypted) packet goes by, a simple replay can spoof the machine into disgorging additional cash.

Subverted agents

More sophisticated attacks are certainly possible. Individual running agents might be subverted by a third party, such that they are no longer trustworthy. Such a subverted agent might use encryption keys which are known to the interloper, for example. Alternately, the attacker might create his or her own agent, which looks like a genuine agent to the rest of the network, but pretends to have characteristics which match everything -- in Yenta, for example, such an agent might then be used to troll for people interested in particular topics, and presumably also would be modified to disgorge anything interesting to its creator.

Subverted distribution

Finally, the actual distributed agent might be modified by a determined attacker at the source itself -- say, by subtly introducing a trojan horse into the application at its distribution point(s), either by modifying its source code, or by modifying any precompiled binaries which are being distributed. This is essentially a more-distributed and more-damaging version of the subverted-agent attack above. As an example, consider all the Web pages currently extant which proclaim, 'These pages are best viewed with Netscape x.y. Download a copy!' Now imagine what would happen if the link pointed to a carefully-modified version of Netscape that always supplied the same session key, known to the interloper: the result would be that anyone who took the bait would be running a version of Netscape with no security whatsoever, hence leaving themselves vulnerable to, e.g., a sniffing attack on their credit card number.

3.2.2 How private is private?

Consider the degrees of anonymity offered by the chart below:

Figure 3: Degrees of anonymity

As defined in [141], the extremes of this chart range from absolute privacy, where the attacker cannot perceive the presence of communication, to provably exposed, where the attacker can prove the sender, receiver, or their relationship to others. The discussion advanced in [141] is oriented more towards the perception of communication at all, whereas we are concerned with the contents of that communication as well, but the spectrum of possibilities is nonetheless useful. They define the rest of the chart as follows:

o A sender is beyond suspicion if, though the attacker can see evidence of a sent message, the sender appears no more likely to be the originator of that message than any other potential sender in the system.

o A sender is probably innocent if, from the attacker's point of view, the sender appears no more likely to be the originator than to not be the originator.
  This is weaker than beyond suspicion in that the attacker may have reason to expect that the sender is more likely to be responsible than any other potential sender, but it still appears at least as likely that the sender is not responsible.

o A sender is possibly innocent if, from the attacker's point of view, there is a nontrivial probability that the real sender is someone else. While weaker than the above, it may prevent attackers from acting on their suspicions.

o Finally, a sender is exposed if an attacker can see information which unambiguously identifies the sender. This is the default for almost all communications protocols on the Internet -- most such protocols are cleartext, and make no attempt to hide the addresses of senders or receivers. This is weaker than being provably exposed, however, since it is generally the identity of the computer that is revealed, rather than some nonrepudiable user identity.

The architecture discussed here, for the most part, attempts to ensure either possible innocence or probable innocence; we shall differentiate where useful. In addition, certain parts of the architecture may make it possible for the user to be beyond suspicion to a local eavesdropper -- someone who can monitor some, but not all, communication links in the system.

3.2.3 Security design desiderata

The security architecture presented here is cognizant of several principles which are well-known in the security and cryptographic communities. This section discusses several of them, and demonstrates how they have motivated various decisions taken in the design.

Design must be open -- the importance of open source

Security through obscurity of design does not work. Any design which depends upon secrecy of the design is guaranteed to fail, since secrets have a way of getting out. Since this architecture is designed to be run by a large number of individuals all across the Internet, its binaries must be public; security through obscurity would be untenable anyway in the face of disassemblers and reverse-engineering. (In fact, the source code of Yenta, the sample application, is also public, which should increase confidence in the resulting system; see the discussion of Yvette in Section 3.4.4.)

Protect the keys

Keys are the important entity to protect. In good cryptographic algorithms, it is the keys that are the important data. Keys are usually a small number of bits -- hundreds or perhaps thousands at most -- and new keys are often trivial to generate, so protecting keys is much easier than protecting algorithms. Unfortunately, however, key management -- keeping track of keys and keeping them from being accidentally disclosed -- is often the hardest and weakest point of a cryptosystem [6][15][24][34][69]. Our architecture has a variety of keys and manages them carefully.

Use existing crypto

Good cryptography is hard to design and hard to verify. Most brand-new cryptographic systems turn out to have serious flaws. Only when a system has been carefully inspected by a number of people is it reasonable to trust it. This is another reason why security through obscurity is a bad idea. We depend on well-established algorithms and protocols for our fundamental security, since they have been carefully scrutinized.

Whole-system design

Security is a function of the entire system, not individual pieces. This means that even good cryptography and system design are worthless if they can be compromised by bribing or threatening someone.
Part of the reason for the decentralized nature of this architecture is to avoid having a single point of compromise, as detailed in Chapter 1.

Poor design is dangerous too

Malevolence and poor design are sometimes indistinguishable. Many system failures that look like the result of malevolence are instead the result of the interaction of an accident and some unfortunate element of the design. For example, the entire ARPAnet failed one Sunday morning in 1973 due to a double-bit error in a single IMP [121]. A similarly disastrous outcome from a simple, single error is aptly described by this quote: 'The whole thing was an accident. No saboteur could have been so wildly optimistic as to think he could destroy an airplane this way,' which describes how an aircraft was demolished on a friendly airfield during World War II when someone ingeniously circumvented safety measures and inadvertently connected a mislabelled hydrogen cylinder to the plane's oxygen system [137].

Minimize collected information

If you don't want to be subpoenaed for it, don't collect it. As we mentioned in Chapter 1, Federal Express, a delivery service in the United States, receives (and hence is compelled to respond to) several hundred subpoenas a day for its shipping records [178]. The safest way to protect private data collected from others from such disclosure -- not to mention the hassle of responding to a stream of subpoenas -- is never to collect it in the first place. Both the lending records of most libraries, and the logfiles of MIT's primary mailers -- which are guaranteed to be thrown away irretrievably when three days old [153] -- adhere to this rule. This also motivates our decentralized design: any central point is a subpoena target.

We can't be perfect

Security is a spectrum, not an absolute. A computer can often be made perfectly secure by unplugging it -- not to mention vaporizing its disks and their backups. However, this is a high price to pay. Tradeoffs between security and functionality or performance are often necessary. It is also true that new attacks are constantly being invented; hence, while this research aims at a more-secure implementation than would be possible without attending to these issues at all, we can never claim to be completely secure. We therefore aim for security that is good enough, and to do no harm -- such that user privacy is protected as well, or nearly as well, as it would be if the application were not running. We cannot hope for better -- doing better would imply that our application somehow magically improves the security of other, unrelated applications -- and may have to make some tradeoffs that nonetheless lead to a little bit of insecurity for a large benefit.

3.2.4 Problems not addressed

There are a number of problems which are not addressed in the security architecture presented here. The problems we are not addressing influence where we will and will not accept design compromises.

No mobile code

For instance, since each agent runs on a user's individual workstation, and each agent is not itself a mobile agent per se [29][35][70][170][183], we do not have the problem of executing arbitrary chunks of possibly-untrusted code on the user's local workstation.

No Byzantine failures

Further, it is assumed that, while some agents may have been deliberately compromised, the vast majority of them have not.
This mostly frees us from having to worry about the problems of Byzantine failure [53][131] in the system design, wherein a large portion of the participants are either malfunctioning or actively malicious. We also assume, as in the Byzantine case, that not every other agent any particular agent communicates with is compromised. If this were not true, certain parts of the algorithm would be vulnerable to a ubiquitous form of the man-in-the-middle attack, wherein an interloper pretends to be A while talking to B, and B while talking to A, with neither of them the wiser. (Weaker forms of this, wherein only a few agents are doing this, have reasonable solutions. In general, when dealing with Byzantine failures, the amount of work required to cope with increasing numbers of hostile peers goes up quite rapidly -- exponentially in many cases. This means that dealing with a small number of miscreants is feasible, whereas the situation where most peers are untrustworthy becomes very difficult.)

Trusted path to binaries

The architecture provides no protection for the user if his or her copy of the application has been compromised. It is generally trivial for a sophisticated attacker to compromise a binary -- for example, by substituting NFS packets on the wire as the application is loaded from the fileserver. We cannot be of any help in this case; a user without a trusted path to his or her binaries is already at the mercy of any good attacker, regardless of the application being run.

Cracked root

Along the same lines, a user who runs the application on untrusted hardware cannot expect that it can never be compromised -- this is analogous to not having a trusted path to one's binaries, since an attacker who has compromised the computer on which the application is being run can by definition either read or alter data in the running binary. Consider the example of Yenta, which runs as a daemon and remains resident in memory indefinitely. The user's secret key, which is the basis of his or her identity, must similarly remain in memory for long periods of time. If this were not the case, then the user would have to constantly type his or her passphrase for every operation which required an identity check, of which there are many. But this also means that any attacker who has root access to the user's workstation, for example, can read this key out of the process address space. Hence, if the user's workstation has poor security in general, then Yenta's ability to keep the user's secrets from the attacker will be no better.

Poor passphrases

While this architecture tries to minimize the number of places where users can inadvertently compromise their own security, some user responsibility is nonetheless expected. For example, the agent must store its permanent state somewhere. If this data is to be private, it must be protected. Absent hardware solutions, the most reasonable solution to this protection is to encrypt it with a passphrase -- but nothing can help us if the user chooses a poor passphrase, such as one that is too short or is easily guessed.

Government agencies or rubber-hose cryptanalysis

Similarly, this architecture is no protection against the resources of a government agency, or some similarly-equipped adversary. Such an adversary has no reason to attempt a subtle compromise of the distribution, the protocols, or the cryptography.
It may instead physically bug the user's premises, compromise his or her hardware, or use rubber-hose cryptanalysis -- coercing the user's key(s) via implied or explicit threat of physical force. A possible solution to coercion is the use of deniable filesystems [22], but this is beyond the scope of the research presented here.

No denial-of-service

In addition, we do not explicitly deal with denial-of-service attacks, which are extremely difficult for any distributed system to address. Such attacks amount to, for example, dropping every packet between two agents which are trying to communicate -- to the agents involved, this attack looks like a network partition, and there is little defense.

No international export

Finally, we have the problem of our use of strong cryptography to protect users' privacy. The United States government currently regulates such cryptographic software as a munition, under EAR, the Export Administration Regulations [50] -- formerly ITAR, the International Traffic in Arms Regulations [87]. This means, for example, that the cryptographic portions of Yenta's software are currently unavailable outside the US unless added back in elsewhere. Solving the limitations of EAR/ITAR is not explicitly addressed here -- except to demonstrate how such governmental policies work against the sovereign rights of a government's own citizens, as we detail in Chapter 1.

3.3 Cryptographic techniques

This section introduces some useful cryptographic techniques that will be used later. The techniques we discuss are used as black boxes, without proof that they properly implement the functionality described for the box and without the mathematical background which underlies them; those who wish to check these assertions may examine the citations where appropriate. In particular, for a much more complete introduction that includes an excellent survey of the field, see [155].

3.3.1 Symmetric encryption

One of the most straightforward cryptographic techniques uses symmetric keys. Algorithms such as IDEA ([155] pp. 319-324) work this way. Given a 128-bit key, the algorithm takes plaintext and converts it to ciphertext. Given the same key, it also converts ciphertext back into plaintext. Expressed mathematically, we can say that C = K(P) [the ciphertext C is computed from the plaintext P via a function of the key K], and similarly P = K(C) [the reverse also works]. IDEA is probably very secure. The problem comes in distributing the keys: we cannot just transmit the keys before the encrypted message -- after all, the channel is deemed insecure or we wouldn't need encryption in the first place -- hence users must first meet out-of-band, i.e., not using the insecure channel, to exchange keys. This is infeasible for a large variety of applications.

3.3.2 Public-key encryption

A better approach uses a public-key cryptosystem [PKC], such as RSA ([155] pp. 466-473) or the many other variants of this technology. In a public-key system, each user has two keys: a public key and a private key, which must be generated together -- neither is useful without the other. As its name implies, each user's public key really is public -- it can be published in the newspaper. The private key, on the other hand, is never shared, not even with someone the user wishes to communicate with.

Confidentiality

User A encrypts a message to B by computing C = KPB(P), i.e., applying a function involving B's public key. To decrypt, B computes P = KSB(C), i.e., applies a function involving B's private key. Note that, once encrypted, A cannot decrypt the resulting message using any key A has access to -- the encryption acts one-way if A does not have B's private key, and she shouldn't! [One important detail: since PKCs are usually slow, one usually creates a brand-new session key, transmits that using the PKC, then uses the session key with a symmetric cipher such as IDEA or triple-DES to transmit the actual message. In addition, PKCs may sometimes leak bits if used to encrypt large amounts of data; encrypting only keys can avoid this problem.]
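To make this notation concrete, the following sketch uses textbook RSA with deliberately tiny parameters -- a toy chosen purely to illustrate the algebra, not anything a real implementation should do (real keys are hundreds of digits long and use padding). It shows that a message encrypted under B's public key can be recovered only with B's matching private key, in contrast to the symmetric case of Section 3.3.1, where a single key K plays both roles.

```python
# Toy illustration of the public-key notation above, using textbook RSA
# with tiny, insecure parameters.  Only the algebra is of interest here.

p, q = 61, 53            # two small primes, known only to B
n = p * q                # 3233; B's public key is the pair (n, e)
e = 17                   # public exponent
d = 2753                 # private exponent: e*d == 1 mod (p-1)*(q-1); B's secret

P = 65                   # the plaintext, encoded as an integer less than n

C = pow(P, e, n)         # C = K_PB(P): anyone holding B's public key can do this
recovered = pow(C, d, n) # P = K_SB(C): only B, holding d, can undo it

assert recovered == P
print(C, recovered)      # 2790 65
```

Knowing (n, e) and C does not let A, or an eavesdropper, recover P; doing so without d is believed to require factoring n, which is what makes the scheme asymmetric.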
Authenticity

This scheme provides not only confidentiality -- third parties cannot read the messages -- but also authenticity -- B can prove that A sent the message. How does this work? Before A sends a message, she first signs the message by encrypting it (really a cryptographic hash of the message -- see below) with her own private key. In other words, A computes Psigned = KSA(P). Then, A encrypts the message to B, computing C = KPB(Psigned). B, upon receiving the message, computes Psigned = KSB(C), which recovers the plaintext, and can then verify A's signature by computing P = KPA(Psigned). B can do this because he is using A's public key to make the computation; on the other hand, for this to have worked at all, A must have sent it, because only her private key could have signed the message such that her public key worked to check it. Only if someone had cracked or stolen A's private key could the signature have been fraudulently created.

3.3.3 Cryptographic hashes

It is often the case that one merely wishes to know whether some message has been tampered with. One obvious solution is to transmit the message out of band -- via some channel which is not the same as the channel originally used to transmit the message. But this raises the question of how that channel is secured, and can be very inconvenient to implement in any case.

An easy way to avoid out-of-band transmission is via a cryptographic hash, such as MD5 ([155], pp. 436-441) or the Secure Hash Algorithm (SHA, [155], pp. 442-445). These hash functions compute a short (128-bit or 160-bit, respectively) message digest of an unlimited-length original message. These functions have the unusual property that changing any single bit of the original message changes, on average, half of the bits of the digest. Further, they function in a one-way fashion -- it is infeasible, given a digest, to compute a message which, when hashed, would yield the given digest. On the other hand, anyone can compute the hash of a message, since the algorithm is public and uses no keys. This means that it is computationally easy to verify that a particular message does, in fact, hash to a particular value, even though it is infeasible to find a message which produces some particular hash.

Digital signatures

Since such hashes are compact yet give an unambiguous indication of whether the original message has been altered, they are often used to implement digital signatures such as in the RSA scheme above -- what is signed is not the actual cleartext message, but a hash of it. This also improves the speed of signing (since signing a 128- or 160-bit hash is much faster than signing a long message), and the actual security of the cipher as well (because RSA is vulnerable to a chosen-plaintext attack; see [155], p. 471).
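The avalanche property described above is easy to observe directly. The short sketch below (using Python's standard hashlib module, chosen here purely for illustration) hashes two messages that differ in a single bit and counts how many of SHA's 160 digest bits change; a signature scheme would then sign this 20-byte digest rather than the full message.

```python
import hashlib

m1 = b"Meet me at the usual place at noon."
m2 = bytearray(m1)
m2[0] ^= 0x01                        # flip one bit of the first byte
m2 = bytes(m2)

d1 = hashlib.sha1(m1).digest()       # 160-bit digest of the original
d2 = hashlib.sha1(m2).digest()       # 160-bit digest of the altered message

# Count how many of the 160 digest bits differ between the two messages.
differing = sum(bin(a ^ b).count("1") for a, b in zip(d1, d2))
print(f"{differing} of 160 digest bits changed")   # typically near 80
```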
3.3.4 Key distribution

One of the hardest problems of most cryptosystems, even public-key systems, is correctly distributing and managing keys. In a public-key system, the obvious attacks -- compromise of the actual private key -- are often relatively easy to guard against: keep the private key in memory as little as possible, encrypt it on disk using DES with a passphrase typed in by the user to unlock it [187], and keep it offline on a floppy if possible.

But consider this: Alice wishes to send a message to Bob. She looks up Bob's public key, but interloper Mallot interposes himself and supplies his own public key. Alice has no way of knowing that Mallot has done so, but the result of her encryption is a message that only Mallot, and not Bob, can read! Even if one demands that Alice and Bob have a round-trip conversation to prove that they can communicate, Mallot could be playing man-in-the-middle, simultaneously decrypting and re-encrypting in both directions as appropriate.

Webs of trust

To solve this problem, systems such as Privacy Enhanced Mail [92] use a centralized, tree-structured key registry, which is inconsistent with our decentralized, no-hierarchy architecture. On the other hand, PGP [187] functions with completely decentralized keys, by having users sign each other's keys -- this is the same mechanism used in the attestation system described in Section 2.11. When Alice gets 'Bob's' public key, she checks its signatures to see whether someone she trusts -- or some short chain of people she can trust -- has signed that key. If so, then this key must be genuine (or there is a conspiracy afoot amongst their mutual friends); if not, then the key may be a forgery. This practice of signing the keys of those you vouch for is called the PGP web of trust and is the primary safeguard against forged keys. Yenta, for example, uses this technique in signing attestations as part of its reputation system.

3.4 Structure of the solutions

This section presents solutions to some likely security problems in our architecture, using some of the technology mentioned previously. It presents a range of solutions; not every user of every application might want the overhead of the most complete protection, and the elements, while often solving separate problems, sometimes also act synergistically to improve the situation. Finally, for brevity, it omits some details present in the complete design.

3.4.1 The nature of identity

Uniqueness and confidentiality. It should not be possible to easily spoof the identity of an agent. For this reason, every agent sports a unique cryptographic identity -- a digital pseudonym. This identity corresponds, essentially, to the key fingerprint [187] of the individual agent's public key -- a short (128-bit) cryptographic hash of the entire key. In Yenta, this identity is referred to as the user's Yenta-ID, or YID, and is effectively a random number -- knowing it does not tell anyone anything about whose real-life identity it is. In order to keep some interloper from stealing, say, agent A's pseudonym, any agent communicating with A encrypts messages using A's public key. A can prove that its pseudonym is genuine by being able to decrypt; further, such communications are interlocked [155] and carry an internal sequence number -- itself encrypted -- both of which help prevent replay attacks by a man in the middle. Further, of course, such encryption prevents an eavesdropper from intercepting the actual conversation. Thus, even though the actual identity of the user is not known, the user's pseudonym cannot be appropriated.
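As a minimal sketch of the idea (the key encoding shown is hypothetical, and Python's hashlib stands in for whatever hash an implementation actually uses), a 128-bit fingerprint-style pseudonym can be derived by hashing the agent's public key:

```python
import hashlib

# Hypothetical serialization of the agent's public key; the real encoding
# is whatever the implementation uses on the wire.
public_key_bytes = b"example-public-key-material"

# A 128-bit hash of the public key acts as the digital pseudonym (the YID):
# it uniquely identifies the keyholder but reveals nothing about who the
# user is in real life.
yid = hashlib.md5(public_key_bytes).hexdigest()
print(yid)   # 32 hex digits == 128 bits

# An agent claiming this YID can be challenged with a message encrypted
# under the corresponding public key; only the holder of the matching
# private key can decrypt it, so the pseudonym cannot be appropriated.
```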
Pseudonymity and anonymity are cornerstones of the design

The fact that users are by default pseudonymous, and often completely anonymous, is a critical aspect of the security of the architecture. Consider, for example, what would happen if characteristics that were offered during clustering (Section 2.8.2) automatically identified the user who offered them -- the user would be exposed, in the terminology of Section 3.2.2. Instead, given the plausible deniability feature described in Section 2.8.2, it is at least possible that any given characteristic does not correspond to the agent offering it, meaning that the user is probably, and at least possibly, innocent. In addition, if third-party subcomparisons and random reforwarding via cluster broadcasts, also as described in Section 2.8.2, are in use, the user may well be beyond suspicion.

The completely decentralized nature of our architecture complicates key distribution. The model adopted is the decentralized model used by PGP [187]. By not relying on a central registry, we eliminate that particular class of failures. And interestingly, the architecture partially eliminates the disadvantage of PGP's decentralized key distribution -- that of guaranteeing that any particular public key really does correspond to the individual for whom it is claimed. In PGP, we care strongly about actual individuals, but in our architecture, and in the sample application, only the cryptographic IDs are important -- for example, Yenta tries to hide the true identity of its users unless they arrange to be known to each other.

Spamming and spoofing. Unfortunately, this pseudonymity comes at a price. For example, if we are about to be introduced to some user, how can we have any idea who that user really is? Can we know that the last 10 agents we've seen do not all surreptitiously belong to the same individual? Can a given user of Yenta, for instance, know that another user won't send spam once he or she discovers an interest in a particular topic? And so forth.

We solve this problem with the attestation system, described in Section 2.11. This system provides a set of reputations, which are useful in verifying, if not the identity of a given user, at least whether he or she can make statements about himself or herself that other users will vouch for.

3.4.2 Eavesdropping

The generally-encrypted nature of inter-agent communication makes most eavesdropping, including some but not all man-in-the-middle attacks, quite difficult. However, traffic analysis is still a possibility -- for example, if an interloper knows what one Yenta is interested in, watching which agents it clusters with could be revealing.

Fortunately, we have a solution to this, in the broadcasting paradigm mentioned in Section 2.10. In addition, we can use the techniques in Section 3.4.3 to provide additional security.

3.4.3 Malicious agents

If some malicious person were running a subverted version of an agent, what could he discover? The most important information consists of the identities of other agents in the cluster cache -- especially if those identities are those of real users (e.g., their real names) rather than digital pseudonyms -- and the contents of the rumor cache -- especially if, again, such text can be correlated to real people. There are therefore two general strategies to combat this: hiding real identifying information as well as possible, and minimizing the amount of text stored in the rumor cache.
We shall mention two of the simplest approaches below; other approaches to both problems, involving Mixmaster-style random reforwarding, secret-sharing protocols, or diffusing pieces of characteristics out to large numbers of third parties for comparison, are possible but are more complicated than necessary for this discussion.

Hiding identities

Since users are pseudonymous by default, hiding their identities in large part centers around avoiding traffic analysis. Using the broadcasting strategies presented above suffices. For a more complete description, please see Section 2.10.

Mixing in other agents' data

A simple technique for protecting users' characteristics against possibly malicious agents is to mix in other agents' data when engaging in the comparison and referral process. For a more complete description of this process, please see Section 2.8.3.

A range of privacy is available

Depending on which of the strategies above are chosen, and the nature of the characteristics handled by the application, it may be possible to arrange several degrees of user privacy. Using the terminology of Section 3.2.2, these could plausibly range from possible innocence to beyond suspicion.

3.4.4 Protecting the distribution

There is a final piece of the puzzle -- how do users of an agent know that their copy is trustworthy? The easiest approach, of course, is to cryptographically sign the binaries, such that any given binary may be checked against the authoritative distribution point for tampering. But what if the program itself, at the distribution point, had a trojan horse inserted into its source, either by the implementors themselves, or by a malicious third party who penetrates the development machine? Even though the source is freely distributed, and may be recompiled by end-users and checked against the binary, what individual user would want to read the entire source to check for malicious inclusions? This is, of course, a problem for any software, and not just agents in the architecture we present here -- but applications such as Yenta are particularly difficult for a user to verify solely from their behavior. After all, they read sensitive files and engage in a lot of network traffic -- and even worse, the traffic is encrypted, so one cannot even check up on it with a packet sniffer.

In general, those who distribute software have chosen one of three models:

o Trust us. Often used by those who do not provide source code at all.

o Go ahead -- read every line of the source code yourself. This is an infeasible task for almost any reasonable application, and a huge burden.

o Hope you hear something one way or the other on the net or in the press. This, too, is infeasible, error-prone, and subject to a variety of false positives and false negatives.

The Yenta code vetter

There is another way. To demonstrate this, we have developed Yvette, a Web-based tool which allows multiple people to collaboratively evaluate an agent's source code -- in this case, Yenta's. A summary of Yvette's capabilities is presented below, and examples of its use are presented in Figure 4 and Figure 5.

Evaluators store cryptographically-signed -- hence traceable and non-spoofable -- comments on particular pieces of the source where others can view them. The signature covers both the comment and the exact text of the code being commented upon.
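A sketch of one way such a binding might be computed follows; the field layout and the choice of hash here are illustrative assumptions, not Yvette's actual format. The point is that the evaluator signs a digest covering both the exact code text and the comment, so neither can later be altered without invalidating the signature.

```python
import hashlib

def comment_digest(file_name: str, first_line: int, last_line: int,
                   code_text: str, comment: str) -> bytes:
    """Digest binding a comment to the exact code region it discusses.

    Each field is length-prefixed so that no two distinct (code, comment)
    pairs can produce the same byte stream.  The evaluator would then sign
    this digest with his or her private key.
    """
    h = hashlib.sha1()
    for field in (file_name, str(first_line), str(last_line), code_text, comment):
        data = field.encode("utf-8")
        h.update(len(data).to_bytes(4, "big"))
        h.update(data)
    return h.digest()

digest = comment_digest("comparison.c", 120, 145,
                        "/* exact text of the commented-upon region */",
                        "Reviewed: no key material is written to the logs here.")
print(digest.hex())
```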
Each individual need only check a small piece of the whole, yet anyone can examine the collected comments and decide whether their contents and coverage add up to an evaluation one can trust. Yvette presents an interface, via the Web, which allows anyone to ask questions such as:

o Who has commented on this particular piece of code? Are the comments mostly favorable, or not? What is the exact text of the comment(s)?

o Which regions have the most, or the fewest, comments associated with them?

Yvette users may also take actions such as:

o Download, for inspection and comment, a piece of the source, which can be a region of lines in a file, a subroutine, a set of subroutines, a set of files, or an entire directory tree.

o Upload cryptographically-signed comments about some piece of downloaded source code.

Note that, since it distributes code that may include cryptographic routines whose export from the US and Canada is illegal [50][87], Yvette must also be aware of which sections of code are sensitive, and must use address-based heuristics and questions asked of the user -- only for those parts of Yenta which are cryptographic -- to ensure that EAR/ITAR's export restrictions [50][87] are not violated. The heuristics used are the same as those used to control the export of PGP [187], which, while easy to circumvent, are informally viewed as sufficient by at least some of the relevant players in the US government [104].

Using Yvette, therefore, users who wish to help verify a distribution can bite off a small piece of the problem, asking the Yvette server which pieces of source code have not yet been extensively vetted, perusing other people's comments, and so forth. Users with no programming experience, but who nonetheless wish to check the distribution, may look at everyone else's comments to assure themselves of the integrity of the product.

Yvette thus attempts to encourage a whole-system approach to security, in which not only are the agents themselves secure, but their users -- who are also part of the system -- may easily trust the agents' security and integrity. It is hoped that mechanisms such as Yvette will become more popular in software distribution in general, and that they encourage thinking about more than just protocols and cryptography -- if we expect widespread adoption of sophisticated agents, the sociology of how users can use and trust them matters, too.

3.5 Selected additional topics

There are a few loose ends in the architecture we present here that have not been adequately addressed by the discussion so far. This section attempts to tie them up.

Central servers

Let us consider first the central servers that exist in the design, namely the bootserver (described in Section 2.7) and the statserver (described in Section 2.13). Both of these servers are safe, in the sense of the unlinkability of users' personal data and their actual identities, but in slightly different ways.

The bootserver knows IP addresses. Because of this, it could potentially lead an attacker directly back to an individual. However, the bootserver knows nothing else -- in particular, it knows nothing about any user's characteristics, save that the given user runs the application at all. The statserver, on the other hand, potentially knows quite a bit about all users -- in Yenta, for example, it knows information such as how many clusters the user is in, how they tend to use the user interface, what machine architecture Yenta is being run on, and so forth.
(Note that it still does not know the detailed contents of individual characteristics, because such information could compromise a user's privacy if revealed, and it is unlikely to be so useful for analysis of Yenta's behavior that the risk is worthwhile.) However, the statserver does not know user identities or IP addresses at all. Once the data has been stored on disk, the agent's identity and IP address are gone. The only data the statserver preserves is a unique random number which can be used to differentiate one agent from another, but nothing else.

Encrypted connections

As for the safety of the data getting to the statserver in the first place, or between any given pair of agents, note that we have specified that all communications are routinely encrypted. The only exception is data which contains no personal user information, namely bootstrap requests and replies, either via broadcast or to and from the bootserver. For details of how Yenta performs such encryption, see Section 4.8.1.

Persistent state

Getting the data between agents is only part of the story, however; we must also consider the storage of the agent's persistent state across shutdowns. In most applications, this is likely to be stored in a filesystem on a disk. If the agent handles personal information, this storage point is a tempting target for an attacker. Furthermore, it is likely that the application may store users' private keys -- perhaps the basis of their identity -- in this file as well, meaning that an attacker who can read the file can not only violate the user's privacy, but impersonate him or her to other users as well, with potentially serious implications.

It is clear, therefore, that such data should be protected. Exactly how this is to be accomplished is application- and implementation-specific; how Yenta does so is described in Section 4.8.2. Note in particular that this is a rich source of possible security problems, for several reasons:

o A network connection is necessarily a moving target -- if an eavesdropper fails to intercept the relevant packets, the attack fails. On the other hand, data stored on disk is vulnerable to compromise from the moment it is created until long afterwards -- perhaps even after it is thought deleted, and, to a sophisticated adversary, even after the disk has been formatted [73]. Keeping backups around forever, and failing to adequately encrypt their contents or control physical access to them, only makes this worse.

o If the data is stored encrypted, we have the often-difficult problem of how to securely ask the user for the decryption key. Many environments provide no known-secure method of eliciting such data; in particular, UNIX users who use the X Window System [152] or Telnet [136] are particularly vulnerable to simple packet sniffing, and this is an extremely popular attack. While it is possible for knowledgeable users to use SSH [158] or Kerberos' ktelnet [127], there is often no way for the application to ensure this -- and a single instance of carelessness could leave the user's privacy unknowingly compromised forever afterwards.

o Encrypting the data with the same encryption key every time it is written to disk exposes it to a number of attacks if the data varies [155]; one way to avoid this is sketched below.
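One common way around this last problem -- offered here as a sketch of the general technique, not as a description of Yenta's actual mechanism -- is to derive a fresh encryption key from the user's passphrase and a newly generated random salt every time the state is written, storing the salt in the clear next to the ciphertext so that the key can be rederived at load time:

```python
import os
import hashlib

def derive_file_key(passphrase: str, salt: bytes) -> bytes:
    """Derive a 128-bit file key from the passphrase and a per-save salt."""
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"),
                               salt, 100_000, dklen=16)

def save_state(passphrase: str) -> tuple[bytes, bytes]:
    # A new salt on every save means the effective encryption key changes
    # per write, even though the passphrase itself does not.
    salt = os.urandom(16)    # OS-supplied randomness; see the discussion of
                             # random-number sources which follows
    key = derive_file_key(passphrase, salt)
    # ...the agent's persistent state would now be encrypted under `key`
    # with a symmetric cipher, and written out together with `salt`...
    return salt, key

salt, key = save_state("a passphrase the user chose well")
assert derive_file_key("a passphrase the user chose well", salt) == key
```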
Random numbers

Finally, note that we have at many points mentioned the term random number -- whether explicitly, in Section 2.13's discussion of the IDs generated for the statserver, or implicitly, whenever we talk about generating any sort of key -- session keys, public/private key pairs, and so forth. We assume here that such random numbers are really pseudorandom, i.e., derived from deterministic software.

Where are these random numbers coming from? Certainly not from typical application libraries; most random-number sources provided with most operating systems are extremely poor when employed for cryptographic applications. Further, several high-profile examples of poor decisions in sources of random numbers have come to light, such as an early Netscape attempt at SSL [63] which derived its 'random' numbers from easily-predictable values provided by the host operating system -- this meant that a browser's supposedly-secure, 128-bit-key connection to a server could be broken in around 25 seconds [67].

Thus, the implementor must take great care in the selection and use of random numbers in the application. This is common sense in cryptographic circles, but it bears repeating here. Exactly where to find a good source of randomness is always implementation-specific; some operating systems make available random numbers which are derived from turbulent processes (such as disk head performance), but many do not. For an illustrative example of how Yenta acquires, manages, and uses random numbers, see Section 4.8.3.

3.6 Summary

In this chapter, we have described the threat model -- what sorts of attacks we consider within the scope of this research. We then presented some background on modern cryptography and how it can help address many of the threats presented; we also discussed how decentralization of the architecture contributes greatly to the protection we afford. Finally, we presented a new method which makes collaboratively evaluating the source code of a critical application easier, and tied up some loose ends.