Note: This page is historical.

Current pages about Yenta are here. Please look at those pages first.

Yenta is still under active development, but this particular page is not. If you're interested in current research papers about Yenta, or obtaining a copy of Yenta, please start here instead.

This page is one of many that were written in late 1994 and early 1995, and are being preserved here for historical purposes. If you're viewing this page, you probably found it via an old link or are interested in the history of how Yenta came to be. These pages have not been actively maintained since 1995, so you'll find all sorts of older descriptions which may not match the current system, citations to old papers and old results, and so forth.

Dynamic clustering

Most of Yenta relies on being able to dynamically form clusters or communities of agents to bring people together. In general, such clusters are formed by computing the "nearness" of one chunk of text (for example, from an email message) to some other chunk. If several of the chunks from one user are similar enough to those of another user, we declare a match and use it for whichever application we are trying to implement. Typically, both agents remember that the match exists and later use it to hook newcomers into the growing coalition.
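The matching rule in the paragraph above can be sketched as follows. This page does not say how "nearness" is computed or how many similar chunks count as "several", so this is a minimal sketch under stated assumptions. Nearness is stood in for by Jaccard overlap of word sets, and the `threshold` and `min_pairs` values, like the function names, are hypothetical.

```python
def nearness(a: str, b: str) -> float:
    """Hypothetical stand-in for nearness: Jaccard overlap of the word sets."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def is_match(chunks_a, chunks_b, threshold=0.5, min_pairs=2):
    """Declare a match when at least `min_pairs` of user A's chunks each have
    some chunk of user B whose nearness reaches `threshold` (both assumed)."""
    similar = sum(
        1 for ca in chunks_a
        if any(nearness(ca, cb) >= threshold for cb in chunks_b)
    )
    return similar >= min_pairs


alice = ["folk music festival tickets", "banjo tuning for folk music", "weekend hiking trip"]
bob = ["looking for folk music festival tickets", "banjo tuning tips for folk music"]
carol = ["tax forms due in april", "quarterly tax estimate"]
print(is_match(alice, bob))    # True: two of Alice's chunks have a near neighbour
print(is_match(alice, carol))  # False: no word overlap at all
```

As written, the rule counts from A's side only, so `is_match(a, b)` and `is_match(b, a)` can differ. A real agent pair might require both directions to agree before either records the match.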

There is a large literature on forming clusters dynamically from data. Exactly which clustering mechanism Yenta will use has not yet been settled, and the architecture should support multiple clustering mechanisms so that this space can be explored.

Likely techniques for accomplishing the clustering include:

Statistical vector-space approaches, such as those used by SMART, or
Semantic net approaches, such as those based on WordNet.
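To make the vector-space option concrete, here is a minimal sketch of term weighting in the general style of SMART-like systems. SMART supports many weighting schemes; this uses one common tf-idf variant (raw term count times log(N / document frequency)) with whitespace tokenization, which is an assumption for illustration rather than the scheme Yenta would use. The function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document, weighting each
    term by its raw count times log(N / number of documents containing it)."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    return [
        {t: c * math.log(n / df[t]) for t, c in Counter(toks).items()}
        for toks in tokenized
    ]


def cosine(u, v):
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


v = tfidf_vectors(["the folk music festival", "the folk music concert", "the tax return"])
print(cosine(v[0], v[1]) > cosine(v[0], v[2]))  # True
print(cosine(v[0], v[2]))                       # 0.0
```

The second result is zero because the only word the two texts share, "the", occurs in every document and so receives weight log(1) = 0. Down-weighting common words this way is what lets such a space distinguish topics rather than merely shared vocabulary.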

Regardless of the clustering technique employed (and they each have their own advantages and disadvantages), the essential idea is that they form a multidimensional space. The space must support a partial ordering in any given dimension, to allow a "hill-climbing" or "gradient descent" approach to navigation.

This partial ordering in any given dimension is very important. Why? Because agents are distributed across the net as a whole. There must be no central registry to point the way to any particular agent's position in this space, since such a solution would not scale. Without such a registry, agents must rely on each other's referrals to point at "more appropriate" agents for any particular topic. In order for such referrals to be more efficient than random choice (which would be prohibitively inefficient), agents must be able to establish at least a local gradient in which to move. [This ignores the classical problem of local versus global maxima; adding stochastic behavior to the gradient may help here, but it is not yet clear whether this problem will be trivial or enormous.]
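Referral-following on a local gradient can be sketched as greedy hill-climbing. The agent consults only the current agent's neighbors and moves to whichever referral is nearest its target, stopping when none is nearer. The sketch assumes positions are coordinate tuples and nearness is negative Euclidean distance. The `explore` parameter is a hypothetical version of the "stochastic behavior" mentioned above: a per-step chance of taking a random referral instead of the best one. All names here are illustrative.

```python
import math
import random


def closeness(p, q):
    """Higher is nearer: negative Euclidean distance between two positions."""
    return -math.dist(p, q)


def follow_referrals(start, target, positions, neighbors,
                     explore=0.0, max_steps=50, rng=None):
    """Greedy hill-climbing toward `target` using only local referrals.

    positions: agent id -> coordinate tuple in a hypothetical concept space
    neighbors: agent id -> ids of agents that agent can refer us to
    explore:   chance per step of taking a random referral (stochastic escape)
    Returns the nearest agent visited.
    """
    rng = rng or random.Random()
    current = best_seen = start
    for _ in range(max_steps):
        referrals = neighbors.get(current, [])
        if not referrals:
            break
        if rng.random() < explore:
            current = rng.choice(referrals)
        else:
            best = max(referrals, key=lambda a: closeness(positions[a], target))
            if closeness(positions[best], target) <= closeness(positions[current], target):
                if explore == 0.0:
                    break  # pure greedy: stuck at a local maximum
                continue   # stochastic variant: wait for a random jump
            current = best
        if closeness(positions[current], target) > closeness(positions[best_seen], target):
            best_seen = current
    return best_seen


line_pos = {"a": (0.0,), "b": (1.0,), "c": (2.0,), "d": (3.0,)}
line_nb = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(follow_referrals("a", (3.0,), line_pos, line_nb))  # d

trap_pos = {"a": (0.0,), "b": (1.0,), "c": (0.5,), "d": (3.0,)}
trap_nb = {"a": ["b"], "b": ["a", "c"], "c": ["d"], "d": ["c"]}
print(follow_referrals("a", (3.0,), trap_pos, trap_nb))  # b
```

The second graph shows the local-maximum problem. Agent "d" sits exactly at the target, but the only path to it runs through "c", which is farther from the target than "b", so pure greedy search stops at "b". Passing `explore > 0` lets the walk occasionally step downhill and possibly reach "d". Whether that is enough in practice is exactly the open question the paragraph raises.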

Consider, for a moment, an individual agent. It "talks" only to neighboring agents. It can, however, pick up and move to a different neighborhood along some dimension, if it finds out from a neighbor that some nearby ad is closer to its meaning. In this way, agents migrate around the semantic space, always talking only to local neighbors, gradually forming clusters. Each agent can talk to multiple different local neighbors at once, where each such neighbor is a neighbor along a different dimension in concept space.

When we say that an agent "moves", we refer in this case to the making and breaking of (probably multiple) network connections to peer agents. The set of active network connections, in a very real sense, is the cluster.
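The idea that "moving" is nothing more than changing connections, with one neighbor per dimension, can be modeled with a small data structure. This is a hypothetical sketch: real network connections are replaced by peer names, and the `Agent` class, its `links` mapping, and the dimension labels are illustrative assumptions rather than Yenta's actual design.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical model: an agent's position in concept space is just its
    set of current peer connections, at most one per dimension."""
    name: str
    links: dict = field(default_factory=dict)  # dimension -> peer name

    def move(self, dimension, new_peer):
        """'Move' along one dimension by breaking the old connection on that
        dimension (if any) and making a new one. Returns the dropped peer."""
        old = self.links.get(dimension)
        self.links[dimension] = new_peer
        return old

    def cluster(self):
        """The set of peers this agent is currently connected to."""
        return set(self.links.values())


alice = Agent("alice")
alice.move("music", "bob")
alice.move("hiking", "carol")
print(alice.move("music", "dave"))  # bob  (the connection that was broken)
print(sorted(alice.cluster()))      # ['carol', 'dave']
```

Nothing in this model records a coordinate for the agent. Its "location" is recoverable only from whom it is connected to, which mirrors the point that the set of active connections is the cluster.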

There is no particular reason why we could not move the actual code of the agent as well, à la the active processes of Telescript, although such roving processes would then require more attention to the security of their connection back to the real users (probably easy enough to do with proper cryptographic protocols). Indeed, if we are legally allowed to embed Telescript applications into agents that are freely redistributable via anonymous FTP, this might be an implementation option.

In Yenta, however, moving the agents themselves is less than ideal. For one thing, the agents need access to the local user's mail in order to decide what the user's interests are; moving the agent just gets it farther away from this source of data, slowing things down and making them less secure (since the data will have to be moved as well, presumably requiring encryption to preserve privacy).

Lenny Foner

Last modified: Mon Sep 25 18:05:50 1995