General description
GnutellaNet works by "viral propagation". I send a message to you,
and you send it to all clients connected to you. That way, I only
need to know about you to know about the entire rest of the
network.
A simple glance at this message delivery mechanism will tell you
that it generates inordinate amounts of traffic. Take for example
the defaults for Gnutella 0.54. It defaults to maintaining 25
active connections with a TTL of 7 (TTL means Time To Live, or the
number of times a message can be passed on before it "dies"). In
the worst of worlds, this means 25^7, or 6,103,515,625 (about 6
billion) messages resulting from just one message!
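The 25^7 figure is easy to check. The sketch below only reproduces
that worst-case estimate: it assumes every client passes the message
to 25 clients that have never seen it, at every hop. The function
name worst_case_messages is made up for illustration.

```python
def worst_case_messages(connections: int, ttl: int) -> int:
    """Worst-case estimate: each hop multiplies the message count by
    `connections`, and the message is passed on `ttl` times."""
    return connections ** ttl


# 25 connections, passed on 7 times
print(worst_case_messages(25, 7))  # 6103515625
```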
Well, okay. In truth it isn't that bad. There are fewer than two
thousand Gnutella clients on the GnutellaNet at any one time. That
means that long before the TTL expires on our hypothetical message,
every client on the GnutellaNet will have seen our message.
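To put a rough number on "long before", here is a small sketch that
counts how many hops it takes for an idealized flood to cover the
network. It assumes a fan-out of 25 at every hop with no overlap
between the clients reached, which a real network will not give you,
so treat the result as a lower bound. The name hops_to_reach is
hypothetical.

```python
def hops_to_reach(clients: int, connections: int) -> int:
    """Smallest hop count at which connections**hops >= clients,
    assuming no two forwards ever land on the same client."""
    hops, reached = 0, 1
    while reached < clients:
        reached *= connections
        hops += 1
    return hops


# 25 -> 625 -> 15625 clients reached after 1, 2, 3 hops
print(hops_to_reach(2000, 25))  # 3
```

Under these assumptions the message has reached everyone after 3
hops, with 4 hops of TTL still unused.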
Obviously, once a client sees a message, it's unnecessary for it
to process the message again. The original Gnutella designers, in
recognition of this, engineered each message to contain a GUID
(Globally Unique Identifier) which allows Gnutella clients to
uniquely identify each message on the network.
So how do Gnutella clients take advantage of the GUID? Each
Gnutella client maintains a short memory of the GUIDs it has seen.
For example, I will remember each message I have received. I
forward each message I receive as appropriate, unless I have
already seen the message. If I have seen the message, that means I
have already forwarded it, so everyone I forwarded it to has
already seen it, and so on. So I just forget about the duplicate
and save everyone the trouble.
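The forward-unless-seen rule just described can be sketched in a few
lines. This is a toy model, not the real wire protocol: the Client
class, its method names, the size of the GUID memory and the exact
point at which the TTL runs out are all assumptions made for the
example, and a random uuid4 value stands in for the message GUID.

```python
import uuid
from collections import OrderedDict


class Client:
    """Hypothetical Gnutella-like client with a short GUID memory."""

    def __init__(self, name, memory=1000):
        self.name = name
        self.neighbors = []
        self.seen = OrderedDict()  # GUIDs seen so far, oldest first
        self.memory = memory       # assumed cap on remembered GUIDs
        self.received = 0          # every delivery, duplicates included

    def connect(self, other):
        self.neighbors.append(other)
        other.neighbors.append(self)

    def receive(self, guid, ttl, sender=None):
        self.received += 1
        if guid in self.seen:
            return  # already seen, so already forwarded: drop it
        self.seen[guid] = True
        if len(self.seen) > self.memory:
            self.seen.popitem(last=False)  # forget the oldest GUID
        if ttl <= 1:
            return  # the message "dies" here
        for neighbor in self.neighbors:
            if neighbor is not sender:
                neighbor.receive(guid, ttl - 1, sender=self)


# Three clients connected in a ring: a-b, b-c, c-a
a, b, c = Client("a"), Client("b"), Client("c")
a.connect(b)
b.connect(c)
c.connect(a)

guid = uuid.uuid4().bytes
a.receive(guid, ttl=7)
print(a.received, b.received, c.received)  # 2 1 2
```

Even in this three-client ring, two of the five deliveries are
duplicates that get dropped on arrival. Without the GUID check the
message would keep circling until its TTL ran out.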
Topology
The GnutellaNet has no hierarchy. Every server is equal. Every
server is also a client. So everyone contributes. Well, as in all
egalitarian systems, some servers are more equal than others.
Servers running on fast connections can support more traffic. They
become hubs for others, and therefore get their requests answered
much more quickly. Servers on slow connections are relegated to the
backwaters of the GnutellaNet, and get search results much more
slowly. And if they pretend to be fast, they get flooded to death.
But there's more to it than that.
Each Gnutella server only knows about the servers that it is
directly connected to. All other servers are invisible, unless they
announce themselves by answering a PING or by replying to a QUERY.
This provides amazing anonymity.
Unfortunately, the combination of having no hierarchy and the
lack of a definitive source for a server list means that the
network is not easily described. It is not a tree (since there is
no hierarchy) and it is cyclic. Being cyclic means the same message
can reach a server by more than one path, so there is a lot of
needless network traffic. Clients today do not do much to reduce
the traffic, but for the GnutellaNet to scale, developers will need
to start thinking about that.