Gnutella Forums - View Single Post

guido · #1 · November 28th, 2001

At present every Gnutella client keeps a huge list of hosts, the majority of which will refuse connection attempts. This is not very practical.
It would be better to have a small host list of about 20 hosts which are very likely to accept your connection request.

Here are my ideas about how to achieve this:

Every node will keep the following information about every node in its host list:
-ip
-port
-latitude
-longitude
-number of files reachable through this host at a TTL of 5
-number of incoming connections from Superpeers this node would have accepted at the time it provided this information
-the same for connections from clientpeers
-the average bandwidth caused by broadcast messages that a connecting Superpeer could expect when connecting to this node
-the time when this information was last updated by this node, in seconds since 01.01.1970, 00:00h, GMT
-this node's uptime in seconds

-a host evaluation number (HEN, explained below)

The difference between the HEN and the other information fields is this: all the other fields are provided directly by the node they describe, while the HEN is never passed over the network. It is calculated every time a node receives this information, based on the receiving node's individual needs. It is a kind of indicator for the usefulness of a node.
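To make the record and the HEN a bit more concrete, here is a minimal Python sketch. No HEN formula is given here, so the weights and the freshness factor in `host_evaluation_number` are purely my assumptions, as are all field and function names. The one thing taken from the description above is that the HEN is computed locally by the receiving node and is not part of the record that travels over the network.

```python
import time
from dataclasses import dataclass


@dataclass
class HostInfo:
    """Fields a node advertises about itself (names are hypothetical)."""
    ip: str
    port: int
    latitude: float
    longitude: float
    files_reachable: int        # files reachable through this host at TTL 5
    superpeer_slots: int        # incoming Superpeer connections it would accept
    clientpeer_slots: int       # incoming clientpeer connections it would accept
    broadcast_bandwidth: float  # expected broadcast load; unit not specified
    last_updated: int           # seconds since 1970-01-01 00:00 GMT
    uptime: int                 # seconds


def host_evaluation_number(info, want_superpeer, now=None):
    """Illustrative HEN: higher means more useful to the receiving node.

    The weights below are made up for the example; a real client would
    tune them to its own needs.
    """
    now = time.time() if now is None else now
    slots = info.superpeer_slots if want_superpeer else info.clientpeer_slots
    if slots <= 0:
        return 0.0
    age = max(0.0, now - info.last_updated)
    freshness = max(0.0, 1.0 - age / 3600.0)  # drops to 0 after one hour
    score = slots * 10.0 + info.files_reachable / 1000.0 + info.uptime / 3600.0
    return score * freshness
```

Because the HEN lives outside `HostInfo`, two nodes receiving the same record can rank it differently, for example a clientpeer and a Superpeer looking at different slot counts.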

A note about latitude/longitude:
These can be very rough figures. They should only be there to avoid too many Gnutella connections over the (very expensive) transcontinental WAN-lines. If some user lives in a country where the freedom of information (and thus the usage of a Gnutella node) is restricted, he may decide to fake these values. They aren't that important.
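One way a node could use the rough coordinates is to estimate the great-circle distance to a candidate host and penalise very distant ones when calculating the HEN. The sketch below uses the standard haversine formula; the function name and the idea of feeding the result into the HEN are my assumptions, not part of the proposal itself.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def rough_distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # min() guards against tiny floating point overshoot above 1.0
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))
```

Since the values may be rough or even faked, a client should treat the result only as a hint, for instance by preferring hosts on the same continent when everything else is equal.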

Now, every node maintains two host lists: one high quality host list with about 20 entries and one 'raw' host list with about 500 entries (and maybe also one 'classical' list of the kind that is common now).
The entries of both lists are ordered by their HEN.

When a node receives information about another node, it passes this information to Algorithm A, which may in turn pass it on to Algorithm B.

Algorithm A:
*Check whether this node's IP and port number already appear in the high quality host list
**If Yes, check whether any of the information fields have changed since the entry was stored
***If Yes, delete the old entry
***If No, stop here
*Check whether the time indicated by the 'last updated' value was more than an hour ago
**If Yes, stop here
*Check whether the number of incoming connections this node would accept, of the kind (Superpeer or clientpeer) that the node running this algorithm wants to request, is 0
**If Yes, pass the received information to Algorithm B and stop here
*Calculate the HEN of this node
*Check whether the high quality host list is already full
**If No, add the received information as a new entry to the high quality host list, sort the high quality host list and stop here
*Check whether the HEN which was just calculated is higher than the lowest HEN in the high quality host list
**If Yes, delete the entry with the lowest HEN from the high quality host list, add the new data as a new entry and sort the high quality host list
**If No, pass the new data to Algorithm B
*Stop here
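
The steps of Algorithm A can be sketched in Python roughly as follows. This is only an illustration under some assumptions: a host record is a plain dict with the keys `ip`, `port`, `superpeer_slots`, `clientpeer_slots` and `last_updated` (all hypothetical names), the list holds `(hen, info)` pairs sorted by HEN in descending order, and the HEN calculation and the hand-over to Algorithm B are supplied by the caller as `compute_hen` and `pass_to_b`.

```python
import time

HQ_LIST_SIZE = 20        # "about 20 entries"
MAX_AGE_SECONDS = 3600   # records older than an hour are ignored


def algorithm_a(info, hq_list, want_superpeer, compute_hen, pass_to_b, now=None):
    """Try to place `info` in the high quality list; returns what happened."""
    now = time.time() if now is None else now
    key = (info["ip"], info["port"])

    # Already known? Drop the old entry if anything changed, else stop.
    for i, (_, old) in enumerate(hq_list):
        if (old["ip"], old["port"]) == key:
            if old == info:
                return "unchanged"
            del hq_list[i]
            break

    # Ignore records last updated more than an hour ago.
    if now - info["last_updated"] > MAX_AGE_SECONDS:
        return "stale"

    # No free slot of the kind we want: only useful for the raw list.
    slots = info["superpeer_slots"] if want_superpeer else info["clientpeer_slots"]
    if slots == 0:
        pass_to_b(info, None)
        return "passed_to_b"

    hen = compute_hen(info)
    if len(hq_list) < HQ_LIST_SIZE:
        hq_list.append((hen, info))
        hq_list.sort(key=lambda entry: entry[0], reverse=True)
        return "added"

    # List is full: replace the weakest entry only if the new one is better.
    if hen > hq_list[-1][0]:
        hq_list[-1] = (hen, info)
        hq_list.sort(key=lambda entry: entry[0], reverse=True)
        return "replaced"

    pass_to_b(info, hen)
    return "passed_to_b"
```

Passing the already calculated HEN along to `pass_to_b` lets Algorithm B skip recalculating it; `None` signals that it has not been calculated yet.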

Algorithm B:
*Check whether the IP and port number already appear in the raw host list
**If Yes, check whether any of the information fields have changed since the entry was stored
***If Yes, delete the old entry
***If No, stop here
*Check whether this node's HEN has already been calculated by Algorithm A
**If No, do that now
*Check whether the raw host list is already full
**If No, add the received information as a new entry to the raw host list, sort the raw host list and stop here
*Check whether the new HEN is higher than the lowest HEN in the raw host list
**If Yes, delete the entry with the lowest HEN from the raw host list, add the new data as a new entry and sort the raw host list
*Stop here
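
Algorithm B could look like this in Python. As before this is just a sketch under assumptions: records are plain dicts with at least `ip` and `port`, the raw list holds `(hen, info)` pairs sorted by HEN in descending order, `compute_hen` is supplied by the caller, and `hen_from_a` carries the value from Algorithm A if it was already calculated there. The `max_size` parameter is my addition so the behaviour can be tried out with a small list.

```python
RAW_LIST_SIZE = 500  # "about 500 entries"


def algorithm_b(info, raw_list, compute_hen, hen_from_a=None,
                max_size=RAW_LIST_SIZE):
    """Try to place `info` in the raw host list; returns what happened."""
    key = (info["ip"], info["port"])

    # Already known? Drop the old entry if anything changed, else stop.
    for i, (_, old) in enumerate(raw_list):
        if (old["ip"], old["port"]) == key:
            if old == info:
                return "unchanged"
            del raw_list[i]
            break

    # Reuse the HEN from Algorithm A if there is one, else calculate it now.
    hen = compute_hen(info) if hen_from_a is None else hen_from_a

    if len(raw_list) < max_size:
        raw_list.append((hen, info))
        raw_list.sort(key=lambda entry: entry[0], reverse=True)
        return "added"

    # List is full: replace the weakest entry only if the new one is better.
    if hen > raw_list[-1][0]:
        raw_list[-1] = (hen, info)
        raw_list.sort(key=lambda entry: entry[0], reverse=True)
        return "replaced"

    return "dropped"
```

Unlike Algorithm A, there is no age or free-slot check here, so the raw list also collects hosts that are currently full but may become useful later.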

Then there is still the question of how the nodes will exchange this information.
I recently read a proposal on the gdf mailing list about how to make it possible to search the Gnutella network not only for filenames but also for file hashes. Their trick was to append additional information to search requests or replies, which hosts that don't know what it means would simply ignore.
The same trick could be used for this. A Supernode which still has some incoming connection slots to offer might append its node descriptor field to 2 or 3 search requests/replies every 10 minutes or so.
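
Here is a rough Python sketch of how such an appended node descriptor might work for a search request. It assumes the query payload consists of a 2-byte minimum speed followed by a null-terminated search string, and that older clients stop reading at the null byte. The 34-byte descriptor layout in `DESCRIPTOR_FORMAT` is entirely hypothetical (the real layout would have to be agreed on), as are the function names and the integer bandwidth field.

```python
import socket
import struct

# Hypothetical layout: ip, port, latitude, longitude, files reachable,
# Superpeer slots, clientpeer slots, bandwidth, last updated, uptime.
DESCRIPTOR_FORMAT = "<4sHffIHHIII"
DESCRIPTOR_SIZE = struct.calcsize(DESCRIPTOR_FORMAT)


def pack_descriptor(ip, port, lat, lon, files, sp_slots, cp_slots,
                    bandwidth, last_updated, uptime):
    """Serialise the node's own information fields (the HEN is never sent)."""
    return struct.pack(DESCRIPTOR_FORMAT, socket.inet_aton(ip), port, lat, lon,
                       files, sp_slots, cp_slots, bandwidth, last_updated,
                       uptime)


def build_query_payload(min_speed, criteria, descriptor=None):
    """Search request payload, optionally with a descriptor after the null."""
    payload = struct.pack("<H", min_speed) + criteria.encode("ascii") + b"\x00"
    if descriptor is not None:
        payload += descriptor
    return payload


def legacy_parse_query(payload):
    """What an older client sees: it stops at the null terminator."""
    (min_speed,) = struct.unpack_from("<H", payload, 0)
    end = payload.index(b"\x00", 2)
    return min_speed, payload[2:end].decode("ascii")


def extract_descriptor(payload):
    """What an aware client does: read the bytes after the null, if any."""
    end = payload.index(b"\x00", 2)
    extra = payload[end + 1:]
    if len(extra) != DESCRIPTOR_SIZE:
        return None
    fields = struct.unpack(DESCRIPTOR_FORMAT, extra)
    return (socket.inet_ntoa(fields[0]),) + fields[1:]
```

A client that understands the extension would hand the result of `extract_descriptor` to Algorithm A, while an older client only ever sees the search string.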

Guido