Gnutella Forums - View Single Post

afisk · #2 · April 10th, 2002

This is an excellent idea, and it is almost precisely how we've internally talked about implementing something like this in the past. We definitely plan on implementing it at some point.

For your reading pleasure, here are the points on this issue that one of the limewire.org developers brought up:

Clustering by IP address block may be a good idea, but only on private networks.
On the real Internet, this will certainly not make things easier or faster, because the global IP space is more and more sparsely distributed. IPv4 blocks are progressively being scaled down to better fit the real usage of this highly wanted resource: each ISP must now justify its use of the global IP space, and this has been strengthened by raising the leasing costs for large IP blocks. As a result, many connections use small blocks, sometimes as small as 16 addresses, which are allocated dynamically throughout the territories where an ISP has access points. Such techniques rely on complex router management, and two IP addresses sharing the same first 24 bits may be very far from each other (in terms of IP ping time or hop count).
Also, as more and more users connect through NAT routers and systems, the local IP address of the LimeWire host may have nothing in common with that of a supposedly far remote host, even if it uses the same ISP access point.
Measuring the real IP hop count between hosts may be much better; however, it supposes that an implementation can access raw IP sockets to perform ICMP requests. Moreover, most firewalls and ISP routers now implement strict policies on routing ICMP requests (because of possible "smurf" DoS attacks, as demonstrated and reasonably argued in the CERT.org advisories).
This makes ping requests fail when targeting other networks: a user does not have to test remote networks, and an ICMP test ping message only makes sense between two adjacent networks. "traceroute" ICMP messages are thus considered unfair when sent by ordinary users, and they fail within the core of the ISP network (which uses private IP addresses such as 192.168.x.x on its internal links).
So traceroute techniques will more and more often fail to complete in a reasonable time (traceroute will only succeed between two agents that are not behind NAT, provided there's no firewall between them).
The only viable solution is to track the effective end-to-end ping time between hosts: the minimum, maximum, and latest median times could be combined into an average value, which can then be used to sort connections by network proximity.
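To make the ping-time idea concrete, here is a minimal sketch in Python. It is only one reading of the suggestion: the class name `RttTracker`, the window of 16 samples, and the equal weighting of minimum, median, and maximum are all assumptions for illustration, not anything LimeWire is known to do. The caller is assumed to measure each round trip (for example, the delay between a Gnutella ping and its pong) and pass it in as seconds.

```python
import statistics
from collections import deque


class RttTracker:
    """Keeps round-trip samples (in seconds) for one connection."""

    def __init__(self, window=16):
        # Only the most recent `window` samples feed the median (assumed size).
        self.samples = deque(maxlen=window)
        self.minimum = float("inf")
        self.maximum = 0.0

    def record(self, rtt):
        """Store one measured round-trip time."""
        self.samples.append(rtt)
        self.minimum = min(self.minimum, rtt)
        self.maximum = max(self.maximum, rtt)

    def estimate(self):
        """Average of the minimum, the recent median, and the maximum."""
        if not self.samples:
            return float("inf")  # never measured: treat as farthest
        median = statistics.median(self.samples)
        return (self.minimum + median + self.maximum) / 3.0


def sort_by_proximity(trackers):
    """Return host ids ordered nearest-first by estimated ping time."""
    return sorted(trackers, key=lambda host: trackers[host].estimate())
```

Hosts that have never answered a ping sort last, so a client using something like this would still prefer any measured connection over an unmeasured one.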

Another improvement needs to be made in the protocol: authenticating the published IP address, without requiring the user to configure it. A LimeWire client sends a request to a remote host, using the IP address it knows for that host, and explicitly sends that IP address to the remote host. The receiving agent accepts the connection and notes the IP address of the accepted host. It also notes the IP address indicated by the client in its message, which is supposed to be its own. If there's a difference, it can assume that there's a NAT system between them. Then it publishes its own IP address within the answer message to the connecting host. The connecting host can also compare this received address with the IP it used to connect to that host. Any difference is then immediately noted. In either case, the IP address of the remote host is not accurate, and should not be advertised when relaying Gnutella ping requests, but replaced by 0.0.0.0 (unknown IP) within the Gnutella message. This indication means: my local LimeWire client has found a remote host to connect to, whose GUID we know but not its IP address, so for further requests you can send messages to me and I will proxy them to that host, unless you have another confirmed IP with which to connect to that host directly.
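The address comparison described above can be sketched roughly as follows, in Python. This is an interpretation, not the actual Gnutella handshake: it assumes the connecting host tells the remote host which address it dialed, and the remote host answers with the address it believes it has. The function names and the sample addresses are hypothetical.

```python
UNKNOWN_IP = "0.0.0.0"  # marker for "IP not known" used in the post


def nat_suspected(dialed_ip, self_reported_ip):
    """True when the address used to reach a host differs from the
    address that host reports for itself, which hints at a NAT."""
    return dialed_ip != self_reported_ip


def address_to_advertise(dialed_ip, self_reported_ip):
    """Address to place in relayed messages for the remote host:
    the dialed address if both sides agree, otherwise 0.0.0.0."""
    if nat_suspected(dialed_ip, self_reported_ip):
        return UNKNOWN_IP
    return dialed_ip


# Example: we dialed a public address, but the host thinks it is 192.168.0.2.
print(address_to_advertise("203.0.113.7", "192.168.0.2"))
```

The same check works in the other direction: the receiving agent can compare the address the client says it dialed against its own local address.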

This suggests changes in the way host GUIDs are managed in the routing tables. Either the GUID is connected locally, and we can communicate with it directly using the existing connection. Or the GUID is not directly connected but we have a confirmed IP address for it, so we can connect to it directly. Or we have an IP that has not been confirmed: we can try the connection, and if it fails, the previous IP no longer works and should be replaced by 0.0.0.0, and we fall back to proxying through an existing connection (the Gnutella proxying). The last option is that we don't have an IP for a GUID, but we still have the GUID of the proxying agent that advertised it: we need to apply this algorithm recursively to find a connection to the proxying agent.
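The four cases above could be sketched like this in Python. Everything here is hypothetical scaffolding: the `Route` record, the `resolve` function, and the `try_connect` callback (assumed to return True when a direct connection to the given IP succeeds) are invented for illustration and do not reflect any existing routing table. The result is the chain of GUIDs to go through, ending at the target, or None if no path is known.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

UNKNOWN_IP = "0.0.0.0"


@dataclass
class Route:
    connected: bool = False            # an open connection to this GUID exists
    ip: str = UNKNOWN_IP               # last known IP, or 0.0.0.0 if unknown
    confirmed: bool = False            # the IP was verified by a handshake
    proxy_guid: Optional[str] = None   # GUID of the agent that advertised it


def resolve(guid: str, table: Dict[str, Route],
            try_connect: Callable[[str], bool],
            _seen: Optional[set] = None) -> Optional[List[str]]:
    seen = _seen if _seen is not None else set()
    if guid in seen or guid not in table:
        return None                    # unknown GUID or proxy loop
    seen.add(guid)
    route = table[guid]
    if route.connected:                # case 1: use the existing connection
        return [guid]
    if route.ip != UNKNOWN_IP:
        if route.confirmed:            # case 2: confirmed IP, connect directly
            return [guid]
        if try_connect(route.ip):      # case 3: unconfirmed IP, try it
            route.confirmed = True
            return [guid]
        route.ip = UNKNOWN_IP          # attempt failed: forget the stale IP
    if route.proxy_guid is None:
        return None
    # case 4: no usable IP, so first find a way to reach the proxying agent
    chain = resolve(route.proxy_guid, table, try_connect, seen)
    return None if chain is None else chain + [guid]
```

The `seen` set is an addition not mentioned in the post; without it, two agents that each list the other as proxy would make the recursion loop forever.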

In any case, publishing private addresses such as 192.168.0.2 on the Internet part of the Gnutella network should be considered an error in the protocol implementation. Publishing an address is only fair if the agent known by its GUID can effectively be contacted at the published and tested address.
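As a rough illustration of the private-address rule, a client could filter addresses before advertising them. The sketch below uses Python's standard `ipaddress` module; the function name `sanitize_for_publication` is made up for this example, and note that it only catches addresses that are private by definition, not public-looking addresses that simply failed a reachability test.

```python
import ipaddress

UNKNOWN_IP = "0.0.0.0"


def sanitize_for_publication(ip_text):
    """Return the address unchanged if it may be published on the
    Internet side of the network, otherwise 0.0.0.0 (unknown IP)."""
    try:
        addr = ipaddress.ip_address(ip_text)
    except ValueError:
        return UNKNOWN_IP              # not a valid address at all
    # is_private covers ranges such as 10.0.0.0/8 and 192.168.0.0/16;
    # loopback and link-local are checked explicitly for readability.
    if addr.is_private or addr.is_loopback or addr.is_link_local:
        return UNKNOWN_IP
    return ip_text


print(sanitize_for_publication("192.168.0.2"))  # private, so replaced
```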

In fact this goes further than just the IP address: the port number must also be confirmed (as many NAT routers and firewalls also perform port number translation). So if the port number cannot be confirmed, the host cannot be contacted directly even if we have a confirmed IP address, and must be proxied through an existing connection. In the above discussion, you can implement this by just replacing the expression "IP" with "IP+port", which should be managed as a single indivisible address.

When IPv6 starts running on the Internet, many IPv4 addresses previously offered by ISPs will become fully dynamic and will be private IPv4 addresses (used only on an ISP subnetwork) that are routed to the Internet over IPv6 using NAT techniques (simply because expensive public IPv4 address blocks won't be necessary for home users, as the same fixed IP address service will be available through IPv6 addresses). The major change for web sites will be that their visitors will use IPv4 addresses that change from one connection to the next, but they could have a fixed IPv6 address, which avoids setting permanent cookies in the visitor's browser. And this is already the case now, with just IPv4. In the Gnutella network, the GUID serves as the identification cookie to find the appropriate host, but IPv6 addresses could be an interesting alternative for finding direct access to hosts, bypassing the limits of the current protocol with NAT-routed clients (most IPv4 NAT router or firewall implementations will try to keep the 104-bit network part of the 128-bit IPv6 address and will try to keep the private 24-bit host part of the IPv4 firewalled client in the public IPv6 address, so that the translation will be mostly direct and obvious without requiring the current complex management of IPv4 address leases on an IPv4-only ISP network).