Regional disparity in search results

Hey. First I want to give an example illustrating what I mean by the headline: a friend and I are both Gnutella users; I live in Germany, she lives in California. I once wanted her to listen to some German rap and advised her to search for a specific artist on Gnutella. We were both connected to the network at that time, and I could find many files matching the keyword. My friend got no search results at all.

Naturally, the content on users' machines differs from country to country. I assume that most machines she had in her horizon at that time were set up in the Anglo-American sphere, and that most of the hosts she was directly connected to were in the US. At first that doesn't sound logical, because the hostcaches should connect people from everywhere, regardless of their geographical position. Yet connections hosted by an ISP spatially close to your own location generally have better quality, i.e. the US ISPs are better interconnected with each other than they are with European or Asian ISPs. I therefore guess that the hosts you stay connected to (the ones not timing out or dropping for lack of bandwidth etc.) are either regionally close to you or within the same "virtual" region of well-interconnected networks.

Search queries have a limited TTL and number of hops, so the farther "away" (in the sense described above) a host is, the harder it becomes to reach that host and its content. For example, I hardly ever get connected to Japanese hosts, even though I know the content I'm looking for is more likely to be found on Japanese machines. The example of my US pal not finding German rap in her "regional" Gnutella subnetwork describes exactly the same situation.

Did I make a mistake in that mental model? Is there a way to tweak clients so that they integrate themselves into a subnetwork of the user's choice? The solution would be "regional" hostcaches that automatically connect users to corresponding clients and provide a neat horizon showing the other hosts in that particular sphere. Are there such regional hostcaches? I saw some txt lists once, but that seems a bit awkward considering the dynamics of the Gnutella network.

OK, please tell me what you think about this and whether I made any mistakes (or whether I came up with something you've already discussed... if so, sorry).

noci
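A minimal Python sketch of the TTL/hop mechanism noci describes (illustrative only, not taken from any real servant; TTL 7 is just the commonly used default, and `peer.send` stands in for the real routing code):

```python
# Illustrative only: how a query's TTL and hop count limit its reach.
def forward_query(query, neighbours):
    """Decrement TTL, increment hops, forward only while TTL remains."""
    query["hops"] += 1
    query["ttl"] -= 1
    if query["ttl"] <= 0:
        return  # the query dies here; hosts farther away never see it
    for peer in neighbours:
        peer.send(query)  # stand-in for the real connection object

# A query starting with ttl=7 crosses at most 7 connections, so a host more
# than 7 hops "away" (in noci's sense) is unreachable, whatever it shares.
query = {"ttl": 7, "hops": 0, "keywords": "german rap"}
```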
> Are there such regional hostcaches??

I don't think so. You need a good deal of luck to end up in the same Gnutella horizon. Better to connect directly to each other's Gnutella client (problematic if either of you is behind a firewall) - OR - both connect to at least one common host in parallel (exchange the IPs your clients are connected to via IRC/ICQ/email once you are on the Gnutella network).

If you want to try a direct connection between servants: at least one peer has to allow 'incoming connections' (make sure your firewall doesn't block them), then connect to the other peer's IP and port. The port is usually 6346 (check your servant's settings); the "real" IP of your internet connection may be harder to find out. Ask a friend who is familiar with your operating system... or try to meet your friend on IRC: perform a /whois <nickname> and you'll know the IP/hostname of your counterpart. Once you have started your Gnutella client, connect to your friend's IP/host; your Gnutella client of choice should provide an "add host" option for this. Good luck! Feel free to consult a Gnutella documentation before asking more questions. :)

Oh, and I don't want to forget to mention that exchanging large files between friends is also nice via FTP. Why? Most FTP software is very efficient at transferring and resuming large files. Using FTP is similar to the 'direct connection' described above (a little more "geeky"): one side uses an FTP client to connect to an FTP server on the other side. You'll find everything at www.tucows.com.

Good night, Moak :)
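For the curious, here is a rough Python sketch of what the "add host" step does under the hood, assuming the old plain-text Gnutella v0.4 handshake (the IP address in the comment is only a placeholder; real clients hide all of this behind the "add host" dialog):

```python
import socket

def connect_to_friend(host, port=6346):
    """Open a TCP connection and perform the plain-text v0.4 handshake."""
    s = socket.create_connection((host, port), timeout=10)
    s.sendall(b"GNUTELLA CONNECT/0.4\n\n")
    reply = s.recv(1024)
    if reply.startswith(b"GNUTELLA OK"):
        return s  # connected; binary descriptors (Ping, Query, ...) follow
    s.close()
    raise ConnectionError("peer refused the handshake: %r" % reply)

# connect_to_friend("203.0.113.42")   # your friend's "real" IP goes here
```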
Yeah, if I personally know the other person using Gnutella, I can connect manually. But what happens if I don't know the IP address of any member of the "regional" subnetwork I want to connect to? Some sort of hostcache that only provides IP addresses of a designated "origin" would come in handy. For example, being given only a list of hostnames ending in *.jp would be a great help in connecting to machines actually situated in Japan and (hopefully) hosting the desired content. The same filtering could be done with almost any country-code TLD. Perhaps that could even be a feature of the clients themselves: a switch that forces the program to connect only to clients from a certain TLD.

n
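A purely hypothetical sketch of such a regional filter, using reverse DNS to guess a host's country (no current client or hostcache does this, and reverse DNS is only a rough proxy for location):

```python
import socket

def filter_hosts_by_tld(hosts, tld=".jp"):
    """Keep only (ip, port) entries whose reverse-DNS name ends in `tld`."""
    regional = []
    for ip, port in hosts:
        try:
            hostname, _, _ = socket.gethostbyaddr(ip)
        except OSError:
            continue  # no reverse entry; we cannot tell where it is
        if hostname.endswith(tld):
            regional.append((ip, port))
    return regional

# filter_hosts_by_tld(hostcache_entries, ".jp") -> candidates likely in Japan
```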
I like the idea of Gnutella being a global network. :) Saying that... it might still be a very good idea to specialize Gnutella horizons a little bit. As you know, a Gnutella user is never connected to the whole Gnutella network; it's too big. Instead you only see a crowd of other users, the so-called "horizon", within the network. With specialized horizons you could find more "rare files" or meet people with the same interests (regional interests, the same music taste, cooking recipes etc.). Together with a chat feature, maybe?

My suggestion for implementing this extension in an upcoming Gnutella client: each user can edit some keywords describing his interests, e.g. "synthpop, italian food, OpenGL" (case doesn't matter). On connection, Gnutella clients exchange keywords with each other. Every client then tries to keep at least 1-2 Gnutella connections to clients with the same interests, and does so constantly. As a result, Gnutella horizons dynamically sort themselves into slightly specialized horizons _AND_ still form a global network (important if you don't want to be stuck in a regionally segmented network); see the sketch below.

Looking into the far future, it might be possible to do horizon travelling: search for movies in one horizon, then step further to get the latest outer-space cooking recipes in another horizon. Yeah.

Hope you like my suggestion, Moak
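A hypothetical sketch of Moak's keyword idea (nothing like this exists in the protocol; all names and numbers are invented): keep most connection slots random so the network stays global, and reserve a couple for the peers with the best interest overlap.

```python
import random

MY_INTERESTS = {"synthpop", "italian food", "opengl"}   # user-edited keywords

def interest_overlap(peer_keywords):
    """Number of shared interests, case-insensitive."""
    return len(MY_INTERESTS & {k.lower() for k in peer_keywords})

def pick_connections(known_peers, slots=4, reserved=2):
    """Reserve a few slots for like-minded peers, fill the rest at random."""
    ranked = sorted(known_peers,
                    key=lambda p: interest_overlap(p["keywords"]),
                    reverse=True)
    specialized = ranked[:reserved]
    rest = ranked[reserved:]
    random.shuffle(rest)  # random picks keep us attached to the global network
    return specialized + rest[:slots - reserved]
```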
Yes, cool idea ;-) It faintly reminded me of the ID3 "genre" tag for MP3s. Of course, the decision which musical genre a file represents is subjective, but if Gnutella could search files by that tag, the results would still be reasonably good, because there is something like a "mental common ground" among users. The same applies to the user interest groups: pools of interconnected users sharing the same interests would certainly form! ;) If it were possible to combine the group-by-interest feature with TLD filtering, Gnutella queries would no longer resemble searching for a needle in a haystack. Let's see what the developers think about all this...
TTL

You know, in the old versions of LimeWire you could manually set the TTL to something extraordinary like 14. It's not very nice, but it sure pulled in the results, especially if you were looking for something specific. I wish someone would tell me how to set the TTL in the current version...

-D
You know, a lot of (or all?) servants will lower the TTL to something healthy while they route it, so don't expect better results from 'extraordinarily' high TTLs. :)
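Something like this tiny clamp is what a routing servant applies (MAX_TTL = 7 is a typical choice, not a protocol constant), which is why an 'extraordinary' TTL of 14 rarely buys you anything:

```python
MAX_TTL = 7  # typical value; each servant picks its own limit

def sanitize_ttl(ttl, hops):
    """Cut an incoming query's TTL so that ttl + hops never exceeds MAX_TTL."""
    if ttl + hops > MAX_TTL:
        ttl = max(MAX_TTL - hops, 0)
    return ttl
```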
TTL

That's good to know. So TTL doesn't do much anymore; I swear it used to make a huge difference. So what can I do to make deep searches? Just keep trying and manually connect to different hosts? Seems rather inefficient...
I had been thinking that it would help if a client watched which of your connections returned the most results after a search, and then tried to connect to more IPs from that source (using IPs returned by the searches, pongs, and so on). This way people with similar interests might just naturally "clump together", so to speak. It should also work without any changes to the protocol.
> naturally "clump together"

Yes, sounds interesting! A kind of horizon travelling, or creeping along until you find more hits; horizons could dynamically order themselves.
As far as I understand Gnutella, that would not be exactly a good idea, for it causes a lot of unwanted traffic while probably giving you the same results about 90% of the time. It would also take away incoming connection slots, which the network is currently short of.
Hi, can you describe your conclusion in more detail, please? Sorry, I currently don't see the connection between horizon travelling and increased traffic.
As I see it, the problem is that you don't know how those far hosts, maybe 7 hops away from you, are connected to the rest of the network. They might not know any hosts other than the ones you do, and they will forward your request to the hosts within their reach. If you wanted to be sure of reaching any new hosts, you would have to try more than one host in your horizon, and it could well be that your search request reaches the same part of the network multiple times, causing additional traffic each time. If this became commonplace, the number of search requests could easily multiply without necessarily improving the search results to the same extent. That could be very ineffective. I think superpeers are currently the best solution to increase your network horizon. But I'm no expert; maybe one of the developers knows better than me.
Hehe, what you describe is multi-horizon flooding. :) Let's brainstorm: I don't think horizon travelling _must_ be unhealthy, though it could be. Think of it like a sheep: once every blade of grass on the meadow is eaten, it walks to the next meadow nearby. A sheep does not constantly hop and run around like a crazy sheep, eating every blade it can find in the whole wide world. *grin-imagining-a-crazy-sheep* Modern Gnutella clients already generate automatic re-searches (for resuming files, for example); together with a highly recommended query cache this is nothing unhealthy.

Going into the far future (assuming query caches are present in every servant/superpeer - important!): a horizon traveller could use hostcaches to find a brand-new horizon (the best option), or just creep along horizons by querying neighbouring IPs with TTL=1 from time to time (maybe not such a good idea; I guess hops across the horizon border are better, but how to achieve that?)... until it finds new "rich meadows". If the automatic search queries stay within limits, there is no significantly higher traffic IMHO.

As a counter-thesis: by dynamically restructuring horizons around more specialized content, traffic becomes more efficient, because search queries only reach interested hosts rather than all sorts of hosts (seeing the Gnutella network as a whole = multiple horizons). So far my theory; maybe I'm wrong, and that's what I meant with "sounds interesting". Probably the idea of 'specialized Gnutella horizons' (described at the top of this thread) is the more promising approach to horizon travelling. It includes automatic horizon travelling anyway, because connected hosts drop from time to time and must be replaced. However, occasional horizon travelling, or deliberate parallel travelling, need not be unhealthy: give me music in one specialized horizon and those outer-space cookie recipes in another.

Greets, Moak
PS: A possible way to enable 'horizon border travelling' (far future) could be to add another descriptor, 'SuperpeerInfo': a superpeer sends this message e.g. once an hour to all hosts in its horizon (high TTL), announcing that it is available and how big it is (how many servants are connected). Every client can collect those messages and then: a) hop within the horizon, b) avoid being doubly connected within the same local horizon (it must not happen that you connect to two big superpeers within one local horizon and create identical traffic), and c) find a new superpeer very fast when one drops (important for modem users)! A rough sketch of what a client might collect is below. Hmm, is what I'm saying completely nuts?
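A sketch of what a client could keep from such announcements; the SuperpeerInfo descriptor is entirely hypothetical (it exists in no spec), and the names and the one-hour freshness window are invented:

```python
import time

superpeers = {}  # ip -> {"servants": connected-servant count, "last_seen": timestamp}

def on_superpeer_info(ip, servant_count):
    """Remember each SuperpeerInfo announcement as it floods past us."""
    superpeers[ip] = {"servants": servant_count, "last_seen": time.time()}

def best_fallback_superpeer(already_connected):
    """Biggest recently-seen superpeer we are not already using, so a modem
    user can reconnect quickly when a connection drops."""
    fresh = [(info["servants"], ip) for ip, info in superpeers.items()
             if ip not in already_connected
             and time.time() - info["last_seen"] < 3600]
    return max(fresh)[1] if fresh else None
```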
What I was thinking of wouldn't stress the network at all. All you need to do is keep track of the following:

1) which connection search results come from
2) which peers you know about from that same connection

It wouldn't create any new activity, just watch what normally goes on, and when it comes time to make a new connection, try IPs from the connection that returned the most search results first (instead of picking a random IP from the host cache, as clients do now); a sketch follows below.

Superpeers are fine once everything supports them, but until then the horizon will always be limited. Even with them, it's possible that not all superpeers will be able to see each other, or that superpeer clients won't have enough memory to hold all the search results (which could get *really* big). It would be much better if people looking for similar files tended to drift towards one another. Common files are easy to find, but it's getting nearly impossible to find anything only a few peers are sharing.

P.S. Actually, after thinking about it, an even better and simpler approach would be to keep the IPs from the search results themselves and try connecting to them first (if they're not firewalled). All you'd need to do is stick the search-result IPs in the host cache. Hosts downloading from you might be good candidates too (under the theory that if they're interested in what you've got, perhaps you'll find what they've got interesting).
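A sketch of that bookkeeping (all names invented; no protocol change involved): credit each connection for the query hits it delivers, and when a slot opens, try IPs learned from the best-scoring connection before falling back to a random host-cache entry.

```python
import random
from collections import defaultdict

hits_per_connection = defaultdict(int)   # connection id -> number of query hits
ips_per_connection = defaultdict(list)   # connection id -> IPs seen in its hits/pongs

def on_query_hit(conn_id, result_ip):
    """Called for every search result; remembers which connection it came from."""
    hits_per_connection[conn_id] += 1
    ips_per_connection[conn_id].append(result_ip)

def next_host_to_try(host_cache):
    """Prefer IPs from the most productive connection, else pick at random."""
    if hits_per_connection:
        best = max(hits_per_connection, key=hits_per_connection.get)
        if ips_per_connection[best]:
            return ips_per_connection[best].pop(0)
    return random.choice(host_cache)
```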
Specialized horizons might be a good extension to the Gnutella protocol v0.6 handshaking, and so strengthen the community idea. It would also help decrease overall backbone bandwidth by grouping similar content and similar search traffic together.