April 29th, 2002
Sajma
Disciple
 
Join Date: April 26th, 2002
Posts: 11
show related content

One of the features I like about centralized content providers (like Amazon.com) is the ability to find "related content". For example, when I view an item on Amazon, I also see a list titled "People who bought this item also bought these other items."

Such a service would be very useful for finding content on Gnutella. However, we can't rely on (and wouldn't want) a centralized service to track downloads. Instead, consider the following intuition:

Since each user controls what content items their Gnutella node shares, those items are related (by that user's interests). Therefore, if I know I like item A, I'd like to find other items that are located on nodes that also share item A. For example, if I like song A and I find other servers that share both song A and song B, then I might like song B.

Note that the "Browse Host" feature already lets me find related content manually. However, this is inefficient if the other host has only a small number of items I care about. Instead, let's have the system automatically filter out the cruft. I propose the following algorithm, based on content hashes (like those used in HUGE):
  1. Each node creates an index of the content hashes of all the items it shares (a SHA-1 hash is 20 bytes, so an index of 1000 items is still only about 20 KB).
  2. To find the items related to item A, search for item A's content hash. Then retrieve the index from each node that shares item A.
  3. For each item in any of those indexes, count the number of indexes it appears in. The items that appear in the most indexes are the ones most closely related to item A, so show them to the user (filtering out any items the user already has).
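To make the counting step concrete, here is a rough Python sketch of the three steps. It is only an illustration, not code from any Gnutella client: the names `index_size_bytes`, `related_items`, `remote_indexes` and `my_hashes` are made up for this example, and it assumes the indexes have already been retrieved somehow (the message used to fetch them isn't specified here).

```python
import hashlib
from collections import Counter

SHA1_BYTES = hashlib.sha1().digest_size  # 20 bytes per SHA-1 content hash


def index_size_bytes(n_items):
    """Size of a raw index holding one SHA-1 hash per shared item."""
    return n_items * SHA1_BYTES


def related_items(item_hash, remote_indexes, my_hashes, top_n=10):
    """Rank items by how many of the fetched indexes they appear in.

    remote_indexes: iterable of sets of content hashes, one set per node
                    (hypothetical: assumed to be fetched already).
    my_hashes:      set of hashes the local user already has.
    """
    counts = Counter()
    for index in remote_indexes:
        # Step 2: only nodes that share item A are relevant.
        if item_hash not in index:
            continue
        # Step 3: count each other item once per index it appears in,
        # skipping item A itself and anything the user already has.
        counts.update(
            h for h in set(index) if h != item_hash and h not in my_hashes
        )
    # Highest count first; ties broken by hash so the order is stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]
```

The short strings passed as hashes in a quick test stand in for real 20-byte SHA-1 values. Using sets for the indexes keeps the membership test cheap and guarantees an item is counted at most once per node.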
There are a number of possible improvements to this protocol, including:
  • it might be more efficient to exchange indexes whenever I contact another node
  • there's probably a better statistical analysis to check for "co-related" items
  • we can't just display content hashes to the user, so fetch item descriptions as needed
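As one possible take on the "better statistical analysis" point, here is a sketch that swaps the raw count for a Jaccard-style score, so that items shared by nearly everyone don't dominate the list. This is my own choice of measure, not something any client implements, and it assumes information the basic algorithm doesn't collect: an estimate of how many nodes share each candidate item overall (the hypothetical `n_sharing` argument, which might come from searching for each candidate's hash).

```python
def jaccard_scores(co_counts, n_sharing_a, n_sharing):
    """Score each candidate B as |A and B| / |A or B| over nodes.

    co_counts:   dict hash -> number of nodes sharing both A and that item
    n_sharing_a: number of nodes sharing item A
    n_sharing:   dict hash -> estimated number of nodes sharing that item
                 (hypothetical input; not part of the basic algorithm)
    """
    scores = {}
    for h, both in co_counts.items():
        either = n_sharing_a + n_sharing[h] - both
        scores[h] = both / either if either else 0.0
    # Best score first; ties broken by hash so the order is stable.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

With this scoring, an item found on all 3 nodes that share A but also on 30 nodes overall scores 3/30 = 0.1, while an item found on only 2 of those nodes and nowhere else scores 2/3, so the rarer but more specific item ranks first.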
LimeWire developers: If you're interested in implementing this feature and want to discuss it, let me know!

Last edited by Sajma; April 29th, 2002 at 02:22 PM.