Gnutella Forums - View Single Post - Proposal for development of Gnutella (hashs)

Nosferatu · #55 (**permalink**) April 14th, 2002

Hmm .. hash trees ...

What I have understood from searching the web:

Tree-hashes are also known as Merkle Trees. The idea was <A HREF="http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=/netahtml/srchnum.htm&r=1&f=G&l=50&s1=%274,309,569%27.WKU.&O S=PN/4,309,569&RS=PN/4,309,569">patented in 1979</A>, but I read somewhere the patent ran out in 1999. The tree works like this:

Hash tree

<PRE>
(Whole File)
|
/\
/ \
/ \
(m) (n)
/ \ / \
/ | | \
(i) (j) (k) (l)
/ | / \ / \ / \
(a)(b)(c)(d)(e)(f)(g)(h)
</PRE>

You get parts a and b of a file. You get the hash of the entire file, and the hash values for only j and n, and you can verify that a and b are part of the file by generating x, then n, then with p, the whole file hash.

But in gnutella you wouldn't do this - it's absolutely pointless. For it to work for all parts of the file, all values in the tree hash need to be stored centrally where you can get them to check. If you have available an index (<A HREF="http://bitzi.com/">bitzi</A> only stores the whole file hashes) you would just download the correct hash for section x and check it.

I can't see a feasible way to make that aspect work without a central server storing all the intermediate hash values, otherwise you might just as well do this:

If you didn't use a tree, you might store all the values a-h and download and check each one individually. For a big file this is a lot of downloading and checking.

So you might make a hash of all the hashes together and store that value - but that is useless unless you have downloaded all the sub parts.

So the tree covers these in-between cases.

BUT you need to have all these values available somewhere for verification for it to be useful, ie if you find only parts a, c and n, you still need someone to tell you the correct hashes for b and d and the whole hash to verify the whole hash. The idea is that you can download part of a file from someone who doesn't know what the whole file is, so the person you're downloading the file portion from might not know what the whole file looks like, so you have to find the info from someone else.

Now, to set up a service like Bitzi storing all the subtree values, I guess the storage would blow out. That's obviously why they don't do it.

And I can't see a sane way to track all these sub-portions within the gnutella protocol. I guess you could automatically make lists and store them in config files .. then automatically share them out as virtual files and automatically query for the virtual files .. but it sounds like a big headache to me.

Another option is calculating them on request, but this seems .. bad too.

So the hash tree idea doesn't seem helpful to me (except in maybe an all-out swarming effort .. which seems like too much work for not enough benefit at this stage).

Can anyone point out something I've missed? Is anyone implementing this as a way to do swarmed downloads?

I'm back to thinking that the easy and reliable solution which works is just query for a hash of a given byte-range.

This has an additional benefit I didn't mention before, you could ask for hash of mp3 files being offset by 64 bytes or 64k or whatever size the id3 tag is. Then files which are the same except for meta data could be easily found:

Search for files matching words blah foo bananas.

Query hash of file "foo bananas go blah.mp3" 2.35M offset by 64 bytes or whatever it is. Same with file "foo bananass goes blah.mp3" 2.37M. They match! Queue both as alternative downloads! Unfortunately "bananas go blah foo.mp3" 2.34M turned out to be a different file (must be 160bit or something ;P )

Nos
[Editted 15 Apr 2002 to clean up drawing - sorry pic looks OK in (gak!) IE but maybe not in your browser]