Gnutella Forums > General Gnutella Development Discussion

#51 - April 13th, 2002
Smilin' Joe Fission (Gnutella Veteran, Canada, 121 posts)
Re: Back to the TOPIC

Quote:
Originally posted by Nosferatu
I just had a conversation on irc .. someone had a good idea, maybe some of you have heard it before.

Anyway, the idea is this: hash the first meg of the file as well as the whole file.

So that way you can tell that 'buffy vampire.divx' 20M is the same file as 'buffy vampyre.divx' 80M, and get at least the first 20M.

Then you repeat search later for files with first meg hash = x.

To implement this most reliably and sensibly, instead of the HUGE proposal's technique of always and only hashing the whole file, the best implementation would be to have a query 'please hash the file from range x-y'.
I believe this was part of the HUGE proposal as well... The part about using a Tiger tree to hash sections of a file. Is it not?

Quote:
Well, this is the question. Is the HASH indeed large enough to *have* a unique value for each individual permutation of a 1G file, and if not, does it really matter?
I believe it may be, although I haven't verified it. However, what makes me think this is that the SHA1 hash is good for files up to 2^64 bits long, for which I would think it would generate a unique hash for each unique file.

Quote:
Certainly we are not going to generate each version of a 1G file that is possible .. ever (well, unless some pr!ck sits down in the far future and does it on purpose as a programming exercise using some newfangled superdupercomputer we can't even imagine yet .. but I stray from the topic). We do need a hash that has enough values that *most probably* each individual file we generate will have a unique value .. but it can't be known for sure unless you actually generate the hash for each file (i.e. generate each file).
Agreed.

Quote:
I think if you look at the file size and the hash, you have enough certainty to call it a definite match in searching for alternate download sources. Better technique described above in first portion of post.
Personally, I would trust just the hash because a file of a different size should theoretically generate a different hash. But that's just my opinion.

Quote:
I did a quick one on my calculator based on the figure for 'mass of observable universe' from Ohanian's 'Physics' textbook .. and 1e70 would seem to be what "they" think (the scientists). But I agree about the drugs
Well, hopefully they'll do a count someday to find out an exact number. Heh.
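
As a rough sketch of the range-hash idea quoted above, a "please hash the file from range x-y" request reduces to a few lines of Python (SHA1 via hashlib; the function name and the first-megabyte example are illustrative, not from any real client):

Code:
import hashlib

def hash_range(path, start, end, chunk_size=65536):
    """SHA-1 of bytes [start, end) of a file, e.g. the first megabyte."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        f.seek(start)
        remaining = end - start
        while remaining > 0:
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:
                break  # file is shorter than the requested range
            h.update(chunk)
            remaining -= len(chunk)
    return h.hexdigest()

# A 20M partial and an 80M complete file can be matched on their first meg:
# hash_range("buffy vampire.divx", 0, 2**20) == hash_range("buffy vampyre.divx", 0, 2**20)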

#52 - April 13th, 2002
Nosferatu (Daemon, Romania, 64 posts)

Quote:
Quote:
To implement this most reliably and sensibly, instead of the HUGE proposal's technique of always and only hashing the whole file, the best implementation would be to have a query 'please hash the file from range x-y'.
I believe this was part of the HUGE proposal as well... The part about using a Tiger tree to hash sections of a file. Is it not?
Can't find the word tiger or anything that looks like hashing of parts of the file at

http://rfc-gnutella.sourceforge.net/...-huge-0_92.txt or http://rfc-gnutella.sourceforge.net/...-huge-0_93.txt .
Perhaps it was discussed and then dropped .. ? Got a reference? I found that http://bitzi.com/ proposes/uses tiger-tree (http://bitzi.com/developer/bitprint) as an attempt to index as many files as they can .. looks like a good project to incorporate into gnutella clients - have a bitzi index lookup.

Also found the Tiger Hash algorithm homepage (http://www.cs.technion.ac.il/~biham/Reports/Tiger/) and the tiger-tree homepage (http://sourceforge.net/projects/tigertree/).

Unfortunately, between these three sources I can't find a description of the tiger-tree process in words I can understand. "TigerTree is based on the Tiger hash algorithm, applied to each 1024-byte block of a file, then combined up through a binary hash tree to a final summary value" (http://bitzi.com/developer/bitprint) really doesn't cut it for me.

Anyone know what it means? They imply that it can be used for incremental portions of the file .. but I don't understand the process.

These bitzi guys are JUST doing hashing of files, and are covering any files you care to name .. so they probably have thrashed this issue out pretty well.

Also, if there aren't competing schemes to index all of filespace, then it really makes a lot of sense to use their hashing scheme so that you can link in and allow users to search Bitzi's index to see what it has to say about what the user receives in their search results. I think this is a *really exciting idea*. Could save a lot of bandwidth downloading broken mp3s etc, for example.
Quote:
Quote:
I think if you look at the file size and the hash, you have enough certainty to call it a definite match in searching for alternate download sources. Better technique described above in first portion of post.
Personally, I would trust just the hash because a file of a different size should theoretically generate a different hash. But that's just my opinion.
The file size could be used to eliminate the case where two files have the same hash. It's possible, but I would expect (and I don't know enough to say this is definitely the case) that the chances of two files of the same size having the same hash are much smaller than the chance of two files of differing sizes having the same hash. It's just a way to rule out 99% of files which could have a duplicate hash but be a different file.

Nos

#53 - April 14th, 2002
Smilin' Joe Fission (Gnutella Veteran, Canada, 121 posts)

Quote:
Originally posted by Nosferatu
Can't find the word tiger or anything that looks like hashing of parts of the file at

http://rfc-gnutella.sourceforge.net/...-huge-0_92.txt or http://rfc-gnutella.sourceforge.net/...-huge-0_93.txt .
Perhaps it was discussed and then dropped .. ? Got a reference? I found that http://bitzi.com/ proposes/uses tiger-tree (http://bitzi.com/developer/bitprint) as an attempt to index as many files as they can .. looks like a good project to incorporate into gnutella clients - have a bitzi index lookup.
You're right... I remember reading about the value of the Tiger Tree hash and, without actually looking at the HUGE proposal again to verify it, I assumed it was the proposal where I originally saw it. However, the HUGE proposal does include a provision for a 39 character tiger tree value, but they don't explain how it is used or how it is generated.

Quote:
Also found the Tiger Hash algorithm homepage (http://www.cs.technion.ac.il/~biham/Reports/Tiger/) and the tiger-tree homepage (http://sourceforge.net/projects/tigertree/).

Unfortunately, between these three sources I can't find a description of the tiger-tree process in words I can understand. "TigerTree is based on the Tiger hash algorithm, applied to each 1024-byte block of a file, then combined up through a binary hash tree to a final summary value" (http://bitzi.com/developer/bitprint) really doesn't cut it for me.

Anyone know what it means? They imply that it can be used for incremental portions of the file .. but I don't understand the process.
I'll see if I can reiterate correctly how it works. Basically, the Tiger algorithm hashes a file in 1024-byte blocks. Then it sets up a tree system for finding the final hash, which looks similar to:
Code:
A   B   C   D
 \ /     \ /
  E       F
   \     /
    \   /
      G
From what I remember, working with an "expected" hash value, one can then determine if any of the component hashes are bad. For instance, given the value of E, one could verify that blocks A and B are correct. Given the value of G, one could verify the hash totals for E and F. If, for instance, block C was corrupted, the hash total for F would be wrong, so E and F combined would not equal G. Knowing that the hash total for F was wrong, a person could then trace the fault back to either block C or D. And, if given the true hash values for C and D, individual comparisons could be made to determine that C is incorrect.

I hope this makes a shred of sense, it's in the early morning as I'm writing this and my brain is falling asleep. Besides that, I can't seem to find the reference material I got this from.
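
For the curious, the pairing scheme described above is easy to sketch in Python. hashlib has no Tiger, so SHA1 stands in here, and real tigertree implementations may also mix marker bytes into leaf and internal hashes, a detail this sketch skips:

Code:
import hashlib

def leaf_hashes(data, block=1024):
    """Hash each 1024-byte block: the A, B, C, D of the diagram above."""
    blocks = [data[i:i + block] for i in range(0, len(data), block)] or [b""]
    return [hashlib.sha1(b).digest() for b in blocks]

def tree_root(hashes):
    """Combine pairwise (E = h(A+B), F = h(C+D), G = h(E+F)) up to one root."""
    while len(hashes) > 1:
        nxt = [hashlib.sha1(hashes[i] + hashes[i + 1]).digest()
               for i in range(0, len(hashes) - 1, 2)]
        if len(hashes) % 2:
            nxt.append(hashes[-1])  # odd node is promoted to the next level
        hashes = nxt
    return hashes[0]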
Quote:
I think this is a *really exciting idea*. Could save a lot of bandwidth downloading broken mp3s etc, for example.
Or, in my case, broken video files. I really hate having to download a 500+ MB file again because a few frames in the middle somehow got garbled during transfer.

#54 - April 14th, 2002
gnutellafan (Gnutella Veteran, 110 posts)
tiger hash

First, Tiger is a hash algorithm just like MD5 or SHA1. "Tree" describes a way of using that algorithm where segments of the file are hashed individually. The tigertree implementation used by Bitzi uses 1024-byte blocks (though they could use any size). I have no evidence, but I think that around 1 MB would be the best.

The tree hash is the best way to share partial files. A tree hash can use any hash algorithm (e.g. MD5, SHA1, Tiger, etc.). Small chunks of the file are individually hashed, and all of these hashes make up the tree hash. Because of this you could set it so that there is a hash for every 1 MB of the file. Then you could securely and confidently download 1 MB pieces from multiple hosts with partial files.



An added bonus of the tree hash method is the ability to resume from nearly identical files.

For example:

I want to download songx, so I search for it and find it. There are quite a few versions with the same size, bitrate, etc., but they have different metadata so the hash is different.

Well, with the tree hash you could use all of those sources to swarm from for the parts of the file that are the same!!! This would greatly increase swarming speeds while providing the same security and confidence we currently have with hashed files!
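
A minimal sketch of that matching step, assuming each source advertises its list of per-segment digests (the list format and the request_range call are invented for illustration):

Code:
def common_segments(local_hashes, remote_hashes):
    """Indexes of segments that are byte-identical on both hosts, even though
    differing metadata makes the whole-file hashes differ."""
    return [i for i, (a, b) in enumerate(zip(local_hashes, remote_hashes))
            if a == b]

# Hypothetical usage: swarm the matching 1 MB ranges from that host.
# for i in common_segments(mine, theirs):
#     request_range(host, i * 2**20, (i + 1) * 2**20)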

#55 - April 14th, 2002
Nosferatu (Daemon, Romania, 64 posts)
Tree is stored where?

Hmm .. hash trees ...

What I have understood from searching the web:

Tree-hashes are also known as Merkle Trees. The idea was patented in 1979 (http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=/netahtml/srchnum.htm&r=1&f=G&l=50&s1=%274,309,569%27.WKU.&OS=PN/4,309,569&RS=PN/4,309,569), but I read somewhere the patent ran out in 1999. The tree works like this:

Hash tree:

Code:
          (Whole File)
         /           \
      (m)             (n)
    /     \         /     \
  (i)     (j)     (k)     (l)
  / \     / \     / \     / \
(a) (b) (c) (d) (e) (f) (g) (h)

You get parts a and b of a file. You get the hash of the entire file, and the hash values for only j and n, and you can verify that a and b are part of the file by generating i, then m, and then, combined with n, the whole-file hash.

But in gnutella you wouldn't do this - it's absolutely pointless. For it to work for all parts of the file, all values in the tree hash need to be stored centrally where you can get them to check. If you have an index available (Bitzi, http://bitzi.com/, only stores the whole-file hashes), you would just download the correct hash for section x and check it.

I can't see a feasible way to make that aspect work without a central server storing all the intermediate hash values, otherwise you might just as well do this:

If you didn't use a tree, you might store all the values a-h and download and check each one individually. For a big file this is a lot of downloading and checking.

So you might make a hash of all the hashes together and store that value - but that is useless unless you have downloaded *all* the sub parts.

So the tree covers these in-between cases.

BUT you need to have all these values available somewhere for verification for it to be useful; i.e. if you find only parts a, c and n, you still need someone to tell you the correct hashes for b and d, plus the whole-file hash, to verify against. The idea is that you can download part of a file from someone who doesn't have the whole file, so the person you're downloading the portion from might not know what the whole file looks like, and you have to find that info from someone else.
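
For what it's worth, the verification step itself is tiny. A sketch against the tree above: a downloader holding parts a and b, plus the supplied hashes for j and n, checks them against the whole-file hash (SHA1 standing in for whichever hash is actually used):

Code:
import hashlib

def combine(left, right):
    """Internal node: hash of the two child hashes concatenated."""
    return hashlib.sha1(left + right).digest()

def verify_a_b(part_a, part_b, hash_j, hash_n, whole_file_hash):
    """Recompute i = h(h(a)+h(b)), then m = h(i+j), then h(m+n); compare."""
    i = combine(hashlib.sha1(part_a).digest(), hashlib.sha1(part_b).digest())
    m = combine(i, hash_j)
    return combine(m, hash_n) == whole_file_hash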

Now, to set up a service like Bitzi storing all the subtree values, I guess the storage would blow out. That's obviously why they don't do it.

And I can't see a sane way to track all these sub-portions within the gnutella protocol. I guess you could automatically make lists and store them in config files .. then automatically share them out as virtual files and automatically query for the virtual files .. but it sounds like a big headache to me.

Another option is calculating them on request, but this seems .. bad too.

So the hash tree idea doesn't seem helpful to me (except in maybe an all-out swarming effort .. which seems like too much work for not enough benefit at this stage).

Can anyone point out something I've missed? Is anyone implementing this as a way to do swarmed downloads?

I'm back to thinking that the easy and reliable solution which works is just to query for the hash of a given byte-range.

This has an additional benefit I didn't mention before: you could ask for the hash of mp3 files offset by 64 bytes or 64k or whatever size the id3 tag is. Then files which are the same except for metadata could be easily found:

Search for files matching words blah foo bananas.

Query hash of file "foo bananas go blah.mp3" 2.35M offset by 64 bytes or whatever it is. Same with file "foo bananass goes blah.mp3" 2.37M. They match! Queue both as alternative downloads! Unfortunately "bananas go blah foo.mp3" 2.34M turned out to be a different file (must be 160bit or something ;P )
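
A sketch of that offset-hash comparison, with the offsets supplied by hand, since a real client would have to work out each file's tag size itself:

Code:
import hashlib

def hash_from_offset(path, offset):
    """SHA-1 of everything after `offset` bytes: the audio data, if `offset`
    is the size of the tag at the front of the file."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        f.seek(offset)
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

# Same audio under different tags gives the same digest (offsets illustrative):
# hash_from_offset("foo bananas go blah.mp3", 64) == \
#     hash_from_offset("foo bananass goes blah.mp3", 64)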

Nos
[Edited 15 Apr 2002 to clean up drawing - sorry, pic looks OK in (gak!) IE but maybe not in *your* browser]

Last edited by Nosferatu; April 14th, 2002 at 09:16 PM.

#56 - April 15th, 2002
gnutellafan (Gnutella Veteran, 110 posts)

No need for a central server. Whoever is hosting (sharing) the file keeps the whole-file hash as well as the 1 MB incremental hashes. These are stored just like the SHA1 for HUGE. Then if I start a download from you, I get that hash info. Now I can use it to search the gnet for the other parts, even on hosts that only have partial files, to swarm from.

#57 - April 15th, 2002
Smilin' Joe Fission (Gnutella Veteran, Canada, 121 posts)

What would be better is a method where one could determine the component hashes by disassembling the full-file hash. Then only the full-file hash would need to be sent when requesting a file.

I suppose that may be asking a bit much though.

#58 - April 15th, 2002
gnutellafan (Gnutella Veteran, 110 posts)

just a bit ;-)

Yes, this would be great. But a downside might be that an actual set of data could be calculated to match such a hash, and then it would be possible to create fake data with the same hash and screw up the downloads on the gnet.

#59 - April 15th, 2002
Nosferatu (Daemon, Romania, 64 posts)
Can be done

It's another way of doing it, but I didn't mention it before because basically it's not worth doing.

You just make the hash for the whole file the concatenation of the hashes for the parts.

It means that either you select parts as being pretty big compared with the size of the whole file, or you end up with a long hash (for example, a 700 MB file in 1 MB parts, at 20 bytes of SHA1 per part, gives a 14,000-byte "hash").

Nos

#60 - April 17th, 2002
gnutellafan (Gnutella Veteran, 110 posts)
improvement of my idea

After talking with Gordon from Bitzi, I think tree hashes are overkill. Instead you could simply hash ranges of the file with SHA1. This could be done in 1 MB chunks. So basically all files would be hashed twice: once for a full-file hash, and once where a hash is generated for each 1 MB portion of the file starting from the beginning. Since the file will generally not be an exact multiple of 1 MB, the last hash may be of a segment shorter than 1 MB.

I don't have any basis for choosing 1 MB, of course. A bit of trial and error would be needed to optimize the system. Anything larger than 1 MB, say 5 MB or 10 MB, would be good for large files but would not provide the benefit, especially the metadata benefits, for small files such as mp3s.

Does anyone know more about metadata? Is it always stored at the end of files, even for videos etc.?
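
A sketch of the double hashing described above: one pass over the file computing the full-file SHA1 plus a digest per 1 MB segment, with the final segment allowed to run short:

Code:
import hashlib

SEGMENT = 2**20  # 1 MB; the right size is exactly the open question above

def double_hash(path):
    """Full-file SHA-1 plus one SHA-1 per SEGMENT bytes, in a single pass."""
    whole, segments = hashlib.sha1(), []
    seg, filled = hashlib.sha1(), 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            whole.update(chunk)
            while chunk:
                take = min(SEGMENT - filled, len(chunk))
                seg.update(chunk[:take])
                chunk, filled = chunk[take:], filled + take
                if filled == SEGMENT:
                    segments.append(seg.hexdigest())
                    seg, filled = hashlib.sha1(), 0
    if filled:
        segments.append(seg.hexdigest())  # short final segment
    return whole.hexdigest(), segments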





