hashing

#1 (**permalink**) October 11th, 2002

I'm just wondering..

I started up limewire about 3 hours ago.
Defined my shares and left the computer "idle"
with no workload but the hashing of the files.

That was 3 hours ago. It's gotten to about 6500 files now,
still not complete, with about 1500 to go.

My computer is a 1.1 GHz AMD with 256 MB RAM and ATA100/Ultra-SCSI disks, currently running Win32 (XP) with Java VM 1.3.1,
and frankly:
**** I don't quite see why this should take so long?!?!?!? ****

It's a little annoying. I'm not saying that the Java I/O interface deserves a good pat on the back for speed and agility, but what the hell?!

I notice that it takes longer in directories with "huge" files. Do you open the entire file?! If so, WHY are you doing this?
Isn't it enough to open the header of the file ??

Are there any alternative methods to the java approach that may be incorporated in a standalone (REAL (c/c++/asm)) program ?

By the way: The client is on a good path. A few months ago I found it rather unusable! KEEP UP THE GOOD WORK!

Best regards from a
Slightly bored user

Treatid · #2 (**permalink**) October 18th, 2002

Hashing does read every byte of every file that you are sharing.

The idea of hashing is to identify whether two files are identical or not.

The hard way of testing two files to see if they are identical is to compare each byte of each file. This is impractical if one of the files is sitting on a remote machine.

Hashing reads each byte of a file and creates a number based on those bytes. Two identical files will generate the same number (hash). Two different files (even if they are only a little different) will (probably) generate different numbers.

Now, a comparison between two files can be made by comparing the numbers rather than the whole file.
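To make the "reads every byte" point concrete, here is a minimal sketch in Python of how a whole file can be hashed with SHA-1. It is not LimeWire's actual code (LimeWire is written in Java); the function name `sha1_of_file` and the 64 KB chunk size are assumptions chosen for illustration. It also shows why reading only the header would not be enough: two files that share a header but differ later would get the same number.

```python
import hashlib


def sha1_of_file(path, chunk_size=64 * 1024):
    """Return the hex SHA-1 digest of the file at `path`.

    The file is read in fixed-size chunks so that a huge file never has
    to fit in memory, but every byte still passes through the hash.
    `chunk_size` is an arbitrary illustrative value, not a LimeWire setting.
    """
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty read means end of file
                break
            digest.update(chunk)
    return digest.hexdigest()
```

Because the whole file has to come off the disk, the time taken grows with the total size of the shared files, which would fit the observation that directories with huge files take longest.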

Mark

Norm · #3 (**permalink**) October 20th, 2002

Mark:

Interesting. If I understand what you are saying, LimeWire will identify, say, a 6,707-byte file on your computer as the same file as a 6,707-byte file on my computer. If so, it must also check the filename for identical or at least some common words, to prevent same-length but totally different files from appearing alike and a download being split across two totally different files.

Norm

#4 (**permalink**) October 20th, 2002

not quite, norm.

hashing is more or less a relic (although relic is the wrong word) from cryptography. one example of a bad way of hashing would be to add up every byte of the file multiplied by the position of that byte. there are many different ways to hash a file; most clients today support an SHA1 hash, which is designed so that two different files will practically never produce the same hash, whatever their sizes. so, one file that's 500MB wouldn't have the same hash as another file that's 500MB unless they're exactly the same file. this is completely disregarding the file name.
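As a rough illustration of why a simple "add up the bytes by position" scheme is a bad hash, here is a small Python sketch. The function `weak_checksum` is a hypothetical name and only one possible reading of the scheme described above (each byte multiplied by its 1-based position, then summed); it is compared against SHA-1 from Python's standard `hashlib`.

```python
import hashlib


def weak_checksum(data):
    """Sum of each byte value multiplied by its 1-based position.

    Hypothetical example of a poor hash: very different inputs can
    easily add up to the same total.
    """
    return sum(byte * position for position, byte in enumerate(data, start=1))


file_a = b"\x02\x00"  # 2*1 + 0*2 = 2
file_b = b"\x00\x01"  # 0*1 + 1*2 = 2

# Same length, different content, yet the weak checksum cannot tell them apart.
weak_collides = weak_checksum(file_a) == weak_checksum(file_b)

# SHA-1 gives the two inputs different digests.
sha1_collides = hashlib.sha1(file_a).digest() == hashlib.sha1(file_b).digest()

print(weak_collides)  # True
print(sha1_collides)  # False
```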

the very nice thing with hashes is that people can change the filenames of files, but limewire (and other clients) will still be able to identify the specific file.
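A quick way to see this for yourself, sketched in Python with made-up file names (`song.mp3` and so on are purely illustrative, and `content_hash` is a hypothetical helper, not part of any client): two copies of the same bytes under different names get the same SHA-1, while a same-size file with one byte changed does not.

```python
import hashlib
import tempfile
from pathlib import Path


def content_hash(path):
    """SHA-1 of the file's bytes only; the file name plays no part."""
    return hashlib.sha1(Path(path).read_bytes()).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "song.mp3"
    renamed = Path(tmp) / "totally_different_name.mp3"
    other = Path(tmp) / "other.mp3"

    original.write_bytes(b"A" * 6707)           # a 6,707-byte file
    renamed.write_bytes(b"A" * 6707)            # same bytes, new name
    other.write_bytes(b"A" * 6706 + b"B")       # same size, last byte differs

    same_content_matches = content_hash(original) == content_hash(renamed)
    same_size_matches = content_hash(original) == content_hash(other)

print(same_content_matches)  # True
print(same_size_matches)     # False
```

So matching on the hash alone is enough to tell whether two sources hold the same file; neither the name nor the size needs to be compared separately.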

Muati · #5 (**permalink**) October 26th, 2002

Another reason that LimeWire takes so long to hash files is that it doesn't take up 100% of your CPU for file hashing. It is more like 50%.
