Gnutella Forums - View Single Post

#**143** April 4th, 2005

Better detection of corrupt files. I often receive files that never triggered the "Limewire has discovered a corruption in..." message but are nonfunctional -- images that won't render or render only partially, mp3z that are distorted or cut off, videos that won't play or where only the audio works. Videos in avi format seem to be especially prone to this, probably because errors completely destroy the ability to decompress any of an avi file, rather than just introducing distortion the way they usually do in mp3/mpeg format files.

On top of that, there are the substitutions -- files that aren't as advertised, such as deliberately-spoofed mp3z and jpegs whose content has been completely replaced with (rather than just painted over with) spam of some sort.

All of these have one thing in common: the received file will have a different MD5 or SHA1 hash from the file that produced your search result. In principle, detecting that the file you got is either damaged or a fraud should be almost 100% accurate, since the odds of a hash collision are very remote. In practice, though I was sure Limewire uses SHA1 to detect these situations, it apparently doesn't, because the majority of obviously corrupt or spoofed files do not get detected as corrupt at all -- perhaps one in eight get detected, if that.
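To make the idea concrete, here is a rough sketch of the kind of check I mean. It assumes the search result carried a `urn:sha1:` identifier with the hash in Base32, which is how Gnutella results normally advertise SHA1s. The function names `sha1_urn` and `matches_advertised` are made up for illustration, and this is not a claim about what Limewire's code actually does.

```python
import base64
import hashlib


def sha1_urn(path, chunk_size=1 << 16):
    """Hash a file on disk and return it as a urn:sha1: string (Base32)."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        # Read in chunks so large downloads are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    # A 20-byte SHA1 digest encodes to exactly 32 Base32 characters, no padding.
    return "urn:sha1:" + base64.b32encode(digest.digest()).decode("ascii")


def matches_advertised(path, advertised_urn):
    """True if the finished download hashes to the URN from the search result.

    advertised_urn is assumed to be the urn:sha1: string that came with the
    search result the user clicked on (hypothetical parameter).
    """
    return sha1_urn(path).lower() == advertised_urn.strip().lower()
```

If `matches_advertised` comes back false for a finished download, the file is damaged or a substitute, whichever host in the mesh sent the bad bytes.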

(This is entirely separate from the issue of spoofed search results. Those are easy to detect and avoid, since they always show ridiculous numbers of supposed sources, claim a high-speed connection and zillions of open upload slots, and have a filename that is just your query string, sometimes modified in one of a few ways (typically changing spaces to underscores, adding extra underscores, or changing capitalization). Others have already suggested ways to exclude spoofed search results: automatically block every host that responds to a random nonsensical query string, and rerun the check on occasion, since the b*stards sometimes seem to have several whole blocks of ip addresses.

I'm talking about search results that are clearly genuine, but where the downloaded file is either a) damaged, or b) a substitute. Killing that damned ipod spammer is going to take both solutions. Not only is a bunch of spoofed ipod-spam hits returned for every query (this one spammer has several whole netblocks, to judge by the hosts I've already identified and manually blocked), but the f*cker is also substituting his crap in downloads from non-spoofed search results. I've downloaded several valid search results only to find that the ipod spammer had somehow intercepted the download request, I guess by participating in the file's mesh, and sent me his bullcrap. NOT ONE of these was discovered to be corrupt by Limewire, even though the ipod spam and the original, legitimate file surely had different SHA1s. (The result was known to be legitimate in each of these cases because it had few sources, an honest Modem speed advertised, a queue, and a file name that actually contained things besides underscores that weren't in my query string.))
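For what it's worth, the "filename is just your query string" tell described above is simple to test for. This is only a sketch of that one heuristic, with made-up helper names (`normalize`, `looks_like_query_echo`). It ignores the source-count and speed clues, and a real filter would want those too.

```python
import re


def normalize(text):
    """Lower-case the text and treat runs of underscores/whitespace as one space."""
    return re.sub(r"[\s_]+", " ", text).strip().lower()


def looks_like_query_echo(query, filename):
    """True if the filename (minus extension) is just the query string,
    allowing for swapped or extra underscores and changed capitalization."""
    stem = filename.rsplit(".", 1)[0] if "." in filename else filename
    return normalize(stem) == normalize(query)
```

A genuine result usually has something in the name that wasn't in the query, so `looks_like_query_echo("some band live", "Some__Band_LIVE.mp3")` is true while `looks_like_query_echo("some band live", "Some Band - Live at Leeds.mp3")` is false.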