Gnutella Forums  

Go Back   Gnutella Forums > Current Gnutella Client Forums > LimeWire+WireShare (Cross-platform) > New Feature Requests
Register FAQ The Twelve Commandments Members List Calendar Arcade Find the Best VPN Today's Posts

New Feature Requests Your idea for a cool new feature. Or, a LimeWire annoyance that has to get changed.


View Poll Results: Do you want this feature to be available?
Yes 51 92.73%
No way! 3 5.45%
I don't know 1 1.82%
Voters: 55. You may not vote on this poll

Reply
 
LinkBack Thread Tools Display Modes
  #1 (permalink)  
Old September 3rd, 2006
Enthusiast
 
Join Date: June 14th, 2006
Posts: 30
macho6868 is flying high
Default

The purpose of this one-way-hash filter is to have large scale hashing in a small subset of data.

It is similar to a bloom filter except that instead of using a binary system the filter uses an a-z system where each point can hold more that one value and is not limited to binary values.


View source code v.01 going to add in bitwise shifting soon.
View example of a spellchecker done entirely in php with a 300,000 word dictionary (2.5M), the hash table is a 50x50 (1 in 3,250,000 probabilty) at that size the hash table is 419k serialized and uncompressed, it could be done in a 25x25 hash table which would be around 200k however the probabilty of false positives is much much higher (Compare 25x25 matrix example against the above example).


--------------------------------------------------------------------------------

Why use these?

A 250x250 matrix (large) has a unique factor of over 40 million and is around 16 megabytes blank. The structure is designed for large hashes. It would be easily possible to hold the entire domain name system hashed in around 20 megs. What this means is you could have very fast lookups on if a name is in the system.


--------------------------------------------------------------------------------

How does this filter work?


Take an item and hash it four times, modulus each value to the following values.
A marker for the X axis of the matrix
A marker for the Y axis of the matrix
A marker for how deep into the hash at the matrix point to go into.
A marker for the letter to use,
Insert it twice by flipping the X and Y values.

Each item hashed in the database will have a unique value set to each of the above markers. For lookups, the letter must be in two exact places withing the entire filter. In the above example of the spelling checker, there is a 1/1.625M probabily of a duplicate at any given point (due to doubling data inserting).

These filters can take a while to generate, they are very processor intensive to create. These filters use a matrix format for speed in lookups, each matrix point holds a small filter.


--------------------------------------------------------------------------------

How is that this filter can hold so much information?

Example small filter

0000000000

Insert "d" at position '2' (note:Counting starts at zero)

00d0000000

Insert "z" at position '5' (note:Counting starts at zero)

00d00z0000

Now it becomes interesting. insert "f" at position '2' (on top of another letter in place)

00Df00z0000

If another needs to be positioned at the same point. insert "g" at position '2' (on top of another two letters in place)

00DFg00z0000

The array maintains all positions and if it encounters an uppercase letter it concats until a non-uppercase letter is found.

Array
(
[0] => 0
[1] => 0
[2] => DFg
[3] => 0
[4] => 0
[5] => z
[6] => 0
[7] => 0
[8] => 0
[9] => 0
)

Multiple hashing algorithms are used for placements.

TO DO:


Non-XY-flip version that uses a unique secondary value or a combination of other values, however I felt that at this time the number of hashes is approaching maximum.
More documentation and list of papers cited.
List of links to projects/experiments using bloom filters for large datasets.
Reply With Quote
Reply


Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
File size filter a must! kis46 New Feature Requests 37 January 21st, 2007 07:21 AM
File Size Filter??? brianosaur Open Discussion topics 0 May 28th, 2006 12:38 PM
File Size Filter Trip LD New Feature Requests 16 November 8th, 2005 07:56 PM
File size filter Nvi New Feature Requests 6 November 5th, 2005 08:10 AM
Can you filter by file size??? motter28218 Open Discussion topics 0 October 10th, 2005 05:52 PM


All times are GMT -7. The time now is 04:23 AM.


Powered by vBulletin® Version 3.8.7
Copyright ©2000 - 2025, vBulletin Solutions, Inc.
SEO by vBSEO 3.6.0 ©2011, Crawlability, Inc.

Copyright © 2020 Gnutella Forums.
All Rights Reserved.