verdyp, December 8th, 2004

The 3-character limit was designed at a time when only ASCII searches were reliable.
But since we now support Unicode for handling any language, this rule should be rewritten so that it requires a minimum of 3 UTF-8 encoded bytes for a search.

This won't change anything for ASCII searches: it will still be 3 characters.

But for general European Latin or Greek searches, it will mean that 2 characters are enough if at least one of them is not ASCII. (Note, however, that searches ignore and drop accents, even though combining accents are still returned in the results.)

For Asian languages, 3 UTF-8 bytes encode 1 ideograph or 1 Hiragana or Katakana character. Maybe this limit of 3 bytes is too low.
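To make the byte counts concrete, here is a minimal Python sketch of the plain 3-byte rule. It is only an illustration: the names `utf8_len` and `meets_three_byte_rule` are made up for this post and are not LimeWire code, and the sketch assumes precomposed characters and does no accent stripping.

```python
def utf8_len(text: str) -> int:
    """Return the number of bytes in the UTF-8 encoding of text."""
    return len(text.encode("utf-8"))


def meets_three_byte_rule(query: str) -> bool:
    """Hypothetical check: accept a query of at least 3 UTF-8 bytes."""
    return utf8_len(query) >= 3


# Escapes are used so the samples are unambiguous precomposed characters:
# "\u00e9v" is e-acute + v, "\u03b1\u03b2" is alpha + beta,
# "\u6f22" is one ideograph, "\u3042" is one Hiragana.
samples = ["abc", "ab", "\u00e9v", "\u03b1\u03b2", "\u6f22", "\u3042"]
for q in samples:
    print(q, utf8_len(q), meets_three_byte_rule(q))
```

Under this rule a single ideograph or a single kana already passes, which is the reason to consider a stricter threshold.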

So, as a prudent alternative, I would say that a search should need either 3 ASCII-only characters or at least 4 bytes of UTF-8 encoding. For European languages, this means 3 ASCII characters, or 2 ASCII and 1 extended character, or 2 extended characters; for Asian texts, it means a minimum of 2 ideographs or 2 Hiragana/Katakana characters, ignoring the combining voice or tone marks.
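The prudent alternative could be sketched roughly like this in Python. The function name `is_searchable` is hypothetical, and this is not how LimeWire implements it; the sketch also assumes the query has already had accents and combining voice or tone marks dropped, since it does no normalization of its own.

```python
def is_searchable(query: str) -> bool:
    """Hypothetical check for the proposed rule.

    ASCII-only queries need at least 3 characters (3 bytes);
    any query containing a non-ASCII character needs at least
    4 bytes once encoded as UTF-8.
    """
    encoded = query.encode("utf-8")
    if query.isascii():  # str.isascii() requires Python 3.7+
        return len(encoded) >= 3
    return len(encoded) >= 4


# A few cases matching the examples in the post:
assert is_searchable("abc")                # 3 ASCII characters
assert is_searchable("ab\u00e9")           # 2 ASCII + 1 extended (4 bytes)
assert is_searchable("\u00e9\u00e8")       # 2 extended characters (4 bytes)
assert not is_searchable("a\u00e9")        # 1 ASCII + 1 extended (3 bytes)
assert not is_searchable("\u6f22")         # 1 ideograph (3 bytes)
assert is_searchable("\u6f22\u5b57")       # 2 ideographs (6 bytes)
```

Counting bytes rather than characters keeps the check cheap and avoids needing per-script tables, at the cost of treating all non-ASCII scripts by encoded length alone.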
__________________
LimeWire is international. Help translate LimeWire to your own language.
Visit: http://www.limewire.org/translate.shtml