Gnutella Forums - Gnutella Protocoll v0.7 Proposal

Quote:

Originally posted by Unregistered
Doesn't work. How many LANs have a working multicast tunnel for Gnutella? UDP is the simplest and best-working alternative, and other protocols use it too. If a network admin needs to block broadcasts, he can do so at any time; there is no need for, or advantage in, multicast.

Pfft. Turn on your brain, man!

Broadcasts NEVER cross subnets. Multicasts don't cross subnets if you don't have a multicast router (which is common, I agree) or if you set your TTL to 1. The most common configuration today is a switched network that snoops on IGMP membership messages and floods multicasts only to interested hosts, reducing any potential negative impact on the network. Sure, multicasts can't span subnets on most networks today, but if you had bothered to read what I frigging wrote, you'd see that I was saying they are simply smarter than broadcasts, even in the TTL=1 case. Even on the oldest, crappiest network, the multicast traffic will still reach all the hosts it has to (on such a network it will be functionally the same as broadcast).
The IETF generally considers the use of broadcast deprecated for any new protocol (look at OSPF: broadcast would work fine, but why send the packets to uninterested parties?). Finally, most affordable switches don't give you the tools you need to block those broadcasts, and you simply can't filter out all broadcasts, since you need ARP on most networks. ;) If you want to be ignorant, that's fine, but please don't influence protocol design in areas you obviously don't understand.
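On the socket level, multicast discovery is ordinary UDP. What differs from broadcast is the destination (a group address), the TTL that keeps datagrams on the local subnet, and the group join, whose IGMP membership report lets snooping switches limit flooding to interested ports. A minimal Python sketch, assuming a hypothetical discovery group 239.255.0.1 and port 6347 (neither value comes from any Gnutella specification):

```python
import socket
import struct

# HYPOTHETICAL values for illustration only; not from any Gnutella spec.
DISCOVERY_GROUP = "239.255.0.1"  # administratively scoped range (RFC 2365)
DISCOVERY_PORT = 6347


def make_sender(ttl: int = 1) -> socket.socket:
    """Plain UDP socket; a multicast TTL of 1 keeps datagrams on the local subnet."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    return sock


def membership_request(group: str, iface: str = "0.0.0.0") -> bytes:
    """Pack a struct ip_mreq: multicast group address, then local interface address."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface))


def make_receiver(group: str = DISCOVERY_GROUP, port: int = DISCOVERY_PORT) -> socket.socket:
    """UDP socket bound to the discovery port and joined to the group.

    Joining makes the host send an IGMP membership report; switches doing
    IGMP snooping use it to forward the group's traffic only to ports
    that have members.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock
```

A node announcing itself could call `make_sender().sendto(b"PING", (DISCOVERY_GROUP, DISCOVERY_PORT))` and listen with `make_receiver().recvfrom(1024)`; the payload is a placeholder. A broadcast design would instead enable `SO_BROADCAST` and send to the subnet broadcast address, which every host on the segment receives whether or not it runs a servent.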

A newcomer with attitude, plus unfriendly language.
UDP is still the simplest and best-working option for automatic LAN discovery.

Quote:

Originally posted by Unregistered
A newcomer with attitude, plus unfriendly language.
UDP is still the simplest and best-working option for automatic LAN discovery.

Oh please. In my post I assumed you had enough clue to understand what I was saying but simply weren't thinking. Obviously not the case. I don't mean to be offensive, but, seriously, how can you claim any level of expertise if you believe there is some kind of mutual exclusivity between multicast and UDP (in reality it's quite the opposite: IP multicast is carried over UDP)... Sigh.

I'm sorry I got as hot-headed as I did, but keeping uneducated programmers from using broadcast rather than multicast is a hot issue for me. Now, you can discuss this as a programmer who might have something to learn about modern best practices for protocol design, or you can choose to butt heads with me as an expert. However, if you choose to play the expert, you should be prepared to get taken apart when you venture into an area where you lack expertise. My harsh attitude should be a lesser social goof than your dishonest representation of expertise. After all, I might ruffle some feathers, but your actions could lead to long-lasting stupid mistakes in the protocol. Sorry that I care when I see people being misinformed... but the fact that I'm acting like an ******* doesn't make you any less wrong or me any less right.

UTF, UNICODE technical (long)

As to why the new protocol should extend to UNICODE, and why it should be implemented using UTF-8:

UNICODE aspires to define all characters of all languages. Right now, a 2-byte address space (65,536 code points, U+0000 to U+FFFF) has been defined to cover most languages. This is being extended to 4 bytes, but let's stick with 2 bytes for now.

UTF (more precisely UTF-8) and UCS are both ways to express the 2-byte number of a character (I skip 4-byte UNICODE here). UCS simply stores the number in 2 bytes, so it may contain null bytes. Normally, when people talk about UNICODE, they are referring to this UCS-2 (= 2 bytes) method of expressing it.
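To make the null-byte issue concrete: Python has no codec named UCS-2, but for characters up to U+FFFF big-endian UTF-16 produces exactly the UCS-2 bytes, so the sketch below uses it as a stand-in. The helper name `ucs2` is illustrative only.

```python
def ucs2(text: str) -> bytes:
    """UCS-2 bytes for BMP-only text (UTF-16-BE is identical in that range)."""
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("UCS-2 cannot represent characters above U+FFFF")
    return text.encode("utf-16-be")


word = "Gnutella"
print(ucs2(word))                        # b'\x00G\x00n\x00u\x00t\x00e\x00l\x00l\x00a'
print(b"\x00" in ucs2(word))             # True: every ASCII character gains a 0x00 byte
print(b"\x00" in word.encode("utf-8"))   # False
```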

UTF-8 uses 1, 2 or 3 bytes to express the 2-byte number of a UNICODE character. Null bytes do not occur, except for the character U+0000 itself. This works as follows:
<table border=1 cols=4>
<tr><td>UNICODE character number range (hex)</td><td>UTF-8 byte 1 (binary)</td><td>UTF-8 byte 2</td><td>UTF-8 byte 3</td></tr>
<tr><td>0000 - 007f</td><td>0xxxxxxx</td><td>(none)</td><td>(none)</td></tr>
<tr><td>0080 - 07ff</td><td>110xxxxx</td><td>10xxxxxx</td><td>(none)</td></tr>
<tr><td>0800 - ffff</td><td>1110xxxx</td><td>10xxxxxx</td><td>10xxxxxx</td></tr>
</table>
UTF-8 can also use 4 bytes and, with the same scheme, express character numbers up to U+10ffff. That won't be relevant right now, but may be in the future, so provisions should be made for upward compatibility with 4-byte UTF-8 sequences.
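The bit patterns in the table can be turned into an encoder directly. The sketch below is a hand-written illustration (Python's own `str.encode("utf-8")` does the same job), and it includes the 4-byte form 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx for code points above U+FFFF. The function name `utf8_encode` is hypothetical.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point by hand, following the UTF-8 bit patterns.

    Surrogate code points (U+D800 to U+DFFF) are not rejected here;
    a strict encoder would refuse them.
    """
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError(f"U+{cp:X} is outside the Unicode range")
    if cp <= 0x7F:                          # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:                         # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:                        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx, up to U+10FFFF
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])
```

For example, `utf8_encode(0xE4)` (ä) gives `b'\xc3\xa4'` and `utf8_encode(0x20AC)` (the euro sign) gives `b'\xe2\x82\xac'`.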

The first byte of a UTF-8 sequence gives the sequence length as the number of high-order 1-bits before the first 0-bit (none for a single-byte character). The following 1 or 2 bytes are easily recognizable as belonging to a UTF-8 sequence by their 2 highest bits, 10, which puts their value between 80 and BF (hex). The bits marked 'x' give the number of the character in the UNICODE table.
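The self-synchronizing property described above means a receiver can find character boundaries from the bytes alone. A minimal sketch, with hypothetical helper names, that reads the length from the lead byte, recognizes continuation bytes, and splits a byte string into per-character sequences without validating them further:

```python
def sequence_length(lead: int) -> int:
    """Number of bytes in the UTF-8 sequence that starts with `lead`.

    The length is the count of leading 1-bits (no leading 1-bit means a
    single ASCII byte). Returns 0 for a continuation byte (10xxxxxx),
    which can never start a sequence. Raises for 11111xxx bytes
    (0xF8 to 0xFF), which match no pattern.
    """
    if lead < 0x80:
        return 1        # 0xxxxxxx
    if lead < 0xC0:
        return 0        # 10xxxxxx: continuation byte, 0x80 to 0xBF
    if lead < 0xE0:
        return 2        # 110xxxxx
    if lead < 0xF0:
        return 3        # 1110xxxx
    if lead < 0xF8:
        return 4        # 11110xxx
    raise ValueError(f"0x{lead:02X} matches no UTF-8 lead pattern")


def is_continuation(b: int) -> bool:
    """True for 10xxxxxx bytes, i.e. values 0x80 to 0xBF."""
    return b & 0xC0 == 0x80


def split_sequences(data: bytes) -> list[bytes]:
    """Cut a byte string into per-character UTF-8 sequences (no validation)."""
    out, i = [], 0
    while i < len(data):
        n = sequence_length(data[i])
        if n == 0:
            raise ValueError(f"stray continuation byte at offset {i}")
        out.append(data[i:i + n])
        i += n
    return out
```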

Thus, a 1-byte UTF-8 character has exactly the same number as the corresponding ASCII character. A Latin-1 special character (such as ä or é), however, has a number beyond 7f, so it's not possible to tell whether such a single byte is a Latin-1 character or the start of a UTF-8 sequence.
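The ambiguity is easy to reproduce. In the sketch below (the helper name is hypothetical), the Latin-1 bytes for ä, é and ß all carry a multi-byte UTF-8 lead pattern, and each wrong interpretation fails in its own way: UTF-8 read as Latin-1 silently yields two wrong characters, while a lone Latin-1 byte read as UTF-8 is rejected as a truncated sequence.

```python
def looks_like_utf8_lead(b: int) -> bool:
    """True if b has a multi-byte UTF-8 lead pattern (110xxxxx, 1110xxxx, 11110xxx)."""
    return 0xC0 <= b <= 0xF7


for ch in "\u00e4\u00e9\u00df":          # ä, é, ß
    byte = ch.encode("latin-1")[0]
    # Each of these Latin-1 letters doubles as a UTF-8 lead byte.
    print(f"U+{ord(ch):04X}: Latin-1 byte 0x{byte:02X}, "
          f"UTF-8 lead pattern: {looks_like_utf8_lead(byte)}")

# UTF-8 bytes read as Latin-1: two wrong characters, no error raised.
misread = "\u00e4".encode("utf-8").decode("latin-1")   # == "Ã¤"

# A lone Latin-1 byte read as UTF-8: rejected as an incomplete sequence.
try:
    b"\xe4".decode("utf-8")
except UnicodeDecodeError:
    print("0xE4 alone is not valid UTF-8")
```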

In conclusion, extending the encoding scheme of the protocol from ASCII to UTF-8 would leave current clients working, as null bytes do not occur. Old clients would of course treat each byte of a UTF-8 sequence as a separate character, leading to funny names in the search results. But you get that even now, and searches containing, e.g., German special characters do not really work right now: these characters are normally just ignored. Moving to UCS might make some old clients fail, as a character might contain a null byte. Compared to UCS, UTF-8 takes less space per character for ASCII text (1 byte instead of 2), exactly the same space for characters up to 07ff (for example the special European characters, or Russian), and 1 byte more for characters from 0800 upward (most notably Asian scripts).
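Both the compatibility and the size arguments can be checked directly. The sketch assumes, as in the Gnutella 0.4 Query message, that the search string is terminated by a NUL byte; `legacy_read_search` is a hypothetical stand-in for an old client's parser, and UTF-16-BE stands in for UCS-2.

```python
def legacy_read_search(payload: bytes) -> bytes:
    """What an old client sees: everything up to the first NUL byte."""
    return payload.split(b"\x00", 1)[0]


def sizes(text: str) -> tuple[int, int]:
    """(UTF-8 length, UCS-2 length) in bytes, for BMP-only text."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-be"))


query = "Gr\u00fc\u00dfe"    # "Grüße"
# UTF-8 survives intact: no byte of it is 0x00.
print(legacy_read_search(query.encode("utf-8") + b"\x00"))
# UCS-2 is cut off at its very first byte (0x00 before 'G'): prints b''
print(legacy_read_search(query.encode("utf-16-be") + b"\x00\x00"))

samples = [("ASCII", "Gnutella"),
           ("Cyrillic", "\u041f\u0440\u0438\u0432\u0435\u0442"),   # "Привет"
           ("CJK", "\u65e5\u672c\u8a9e")]                           # "日本語"
for label, sample in samples:
    utf8_len, ucs2_len = sizes(sample)
    print(f"{label}: {len(sample)} chars, UTF-8 {utf8_len} bytes, UCS-2 {ucs2_len} bytes")
```

The three samples land in the three cases named above: fewer bytes for ASCII, equal for Cyrillic (below 0800), one extra byte per character for CJK.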

As the bulk of the traffic will very probably remain ASCII for a long time, the increase in load from using UTF-8 should be tolerable. You gain a worldwide audience, and you stay compatible with the current standard. Keep in mind that Latin-1 is not part of the current standard and does not work well in practice. Lastly, if at some point in the future you want to cover UNICODE characters up to U+10ffff, UTF-8 can still be used.

Please have a look at the <a href=http://www.unicode.org/>Unicode Consortium</a>. Demonstration pages for UNICODE (usually UTF-8 encoded) can be found all over the web; one such page is <a href=http://www.geocities.com/Tokyo/Pagoda/1675/unicode-page.html>here</a>.

If you go for Latin-1, then you need a mechanism to identify the message as Latin-1 or as UNICODE. If you use UCS, then you probably cannot maintain backward compatibility. You will also get new problems when, at some point in the future, characters up to U+10ffff need to be supported.

Re: UTF, UNICODE technical (long)

Quote:

If you go for Latin-1, then you need a mechanism to identify the message as Latin-1 or as UNICODE. If you use UCS, then you probably cannot maintain backward compatibility. You will also get new problems when, at some point in the future, characters up to U+10ffff need to be supported.

see GUID tagging.

Re: Re: UTF, UNICODE technical (long)

Quote:

Originally posted by Unregistered

see GUID tagging.

That would make it possible to use Latin-1 as the default, but it still leaves the null-byte problem open. If you want to encode UNICODE inside a Query/QueryHit within the common ISO Latin-1 character set, you can use UTF-8 for this. It's backward compatible, isn't it?

See GUID tagging. Strings could be either Latin-1 (the default) or UTF-8 (the Unicode variant). Would that meet international needs, in your view?

Yes :D. To my knowledge, this would make a comprehensive basis for internationalization. With a view to the future, it's best to provide for UTF-8 sequences of up to 4 bytes, even though currently only up to 3 will be used. Once the protocol has been defined that way, it's just up to the clients to bring it to life.
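How the tag is carried in the GUID is left to the proposal, so the sketch below takes it as a plain boolean, `utf8_tag`, a hypothetical parameter. It shows one possible sender policy, which is an assumption rather than anything agreed in the thread: stay with the Latin-1 default whenever the text fits, so untagged legacy clients read it correctly, and switch to tagged UTF-8 (including 4-byte sequences) only when needed.

```python
def encode_search(text: str) -> tuple[bytes, bool]:
    """Return (payload, utf8_tag) for a search string.

    Hypothetical policy: prefer the Latin-1 default; use UTF-8 and set
    the tag only for text that Latin-1 cannot represent.
    """
    try:
        return text.encode("latin-1"), False
    except UnicodeEncodeError:
        return text.encode("utf-8"), True


def decode_search(payload: bytes, utf8_tag: bool) -> str:
    """Inverse of encode_search.

    Strict: malformed UTF-8 raises UnicodeDecodeError. Latin-1 maps every
    byte to a character, so the untagged branch never fails.
    """
    return payload.decode("utf-8") if utf8_tag else payload.decode("latin-1")
```

With this policy, "Grüße" still goes out untagged as Latin-1, while a string such as "日本語" or a character above U+FFFF goes out as tagged UTF-8.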

Hopefully cutting-edge clients will support entry and display of characters outside the system's default codepage... from past experience, I'm worried about this :( .

Oki, I added this to the v0.7 proposal.
Thx for your help and greets, Moak