Programming a Gnutella Client, part II: network core

Moak · #1 (**permalink**) January 30th, 2002

(Version 0.999b, Author: Mark "Moak" Seuffert, Editing: TruStarWarrior)

Programming your own Gnutella servent, after reading protocol specification and tons of documentation [1], usually begins with client/server programming, which quickly raises the question, "Which is a good network concept?". I have collected ideas about Gnutella network design and will describe a so-called 'asynchronous server' design here. Before I begin, let me say that this article is intended for advanced programmers. Please feel free to ask questions, make comment or help improve this article!

Ideas and goals for a Gnutella network core, in no specific order:

- flexible socket handling (listening socket, incoming client socket and outgoing socket)
- buffered data stream (line buffered, block buffered, file streaming and output buffering)
- low memory and CPU consumption (superpeer suitable)
- handle high load and high number of connections
- SOCKS proxy ability
- portability for platforms supporting Berkely sockets (Unix, Windows, etc.)
- reusability of code for other Internet projects (base class is not Gnutella specific)
- highly object orientated (OO)

Basic thoughts
An asynchronous server will notify you when something interesting happens, e.g. about a new incoming connection, when data could be sent, or data could be received on a socket. So you do not have to poll for events and do not have to start a new thread for every incoming client connection. Instead, the server will wait for changes and then notifies you within a single thread. For example, the server calls an OnReceive() function whenever new data arrives or OnSend() when more data could be sent.

I chose an asynchronous server because it is very flexible, fast and without any thread overhead, strong in handling a high number of connections (perfect for superpeers) and easy in synchronization (everything within one thread, not much semaphores). If you are not familiar with socket concepts, the possible alternatives are:
a) blocking servers (handling only one request at a time, then wait again blocking)
b) threaded servers (handling simultaneous requests, one thread for every new socket).
c) asynchronous server (handling simultaneous requests, one server thread will handle all sockets)
The server model always depends on application requirements. For example, the well-known Apache webserver is a pre-threaded server, which is a variation of the threaded server (minimized thread overhead, but more complicate). Apache's concept is great for handling thousands (!) of small requests with multiple CPUs. Gnutella usually has longer, persistent connections and runs on standard PCs, so I recommend an asynchronous server model here.

Well, if you're interested in a Windows only Gnutella servent (eeks, not my idea) you can use MS Windows CAsnycSocket class [2] and derive a class for further improvements. I prefer writing my own class because it will basically be compatible with CAsyncSocket, the design of which I really like after having used it in past projects. Advantages: your own class will be fully portable on all platforms supporting Berkeley sockets (your network core will run on Linux, Windows, Mac, etc)... and then you are "forced" to understand what's going on, grin. However, if you don't want to invent your own socket class, then use existing code, e.g. Trolltech's QT library [5] (Qtella is a nice example servent [6]) or as already mentioned, Microsoft's CAsyncSocket.
Perhaps you want to spend some more time to study basics if you have never programmed Internet applications [3] [4]. In any case, please take a short look into Microsoft's MFC class CAsnycSocket. It gives you a taste of what to do in your own code design, and it's very well documented. Alternatively, check the Mutella source code [7], see file asyncsocket.cpp. Mutella is a sweet Unix console client. Unfortunately, it needs more code documentation, but according to Mutella's author, Max Zaitsev, asking specific questions are always appreciated.
When you're familiar with object orientated programming and 'design patterns' have a look on 'Reactor', it describes the ansychronous server model or event-driven multiplexing we use here [13].

Talking Gnutella
Now let's have a look into the Gnutella protocol [8]. There are two different kind of data streams: 1) line buffered in connection handshaking or HTTP file requests and 2) block buffered in binary Gnutella messages. Let me anticipate there is also a third type, 3) file streaming in file transfers. You might say, "We can read a whole file into a memory block and so it's block buffered too...". But files could be huge, and reading a multi MB file into main memory isn't a good idea. Also, reading blocks of a file is unnecessary (without better reason), since all file streams are already prebuffered from your OS or low-level library. So we come to a third kind of data stream - file streaming... Actually it is just read/write (on memory mapped I/O) when we come to implementation.

Okay, let's take a closer look, since a brief understanding of what happens in a Gnutella servent is important before stepping further. We take the Gnutella v0.4 protocol for simplicity (upcoming v0.6 protocol would make no difference here) and trace a client-server connection: A client will connect to the servers listening port, the server will recognize this and establish a new client socket for further communication. Now the handshaking begins [9].
The client sends:

GNUTELLA CONNECT/0.4<lf><lf>

The server responds with:

GNUTELLA OK<lf><lf>

All lines are terminated with a double line feed.
An HTTP file request is similar. When simplified, it looks like this:

GET /get/1283/gnutti.deb HTTP/1.0<cr><lf>

We call this type of data stream line buffered. A possible implementation: collect every piece of incoming data until you find a <lf> and then return the full input line. Your server will read and evaluate this line by line.
After this handshaking, the connection is established and will stay permanent (as long as no side shuts down). Client and server continue to send the binary Gnutella messages. Now, the interesting part begins! A Gnutella message is called a 'descriptor' and consists of a header plus the payload. The binary data stream looks like this

(header)(payload ....)(header)(payload ....) etc...

The header has a fixed length of 23 bytes, the payload is between 0 - 64K bytes. There are no separators in the Gnutella data stream, so you have to read the payload length from the header. We call this type of data stream block buffered. A possible implementation: read 23 bytes into a buffer, which is for the header. Assign a new buffer from the previously read header payload length and read the payload into this buffer. Your server will read and evaluate data block by data block, and then answer with its own messages (e.g. 'QueryHits' and 'Pongs') or just forward it to other servents. So far, those are the basic network functions. Are you ready to go a step further?

Buffering traffic
As we already know, we have a buffered input, which is either line-by-line or block-by-block. What about the output traffic?
You might think we send the output data as soon as we have it, so just send & forget it, right? Actually this wouldn't work! Gnutella uses TCP/IP for transportation of packets across the Internet. TCP is a stream protocol. This means that if you send 100 bytes, the receiving end could receive all 100 bytes at once, or four 25-byte chunks, or whatever. In any case, the receiving peer has to acknowledge that it's ready to receive more bytes. The other peer might be connected via a slower connection (e.g. modem user) and couldn't receive packets with the same speed as you send them out. Your computer makes sure it does not send out more bytes than the other peer can receive and therefore stop sending more data. The solution for this problem is a buffered output.
A possible implementation is simple: We write every output into a big output buffer (one for every socket), and return immediately. In the background, all the buffered data is transmitted whenever a socket is ready for more data.

The application output buffer in common clients is about 64K per socket, at least big enough to buffer a few messages. What happens if the other peer is too slow to receive the Gnutella data buffer? This shouldn't happen for a long period of time, but it could happen. In this case you need to drop single descriptors (this means that you don't copy them to the output buffer). If the connection is very slow or the other peer doesn't accept any byte for a long time, you should drop the connection! Also, don't forget to catch TCP error messages, if an error occurs (e.g. peer disappears) drop the connection.

Again let's take a closer look what's going on inside a Gnutella servent. Lets assume the connection is already established and a new message arrives, e.g. a Gnutella 'Query' descriptor:

- new data arrives
- header is stored in a buffer: calling new() operator with fixed length of 23 bytes
- payload is stored in a buffer: calling new() operator with variable length
- message is evaluated: TTL decreased, hops increased, generate 'QueryHits'
- message is forwarded: calling memcpy() to copy message in every output buffer of connected servents
- free temporary buffers: calling free() twice
- start from top

Another kind of traffic is the file transfer. It's different from the binary message traffic because it doesn't need a output buffer for intermediate storage; we just send a local file. Let's have a look. A connection is already established and a file will be requested:

- new data arrives
- request is stored in buffer (line buffer): using CString or equivalent string container class.
- request is evaluated: parse HTTP request and search database for local shared files
- send file (direct file streaming with memory mapped I/O) or send error message
- close socket

Okay, so far, we have our basic network design. You could already write a small Gnutella servent. Congratulations! Let's improve our network design and search for possible weak spots....

Advanced thoughts: Reference buffering
Did you notice that nearly all the Gnutella message traffic is copied between buffers (memcpy)? Every routed message is copied from an input buffer into one or more output buffers. This could be a lot of work (higher CPU consumption) especially for a superpeer or Gnutella proxy with a constant traffic.

Is there a way around calling memcpy() repeatedly? Yes, but it's not simple and I'm not sure if it's worth the extra code...But nonetheless, here it is: Instead of copying every routed message into an output buffer, we keep each input buffer and just add a pointer to an output queue. The previously used output buffer (see above) is replaced with an output queue that collects pointers to buffers that has to be sent. Since a message could be sent to multiple queues (each for every connected servent) we keep a reference on how often the input buffer still has to be sent. When the reference reaches zero, the buffer is released.
I think you can imagine the extra code we have to add to our socket class. But with this 'reference buffering', we avoid repeated calls of memcpy() for routing messages and gain a lower CPU consumption! I would be glad to hear any feedback!

There is another weak spot in the message routing. We repeatedly allocate memory and release it. While Gnutella messages have different sizes, we constantly allocate memory blocks of different size. This might cause a heap fragmentation. I expect a normal servent will not care about it, but consider a superpeer that runs for days. It might be worth to improve memory management again.
As a solution, I suggest that a 'buffer memory management' is added. This allocates memory blocks of fixed size. If the user requests a buffer bigger than the defined block size, the memory management allocates multiple blocks and links them together in a list. The memory blocks should be pre-allocated in a dynamic memory pool to avoid calling new() and free() unnecessarily. This pool does allocate memory blocks on startup. More are allocated when required. Also, we need different sizes in our memory pool. At least 23 byte blocks for headers and even bigger blocks (e.g. 1 KB chunks) for the payload. You may want to add another granularity (different buffer block sizes for efficient storage).
Once again, this requires some extra code for our socket class (split it up into a own buffer management class). But we avoid repeated calls of new() + free() and prevent a heap fragmentation. Again, I would be glad to hear any feedback!

Advanced thoughts: Packets and MTU
Newcomers to network programming almost always run into the same problems, expecting data packets to be transferred and received in the same way you sent them out. Let me say this: packets are illusions!

When we talked about 'buffering traffic', we already mentioned the streaming nature of TCP/IP. When sending 100 bytes to another peer, do not expect these to be received from the other peer at once. It might be split up into smaller chunks, but not necessarily. So, for example, when you need to receive 2000 bytes, it means you must collect byte chunks until you've got 2000 bytes. Always collect bytes from a TCP socket no matter how few bytes you expect!
Did you remember the OnReceive() function from the beginning? An asynchronous server will notify you when (more) bytes have been received and could be read from a socket. The function of a buffered socket class is to collect those packets until the requested block size is reached (for block buffered data stream) or until a delimiter is received, e.g. <lf> (for line buffered data streams).
Some socket classes do not provide buffered streams out of the box, e.g. previously mentioned CAsyncSocket. Such a low level class is perfect for further modifications since it's only a raw interface to the sockets. I suggest that you write a class named CAsyncSocketBuffered, which would provide a flexible input and output buffering system for every kind of Internet project.

So far, we've talked about packets in TCP. Now, let's talk about how to use them effectively! Yeah, let's do some hardcore stuff!

Basically, when you send out data on a TCP socket with a call of send(), it will be sent out immediately with a TCP packet. Experienced user say, "Oh, I build on TCP's Nagle algorithm" [4] [10]: The data will be buffered to build a full frame (Nagle algorithm) and be sent as fast as possible (avoiding delayed ACK roundtrips). As result the line is used with full efficiency... So far, this is merely theory!
This isn't true when sending huge amounts of data or an other peer has a rude behaviour, e.g. calling receive() with a small number of bytes. In both cases, the TCP/IP stack can't optimal buffer your data streams anymore. TCP is very sensitive to the sizes of send and receive buffers. In Gnutella, we sometimes want to fire up the line and need to reach maximum transfer speed.
Well, the ideal case in networking is that each program always sends a full frame of data with each call to send(). This maximizes the percentage of useful data in a packet and minimizes data stream delays. The worst case would be when small packets are sent with the transmission of single-character messages. These result in 41 byte packets (one byte of data, 40 bytes of TCP/IP header).
Now, what is a full frame size? It's the MTU (Maximum Transmission Unit) of your network interface. The MTU is the largest packet that a given network medium can carry. Ethernet, for example, has a MTU of about 1500 bytes. The current size always depends on your OS and transportation interface. However, when a frame is translated from one type of network to another by a gateway or a router, a frame can be fragmented if the destination network type has a smaller MTU. For gaining maximum speed, we don't care about fragmentation, since your router or ISP will handle it for you and we just try our best!
When we speak about sending full frames we mean the exact available size. Query your interface MTU and subtract 40 bytes for the TCP/IP header, you get the so called MSS "Maximum Segment Size" (for TCP it's MTU - 40 bytes).
Now, what about sending data as fast as possible? From what we discussed until now we send byte chunks of MMS size and so achieve sending full frames. Pretty good, there is still a small problem left. TCP does do some kind of synchronization between frames, a peer does send a so called "ACK" (acknowledge) to sign it receives the last package: Peer A sending package, peer B receiving package, B sending ACK to A. To avoid long ACK roundtrips (delays between frames), we don't wait for ACKs and instead feed the TCP/IP stack with more data. A size of 3*MMS promises best performance and works perfectly together with Nagle (details discussed by Comer and Lin [12]).

A possible implementation: First, query your interface MTU and calculate the MSS (MSS = MTU-40). Set your low level TCP/IP send buffer to at least 3 times as large as the MSS (call setsockopt(s, SOL_SOCKET, SO_SNDBUF, ...) before bind/listen). Second, read full MSS chunks with a call of receive(), write full 3 * MSS chunks with a call of send(), whenever possible. Both points are important!
All application buffers (those to store line-buffered, block-buffered traffic or gnutella messages) will be allocated in a MSS orientated size, here is a formula: size = n * (MTU-40). If you have problems querying the correct MTU size ('ifconfig' on Unix, strange registry setting for Windows), then choose a INI-file solution. Finding the correct MTU is indeed difficult, because you also have to know the corresponding device of the local internet connection... usually this is the default route ('route' on Unix/Windows), but you can't count on that. An INI-file presetting of MTU = 1500 could be an acceptable solution.
What about our socket class with 'reference buffering' (see paragraph above)? Here we have small blocks, 23 bytes for headers, and bigger blocks for the payload. Again, send full frames whenever possible. Choose a MSS orientated block size for the payload chunks. Quite easy. You might think about concenating smaller packets, but I recommend that you send them directly since it occurs seldom. A typical Gnutella traffic consists of:
- small messages ('Ping' 23 bytes, 'Pong' 37 bytes), low percentage in future
- big messages ('Query' and 'QueryHit', with more metadata and grouped queries > MSS)
- file transfer and upcoming file tunneling (many MSS chunks)

Finally, we gained a high TCP performance with a high data flow, low CPU and low memory consumption. Feel free to improve your code and play around with alternatives! A great source for advanced TCP/IP programming are the books "Unix Network Programming, Stevens" [14] and "Effective TCP/IP Programming, Snader" [15]. Highly recommended on every network programmers desk.

Advanced thoughts: Firewalls
Firewalls become more common, they provide a higher security against possible attacks and block unwanted connections. From the view of a socket class, there are only three possibilities:

a) traffic goes through
b) traffic is blocked
c) traffic is blocked but a proxy is available.

We ignore the first two cases since we can't do anything against it. We need to handle the third case, which occurs when a proxy is available and we can't connect directly. The function of a proxy is to take traffic or requests and route them to the destination. An additional authorization handshake might be required to use a proxy. Common proxy software are SOCKS4 and SOCKS5 [11].
Adding proxy functionality to your socket class isn't very complicated. Unfortunately, we haven't the time to describe those details here. Simplified, you would route traffic via the proxy and add a transparent layer for authorization handshaking. Transparent means you don't have to change your existing code, only your socket class. Various sample source code and libraries are available on Internet. Ask your favorite search engine.

From a more general point of view, firewalls are much more complex. Consider Gnutella's PUSH descriptor when it comes to sharing files with "firewalled" peers (here, firewalled means that the peer can't handle incoming connections). Well, this is a subject for an entirely separate article.

I hope you had fun thinking about the insides of Gnutella! The idea for this article was to promote new developers and help them to code a own Gnutella client or understand existing ones. Before you ask, I will say this: Sorry, I don't have a socket class finished. I will let you know as soon as I find time to do so. If you're interested to cooperate, let me know. Your feedback is appreciated!

[1] Programming a Gnutella Client - http://www.gnutellaforums.com/showth...&threadid=4638
[2] MSDN Class CAsnycSocket - http://msdn.microsoft.com/library/en...syncsocket.asp
[3] TCP Stream Socket Application - http://msdn.microsoft.com/library/en...winsock_14.asp
[4] Winsock Programmer's FAQ - http://tangentsoft.net/wskfaq/index.html
[5] Trolltech QT Network Class - http://doc.trolltech.com/3.0/network.html
[6] Qtella Source Code - http://www.gnutelladev.com/source/qtella.html
[7] Mutella Source Code - http://www.gnutelladev.com/source/mutella.html
[8] Gnutella v0.4 Protocol - http://rfc-gnutella.sf.net/Developme...0_4-rev1_2.pdf
[9] Gnutella Handshaking Overview - http://www.gnucleus.com/research/connect.html
[10] Congestion Control in IP/TCP - http://www.faqs.org/rfcs/rfc896.html
[11] SOCKS Proxy - http://www.socks.nec.com/socksprot.html
[12] TCP Buffering and Performance, Comer and Lin, http://citeseer.nj.nec.com/comer95tcp.html
[13] Design Pattern 'Reactor' - http://www.cs.wustl.edu/~schmidt/patterns-ace.html
[14] Unix Network Programming, Volume 1, Second Edition, Stevens, ISBN 0-13-490012-X
[15] Effective TCP/IP Programming, Snader, ISBN 0-201-61589-4

Thx to TruStar, Tama, Tremix, Maksik, Morgwen

maksik · #2 (**permalink**) February 7th, 2002

Hi!

my comments are not going to be particularly organized, so the order has nothing to do with the priorities or importance and more related to the order of appearance in the original article.

1. Threading model of the client: Unfortunately the article does not address the issue in proper detail, whereas other design decisions may critically depend on the particular threading model selected. The two extreme models are sort of covered, one would 0ne-thread-per-socket and the other one basically presume that the only one thread exists. I personally find later one to be more attractive for multiple reasons like ease of debugging and so on. On the other hand gnutella client has to do some other things, not only receive and route packets, but also download and upload files and match queries against the list of shared files. For example the last task can be rather demanding if one shares the substantial amount of files, say few thousands. In combination with the high-bandwidth connection this may require substantial load of CPU and thus will cause lags with packet routing if the query-matching is performed in the same thread as the client's network activities. Secondly, simple memory mapped IO when uploading files will cause lags again when sharing big number of files from the slow devices like cdroms and NFS-partitions. Finally, the file write requests can also cause lags when NFS is used, which is likely to be the case if one uses gnutella in the office.

2. On the reference counting for the packets. Well, again, lets discuss the alternatives: the only reasonable alternative is that the class, responsible for the connection with the other gnutella peers (lets call it CNode) will have to allocate a receive buffer and take responsibility for its removal. When packet gets processed the pointer to this buffer gets passed down the call stack until it reaches Send call in the case if the packet is to be routed or broadcast. Because we are using non-blocking sockets, at this moment we will have to call memcpy. But it's only one call for routed packets, and in case of broadcasts - we are not broadcasting big packets anyway. Now compare this to the overhead of implementing reference-counting plus custom memory management... I don't know which approach wins...

3. On the proxy-server compatibility: just a short comment. Do you really think its' frequently needed? I've just came across semi-transparent wrapper around MFC's CasyncSocket, supporting SOCKS4,5 and HTTP proxy. I am thinking whether to integrate this into the Mutella's code.

That's it by now,
--Max

Moak · #3 (**permalink**) February 20th, 2002

Hi,

I found a conceptional weakness in CAsyncSocket (I think I did).

There is no method for a friendly TCP disconnect. Close() is too rude and more a "ForceDisconnect". A friendly disconnect (wait until data send, call shutdown, receive all remaining data, then close the socket) does take time. To fit to the asynchronous nature of the methods (remember we do not work in blocking mode usually), the background socket thread (call it a Heartbeat) has to perform such a disconnect IMHO.
I think it's necesarry to add a new Disconnect() method, together with a new handler OnDisconnect(). Short explanation: The method Disconnect() starts a friendly TCP disconnect (see above), after socket closing it calls OnDisconnect() which signs that the socket is sucessfully closed. The normal method Close() can anytime force a disconnect and the socket is closed immediately.
For the existing MS Windows class this means a reimplementation or a workaround with a new workerthread started from Disconnect() and stopped on Close(). For the existing CAsyncSocket clones (e.g. Mutella) this should not be much work to add.

Speaking about Gnutella clients. Mutella seems to like the hard disconnect, just closing the socket, not performing any TCP lowlevel closing (avoiding any timing problematics). Gnucleus seems to like a kind of friendly disconnect, but it might happen a problem on server shutdown... sockets objects become deleted while they still perform a socket closing (possible crash in Gnucleus?) or they block the server until they go down.

Some thoughts, Moak

PS: Such an OnDisconnect() can also sign that the socket is completly closed + completely removed from internal handling. Perhaps a efficient way to let a dynamic socket object remove itself from heap? Let a dynamic object remove itsef makes a timer function collecting and destroying dead sockets obsolete. I think Max and Swabby know what I mean.

Moak · #4 (**permalink**) February 20th, 2002

Hi Max!

About 1, see above in my socket concepts overview (first paragraph). I decided for: one socket thread will handle all sockets (server and clients). It's very similar to your CAsyncSocket Unix port. So the socket class has one independent worker thread that handles socket notifcations and calling handlers, or optional via external triggered Heartbeat().

About 2, yeah, it's more theory yet. I thought about superpeers, proxies and Guerillia clients (tunneling and broadcasting a high amount of traffic). Well, we never find out without testing. :-)

About 3, no, I think it's not needed by the mass of ppl using Gnutella, but it's not a lot of work if you allready use CAsyncSocket - codeproject.com has a fine proxy wrapper class for it, should work with your port too.
I think much more needed is still a gnutella proxy to increase servents which accept "incoming connection".

Greets, Moak

maksik · #5 (**permalink**) February 21st, 2002

Yes, it seems that "OnDisconnect" issue and the last message from you, Moak, in the Mutella forum are somehow related. Am I right?

But in general, I must admit, RequestDisconnect() and OnDisconnect() pair is really missing bit in MFS's CAsyncSocket. It's great that you could spot it out so clearly. I faced something like this while ago, but could never actually realise the problem.

--Max

P.S.: to be continued when I have more time

Moak · #6 (**permalink**) February 21st, 2002

>Yes, it seems that "OnDisconnect" issue and the last message from you, Moak,
>in the Mutella forum are somehow related. Am I right?

Yep, I'm currently working on this issue.

I would like to avoid a timerbased CleanupDeadSockets() operation if possible, since it adds an extra complexity to the code and the class usage.
However, not really. Since you always have to use a timer for socket timeouts (e.g. kicking idle clients, avoid DoS flooding sockets etc), so it's easy to remove dead sockets there. Also I see this timerbased solution works well: Mutella, Gnucleus and also PEERanha are using it (most time) to clean up dead socket objects. Hm, I try to restructure my class and test different tactics, eg. what you have done in Mutella MGnuDirector::RemoveNode().

Dead socket objects = disconnected socket objects, which the listening server socket has created for accepting new client connections, and now are already disconnected but remain dead/unused on the heap. Just use CAsyncsocket, and you'll see what I mean (at least when you have memory leaks). *g*

Greets & Thx, Moak

Moak · #7 (**permalink**) February 21st, 2002

Hi again, what I also like to improve on CAsyncSocket:

* You have no GetStatus() to see what the socket is currently doing, if it's created, unconnected, connected, dead..etc. The handle (m_hSocket) doesn't tell you, but in nearly every derived asynchronous socket you need such a status. Also Close() does not call OnClose() which is a little bit anoying... so I always always always add something like this:

Code:

int CMySocket::GetStatus() { return m_nStatus; }

void CMySocket::OnClose(int nErrorCode) 
{
	if(m_nStatus == SOCKET_DEAD) return;

	//Add more code here

	CAsyncSocket::OnClose(nErrorCode);	
	m_nStatus = SOCKET_DEAD;
}

void CMySocket::Close()
{
	if (m_hSocket == INVALID_SOCKET) return;

	OnClose(0);
	CAsyncSocket::Close();
}

Note: This code snippet is just a demonstratation. It's more complex in reality.

* The CAsyncSocket documentation does not mention that you have to override the destructor with something usefull... well maybe it's a C++ pitfall: never expect that a destructor calls a virtual method virtual. In this case it means, the destructor of any CAsynSocket derived class has to call Close();

So two more things I will add in my class....... Moak

PS: An idea from RAM (gtk-gnutella developer), met him on #gnutelladev today: An additional big input buffer on application level. The purpose of such a buffer is to eliminate latencies, those which happens when the TCP/IP stack "ACKs" it's ready for more frames. To avoid latency between starting the data stream again and the first arrival of new frames, the TCP/IP stack requests new data while you still have something in the input buffer (working as a prebuffer cache). I like this idea very much!
Btw GTK uses a kind of reference buffering I mentioned in the first post (see 'Advanced thoughts: Reference buffering' or gtk-gnutella source code).

maksik · #8 (**permalink**) February 24th, 2002

I was having a discussion with one of the guys who decided to contribute to the development of mutella on the subject. I just paste a quote here to save myself typing...

Quote:

>> Anything that results in a block of your heartbeat thread during its
>> operations will result in a lock on all activities pending during that
>> poll() call.
>>
> Care should be taken to omit blocking calls. There is none at the moment,
> or nearly so.
>
Any execution of a gethostbyname() call can result in a nameserver lookup
which, if delayed, will cause problems. That's used heavily at the moment,
and while this can't necessarily be avoided, it does need to be moved out of
the critical path; Darwin has some latency problems in specific NAT
environments and DNS proxies at the moment with that, and I can imagine that
there would be ways for it to cause problems for others as well.

So, what to do with "gethostbyname"? It may REALLY block for few seconds if not more. Cashing results of the name resolution will help somehow, but not entirely... because every now and again it might be needed to look up something completely new... Well, I can hardly think of anything other than host-servers, and in this case caching will help, but still, this does not seem to be elegant.

Secondly, what about matching the list of local shared files against the incoming queries? This task needs to be done in the backgroung, which implies a worker-thread needs to be used. Am I right?

--Max

Moak · #9 (**permalink**) February 25th, 2002

Good point. I found also no real solution yet, I use a IP-only policy in my inner network core. Only the (user) interface deals with hostnames, e.g when a) connecting to new servents and host caches or b) when displaying connected hosts. At the moment I see no way arround another worker thread which the single purpose of resolving hostnames to IPs.
Perhaps this is again a topic for design review of CAsyncSocket. The problem is especially Connect() which does a 'lphost = gethostbyname(lpszAscii)', which can block for seconds or minutes when calling with a host name.

Greets, Moak

maksik · #10 (**permalink**) March 3rd, 2002

Hi,

it's not exactly on the topic, but still rather close to what was discussed few messages above.

The question is about the way to define that the socket clears its send buffer and thus is ready to be closed. Example: in OnSend we send a large chunk of data (say 16K) and fill the send buffer substantially. If the other side is as slow as 144 modem it will take substantial time to actually transmit that amount of data. On the other hand, my feeling is that close() not just "schedules" close for the non-blocking socket, but actually closes it discarding the current send buffer. Is there any way to define that the send buffer is empty before we close the socket? Wait for the next "OnSend"? Or will it simply mean that there is still free space in the send buffer, but will not guarantee that the buffer is empty? I am a bit lost. The only solution I can think of would use "CloseTimeout", but this doesn't look elegant.

--Max

Similar Threads
Thread	Thread Starter	Forum	Replies	Last Post
What posts belong in this General Gnutella / Gnutella Network Discussion section!	Lord of the Rings	General Gnutella / Gnutella Network Discussion	0	November 17th, 2005 06:54 AM
Gnutella core TCP blocking issue?	romsla	General Linux Support	0	December 13th, 2003 04:00 AM
Programming a Gnutella client, part I	Moak	General Gnutella Development Discussion	1	April 3rd, 2002 11:36 PM
Best network client?	PhatPhreddy	General Gnutella / Gnutella Network Discussion	2	March 15th, 2002 11:02 AM
Need Gnutella client w/ ip filter, private network	Unregistered	General Discussion	1	December 3rd, 2001 10:20 PM