Forum: >>> Magnum BBS <<<

[tao-bugs] Stale connections with BiDirGIOP

From Milan Cvetkovic@21:1/5 to tao-users on Sat Oct 17 21:11:55 2015

Copy: tao-bugs@list.isis.vanderbilt.edu

OK, answering my own question to some extent...

I narrowed the problem to Transport_Cache_Manager_T, and its use of Cache_ExtId::index_.

First, how I see it working:
============================

Transport_Cache_Manager_T uses ACE_Hash_Map_Manager to keep a mapping
between Cache_ExtId and Cache_IntId. In the case of IIOP (in my case it
really was SSLIOP, but I doubt there is a difference there), Cache_ExtId
represents an IP-ADDR:PORT/index triple for a connection. IP-ADDR:PORT is
the address that the Transport connects to, and index is used to allow
multiple connections to the same IP:port address. All three values (address,
port, index) are used to calculate the hash when the entry is stored in
ACE_Hash_Map_Manager.

When a new Transport is created, it is registered with the cache manager,
which creates an entry keyed by ip:port:index(0). When a transport is
needed again, Transport_Cache_Manager_T::find_i looks for an existing
connection and uses it if it is found and idle.

The problem:
============
Transport_Cache_Manager_T::find_i assumes that the indexes of existing
connections are all consecutive numbers starting with 0. It will try to
look up the Transport with index=1 *only* if the index=0 entry for the same
IP:port exists and is busy. If the IP:port:index=0 entry has
previously been purged from the cache, Transport_Cache_Manager_T::find_i will
never try to use index=1 (or any other index in the cache).

This scenario is exactly what happens with BiDirGIOP when a client
disappears from the network and later reconnects (and re-registers its
callback with the same IP:PORT value):
- server caches the first callback with IP:port:index=0
- client reconnects/re-registers
- server caches the second callback with IP:port:index=1
- eventually, server cleans up the cache entry with IP:port:index=0
- but it is never able to access the entry with IP:port:index=1
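
The lookup behaviour and the reconnect sequence above can be reproduced with a
small model. This is only a sketch: Entry, Key, Cache, register_transport,
find_idle and purge_entry are hypothetical names, the "take the first unused
index" rule in register_transport is an assumption about how a new entry gets
its index, and the real find_i is considerably more involved.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>

struct Entry {
    int transport_id;
    bool busy;
};

using Key = std::pair<std::string, std::size_t>;  // (endpoint, index)
using Cache = std::map<Key, Entry>;

// Assumed registration rule: store the transport under the first unused index.
std::size_t register_transport(Cache& cache, const std::string& endpoint, int transport_id) {
    std::size_t index = 0;
    while (cache.count(Key(endpoint, index)) != 0) {
        ++index;
    }
    cache[Key(endpoint, index)] = Entry{transport_id, false};
    return index;
}

// Lookup as described above: probe index 0, 1, 2, ... and give up at the
// first missing index. Returns the transport id, or -1 if nothing is found.
int find_idle(const Cache& cache, const std::string& endpoint) {
    for (std::size_t index = 0;; ++index) {
        Cache::const_iterator it = cache.find(Key(endpoint, index));
        if (it == cache.end()) {
            return -1;  // a gap ends the search, even if higher indexes exist
        }
        if (!it->second.busy) {
            return it->second.transport_id;
        }
    }
}

// Purging removes exactly one entry and leaves the other indexes untouched.
void purge_entry(Cache& cache, const std::string& endpoint, std::size_t index) {
    cache.erase(Key(endpoint, index));
}
```

In this model, registering two transports for 192.168.12.113:7771 and then
purging index 0 leaves one entry in the map, yet find_idle reports nothing for
that endpoint because the probe stops at the missing index 0.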

I am not too sure about the impact on regular TAO clients, since I didn't
try it, but I would assume that:
- if the index=0 entry is busy, a second transport is created
- if the index=0 entry's transport is closed, the index=0
entry is purged from the cache, and the index=1 entry is no
longer reachable until an index=0 entry for the same IP:PORT is created.

Potential solutions:
====================
- I could fix Transport_Cache_Manager_T::unbind_i so that it makes sure
the assumption made in find_i holds: if the cache has M entries for an IP:port,
after removing the entry at index=N (where N is in [0,M)), all remaining
entries for the same IP:port should have consecutive indexes
in the range [0,M-1).
- Alternatively, Transport_Cache_Manager_T could be rewritten
to actually use a multi-hashmap. The existing implementation with a
hash map and indexes seems inappropriate and sub-optimal.
Or there is a good reason not to use a multi-hashmap that I am not
aware of...
It seems that this would touch more files in TAO though.
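
Both directions can be sketched on a simplified model. Everything below is
hypothetical (Entry, Key, Cache, MultiCache, unbind_compacting and
find_idle_multi are invented names, and std::map / std::unordered_multimap
stand in for the ACE containers). A real change to unbind_i would presumably
also have to keep any back-references from a transport to its cache entry in
sync, which this sketch ignores.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <unordered_map>
#include <utility>

struct Entry {
    int transport_id;
    bool busy;
};

using Key = std::pair<std::string, std::size_t>;  // (endpoint, index)
using Cache = std::map<Key, Entry>;

// Option 1: keep indexes consecutive on removal by moving the highest-index
// entry into the vacated slot. Assumes indexes were consecutive beforehand.
bool unbind_compacting(Cache& cache, const std::string& endpoint, std::size_t index) {
    Cache::iterator victim = cache.find(Key(endpoint, index));
    if (victim == cache.end()) {
        return false;
    }
    std::size_t last = index;
    while (cache.count(Key(endpoint, last + 1)) != 0) {
        ++last;
    }
    if (last != index) {
        victim->second = cache[Key(endpoint, last)];  // fill the hole
    }
    cache.erase(Key(endpoint, last));
    return true;
}

// Option 2: a multimap keyed by endpoint only, so no index is needed at all.
using MultiCache = std::unordered_multimap<std::string, Entry>;

// Returns the id of any idle transport for the endpoint, or -1 if none.
int find_idle_multi(const MultiCache& cache, const std::string& endpoint) {
    auto range = cache.equal_range(endpoint);
    for (auto it = range.first; it != range.second; ++it) {
        if (!it->second.busy) {
            return it->second.transport_id;
        }
    }
    return -1;
}
```

Option 1 keeps the change local to removal, at the cost of moving an entry on
every unbind. Option 2 removes the consecutive-index assumption entirely, since
a lookup visits every entry for the endpoint regardless of what was purged.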

I would like to contribute this patch. I would appreciate it if someone
could advise me which direction I should take.

Thanks, Milan.

Milan Cvetkovic wrote:

TAO VERSION: 2.2.1
ACE VERSION: 6.2.1

HOST MACHINE and OPERATING SYSTEM: Debian wheezy on x86_64

THE $ACE_ROOT/ace/config.h FILE: config-linux.h

THE $ACE_ROOT/include/makeinclude/platform_macros.GNU FILE:
c++11 = 1
ssl = 1
include ${ACE_ROOT}/include/makeinclude/platform_linux.GNU

AREA/CLASS/EXAMPLE AFFECTED:
BiDirGIOP / Transport_Cache_Manager_T / SSLIOP
DOES THE PROBLEM AFFECT:
EXECUTION: YES

SYNOPSIS: After a client loses its network connection, the server is
no longer able to invoke callback RPCs, even after the client has reconnected
and resubmitted its callback IOR.

DESCRIPTION:

I have a BiDirGIOP setup over SSLIOP. The client is behind a firewall router on
the 192.168.12.x network. The client incarnates a callback object, listening on
192.168.12.113:7770 and on port 7771 for SSL. The client contacts the server
over the internet and sends it the IOR of the callback object above. The server
later uses the callback object to send various notifications. This setup
utilizes bidirectional GIOP over SSLIOP.

Everything works as desired until the client loses connectivity to the server.
When the client re-registers, the server adds the new Transport to the Transport
cache manager; however, in some scenarios it does not remove the old
transport and keeps using it for callbacks, which fail with CORBA::TIMEOUT.

My understanding is that Transport_Cache_Manager keeps a hash map
of all connections. These connections have the same key, being
issued from the same IP:port every time (in the example above,
192.168.12.113:7771). In some cases, the server does not replace the
existing transport entry, but adds the new one with an increased index and
keeps using index :0 for making callbacks.

I am attaching portions of the TAO logs. Note that the second registration
binds with index :1. The stale transport is kept with index :0.

How do I control the content of Transport_Cache_Manager_T? I removed the
references to the callback objects from the server, but the transport is
still cached.

Thanks, Milan.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)