Subject: Re: [GLIF controlplane] RE: Network Control Architecture
From: Gigi Karmous-Edwards <gigi@xxxxxxxx>
Date: Sun, 06 May 2007 08:17:44 -0400
Hi Jerry and All,
Ok Jerry, I stuck with you through your insightful email (I started it a
couple of weeks ago and just finished it this morning :-) ). If I can
summarize your assertions: when an inter-domain lightpath is requested,
the resource broker (RB), which is a servant of a user rather than of a
domain, talks only to the first domain's NRM (network resource manager);
that NRM then talks to the second NRM, and so on until the destination.
This requires each domain to have established some sort of agreement with
all adjacent domains. In your second scenario, it seems the user requests
a source RM that is not in the RB's "domain", so the RB has to forward
the request to the right RM, and then the above process repeats.
I think what you describe is the ultimate goal of the community. However,
due to the complexities of the current infrastructures that need to
interoperate (NRENs, research testbeds, global government networks, etc.),
it seems we first need to take small "baby steps". Existing infrastructures
include a variety of technologies and different management (TL1, SNMP, CLI,
etc.) and control-plane (very few deployments of GMPLS) tools for
configuration and fault management, and current procedures for information
exchange between network domains range from protocols to phone calls and
emails. These complexities, along with other policy-related challenges,
force us to break the problem up into smaller functional blocks. I think
the framework presented gives us a path forward, based on "baby steps", to
finally reach the scenario you describe.
I see the problem as having three key challenges:
1) Information dissemination (where is what resource? what are its
characteristics? what are its policies for use?)
2) The capability to request reservations on resources globally once they
are discovered (standard interfaces to query resource managers, with no
restrictions on how each resource manager accommodates each request, and
reuse of existing implementations)
3) Scalability (division of labor among functional components and
responsibilities per domain)
The assumption in the framework sent out has been that an RB takes
requests from a particular domain's users/applications but behaves as a
servant of the domain, not of a single user. In this case there will be
several RBs worldwide, but not one per user; rather, one or two per
domain. It is assumed that knowledge of the different resources worldwide
will be published per domain in a very distributed fashion (each RB will
publish the resources and their characteristics, hopefully using the
schema from the OGF Network Markup Language working group). A query from
one RB to the "distributed GLIF resources" will use a type of crawl
mechanism to match the requested resources against the "published"
resource information that each domain RB publishes on behalf of its RMs.
The assumption is that the information published by the RBs is not static
and will be updated by each RB when necessary.
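To make the publish-and-crawl idea a little more concrete, here is a
minimal sketch in Python of how a querying RB might walk the per-domain
published resource records and match them against a request. All class
and field names are placeholders, since neither the NML schema nor the
query format is fixed yet:

    # Illustrative only: record and request fields are placeholders, not a schema.
    from dataclasses import dataclass

    @dataclass
    class ResourceRecord:
        domain: str            # domain that published this record
        resource_id: str       # e.g. a lightpath endpoint or switch port
        capacity_gbps: float
        policy: str            # free-form policy tag for now

    @dataclass
    class ResourceRequest:
        min_capacity_gbps: float
        allowed_policies: set

    def crawl_and_match(request, published_by_domain):
        """Walk every domain's published records (the 'crawl') and collect matches."""
        matches = []
        for domain, records in published_by_domain.items():
            for rec in records:
                if (rec.capacity_gbps >= request.min_capacity_gbps
                        and rec.policy in request.allowed_policies):
                    matches.append(rec)
        return matches

The point of the sketch is only that the records are published per domain
and refreshed by each RB, so the querying side never needs a single,
centrally maintained global database.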
This email is already getting too long, so I suggest that we have a
conference call and use a web-based slide-sharing application to go
through some scenarios. Any interest?
To summarize, the strategy in your email will be the goal of the
community, but it will take a while to get there. In the meantime, I
think we as a community can start to develop standard interfaces for the
various RMs, such as the Generic Network Interface (GNI); this will help
us toward interoperability in today's environment.
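Since the GNI itself is still to be defined, the following is only a
strawman sketch of the kind of standard RM-facing interface I have in
mind: a small, fixed set of operations that every RM exposes, with no
restrictions on how the RM satisfies them internally. The operation names
and signatures are placeholders, not a proposal:

    # Strawman only: names and signatures are placeholders for a future GNI.
    from abc import ABC, abstractmethod

    class GenericNetworkInterface(ABC):
        """Minimal RM-facing interface; each domain implements it over its own
        management tools (TL1, SNMP, CLI, GMPLS, ...)."""

        @abstractmethod
        def query(self, src, dst, capacity_gbps, start, end):
            """Return candidate offers without committing anything."""

        @abstractmethod
        def hold(self, offer_id, hold_seconds):
            """Tentatively reserve an offer; return a ticket."""

        @abstractmethod
        def confirm(self, ticket):
            """Commit a previously held reservation."""

        @abstractmethod
        def release(self, ticket):
            """Explicitly release a held or confirmed reservation."""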
Please let me know whether we should have a GLIF control plane conference
call in the next few weeks.
Kind regards,
Gigi
--------------------------------------------
Gigi Karmous-Edwards
Principal Scientist
Advanced Technology Group
http://www.mcnc.org
MCNC
RTP, NC, USA
+1 919-248-4121
gigi@xxxxxxxx
--------------------------------------------
Jerry Sobieski wrote:
Good comments from both Steve and Bert... let me chime in (this is a bit
long, but I think it is relevant):
I too think the reservation phase in each domain must be atomic - there
are effective ways to do this. The overall process, though, becomes
two-phase: HOLD a resource for some finite holding time and provide an
ACK to the requester. At some later time the RM will receive a CONFIRM
from the requester, or a RELEASE. If the hold time expires, the resource
is released unilaterally. On a macro basis, the reservation of the entire
end-to-end lightpath must also be kept in the HOLD state while the rest
of the application resources are reserved, as there may be a dependency
between the availability of non-network resources and the reserved
lightpath.
As Steve suggests, this atomic two-phase mechanism is used in many other
similar reservation systems.
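As a sketch of that two-phase behaviour on the RM side - hold with a
finite timer, then confirm or release - here is a small Python
illustration; all class and method names are invented for the example:

    # Illustrative two-phase reservation on the RM side; names are invented.
    import threading, uuid

    class ReservationManager:
        def __init__(self):
            self.held = {}          # ticket -> resource
            self.confirmed = {}
            self.lock = threading.Lock()

        def hold(self, resource, hold_seconds):
            """Phase 1: atomically hold the resource and ACK with a ticket."""
            with self.lock:
                ticket = str(uuid.uuid4())
                self.held[ticket] = resource
                # If no CONFIRM arrives in time, release unilaterally.
                threading.Timer(hold_seconds, self._expire, args=[ticket]).start()
                return ticket       # the ACK carries the ticket

        def confirm(self, ticket):
            """Phase 2: lock in the reservation (fails if the hold expired)."""
            with self.lock:
                self.confirmed[ticket] = self.held.pop(ticket)

        def release(self, ticket):
            with self.lock:
                self.held.pop(ticket, None)

        def _expire(self, ticket):
            with self.lock:
                self.held.pop(ticket, None)   # unilateral release on timeout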
The issue I am concerned about is the roles of the RB and RM. I think the
RBs will be numerous - possibly one for every user. I believe we must
assume that all networks will default to a stringent "self-secure" stance
and will only allow access to their RMs from known and trusted peers. It
doesn't scale for every network to "know" about every other RB in the
world (RBs are agents of the user, not of the network). Therefore, for
scalability and security reasons, these resource reservation requests
must be made between directly peering networks, and each network is
responsible for recursively reserving the resources forward toward the
destination. This is still a two-phase commit as described above, but it
solves two problems: a) it scales much better, as each network only needs
to expect queries from its direct peers (and customers), and b) it allows
each network to negotiate aggregation policies with its peers for
services (enabling economies of scale and global reach). This is not
unlike how we place a phone call to anywhere in the world - we don't go
asking each network if we can use it; we ask our service provider to do
so, they ask theirs, and so on, and so on...
The above scenario assumes the RB poses the service request to the RM
serving the source end of a path. There is a [common?] case where the RB
is not at the endpoint(s) and does not know of any RMs at the endpoints
(or in the middle, for that matter). This brings us to another assumption
I think we must make: an RB only knows its *local* network RM. An
appropriately designed algorithm should/could forward the request to the
source-address RM using the same forwarding process as the reservation
(but crossgrain, toward the source), and then the request can be serviced
forward normally as described above. (This is the "third party"
provisioning scenario.) An alternative model assumes a "minion" agent at
the path endpoints that is owned by the end user and knows of its local
RM; the minion agent acts as a proxy for the RB and makes the reservation
request to the minion's RM. (Got that? :-) I think we *can* assume that
the RB knows of these minions, since they reside at the endpoints (source
or destination) at a well-known port.
It is important to note that this process relies on each network RM (not
the RB) knowing the constrained reachability of all endpoints - not
unlike current inter-domain routing protocols. This allows the RM to
postulate which "nexthop" network will provide the best path and try that
first. If the RM knows more than just reachability - i.e. if it knows
topology - then the RM can select a more specific candidate path and, via
authorized recursive queries, can reserve the resources. Only the RM
responsible for a network knows the state and availability details
associated with the internal network resources, and therefore only the
local RM can authoritatively and atomically reserve the resources in that
network.
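By way of analogy only (this is not a proposal for a concrete format),
the constrained reachability at each RM could look like an inter-domain
routing table mapping endpoint prefixes to ordered candidate next-hop
networks:

    # Hypothetical per-RM reachability table: endpoint prefix -> candidate peers.
    import fnmatch

    REACHABILITY = {
        "domainA.*": ["peer-net-1", "peer-net-2"],   # try peer-net-1 first
        "domainB.*": ["peer-net-2"],
    }

    def candidate_next_hops(dst):
        """Return candidate next-hop networks for a destination, best guess first."""
        for prefix, peers in REACHABILITY.items():
            if fnmatch.fnmatch(dst, prefix):
                return peers
        return []   # destination unreachable from this RM's point of view

An RM that also knows topology could replace the ordered guess with an
explicit candidate path, as noted above.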
The beauty of this process is that, from the RB's perspective, the RB
need only ask one RM for the entire end-to-end network path. The RM will
either return a ticket indicating that a path meeting the requested
service characteristics was successfully reserved, or a NACK indicating
that the resource was not available for some reason. The user must change
the requested service parameters somehow before trying again - e.g.
change the source or destination address, the start time, the capacity,
etc.
As Gigi states, once all application resources are reserved in the HOLD
state, they must all be CONFIRM'ed, which locks in the reservation.
At some delta-t later (which could be 0) there is a separate process that
causes the reconfiguration of the network elements to make the reserved
resources available for actual use (i.e. the provisioning or signaling
process). This process must be correlated with a previous reservation,
and so the provisioning request (separate from the reservation request)
must contain some indicator that is trusted by the network and indicates
which reservation is being placed into service (see Leon's work on AAA).
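Without presuming anything about Leon's AAA work, one generic way to
picture such a trusted indicator is a signed ticket that the network can
verify before placing the reservation into service; the key handling and
field names below are purely illustrative:

    # Generic illustration of a verifiable reservation ticket; not tied to any AAA scheme.
    import hmac, hashlib, json

    SHARED_KEY = b"per-domain secret"   # placeholder; real key management is out of scope

    def issue_ticket(reservation_id, start_time, end_time):
        payload = json.dumps({"rsv": reservation_id, "start": start_time, "end": end_time})
        sig = hmac.new(SHARED_KEY, payload.encode(), hashlib.sha256).hexdigest()
        return {"payload": payload, "sig": sig}

    def verify_ticket(ticket):
        """The network checks the signature before provisioning the reservation."""
        expected = hmac.new(SHARED_KEY, ticket["payload"].encode(), hashlib.sha256).hexdigest()
        return hmac.compare_digest(expected, ticket["sig"])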
Note that none of the above is predicated on any particular routing or
signaling protocol... That being said (:-), DRAGON has implemented much
of this functionality using GMPLS protocols.
- The DRAGON Network Aware Resource Broker (NARB) is analogous to the
network RM: it performs the path computation, recursively reserving the
resources along the way, and returns a path reservation in the form of an
Explicit Route Object (ERO) to the source requester. This loose-hop ERO
specifies a path consisting of ingress and egress points at each network
boundary (see the sketch after this list).
- RSVP then uses this ERO to provision the multi-domain end-to-end path.
- The DRAGON Application Specific Topology (AST) "Master" is an agent
analogous to the RB mentioned above. The AST Master queries all the
various resource managers (compute nodes, storage, instruments, network,
etc.) to reserve groups of dependent resources. There is a significant
protocol exchange defined for ASTs to construct a workable physical
resource grid for the application.
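For readers less familiar with EROs, a loose-hop ERO of the kind NARB
returns can be pictured roughly as the structure below. This is an
illustrative simplification, not DRAGON's or RSVP-TE's actual encoding,
and the hop names are made up:

    # Illustrative simplification of a loose-hop Explicit Route Object (ERO).
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class EroHop:
        address: str   # ingress or egress point at a network boundary
        loose: bool    # True: the transit network picks the detailed path to this hop

    @dataclass
    class ExplicitRoute:
        hops: List[EroHop]

    inter_domain_path = ExplicitRoute(hops=[
        EroHop("netA-egress", loose=True),
        EroHop("netB-ingress", loose=True),
        EroHop("netB-egress", loose=True),
        EroHop("netC-ingress", loose=True),
    ])
    # RSVP expands each loose hop into a concrete intra-domain path at signaling time.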
What DRAGON has not yet implemented: we have implemented scheduling and
policy constraints in the traffic engineering database, but we have not
yet implemented the path computation that uses those constraints (this
will be coming soon). We have atomic reservations, but have not
implemented the two-phase commit, though we have long recognized it as
critical to the book-ahead capability and to a robust, integrated
resource scheduling process.
Thanks for sticking with me on this... :-)
Jerry