At 9:28 on Tuesday 12/6/2011, the T11 FC-GS-7 working group approved a motion to incorporate the text for “Peer Zoning” that was prepared by Claudio DeSanti (Cisco). The actual text can be found in the T11 document 11-411v2. To me, this moment marked the culmination of a four-year journey to take a concept, get it through a standards body and into a standards document.
Target Driven Zoning (TDZ) is a proposed application that utilizes Peer Zoning to reduce the number of steps required to provision new storage by 50%. Since many customers have been complaining about the task of zoning for years, I’m proposing to achieve this reduction by eliminating the manual task of FC zoning.
Just to be clear, while I view the introduction of the Peer Zoning functionality as a game changer for FC, I don’t view the completion of this effort as some kind of extraordinary accomplishment. Actually, there’s nothing extraordinary about it. These sorts of efforts are started all the time in standards bodies. Sometimes they result in useful protocols that are implemented by many (e.g., FCoE and FIP). Other results are implemented by few but provide a critical requirement for the industry (e.g., FC-SP), and yet others turn out to be interesting academic exercises that are never implemented by anyone (e.g., FC-SCM). I have hopes that Peer Zoning will fall into the category of “useful technology that is implemented (and deployed) by many” but it’s too early to tell at this point.
This post is intended to give you an overview of the technology and enough information to decide if TDZ is something that you’d like to use in your environment. I also feel it’s important to give credit where it’s due, and with that in mind: without the help of Mark Lippitt, David Black, Claudio DeSanti, Bob Nixon and Ralph Weber, this entire effort would still be hopelessly stalled.
The story starts back in mid-2007. I was on a conference call with David Black and Mark Lippitt (both with EMC) discussing what would happen if zoning wasn’t used in an FC SAN (as was being proposed in the T11 FC-SCM working group). In case you don’t know, EMC has a fairly strict best practice called “Single Initiator Zoning” that was created based on lab testing results. As the name suggests, the best practice states “zones should only contain a single initiator as well as the storage targets it needs to access.” Because of the work that I had done with fabric scalability, I was pulled into the meeting (somewhat last minute if memory serves) and asked to provide my opinion on what was being proposed. Now, as most people who work with me will tell you, I typically call things as I see them and the intensity of my voice will vary with the level of conviction that I feel about a given topic. As a result, my reaction to eliminating zoning was something along the lines of “ARE YOU <bleep> KIDDING ME?” and I proceeded to provide specific reasons why I thought that this was the *SILLIEST* idea I had ever heard of. Perhaps it was the intensity of my reaction, or the simple fact that David didn’t have sufficient bandwidth to create a presentation that contained all of my points. In either case, he eventually asked me to present my list of concerns to the T11 FC-SCM working group at the August 2007 meeting in Seattle.
Preparing for Seattle
In case you aren’t familiar with T11, this is the group that literally wrote the book on FC. In fact, a better way to say it would be they ARE the book. From my point of view as an integration engineer who “grew up” with FC, the idea of merely attending (let alone presenting at) this meeting felt something akin to traveling to Mt Olympus and meeting Zeus and the gang. As a result, I spent much of the month before the meeting preparing by testing specific “unzoned” configurations in the lab, re-reading the various standards involved (FC-GS and FC-SW), iterating on the presentation with Mark Lippitt and working myself up into an emotional wreck in general. As luck would have it, the FC-SCM meeting was first on the schedule for the week, so off I went to present at my first T11 meeting without having actually ever attended one.
The FC-SCM meeting
During the FC-SCM meeting I presented “FC-SCM-OpenZoning”. The main point of the presentation was that, due to the way Initiators and Targets perform discovery, you can’t simply eliminate zoning. If you did, the resulting flood of name server queries every time something changed in the fabric would bring the fabric to its knees with as few as 255 N_Ports in the fabric. Consider the case where an N_Port is bouncing (logging in and out) and you are totally screwed. After speaking to my last slide and answering a couple of questions, I returned to my seat feeling like the weight of the world had been lifted off of my shoulders. I think Mark Lippitt and David Black said something to the effect of “Nice job” and my response was going to be “Boy, am I glad that’s over with” but before I could even get the words out of my mouth, Claudio turned around and said something to the effect of “Ok, so you have shown us a problem, now what? Would you be willing to come back and present a solution?” I literally couldn’t believe my ears. I was like “are you talking to me?” After briefly consulting with Mark, I said yes and committed to provide a proposal at the October 2007 T11 meeting in Coeur d’Alene, ID. The purpose of that proposal would be to describe an approach that would eliminate the manual task of zoning.
After the FC-SCM meeting Mark, David and I went out to lunch. On the walk down to Pike Place Market I was walking behind them while they were talking about the concept of OLZ or “One Large Zone” (what would eventually become FC-SCM). I found OLZ interesting and would dedicate many hours over the next three years to pushing for it to be adopted, but I wanted to pursue a different approach. I believed (and still do) that the key to a solution was to solve the RSCN problem. Basically, the RSCN problem happens in an environment without zoning when a single device logs in or out. When the device’s status changes, an RSCN is sent to all devices that are impacted by the change. Since all devices register for RSCNs in a fabric without zoning, every device would receive an RSCN, pile on the Name Server at the same time and basically cause a DoS attack. With this concern in mind my thinking process went something along the lines of:
What if somehow we could send RSCNs only to end devices that are actually impacted by the change (i.e., hosts that are logged into that target and vice versa)?
- Where would this information come from?
Maybe a storage port could tell the switch to only send RSCNs to end devices that are logged into it?
- You would probably identify these devices to the switch by their WWPN
But wait a second, our storage ports have a list of WWPNs that they care about (have been granted access) in the LUN masking database
- If a storage port could somehow send these WWPNs to the switch…
- A switch could use this information to restrict RSCN distribution
But…switches already use the WWPNs in the definition of a zone
Why couldn’t the storage port just share the WWPNs in the LUN Masking database with the switch and then the switch could just create the zones based on this information???
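To make that last thought concrete, here is a minimal Python sketch of the idea: take the WWPNs a storage port already holds in its LUN masking database and turn them into zone definitions. The dictionary layout, the `zones_from_lun_masking` helper, the `tdz_` naming scheme and the second target WWPN are all assumptions made for illustration; none of them come from 11-411v2 or any array's software.

```python
from typing import Dict, List, Set

# Hypothetical LUN masking database:
# target port WWPN -> initiator WWPNs that have been granted access.
LunMasking = Dict[str, Set[str]]


def zones_from_lun_masking(masking: LunMasking) -> List[dict]:
    """Build one zone per storage port from the WWPNs it already knows about."""
    zones = []
    for target_wwpn, initiators in sorted(masking.items()):
        if not initiators:
            continue  # nothing has been granted access, so no zone is needed
        zones.append({
            # Illustrative naming scheme only: "tdz_" + WWPN without colons.
            "name": "tdz_" + target_wwpn.replace(":", ""),
            "target": target_wwpn,
            "initiators": sorted(initiators),
        })
    return zones


masking = {
    "50:00:09:71:20:30:40:50": {"10:00:00:00:00:00:00:00"},
    "50:00:09:71:20:30:40:51": set(),  # made-up second port with no hosts yet
}
zones = zones_from_lun_masking(masking)
for zone in zones:
    print(zone["name"], zone["target"], zone["initiators"])
```

The point of the sketch is only that no extra input is needed: everything a zone requires is already sitting in the masking data.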
The rest of the story
Describing all the ups and downs of the story goes way beyond the scope of this post, but a couple of excerpts from a cliff notes version would certainly mention: Claudio agrees to write the text at the August 2011 meeting in Edmonton, delivers the goods in Albuquerque and it all gets approved in St. Petersburg. Thanks again Claudio!
Target Driven Zoning technical details
In this section I describe how I envision the Peer Zoning functionality Claudio defined in 11-411v2 will be utilized by an application that I am calling Target Driven Zoning or TDZ.
The Target Driven Zoning concept is simple: since a storage port has all of the information required to create an FC zone, it should just go ahead and create one automatically.
Before I dive into the protocol level details that will explain how TDZ works, let me take a step back and define the Storage Provisioning process as it is done today without TDZ and then show how it would work with TDZ.
Without TDZ:
- Physically attach hosts and storage to the fabric.
- Using the switch zoning interface, create a zone that contains an initiator and its targets. Repeat for every initiator/target relationship being added to the environment.
- Activate the new Zone Set / Configuration.
- Using the Storage array management interface, add each initiator to a “Storage Group” on the array.
With TDZ:
- Physically attach hosts and storage to the fabric.
- Using the Storage array management interface, add each initiator to a “Storage Group” on the array. The storage will then automatically set up the appropriate zones to allow each initiator to access it.
The nuts and bolts of Peer Zoning / TDZ
For the remainder of this section, I’ll be referring to the following configuration. Starting from the left, the configuration consists of:
- A Host containing a single HBA/CNA port that has a WWPN (World Wide Port Name) of 10:00:00:00:00:00:00:00.
- A FC Fabric containing two FC switches:
- FC/FCoE Switch Domain 3; and
- FC/FCoE Switch Domain 4
- A Storage array containing a port that has a WWPN of 50:00:09:71:20:30:40:50
Step 1 – The Target queries the Unzoned Name Server
We assumed that, when creating a Storage Group, users would rather pick WWPNs from a list than type them in manually. Because of this assumption, TDZ suffers from a causality dilemma that is solved by utilizing the unzoned name server.
The dilemma is:
How do you pick a host WWPN from a list in the array’s management software when none of the interfaces on that array have been zoned to have access to that host WWPN?
As I describe in the Hard zoning versus Soft zoning blog post, the response from the switch to a Name Server query will usually only contain those devices that share a common zone with the querying N_Port. When TDZ is being used, there is a very good chance that the storage has not been zoned to have access to anything. As a result, it will not be possible to provide a list of initiator WWPNs to the storage admin via a traditional NS query.
The unzoned Name Server provides a solution to this dilemma because as the name implies, the unzoned name server allows an N_Port to query the Name Server and obtain a list of all N_Ports registered with the fabric without regard for zoning.
Step 2 – The Name Server returns a list of all ports registered in the Fabric
The response to the unzoned Name Server query will be a list of all N_Ports registered with the fabric.
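The difference between the two kinds of query can be shown with a toy model. This is only a sketch: the `NameServer` class and its method names are made up for illustration, and a real Name Server exchange (FC-GS request and response handling) is far more involved than a set lookup.

```python
from typing import Dict, List, Set


class NameServer:
    """Toy fabric Name Server that tracks registered N_Ports and zones."""

    def __init__(self) -> None:
        self.registered: Set[str] = set()
        self.zones: Dict[str, Set[str]] = {}

    def register(self, wwpn: str) -> None:
        self.registered.add(wwpn)

    def add_zone(self, name: str, members: Set[str]) -> None:
        self.zones[name] = set(members)

    def query_zoned(self, requester: str) -> List[str]:
        """Return only registered ports that share a zone with the requester."""
        visible: Set[str] = set()
        for members in self.zones.values():
            if requester in members:
                visible |= members
        visible.discard(requester)
        return sorted(visible & self.registered)

    def query_unzoned(self) -> List[str]:
        """Return every registered port, without regard for zoning."""
        return sorted(self.registered)


ns = NameServer()
host = "10:00:00:00:00:00:00:00"
storage = "50:00:09:71:20:30:40:50"
ns.register(host)
ns.register(storage)

print(ns.query_zoned(storage))   # storage is in no zone yet, so it sees nothing
print(ns.query_unzoned())        # both registered ports are listed
```

With no zones defined, the traditional query gives the array nothing to show the storage admin, while the unzoned query returns the host WWPN that needs to be picked.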
As you think about this behavior, you may come to the conclusion that it represents a serious security flaw. As Peer Zoning was being defined, we felt the question we needed to answer was:
“If any N_Port can just query the unzoned Name Server, what’s to stop a host from using the unzoned name server to discover what targets are out there and then grant itself access to any storage port that it wants?”
We thought about this question (a lot actually) and we came up with a multi-pronged approach. As I’ll describe shortly, each option represents a different level of security that can be set on a fabric-wide basis. However, before I get to the descriptions, let me state that no matter which one you choose, you are still protected by LUN Masking on the Storage array. In other words, if a rogue host were to grant itself access to the target using Peer Zoning commands, the rogue host would still be prevented from accessing data on that target due to LUN Masking. LUN Masking will only allow certain WWPNs to access certain LUNs on each target. So even if a host created a zone to allow itself to have access to a target port, it would also need to spoof the WWPN of a host that has been granted access to LUNs on that target. Let me point out that this is something that can easily be done today on all HBAs and CNAs and does not represent a new security hole that is being introduced by peer zoning.
- Option 1: Peer Zoning disabled (default). If you don’t find manual zoning tasks to be too much trouble, then you can opt not to enable peer zoning.
- Option 2: Peer Zoning enabled – Authentication required. This option allows you to enable Peer Zoning but will require any port that wants to use the unzoned Name Server to authenticate with the switch using one of the mechanisms defined in FC-SP (e.g., DH-CHAP).
- Option 3: Peer Zoning enabled – Port based. This option allows you to enable Peer Zoning but it will only allow unzoned name server queries on certain switch interfaces. You could simply enable the feature on interfaces where storage ports are located.
- Option 4: Peer Zoning enabled – OUI based. This option allows you to specify that certain vendor OUIs can use unzoned Name Server queries by default (e.g., I trust <your storage vendor of choice> (hopefully EMC) OUIs but not others).
- Option 5: Peer Zoning enabled – Open. This option allows you to specify that Peer Zoning can be used by any N_Port in the fabric.
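As a rough illustration of how a switch might evaluate these five options, here is a small Python sketch. Everything in it is hypothetical: the `PeerZoningPolicy` enum, the `PortInfo` record, the interface names and the `may_query_unzoned_ns` function are not taken from any switch CLI or from the standard. The OUI extraction assumes the common IEEE name formats (NAA 1 and 2 carry the OUI in bytes 2 to 4; NAA 5 and 6 carry it right after the first nibble).

```python
from dataclasses import dataclass
from enum import Enum
from typing import AbstractSet


class PeerZoningPolicy(Enum):
    DISABLED = 1        # Option 1 (default)
    AUTH_REQUIRED = 2   # Option 2: FC-SP authentication, e.g. DH-CHAP
    PORT_BASED = 3      # Option 3: only on selected switch interfaces
    OUI_BASED = 4       # Option 4: only for trusted vendor OUIs
    OPEN = 5            # Option 5: any N_Port


@dataclass(frozen=True)
class PortInfo:
    wwpn: str
    switch_interface: str        # hypothetical interface label, e.g. "fc1/1"
    authenticated: bool = False  # True once FC-SP authentication succeeded


def oui_of(wwpn: str) -> str:
    """Extract the 24-bit OUI (as 6 hex digits) from a WWPN."""
    digits = wwpn.replace(":", "").lower()
    naa = digits[0]
    if naa in ("1", "2"):
        return digits[4:10]
    if naa in ("5", "6"):
        return digits[1:7]
    raise ValueError(f"unsupported NAA type {naa!r}")


def may_query_unzoned_ns(
    policy: PeerZoningPolicy,
    port: PortInfo,
    enabled_interfaces: AbstractSet[str] = frozenset(),
    trusted_ouis: AbstractSet[str] = frozenset(),
) -> bool:
    """Decide whether a port may use the unzoned Name Server under a policy."""
    if policy is PeerZoningPolicy.DISABLED:
        return False
    if policy is PeerZoningPolicy.AUTH_REQUIRED:
        return port.authenticated
    if policy is PeerZoningPolicy.PORT_BASED:
        return port.switch_interface in enabled_interfaces
    if policy is PeerZoningPolicy.OUI_BASED:
        return oui_of(port.wwpn) in trusted_ouis
    return True  # PeerZoningPolicy.OPEN


storage = PortInfo("50:00:09:71:20:30:40:50", "fc1/1")
host = PortInfo("10:00:00:00:00:00:00:00", "fc1/2")

print(may_query_unzoned_ns(PeerZoningPolicy.PORT_BASED, storage, enabled_interfaces={"fc1/1"}))
print(may_query_unzoned_ns(PeerZoningPolicy.PORT_BASED, host, enabled_interfaces={"fc1/1"}))
```

Whichever option is chosen, LUN Masking on the array remains the last line of defense, so this check only governs who may look at the unzoned list.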
Step 3 – The Storage Admin attaches the correct host WWPN to a “Storage Group”
This is more of a Storage Provisioning process step and does not require any FC protocol interaction. From within the Storage Array software application, the storage admin selects a host and associates it with a “Storage Group”. When the user clicks “OK” or something similar, the next step will be performed.
Step 4 – The Storage port uses AAPZ
Once the Storage Administrator adds the WWPN of the host to the storage group and commits this change, the storage port(s) that are associated with this storage group will use AAPZ (Add/replace Active Peer Zone) to add the appropriate peer zones to the fabric. The AAPZ request will contain the zone name, the principal N_Port name as well as the zone members (the peers) that should be allowed to access the principal N_Port name.
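The three pieces of information carried by the request can be modeled in a few lines of Python. This is a sketch of the logical content only, not the wire format defined in 11-411v2. The `AapzRequest` class, the `build_aapz_request` helper, the zone name and the sanity checks are all assumptions added for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class AapzRequest:
    """Logical content of an AAPZ request (not the on-the-wire layout)."""
    zone_name: str
    principal: str            # WWPN of the storage port (the principal N_Port)
    peers: Tuple[str, ...]    # WWPNs allowed to access the principal


def build_aapz_request(zone_name: str, principal: str,
                       peers: Iterable[str]) -> AapzRequest:
    """Assemble a request from a storage group's host WWPNs.

    The checks below are this sketch's own sanity checks, not rules
    quoted from the standard.
    """
    unique_peers = tuple(sorted(set(peers)))
    if not zone_name:
        raise ValueError("zone name must not be empty")
    if not unique_peers:
        raise ValueError("at least one peer is needed")
    if principal in unique_peers:
        raise ValueError("the principal cannot also be listed as a peer")
    return AapzRequest(zone_name, principal, unique_peers)


request = build_aapz_request(
    zone_name="tdz_example",                 # made-up zone name
    principal="50:00:09:71:20:30:40:50",
    peers=["10:00:00:00:00:00:00:00"],
)
print(request)
```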
Step 5 – The AAPZ is accepted by the switch
Assuming the switch is functioning properly and it is not too busy to process the AAPZ, it will accept the AAPZ.
Step 6 – New zoneset / configuration is activated on the fabric by the switch
Once the switch transmits the ACC to the AAPZ, it has one minute to actually update the zoning on the fabric. An example of a peer zone is shown at the top of the following diagram.
Step 7 – RSCNs distributed to all affected N_Ports
Once the zone set containing the new Peer Zone has been activated onto the fabric, each switch will transmit RSCNs to each affected N_Port. By the way, there is nothing new here. RSCNs are an existing part of the FC protocol and are widely used today.
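To see why limiting RSCNs to affected N_Ports matters, the following Python sketch compares the recipients of an RSCN in a zoned fabric with those in a fabric without zoning. It is a simplification: zones are treated as flat sets of members, the function names are hypothetical, and the 253 extra WWPNs are made up purely to reach the 255 N_Port figure mentioned earlier in this post.

```python
from typing import Dict, Set


def rscn_recipients_zoned(changed: str, zones: Dict[str, Set[str]],
                          registered: Set[str]) -> Set[str]:
    """Registered ports that share at least one zone with the changed port."""
    affected: Set[str] = set()
    for members in zones.values():
        if changed in members:
            affected |= members
    affected.discard(changed)
    return affected & registered


def rscn_recipients_unzoned(changed: str, registered: Set[str]) -> Set[str]:
    """Without zoning, every other registered port is considered affected."""
    return registered - {changed}


target = "50:00:09:71:20:30:40:50"
host = "10:00:00:00:00:00:00:00"
# 253 additional made-up ports bring the fabric to 255 N_Ports.
others = {f"10:00:00:00:00:00:01:{i:02x}" for i in range(253)}
registered = {target, host} | others

zones = {"tdz_example": {target, host}}

zoned = rscn_recipients_zoned(target, zones, registered)
unzoned = rscn_recipients_unzoned(target, registered)
print(len(zoned), len(unzoned))
```

Each recipient typically follows up with Name Server queries, so the size of the recipient set is a reasonable stand-in for the load a single state change puts on the fabric.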
Step 8 – The Host performs FC discovery
Once the hosts in the peer zone have received an RSCN, they will perform FC discovery as normal. This is the same discovery process that I described in the Hard versus soft zoning blog post, with one big exception: the peer members in a peer zone are only allowed to access the principal zone member and not each other. This is a new behavior specific to peer zones, and it was put in place to ensure that our single initiator zoning best practice can remain intact.
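The access rule for a peer zone can be expressed as a small check. This Python sketch is illustrative only: the `PeerZone` class and its `allows` method are hypothetical names, and the second host WWPN is made up to show the peer-to-peer case.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class PeerZone:
    name: str
    principal: str           # the storage port
    peers: FrozenSet[str]    # the hosts allowed to reach the principal

    def allows(self, a: str, b: str) -> bool:
        """True only for principal <-> peer pairs, never peer <-> peer."""
        if a == b:
            return False
        return (
            (a == self.principal and b in self.peers)
            or (b == self.principal and a in self.peers)
        )


target = "50:00:09:71:20:30:40:50"
host_a = "10:00:00:00:00:00:00:00"
host_b = "10:00:00:00:00:00:00:01"  # made-up second host

zone = PeerZone("tdz_example", target, frozenset({host_a, host_b}))

print(zone.allows(host_a, target))  # host to principal
print(zone.allows(host_a, host_b))  # host to host
```

A single peer zone with many hosts therefore behaves like a set of single initiator zones, which is what keeps the best practice intact.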
Step 9 – Storage verifies zoning via GAPZ
At any point after using AAPZ, a target port can use GAPZ to verify that the zoning change is in place.
If you’re interested in hearing more about TDZ or are an EMC Customer and interested in actually giving it a try, please let me know! A robust amount of Customer demand is all that is required to get this idea out of the prototype phase and into the ready to deploy phase.
Thanks for reading!