There’s a lot happening in the world of NVMe over Fibre Channel (FC-NVMe). That’s why the FCIA hosted a live webcast, “What’s New in FC-NVMe-2?”, where our FCIA experts, Mark Jones, Craig Carlson and Marcus Thordal, explained the latest updates to the standard, the intricacies of error detection and recovery, and how to ensure the most reliable NVMe over Fibre Channel deployment possible. If you missed the live webcast, you can view it on-demand or watch it on the FCIA YouTube Channel.
The webcast generated many interesting questions. As promised, our speakers have answered them all here.
Q. With reference to the section covering XFER_RDY, is there any information on “write accelerated” environments and the characteristics of NVMe in these scenarios?
A. We assume the question refers to FCIP extension for writing over distance. For (asynchronous) replication over extended distance, ‘write acceleration’ is often applied by the (FCIP) extension switch, which responds with XFER_RDY in lieu of the target array when it receives a CMND IU. As a result, the initiator (primary array) does not have to wait the full round trip time (RTT) to the remote site, which can be hundreds of milliseconds, before it receives the XFER_RDY and starts sending write data.
Please refer to your vendor for FC-NVMe-2 and SLER support for FCIP extension.
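The latency effect of write acceleration described above can be illustrated with a rough sketch. It assumes a deliberately simplified model that is not taken from the standard: the command/XFER_RDY handshake costs one round trip, the data/response phase costs another, and transfer and processing times are ignored. The function name and the 100 ms RTT are hypothetical.

```python
def write_latency_ms(rtt_ms: float, local_xfer_rdy: bool) -> float:
    """Approximate latency of one write across an extended link.

    Assumption: each phase that crosses the link costs one full round trip;
    serialization and processing time are ignored.
    """
    # Without write acceleration, the initiator waits one RTT for XFER_RDY.
    # With it, the extension switch answers locally, so this wait is ~0.
    handshake = 0.0 if local_xfer_rdy else rtt_ms
    # Sending the data and receiving the response still crosses the link.
    data_and_response = rtt_ms
    return handshake + data_and_response


rtt = 100.0  # hypothetical round trip time in milliseconds
print(write_latency_ms(rtt, local_xfer_rdy=False))  # 200.0
print(write_latency_ms(rtt, local_xfer_rdy=True))   # 100.0
```

Under these assumptions, answering XFER_RDY locally removes one of the two round trips per write, which is why the benefit grows with distance.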
Q. Will FC-NVMe-2 work on the existing hardware?
A. Yes. Hardware changes at the HBA or switch are not needed to support FC-NVMe-2 Flush-based Sequence level error detection and recovery.
Q. Is there any real-world testing/data on use case behavior of FC-NVMe-2? And better, any data to compare it to competing solutions to show the benefits against the other NVMe solutions?
A. While the latest HBA generations and drivers support FC-NVMe-2, no storage arrays yet offer generally available support for FC-NVMe-2.
The difference in behavior (and in application performance impact) is that without SLER, error recovery at the NVMe protocol layer halts I/O for around 30 seconds before resuming, whereas with SLER, recovery happens with no impact or visibility at the NVMe protocol layer and no halt in I/O.
Q. Shouldn’t it pick up where it left off (i.e., data sequence), not retransmit all data sequences?
A. The work group that developed FC-NVMe-2 discussed whether or not to allow retransmission at a specified relative offset, or from relative offset zero. The work group chose to always perform retransmission from relative offset zero.
Q. Can vendors that implemented or started to implement FC-NVMe-1 implement FC-NVMe-2 easily?
A. Yes. The error recovery defined in FC-NVMe-2 is an addition to the functionality defined in FC-NVMe. Furthermore, since it is not on the performance path, error recovery is typically performed in firmware, meaning that hardware changes are not needed.
Q. FCP-2 has REC/SRR already, why reinvent the wheel again?
A. As mentioned in the webinar (see slides 17 and 18), FCP-2 error detection and recovery uses the REC Extended Link Service and SRR FCP Link Service that are each sent in a different Exchange than the I/O command. Thus, with Exchange-based routing the Exchanges may be delivered out-of-order, and that may result in data integrity issues.
Q. Can you briefly summarize how SLER with FC-NVMe-2 compares to how error recovery works with the SCSI protocol and what the benefit is to the application / database / user experience? For example, does it happen faster and cause less latency or disruption to host/application because it’s not at the ULP level? A high-level explanation is fine.
A. At the ULP/SCSI protocol level, error recovery, if any, is performed after a command timeout period that is typically in the tens of seconds (e.g., 30 seconds). FC-NVMe-2 SLER is performed at the transport level every 2 seconds (the default value), normally within the ULP timeout. As such, any error detection and recovery is transparent to the ULP.
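The timing relationship in the answer above can be sketched as follows. This is an illustrative model only: it treats SLER as a periodic check every 2 seconds (the default mentioned above) and uses 30 seconds as the example ULP timeout. The function names and the 0.5 second recovery time are hypothetical assumptions, not values from the standard.

```python
import math

SLER_INTERVAL_S = 2.0  # default SLER interval from the answer above
ULP_TIMEOUT_S = 30.0   # example ULP command timeout from the answer above


def detection_time_s(error_at_s: float, interval_s: float = SLER_INTERVAL_S) -> float:
    """Time of the first periodic check at or after the error occurs."""
    return math.ceil(error_at_s / interval_s) * interval_s


def is_transparent_to_ulp(error_at_s: float, recovery_s: float) -> bool:
    """True if detection plus recovery finishes before the ULP timeout fires."""
    return detection_time_s(error_at_s) + recovery_s < ULP_TIMEOUT_S


# Hypothetical case: a frame is lost 0.5 s into the command and
# recovery (retransmission) is assumed to take another 0.5 s.
print(detection_time_s(0.5))            # 2.0
print(is_transparent_to_ulp(0.5, 0.5))  # True
```

In this model the error is handled in a few seconds at the transport level, so the ULP timer never expires and the upper layer does not see the error.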
Q. In the case of lost write data, if, say, the second write data is lost, does the initiator retransmit the whole exchange from the first write data? Or does it pick up from the second write data?
A. See the response to the data sequence question above. In addition, FC-NVMe-2 specifies that if the NVMe_SR IU requires retransmission of an NVMe_DATA IU, then the entire NVMeoFC Data Series shall be retransmitted. Note that a Data Series is defined as the “set of NVMe_DATA IUs that make up the total data transfer for a particular command.”
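The retransmission rule described above can be sketched as a toy model. It is not an implementation of the FC-NVMe-2 wire format: the `DataIU` class, the helper names, and the 4096-byte IU size are hypothetical and exist only to show that the whole Data Series is resent from relative offset zero, regardless of which IU was lost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataIU:
    """Toy stand-in for one NVMe_DATA IU (hypothetical structure)."""
    relative_offset: int
    payload: bytes


def build_data_series(data: bytes, iu_size: int) -> list[DataIU]:
    """Split a command's data into consecutive IUs starting at offset zero."""
    return [DataIU(off, data[off:off + iu_size])
            for off in range(0, len(data), iu_size)]


def ius_to_retransmit(series: list[DataIU], lost_index: int) -> list[DataIU]:
    """Return the IUs to resend after the IU at lost_index was lost.

    Per the answer above, the entire Data Series is retransmitted, so the
    position of the lost IU does not change the result.
    """
    if not 0 <= lost_index < len(series):
        raise IndexError("lost_index is outside the Data Series")
    return list(series)


series = build_data_series(b"A" * (3 * 4096), iu_size=4096)
resend = ius_to_retransmit(series, lost_index=1)  # second write data lost
print([iu.relative_offset for iu in resend])      # [0, 4096, 8192]
```

Resending everything from offset zero costs extra bandwidth in the rare error case, but the sender does not need to track a resume point.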
Q. How does the number of FC ports sold compare with RoCE / iWARP? How are the Ethernet ports affecting new FC port sales?
A. We are not aware of NVMe over Fabrics transports being tracked by industry analysts, so it’s difficult to report deployment numbers for comparison. One indicator of the success of Fibre Channel as a transport for NVMe over Fabrics is the number of fully supported products and solutions in the marketplace. Nearly all the major all-flash storage vendors now support NVMe over FC, along with all HBA and switch manufacturers, and connectivity is available on all major operating systems. We have not observed that RoCE or iWARP is a threat to FC port sales.
Want to learn more about NVMe over Fibre Channel? Check out the FC-NVMe Playlist on the FCIA YouTube Channel.