No. | File Name | Format | Link |
---|---|---|---|
01. | Windows Features Vport Queues | pptx | Download Presentation |
Transcript
Fall 2017
Windows Networking: Offloads and Optimizations
Don Stanwyck, Sr. PM, Windows Core Networking
Introduction and Agenda
Session introduction: Networking Offloads and Optimizations, or how we make Windows networking faster and more efficient.
Session agenda:
• The networking offloads of Windows, especially Windows Server
• The networking features of Windows Server 1709 and Windows 10 Fall Creators Update
• Our vision of networking in the next 12-18 months
Offloads and Optimizations
Q: What's an offload?
A: It's when the NIC does the processing for the OS.
Q: What's an optimization?
A: It's when the OS does something to make processing faster or more efficient.
Q: Is something ever both?
A: Yes. Some features are software-hardware coordination: an optimization with an offload.
Hardware Offloads and Capabilities
• Address checksum offload
• Interrupt moderation
• Jumbo frames
• Large Send Offload (LSO)
• Receive Segment Coalescing (RSC)
Software-Controlled Hardware Features
• Data Center Bridging (DCB)
  • Enhanced Transmission Selection (ETS)
  • Priority Flow Control (PFC)
• Encapsulation offloads
  • NV-GRE
  • VxLAN
• IPsec Task Offload
• RDMA
  • Native host (mode 1)
  • Converged NIC (mode 2)
  • Guest RDMA (mode 3)
• RSS
• SR-IOV
• VLAN
• VMQ
• VMMQ
Software Features (no hardware support needed)
• ACLs, extended ACLs, and SDN ACLs
• NIC Teaming (LBFO, SET)
• vmQoS and SDN QoS
• Virtual RSS (vRSS)
NIC Evolution
The original NICs were simple devices:
• Loaded and unloaded one byte at a time
• Calculated the FCS (CRC)
• Performed no protocol logic
NIC Evolution
Over time they got more sophisticated:
• Loaded and unloaded one packet at a time
• Understood some protocol logic
• Could generate IP, TCP, and UDP header checksums under the right conditions
• Memory and computing power were getting cheaper
NIC Evolution: WS2008, WS2008 R2
NICs developed the ability to manage distinct queues:
• Queues could be set to interrupt independent CPUs
• Protocol headers could be processed enough to direct packets to different queues
RSS for native traffic (WS2008):
• A 5-tuple hash is created
• Hash values are mapped to different queues
VMQ for Hyper-V traffic (WS2008 R2):
• Filters are set to match destination MAC addresses
• Each MAC address gets its own queue
[Diagram: a NIC with a set of queues, including the default queue.]
[Diagram: Without RSS - arriving packets go into one NIC queue and the TCP/IP stack processes them sequentially.]
[Diagram: With RSS - the NIC generates a hash for each arriving packet and places it in one of many queues, so the TCP/IP stack processes packets in parallel on the associated cores.]
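The hash-to-queue mapping behind RSS is easy to see in miniature. The sketch below is illustrative only: a real NIC computes a Toeplitz hash over the 5-tuple with a driver-supplied secret key and consults a hardware indirection table, while here a generic Python hash and a made-up table stand in.

```python
# Illustrative RSS sketch: map a flow's 5-tuple to a receive queue.
# Real NICs use the Toeplitz hash with a secret key; hash() is a stand-in.
from dataclasses import dataclass

@dataclass(frozen=True)
class FiveTuple:
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str  # "tcp" or "udp"

# Hypothetical indirection table: hash bits -> receive queue index.
NUM_QUEUES = 4
INDIRECTION_TABLE = [i % NUM_QUEUES for i in range(128)]

def rss_queue_for(pkt: FiveTuple) -> int:
    """Map a flow to a receive queue so one flow always lands on one core."""
    rss_hash = hash(pkt) & 0xFFFFFFFF          # stand-in for the Toeplitz hash
    return INDIRECTION_TABLE[rss_hash % len(INDIRECTION_TABLE)]

flow = FiveTuple("10.0.0.5", "10.0.0.9", 52000, 443, "tcp")
print(rss_queue_for(flow))  # every packet of this flow hits the same queue
```

The property that matters is that every packet of a given flow yields the same hash, and therefore the same queue and the same core.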
NIC Evolution: WS2012, WS2012 R2
• To support the industry standard for SR-IOV, NICs began to get onboard switches
• VMQ evolved under the covers from filters to switch-port routes (when a switch was present)
• Windows terminology: a NIC-switch port is a "vPort" (it can be a vPort of the PF or of a VF)
[Diagram: the NIC switch with the PF and several VFs, each attached through one or more vPorts.]
NIC Evolution: WS2012, WS2012 R2
In parallel, Remote DMA (RDMA) arrived on the scene:
• High-speed, full packet processing in the NIC
• DMA between host and NIC at both sides
• Skips packet processing in the stack
• High throughput, low overhead, using the processing power of the NIC
Data Center Bridging (DCB) also arrived:
• The cable becomes several virtual cables called "traffic classes"
• Bandwidth is managed on a per-TC basis (a simple sketch follows this slide)
• Individual TCs can be paused to prevent switch buffer issues
[Diagram: a host with VMs, the NIC switch (PF, VFs, vPorts), and the RDMA engine in the NIC.]
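As a rough illustration of the per-traffic-class bandwidth management ETS provides, here is a tiny sketch; the class names and percentage shares are invented for the example, not recommended settings.

```python
# Hypothetical ETS bandwidth shares per traffic class (percent of the link).
ETS_SHARES = {"storage_rdma": 50, "live_migration": 30, "default": 20}

def guaranteed_gbps(link_gbps: float) -> dict:
    """Minimum bandwidth each traffic class gets when the link is congested;
    an idle class's share can still be borrowed by busier classes."""
    return {tc: link_gbps * pct / 100 for tc, pct in ETS_SHARES.items()}

print(guaranteed_gbps(40.0))  # e.g. storage_rdma is guaranteed 20 Gbps on a 40 GbE link
```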
[Diagram: VMQ (filter-based) - for each arriving packet the NIC gets the MAC/VLAN and matches it against filters A-F or the default queue, then delivers it to the Hyper-V switch; shown both without and with a NIC switch.]
Accelerating the guest - vRSS
• The software path between the Hyper-V switch and the guest became a multi-lane highway
• vRSS is built on VMQ
• Hardware calculates the Toeplitz hash (RSS hash) on each incoming packet and stores it with the packet
• vRSS unloads a VMQ and reads the RSS hash
• vRSS assigns the packet to a vmBus queue/core and issues a software interrupt
Virtual RSS
• The packet is delivered over a vmBus channel to the VM
• The VM maps each vmBus channel to an RSS vCPU and processes packets in parallel across vCPUs
• vmBus channels and vCPUs are independent
• vRSS runs on both variants of VMQ
[Diagram: vRSS - arriving packets are filtered to VM "F" by MAC/VLAN, then spread across vmBus channels 1-3, each serviced by its own vCPU.]
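To make the vRSS step concrete, here is a minimal sketch (not the Windows implementation) of spreading packets from a VM's queue across its vmBus channels using the RSS hash the NIC already stored with each packet; the channel count and packet tuples are hypothetical.

```python
# vRSS-style spreading sketch: one flow -> one vmBus channel -> one vCPU,
# preserving per-flow ordering while different flows run in parallel.
from collections import defaultdict

NUM_VMBUS_CHANNELS = 4   # hypothetical per-VM channel count

def vmbus_channel_for(stored_rss_hash: int) -> int:
    """Pick the channel from the hash the NIC stored with the packet."""
    return stored_rss_hash % NUM_VMBUS_CHANNELS

# Packets dequeued from the VM's VMQ, each carrying the hash the NIC computed.
packets = [(0x1A2B, b"flow1-pkt1"), (0x1A2B, b"flow1-pkt2"), (0x9F01, b"flow2-pkt1")]
per_channel = defaultdict(list)
for rss_hash, payload in packets:
    per_channel[vmbus_channel_for(rss_hash)].append(payload)

print({channel: len(pkts) for channel, pkts in per_channel.items()})
```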
Windows Server 2016
Windows Server 2016 brought several new items:
- Converged NIC (RDMA to the host vNIC)
- Microsoft Azure VFP Switch Extension
- SDN v2 (NVGREv2 and VxLAN offloads)
- SDN QoS
- VMMQ
- Switch Embedded Teaming
Windows Server 2016 - Converged NIC
[Diagram: Windows Server 2012 R2 - SMB Multichannel and SMB Direct run over dedicated DCB NICs for VM storage, while a separate NIC team under the Hyper-V switch carries VM, management, live migration, and other host traffic.]
[Diagram: Windows Server 2016 - with embedded teaming, the Hyper-V switch (SDN) sits directly on the DCB-capable NICs; host vNICs carry SMB (RDMA), live migration, management and cluster, and other traffic alongside the VM vmNICs.]
SDN Switch Extension
• Known in Azure as the Virtual Filtering Platform (VFP)
• Acts as a virtual switch inside the Hyper-V vmSwitch
• Provides core SDN functionality for Azure networking services, including:
  • Address virtualization for VNET
  • VIP -> DIP translation for SLB
  • ACLs, metering (QoS), and security guards
  • Bandwidth management/control (QoS)
• Uses programmable rule/flow tables to perform per-packet actions
• Supports all Azure data plane policy at 40GbE+ with offloads
• Available to private cloud in Windows Server 2016
[Diagram: VMs attached through vmNICs and the host vNIC to the VM switch; the VFP layers (ACLs, metering, security, VNET, SLB NAT) sit in the switch data path.]
Flow Tables: the Right Abstraction for the Host
• The VMSwitch exposes a typed Match-Action-Table API to the controller
• Controllers define policy; one table per policy
• Key insight: let the controller tell the switch exactly what to do with which packets (e.g. encap/decap), rather than trying to use existing abstractions (tunnels, ...)
[Diagram: host 10.4.1.5 running VM1 (10.1.1.2); the controller takes the tenant description (VNet description, VNet routing policy, ACLs, NAT, endpoints) and programs per-policy tables in VFP, for example:
VNET table - TO: 10.2/16 -> Encap to GW; TO: 10.1.1.5 -> Encap to 10.5.1.7; TO: !10/8 -> NAT out of VNET
LB NAT table - TO: 79.3.1.2 -> DNAT to 10.1.1.2; TO: !10/8 -> SNAT to 79.3.1.2
ACLs table - TO: 10.1.1/24 -> Allow; 10.4/16 -> Block; TO: !10/8 -> Allow]
Table Typing/Flow Caching are Critical to Performance
• COGS in the cloud is driven by VM density: 50GbE and 100GbE are here
• 60 to 100 VMs/host is common; 200+ VMs/host have been seen on customer sites
• First-packet actions can be complex
• Established-flow matches must be typed, predictable, and simple hash lookups
[Diagram: for Blue VM1 (10.1.1.2), the first packet of a connection walks the VNET, LB NAT, and ACL tables in VFP; subsequent packets hit a per-connection cache, e.g.:
10.1.1.2,10.2.3.5,80,9876 -> DNAT + Encap to GW
10.1.1.2,10.2.1.5,80,9876 -> Encap to 10.5.1.7
10.1.1.2,64.3.2.5,6754,80 -> SNAT to 79.3.1.2]
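The first-packet/established-flow split can be sketched in a few lines. This is only an illustration of the caching idea, not the VFP API; the table walk is collapsed to a toy function and the rules are invented.

```python
# Minimal flow-caching sketch: the first packet of a connection walks every
# match-action table and the composed result is cached, so later packets are
# a single hash lookup on the connection tuple.

def slow_path(conn):
    """Walk the per-policy tables once; the rules here are illustrative."""
    actions = []
    # VNET table
    if conn["dst_ip"].startswith("10.1.1."):
        actions.append("encap to 10.5.1.7")
    else:
        actions.append("NAT out of VNET")
    # ACL table
    if conn["dst_ip"].startswith("10.4."):
        return ["drop"]
    return actions

flow_cache = {}  # (src_ip, dst_ip, src_port, dst_port) -> composed action list

def process(conn):
    key = (conn["src_ip"], conn["dst_ip"], conn["src_port"], conn["dst_port"])
    if key not in flow_cache:                 # first packet: slow path
        flow_cache[key] = slow_path(conn)
    return flow_cache[key]                    # established flow: one lookup

print(process({"src_ip": "10.1.1.2", "dst_ip": "10.1.1.5",
               "src_port": 9876, "dst_port": 80}))
```

The point the slide makes is that the established-flow lookup is a simple, typed hash on the connection tuple, so per-packet cost stays flat even as the policy in the tables grows.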
HNVv2 - VxLAN, NV-GRE
• Customers asked for VxLAN - we delivered!
• But we still do NV-GRE for those who like that option
• All HNV policies are handled in the SDN Extension
• The Network Controller (NC) plumbs the policies to the gateways and hosts
• Either SCVMM or the NRP programs the NC
• A semi-hidden feature automatically adjusts the MTU on the wire to accommodate the encapsulation overhead (rough numbers below)
• This gives better performance than splitting packets that grow too long once encapsulated
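For reference, the encapsulation overhead the MTU adjustment has to absorb works out roughly as below (assuming an IPv4 underlay and standard header sizes; the exact adjustment the feature applies may differ).

```python
# Back-of-the-envelope encapsulation overhead, IPv4 underlay assumed.
INNER_MTU = 1500
VXLAN_OVERHEAD = 14 + 20 + 8 + 8    # outer Ethernet + IPv4 + UDP + VXLAN = 50 bytes
NVGRE_OVERHEAD = 14 + 20 + 8        # outer Ethernet + IPv4 + GRE (with key) = 42 bytes

print("VxLAN wire MTU needed:", INNER_MTU + VXLAN_OVERHEAD)   # 1550
print("NV-GRE wire MTU needed:", INNER_MTU + NVGRE_OVERHEAD)  # 1542
```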
SDN QoS
A more reliable, more performant vmQoS:
• Compatible with RDMA workloads
• Compatible with DCB
• Supports egress reservations (minimum guaranteed bandwidth)
• Supports egress limits (maximum permitted bandwidth); see the sketch below
• Works well even with very different policies for different VMs
• Works on all vmSwitch ports (host or guest)
• Managed by the Network Controller
• Implemented in the VFP
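A hedged sketch of the two egress knobs listed above: the maximum limit is modeled as a token bucket per vmSwitch port, and the minimum reservation is shown only as a stored value that a weighted scheduler would honor under contention. Class and field names are invented; this is not the Network Controller schema.

```python
import time

class EgressPolicy:
    """Per-vPort egress policy sketch: a bandwidth cap plus a stored reservation."""
    def __init__(self, min_mbps: float, max_mbps: float):
        self.min_mbps = min_mbps                 # guaranteed share under contention
        self.max_mbps = max_mbps                 # hard cap, enforced below
        self.bucket_bits = max_mbps * 1e6        # bucket depth: one second at the cap
        self.tokens = self.bucket_bits           # start with a full bucket
        self.last = time.monotonic()

    def allow(self, packet_bits: int) -> bool:
        """Token bucket for the egress limit: refill at the cap rate, then spend."""
        now = time.monotonic()
        self.tokens = min(self.bucket_bits,
                          self.tokens + (now - self.last) * self.max_mbps * 1e6)
        self.last = now
        if packet_bits <= self.tokens:
            self.tokens -= packet_bits
            return True
        return False                             # over the cap: queue or drop

vm_port = EgressPolicy(min_mbps=500, max_mbps=2000)
print(vm_port.allow(1500 * 8))                   # a 1500-byte frame is under the cap
```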
Virtual Machine Multi-Queue (VMMQ)
• Built on VMQ (vPort) and vRSS
• Associates vPorts with one or more hardware queues
• Distributes traffic between queues based on the RSS hash
• Allows for multiple queues on the default vPort
• Useful for very network-intensive VMs or when VMs outnumber the available VMQ queues
[Diagram: VMMQ - arriving packets are filtered to VM "F" by MAC/VLAN, spread across multiple hardware queues for that vPort, and delivered over vmBus channels 1-3 to separate vCPUs.]
VMMQ Discussion
• Still limited by the number of cores available
• Still limited by the number of queues available
• Not much advantage below 25 Gbps; a fast processor can keep up with 6-10 Gbps by itself
• Very useful when the number of VMs exceeds the number of queues (see the sketch below)
• Increasing the number of queues that the default vPort can use helps all the default vPort users (the ones that don't have their own VMQ)
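A small, entirely hypothetical allocation sketch of the "more VMs than queues" case: most VMs end up on the default vPort, so any extra queues VMMQ grants the default vPort are shared by all of them. All queue counts here are invented.

```python
# Hypothetical queue budget: some VMs get a dedicated vPort with several queues,
# the rest share the default vPort and its (VMMQ-provided) queue set.
HW_QUEUES = 16
QUEUES_PER_DEDICATED_VPORT = 4
DEFAULT_VPORT_QUEUES = 4

dedicated_slots = (HW_QUEUES - DEFAULT_VPORT_QUEUES) // QUEUES_PER_DEDICATED_VPORT
vms = [f"vm{i}" for i in range(10)]

assignment = {}
for i, vm in enumerate(vms):
    if i < dedicated_slots:
        assignment[vm] = f"dedicated vPort ({QUEUES_PER_DEDICATED_VPORT} queues)"
    else:
        assignment[vm] = f"default vPort ({DEFAULT_VPORT_QUEUES} shared queues)"

for vm, where in assignment.items():
    print(vm, "->", where)
```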
Switch Embedded Teaming (SET)
• Legacy NIC Teaming isn't going away, but it isn't compatible with the SDN switch extension
• The long-term direction is to integrate full teaming functionality into the Hyper-V switch
• WS2016 is the v1 edition of integrated teaming
• Focused on the needs of the SDN Extension and Converged NIC
• Has a number of limitations/restrictions in order to focus on doing the right things right
Switch Embedded Teaming (SET)
What it does:
• Switch-independent teaming
• Dynamic or HyperVPort modes of load distribution
• RDMA/DCB aware
• SR-IOV teaming
• Teams of up to 8 ports
The limitations:
• All team members must be identical make/model/driver/features
• No LACP
• No active/passive teaming
Windows Server 1709
Two notable features in this Semi-Annual Channel (SAC) release:
• Dynamic VMQ/VMMQ (also known as RSSv2)
• Guest RDMA
Dynamic VMQ/VMMQ (RSSv2)
dVMQ from WS2012 R2 had challenges:
• Spreading was too slow
• Coalescing was too fast
RSS spreading management was redesigned:
• Spreading to more cores happens aggressively
• Coalescing to fewer cores happens conservatively (see the sketch below)
VMQ and VMMQ make use of RSSv2.
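The spread-fast / coalesce-slow behavior can be expressed as a simple hysteresis policy. The sketch below is a plausible illustration only; the real RSSv2 thresholds, intervals, and mechanism are not shown here and the numbers are invented.

```python
class QueueSpreadPolicy:
    """Hysteresis sketch: grow the core set quickly, shrink it slowly."""
    SPREAD_UTIL = 0.80        # spread as soon as the busiest core is this loaded
    COALESCE_UTIL = 0.20      # coalesce only if the whole set stays this idle...
    COALESCE_INTERVALS = 10   # ...for this many consecutive measurement intervals

    def __init__(self, cores_in_use: int, max_cores: int):
        self.cores = cores_in_use
        self.max_cores = max_cores
        self.idle_streak = 0

    def observe(self, busiest_core_util: float, average_util: float) -> int:
        if busiest_core_util > self.SPREAD_UTIL and self.cores < self.max_cores:
            self.cores += 1              # aggressive: grow on a single hot sample
            self.idle_streak = 0
        elif average_util < self.COALESCE_UTIL:
            self.idle_streak += 1        # conservative: shrink only after a quiet run
            if self.idle_streak >= self.COALESCE_INTERVALS and self.cores > 1:
                self.cores -= 1
                self.idle_streak = 0
        else:
            self.idle_streak = 0
        return self.cores
```

Growing on a single hot sample while shrinking only after a long quiet run avoids bouncing cores in and out of the RSS set under bursty load.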
Guest RDMA
• RDMA works in native hosts - why not guests?
• Guest RDMA uses SR-IOV
• Throughput as good as with native RDMA
• No noticeable load on guest LPs
RDMA Demo
Demo configuration
[Diagram: demo setup - a host partition with SMB and management vNICs on the Hyper-V switch and VM storage; VM C1 uses an SR-IOV VF through its vmNIC; the host, C1, and Host B (addresses ..42.100, ..42.101, ..42.110) connect through a TOR switch.]
Help us do RDMA right
• Customers complain that RoCE is too hard to deploy, and they are right
• DCB is hard, and it has real issues in larger deployments
• If you do RoCE, please help us with tools to verify and validate switch settings, NIC settings, etc.
• Fortunately, iWARP just works
• New RDMA Deployment Guide available at: https://gallery.technet.microsoft.com/RDMA-configuration-425bcdf2
Review, Our vision, Our future plans
Windows Server 2012:
- RDMA
- SR-IOV
- NIC Teaming
- Software QoS
Windows Server 2012 R2:
- Dynamic VMQ
- HNVv1
Windows Server 2016:
- Converged NIC
- VXLAN
- VMMQ
- SET teaming
Windows Server, version 1709:
- Guest RDMA
- Dynamic VMQ/VMMQ
Future releases:
- Hardware QoS
- Tenant DCB
- QUIC
- Crypto
Accelerating the host:
- GFT
Accelerating the guest
Looking ahead
Vision: Accelerated Networking everywhere
• Acceleration for nested Hyper-V (e.g., container hosts)
• Acceleration for untrusted tenants
• Acceleration for low-latency apps in guests/tenants
• RDMA everywhere: SMB, pMEM, NVMe, etc.
Call to action
• Offloads are the key to high-performance networking
• Please implement as many of the Windows offloads as possible
• Tell us what other offloads we should be exploring
• If you do RoCE, we need tools, diagnostics, etc.
• Look at security offloads and design with network security in mind
Thanks! Thank you for being our partner.
Questions?
Please Complete an Evaluation
Your input is important! Multiple ways to access evaluation forms:
1. CommNet stations located throughout conference venues
2. Via the WinHEC app on your Windows Phone and Windows device
3. Via a BYOD browser from any wired or wireless internet connection to <link>
© Microsoft Corporation. All rights reserved.