Rendering Technology in ‘Agents of Mayhem’ • Scott Kircher • Principal Programmer, Core Technology Group, Deep Silver Volition
Who am I? • Principal Programmer • Rendering Team in Volition’s Core Technology Group • Ph.D. in Computer Science from UIUC • Nearly eleven years of experience at Volition
Agents of Mayhem• Open World City• Third-person Action• Stylized Art with Physically Based Rendering• Tons of Particles & Alpha Meshes
Topics• Order Independent Transparency• Modifications to Weighted Blended OIT [McGuire2013]• Lighting Compute• Features and Optimization• Global Illumination• Better Occlusion for Light Propagation Volumes [Kaplanyan2010]
Order Independent Transparency
Rationale• All previous Volition games:• Traditional back-to-front CPU sorted alpha• Lots of sorted alpha means:• Inefficient CPU rendering• Per “object” sorting, not per-pixel• Sort “popping”• Low-res alpha doesn’t sort with high-res• Solution: OIT?• Many OIT techniques inefficient on GPU
Weighted-Blended OIT • Enter McGuire & Bavoil [McGuire2013, McGuire2015] • (Image from [McGuire2013])
Weighted-Blended OIT Pros • “Negative” CPU cost • Can now sort alpha by render state (i.e. material/shader) instead of depth • Efficient on GPU • Some math added to alpha shaders • Simple full-screen composite step • Low-res and high-res alpha “sort” seamlessly • No popping, ever. • “Sort” issues transition smoothly • Simple? • No. But close enough.
Weighted-Blended OIT Cons• MAGIC NUMBERS EVERYWHERE• Very opaque alpha behaves badly• Always “wrong”• (But not wrong enough!)
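How WBOIT Works (McGuire) • Replace ordered blending with a weighted average: • $C = \frac{\sum_i C_i w_i}{\sum_i \alpha_i w_i}$ (premultiplied colors $C_i$, coverage $\alpha_i$, weights $w_i$) • Revealage $r = \prod_i (1 - \alpha_i)$ • $C_{final} = C\,(1 - r) + C_0\, r$, where $C_0$ is the opaque color behind the transparent layers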
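The weighted average can be checked on the CPU. The sketch below follows the formulation in [McGuire2013]: accumulate $\sum C_i w_i$, $\sum \alpha_i w_i$ and $\prod (1 - \alpha_i)$, then composite over the background. The `Layer` struct and function names are hypothetical, and the per-layer weights are taken as given rather than computed from depth. Back-to-front "over" blending is included only as a reference for the single-layer case, where both must agree.

```cpp
#include <algorithm>
#include <array>
#include <vector>

// One transparent layer: premultiplied RGB color, coverage alpha, weight w_i.
struct Layer {
    std::array<float, 3> color;  // already multiplied by alpha
    float alpha;
    float weight;
};

// Weighted blended OIT composite:
//   C = sum(C_i w_i) / sum(alpha_i w_i),  r = prod(1 - alpha_i)
//   C_final = C (1 - r) + C_0 r
// Only a sum and a product are accumulated, so layer order does not matter.
std::array<float, 3> wboit_composite(const std::vector<Layer>& layers,
                                     const std::array<float, 3>& background) {
    std::array<float, 3> sum_color{0.0f, 0.0f, 0.0f};
    float sum_weight = 0.0f;
    float revealage = 1.0f;
    for (const Layer& l : layers) {
        for (int c = 0; c < 3; ++c) sum_color[c] += l.color[c] * l.weight;
        sum_weight += l.alpha * l.weight;
        revealage *= 1.0f - l.alpha;
    }
    std::array<float, 3> out{};
    for (int c = 0; c < 3; ++c) {
        float average = sum_color[c] / std::max(sum_weight, 1e-5f);
        out[c] = average * (1.0f - revealage) + background[c] * revealage;
    }
    return out;
}

// Reference: traditional back-to-front "over" blending (layers given far to near).
std::array<float, 3> sorted_over(const std::vector<Layer>& far_to_near,
                                 std::array<float, 3> dst) {
    for (const Layer& l : far_to_near)
        for (int c = 0; c < 3; ++c) dst[c] = l.color[c] + dst[c] * (1.0f - l.alpha);
    return dst;
}
```

With more than one layer the two results diverge (that is the "always wrong, but not wrong enough" trade-off), but the WBOIT result no longer depends on submission order.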
Weighting Function (McGuire) • Weights are the “magic” • Weight high-coverage things more • Weight near things more • $w = f(a^3, b^3)$ [McGuire2015], with $a = \min(8\alpha, 1) + 0.01$ and $b = 1 - 0.95\,z$, where $f$ rescales/clamps $w$ for precision • (Plot: weight $w$ vs. depth $z$ for $\alpha = 1$ and $\alpha = 0.0625$)
Emissive Alpha – Major Problem
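Intuition • Consider $n$ layers of the same emissive alpha value $E$ • Additive blending: $C' = \sum_{i=1}^{n} E = nE$ • Weighted average: $C = \frac{\sum_{i=1}^{n} E\, w_i}{\sum_{i=1}^{n} w_i} = E = \frac{C'}{n}$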
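The loss of a factor of $n$ is easy to reproduce numerically. This minimal sketch (hypothetical function names) evaluates both sides of the intuition slide for arbitrary positive weights: additive blending grows with the layer count, while the weighted average returns $E$ no matter how many layers there are.

```cpp
#include <vector>

// n identical emissive layers of value E blended additively: C' = n E.
float additive_blend(float E, int n) { return E * static_cast<float>(n); }

// The same layers through a weighted average: C = sum(E w_i) / sum(w_i) = E.
float weighted_average(float E, const std::vector<float>& weights) {
    float numerator = 0.0f, denominator = 0.0f;
    for (float w : weights) {
        numerator += E * w;
        denominator += w;
    }
    return numerator / denominator;
}
```

For three layers of $E = 0.5$, additive blending gives 1.5 while the weighted average stays at 0.5, which is why the new method tracks "additiveness" separately.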
Main Idea• Accumulate additional information• “Additiveness” ≈ Number of additive layers• Amplify weighted average by additiveness
Visual Summary of New WBOIT • (Diagram: accumulated weighted colors ÷ accumulated weights, × additiveness, blended via revealage)
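WBOIT Formulas + Additiveness • Revealage remains the same • Color is the same • But with emissive explicitly identified • Additiveness is new • $C = \frac{\sum_i (\alpha_i C_i + E_i)\, w_i}{\sum_i w_i}$ • $r = \prod_i (1 - \alpha_i)$ • $A = \sum_i \min(k \cdot \mathrm{lum}(E_i), 1)$, where $E_i$ is the emissive color of layer $i$ and $k$ is an arbitrary sensitivity constant (10 in the appendix shader)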
New WBOIT Composite • Additiveness amplifies weighted average color • But needs to be mitigated for mixed emissive/non-emissive • $C_{final} = (C\, a')(1 - r) + C_0\, r$ • $a' = \left(\frac{A}{4}(1 - r) + A\, r\right) + \min(2(1 - r), 1)$ • The first term reduces additiveness in areas of high opacity (low revealage) • The second term prevents darkening in absence of emissive
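The additiveness bookkeeping can be mirrored on the CPU. This sketch copies the constants from the appendix shader (sensitivity 10, the 1/4 reduction, the factor 2 on opacity, and the final clamp to at least 1); the function names are hypothetical, and the 8-bit storage (dividing by 8) is left out.

```cpp
#include <algorithm>

// Per-layer additiveness contribution, as in the material shader:
// saturate(k * emissive_luminance), with k the additive sensitivity.
float additiveness_term(float emissive_luminance, float k = 10.0f) {
    return std::clamp(k * emissive_luminance, 0.0f, 1.0f);
}

// Composite amplifier a' for total additiveness A and revealage r:
//   a' = (A/4 (1 - r) + A r) + min(2 (1 - r), 1)
// The shader then clamps a' to at least 1, so it can only amplify.
float emissive_amplifier(float A, float r) {
    float a = (A * 0.25f) * (1.0f - r) + A * r;    // lerp(A/4, A, r)
    a += std::clamp(2.0f * (1.0f - r), 0.0f, 1.0f); // opacity term
    return std::max(a, 1.0f);
}
```

Three purely emissive layers (A = 3, r = 1) recover the factor of 3 lost by averaging, while a fully opaque mix (r = 0) with A = 4 is amplified only to 2.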
Weighting Function and Emissive • Purely emissive alpha has zero opacity • Must include emissive in computation of weight • Must allow weight to go to zero • Original: $a = \min(8\alpha, 1) + 0.01$, $w = f(a^3, b^3)$ • New: $a = \min(3\alpha + s \cdot \mathrm{lum}(E), 1)$ • (Comparison images for particles and alpha meshes with $s = 2$ and $s = 20$)
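The opacity part of the weight can be sketched directly from the slide's formula. As in the appendix shader, alpha and emissive luminance are saturated first, and there is no $+0.01$ bias, so a fragment with no coverage and no emission gets exactly zero weight. Here `s` stands for the emissive sensitivity; the function names are hypothetical.

```cpp
#include <algorithm>

// New opacity/emissive term: a = min(3*alpha + s*lum(E), 1); the weight uses a^3.
float opacity_weight_term(float alpha, float emissive_lum, float s) {
    float alpha_clamped = std::clamp(alpha, 0.0f, 1.0f);
    float emissive_clamped = std::clamp(emissive_lum, 0.0f, 1.0f);
    float a = std::min(3.0f * alpha_clamped + s * emissive_clamped, 1.0f);
    return a * a * a;
}

// Original term for comparison: a = min(8*alpha, 1) + 0.01, so a purely
// emissive fragment (alpha = 0) only ever receives the tiny weight 0.01^3.
float original_opacity_term(float alpha) {
    float a = std::min(8.0f * alpha, 1.0f) + 0.01f;
    return a * a * a;
}
```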
Other WBOIT IssuesAnd how we dealt with them
Color Dominance • Side-effect of WBOIT with additiveness • Luminance is adjusted, but hue is dominated by foreground layers • Our artists actually liked this • (Comparison images: Regular Additive, Regular WBOIT, WBOIT Additive)
Dark Halos • High “sensitivity” to opacity or emissivity produces these • $a = \min(3\alpha + s \cdot \mathrm{lum}(E), 1)$ • (Comparison images: $s = 20$, $s = 2$)
Punch Through • Low “sensitivity” with dim emissive can produce punch-through • $a = \min(3\alpha + s \cdot \mathrm{lum}(E), 1)$ • (Comparison images: $s = 20$, $s = 2$)
Halo vs. Punch Through Control • Artist-controlled emissive sensitivity: $s = 1 / \text{OIT Feather Start}$ (see the appendix shader) • (Comparison images: OIT Feather Start = 0.01, 0.3333 (default), 0.75, 1.0)
Depth Range • To avoid retuning, convert depth to a canonical range • We chose near = 0.5 m, far = 300 m • Also, we allow $b$ to go to zero • We have an alternate method of dealing with very low weights • $w = f(a^3, b^3)$, $b = 1 - z'$, $z' = \mathrm{saturate}\left(\frac{f}{f - n} - \frac{f\, n}{z\,(f - n)}\right)$, with $n = 0.5$, $f = 300$, and $z$ = linear view depth
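The canonical depth mapping is a single expression; this sketch reproduces it from the appendix shader (the function name is hypothetical). $z'$ is 0 at the canonical near plane and 1 at the canonical far plane regardless of the camera's actual clip planes, so $b$ runs from 1 down to exactly 0.

```cpp
#include <algorithm>

// b = 1 - z', with z' = saturate(f/(f - n) - f*n / (z*(f - n))).
// view_depth is linear view depth in meters; n = 0.5 m and f = 300 m by default.
float canonical_depth_factor(float view_depth, float n = 0.5f, float f = 300.0f) {
    float range = f - n;
    float z_prime = std::clamp(f / range - (f * n) / (view_depth * range), 0.0f, 1.0f);
    return 1.0f - z_prime;  // b is allowed to reach exactly zero
}
```

Anything beyond 300 m saturates to $b = 0$; the weight bias on the later slides keeps such fragments from vanishing entirely.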
Weight Biasing/Clamping • FP16 precision is an issue. Solved already [McGuire2013, McGuire2015] • Large variance in weights between near and far alpha is bad • $w = f(a^3, b^3) = \min(10^4 a^3 b^3, 300)$
Better Weight Biasing/Clamping • Can’t just introduce big clamp at low end • Lose depth sorting when weights are clamped • Instead, shift weights up (only depth-related portion) • $w = f(a^3, b^3) = \min(10^4 b^3 + 5, 20)\, a^3$ • Opacity weight multiplied in after biasing and clamping!
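The two weight functions from the previous two slides can be compared side by side. In this sketch (hypothetical names; inputs are the already-cubed terms $a^3$ and $b^3$) the biased version bounds the near/far ratio of the depth part to 20/5 = 4, and because opacity is multiplied in after the clamp, a faint fragment up close still ends up lighter than an opaque one far away.

```cpp
#include <algorithm>

// Clamp-only form: w = min(1e4 * a^3 * b^3, 300)
float weight_clamped(float a3, float b3) {
    return std::min(1.0e4f * a3 * b3, 300.0f);
}

// Biased form: depth part shifted up and clamped first, opacity applied last:
//   w = min(1e4 * b^3 + 5, 20) * a^3
float weight_biased(float a3, float b3) {
    return std::min(1.0e4f * b3 + 5.0f, 20.0f) * a3;
}
```

With $a^3 = 0.01$ at $b^3 = 1$ versus $a^3 = 1$ at $b^3 = 0$, the clamp-only form lets the faint near fragment dominate (about 100 vs. 0), while the biased form gives about 0.2 vs. 5.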
Implementation• Simple 2-target MRT setup.• Second MRT stores Revealage in Red and Additiveness in Alpha• Use separate blending control for alpha channel• See Appendix for more details• Shader Source Code• CMASK Optimization
Lighting Compute
Tile-Based Lighting Compute
Tile-Based Shading Review• Compute shader culls lights to tiles (groupshared list per tile)• Then shades pixels in tile per those light lists
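The culling step can be illustrated with a simplified CPU version. Assumptions not taken from the talk: each light has already been projected to a screen-space bounding circle, and only a 2D overlap test is done (a GPU implementation would also test per-tile depth bounds or frustum planes). The 16x16 tile size and the 64-light cap come from later slides; all names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct ScreenLight { float cx, cy, radius; };  // projected bounding circle

constexpr int kTileSize = 16;                  // 16x16 pixel tiles
constexpr std::size_t kMaxLightsPerTile = 64;  // per-tile list capacity

// Circle vs. axis-aligned rectangle: distance from center to closest point.
bool circle_overlaps_rect(const ScreenLight& l, float x0, float y0, float x1, float y1) {
    float nx = std::clamp(l.cx, x0, x1);
    float ny = std::clamp(l.cy, y0, y1);
    float dx = l.cx - nx, dy = l.cy - ny;
    return dx * dx + dy * dy <= l.radius * l.radius;
}

// Builds one light-index list per tile (row-major), like the per-tile
// groupshared list that the shading phase then iterates over.
std::vector<std::vector<std::uint32_t>> cull_lights_to_tiles(
        const std::vector<ScreenLight>& lights, int width, int height) {
    int tiles_x = (width + kTileSize - 1) / kTileSize;
    int tiles_y = (height + kTileSize - 1) / kTileSize;
    std::vector<std::vector<std::uint32_t>> lists(tiles_x * tiles_y);
    for (int ty = 0; ty < tiles_y; ++ty) {
        for (int tx = 0; tx < tiles_x; ++tx) {
            float x0 = float(tx * kTileSize), y0 = float(ty * kTileSize);
            float x1 = x0 + kTileSize, y1 = y0 + kTileSize;
            auto& list = lists[ty * tiles_x + tx];
            for (std::uint32_t i = 0; i < lights.size(); ++i)
                if (list.size() < kMaxLightsPerTile &&
                    circle_overlaps_rect(lights[i], x0, y0, x1, y1))
                    list.push_back(i);
        }
    }
    return lists;
}
```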
Features• Lots of (expensive) lighting features implemented• Multiple lighting models (all PBR)• PCF shadows• Variable penumbra shadows (PCSS)• Projected textures• Textured-emitter area lights• Omni lights• “Realistic” tube lights• Square or round spot lights• Darks (negative lights)• Light clip planes• Light blockers & portals
Light Leaking • Familiar problem, standard solutions • (Images: infinite clip planes, stencil clip meshes)
Light Blockers• Finite light clip planes
In Game Example• No light blockers
In Game Example• With light blockers
Why Not Shadow Casters?• Too many lights, some don’t even support shadows
Light Blocker Setup
How It Works• Cull tiles against blocker “shadow” frustums
How It Works• List blockers requiring per-pixel checks for each light
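For tiles that cannot be resolved by tile culling, each pixel needs a test of its own. One plausible form of that per-pixel check, sketched here under the assumption that a blocker is a rectangle given by a center and two orthogonal half-extent vectors, asks whether the segment from the shaded point to the light crosses the rectangle. The representation and names are hypothetical, not the talk's data layout.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };
Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Hypothetical blocker: finite rectangle with orthogonal half-extents u and v.
struct Blocker { Vec3 center, half_u, half_v; };

// True if the segment from surface point P to light position L crosses the blocker.
bool blocked(const Blocker& b, Vec3 P, Vec3 L) {
    Vec3 n = cross(b.half_u, b.half_v);          // rectangle normal (unnormalized)
    Vec3 d = L - P;
    float denom = dot(n, d);
    if (std::fabs(denom) < 1e-8f) return false;  // segment parallel to the plane
    float t = dot(n, b.center - P) / denom;      // where the segment meets the plane
    if (t <= 0.0f || t >= 1.0f) return false;    // crossing not between P and L
    Vec3 local = (P + d * t) - b.center;
    // Coordinates along the half-extent axes; inside if both are within [-1, 1].
    float su = dot(local, b.half_u) / dot(b.half_u, b.half_u);
    float sv = dot(local, b.half_v) / dot(b.half_v, b.half_v);
    return std::fabs(su) <= 1.0f && std::fabs(sv) <= 1.0f;
}
```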
Returning To This Example• For a moment
Blocker Tile Culling• Light blockers off
Blocker Tile Culling• Light blockers on
Blocker Tile Culling• Tiles requiring per pixel checks
Implementation • One threadgroup (256 threads) = one 16x16 pixel tile • Groupsync between each phase

| Process | Thread allocation | Groupshared memory (LDS) |
|---|---|---|
| Cull lights vs. tile | One per light | Light list |
| Build list of (light, portal) pairs | One per light | Light list, (light, portal) list |
| Classify (light, portal) pairs vs. tile | One per pair | Light list, (light, portal) list, (light, portal) bitarrays (per pixel / enclosed) |
| Build list of (light, blocker) pairs | One per light | Light list, (light, blocker) list, (light, portal) bitarrays (per pixel / enclosed) |
| Classify (light, blocker) pairs vs. tile | One per pair | Trimmed light list, (light, blocker) list, (light, blocker) bitarray (per pixel test needed), (light, portal) bitarrays (per pixel / enclosed) |
| Compact & sort surviving lights | One per light | … |
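The "build list of pairs" phases turn per-light work ("one per light") into a flat array that the next phase can process "one per pair". The sketch below assumes, as an illustration only, that the flat list is laid out with an exclusive prefix sum over per-light blocker counts so each light writes its own contiguous slice; the talk does not say how the list is built, and the names are hypothetical. The 256-pair cap matches the bonus slide's numbers.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

constexpr std::size_t kMaxPairsPerTile = 256;
using Pair = std::pair<std::uint32_t, std::uint32_t>;  // (light index, blocker index)

// blockers_per_light[i] lists the blockers affecting the i-th surviving light.
std::vector<Pair> build_pairs(const std::vector<std::vector<std::uint32_t>>& blockers_per_light) {
    // Exclusive prefix sum: offsets[i] = number of pairs contributed by lights before i.
    std::vector<std::size_t> offsets(blockers_per_light.size() + 1, 0);
    for (std::size_t i = 0; i < blockers_per_light.size(); ++i)
        offsets[i + 1] = offsets[i] + blockers_per_light[i].size();
    std::vector<Pair> pairs(std::min(offsets.back(), kMaxPairsPerTile));
    // "One per light": each light fills its slice; pairs past the cap are dropped.
    for (std::size_t light = 0; light < blockers_per_light.size(); ++light)
        for (std::size_t k = 0; k < blockers_per_light[light].size(); ++k) {
            std::size_t slot = offsets[light] + k;
            if (slot < pairs.size())
                pairs[slot] = {static_cast<std::uint32_t>(light), blockers_per_light[light][k]};
        }
    return pairs;
}
```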
Feature Spectra: Lighting Compute Optimization
Remember That Feature List? • Lots of features means lots of register usage • More registers per thread = fewer threads per shader unit • Naïve implementation = BAD occupancy
Main Idea• Break shader into culling phase + different combinations of features• Select feature set (or spectrum) based on needs, per tile• Culling phase determines what shader to use for each tile
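Per-tile selection can be pictured as a bitmask problem. In this sketch the feature bits, the ordering of shader modes from cheapest to most complete, and the "first mode that covers everything" rule are all assumptions made for illustration; the real modes come from the feature spectra shown on the following slides.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical feature bits (a subset of the "Features" slide).
enum Feature : std::uint32_t {
    kShadowPCF = 1u << 0, kShadowPCSS = 1u << 1, kProjectedTexture = 1u << 2,
    kAreaLight = 1u << 3, kTubeLight = 1u << 4, kBlockers = 1u << 5,
};

// The culling phase ORs together what every light in the tile needs, then picks
// the first (assumed cheapest, lowest-register) shader mode covering all of it.
int select_shader_mode(const std::vector<std::uint32_t>& light_features,
                       const std::vector<std::uint32_t>& mode_feature_sets) {
    std::uint32_t needed = 0;
    for (std::uint32_t f : light_features) needed |= f;
    for (std::size_t m = 0; m < mode_feature_sets.size(); ++m)
        if ((needed & ~mode_feature_sets[m]) == 0) return static_cast<int>(m);
    return -1;  // no mode covers this tile
}
```

A tile with only PCF-shadowed lights would then run a mode that never allocates registers for PCSS or area-light code.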
Feature Spectra
Shader Modes• Selected from feature spectra
Investigating Feature Spectra
Result • (Figure: culling phase, shading phase, various tile modes)
Global Illumination
Light Propagation Volumes • One of the first real-time GI techniques • Crytek [Kaplanyan2010]
LPV Middleware • Our starting point: Aura Library from Confetti • Heavily modified (by Volition’s own Mike Flavin) • Modifications applicable to any LPV implementation
LPV Basics• No Global Illumination• (Direct + Occluded Skydome only)
LPV Basics• With Global Illumination
LPV Basics • Render Reflective Shadow Maps (RSM) • Inject into LPV volumes • Propagate light through volume • Apply to scene • (RSM images: albedo, normals, depth)
LPV Basics• 3D LPV volumes store SH of radiant intensity function
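LPV cells typically store radiant intensity as 2-band (4-coefficient) spherical harmonics [Kaplanyan2010]. This sketch shows the standard projection of a clamped cosine lobe, the shape used when injecting a surface element with normal $n$, and its evaluation in a direction $d$. The coefficient order and names are assumptions of the sketch. Evaluated in the lobe direction it gives 0.75 rather than 1, and it rings slightly negative in the opposite direction, a known limitation of 2 bands.

```cpp
#include <array>
#include <cmath>

using SH4 = std::array<float, 4>;  // order assumed here: Y00, Y1-1 (y), Y10 (z), Y11 (x)
struct Dir { float x, y, z; };     // unit vector

SH4 sh_basis(Dir d) {
    const float c0 = 0.282095f;  // 1 / (2 sqrt(pi))
    const float c1 = 0.488603f;  // sqrt(3 / (4 pi))
    return {c0, c1 * d.y, c1 * d.z, c1 * d.x};
}

// SH projection of max(dot(n, w), 0), the lobe used to inject a surface element.
SH4 sh_clamped_cosine(Dir n) {
    const float pi = 3.14159265f;
    const float a0 = std::sqrt(pi) / 2.0f;  // band 0
    const float a1 = std::sqrt(pi / 3.0f);  // band 1
    return {a0, a1 * n.y, a1 * n.z, a1 * n.x};
}

// Intensity stored in an SH vector, evaluated in direction d.
float sh_eval(const SH4& sh, Dir d) {
    SH4 y = sh_basis(d);
    return sh[0] * y[0] + sh[1] * y[1] + sh[2] * y[2] + sh[3] * y[3];
}
```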
Global vs. Local Volumes • Originally, only cascaded global volume • Follows camera • (Diagram: player surrounded by cascades 0, 1, and 2)
Global vs. Local Volumes• For interiors, we found fixed local volumes worked better• Higher quality• No need to inject & propagate every frame
Original LPV Occlusion • Inject “occluders” into LPV volume from depth [Kaplanyan2010] • Main depth buffer • Auxiliary depth buffers (RSMs themselves, other shadow maps) • Existed in original Confetti implementation [Kaplanyan2010]
LPV Occlusion Problems • (Images from [Kaplanyan2010]: light bleeding from coarse discretization; missed geometry) • Biggest problems: • Inconsistent results based on view direction • Limited artist control!
Light Blockers for LPV• Artists placing light blockers anyway, can use for GI too!
GI Only View• Light blockers on
GI Only View• Light blockers off
GI Only View• Light blocker placement
Blockers During Propagation• Light blockers injected into volume• Stored as “axial” occlusion (amount of occlusion along each axis)• Block light during propagation• Produces GI “shadows”
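The effect of axial occlusion on propagation can be shown with a heavily simplified grid. Assumptions not from the talk: light is a single scalar per cell instead of SH, each step gathers 1/6 of every face neighbor's light, and a cell's occlusion along an axis scales the light entering it along that axis. The real propagation works on SH lobes per face; this only illustrates how a blocker leaves a GI "shadow".

```cpp
#include <array>
#include <vector>

struct Grid {
    int n;                                               // n x n x n cells
    std::vector<float> light;                            // scalar stand-in for SH
    std::vector<std::array<float, 3>> axial_occlusion;   // [0,1] along x, y, z
    explicit Grid(int size)
        : n(size), light(size * size * size, 0.0f),
          axial_occlusion(size * size * size, std::array<float, 3>{0.0f, 0.0f, 0.0f}) {}
    int index(int x, int y, int z) const { return (z * n + y) * n + x; }
};

// One propagation step: each cell gathers from its 6 face neighbors, attenuated
// by its own occlusion along the axis the light travels on.
void propagate(Grid& g) {
    std::vector<float> next(g.light.size(), 0.0f);
    const int offsets[6][3] = {{1, 0, 0}, {-1, 0, 0}, {0, 1, 0}, {0, -1, 0}, {0, 0, 1}, {0, 0, -1}};
    for (int z = 0; z < g.n; ++z)
        for (int y = 0; y < g.n; ++y)
            for (int x = 0; x < g.n; ++x) {
                int i = g.index(x, y, z);
                float sum = 0.0f;
                for (int k = 0; k < 6; ++k) {
                    int sx = x + offsets[k][0], sy = y + offsets[k][1], sz = z + offsets[k][2];
                    if (sx < 0 || sy < 0 || sz < 0 || sx >= g.n || sy >= g.n || sz >= g.n) continue;
                    int axis = k / 2;  // 0 = x, 1 = y, 2 = z
                    float transmit = 1.0f - g.axial_occlusion[i][axis];
                    sum += g.light[g.index(sx, sy, sz)] / 6.0f * transmit;
                }
                next[i] = sum;
            }
    g.light = next;
}
```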
Blockers During Apply • Light blockers culled against 4x4x4 macro-cells • To reduce the set of blockers considered in each LPV cell • Block light from trilinear samples during apply • Eliminates light leaking from coarse grid
Light Portals for LPV• Portals injected along with blockers as set of “holes” per blocker• Modify axial occlusion for propagation• Negate sample blockage in apply
Summary• Emissive/additive support for Weighted, Blended Order Independent Transparency• Light blockers & portals for tile-based lighting methods• Feature Spectra for optimizing large tile-based deferred shading feature sets• Modifications for Light Propagation Volume based GI• Local volumes• Light blockers & portals
Questions? • http://www.dsvolition.com/publications/
References
• Andersson, DirectX 11 Rendering in Battlefield 3, Game Developers Conference, 2011. https://www.slideshare.net/DICEStudio/directx-11-rendering-in-battlefield-3
• Kaplanyan, Dachsbacher, Cascaded Light Propagation Volumes for Real-Time Indirect Illumination, Proceedings of the 2010 Symposium on Interactive 3D Graphics and Games, 2010. http://dl.acm.org/citation.cfm?id=1730821&CFID=989089912&CFTOKEN=24284118
• McGuire, Bavoil, Weighted Blended Order-Independent Transparency, Journal of Computer Graphics Techniques, vol. 2, no. 2, 2013. http://jcgt.org/published/0002/02/09/
• McGuire, Implementing Weighted, Blended Order-Independent Transparency, Blog post, 2015. http://casual-effects.blogspot.com/2015/03/implemented-weighted-blended-order.html
http://www.confettispecialfx.com/ • http://www.dsvolition.com/
Appendix • WBOIT implementation details + shader source code
Implementation • MRT Setup • Blend State • MRT0: (1)S + (1)D for all channels • MRT1: (0)S + (1−S)D for color channels, (1)S + (1)D for alpha channel • Low-res and high-res alpha easily combined in composite • See Appendix for Shader Source Code

| Target | Format | R | G | B | A |
|---|---|---|---|---|---|
| MRT0 | FP16:16:16:16 | Red * Weight | Green * Weight | Blue * Weight | Weight |
| MRT1 | 8:8:8:8 | Revealage | (unused) | (unused) | Additiveness |
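The blend states can be modeled on the CPU to see why they produce sums and a product. In this sketch MRT1's red channel is assumed to be cleared to 1 (so that multiplying by $(1 - \alpha)$ per fragment yields revealage), the material shader is assumed to write its raw alpha there (as the appendix shader's `revealage = clamped_alpha` does), and FP16/8-bit quantization is ignored. The struct name is hypothetical.

```cpp
#include <array>

using RGBA = std::array<float, 4>;

struct OitTargets {
    RGBA mrt0{0.0f, 0.0f, 0.0f, 0.0f};  // sum(color * w) in RGB, sum(w) in A
    RGBA mrt1{1.0f, 0.0f, 0.0f, 0.0f};  // R: revealage (assumed cleared to 1), A: additiveness

    // src0: accumulation output; src1: (alpha, unused, unused, emissive weight).
    void blend(const RGBA& src0, const RGBA& src1) {
        // MRT0: ONE * src + ONE * dst on all channels.
        for (int c = 0; c < 4; ++c) mrt0[c] = src0[c] + mrt0[c];
        // MRT1 color: ZERO * src + (1 - src) * dst, i.e. dst *= (1 - alpha).
        for (int c = 0; c < 3; ++c) mrt1[c] = 0.0f * src1[c] + (1.0f - src1[c]) * mrt1[c];
        // MRT1 alpha: ONE * src + ONE * dst, summing additiveness.
        mrt1[3] = src1[3] + mrt1[3];
    }
};
```

Because every channel ends up as a sum or a product, the fragments can arrive in any order, and low-res and high-res passes can be merged in the composite by multiplying revealage and adding the rest.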
CMASK Optimization • Reading high-res targets can be expensive • Fast-clear eliminate of high-res buffers is also slow (~0.2 ms) • Read super-tiny CMASK buffer first and skip work if not written • Reduces “no-alpha” case from 0.7 ms to 0.3 ms on PS4
```hlsl
// This function is executed in alpha material shaders as the last step before writing out to the MRTs.
// oit_feather_start: hypothetical parameter name for the artist-controlled "OIT Feather Start" value (0.01 to 1).
void weighted_oit_process(out float4 accum, out float revealage, out float emissive_weight,
                          float4 premultiplied_alpha_color, float raw_emissive_luminance,
                          float view_depth, float current_camera_exposure, float oit_feather_start)
{
    const float opacity_sensitivity = 3.0;   // Should be greater than 1, so that we only downweight nearly transparent things. Otherwise, everything at the same depth should get equal weight. Can be artist controlled
    const float weight_bias = 5.0;           // Must be greater than zero. Weight bias helps prevent distant things from getting hugely lower weight than near things, as well as preventing floating point underflow
    const float precision_scalar = 10000.0;  // Adjusts where the weights fall in the floating point range, used to balance precision to combat both underflow and overflow
    const float maximum_weight = 20.0;       // Don't weight near things more than a certain amount to both combat overflow and reduce the "overpower" effect of very near vs. very far things
    const float maximum_color_value = 1000.0;
    const float additive_sensitivity = 10.0; // How much we amplify the emissive when deciding whether to consider this additively blended

    // Exposure changes relative importance of emissive luminance (whereas it does not for opacity)
    float relative_emissive_luminance = raw_emissive_luminance * current_camera_exposure;

    // Emissive sensitivity is hard to pin down.
    // On the one hand, we want a low sensitivity so we don't get dark halos around "feathered" emissive alpha that overlap with each other.
    // On the other hand, we want a high sensitivity so that dim emissive holograms don't get overly downweighted.
    // We expose this to the artist to let them choose what is more important.
    const float emissive_sensitivity = 1.0 / oit_feather_start; // artist-controlled value between 0.01 and 1

    float clamped_emissive = saturate(relative_emissive_luminance);
    float clamped_alpha = saturate(premultiplied_alpha_color.a);

    // Intermediate terms to be cubed.
    // NOTE: This part differs from McGuire's sample code: since we're using premultiplied alpha in the composite,
    // we want to keep emissive values that have low coverage weighted appropriately, so we add the emissive
    // luminance to the alpha when computing the alpha portion of the weight.
    // NOTE: We also don't add a small value to a; we allow it to go all the way to zero, so that completely
    // invisible portions do not influence the result.
    float a = saturate((clamped_alpha * opacity_sensitivity) + (clamped_emissive * emissive_sensitivity));

    // NOTE: This differs from McGuire's sample code. In order to avoid having to tune the algorithm separately for
    // different near/far plane values, we produce a "canonical" depth value from the view depth, using a fixed
    // near plane and a tunable far plane.
    const float canonical_near_z = 0.5;
    const float canonical_far_z = 300.0;
    float range = canonical_far_z - canonical_near_z;
    float canonical_depth = saturate(canonical_far_z / range - (canonical_far_z * canonical_near_z) / (view_depth * range));
    float b = 1.0 - canonical_depth;

    // Clamp color to combat overflow (weight will be clamped too)
    float3 clamped_color = min(premultiplied_alpha_color.rgb, maximum_color_value);

    float w = precision_scalar * b * b * b; // Basic depth-based weight
    w += weight_bias;                       // NOTE: This differs from McGuire's code. It is an alternate way to prevent underflow and limits the near/far weight ratio
    w = min(w, maximum_weight);             // Clamp by maximum weight BEFORE multiplying by opacity weight (so that we'll properly reduce near faint stuff in weight)
    w *= a * a * a;                         // Incorporate opacity weight as the last step

    // NOTE: This differs from McGuire's sample code because we want to be able to handle fully additive alpha
    // (e.g. emissive), which has a coverage of 0 (revealage of 1.0)
    accum = float4(clamped_color * w, w);
    revealage = clamped_alpha; // Blend state will invert this to produce actual revealage
    emissive_weight = saturate(relative_emissive_luminance * additive_sensitivity) / 8.0f; // We store this in an 8-bit channel, so we divide by the maximum number of additive layers we can support
}
```
```hlsl
// Full-screen composite pixel shader
PS_OUTPUT main_ps(VS_OUTPUT input)
{
    uint3 ipos = uint3(input.pos.xy, 0);
#if (defined(_PS4) || defined(_XBOX3)) && defined(USE_CMASK_OPT)
    // Skip some work for pixels that we didn't write to at all
    const bool hires_written = decoded_cmask.Load(uint3(ipos.x / 4, ipos.y / 4, 0)) != 0.0f;
#else
    const bool hires_written = true;
#endif

    float revealage = 1.0;
    float additiveness = 0.0;
    float4 accum = float4(0.0, 0.0, 0.0, 0.0);

    // High-res alpha
    [branch]
    if (hires_written)
    {
        float4 temp = input_accum2.Load(ipos);
        revealage = temp.r;
        additiveness = temp.w;
        accum = input_accum1.Load(ipos);
    }

    // Low-res alpha
    float4 temp = input_accum2_subpass.SampleLevel(Sampler_filter_clamp, input.uv, 0);
    revealage = revealage * temp.r;
    additiveness = additiveness + temp.w;
    accum = accum + input_accum1_subpass.SampleLevel(Sampler_filter_clamp, input.uv, 0);

    // Weighted average (weights were applied during accumulation, and accum.a stores the sum of weights)
    float3 average_color = accum.rgb / max(accum.a, 0.00001);

    // Amplify based on additiveness to try and regain intensity we lost from averaging things that would formerly have been additive.
    // Revealage gives a rough estimate of how much "alpha stuff" there is in the pixel, allowing us to reduce the additive amplification when mixed in with non-additive
    float emissive_amplifier = (additiveness * 8.0f); // The constant factor here must match the constant divisor in the material shaders!
    emissive_amplifier = lerp(emissive_amplifier * 0.25, emissive_amplifier, revealage); // Lessen, but do not completely remove, amplification when there's opaque stuff mixed in
    // Also add in the opacity (1 - revealage) to account for the fact that additive + non-additive should never be darker than the non-additive by itself
    emissive_amplifier += saturate((1.0 - revealage) * 2.0); // The constant factor here is an adjustable thing to indicate how "sensitive" we should be to the presence of opaque stuff
    average_color *= max(emissive_amplifier, 1.0); // NOTE: We max with 1 here so that this can only amplify, never darken, the result

    // Suppress overflow (turns INF into bright white)
    if (any(isinf(accum.rgb)))
    {
        average_color = 100.0f;
    }

    PS_OUTPUT OUT;
    OUT.Color0 = float4(average_color, 1.0 - revealage);
    return OUT;
}
```
Additional Bonus Slide • Light Blockers/Portals LDS Memory Analysis for Lighting Compute
Some Rough Numbers• Max lights per tile: 64• Max blockers per light: 32• Max portals per light: 32• Max portals per blocker: 32• Max (light,portal) or (light,blocker) pairs per tile: 256• Groupshared (LDS) memory requirements:• Initial & final lights in tile: 512 bytes• Various (light,blocker)/(light,portal) bitarrays: 1280 bytes• + Other miscellaneous counts, etc…• Total: ~2KB (max theoretical PS4 occupancy: 8 wavefronts/SIMD)