Transcript
HYBRID RECONSTRUCTION ANTI ALIASING• 1/22/20• UBISOFT ENTERTAINMENT• MICHAL.DROBOT• 3D @ FAR CRY 4
HRAA: Goals• Temporal Stability• High quality Edge Anti Aliasing• Super-sampling comparable to 4x RGSS• Shading cost of 1 sample / pixel• Performance ~1ms on PS4 / X1 @ 1080p resolution
HRAA: Overview• Stable Edge Anti-aliasing• Temporal Super-sampling• Temporal Anti-aliasing
Stable Edge AA• Morphological– SMAA [Jimenez 11]– FXAA [Lottes 09]• Analytical Edge AA– GBAA [Persson 11]– DEAA [Malan 10]• MSAA• EQAA / CSAA• Coverage Based– CRAA
Morphological• Pros:– Highest perceptual quality in static scenario– Catch All behaviour– Ease of integration– Uses rasterized data
Morphological: Frame A
Morphological: Frame B
Analytical: Frame A
Analytical: Frame B
Morphological• Cons:• 1.0-1.5 ms @1080p (PS4/X1)• Not temporally stable– Wobbles under motion• Partially solved– More expensive SMAAx4
Analytical• Pros:– Highest edge quality close to ground truth– Temporally Stable– Extends to Alpha Test (use Signed Distance Fields for best results)– Fast 0.3 ms @1080p (PS4/X1)
Analytical• Cons:– Complicated integration• Every G-Buffer shader outputs distance to edge• Geometry Shader / Direct Vertex access [Drobot 14]– Suffers from rasterization issues• Rasterization Order Dependent• Content dependent– Overtessellation effectively turns AA off– Does not AA intersecting triangles• [figure: edge distances d0, d1, d2] D = min3(d0, d1, d2)
Analytical: Distance to Vertical Edge
Analytical: Distance to Horizontal Edge
Analytical: No AA
Analytical: AA
Analytical AA
MSAA• Pros:– Converges to ground truth with number of samples– Resolves sub-pixel issues• Cons:– Memory footprint scales linearly with number of samples– Mesh rendering time scales with number of samples– Complex integration with deferred rendering
EQAA/CSAA• GPUs can decouple coverage samples from color/depth fragments– MSAA aided by cheap coverage samples = EQAA
Coverage Reconstruction AA• Use color fragment with additional coverage samples– Minimal cost• Reconstruct final image from coverage• Requires hardware capable of direct Coverage samples access• Following presentation based on AMD GCN architecture– Other IHVs also support coverage sampling
Basic concepts• Fragments– Rasterized values stored in memory– Dictate Buffer Memory Footprint– 1-8 in 2^N format• Samples– Rasterizer positions inside a pixel– Set on Rasterizer State Vector– 1-16 in 2^N format– Anchors - overlapping with Color/Depth Fragments
Basic concepts : Association Buffers• FMASK– Fragment Compression Buffer associated with Color Buffer– Stores association table between samples and color fragments– For every pixel stores• For every sample– Bit index of associated fragment– ( [1,2,4,8,16 samples][1,2,4 bit for color index] + 1 bit for UNKNOWN) per pixel• 4-sample/2-fragment = 4 * 2 = 8 bit• 8-sample/1-fragment = 8 * 1 = 8 bit• 16-sample/8-frag = 16 * 4 = 64 bit
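The per-pixel FMASK sizes listed above can be reproduced with a little arithmetic. The sketch below assumes each sample stores just enough bits to index a color fragment plus one UNKNOWN bit; the function name is hypothetical, and real hardware layouts may pad or pack differently.

```python
def fmask_bits_per_pixel(samples: int, fragments: int) -> int:
    """FMASK bits per pixel: every sample stores a fragment index plus one UNKNOWN bit."""
    # Bits needed to index `fragments` entries (0 bits when there is a single fragment).
    index_bits = (fragments - 1).bit_length()
    return samples * (index_bits + 1)


# The three configurations quoted on the slide.
print(fmask_bits_per_pixel(4, 2))    # 4-sample / 2-fragment
print(fmask_bits_per_pixel(8, 1))    # 8-sample / 1-fragment
print(fmask_bits_per_pixel(16, 8))   # 16-sample / 8-fragment
```

With these assumptions the three calls give 8, 8 and 64 bits, matching the slide.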
Example : Color/Depth : 2F 4S [figure sequence: samples 0-3 of the Color / Depth target, Color Fragments, FMASK]
CRAA Setup• MRT Setup– Color / Depth 1F xS• Pipeline– Gbuffer Render– Lighting– CRAA Resolve
CRAA• FMASK : 1F xS– 8 bit• X ∈ {1, 2, 4, 8}– 16 bit• X ∈ {16}– Bitwise• 0 – Fragment written to color buffer was ‘hit’ by sample• 1 – UNKNOWN – sample is associated with other Color Fragment– Immediately know Color Fragment ‘coverage’• (X - CountBits(FMASK[pixel])) / X– Can we infer associations of UNKNOWN samples?
8xCRAA: Example [figure sequence: sample positions 0-7, Color Fragments, FMASK, Neighbour Fragments U / B / L / R]
8xCRAA: Example FMASK• FMASK : 00010110• X = 8• RED Coverage = CountBits(~00010110) / 8 = 5/8• UNKNOWN– Infer them from neighbourhood– We know every Sample position
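The coverage arithmetic in this example can be checked with a minimal sketch. It assumes the 1F xS convention from the CRAA slide (bit 0 = sample hit the stored fragment, bit 1 = UNKNOWN); the function name is hypothetical.

```python
def fragment_coverage(fmask: int, num_samples: int) -> float:
    """Fraction of samples that hit the stored color fragment (bits set to 0)."""
    sample_bits = fmask & ((1 << num_samples) - 1)   # keep only the valid sample bits
    unknown = bin(sample_bits).count("1")            # bits set to 1 are UNKNOWN
    return (num_samples - unknown) / num_samples


coverage = fragment_coverage(0b00010110, 8)
print(coverage)  # three UNKNOWN samples out of eight leave 5/8
```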
8xCRAA: Simple Resolve• For every UNKNOWN sample– GetSamplePosition– Treat Sample Pos as vector– Add together• Sum defines an approximate equation of half plane dividing the pixel– Calculate Half Plane direction : Vertical / Horizontal– Calculate Half Plane slope– From Direction and Slope Infer UNKNOWN fragment• Up/Bottom• Left/Right• Resolved Pixel = Color Fragment * Coverage + (1-Coverage) * Inferred Fragment
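The simple resolve above could be sketched roughly as follows. The sample positions are a made-up 8-sample pattern (not the hardware one), the y axis is assumed to point down, and the slope step is omitted: only the dominant direction of the summed vector is used to pick a neighbour. All names are hypothetical.

```python
# Hypothetical 8-sample positions relative to the pixel centre (x right, y down).
SAMPLE_POS = [
    (-0.4375, -0.0625), (-0.3125, 0.3125), (-0.1875, -0.3125), (-0.0625, 0.1875),
    (0.0625, -0.1875), (0.1875, 0.4375), (0.3125, -0.4375), (0.4375, 0.0625),
]


def simple_resolve(fmask, color, up, bottom, left, right):
    """Blend the stored fragment with one neighbour inferred from UNKNOWN samples."""
    unknown = [SAMPLE_POS[i] for i in range(8) if (fmask >> i) & 1]
    if not unknown:
        return color                      # fully covered pixel, nothing to infer
    coverage = (8 - len(unknown)) / 8
    # Sum the UNKNOWN sample positions as vectors; the sum points towards
    # the side of the pixel that the other surface occupies.
    sx = sum(p[0] for p in unknown)
    sy = sum(p[1] for p in unknown)
    if abs(sx) >= abs(sy):                # mostly horizontal offset: left / right
        inferred = right if sx > 0 else left
    else:                                 # mostly vertical offset: up / bottom
        inferred = bottom if sy > 0 else up
    return tuple(coverage * c + (1 - coverage) * n for c, n in zip(color, inferred))


red, blue, grey = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.5, 0.5, 0.5)
# Samples 4-7 (right half of the hypothetical pattern) are UNKNOWN.
print(simple_resolve(0b11110000, red, grey, grey, grey, blue))
```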
8xCRAA: Simple Resolve [figure: sample positions 0-7, Color Fragments, FMASK, Neighbour Fragments U / B / L / R]
8xCRAA : In Practice [image comparison: 8xCRAA, 8xMSAA]
8xCRAA LUT• What about subpixel artifacts?• Can we eliminate them?• Can we get rid of ALU to be only BW bound?• Solution• Precompute an LUT to store neighbouring pixel weights– Use full neighborhood– Multiple edges / triangles crossing the pixel
8xCRAA LUT : Example [figure sequence: sample positions 0-7, Color Fragments, FMASK, Neighbour Fragments U / B / L / R]
8xCRAA LUT : In Practice• CLUT[256]– Every entry stores weights for UP, BOTTOM, LEFT, RIGHT neighbour sample– Weights are 4BIT – as maximum coverage can be 16– LUT is indexed directly by FMASK bit pattern• CLUT for 8S is 512 bytes : 256 * 4 * 4 bit– Fits Texture Cache Lines– Once primed lookups are for ‘free’• For Every FMASK entry– Precompute Optimal Neighbourhood Blending Scheme
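To make the table layout concrete, here is a minimal sketch of a 256-entry CLUT with four 4-bit weights per entry, packed into 512 bytes and indexed by the FMASK bit pattern. The sample positions and the "nearest edge neighbour" heuristic used to fill the table are assumptions for illustration only; the slide's "optimal blending scheme" is not specified, and all names are hypothetical.

```python
import struct

# Hypothetical 8-sample positions relative to the pixel centre (x right, y down).
SAMPLE_POS = [
    (-0.4375, -0.0625), (-0.3125, 0.3125), (-0.1875, -0.3125), (-0.0625, 0.1875),
    (0.0625, -0.1875), (0.1875, 0.4375), (0.3125, -0.4375), (0.4375, 0.0625),
]


def neighbour_of(pos):
    """Closest edge neighbour of a sample: 0 = UP, 1 = BOTTOM, 2 = LEFT, 3 = RIGHT."""
    x, y = pos
    if abs(x) >= abs(y):
        return 3 if x > 0 else 2
    return 1 if y > 0 else 0


def build_clut():
    """Pack 256 entries of four 4-bit weights (U, B, L, R) into 512 bytes."""
    entries = []
    for fmask in range(256):
        weights = [0, 0, 0, 0]
        for i in range(8):
            if (fmask >> i) & 1:                      # sample i is UNKNOWN
                weights[neighbour_of(SAMPLE_POS[i])] += 1
        up, bottom, left, right = weights
        entries.append(up | (bottom << 4) | (left << 8) | (right << 12))
    return struct.pack("<256H", *entries)


def lookup(clut, fmask):
    """Return the [U, B, L, R] weights stored for an FMASK bit pattern."""
    (packed,) = struct.unpack_from("<H", clut, fmask * 2)
    return [(packed >> shift) & 0xF for shift in (0, 4, 8, 12)]


def resolve(clut, fmask, color, up, bottom, left, right):
    """Blend the stored fragment with its neighbours using the table weights."""
    weights = lookup(clut, fmask)
    own = 8 - sum(weights)                            # samples that hit the fragment
    neighbours = (up, bottom, left, right)
    return tuple(
        (own * color[c] + sum(w * n[c] for w, n in zip(weights, neighbours))) / 8
        for c in range(len(color))
    )


CLUT = build_clut()
print(len(CLUT), lookup(CLUT, 0b10000000))
```

The resolve then needs only the FMASK read, one table fetch and a weighted blend, which is the ALU saving the slide is after.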
8xCRAA LUT : Example [figure: sample positions 0-7, Color Fragments, FMASK, Neighbour Fragments U / B / L / R; CLUT[01101111] = U 2, B 3, L 0, R 0]
8xCRAA LUT : In Practice• Simple resolve– Neighborhood prefetch– FMask read– LUT[FMask] read– Blend• Minimal overhead of coverage sampling– Your mileage may vary depending on HW, settings etc.
8xCRAA LUT : In Practice• AA triangle intersections• Sub-pixel quality varies– Better than Analytical methods based on single triangle• Non sub-pixel triangle quality equal to 8xMSAA– Correct resolve assuming all triangles cutting the pixel will rasterize in immediate neighborhood• Common fail case:– Triangle doesn’t rasterize in immediate neighborhood
[image comparison: 8xCRAA, 8xCRAA LUT]
Temporal Super Sampling• Based on Killzone: Shadow Fall [Valient 14]• Use current and previous frame for data (2 samples)– Use N-2 frame for Color flow test• N-1 Sample is valid only if:– Motion flow between frame N and N-1 is coherent– Color flow between frames N and N-2 is coherent• (note N-2 and N have same sub-pixel jitter)
[pipeline diagram: Frame N, Frame N-1, Frame N-2; Temporally Stable Edge Anti-aliasing; Temporal 2x Super-sampled resolve; Stable Super-sampled Frame]
Temporal Super Sampling• Tests use 3x3 neighborhood • Sum of Absolute Differences– For performance reasons => smaller window =>more conservative• GCN provides HW acceleration– SAD– QSAD– MQSAD– Packed LERP
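As a CPU-side illustration of the coherence test, the sketch below computes a Sum of Absolute Differences over a 3x3 window between two single-channel images. It is plain Python standing in for the hardware SAD instructions; the image layout (lists of rows) and the function name are assumptions.

```python
def sad_3x3(frame_a, frame_b, x, y):
    """Sum of absolute differences over the 3x3 window centred at (x, y)."""
    total = 0.0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            total += abs(frame_a[y + dy][x + dx] - frame_b[y + dy][x + dx])
    return total


frame_n = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
frame_n2 = [[0.0, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.0]]
print(sad_3x3(frame_n, frame_n2, 1, 1))  # only the centre pixel differs
```

A sample would be treated as coherent when this value stays under some threshold; the threshold itself is content dependent and not given in the slides.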
Temporal Super Sampling• If N-1 sample fails Geometric Metric– Interpolate from N• If N-1 sample fails Color Metric– Limit N-1 sample by N color bounding box– Improves stability– Brings in some new information• Maximize incoming information through advanced sampling patterns
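The two failure paths above can be sketched as a small decision function. This is a simplified single-channel version under stated assumptions: the metric results arrive as booleans, "interpolate from N" is reduced to returning the frame N value, and the final 2-sample resolve is a plain average. All names are hypothetical.

```python
def resolve_temporal(current, history, neighbourhood, motion_ok, color_ok):
    """Combine the frame N sample with the reprojected frame N-1 sample."""
    if not motion_ok:
        # Geometric metric failed: the N-1 sample is unusable, fall back to frame N.
        return current
    if not color_ok:
        # Color metric failed: limit the N-1 sample to the bounding box
        # (min / max) of the frame N neighbourhood.
        lo, hi = min(neighbourhood), max(neighbourhood)
        history = min(max(history, lo), hi)
    return 0.5 * (current + history)


window = [0.25, 0.5, 0.75]
print(resolve_temporal(0.25, 1.0, window, motion_ok=True, color_ok=False))
```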
Sampling Patterns : 1x Centroid
Sampling Patterns : 2x Rotated Grid
Sampling Patterns : 2x Quincunx
Sampling Patterns : 4x Rotated Grid
Sampling Patterns : 2x FLIPQUAD
Sampling Patterns : Comparison [images: 1x, FLIPQUAD, 4xRG]
FLIPQUAD: In Practice• [AMD 13] AMD_framebuffer_sample_positions• 2xMSAA – easy setup• Significantly higher quality than QUINCUNX at same cost [Laine 06]
Pattern | E |
---|---|
1x Centroid | >1.0 |
4x Uniform Grid | 0.698 |
4x Rotated Grid | 0.439 |
Quincunx | 0.518 |
FLIPQUAD | 0.364 |
Temporal FLIPQUAD• Split the pattern in half• Frame A (BLUE) renders one part, Frame B (RED) the second• Needs custom per pixel within quad resolve– Convenient blend on X or Y axis depending on frame• Pixel0 = avg(BLUE(0,1), RED(0,2))
Temporal FLIPQUAD: In Practice• Non uniform rasterization grid may result in filterable ‘jigsaw’ pattern
Temporal FLIPQUAD: In Practice• UVs need to be interpolated at SAMPLE positions for super-sampling– Use HLSL interpolator modifiers• sample float2 UV;• Not normalized spatial distances between rasterization samples => wrong derivative calculation• [figure labels: DDX = 1.0, DDY = 1.6; DDX = 0.4, DDY = 1.0]
Temporal FLIPQUAD: In Practice• Mip map selection needs special care:– Use tex2Dgrad with analytical gradients– Manually average gradients inside quad– Manually pick samples within quad for uniform gradients– Adjust sample order/positions to minimize temporal changes of distances• Default solution
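To see why un-normalized sample distances disturb mip selection, the sketch below uses the common isotropic LOD estimate (log2 of the larger UV gradient length in texels). It assumes a square texture and finite-difference gradients taken between two samples; dividing the UV delta by the real distance between the samples restores the intended level. Function names are hypothetical.

```python
import math


def mip_lod(duv_dx, duv_dy, texture_size):
    """Isotropic mip level estimate from per-pixel UV gradients."""
    rho = max(math.hypot(*duv_dx), math.hypot(*duv_dy)) * texture_size
    return math.log2(rho)


def normalized_gradient(delta_uv, sample_distance):
    """Turn a UV difference between two samples into a per-pixel gradient."""
    return (delta_uv[0] / sample_distance, delta_uv[1] / sample_distance)


# One texel per pixel on a 256x256 texture, but the two samples are 2 pixels apart.
raw_delta = (2.0 / 256.0, 0.0)
no_change = (0.0, 0.0)
print(mip_lod(raw_delta, no_change, 256))                            # one level too blurry
print(mip_lod(normalized_gradient(raw_delta, 2.0), no_change, 256))  # intended level
```

In general a sample distance of d pixels shifts the computed level by log2(d), so a distance of 1.6 would bias the LOD by roughly 0.68 of a mip level.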
Frame A: Correct Mip
Frame B: Oversharpened Mip
Frame A: Correct Mip, Reordered samples
Frame B: Correct Mip, Reordered samples
Temporal Anti Aliasing• History exponential buffer• Amortize sudden visual changes (flicker)• Accumulate as much new ‘important’ data as possible• Use frequency based acceptance metric• Operate on fresh data neighborhood (3x3 window)– History sample close to mean doesn’t bring new information– History sample further away brings more information– History sample too far might be a fluctuation• Use local minima / maxima for soft bounds
[pipeline diagram: Frame N, Frame N-1, Frame N-2; Temporally Stable Edge Anti-aliasing; FLIPQUAD Reconstruction & Temporal Anti-aliasing; Accumulation History Buffer; Stable Super-sampled Frame]
Temporal Anti Aliasing: In Practice• Use exponential history buffer for stabilization– Not robust enough for real sample accumulation (Super-sampling)– Impossible to keep uniform sample weights• With removal of stale data– Convergence impossible• Long history requires a lot of resampling• Leads to numerical diffusion– Overblurring
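The point about non-uniform sample weights follows directly from how an exponential history buffer accumulates. A minimal sketch, assuming a fixed blend factor `alpha` (the name and the value 0.5 are illustrative, not taken from the talk):

```python
def accumulate(history, current, alpha):
    """One exponential history update: move the history towards the new sample."""
    return history + alpha * (current - history)


def history_weights(alpha, frames):
    """Effective weight of each past frame in the buffer (index 0 = newest)."""
    return [alpha * (1.0 - alpha) ** k for k in range(frames)]


print(history_weights(0.5, 4))  # each older frame counts half as much as the next newer one
```

Because the weights decay geometrically, no choice of `alpha` weights the last few frames equally, which is why the buffer is used for stabilization rather than for true sample accumulation.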
Higher Order Resampling• Reprojection = resampling problem• Fractional (non-integer) offsets result in numerical diffusion• Especially evident in history buffers– Error accumulates over time• Equivalent to problem of advection in discrete fluid simulation
2nd Order Resampling: Mac Cormack• Mac Cormack Scheme [Dupont 03]• 1 – project value into future N+1• 2 – reproject back into N– Reprojected value has double accumulated error of projection method used• 3 – correct value by half accumulated error• 4 – project corrected value into N+1
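The four steps above can be tried on a 1D periodic signal. The sketch below is an illustration only, not the shader from the talk: the projection method is assumed to be linear interpolation with a constant half-pixel offset, and signal energy is used as a rough measure of how much blur 30 consecutive resamples introduce. All names are hypothetical.

```python
import math


def advect(values, offset):
    """Shift a periodic 1D signal by `offset` pixels using linear interpolation."""
    n = len(values)
    out = []
    for i in range(n):
        src = i - offset                  # where this pixel's value comes from
        i0 = math.floor(src)
        t = src - i0
        out.append((1.0 - t) * values[i0 % n] + t * values[(i0 + 1) % n])
    return out


def advect_corrected(values, offset):
    """Four-step error-corrected projection, as listed on the slide."""
    forward = advect(values, offset)      # 1: project into N+1
    back = advect(forward, -offset)       # 2: reproject back into N (double error)
    corrected = [v + 0.5 * (v - b) for v, b in zip(values, back)]  # 3: remove half the error
    return advect(corrected, offset)      # 4: project the corrected value into N+1


def energy(values):
    return sum(v * v for v in values)


signal = [math.exp(-((i - 16) ** 2) / 8.0) for i in range(64)]
plain, corrected = signal, signal
for _ in range(30):                       # 30 consecutive resamples
    plain = advect(plain, 0.5)
    corrected = advect_corrected(corrected, 0.5)

print(energy(signal), energy(plain), energy(corrected))
```

In this setup the corrected scheme retains more of the signal's energy than plain linear resampling while never exceeding the original, at the cost of three projections per frame instead of one.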
2nd Order Resampling : BFECC• Back and Forth Error Compensation and Correction [Selle 07]• 1 – project value into future N+1• 2 – reproject back into N– Reprojected value has double accumulated error of projection method used• 3 – correct projected value by half accumulated error
2nd Order Resampling: GPU BFECC• GPU Optimized BFECC• 1 – project value into future N+1• 2 – reproject back into N– Reprojected value has double accumulated error of projection method used• 3 – project reprojected value into N+1– Triple accumulated error• 4 – correct projected value by half accumulated error between projected and double projected value
• Use per Sample interpolation mode – super-sampling on texture data• Not normalized spatial distances between rasterization samples => wrong derivative calculation• Mip map selection needs special care– tex2Dgrad with analytical gradients• Set sample order to minimize temporal changes of distances• [image comparison: Bilinear : Continuous resampling 30 frames; Shader BFECC: Continuous resampling 30 frames]
HRAA: Final Implementation• Temporally Stable Edge Anti-aliasing– SMAA (Normal + Depth + Luma Predicated Thresholding)– CRAA– AEAA (GBAA)• Temporal FLIPQUAD Reconstruction combined with Temporal Anti-aliasing (TAA)– TFQ + TAA
HRAA: FC4 Final Implementation• Temporally Stable Edge Anti-aliasing– Non obvious choice• SMAA + AEAA on Alpha Test– Most reliable, reasonable performance• CRAA + AEAA on Alpha Test– Best performance, some content issues
[image comparison: 1x, TFQ, TFQ + AEAA, TFQ + CRAA, TFQ + SMAA]
Single Pass | Timing (ms) | GBuffer Overhead (%) |
---|---|---|
BFECC Single Value | 0.3 | N/A |
Temporal FLIPQUAD (TFQ) | 0.2 | N/A |
AEAA | 0.25 | <1% C |
8xCRAA | 0.25 | <8% HW/C |
SMAA | 0.9 | N/A |
TAA | 0.6 | N/A |
TFQ + TAA | 0.62 | N/A |
Full Method | | |
AEAA (Alpha Test) + 8xCRAA + TFQ + TAA | 0.9 | <3% HW/C |
SMAA + TFQ + TAA | 1.4 | |
HRAA: Hi Frequency Recovery• FLIPQUAD BOX resolve kernel– Results in 0.5 blur– Art direction ‘might’ find it objectionable• Super-sampling requires wider, complex kernel to preserve [Burley 07]:– Anti-aliasing– Hi-Frequency details
HRAA: Hi Frequency Recovery• 4-tap Sinc kernel can be approximated by:– Box Blur (FLIPQUAD resolve) – 0.5 pixel radius– Unsharp masking – 0.5 pixel radius• “Arguably” reconstruct detail– Will not introduce aliasing as long as it is inside window of reconstruction blur kernel– All information exists in various image frequencies• Negative Mip Bias on all textures to match Super-sampled resolution– Adds detail that would get resolved if super-sampled in single frame
HRAA: Hi Frequency Recovery [image comparison: 1x, FQ, FQ + Unsharp (oversharpened for effect)]
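Unsharp masking itself is simple to sketch in 1D: blur the signal, then add back a scaled copy of the detail the blur removed. The 3-tap box blur, the clamped borders and the `amount` parameter below are assumptions for illustration and do not reproduce the 0.5 pixel radius kernel from the slide.

```python
def box_blur(values):
    """3-tap box blur with clamped borders."""
    n = len(values)
    return [
        (values[max(i - 1, 0)] + values[i] + values[min(i + 1, n - 1)]) / 3.0
        for i in range(n)
    ]


def unsharp_mask(values, amount=1.0):
    """Add back `amount` times the detail removed by the blur."""
    blurred = box_blur(values)
    return [v + amount * (v - b) for v, b in zip(values, blurred)]


flat = unsharp_mask([1.0, 1.0, 1.0, 1.0])   # flat regions are left untouched
edge = unsharp_mask([0.0, 0.0, 1.0, 1.0])   # the step edge gains under/overshoot
print(flat, edge)
```

The overshoot around the edge is what reads as recovered sharpness, and also what becomes objectionable when the amount is pushed too far.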
MOVIE
HRAA: Summary• Temporally Stable• Comparable to 4x RGSS• Cut to fit your needs• Fast• Doesn’t solve all problems – sub-pixel content still problematic• Provides some new ideas and solutions to your AA toolbox
HRAA: Future Direction• Tile based Edge Anti Aliasing– Bigger neighborhood knowledge guarantees less fail cases– More complex Coverage analysis for better LUT• Augment SMAA with Coverage information• Explore more sampling patterns• Upsampling– Partially trade Super Sampling for higher resolution resolve
Q&A• Twitter: @MichalDrobot• hello@drobot.org• More details, samples and pictures in upcoming GPU Pro 6 article• GO grab it March 2015
References• [Akenine 03] Akenine-Moller T. 2003, “An Extremely Inexpensive Multisampling Scheme”.• [AMD 11] AMD 2011, “EQAA Modes for AMD 6900 Series Graphics Cards”.• [AMD 13] AMD, Alnasser M., Sellers G. 2013, “AMD_framebuffer_sample_positions”, OpenGL Extension Registry.• [Burley 07] Burley B. 2007, “Filtering in PRMan”, part of “Renderman Repository”.• [Drobot 14] Drobot M. 2014, “Low Level Optimizations for AMD GCN Architecture”, Digital Dragons 2014.• [Dupont 03] Dupont T., Liu Y. 2003, “Back and forth error compensation and correction methods for removing errors induced by uneven gradients of the level set function”.• [Jimenez 11] Jimenez J., Masia B., Echevarria J., Navarro F., Gutierrez D. 2011, “Practical Morphological Anti-Aliasing”, GPU Pro 2, AK Peters Ltd., 2011.• [Jimenez 12] Jimenez J., Echevarria J., Gutierrez D., Sousa T. 2012, “SMAA : Enhanced Subpixel Morphological Antialiasing”, EUROGRAPHICS 2012.• [Laine 06] Laine S. and Aila T. 2006, “A Weighted Error Metric and Optimization Method for Antialiasing Patterns”.• [Lottes 09] Lottes T. 2009, “FXAA”, NVIDIA Whitepaper Repository.• [Malan 10] Malan H. 2010, “Edge Anti-aliasing by Post-Processing”, GPU Pro 1, 2010.• [Persson 11] Persson E. 2011, “Geometric Buffer Antialiasing”, SIGGRAPH 2011.• [Selle 07] Selle A., Fedkiw R., Kim B., Liu Y., Rossignac J. 2007, “An Unconditionally Stable MacCormack Method”.• [Valient 14] Valient M. 2014, “Taking Killzone Shadow Fall Image Quality into the Next Generation”, Game Developers Conference 2014.
Special Thanks• In Random Order• Ubisoft 3D Teams:– Stephen Hill– Urlich Haar– Jeremy Moore– Bartlomiej Wronski• AMD:– Layla Mah– Chris Brennan• Microsoft:– David Cook• MY TURTLE• BEER