d-06 clock tree synthesis of smic40nm low leakage cortex a9 with cadence ccopt_brite semi.pdf

Upload: meenakshi-snmurthy

Post on 06-Jul-2018


  • 8/17/2019 D-06 Clock Tree Synthesis of SMIC40nm Low Leakage Cortex A9 with Cadence CCopt_Brite Semi.pdf

    1/12

    Clock Tree Synthesis of SMIC40nm Low Leakage Cortex A9 With Cadence CCopt

    Brite

    Arthur Liang, Titan Wang


    2/12

    Design overview

    •   ARM dual-core Cortex A9

    •   32K I-cache and 32K D-cache, includes NEON

    •   Uses the SMIC 40nm low-leakage process

    •   Implemented with Cadence CCopt


    3/12

    CCopt Flow Methodology

    Traditional EDI Balanced Clocks Flow

    RTL -> Synthesis -> Netlist -> Placement -> Pre-CTS Opt -> CTS -> Post-CTS Optimization -> Routing & Post-route opt -> GDSII

    New CCopt Flow

    RTL -> Synthesis -> Netlist -> Placement -> Pre-CTS Opt -> CCOpt (Clock Concurrent Optimization) -> Routing & Post-route opt -> GDSII


    4/12

    Traditional Timing Optimization

    [Figure: timing diagram with labels G[i-1]max, G[i]max, Tclock, P[i-1], P[i], P[i+1], and "Critical path"]

    Traditional EDI CTS Methodology

    • Many iterations
    • Excessive run time
    • Area explosion
    • Higher leakage

    Balanced CTS

    Unnecessary
    • No fundamental timing requirement that clocks need to be balanced

    Expensive
    • Clock buffer explosion to minimize skew
    • Other expensive options (e.g., mesh, spine, ...)

    Severe IR Drop
    • All flops/RAMs forced to trigger at the same time
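    The cost of balancing can be written down as a short derivation. This is only a sketch under assumptions the slide does not state: P[i] is taken to be the clock arrival time at register stage i, G[i]max the longest datapath delay from stage i to stage i+1, and T the clock period; setup time and clock uncertainty are ignored.

```latex
% Setup condition between adjacent stages (assumed symbol meanings, see above):
P[i] + G[i]_{\max} \le P[i+1] + T

% Balanced clocks force P[i] = P[i+1] for every i, so the condition collapses to
T \ge \max_i \, G[i]_{\max}
```

    Under these assumptions the slowest stage alone sets the clock period, and any spare time in neighboring stages cannot be used.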


    5/12

    CCopt - Clock Concurrent Optimization Flow

    Concurrent useful-skew and datapath optimization

    [Figure: timing diagram with labels G[i-1]max, G[i]max, Tclock, P[i-1], P[i], P[i+1], and "Time borrowing"]

    CCopt

    • Faster timing closure
    • Higher performance
    • Lower area
    • Lower leakage

    Lower IR Drop
    • Flops/RAMs triggered at different times
    • Critical and non-critical sinks are skewed

    Efficient
    • Significant reduction in clock buffers (no explicit requirement to balance the tree)

    MM/MC/OCV
    • Useful skew takes into account all timing aspects, including MM, MC, OCV, setup, and hold
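    The time-borrowing idea on this slide can be illustrated with a small calculation. This is a minimal sketch, not CCopt's algorithm: it assumes P[i] is the clock arrival time at stage i and G[i]max the longest datapath delay from stage i to stage i+1, ignores setup time and hold checks, and uses made-up delay values. The function name `setup_slacks` is hypothetical.

```python
from typing import List


def setup_slacks(arrivals: List[float], gmax: List[float], period: float) -> List[float]:
    """Setup slack of each stage i -> i+1: (P[i+1] + T) - (P[i] + G[i]max)."""
    return [
        arrivals[i + 1] + period - (arrivals[i] + gmax[i])
        for i in range(len(gmax))
    ]


# Hypothetical numbers in ns; none of these come from the slides.
period = 1.0
gmax = [1.2, 0.7]  # first stage is critical, second stage has spare time

balanced = [0.0, 0.0, 0.0]   # P[i-1] = P[i] = P[i+1]
skewed = [0.0, 0.25, 0.0]    # clock at the middle register arrives 0.25 later

balanced_slacks = setup_slacks(balanced, gmax, period)
skewed_slacks = setup_slacks(skewed, gmax, period)

print("balanced:", balanced_slacks)  # first stage violates setup
print("skewed:  ", skewed_slacks)    # both stages meet setup
```

    Delaying the middle clock gives the critical stage extra time and takes the same amount from the following stage, which had slack to spare. A real tool also has to respect hold constraints and multiple modes and corners, which this sketch leaves out.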


    6/12

    [Figure: clock with variable skew; labels Gmax, Gmax]


    7/12

    A9 CPU Snapshot


    8/12

    Reference CCopt Script

    # CCopt settings: clock buffer, inverter, and clock-gating cell lists,
    # slew and skew targets, and optimization options
    setCCOptMode \
        -cts_buffer_cells {BUF_X16B_A12TR40 BUF_X13B_A12TR40 BUF_X11B_A12TR40 BUF_X6B_A12TR40} \
        -cts_inverter_cells {INV_X16B_A12TR40 INV_X13B_A12TR40 INV_X11B_A12TR40 INV_X6B_A12TR40} \
        -cts_clock_gating_cells {PREICG_X11B_A12TR40} \
        -cts_target_slew 0.08 \
        -cts_target_nonleaf_slew 0.08 \
        -cts_target_skew 0.15 \
        -io_opt off \
        -ccopt_auto_limit_insertion_delay_factor 1.2 \
        -ccopt_enable_downsizer true \
        -erc fix \
        -cts_use_inverters true


    10/12

    CCopt Clock Tree Summary

    STA Timing Summary With CCopt

    STA Timing Summary With Traditional CTS Flow

    Clock Tree Name : "CLK"

    Clock Period : 1.10000

    Number of Levels : 21

    Number of Sinks : 54562

    Number of CT Buffers : 1262

    Total Area of CT Buffers : 4689.66

    Max Global Skew : 0.2268

    Clock Tree Name : "CLK"

    Clock Period : 1.10000

    Number of Levels : 20

    Number of Sinks : 54562

    Number of CT Buffers : 1178

    Total Area of CT Buffers : 4716.22

    Max Global Skew : 0.1356
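    The two clock tree summaries can be compared side by side with a few lines of arithmetic. The sketch below copies the values in the order they are listed; the extracted text does not say which summary belongs to which flow, so they are labeled only `first` and `second`. The helper `relative_change` is a hypothetical name.

```python
# Values copied from the two clock tree summaries, in the order listed.
# Which flow produced which summary is not stated in the extracted text.
first = {
    "levels": 21,
    "sinks": 54562,
    "buffers": 1262,
    "buffer_area": 4689.66,
    "max_global_skew": 0.2268,
}
second = {
    "levels": 20,
    "sinks": 54562,
    "buffers": 1178,
    "buffer_area": 4716.22,
    "max_global_skew": 0.1356,
}


def relative_change(a: float, b: float) -> float:
    """Return the change from a to b as a percentage of a."""
    return 100.0 * (b - a) / a


deltas = {key: relative_change(first[key], second[key]) for key in first}

for key, pct in deltas.items():
    print(f"{key}: {first[key]} -> {second[key]} ({pct:+.2f}%)")
```

    The second summary has fewer buffers and lower global skew but a slightly larger total buffer area, so buffer count and buffer area do not move in the same direction between the two runs.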


    11/12

    Conclusion

    •   CCopt is able to determine the proper clock offsets, instead of manually skewing a clock in an iterative process

    •   Increased the A9 CPU frequency

    •   Can reduce clock tree buffers

    •   CCopt internally makes tradeoffs between timing/power/schedule


    12/12

    Thanks