Building Blocks

The CLB is the primary block. Each CLB slice contains LUTs, carry logic, muxes, and FFs. (For Xilinx, the cited figures are eight 6-input LUTs, an 8-bit carry chain, and 16 FFs.) A LUT has no gate or storage element; it is purely combinational. LUTs can be cascaded to add delay and balance pipeline timing. A Xilinx SLICEM is a super slice with extra functions: its LUTs can be used as distributed RAM or as 32-bit shift registers.

How can you add a simple latch to provide clock relief? There is no flip-flop or latch IP block, but you can create this with the "RAM-based shift register" and customize the output and control signals.

Buffers

An IBUFG drives a global clock net from an external pin. A BUFG drives a global clock net from an internal signal. A BUFH drives a horizontal clock row within a single clock region, so you wouldn't use it for global nets.

Memories

Distributed Memory

The Distributed Memory Generator IP core creates a variety of memory structures using Select RAM. It can be used to create Read Only Memory (ROM), single-port Random Access Memory (RAM), and simple dual-port or dual-port RAM, as well as SRL16-based RAM.

FIFOs

The built-in block RAM and FIFO primitives in the 7 Series FPGA can be used to implement RAMs, ROMs, and FIFO blocks for a design. The block RAM and FIFO are optimized for performance and allow you to implement a RAM, ROM, or FIFO block in a design without requiring large amounts of fabric resources from slice logic.

Timing Introduction

Xilinx recommends the "UltraFast" design methodology which maps timing in this order:
What is skew?
What is slew?
Which process corner(s) are used for setup and hold in IDELAY component modes?
What is static timing analysis?
What is I/O pin delay?
What is the general timing constraints flow?

The "UltraFast" Design Methodology

Clock pins
Data pins
I/F control
How to reach closure and reduce build time?
Build Work Flow

At Synthesis (after Synthesis run is complete, but before Implementation is run):
At Implementation (after Implementation run is complete):
Clocks

Clocks of the same frequency can still differ in phase. Note that the Vivado default timescale is ns. The clock object database is flat, with no concept of hierarchy.

A primary clock enters through an input port, sourced externally.
A forwarded clock is an internal FPGA logic clock that is driven to an output pin, used as a reference for other outputs.
Physically exclusive clocks have the same source point and the same clock tree.
Logically exclusive clocks have different source points and share part of the clock tree.
A virtual clock is declared with the Tcl create_clock command with no source object specified. It has no physical connection to ports or pins and does not really exist in the design; it represents a clock external to the FPGA and is used for I/O delay constraints.

https://support.xilinx.com/s/article/55287?language=en_US

You may get this warning about a perfectly valid clock name and set_clock_groups command:

[Vivado 12-4739] set_clock_groups:No valid object(s) found for '-group clk_fpga_1'. ["/mnt/newHHD/sandbox/midnight_fw/fpga_build/midnight_FPGAsrc/VSG_xilinx_2ch/VSG_xilinx.srcs/constrs_1/new/VSG_xilinx.xdc":312]

This may be because out-of-context (OOC) synthesis is being used. Once synthesis is complete, try running the same set_clock_groups command and see whether the warning is still there. Accordingly, some have suggested it can be ignored for synthesis.

Inter/Intra

A number of timing constraints apply to flip-flops (or other elements) that are all relative to the same clock signal. These are called intra-clock constraints (or intra-clock timing). Timing paths or constraints that cross from one clock to another are called inter-clock constraints (or inter-clock timing). For example, you can have a skew defined for one clock (intra-clock skew), or you can define the skew between two or more different clocks (inter-clock skew).
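The primary, virtual, and forwarded clock types listed under Clocks above might be declared in an XDC file roughly as follows. This is only a sketch: the port names (sys_clk, din, clk_out), the ODDR instance name (clk_fwd_oddr), the 10 ns period, and the delay values are all assumptions made for illustration, not taken from any real design.

```tcl
# Primary clock: enters on an input port, sourced off-chip.
# Port name sys_clk and the 10 ns period are assumed.
create_clock -name sys_clk -period 10.000 [get_ports sys_clk]

# Virtual clock: create_clock with no source object, so it is not
# attached to any port or pin in the design.
create_clock -name virt_clk -period 10.000

# I/O delay constraints referenced to the virtual clock
# (port din and the delay values are hypothetical).
set_input_delay -clock virt_clk -max 2.000 [get_ports din]
set_input_delay -clock virt_clk -min 0.500 [get_ports din]

# Forwarded clock: a generated clock defined on the output port,
# assumed here to be driven by an ODDR instance named clk_fwd_oddr.
create_generated_clock -name fwd_clk \
    -source [get_pins clk_fwd_oddr/C] \
    -divide_by 1 [get_ports clk_out]
```

Defining the forwarded clock on the output port lets output delay constraints for the related data pins be written against fwd_clk rather than the internal clock.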
Suggest fixing inter-clock timing violations first.

Managers

MMCMs and PLLs live on CMTs (clock management tiles). The MMCM and PLL share many characteristics. Both can serve as a frequency synthesizer for a wide range of frequencies and as a jitter filter for incoming clocks. At the center of both components is a voltage-controlled oscillator (VCO), which speeds up and slows down depending on the input voltage it receives from the phase frequency detector (PFD).

Visualizing Clocks

Handy Tcl command to show system clock schematic:

show_schematic [get_cells -hierarchical -filter { PRIMITIVE_TYPE =~ CLK.*.* } ]

Groups

set_clock_groups is a commonly used constraint that can associate or dissociate clocks. Paths between dissociated clocks are treated as false paths. It has three mutually exclusive arguments: -physically_exclusive, -logically_exclusive and -asynchronous. What are the differences between the three arguments? The three clock relationships originated from SDC and have different impacts on SI crosstalk analysis. From an FPGA timing analysis perspective, the impact is the same.

-asynchronous
-logically_exclusive
-physically_exclusive

FREQ_HZ Disagreements

[BD 41-237] Bus Interface property FREQ_HZ does not match between /ecc_proxy_ip_0/ECC_S00_AXI(180000360) and /processing_system7_0/M_AXI_GP1(180000000)

For errors like this, open the IP block for editing. Then go to the Ports and Interfaces listing and edit the interface in question. On the Parameters tab of the box that opens up, you can change the FREQ_HZ value.

PS FCLKs

The clocks FCLK0-FCLK3 come from the PS side of the Zynq and are available for clocking on the PL side. It is convenient to use these FCLKs. However, the jitter associated with these clocks is considerably higher than the jitter for clocks normally used for Programmable Logic (PL). Specifically, the set_input_jitter constraint shown on page 93 of PG082 indicates that FCLK jitter is 0.6 ns Pk-Pk. Good clocks have jitter well below 0.1 ns Pk-Pk.
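As a rough sketch, the 0.6 ns figure quoted above could be applied to an FCLK like this. The clock name clk_fpga_0 is an assumption; list the clocks in your own design first and substitute the name that Vivado reports.

```tcl
# List the clocks Vivado knows about, to find the actual FCLK clock name.
report_clocks

# Apply 0.6 ns peak-to-peak input jitter to the FCLK.
# clk_fpga_0 is an assumed name; replace it with the one reported above.
set_input_jitter [get_clocks clk_fpga_0] 0.600
```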
The higher jitter of the FCLKs could limit the run/clocking speed for your PL-side applications. When you correctly specify clock jitter with the set_input_jitter constraint and correctly specify clock period with the create_clock constraint, Vivado timing analysis will tell you whether your PL-side application will operate properly. -Mark

Slack Timing Analysis

When the slack is positive, timing is said to be MET:

Slack (MET) : 0.111ns (arrival time - required time)

If negative, timing is said to be VIOLATED:

Slack (VIOLATED) : -0.633ns (required time - arrival time)

The parenthetical shows which analysis produced the line: setup slack is reported as required time - arrival time, and hold slack as arrival time - required time.

Paths

The path characteristics fall into four main categories: timing, logic, physical, and property. You can find the definition of each characteristic in the command's long help.

Tcl Command: report_design_analysis -help

Timing

The timing path requirement is typically one clock period for setup/recovery analysis and 0 ns for hold/removal analysis, when the startpoint and endpoint are controlled by the same clock or by clocks with no phase shift. When the path is between two different clocks, the requirement corresponds to the smallest positive difference between any source and destination clock edges. This value is overridden by timing exception constraints such as multicycle path, max delay and min delay. Paths with a setup requirement under 2 ns are difficult to meet and must be avoided in general, especially for the older architectures.

The Path Delay = Logic Delay + Net Delay information details the total datapath delay. If the Logic Delay makes up an unusually high proportion of the total datapath delay, for example 50% or higher, it is advised to examine the datapath logic depth and the types of cells on the logic path, and possibly modify the RTL or synthesis options to reduce the path depth or use cells with faster delays.
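The slack and path analysis described above can be started from the Tcl console with an implemented (or synthesized) design open. The commands below are a minimal sketch; the path counts (10 and 1000) are arbitrary choices for illustration.

```tcl
# Overall setup and hold summary for the design.
report_timing_summary -delay_type min_max -max_paths 10

# The ten worst setup paths, each with its Slack line and the
# logic delay / net delay breakdown of the datapath.
report_timing -delay_type max -max_paths 10 -sort_by slack

# Timing, logic, physical, and property characteristics of the
# worst setup paths.
report_design_analysis -timing -setup -max_paths 10

# Distribution of logic levels over the 1000 worst paths, useful
# for spotting paths where logic depth is the problem.
report_design_analysis -logic_level_distribution -logic_level_dist_paths 1000
```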
If the Net Delay dominates the total path delay for a setup path where the Requirement is reasonable, it is advised to analyze some of the physical characteristics and property characteristics of the path listed in this section. Specific items to look at include the High Fanout and Cumulative Fanout characteristics, to understand whether some nets of the path have a high fanout that could be causing a placement problem.

What is high number of logic levels?

The Vivado Design Suite router prioritizes fixing hold over setup. This is because your design may still work in the lab if you are failing setup by a small amount, and there is always the option of lowering the clock frequency. If you have hold violations, the design will most likely not work.

Floorplanning

Floorplanning can improve the setup slack (TNS, WNS) by reducing the average route delay. During implementation, the timing engine works on resolving the worst setup violations and all the hold violations. Floorplanning can only improve setup slack.

Strategies

Note that the qualifiers Low and High on the NetDelay strategies refer to the priority level, not the delay amount. So if you need shorter net delays, pick NetDelay High.

External Device Interface Timing

Example from Xilinx course: Estimate if the AD5404 500-Mbps interface can meet the timing for this interface. Compare the input data width and the minimum data window required.

The minimum input data window is 1.2 ns wide as determined from the AD5404 datasheet.

If TIME mode is used with a 1173-ps delay, the data window required by the FPGA is 1.661 ns:
- Total slack available is 1.2 - 1.661 = -0.461 ns.
- Negative slack implies this will not meet the timing.

At 500 Mbps, the maximum possible input data window is 1/500 MHz = 2 ns. This is sufficiently wide to meet the timing. However, the AD5404 consumes 40% of this window for its timing uncertainties, making it more challenging to meet in Static Component / TIME mode.
If COUNT mode is used with a 255-tap delay, the data window required by the FPGA is 1.115 ns:
- Total slack available is 1.2 - 1.115 = 0.085 ns.
- Since the input data is center aligned (at the input and within the FPGA), this positive slack divides equally between setup slack and hold slack. This will meet timing with about 42 ps of slack each for setup and hold.

For the 500-Mbps interface, timing can be closed by using COUNT mode.

Estimate if the 900-Mbps interface of the AD5409 can meet the timing for this interface with dynamic delay adjustment. Compare the input data width and the minimum data window required.

The minimum input data window is 0.460 ns wide (0.672 ns typical) as found from the AD5409 datasheet. The worst-case data window required by the FPGA is 0.887 ns (Slow corner). There is no positive slack to meet the timing (0.460 - 0.887 = -0.427 ns).

At 900 Mbps, the maximum possible input data window is 1/900 MHz = 1.11 ns. This is sufficiently wide to meet the timing. However, the AD5409 consumes 58% of this window for timing uncertainties, making it more challenging to meet in Component mode. Native mode would need to be considered to meet this timing.

UltraScale Family

Virtex and Kintex are available as UltraScale (US), but UltraScale+ (US+) also has Zynq and Artix. Artix is the low-power low-cost chip, which has double the power efficiency and bandwidth of the 7-series Artix chips. Has three IOB options:
HBM Gen 2 has the highest DRAM bandwidth available, using SSI technology. Has a hard AXI I/F controller and supports CCIX. Routing delay now dominates overall delay and clock skew consumes more margin than before. Two slice types, SLICEL (LUTs, MUX, CLA) and SLICEM (RAM or 32-bit SR). The traditional unimacros library is not supported for US; use XPM instead.

RAM Options

Distributed RAM is created with LUTs and is faster/smaller than BRAM but increases chip utilization.
BRAMs are built into the fabric and can be dynamically powered down when not in use, with persistent contents. Up to 36 Kb per block. Can be cascaded and has integrated error correction. Often used for FIFO implementation.
UltraRAM is for larger amounts of data. 288 Kb/block (72 bits by 4096 deep). Optional 64-bit ECC and sleep mode. 16 URAM blocks per clock region per column. Rule of thumb for design is to target URAM only if you need 144 Kb or more (more than four BRAMs).

I/O Resources

Types are IOB, ILOGIC/ISERDES, OLOGIC/OSERDES, IODELAY. For DDR4, up to 2400 Mb/s speed. Component mode vs Native mode:

Migration from 7-series Designs to US+

Two options for IP, managed or project local. The former is preferred. A managed migrating IP project needs to be moved before the FPGA design is moved. For clock optimization, consider merging clocks that share frequency and phase, removing redundant buffers, and replacing an MMCM with BUFGCE_DIV or BUFG_GT for simple divided clocks. In timing analysis, intra-clock "partial false paths" may be reported due to native IP generating false path specs, which is fine.

Logic and Digital Electronics Principles

http://en.wikipedia.org/wiki/Four_value_logic

Note that you cannot simply route bi-directional "pass-thru" lines through an FPGA between devices that need to talk to one another directly. You can't just use the FPGA as a wire connector.
This works fine for unidirectional signals, but the problem with bi-directional signals is that the FPGA has an input or output buffer for each pin, or a dual buffer for bi-directional lines. A control signal sets whether the pin is driven in or out, and if the FPGA doesn't know which device is supposed to be driving the line at a given instant, it won't know what to do. You'd have to create a state machine in the fabric to handle this. A good illustration of this is I2C, because it has a bi-directional data/address line, SDA.

Related to this issue, inout ports are an interesting special case. They are typically used only in the top level component at the chip edges. You'll need an IO buffer that can tri-state the pin, such as this VHDL example:

    IOBUF_inst : IOBUF
    generic map (
       DRIVE => 12,
       IOSTANDARD => "DEFAULT",
       SLEW => "SLOW")
    port map (
       O  => RIGHT_CONNECTOR_A1_O,  -- Buffer output
       IO => RIGHT_CONNECTOR_A(1),  -- Buffer inout port (connect directly to top-level port)
       I  => RIGHT_CONNECTOR_A1_I,  -- Buffer input
       T  => RIGHT_CONNECTOR_A1_T   -- 3-state enable input, high=input, low=output
    );