# **CASPER Workshop**

# **Tutorial 2: 10GbE Interface**

**Dev. By:** Jason Manley and Andrew Martens

**Doc. By:** Irappa M. Halagali, GMRT/NCRA/TIFR

Expected completion time: 2 hrs

#### Contents:

1. The Hardware and Software required for this tutorial
2. Introduction
3. Background
4. Setup
5. Creating your design
6. Control
7. Receiving data on the PC using CX4 cable
8. Looking at the data received
9. Conclusion

# 1 The Hardware and Software required for this tutorial

1. PC: Dell, Intel(R) Core(TM) i3 CPU 530 @ 2.93GHz, 64-bit, 4GB RAM
2. OS: Linux 2.6.35-30-generic #54-Ubuntu 10.10 SMP x86\_64 GNU/Linux
3. Matlab: 2008a
4. Xilinx: version 11.5
5. Casper: gits\_100511
6. corr package: corr-0.6.5
7. Python: 2.6
8. minicom: version 2.4 (compiled on Jun 3 2010)
9. Wireshark: running on Linux 2.6.35-30-generic, with libpcap version 1.1.1, GnuTLS 2.8.6, Gcrypt 1.4.5
10. TCPdump: tcpdump-4.1.1
11. GULP: version 2.0, January 2004
12. iperf: version 2.0.4 (7 Apr 2008) pthreads
13. jperf: version 2.0.2
14. ROACH unit: version 1.0 Rev 3 2009, uboot: <u>uboot-2010-07-15-r3231-dram</u>, Linux kernel image: <u>uImage-jiffy-20091110</u>
15. 10GbE Myricom card installed in the PC, with drivers

# 2 Introduction

In this tutorial, you will create a simple Simulink design which uses the ROACH's 10GbE ports to send data at high speed to another port. The receiving port could just as easily be on another ROACH board or on a computer with a 10GbE network interface card. In addition, we will learn to control the design remotely, using a supplied Python library for KATCP.

# 3 Background

ROACH boards have four CX-4 ports. There are two 156.25MHz crystals on the board. Each one clocks two ports: one clocks ports 0 and 1, the other ports 2 and 3. This clock is then multiplied up on the FPGA by a factor of 20. Each port has 4 channels running in parallel (hence the digit in CX-4). Thus, the speed on the wire is actually $4 \times 156.25\,\text{MHz} \times 20 = 12.5\,\text{Gbps}$.
However, 10GbE uses 8b/10b encoding, which means that for every byte sent, 10 bits are actually transmitted. This is to ensure proper clocking, since the receiver recovers and locks on to the transmitter's clock and requires edges in the data. Imagine transmitting a string of 0xFF (0b11111111...), which would otherwise generate a DC level on the line. With the encoding, two extra bits are introduced, including a zero bit which the receiver can use to recover the clock and the byte boundaries. See <a href="http://en.wikipedia.org/wiki/8b/10b_encoding">http://en.wikipedia.org/wiki/8b/10b\_encoding</a> for more information. For this reason, we actually get 12.5Gbps \* 8/10 = 10Gbps usable data rate.

CASPER's 10GbE Simulink core sends and receives UDP over IPv4 packets. These IP packets are wrapped in Ethernet frames. Each Ethernet frame carries 38 bytes of overhead (preamble, header, frame check sequence and inter-frame gap), IPv4 requires another 20 bytes and UDP a further 8. So, for each packet of data you send, you will incur a cost of at least 66 bytes. I say at least, because the core will zero-pad some headers to be on a 64-bit boundary. You will thus never achieve 10Gbps of usable throughput. The best we have managed without loss is ~9.5Gbps. It pays to send larger packets if you are trying to get higher throughputs.

The maximum payload length of the CASPER 10GbE\_v2 core is 8192 bytes (implemented in BRAM) plus another 512 bytes (implemented in distributed RAM), which is useful for an application header.

These ports (and hence part of the 10GbE cores) run at 156.25MHz, while the interface to your design runs at the FPGA clock rate (*sys\_clk*, *adcX\_clk* etc). The interface is asynchronous, and buffers are required at the clock boundary. For this reason, even if you send data between two ROACH boards which are running off the same hard-wired clock, there will be jitter in the data. A second consideration is how often you clock values into the core when you try to send data.
If your FPGA is running faster than the core, and you try to clock data in on every clock cycle, the buffers will eventually overflow. Likewise for receiving: if you send too much data to a board and cannot clock it out of the receive buffer fast enough, the receive buffers will overflow and you will lose data. In our design, we are clocking the FPGA at 200MHz, with the cores running at 156.25MHz. We will thus not be able to clock data into the TX buffer continuously for very long before it overflows. If this doesn't make much sense to you now, don't worry, it will become clear after you've tried it.

# 4 Setup

The lab at the workshop is pre-configured with the CASPER libraries, Matlab and Xilinx tools. Please refer to the file "LOCATIONSandFILES.pdf" in the home/Desktop area, or to the LOCATIONSandFILES slides displayed, for the locations, directories and files required in the tutorial.

Note: The date and time portion of the BOF file name will be different! It depends upon when (date and time) you compile your model file.

Note: All the following cable connections and entries in the /etc/\* files of the workshop PCs are already done.

1. Connect the serial port cable between the ROACH board's P2 connector and the serial port of the PC (on which the minicom program is installed).
2. Connect the Ethernet cable from the PC's eth1 port to the J25 port of the ROACH board. The /etc/ethers file should contain the MAC address and the corresponding IP address. In the /etc/network/interfaces file, eth1 should be configured. The file /etc/hosts should contain an entry with the IP address and the corresponding ROACH board (host) name.
3. Connect the CX4 10GbE cable in one of two ways:
   a. To test 10GbE data transmission and reception in loopback mode: connect the CX4 10GbE cable between Port 0 (rightmost bottom if we view the ROACH unit from the back side) and Port 3 (bottom one, adjacent to the J25 connector).
   b. To test 10GbE data transmission and reception between the ROACH unit and the PC: connect the CX4 10GbE cable between Port 0 (rightmost bottom if we view the ROACH unit from the back side) and the connector on the 10GbE Myricom card installed in the PC.
4. Choose one of the following:
   - copy the mdl file "[TUT2\_MDL\_FILE]" from the area "[STD\_MDL\_DIR]";
   - follow the steps given below to create an mdl file similar to "[STD\_MDL\_DIR]/[TUT4\_MDL\_FILE]", after creating your own directory at "[USER\_DIR]" in which to save and compile your model file;
   - use the bof file kept at "[FPGA\_PROG\_BOF\_DIR]/[TUT4\_BOF\_FILE]" to program the FPGA directly (using the Python script explained in "Control") and look at the results.
5. Start Matlab:

   \$ cd [MATLAB\_START\_DIR]

   [MATLAB\_START\_DIR]\$ ./[MATLAB\_START\_FILE] &

# 5 Creating Your Design

In this tutorial, a counter will be transmitted through one CX-4 port and back into another. This allows the communications link to be tested for performance and reliability. The same test can be used to check the link between boards and the effect of different cable lengths on communications quality.

## 5.1 Create a new model

Start Matlab and open Simulink (either by typing 'simulink' on the Matlab command line, or by clicking on the Simulink icon in the taskbar). Create a new model and add the *Xilinx System Generator* and *XSG core config* blocks as in Tutorial 1. Specify *sys\_clk\_2x* (200MHz clock derived from the 100MHz onboard clock) as the clock source in the *XSG core config* block.

## 5.2 Add reset logic

Illustration 1: Reset logic

A very important piece of logic to consider when designing your system is how and when resets occur, and what happens during them. In this example we shall control our resets via a software register. We shall have two independent resets: one for the 10Ge cores, which shall be used initially, and one for the user logic, which may be used more often to restart the user part of the system.
Construct the reset circuitry as shown in Illustration 1.

### 5.2.1 Add a software register

Use a software register yellow block from the *BEE\_XPS System Blockset* and rename it to rst. Configure the *I/O direction* to be *From Processor*. Attach a *Constant* block from the *Simulink->Sources* section of the Simulink Library Browser to the input of the software register and make the value 0.

### 5.2.2 Add Slice blocks

Add two *Slice* blocks from the *Xilinx Blockset*. Configure the one to be used to reset the 10Ge core to use the second least significant bit, as shown in Illustration 2.

Illustration 2: core\_rst setup

The *Slice* block that uses the least significant bit is configured in a similar way, except that the *Offset of bottom bit* should be 0.

### 5.2.3 Add Goto blocks

Add two *Goto* blocks from *Simulink->Signal Routing*. Configure them to have the tags as shown (*core\_rst* and *cnt\_rst*). These tags will be used by associated *From* blocks (also found in *Simulink->Signal Routing*) in other parts of the design. They help to reduce clutter in your design and are useful for control signals that are routed to many destinations. They should be used sparingly for data signals, as they make it harder to see how data flows through the system.

## 5.3 Add 10Ge and associated registers for data transmission

We will now add the 10Ge block to transmit a counter at a programmable rate.

### 5.3.1 Add a 10Ge block for data transmission

Illustration 3: 10Ge transmit block configuration

Add a *ten\_GbE\_V2* yellow block from the *BEE\_XPS System Blockset*. It will be used to transmit data; we shall add another later to receive data. Rename it gbe0. Double-click on the block to configure it and set it to be associated with CX-4 port 0. If your application can guarantee that it will be able to use received data straight away (as our application can), shallow receive buffers can be used to save resources.
This optimisation is not necessary in this case as we will use only a small fraction of the resources in the FPGA.

### 5.3.2 Add registers to provide the target IP address and port number

Add two yellow-block software registers to provide the destination IP address and port number for transmission with the data. Name one dest\_ip and the other dest\_port. The registers should be configured to receive their values from the processor. Connect them to the appropriate inputs of the gbe0 10Ge block as shown. A *Slice* block is required to use the lower 16 bits of data from the dest\_port register. *Constant* blocks from *Simulink->Sources* with 0 values are attached to the simulation inputs of the software registers. The destination port and IP address are not important in this system as it is a loopback example.

Add a *From* block from *Simulink->Signal Routing* and set the tag to *core\_rst*. This enables one to reset the block.

## 5.4 Create a subsystem to generate a counter to transmit as data

We will now implement logic to generate a counter to transmit as data.

Illustration 4: 10Ge transmission block and registers

## 5.5 Construct a subsystem for data generation logic

It is often useful to group related functionality and hide the details. This reduces drawing space and the complexity of the logic on the screen, making it easier to understand what is happening. Simulink allows the creation of *Subsystems* to accomplish this. These can be copied to places where the same functionality is required, or even placed in a library for use in other projects and by other people.

To create a subsystem, one can highlight the logical elements to be encapsulated, then right-click and choose *Create Subsystem* from the list of options.
You can also simply add a *Subsystem* block from *Simulink->Ports & Subsystems*.

Subsystems inherit variables from their parent system. Simulink also allows one to create a variable whose scope is only a particular subsystem. To do this, right-click on a subsystem and choose the *Create Mask* option. The mask created for that particular subsystem allows one to add parameters that appear when you double-click on the icon associated with the subsystem. The mask also allows you to associate an initialisation script with a particular subsystem. This script is called every time a mask parameter is modified and the *Apply* button is clicked. It is especially useful if the internal structure of a subsystem must change based on mask parameters. Most of the interesting blocks in the CASPER library use these initialisation scripts.

Drop a subsystem block into your design and rename it *pkt\_sim*. Then double-click on it to add logic.

## 5.6 Add a counter to generate a certain amount of data

Add a *Counter* block from *Xilinx Blockset->Basic Elements* and configure it to be unsigned, free-running, 32-bits, incrementing by 1 as shown. Add a *Logical* block, a *software register* and a *Constant* block as shown. In simulation this circuit will generate a counter from 0 to 49 and then stop counting. This will allow us to generate 50 data elements before stopping.

## 5.7 Add a counter to limit the data rate

As mentioned earlier in this tutorial, it is impossible to supply data to the 10Ge transmission block at the full clock rate. This would mean transmitting a 64-bit word at 200MHz, and the 10Ge standard only supports up to 156.25MHz data transmission. We thus want to generate data in bursts such that the transmission FIFOs do not overflow, and add circuitry to limit the data rate as shown in Illustration 6. The logic that we have added on the left generates a reset at a fixed period determined by the software register. This will trigger the generation of a new packet of data as before.
In simulation this allows us to limit the data rate to 50/200 \* 200MHz = 50MHz, that is, an average of 50 million 64-bit words per second. Using these values in actual hardware would limit the payload data rate to 50MHz \* 64 bits = 3.2Gbps.

Illustration 5: Data count configuration

Illustration 6: Payload counter and period counter

## 5.8 Finalise logic including counter to be used as data

We will now finalise the data generation logic as shown in Illustration 7. To save time, use the existing logic provided with the tutorial. Counter1 in the illustration generates the actual data to be transmitted, and the enable register allows this data stream to the transmitting 10Ge core to be turned off and on. Logic linked to the eof output port indicates to the 10Ge core that the final data word for the frame is being sent. This will trigger the core to begin transmission of the frame of data using the IP address and port number specified.

Illustration 7: Full data generation logic

To limit verbosity, only the major concepts and specific blocks of Tutorial 2 are discussed in the following pages.

## 5.9 Receive blocks and logic

The receive logic is composed of another 10Ge yellow block with the transmission interface inputs all tied to 0. No transmission is to be done, but Simulink requires all inputs to be connected. Connecting them to 0 should ensure that the transmission logic for this 10Ge block is removed during synthesis.
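The rate-limiting arithmetic of section 5.7 can be checked with a few lines of Python. This is only a sketch under simplifying assumptions: it looks at the long-term average word rate and ignores FIFO depth and per-packet header overhead. The function names are made up for illustration; the clock values are the ones quoted in this tutorial.

```python
# Clock rates quoted in this tutorial.
FPGA_CLK_HZ = 200e6      # sys_clk_2x, selected in section 5.1
CORE_CLK_HZ = 156.25e6   # 10GbE core clock, from section 3


def average_word_rate_hz(words_per_burst, period_cycles, fpga_clk_hz=FPGA_CLK_HZ):
    """Average rate at which 64-bit words are clocked into the core."""
    return float(words_per_burst) / period_cycles * fpga_clk_hz


def max_duty_cycle(fpga_clk_hz=FPGA_CLK_HZ, core_clk_hz=CORE_CLK_HZ):
    """Largest average fraction of FPGA clock cycles that may carry data."""
    return min(1.0, core_clk_hz / fpga_clk_hz)


def is_sustainable(words_per_burst, period_cycles):
    """True if the burst pattern stays within the average limit (assumption:
    buffer depth and header overhead are ignored)."""
    return float(words_per_burst) / period_cycles <= max_duty_cycle()


if __name__ == "__main__":
    print(average_word_rate_hz(50, 200) / 1e6)  # 50.0 (MHz)
    print(max_duty_cycle())                     # 0.78125
    print(is_sustainable(50, 200))              # True
    print(is_sustainable(200, 200))             # False
```

With the simulation defaults (50 words every 200 cycles) the design sits well below the limit, whereas clocking in data on every cycle does not, which is the overflow case described in the Background section.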
## 5.10 Buffers to capture received and transmitted data

The transmission logic and the receive interface of the 10Ge block receiving data are both connected to *snap64* blocks (located in *CASPER DSP Blockset->scopes*) that have been modified to capture status information along with the data. These blocks (snap\_gbe0\_tx and snap\_gbe3\_rx) are identical internally. Using these blocks we can capture data as it is transmitted and compare it to the data we receive.

Snap and snap64 are a standard way of capturing snapshots of data in the CASPER tool-set. A snap block contains a single shared BRAM allowing capture of 32-bit words, while a snap64 contains two shared BRAM blocks supporting capture of 64-bit data. In this example, bram\_oob, along with the data path supplying it, has been added to allow the capture of status information associated with the data.

Illustration 8: Modified snap64

The *ctrl* register in a snap block allows control of the capture. The least significant bit enables the capture. Writing a rising edge to this bit primes the snap block for capture. The second least significant bit selects the trigger source. The trigger can come from an external source or be internal and immediate. The third least significant bit selects the source of the valid signal associated with the data. This may also be supplied externally or be immediately enabled.

The basic principle of the snap block is that it is primed by the user and then waits for a trigger, at which point it captures a block of data and then waits to be primed again. Once primed, the addr output register returns an address of 0 and will increment as data is written into the BRAMs. Upon completion the addr register will contain the final address. Reading this value will show that the capture has completed, and the results may then be extracted from the shared BRAMs.
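The bit layout of the *ctrl* register described above can be expressed as a small helper. The following is a minimal sketch, not the code used by the demonstration script: the function names are invented for illustration, and because the text does not state which bit value selects the external or the immediate source, the selection bits are passed through as raw values.

```python
# Bit positions in the snap block's ctrl register, as described in section 5.10.
CTRL_ENABLE = 1 << 0     # a rising edge on this bit primes the capture
CTRL_TRIG_SEL = 1 << 1   # selects the trigger source
CTRL_VALID_SEL = 1 << 2  # selects the source of the valid signal


def snap_ctrl_word(enable, trig_sel, valid_sel):
    """Pack the three control bits into one register value."""
    word = 0
    if enable:
        word |= CTRL_ENABLE
    if trig_sel:
        word |= CTRL_TRIG_SEL
    if valid_sel:
        word |= CTRL_VALID_SEL
    return word


def prime_sequence(trig_sel, valid_sel):
    """Return the two values to write, in order, so that the enable bit sees
    a rising edge while the selection bits stay constant."""
    return [snap_ctrl_word(False, trig_sel, valid_sel),
            snap_ctrl_word(True, trig_sel, valid_sel)]


# Hypothetical use with a connected client (the register name below is an
# assumption based on the block name snap_gbe0_tx, not taken from the design):
#   for value in prime_sequence(trig_sel=False, valid_sel=False):
#       fpga.write_int('snap_gbe0_tx_ctrl', value)
```

Writing the enable bit low first guarantees a rising edge even if a previous capture left it set.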
## 5.11 LEDs and status registers

We shall now look at some registers and LEDs to monitor the progress of our data transfer. Via registers, we shall be able to check whether the link on each 10Ge port is up, whether transmission is taking place, whether our buffers have overflowed, and so on.

### 5.11.1 Transmission registers and LEDs

- *gbe0\_tx\_cnt* is attached to a counter that increments when the end-of-frame signal is raised on the input to the transmitting 10Ge block. It keeps a count of the number of frames transmitted.
- *led0\_gbe0\_pulse\_tx* is attached to the 16<sup>th</sup> bit of this counter and will give a visual indication of data transmission.
- *gbe0\_linkup* is a register that allows us to check if the transmitting 10Ge block is available for data transfer.
- *gbe0\_rx* is a register that we can poll to see if the transmitting core is receiving any data, which should not be the case in this design except for house-keeping traffic.
- *gbe0\_tx* can be polled to check for data transmission.
- *gbe0\_tx\_full* is a useful register allowing us to see when the FIFOs in the transmitting core are almost full, indicating that we are close to overflowing our transmission buffers.
- *gbe0\_tx\_over* indicates that the transmission FIFOs have in fact overflowed and we have lost data.
- *led1\_gbe0\_tx\_over* is a visual indication for the user of overflows in the transmission FIFOs.

### 5.11.2 Receiving registers and LEDs

- *gbe3\_linkup* is a register that allows one to determine if the receiving 10Ge block is attached to a viable communications medium.
- *gbe3\_rx* allows monitoring of whether the receiving 10Ge block is receiving data.
- *gbe3\_tx* allows you to check whether the receiving 10Ge block is transmitting data. This should be rare and should only occur during house-keeping operations such as ARP.
- *gbe3\_tx\_full* allows you to check whether the transmit FIFOs are almost full.
- *gbe3\_tx\_over* reflects whether the transmission FIFOs have overflowed.
- *led3\_gbe3\_rx\_err* is an LED that reflects whether the 10Ge receiving block detected any errors during frame reception. A pulse extender block ensures that the LED turns on for long enough that a user can see when errors occur.
- *gbe3\_rx\_frame\_err* is a register that allows you to check for receive errors via software.
- *gbe3\_rx\_frame\_cnt* is attached to a counter that counts the number of frames received by the receiving 10Ge block.
- *led2\_gbe3\_pulse\_rx* is an LED that gives a visual indication of the rate of frame reception in a similar way to *led0\_gbe0\_pulse\_tx*.

## 5.12 Compilation

Typing the bee\_xps command in the Matlab window brings up a pop-up. Make sure the file displayed in the pop-up is correct and then press RUN to start the compilation. Compilation creates a directory named after the model file, without the .mdl extension. It contains a sub-directory named bit\_files, which holds the .bit and .bof files. We need the .bof file to program the FPGA. Save this .bof file at the location [FPGA\_PROG\_BOF\_DIR].

Compiling this design takes approximately 35 minutes. A pre-compiled binary (.bof file) "[TUT2\_BOF\_FILE]" is made available to save time. It is already present in the ROACH's filesystem area "[FPGA\_PROG\_BOF\_DIR]" as "[TUT2\_BOF\_FILE]".

# 6 Control

In this section we shall run tests using our compiled design. We shall learn how the 10Ge interface is managed via tgtap processes, and about the katcp Python library, which provides routines to access the ROACH board via the katcp server.

## 6.1 Recap of lessons from Tutorial 1

You should remember from Tutorial 1 that the end of the compilation process for our gateware produces a .bof file. This file can be executed on the ROACH like any other Linux executable by the custom BORPH-enabled kernel running on the PPC. Registers, shared BRAMs etc are available for reading and writing as files in the /proc filesystem.
A special telnet-like server called the 'katcp server' provides a simple interface to the outside world. You can telnet to port 7147 on your ROACH and give it commands in the KATCP protocol (try ?help for a list of available commands). This interface allows you to list available .bof files (via ?listbof), execute a .bof file (via ?progdev <filename>), list available registers, memory regions etc for access (via ?listdev), and write to and read from these (via ?wordwrite <offset> <value> and ?wordread <offset> <size>).

## 6.2 Introduction to Python library for use with KATCP server

### 6.2.1 Getting the required packages

These are pre-installed on the server in the workshop and you do not need to do any further configuration.

Python and C APIs are available for connecting to KATCP servers. A further Python wrapper is available for simplifying communications with the KATCP server on ROACH. The KATCP package is available from the Python Package Index (PyPI) here: <a href="http://pypi.python.org/pypi/katcp">http://pypi.python.org/pypi/katcp</a> and the wrapper is bundled in the CASPER Packetised Correlator control package, corr, found in CASPER SVN here: <a href="http://casper.berkeley.edu/svn/trunk/projects/packetized_correlator/corr-0.4.0/">http://casper.berkeley.edu/svn/trunk/projects/packetized\_correlator/corr-0.4.0/</a>

## 6.3 Testing our design using the provided demonstration Python script

The transmission and reception of the data can be tested in either of two ways:

- a. Loopback on the ROACH board, using a CX4 cable between port 0 (rightmost bottom if viewed from the back side of the ROACH board) for transmitting and port 3 (bottom one, adjacent to the J25 connector) for receiving.
- b. ROACH unit to PC, with the cable connected between port 0 of the ROACH board and the 10GbE connector on the Myricom card installed in the PC.

In both cases, if you compiled the bof file yourself, copy it to the directory [FPGA\_PROG\_BOF\_DIR] after changing the permissions of the file:

\$ chmod a+x [STD\_BOF\_DIR]/[TUT2\_BOF\_FILE]

\$ cp [STD\_BOF\_DIR]/[TUT2\_BOF\_FILE] [FPGA\_PROG\_BOF\_DIR]

Get help on using the demonstration Python script "[TUT2\_PYSCRIPT\_FILE]" on the workshop PC with the command:

\$ [STD\_PYSCRIPT\_DIR]/[TUT2\_PYSCRIPT\_FILE] --help

Usage: [TUT2\_PYSCRIPT\_FILE] <ROACH\_HOSTNAME\_or\_IP> [options]

This script demonstrates programming an FPGA, configuring 10GbE cores and checking transmitted and received data using the Python KATCP library along with the katcp\_wrapper distributed in the corr package. Designed for use with TUT3 at the 2009 CASPER workshop. Author: Jason Manley, August 2009.

Options:

- -h, --help: show this help message and exit
- -p, --plot: Plot the TX and RX counters. Needs matplotlib/pylab.
- -a, --arp: Print the ARP table and other interesting bits.

## 6.4 Testing the tutorial using the Loopback mode

The demonstration script, which programs the FPGA and plots the Tx and Rx data, is run as follows:

[STD\_PYSCRIPT\_DIR]/[TUT2\_PYSCRIPT\_FILE] <roach number> <options> -b <BOF file>

[STD\_PYSCRIPT\_DIR]/[TUT2\_PYSCRIPT\_FILE] <roach number> -p -b <BOF file>

The script programs the FPGA with the bof file named after the -b option or, if that option is absent, with the bof file written on line 29 of the demonstration script (boffile = '[TUT2\_PYSCRIPT\_FILE]'). The -p option plots the Tx and Rx data. For example:

\$ [STD\_PYSCRIPT\_DIR]/[TUT2\_PYSCRIPT\_FILE] roach030172 -p -b [TUT2\_BOF\_FILE]

Enter the locations, file names and ROACH name or IP that correspond to your setup.

You should be presented with a crude figure containing two curves. The top curve represents the counter that you transmitted. The lower curve represents the contents of the first 2048 data words received. Seeing as you used a loopback cable, we expect these to be the same!
If not, call a CASPERite over to help you debug.

## 6.5 What's going on behind the scenes?

Execute the script [TUT2\_PYSCRIPT\_FILE] again, this time with the -a flag. This will suppress all the output lines decoding the snap block contents, allowing you to see the initialisation process. You should see something like this:

```
[STD_PYSCRIPT_DIR]/[TUT2_PYSCRIPT_FILE] <roach number> -a -b <BOF file>
[STD_PYSCRIPT_DIR]/[TUT2_PYSCRIPT_FILE] roach030172 -a -b [TUT2_BOF_FILE]
#Enter the corresponding Location/File names and roach name/IP.

Connecting to server roach020110 on port 7147... ok
_____
Programming FPGA... ok
Port 0 linkup: True
Port 3 linkup: True
_____
Configuring receiver core... done
Configuring transmitter core... done
_____
Setting-up packet source... done
Setting-up destination addresses... done
Resetting cores and counters... done
Sent 0 packets already.
Received 0 packets already.
Triggering snap captures... done
Enabling output... done
Reading 2048 values from bram snap_gbe0_tx_bram_msb... ok
Reading 2048 values from bram snap_gbe0_tx_bram_lsb... ok
Reading 2048 values from bram snap_gbe0_tx_bram_oob... ok
Reading 2048 values from bram snap_gbe3_rx_bram_msb... ok
Reading 2048 values from bram snap_gbe3_rx_bram_lsb... ok
Reading 2048 values from bram snap_gbe3_rx_bram_oob... ok
Unpacking TX packet stream...
Unpacking RX packet stream...
```

The script begins with some Python-specific housekeeping, where it defines variables, parses the command-line options and performs some initialisation. As part of this, it establishes a connection to your ROACH of choice using KATCP.

Let's try to replicate some of this script's functions manually. Start interactive Python by running

ipython

Now import the correlator control library. This will automatically pull in the KATCP library and any other required communications libraries.

import corr

### 6.5.1 Connecting to the board

To connect to the ROACH board, we define a new object. Let's call it *fpga*.
The wrapper's FpgaClient initialiser requires two arguments: the IP address or hostname of the ROACH board, and the KATCP port. KATCP defaults to port 7147.

fpga=corr.katcp\_wrapper.FpgaClient('roach030172',7147) #Change the roach number!

The first thing we do is configure the FPGA. This happens on line 134 of the script.

fpga.progdev(boffile)

### 6.5.2 Reading from software registers

Then we check that there is actually a cable plugged in, otherwise it is pointless to continue. The 10GbE cores require some time to initialise and establish a link, so we sleep for 4 seconds to allow them to settle. Then we read our two software registers, *gbe0\_linkup* and *gbe3\_linkup*. These should both read 1.

Reading and writing registers is easy with the Python wrapper. It automatically packs and unpacks the binary data for you. There are a few functions available. For the 32-bit software registers, use read\_int and read\_uint to unpack signed and unsigned numbers respectively. Since these boolean values are only ever zero or one, it makes no difference which function you use. Let's try:

In [3]: fpga.read\_uint('gbe0\_linkup')

Out[3]: 1

### 6.5.3 Writing to software registers

This is very similar to reading from registers. Python will automatically work out whether you are writing a negative number, which needs a signed representation, and otherwise assumes an unsigned number. For example, to set the software register that configures the destination UDP port:

fpga.write\_int('dest\_port',60000)

### 6.5.4 Configuring the 10GbE cores

There is a special KATCP command to start the 10GbE userspace tap driver on ROACH. The call is as follows: fpga.tap\_start(device\_name, mac, ip, port). The device name is the name of your core as defined in Simulink. The MAC address is a 48-bit integer. Unless you have purchased a MAC address, we recommend you use a MAC address reserved for private networks (02:xx:xx:xx:xx:xx). The IP address is the integer representation of the 32-bit address.
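Both integers can be built from the usual colon-separated and dotted notations. The following is a minimal sketch using only the Python standard library; the helper names and the example addresses are illustrative choices, not part of the corr package.

```python
import socket
import struct


def ip_to_int(dotted):
    """Convert a dotted IPv4 address such as '10.0.0.1' to a 32-bit integer."""
    # inet_aton returns the four address bytes in network (big-endian) order.
    return struct.unpack('>L', socket.inet_aton(dotted))[0]


def mac_to_int(mac):
    """Convert a MAC address such as '02:02:00:00:00:01' to a 48-bit integer."""
    octets = mac.split(':')
    if len(octets) != 6:
        raise ValueError('a MAC address needs six octets: %r' % (mac,))
    value = 0
    for octet in octets:
        value = (value << 8) | int(octet, 16)
    return value


# Hypothetical call, using the signature quoted above; the core name 'gbe0'
# is the block name from section 5.3.1, the addresses are examples only:
#   fpga.tap_start('gbe0', mac_to_int('02:02:00:00:00:01'),
#                  ip_to_int('10.0.0.1'), 60000)
```

Keeping the addresses in their readable form and converting them at the call site makes typing mistakes in the integer values less likely.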
The port defines the fabric UDP port. Any data received by the core that is not destined for this port will be redirected to the PPC for processing. This is how ARP and pings are handled (i.e. not by the FPGA fabric itself). If you disable the CPU interface by un-checking the boxes in the Simulink mask, you will lose this functionality.

If you run the script **[TUT2\_PYSCRIPT\_FILE]** with the -a command-line switch, it will decode the 10GbE core's ARP table so that you can see how it has been configured and whether it has correctly discovered other nodes on the network.

```
Gateware Port: 60000
Fabric interface is currently: Enabled
XAUI Status: 7E
    Lane sync 0: 1
    Lane sync 1: 1
    Lane sync 2: 1
    Lane sync 3: 1
    Chan bond : 1
XAUI PHY config:
    RX_eq_mix:
    RX eq pol:
    TX pre-emph: 0
    TX_diff_ctrl: 4
ARP Table:
IP: 10. 0. 0: MAC: FF FF FF FF FF
10. 0. 0. 1: MAC: FF FF FF FF FF
10. 0. 0. 2: MAC: FF FF FF FF FF
```

### 6.5.5 Reading and writing RAM

In much the same way as you would read and write registers, there are functions to read and write generic blocks of memory. Unsurprisingly, they are called *read* and *write*. The write function will automatically verify that the writes happened. There are also *blindread* and *blindwrite* functions, which do not perform this verification and therefore offer higher performance.

For example, to read one of the BRAMs in the snap blocks, you could do *fpga.read(bram\_name,tx\_size)*, where *tx\_size* is in bytes.

Python returns binary data in a string, so you need to unpack it into integers, floats, characters etc as required. There is a library called *struct* to do this. Do an *import struct* in iPython and then you can pack or unpack binary data. There is onboard documentation, try *print struct.*\_\_doc\_\_. For example, to unpack a big-endian 32-bit unsigned integer, we could do:

struct.unpack('>L',string\_to\_unpack[start\_index:index\_end\_of\_four\_bytes])

### 6.5.6 Other notes

- iPython includes tab-completion. In case you forget which function to use, try typing library\_name.<tab><tab>
- There is also onboard help. Try typing library\_name.function?
- Many libraries have onboard documentation stored in the string library\_name.\_\_doc\_\_
- KATCP in Python supports asynchronous communications. This means you can send a command, register a callback for the response and continue to issue new commands even before you receive the response. You can achieve much greater speeds this way. The Python wrapper in the corr package does not currently make use of this and all calls are blocking. Feel free to use the lower layers to implement this yourself if you need it!

# 7 Receiving data on the PC using CX4 cable

Now disconnect the end of the CX4 cable that is connected to port 3 of the ROACH board and connect it to the connector of the 10GbE Myricom card in the PC. Then run the following command.

```
[STD_PYSCRIPT_DIR]/[TUT2_PYSCRIPT_FILE] <roach number> -a -b <BOF file>
eg. [STD_PYSCRIPT_DIR]/[TUT2_PYSCRIPT_FILE] roach030172 -a -b [TUT2_BOF_FILE]
```

Enter the locations, file names and ROACH name or IP that correspond to your setup.

Note: if you run the above command with the -p option, the plot will appear without any data in the lower half. This is because no data is received on port 3; the data has been diverted to the PC through the 10GbE card.

# 8 Looking at the data received

## 8.1 Using Wireshark

```
root@$ su
root@[USER_DIR]$ sudo wireshark &
```

You must be root, otherwise Wireshark does not show all ports.

1. Choose Capture --> Options.
2. Select eth2 (the 10GbE card).
3. Select "Capture packets in promiscuous mode" and enter a value for "Stop Capture after XXX packets" (e.g. 10000).
4. Press the Start button to start acquiring packets from eth2.
5. Choose File --> Quit --> Quit without saving (or save the data in your directory if required).
## 8.2 Using tcpdump

```
root@[USER_DIR]$ tcpdump -i eth2 -c 2048 -w [HOME_DIR]/tcpdump_2048.dat
tcpdump: WARNING: eth2: no IPv4 address assigned
tcpdump: listening on eth2, link-type EN10MB (Ethernet), capture size 65535 bytes
2048 packets captured
2053 packets received by filter
0 packets dropped by kernel
root@$:/data#
```

This captures 2048 packets from eth2 and writes them to a file in the home directory (there is a permission problem in the /data area). To move the file to the working area:

root@\$ mv [HOME\_DIR]/tcpdump\_2048.dat [USER\_DIR]

## 8.3 Using GULP

# 9 Conclusion

We have completed Tutorial 2 and gained experience in acquiring data on the PC.