Overview
On the NetTLP platform, you can develop your PCIe devices in software interacting with hardware root complexes. For an example of such a software device, we introduce an implementation of simple NIC. The simple NIC, originally introduced by pcie-bench, is a theoretical model of a simplistic Ethernet NIC, which does not have any performance optimizations and complicated DMA interfaces, for understanding PCIe interactions and calculating bandwidth. NetTLP enables implementing such models of PCIe devices in software and confirming whether the models actually work with existent hardware root complex. In other words, the PCIe devices are certainly implemented in software, but they perform as actual devices for physical root complexes. Moreover, their PCIe transactions can be observed by tcpdump.
The PCIe interactions of the simple NIC model are described in the model's implementation, actually, here: https://github.com/pcie-bench/pcie-model/blob/master/model/simple_nic.py. Our simple NIC implementation in NetTLP follows the steps of the PCIe interactions described in the source code. Anyway, this page focuses on how to set up and run the simple NIC on the NetTLP platform.
Setup
To run the simple NIC, please set up a NetTLP environment following the instruction and confirm the example applications work.
The simple NIC implementation is composed of two components among both adapter and device hosts: (1) a device driver for the NetTLP adapter and (2) the software device implementation for simple NIC using LibTLP. The driver treats the NetTLP adapter as a physical Ethernet NIC and creates a netdevice interface (like usual Ethernet interfaces in Linux as ifconfig command shows). Packets transmitted to the interface are delivered to the device host as TLPs over the PCIe links between the root complex and the NetTLP adapter and the Ethernet link between the adapter host and device host. The simple NIC implementation receives the TLPs using LibTLP and sends the packets to a tap interface.
Figure 1 shows an example setup of the simple NIC. Descriptions of network interfaces in the figure are:
- Adapter Host
- eth0: just a management network. A requirement is the adapter host must be able to communicate with the device host via this link.
- eth2: netdevice interface for the simple NIC. The simple NIC driver for the NetTLP adapter creates this interface. Packets transmitted to eth2 are delivered to the device host by encapsulating TLPs in Ethernet via the 10G Ethernet link between the adapter and the device host.
- Device Host
- eth0: just a management network. A requirement is the device host must be able to communicate with the adapter host via this link.
- eth1: a netdevice interface to receive and send TLPs with the NetTLP adapter. It is eth1 in the setup instruction.
- tap0: the actual ethernet port of the simple NIC. The simple NIC process on the device host creates this tap interface. Packets transmitted to eth2 (the NetTLP adapter) on the device host are transmitted from this tap interface on the device host. As well as the transmission, packets received by tap0 are received by eth2 on the adapter host.
- br0: just a target interface to send ping. tap0 is enslaved by br0. As a result, packets transmitted to eth2 on the adapter host are transmitted from tap0 and received by br0. It means that we can ping between eth2 on the adapter host (10.0.0.1) and br0 on the device host (10.0.0.2).
The implementation of simple NIC, which is illustrated as simple-nic in Figure 1, is responsible for the substance of the NIC represented as eth2 on the device host. simple-nic receives and sends TLPs from and to the NetTLP adapter through LibTLP. The interactions between simple-nic and the NetTLP adapter fully follow the simple NIC model, and they can be observed by tcpdump at the 10G Ethernet link.
Adapter host
First, install the simple NIC driver at the adapter host.
$ ls
libtlp
$ git clone https://github.com/nettlp/simple-nic
Cloning into 'simple-nic'...
remote: Enumerating objects: 112, done.
remote: Counting objects: 100% (112/112), done.
remote: Compressing objects: 100% (75/75), done.
remote: Total 112 (delta 43), reused 98 (delta 29), pack-reused 0
Receiving objects: 100% (112/112), 31.81 KiB | 552.00 KiB/s, done.
Resolving deltas: 100% (43/43), done.
$ ls
libtlp simple-nic
$ cd simple-nic/
$ ls
device driver include
$ cd driver/
$ ls
Makefile nettlp_msg.c nettlp_msg.h nettlp_snic.c
$ make
make -C /lib/modules/4.20.2-tsukumo1-nopti/build M=/home/upa/work/test/nettlp/simple-nic/driver V=0 modules
make[1]: Entering directory '/home/upa/src/linux-4.20.2'
WARNING: Symbol version dump ./Module.symvers
is missing; modules will have no dependencies and modversions.
CC [M] /home/upa/work/test/nettlp/simple-nic/driver/nettlp_snic.o
CC [M] /home/upa/work/test/nettlp/simple-nic/driver/nettlp_msg.o
LD [M] /home/upa/work/test/nettlp/simple-nic/driver/nettlp_snic_driver.o
Building modules, stage 2.
MODPOST 1 modules
CC /home/upa/work/test/nettlp/simple-nic/driver/nettlp_snic_driver.mod.o
LD [M] /home/upa/work/test/nettlp/simple-nic/driver/nettlp_snic_driver.ko
make[1]: Leaving directory '/home/upa/src/linux-4.20.2'
$ ls
Makefile nettlp_msg.c nettlp_snic.c nettlp_snic_driver.mod.c
Module.symvers nettlp_msg.h nettlp_snic.o nettlp_snic_driver.mod.o
modules.order nettlp_msg.o nettlp_snic_driver.ko nettlp_snic_driver.o
$ sudo insmod nettlp_snic_driver.ko
Then, you can see dmesg output and a netdev interface representing the simple NIC.
[1382475.456615] nettlp_snic_driver: nettlp_snic (v0.0.1) is loaded
[1382475.456710] nettlp_snic_driver: nettlp_snic_pci_init: register nettlp simple nic device 0000:1b:00.0
[1382475.456722] nettlp_snic_driver 0000:1b:00.0: enabling device (0100 -> 0102)
[1382475.456861] nettlp_snic_driver: BAR4 0xb0000000 is mapped to 00000000fdb0aee4
[1382475.456871] nettlp_snic_driver: BAR0 0xc0100000 is mapped to 0000000003c10e6c
[1382475.456904] nettlp_snic_driver: BAR2 0xa0000000 is mamped to 000000007bf72a9a
[1382475.456923] nettlp_snic_driver: nettlp_snic_pci_init: rx_buf is 0x2f003000
[1382475.456934] nettlp_snic_driver: nettlp_snic_init
[1382475.457303] initialize nettlp message socket
[1382475.457357] nettlp_snic_driver: nettlp_snic_pci_init: probe finished.
[1382475.457359] nettlp_snic_driver: nettlp_snic_pci_init: tx desc is 0x2f001000, rx desc is 0x2f002000
[1382475.466553] nettlp_snic_driver 0000:1b:00.0 enp27s0: renamed from eth0
$ ifconfig enp27s0
enp27s0: flags=4098<BROADCAST,MULTICAST> mtu 1500
ether 00:11:22:33:44:55 txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
$ ethtool -i enp27s0
driver: nettlp_snic_driver
version:
firmware-version:
expansion-rom-version:
bus-info: 0000:1b:00.0
supports-statistics: no
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no
To clarify this instruction, we change the
name enp27s0
to eth2
to align with Figure 1.
$ sudo ip link set dev enp27s0 name eth2
$ ifconfig eth2
eth2: flags=4098<BROADCAST,MULTICAST> mtu 1500
ether 00:11:22:33:44:55 txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
Warning: DO NOT set eth2 up at this moment. simple-nic process, the substance of the simple NIC, must be running before eth2 up. If doing that, the kernel tries to send packets, e.g., IPv6 router solicitation, and then the driver sends a MWr TLP to inform the device about a TX descriptor. However, there is no response from the device, because the device, which is the software simple-nic process on the device host, has not been running yet.
Device host
Before making eth2 up, start the simple-nic process at the device host.
$ ls
libtlp
$ git clone https://github.com/nettlp/simple-nic
Cloning into 'simple-nic'...
remote: Enumerating objects: 112, done.
remote: Counting objects: 100% (112/112), done.
remote: Compressing objects: 100% (75/75), done.
remote: Total 112 (delta 43), reused 98 (delta 29), pack-reused 0
Receiving objects: 100% (112/112), 31.81 KiB | 552.00 KiB/s, done.
Resolving deltas: 100% (43/43), done.
$ ls
libtlp simple-nic
$ cd simple-nic/
$ ls
device driver include
$ cd device/
$ ls
Makefile nettlp_snic_device.c
$ make
gcc -g -Wall -I../../libtlp/include -I../include -L../../libtlp/lib nettlp_snic_device.c -ltlp -lpthread -o nettlp_snic_device
$ ls
Makefile nettlp_snic_device nettlp_snic_device.c
nettlp_snic_device
is the
simple-nic process.
Note: In the current implementation, libtlp directory and simple-nic directory must be located on the same directory. I will develop an install procedure soon.
Then, start the simple-nic process at the device host.
$ ./nettlp_snic_device -h
./nettlp_snic_device: invalid option -- 'h'
usage
-r remote addr
-l local addr
-R remote host addr (not TLP NIC)
-t tunif name (default tap0)
$ sudo ./nettlp_snic_device -r 192.168.10.1 -l 192.168.10.3 -R 172.16.0.1
Device is 1b00
BAR4 start address is 0xb0000000
TX IRQ address is 0xfee1a000, data is 0x00004022
RX IRQ address is 0xfee03000, data is 0x00004022
create tap read thread
start nettlp callback
-r and -l options are identical to the example applications. 192.168.10.1 is the address of the NetTLP adapters, and 192.168.10.3 is the local address connected to the adapter, which is eth1 in Figure 1. -R option specifies the IP address of the adapter host on the management network. When nettlp_snic_device starts, it gathers necessary information from the adapter host, for example, the address of BAR4 and IRQ information of the adapter. For this purpose, nettlp_snic_device uses a UDP-based simple messaging API implemented as a side-channel in the driver.
If nettlp_snic_device successfully gets the information through the management network, it starts to perform the simple NIC: waiting for TLPs following the PCIe interactions of the simple NIC model.
nettlp_snic_device creates tap0, and it is automatically set up. This is the actual port of the simple NIC. To ping on this closed environment, set the tap0 as a bridge port of a linux bridge, and assign an IP address to the bridge.
Please keep nettlp_snic_device running. Use other terminals on the device host.
$ ifconfig tap0
tap0: flags=67<UP,BROADCAST,RUNNING> mtu 1500
inet6 fe80::fc9d:a9ff:fe99:15cd prefixlen 64 scopeid 0x20
ether fe:9d:a9:99:15:cd txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 13 bytes 1006 (1.0 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
$ sudo ip link add br0 type bridge
$ sudo ip link set dev br0 up
$ sudo ip addr add dev br0 10.0.0.2/24
$ sudo ip link set dev tap0 master br0
$ ifconfig br0
br0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.0.0.2 netmask 255.255.255.0 broadcast 0.0.0.0
inet6 fe80::4c0d:f8ff:fe25:4b40 prefixlen 64 scopeid 0x20
ether fe:9d:a9:99:15:cd txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 11 bytes 906 (906.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
ping
Now we can set the simple NIC up at the adapter host.
$ sudo ip addr add dev eth2 10.0.0.1/24
$ sudo ip link set dev eth2 up
$ ifconfig eth2
eth2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.0.0.1 netmask 255.255.255.0 broadcast 0.0.0.0
inet6 fe80::211:22ff:fe33:4455 prefixlen 64 scopeid 0x20
ether 00:11:22:33:44:55 txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 8 bytes 656 (656.0 B)
TX errors 0 dropped 1 overruns 0 carrier 0 collisions 0
Then, the kernel tries to send some packets, so that you see some output messages from the nettlp_snic_device.
BAR4 start address is 0xb0000000
TX IRQ address is 0xfee1a000, data is 0x00004022
RX IRQ address is 0xfee03000, data is 0x00004022
create tap read thread
start nettlp callback
nettlp_snic_mwr: dma_addr is 0xb0000008
RX desc base is 0x2f005000
nettlp_snic_mwr: dma_addr is 0xb0000014
RX desc update: dma to 0xb0000014, rx desc ptr is 0x2f005000
RX desc update: new rx_desc addr=0x2f006000 len=0
nettlp_snic_mwr: dma_addr is 0xb0000018
nettlp_snic_mwr: dma_addr is 0xb0000000
TX desc base is 0x2f004000
nettlp_snic_mwr: dma_addr is 0xb0000010
idx 0, dma to 0xb0000010, tx desc ptr is 0x2f004000
TX: pkt length is 90, addr is 0x3bb15000
TX: generate interrupt to 0xfee1a000
TX done
nettlp_snic_mwr: dma_addr is 0xb0000010
idx 0, dma to 0xb0000010, tx desc ptr is 0x2f004000
TX: pkt length is 86, addr is 0x3bb15800
TX: generate interrupt to 0xfee1a000
TX done
nettlp_snic_mwr: dma_addr is 0xb0000010
idx 0, dma to 0xb0000010, tx desc ptr is 0x2f004000
TX: pkt length is 90, addr is 0x3bb16000
TX: generate interrupt to 0xfee1a000
TX done
And, you can ping via the simple NIC at the adapter host.
$ ping 10.0.0.2
PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
64 bytes from 10.0.0.2: icmp_seq=1 ttl=64 time=0.297 ms
64 bytes from 10.0.0.2: icmp_seq=2 ttl=64 time=0.246 ms
64 bytes from 10.0.0.2: icmp_seq=3 ttl=64 time=0.239 ms
^C
--- 10.0.0.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2049ms
rtt min/avg/max/mdev = 0.239/0.260/0.297/0.031 ms
This ping sends an ICMP echo packet to eth2, and then the driver notifies nettlp_snic_device about the TX descriptor for the packet (MWr TLP). Then nettlp_snic_device starts DMA Read to read the descriptor and reads the packet content specified by the descriptor. After nettlp_snic_device writes the packet to the tap interface, it sends a MWr TLP as the TX interrupt (MSI-X).
The corresponding ICMP reply packet is generated by the kernel on the device host when the echo is received by br0. nettlp_snic_device reads the reply packet from the tap interface and starts to write the packet to the packet buffer on the adapter host by DMA Write. After the DMA Write, nettlp_snic_device sends a MWr TLP as the RX interrupt (MSI-X).
Observing the PCIe transactions
The PCIe transactions between the driver and the device can be observed on the Ethernet link. This is one of the powerful functionalities of the NetTLP platform. Let us see the TLPs of the simple NIC.
An easy way to see the TLPs is to use the modified tcpdump that can parse TLPs encapsulated in Ethernet, IP, UDP, and NetTLP headers. Capture the eth1 on the device host using the tcpdump.
# At device host
$ sudo tcpdump -ni eth1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), snapshot length 262144 bytes
01:18:00.269163 IP 192.168.10.1.12289 > 192.168.10.3.12289: NetTLP: MWr, 3DW, WD, tc 0, flags [none], attrs [none], len 1, requester 00:00, tag 0x01, last 0x0, first 0xf, Addr 0xb0000010
01:18:00.269202 IP 192.168.10.3.12289 > 192.168.10.1.12289: NetTLP: MRd, 3DW, tc 0, flags [none], attrs [none], len 4, requester 1b:00, tag 0x01, last 0xf, first 0xf, Addr 0x2f004000
01:18:00.269215 IP 192.168.10.1.12289 > 192.168.10.3.12289: NetTLP: CplD, 3DW, WD, tc 0, flags [none], attrs [none], len 4, completer 00:00, success, byte count 16, requester 1b:00, tag 0x01, lowaddr 0x00
01:18:00.269234 IP 192.168.10.3.12289 > 192.168.10.1.12289: NetTLP: MRd, 3DW, tc 0, flags [none], attrs [none], len 25, requester 1b:00, tag 0x01, last 0x3, first 0xf, Addr 0x3bdc1000
01:18:00.269247 IP 192.168.10.1.12289 > 192.168.10.3.12289: NetTLP: CplD, 3DW, WD, tc 0, flags [none], attrs [none], len 25, completer 00:00, success, byte count 98, requester 1b:00, tag 0x01, lowaddr 0x00
01:18:00.269277 IP 192.168.10.3.12289 > 192.168.10.1.12289: NetTLP: MWr, 3DW, WD, tc 0, flags [none], attrs [none], len 1, requester 1b:00, tag 0x01, last 0x0, first 0xf, Addr 0xfee1a000
01:18:00.269300 IP 192.168.10.3.12288 > 192.168.10.1.12288: NetTLP: MWr, 3DW, WD, tc 0, flags [none], attrs [none], len 25, requester 1b:00, tag 0x00, last 0x3, first 0xf, Addr 0x2f006000
01:18:00.269326 IP 192.168.10.3.12288 > 192.168.10.1.12288: NetTLP: MWr, 3DW, WD, tc 0, flags [none], attrs [none], len 4, requester 1b:00, tag 0x00, last 0xf, first 0xf, Addr 0x2f005000
01:18:00.269337 IP 192.168.10.3.12288 > 192.168.10.1.12288: NetTLP: MWr, 3DW, WD, tc 0, flags [none], attrs [none], len 1, requester 1b:00, tag 0x00, last 0x0, first 0xf, Addr 0xfee03000
01:18:00.272141 IP 192.168.10.1.12290 > 192.168.10.3.12290: NetTLP: MWr, 3DW, WD, tc 0, flags [none], attrs [none], len 1, requester 00:00, tag 0x02, last 0x0, first 0xf, Addr 0xb0000014
01:18:00.272173 IP 192.168.10.3.12303 > 192.168.10.1.12303: NetTLP: MRd, 3DW, tc 0, flags [none], attrs [none], len 4, requester 1b:00, tag 0x0f, last 0xf, first 0xf, Addr 0x2f005000
01:18:00.272191 IP 192.168.10.1.12303 > 192.168.10.3.12303: NetTLP: CplD, 3DW, WD, tc 0, flags [none], attrs [none], len 4, completer 00:00, success, byte count 16, requester 1b:00, tag 0x0f, lowaddr 0x00
^C
12 packets captured
12 packets received by filter
0 packets dropped by kernel
It is time to dive into PCIe. The PCIe interaction of the simple NIC model is shown below:
On the TX side:
- Host updates a 4-byte TX queue tail pointer (PICe write)
- Device reads a 16-byte TX queue descriptor (PCIe read)
- Device reads the packet specified by the descriptor (PCIe read)
- Device sends the packet to wire
- Device generates 4-byte interrupt (PCIe write)
On the RX side:
- Host updates a 4-byte RX queue tail pointer (PCIe write)
- Device reads a 16-byte RX queue tail pointer (PCIe read)
- Device writes a received packet to the buffer specified by the descriptor (PCIe write)
- Device writes back 16-byte RX descriptor to host (PCIe write)
- Device generates 4-byte interrupt (PCIe write)
- Host reads 4-byte RX queue tail pointer (PCIe read)
All the PCIe transactions can be observed on the tcpdump. Let us see them step-by-step.
TX side
01:18:00.269163 IP 192.168.10.1.12289 > 192.168.10.3.12289:
NetTLP: MWr, 3DW, WD, tc 0, flags [none], attrs [none], len
1, requester 00:00, tag 0x01, last 0x0, first 0xf, Addr
0xb0000010
This is the first TLP for sending the ICMP echo packet. It is MWr TLP to address 0xb0000010, which is the register for the TX queue tail pointer, from the root complex (requester 00:00) to BAR4 (0xb0000000) on the adapter. This is the first transaction on the TX side: 1. Host updates a 4-byte TX queue tail pointer (PICe write).
01:18:00.269202 IP 192.168.10.3.12289 >
192.168.10.1.12289: NetTLP: MRd, 3DW, tc 0, flags [none],
attrs [none], len 4, requester 1b:00, tag 0x01, last 0xf,
first 0xf, Addr 0x2f004000
Next, the simple-nic starts to read a 16-byte TX descriptor pointed by the DMA write of the last MWr TLP. This TLP is it. The simple-nic (requester 1b:00 is PCIe bus number where the NetTLP adapter is installed) send MRd TLP for 16-byte (len 4 in DWORD) from 0x2f004000, which is the address of the TX descriptor on the main memory.
01:18:00.269215 IP 192.168.10.1.12289 >
192.168.10.3.12289: NetTLP: CplD, 3DW, WD, tc 0, flags
[none], attrs [none], len 4, completer 00:00, success,
byte count 16, requester 1b:00, tag 0x01, lowaddr 0x00
This TLP is the response for the MRd TLP for reading the TX descriptor.
01:18:00.269234 IP 192.168.10.3.12289 >
192.168.10.1.12289: NetTLP: MRd, 3DW, tc 0, flags [none],
attrs [none], len 25, requester 1b:00, tag 0x01, last 0x3,
first 0xf, Addr 0x3bdc1000
Next, as step 4 on the TX side, the simple-nic reads the actual packet to be sent from the main memory on the adapter host. 0x3bdc1000 is the address of the packet (skb->data, or skb_mac_header(skb)). len 25 means 100 bytes, but the Last DW Byte Enable filed is 0x3, thus, actual read length is 98 bytes, which is the length of the ICMP echo packet.
01:18:00.269247 IP 192.168.10.1.12289 >
192.168.10.3.12289: NetTLP: CplD, 3DW, WD, tc 0, flags
[none], attrs [none], len 25, completer 00:00, success,
byte count 98, requester 1b:00, tag 0x01, lowaddr 0x00
Then the root complex sends a response for the DMA read for the packet. After receiving this CplD TLP, the simple-nic sends the packet to a tap interface.
01:18:00.269277 IP 192.168.10.3.12289 >
192.168.10.1.12289: NetTLP: MWr, 3DW, WD, tc 0, flags
[none], attrs [none], len 1, requester 1b:00, tag 0x01,
last 0x0, first 0xf, Addr 0xfee1a000
After transmitting the packet, the simple-nic generates a TX interrupt to inform the driver of the completion of the transmission. In MSI-X, interrupt is implemented by just a MWr TLP to a specified address on main memory with a specified data. In this case, the address is 0xfee1a000.
RX side
01:18:00.269300 IP 192.168.10.3.12288 >
192.168.10.1.12288: NetTLP: MWr, 3DW, WD, tc 0, flags
[none], attrs [none], len 25, requester 1b:00, tag 0x00,
last 0x3, first 0xf, Addr 0x2f006000
On the RX side, the interaction starts with writing the received packet to the host memory (step 3) because the driver told the RX buffer to the simple-nic before receiving new packets. So, this first TLP means that the simple-nic writes the ICMP reply packet to the host memory pointed by an RX descriptor notified in advance.
01:18:00.269326 IP 192.168.10.3.12288 >
192.168.10.1.12288: NetTLP: MWr, 3DW, WD, tc 0, flags
[none], attrs [none], len 4, requester 1b:00, tag 0x00,
last 0xf, first 0xf, Addr 0x2f005000
Next, the simple-nic updates the RX descriptor to notify the driver of information about the received packet, i.e., packet length.
01:18:00.269337 IP 192.168.10.3.12288 >
192.168.10.1.12288: NetTLP: MWr, 3DW, WD, tc 0, flags
[none], attrs [none], len 1, requester 1b:00, tag 0x00,
last 0x0, first 0xf, Addr 0xfee03000
And then the simple-nic generates an RX interrupt by MSI-X as with the TX side.
01:18:00.272141 IP 192.168.10.1.12290 >
192.168.10.3.12290: NetTLP: MWr, 3DW, WD, tc 0, flags
[none], attrs [none], len 1, requester 00:00, tag 0x02,
last 0x0, first 0xf, Addr 0xb0000014
After the host consumed the received packet, the driver returns the RX buffer to the device. The driver writes the freed RX queue tail pointer to the specified address on the BAR4 (0xb0000014) on the NetTLP adapter, and then simple-nic receives this MWr TLP. This is actually the first on the RX side of the simple NIC model.
01:18:00.272173 IP 192.168.10.3.12303 >
192.168.10.1.12303: NetTLP: MRd, 3DW, tc 0, flags [none],
attrs [none], len 4, requester 1b:00, tag 0x0f, last 0xf,
first 0xf, Addr 0x2f005000
Then the simple-nic reads the new RX descriptor notified by updating the RX queue tail pointer. 0x2f005000 is the address of the 16-byte (len 4 in DWORD) RX descriptor on the main memory.
01:18:00.272191 IP 192.168.10.1.12303 >
192.168.10.3.12303: NetTLP: CplD, 3DW, WD, tc 0, flags
[none], attrs [none], len 4, completer 00:00, success,
byte count 16, requester 1b:00, tag 0x0f, lowaddr 0x00
And this TLP is the response for the DMA read for the RX descriptor.
In this manner, you can observe, understand, and debug actual PCIe transactions by IP networking tools.