14th IEEE/ACM International Symposium on Networks-on-Chip (NOCS 2020)


Best Paper Award Winner

  • In-Network Memory Access Ordering for Heterogeneous Multicore Systems
    Jieming Yin, Antonia Zhai


Best Paper Award Nominees

The following two papers were selected as candidates for the Best Paper Award.

  • In-Network Memory Access Ordering for Heterogeneous Multicore Systems
    Jieming Yin, Antonia Zhai
  • SecONet: A Security Framework for a Photonic Network-on-Chip
    Janibul Bashir, Chandran Goodchild and Smruti R. Sarangi


Final Program

The NOCS 2020 YouTube Channel, with all video presentations from the conference, is available here.

On each day, the conference program starts at 09:00 EDT (Eastern Daylight Time = GMT-04:00), which corresponds to 21:00 in China (GMT+08:00), 18:30 in India (GMT+05:30), and 15:00 in Spain (GMT+02:00).
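These conversions can be verified programmatically; the following is a minimal sketch using Python's standard zoneinfo module (Python 3.9+), taking the first conference day as the example date:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# 09:00 EDT on the first conference day (September 24, 2020)
start = datetime(2020, 9, 24, 9, 0, tzinfo=ZoneInfo("America/New_York"))

# Convert to the attendee time zones mentioned in the program
for tz in ["Asia/Shanghai", "Asia/Kolkata", "Europe/Madrid"]:
    local = start.astimezone(ZoneInfo(tz))
    print(tz, local.strftime("%H:%M"))  # 21:00, 18:30, 15:00 respectively
```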

(L) before a paper title indicates a 20-minute slot (15-minute talk + 5-minute Q/A)
(S) before a paper title indicates a 15-minute slot (12-minute talk + 3-minute Q/A)

Thursday - September 24, 2020

Time (EDT)


09:00 to 9:05

Opening Remarks
Tushar Krishna (Georgia Tech) and John Kim (KAIST)

09:05 to 10:05

Keynote I: “Network Congestion: Analysis and Effective Solutions for Datacenters”
Speaker: Jose Duato (Universitat Politècnica de València)

Session Chair: Tushar Krishna (Georgia Tech)

10:05 to 10:10


10:10 to 11:20

Regular Paper Session A: "Architecture & RDMA"
Session Chair: Paul Bogdan (USC)

(L) In-Network Memory Access Ordering for Heterogeneous Multicore Systems
Jieming Yin, Antonia Zhai
[Best Paper Award Nominee]

(S) Scheduled Deflections for Resilient Bufferless Networks-on-Chip
Chen Chen, Zirui Tao and Joshua San Miguel

(S) Combinatorics and Geometry of Memory Access Structures for the Many-ported, Distributed and Shared Memory Architecture
Hao Luan and Alan Gatherer

(L) PART: Pinning Avoidance in RDMA Technologies
Antonis Psistakis, Nikolaos Chrysos, Fabien Chaix, Marios Asiminakis, Michalis Gianioudis, Pantelis Xirouchakis,
Vassilis Papaefstathiou and Manolis Katevenis

11:20 to 11:30


11:30 to 13:00

Special Session A: "Unlock the NoC: Transforming NoC Research with Physical Design Awareness"
Session Chairs: Chris Batten (Cornell) and Michael Taylor (University of Washington)


Ruche Networks: Wire-Maximal, No-Fuss NoCs
Dai Cheol Jung, Scott Davidson, Chun Zhao, Dustin Richmond and Michael Bedford Taylor (University of Washington)

Implementing Low-Diameter On-Chip Networks Using a Tiled Physical Design Methodology
Yanghui Ou, Shady Agwa, Christopher Batten (Cornell University)

NoC Symbiosis
Daniel Petrisko, Chun Zhao, Scott Davidson, Paul Gao, Dustin Richmond and Michael Bedford Taylor (University of Washington)


Friday - September 25, 2020

Time (EDT)


09:00 to 10:00

Regular Paper Session B: "Multicast & Security"
Session Chair: Paul Gratz (Texas A&M)

(L) An Efficient Multicast Router using Shared-Buffer with Packet Merging for Dataflow Architecture
Yi Li, Meng Wu, Xiaochun Ye, Wenming Li and Dongrui Fan

(L) SecONet: A Security Framework for a Photonic Network-on-Chip
Janibul Bashir, Chandran Goodchild and Smruti R. Sarangi
[Best Paper Award Nominee]

(L) SECTAR: Secure NoC using Trojan Aware Routing
Manju R, Abhijit Das, John Jose and Prabhat Mishra

10:10 to 11:10

Keynote II: “Domain-Specific Networks for Machine Learning”
Speaker: Dennis Abts (Groq)

Session Chair: John Kim (KAIST)

11:10 to 11:20


11:20 to 12:00

Regular Paper Session C: "Technology in Communication"
Session Chair: Luca Carloni (Columbia University)

(L) PROTEUS: Rule-Based Self-Adaptation in Photonic NoCs for Loss-Aware Dynamic Co-Optimization of Performance and Laser Power
Sairam Sri Vatsavai, Venkata Sai Praneeth Karempudi and Ishan Thakkar

(S) Improving Inference Latency and Energy of DNNs through Wireless Enabled Multi Chip-Module-based Architectures and Model Parameters Compression
Maurizio Palesi, Giuseppe Ascia, Davide Patti, Salvatore Monteleone, Vincenzo Catania and Andrea Mineo

12:00 to 13:30

Special Session B: "Scalable Platforms for Machine Learning: An Industry Perspective"
Session Chair: Suvinay Subramanian (Google)

Accelerating the Network for Deep Learning at Scale
Benjamin Klenk (NVIDIA)

The Wafer Scale Interconnect in the Wafer Scale Engine
Robert Hesse (Cerebras)

13:30 to 13:35

Concluding Remarks
Tushar Krishna (Georgia Tech) and John Kim (KAIST)


Keynote Talks

Keynote I

Date: Thursday - September 24, 2020
Time: 09:05 - 10:05
Speaker: Jose Duato (Universitat Politècnica de València)
Title: Network Congestion: Analysis and Effective Solutions for Datacenters


As the number, variety, and sophistication of Internet applications keep growing and the number of client requests per unit time keeps increasing, datacenters are adopting computing solutions that scale with demand and provide appropriate support for interactive services. As system size increases, the cost of the interconnection network grows faster than the system itself, so careful network design becomes increasingly important to prevent overprovisioning. However, by doing so, the network operating point moves closer to saturation, and sudden traffic bursts may lead to congestion. This situation is aggravated by the recent introduction of flow control in datacenter networks to cope with RDMA requirements. The result is a massive performance degradation whenever some network region becomes congested. Moreover, the degradation may persist long after the traffic bursts that congested the network have been transmitted.

In this keynote, I will show why congestion appears in an interconnection network, how it propagates, and why performance may degrade so dramatically. Different kinds of congestion will be identified, and a global solution to the congestion problem will be proposed. It consists of several complementary mechanisms that cooperate to address all kinds of congestion and operate at different time scales. Some of these mechanisms have recently been incorporated into commercial products and are being standardized.


Jose Duato is Professor in the Department of Computer Engineering (DISCA) at the Technical University of Valencia (Universitat Politècnica de València). His current research interests include interconnection networks, multicore and multiprocessor architectures, and accelerators for deep learning. He has published over 500 refereed papers, which, according to Google Scholar, have received more than 16,000 citations. He proposed a theory of deadlock-free adaptive routing that has been used in the design of the routing algorithms for the Cray T3E supercomputer, the on-chip router of the Alpha 21364 microprocessor, and the IBM BlueGene/L supercomputer. He also developed RECN, a scalable congestion management technique, and a very efficient routing algorithm for fat trees that has been incorporated into Sun Microsystems' 3456-port InfiniBand Magnum switch. Prof. Duato led the Advanced Technology Group in the HyperTransport Consortium, and was the main contributor to the High Node Count HyperTransport Specification 1.0. He also led the development of rCUDA, which enables remote virtualized access to GP-GPU accelerators using a CUDA interface. Prof. Duato is the first author of the book "Interconnection Networks: An Engineering Approach". He has served on the editorial boards of IEEE Transactions on Parallel and Distributed Systems, IEEE Transactions on Computers, and IEEE Computer Architecture Letters. Prof. Duato was awarded the National Research Prize in 2009 and the "Rey Jaime I" Prize in 2006. He is a member of the Spanish Royal Academy of Sciences.

Keynote II

Date: Friday - September 25, 2020
Time: 10:10 - 11:10
Speaker: Dennis Abts (Groq)

Title: Domain-Specific Networks for Machine Learning

This talk gives a guided tour of networking, both on-chip and off-chip, for machine learning on the Groq tensor streaming processor (TSP). We describe the network in terms of its topology, routing, and flow control, and look at the Groq TSP's unique on-chip and off-chip networks for scaled-out machine learning. The on-chip network makes use of hardware support for tensor data types, which are lowered to rank-2 tensors so they map efficiently to the underlying hardware. We describe the ISA support for tensor reshapes, which rearranges tensor elements efficiently, and the off-chip communication primitives for partitioning the global shared address space (PGAS) among multiple TSPs so that the workload can be efficiently parallelized.


Dennis Abts is an American computer architect with a background in scalable vector architectures for high-performance computing and, more recently, machine learning. Previously at Google, he worked on topologies for energy-proportional networking, and at Cray, where he was a Senior Principal Architect on several Top500 massively parallel supercomputers. Dennis has published over 20 technical papers in the areas of memory systems, interconnection networks, and fault-tolerant systems. He holds over 25 patents spanning two decades of experience at Cray and Google. He holds a PhD in Computer Architecture from the University of Minnesota and is a Senior Member of the IEEE and the ACM.


Special Sessions 

Special Session A - Unlock the NoC: Transforming NoC Research with Physical Design Awareness

Date: Thursday - September 24, 2020
Time: 11:30 - 13:00
Session Chairs: Chris Batten (Cornell) and Michael Taylor (University of Washington)

As modern technology nodes enable NoCs with thousands of endpoints, NoC bandwidth and latency increasingly become the limiters. Taking NoCs to the next level requires network designs that are carefully matched to the underlying VLSI resources, and that acknowledge both the capabilities and limitations of modern fully automatic CAD flows. This session includes three invited papers that explore these issues.

Special Session B - Scalable Platforms for Machine Learning: An Industry Perspective

Date: Friday - September 25, 2020
Time: 12:00 - 13:30

Invited Talk B.1 - Accelerating the Network for Deep Learning at Scale

Speaker: Benjamin Klenk (NVIDIA)

The combination of seemingly unlimited amounts of data and the compute power of GPUs has spurred revolutions in many industries. Artificial intelligence excels at tasks like image, object, and speech recognition, as well as natural language understanding and translation. With compute requirements doubling every 3-4 months, large clusters of GPUs are necessary to train massive deep neural networks in a reasonable amount of time. As processors' compute capabilities increase at a rapid pace, the network remains a critical component and bandwidth becomes a scarce resource. This presentation will present our research on an in-network architecture that allows for faster training of large neural networks on scalable, GPU-centric systems.

Speaker Bio:
Benjamin Klenk is a Senior Research Scientist in NVIDIA’s Networking Research Group. He received his PhD in Computer Engineering from Heidelberg University, Germany. His research covers a broad range in GPU networking and GPU-centric communication models, including mechanisms to accelerate deep learning on various levels from on-chip to system networks.

Invited Talk B.2 - The Wafer Scale Interconnect in the Wafer Scale Engine

Speaker: Robert Hesse (Cerebras)

Deep learning (DL) represents a large and growing portion of the compute workload observed in major datacenters today. However, the vast majority of processors and systems used for this work were originally designed for other, fundamentally different tasks and subsequently repurposed for DL applications. This architectural mismatch between workload and processor architecture is a significant contributor to the long run times observed today for both standard benchmark and emerging state-of-the-art DL models. There is an opportunity to drastically accelerate DL and enable shorter execution times at improved efficiency with architectures designed specifically for the DL workload.

To meet the computational requirements of deep learning, Cerebras has built the Cerebras Wafer Scale Engine (WSE), the largest computer chip in the world. At a size of 46,000 square millimeters and with more than one trillion transistors, it is more than fifty times the size of the largest CPU or GPU on the market today. Compute at this scale, with hundreds of thousands of cores integrated on the same chip, presents unique challenges for the interconnect fabric. A high-bandwidth, low-latency fabric is the key to efficiently scaling neural network training and inference across all the cores of this massive chip. In this talk we will discuss some of the solutions and trade-offs involved in providing the massive bandwidth, low latency, high flexibility, resilience, and efficiency required from the on-chip network that allowed us to unlock unprecedented deep learning performance on a single chip.

Speaker Bio:
Robert Hesse currently works on large-scale on-chip interconnects and neural network performance at Cerebras Systems. Prior to Cerebras, he was involved in NoC research in both industry and academia, beginning in the early 2000s. Inspired by working on early NoC prototypes at Infineon Technologies in Germany, he decided to pursue a Ph.D. at the University of Toronto focused on dynamic on-chip networks under the supervision of Prof. Natalie Enright Jerger. After receiving his Ph.D., he joined Intel Corp.'s Architecture group in Santa Clara to work on memory and interconnect architectures across multiple SoC generations. During his tenure at Intel he filed several interconnect-related patents and made significant contributions to the scaling and performance of Intel's coherent and non-coherent on-chip interconnects.