Thesis

Linear Equation Solving in Asynchronous Logic Automata

Greenwald, S. "Linear Equation Solving in Asynchronous Logic Automata"

Abstract

A longstanding trend in supercomputing is that as supercomputers scale, they become more difficult to program in a way that fully utilizes their parallel processing capabilities. At the same time they become more power-hungry: today's largest supercomputers each consume as much power as a United States town of 5,000 inhabitants. In this thesis I investigate an alternative type of architecture, Asynchronous Logic Automata (ALA), which I conclude has the potential to be easy to program in a parametric way and to execute very dense, high-throughput computation at a lower energy cost than today's supercomputers. This architecture aligns physics and computation in a way that makes it inherently scalable, unlike existing architectures.

An ALA circuit is a network of 1-bit processors that perform operations asynchronously and communicate only with their nearest neighbors over wires that hold one bit at a time. In the embodiment explored here, ALA circuits form a 2D grid of 1-bit processors. ALA is both a model of computation and a hardware architecture. The program is a picture that specifies which operation each cell performs and which neighbors it communicates with. This program-picture is also a hardware design: there is a one-to-one mapping of logical cells to hardware blocks that can be arranged on a grid and execute the computation. On the hardware side, ALA can be seen as the fine-grained limit of several hardware paradigms that exploit parallelism, data locality, and application-specific customization to achieve performance.

In this thesis I use matrix multiplication as a case study to investigate how numerical computation can be performed in this substrate, and how the potential benefits play out in terms of hardware performance estimates. First we take a brief tour of supercomputing today and see how ALA is related to a variety of progenitors. Next, ALA computation and circuit metrics are introduced, characterizing runtime and the number of operations performed. The specification part of the case study begins with numerical primitives, introduces a language called Snap for design in ALA, and expresses matrix multiplication using the two together. Hardware performance estimates are given for a known CMOS embodiment by translating circuit metrics from simulation into physical units. The theory section presents in full detail the algorithms used to compute and optimize circuit characteristics based on directed acyclic graphs (DAGs). Finally, it is shown how the Snap procedure of assembling larger modules out of smaller modules employs this theory to hierarchically maintain throughput optimality.
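As a rough illustration of the execution model summarized above, the sketch below simulates a few ALA-style cells in Python: each cell is a 1-bit processor joined to its neighbors by wires that buffer a single token, and it fires asynchronously as soon as every input wire is full and every output wire is empty. This is a minimal sketch under those assumptions only; the Wire, Cell, and run names and the two-gate example circuit are hypothetical illustrations, not code, cell types, or Snap constructs from the thesis.

class Wire:
    """A one-bit buffer between two neighboring cells (None means empty)."""
    def __init__(self):
        self.bit = None

class Cell:
    """A 1-bit processor: consumes one token per input wire, emits one per output wire."""
    def __init__(self, op, inputs, outputs):
        self.op = op            # function mapping a tuple of input bits to one output bit
        self.inputs = inputs    # wires this cell reads from
        self.outputs = outputs  # wires this cell writes to

    def ready(self):
        # Local firing rule: all input tokens present, all output slots free.
        return all(w.bit is not None for w in self.inputs) and \
               all(w.bit is None for w in self.outputs)

    def fire(self):
        bits = tuple(w.bit for w in self.inputs)
        for w in self.inputs:
            w.bit = None        # consume the input tokens
        result = self.op(bits)
        for w in self.outputs:
            w.bit = result      # deposit one bit on each output wire

def run(cells):
    """Fire ready cells in an arbitrary order until the circuit is quiescent (no global clock)."""
    fired = True
    while fired:
        fired = False
        for cell in cells:
            if cell.ready():
                cell.fire()
                fired = True

# Tiny example circuit: two input wires feed an AND cell; its result feeds a NOT cell.
a, b, mid, out = Wire(), Wire(), Wire(), Wire()
and_cell = Cell(lambda bits: bits[0] & bits[1], [a, b], [mid])
not_cell = Cell(lambda bits: bits[0] ^ 1, [mid], [out])

a.bit, b.bit = 1, 1             # inject the input tokens
run([and_cell, not_cell])
print(out.bit)                  # prints 0: NOT(1 AND 1)

In this sketch the firing rule is purely local, so the order in which ready cells are visited does not change the final result, mirroring the asynchronous, clock-free operation described in the abstract.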
