Tsetlin machines are a new machine learning approach founded on propositional logic and Tsetlin Automata (finite-state automata for learning). They learn by extracting common patterns from data, decomposing the data into logical expressions. The simplicity of these logical expressions enables ultra-low energy consumption and fast parallel hardware. Despite this simplicity, Tsetlin machines have outperformed vanilla deep neural networks accuracy-wise on well-established benchmarks.
In this project, we aim to solve three significant challenges in Tsetlin machine research: 1) Current hardware prototypes are limited to small-scale machine learning problems. 2) It is unclear how to perform reinforcement learning (learning from penalties and rewards), a fundamental machine learning paradigm. 3) We do not know how to pre-train Tsetlin machines on unlabelled data to deal with the well-known data labelling bottleneck. By overcoming these three obstacles, we aim to architect a new software and hardware ecosystem that outperforms state-of-the-art machine learning. This will enable powerful logic-based machine learning applications at the edge and in the cloud.
For the first challenge, we have shown how to reduce energy consumption by introducing deterministic learning steps. We have further developed a scalable and asynchronous architecture that can exploit parallel hardware. Our most recent hardware designs scale to support Tsetlin machine convolution for image analysis, and we have introduced a fully asynchronous Tsetlin automaton architecture with integrated non-volatile memory. When scaling up the complexity of the machine learning datasets, we noticed significant memory traffic in the hardware designs, affecting efficiency and throughput. To mitigate these memory bottlenecks, we have designed several in-memory hardware architectures that process stored data directly within modular in-memory units. For the in-memory implementations, we have used two types of memory technology: traditional static RAM and emerging non-volatile resistive memory. We have also focused on data booleanisation for building sparse and accurate patterns, and we have developed in-house visualisation methods to understand and explain the impact of booleanisation and hyperparameter choices on learning efficacy. Alongside these efforts, we have proposed hardware designs based on Petri nets, including latency analysis.
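To illustrate what booleanisation involves in practice, the sketch below shows one common scheme, thermometer encoding, where each continuous feature is turned into a small set of Boolean threshold bits. The function name, quantile-based thresholds, and generated data are illustrative assumptions, not the project's exact pipeline.

```python
import numpy as np

def thermometer_encode(x, thresholds):
    """Thermometer (cumulative) encoding: one Boolean bit per threshold,
    set to 1 when the raw feature value exceeds that threshold."""
    bits = []
    for j, t in enumerate(thresholds):
        # Compare feature j against each of its thresholds (broadcasting).
        bits.append((x[:, j:j + 1] > t[np.newaxis, :]).astype(np.uint8))
    return np.concatenate(bits, axis=1)

# Illustrative usage: thresholds taken from per-feature quantiles.
rng = np.random.default_rng(0)
raw = rng.normal(size=(100, 3))          # 100 samples, 3 continuous features
thresholds = [np.quantile(raw[:, j], [0.25, 0.5, 0.75]) for j in range(raw.shape[1])]
X_bool = thermometer_encode(raw, thresholds)
print(X_bool.shape)                      # (100, 9): 3 features x 3 bits each
```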
For the second challenge, we have developed the first Tsetlin machine framework for off- and on-policy reinforcement learning, evaluated on grid worlds. Further, for the board game Hex, we have designed representations and interpretation mechanisms that allow logic-based winner prediction. We have also solved the problem of online reinforcement learning with contextual information (the so-called contextual bandit problem) using a Tsetlin machine, outperforming other popular comparable base learners. For complex tasks, we have observed that the logical expressions can get quite long. To address this issue, we have proposed a novel variant of Tsetlin machine learning that constrains the size of the logical rules. This allowed us to halve the size of the expressions used for winner prediction in Hex, with no loss in accuracy.
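To make the contextual bandit setting concrete, the sketch below shows a plain epsilon-greedy loop that keeps one reward model per arm. The class names and the trivial mean-reward baseline are placeholders; a Tsetlin machine learner would take the baseline's place in our setting, so this is only an illustration of the problem structure, not the project's algorithm.

```python
import numpy as np

class MeanRewardBaseline:
    """Placeholder reward model: ignores the context, predicts the mean reward."""
    def __init__(self):
        self.mean = 0.0

    def fit(self, X, y):
        self.mean = float(np.mean(y))

    def predict(self, x):
        return self.mean

class EpsilonGreedyContextualBandit:
    """Epsilon-greedy contextual bandit with one reward model per arm.
    Any learner exposing fit/predict (for instance, a regression-style
    Tsetlin machine) could be plugged in via make_learner."""
    def __init__(self, n_arms, make_learner, epsilon=0.1, seed=0):
        self.learners = [make_learner() for _ in range(n_arms)]
        self.epsilon = epsilon
        self.rng = np.random.default_rng(seed)
        self.history = [([], []) for _ in range(n_arms)]  # per-arm (contexts, rewards)

    def select(self, context):
        # Explore with probability epsilon, otherwise pick the best estimate.
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(len(self.learners)))
        estimates = [learner.predict(context) for learner in self.learners]
        return int(np.argmax(estimates))

    def update(self, arm, context, reward):
        contexts, rewards = self.history[arm]
        contexts.append(context)
        rewards.append(reward)
        self.learners[arm].fit(np.array(contexts), np.array(rewards))

# Usage sketch: bandit = EpsilonGreedyContextualBandit(3, MeanRewardBaseline)
```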
For the third challenge, we have developed a Tsetlin machine architecture that supports mapping between multidimensional input and output values. This approach has enabled the first Tsetlin machine autoencoder and self-supervised learning of language models. We have also created the first unsupervised Tsetlin machine architecture for finding interpretable clusters in data. To address dynamic environments, we have investigated a logical approach to modelling and detecting data drift. Finally, we have developed a method for uncovering relations and structure in data using a Tsetlin machine-produced Bayesian network, as well as a Tsetlin machine architecture for processing time series. Across these approaches, we have explored applications including sensing, planning, natural language processing, image analysis, and decision support.
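As a rough illustration of how unlabelled data can be posed as an input-output mapping task for this kind of pre-training, the sketch below masks a fraction of the Boolean input bits and uses the original row as the reconstruction target. The masking scheme, masking fraction, and function name are assumptions made for illustration; they are not the project's autoencoder design.

```python
import numpy as np

def make_masked_reconstruction_pairs(X_bool, mask_fraction=0.2, seed=0):
    """Turn unlabelled Boolean data into (input, target) pairs for
    autoencoder-style self-supervised training: randomly mask a fraction
    of the input bits; the target is the original, unmasked row."""
    rng = np.random.default_rng(seed)
    mask = rng.random(X_bool.shape) < mask_fraction
    X_in = X_bool.copy()
    X_in[mask] = 0            # assumption: masked bits are simply zeroed
    return X_in, X_bool.copy()

# Usage sketch: pre-train any multi-output learner to map X_in back to X_target,
# then fine-tune it on the (smaller) labelled dataset.
X = (np.random.default_rng(1).random((100, 12)) > 0.5).astype(np.uint8)
X_in, X_target = make_masked_reconstruction_pairs(X)
print(X_in.shape, X_target.shape)   # (100, 12) (100, 12)
```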
Throughout this project, we have initiated several collaborative research endeavours addressing the individual challenges. These include the effective use of visualisation to explore diverse algorithms and their behavioural dynamics, along with the development of ensemble methods to address accuracy issues in large-scale machine learning problems. The findings and outcomes of the project have laid the foundation for exploring new avenues for Tsetlin machines.
Tsetlin Machines (TMs) are a new machine learning (ML) approach founded on the Tsetlin Automaton. TMs use frequent pattern mining and resource allocation to extract common patterns in the data, rather than relying on minimizing output error, which is prone to overfitting. Unlike the intertwined nature of pattern representation in neural networks (NNs), TMs decompose problems into self-contained patterns, each represented as a conjunctive clause. The clause outputs, in turn, are combined into a classification decision through summation and thresholding, akin to logistic regression, but with binary weights and a unit step output function. TM hardware (HW) has demonstrated up to three orders of magnitude lower energy consumption and faster learning compared to comparable NNs. Logic-driven building blocks, organized in lean parallel processing units, are the major contributors to this advantage over NNs, which are heavily arithmetic-based. The TM further outperforms vanilla CNNs and LSTMs accuracy-wise on well-established benchmarks. While the reported results on TMs are promising in terms of scalability, training time, accuracy, and energy, we here address three major obstacles. 1) Current FPGA and ASIC prototypes lack scalable memory elements, constraining them to small-scale ML problems. 2) Reinforcement learning (RL) is key to many ML problems, such as playing board games; however, it is unclear how to model RL within the TM framework. 3) State-of-the-art deep learning models support pre-training on unlabelled data, which significantly improves the accuracy of subsequent supervised learning and mitigates the shortage of labelled data; it is unclear how to pre-train TMs on unlabelled data. By overcoming these three obstacles, we aim to architect a new TM HW/SW ecosystem that outperforms state-of-the-art ML in terms of energy efficiency and scalability, parametrised by accuracy. This will enable powerful logic-based ML applications at the edge and in the cloud.
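To make the clause-summation-thresholding pipeline concrete, the minimal sketch below evaluates hand-made conjunctive clauses over Boolean features, sums their votes with +1/-1 polarities (binary weights), and applies a unit step. It only illustrates the inference step; the learning part, where Tsetlin Automata decide which literals each clause includes, is omitted, and the example clauses are purely illustrative.

```python
import numpy as np

def clause_output(literals, x_bool):
    """A conjunctive clause: logical AND over the included literals.
    `literals` is a list of (feature_index, negated) pairs."""
    for j, negated in literals:
        value = x_bool[j] ^ 1 if negated else x_bool[j]
        if value == 0:
            return 0
    return 1

def class_score(clauses, polarities, x_bool):
    """Summation step: add clause outputs weighted by +1/-1 polarities."""
    return sum(p * clause_output(c, x_bool) for c, p in zip(clauses, polarities))

def classify(clauses, polarities, x_bool, threshold=0):
    """Unit-step decision on the clause sum."""
    return 1 if class_score(clauses, polarities, x_bool) >= threshold else 0

# Illustrative hand-made clauses over 3 Boolean features:
# clause 1: x0 AND NOT x2 (votes +1); clause 2: x1 (votes -1).
clauses = [[(0, False), (2, True)], [(1, False)]]
polarities = [+1, -1]
print(classify(clauses, polarities, np.array([1, 0, 0], dtype=np.uint8)))  # 1
```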