Matheus Cavalcante

All published works
Title Year Authors
Spatz: Clustering Compact RISC-V-Based Vector Units to Maximize Computing Efficiency 2025 Matteo Perotti
Samuel Riedel
Matheus Cavalcante
Luca Benini
Spatzformer: An Efficient Reconfigurable Dual-Core RISC-V V Cluster for Mixed Scalar-Vector Workloads 2024 Matteo Perotti
Michele Raeber
M. Sinigaglia
Matheus Cavalcante
Davide Rossi
Luca Benini
Occamy: A 432-Core 28.1 DP-GFLOP/s/W 83% FPU Utilization Dual-Chiplet, Dual-HBM2E RISC-V-based Accelerator for Stencil and Sparse Linear Algebra Computations with 8-to-64-bit Floating-Point Support in 12nm FinFET 2024 Gianna Paulin
Paul Scheffler
Thomas Benz
Matheus Cavalcante
Tim Fischer
Manuel Eggimann
Yichao Zhang
Nils Wistoff
Luca Bertaccini
Luca Colagrande
TeraPool-SDR: An 1.89TOPS 1024 RV-Cores 4MiB Shared-L1 Cluster for Next-Generation Open-Source Software-Defined Radios 2024 Yichao Zhang
Marco Bertuletti
Samuel Riedel
Matheus Cavalcante
Alessandro Vanelli‐Coralli
Luca Benini
Ara2: Exploring Single- and Multi-Core Vector Processing With an Efficient RVV 1.0 Compliant Open-Source Processor 2024 Matteo Perotti
Matheus Cavalcante
Renzo Andri
Lukas Cavigelli
Luca Benini
MX: Enhancing RISC-V's Vector ISA for Ultra-Low Overhead, Energy-Efficient Matrix Multiplication 2024 Matteo Perotti
Yichao Zhang
Matheus Cavalcante
Enis Mustafa
Luca Benini
MemPool: A Scalable Manycore Architecture With a Low-Latency Shared L1 Memory 2023 Samuel Riedel
Matheus Cavalcante
Renzo Andri
Luca Benini
HexaMesh: Scaling to Hundreds of Chiplets with an Optimized Chiplet Arrangement 2023 Patrick Iff
Maciej Besta
Matheus Cavalcante
Tim Fischer
Luca Benini
Torsten Hoefler
PATRONoC: Parallel AXI Transport Reducing Overhead for Networks-on-Chip targeting Multi-Accelerator DNN Platforms at the Edge 2023 Vikram Jain
Matheus Cavalcante
Nazareno Bruschi
Michael Rogenmoser
Thomas Benz
Andreas Kurth
Davide Rossi
Luca Benini
Marian Verhelst
Sparse Hamming Graph: A Customizable Network-on-Chip Topology 2023 Patrick Iff
Maciej Besta
Matheus Cavalcante
Tim Fischer
Luca Benini
Torsten Hoefler
Quark: An Integer RISC-V Vector Processor for Sub-Byte Quantized DNN Inference 2023 MohammadHossein AskariHemmat
Théo Dupuis
Yoan Fournier
Nizar El Zarif
Matheus Cavalcante
Matteo Perotti
Frank K. Gürkaynak
Luca Benini
François Leduc-Primeau
Yvon Savaria
FlooNoC: A Multi-Tbps Wide NoC for Heterogeneous AXI4 Traffic 2023 Tim Fischer
Michael Rogenmoser
Matheus Cavalcante
Frank K. Gürkaynak
Luca Benini
Spatz: Clustering Compact RISC-V-Based Vector Units to Maximize Computing Efficiency 2023 Matheus Cavalcante
Matteo Perotti
Samuel Riedel
Luca Benini
Ara2: Exploring Single- and Multi-Core Vector Processing with an Efficient RVV1.0 Compliant Open-Source Processor 2023 Matteo Perotti
Matheus Cavalcante
Renzo Andri
Lukas Cavigelli
Luca Benini
Spatz 2022 Matheus Cavalcante
Domenic Wüthrich
Matteo Perotti
Samuel Riedel
Luca Benini
A “New Ara” for Vector Computing: An Open Source Highly Efficient RISC-V V 1.0 Vector Processor Design 2022 Matteo Perotti
Matheus Cavalcante
Nils Wistoff
Renzo Andri
Lukas Cavigelli
Luca Benini
Soft Tiles: Capturing Physical Implementation Flexibility for Tightly-Coupled Parallel Processing Clusters 2022 Gianna Paulin
Matheus Cavalcante
Paul Scheffler
Luca Bertaccini
Yichao Zhang
Frank K. Gürkaynak
Luca Benini
MemPool-3D: Boosting Performance and Efficiency of Shared-L1 Memory Many-Core Clusters with 3D Integration 2022 Matheus Cavalcante
Anthony Agnesina
Samuel Riedel
Moritz Brunion
Alberto García-Ortiz
Dragomir Milojevic
Francky Catthoor
Sung Kyu Lim
Luca Benini
Spatz: A Compact Vector Processing Unit for High-Performance and Energy-Efficient Shared-L1 Clusters 2022 Matheus Cavalcante
Domenic Wüthrich
Matteo Perotti
Samuel Riedel
Luca Benini
HexaMesh: Scaling to Hundreds of Chiplets with an Optimized Chiplet Arrangement 2022 Patrick Iff
Maciej Besta
Matheus Cavalcante
Tim Fischer
Luca Benini
Torsten Hoefler
Sparse Hamming Graph: A Customizable Network-on-Chip Topology 2022 Patrick Iff
Maciej Besta
Matheus Cavalcante
Tim Fischer
Luca Benini
Torsten Hoefler
MemPool: A Shared-L1 Memory Many-Core Cluster with a Low-Latency Interconnect 2021 Matheus Cavalcante
Samuel Riedel
Antonio Pullini
Luca Benini
An Open-Source Platform for High-Performance Non-Coherent On-Chip Communication 2021 Andreas Kurth
Wolfgang Rönninger
Thomas Benz
Matheus Cavalcante
Fabian Schuiki
Florian Zaruba
Luca Benini
An Open-Source Platform for High-Performance Non-Coherent On-Chip Communication 2020 Andreas Kurth
Wolfgang Rönninger
Thomas Benz
Matheus Cavalcante
Fabian Schuiki
Florian Zaruba
Luca Benini
Ara: A 1-GHz+ Scalable and Energy-Efficient RISC-V Vector Processor With Multiprecision Floating-Point Support in 22-nm FD-SOI 2019 Matheus Cavalcante
Fabian Schuiki
Florian Zaruba
Michael Schaffner
Luca Benini
Ara: A 1 GHz+ Scalable and Energy-Efficient RISC-V Vector Processor with Multi-Precision Floating Point Support in 22 nm FD-SOI 2019 Matheus Cavalcante
Fabian Schuiki
Florian Zaruba
Michael Schaffner
Luca Benini
Commonly Cited References
Title Year Authors # of times referenced
Ara: A 1-GHz+ Scalable and Energy-Efficient RISC-V Vector Processor With Multiprecision Floating-Point Support in 22-nm FD-SOI 2019 Matheus Cavalcante
Fabian Schuiki
Florian Zaruba
Michael Schaffner
Luca Benini
5
MemPool: A Shared-L1 Memory Many-Core Cluster with a Low-Latency Interconnect 2021 Matheus Cavalcante
Samuel Riedel
Antonio Pullini
Luca Benini
5
The Cost of Application-Class Processing: Energy and Performance Analysis of a Linux-Ready 1.7-GHz 64-Bit RISC-V Core in 22-nm FDSOI Technology 2019 Florian Zaruba
Luca Benini
5
A “New Ara” for Vector Computing: An Open Source Highly Efficient RISC-V V 1.0 Vector Processor Design 2022 Matteo Perotti
Matheus Cavalcante
Nils Wistoff
Renzo Andri
Lukas Cavigelli
Luca Benini
4
Manticore: A 4096-Core RISC-V Chiplet Architecture for Ultraefficient Floating-Point Computing 2020 Florian Zaruba
Fabian Schuiki
Luca Benini
4
Snitch: A Tiny Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads 2020 Florian Zaruba
Fabian Schuiki
Torsten Hoefler
Luca Benini
3
Arrow: A RISC-V Vector Accelerator for Machine Learning Inference 2021 Imad Al Assir
Mohamad El Iskandarani
Hadi Rayan Al Sandid
Mazen A. R. Saghir
3
Near-Threshold RISC-V Core With DSP Extensions for Scalable IoT Endpoint Devices 2017 Michael Gautschi
Pasquale Davide Schiavone
Andreas Traber
Igor Loi
Antonio Pullini
Davide Rossi
Éric Flamand
Frank K. Gürkaynak
Luca Benini
3
A RISC-V Simulator and Benchmark Suite for Designing and Evaluating Vector Architectures 2020 Marco A. Ramírez
César Alejandro Hernández
Oscar Palomar
Osman Ünsal
Marco Antonio Ramírez
Adrián Cristal
2
Kickstarting high-performance energy-efficient manycore architectures with Epiphany 2014 Andreas Olofsson
Tomas Nordström
Zain Ul-Abdin
2
Efficient Parallelization of 5G-PUSCH on a Scalable RISC-V Many-Core Processor 2023 Marco Bertuletti
Yichao Zhang
Alessandro Vanelli‐Coralli
Luca Benini
2
MemPool: A Scalable Manycore Architecture With a Low-Latency Shared L1 Memory 2023 Samuel Riedel
Matheus Cavalcante
Renzo Andri
Luca Benini
2
Fast Stencil-Code Computation on a Wafer-Scale Processor 2020 Kamil Rocki
Dirk Van Essendelft
Ilya Sharapov
Robert Schreiber
Michael Morrison
Vladimir Kibardin
Andrey Portnoy
Jean François Dietiker
Madhava Syamlal
Michael James
2
A High-Performance, Energy-Efficient Modular DMA Engine Architecture 2023 Thomas Benz
Michael Rogenmoser
Paul Scheffler
Samuel Riedel
Alessandro Ottaviano
Andreas Kurth
Torsten Hoefler
Luca Benini
2
An Open-Source Platform for High-Performance Non-Coherent On-Chip Communication 2021 Andreas Kurth
Wolfgang Rönninger
Thomas Benz
Matheus Cavalcante
Fabian Schuiki
Florian Zaruba
Luca Benini
2
Going deeper with convolutions 2015 Christian Szegedy
Wei Liu
Yangqing Jia
Pierre Sermanet
Scott Reed
Dragomir Anguelov
Dumitru Erhan
Vincent Vanhoucke
Andrew Rabinovich
2
The ARM Scalable Vector Extension 2017 Nigel Stephens
Stuart Biles
Matthias Boettcher
Jacob Eapen
Mbou Eyole
Giacomo Gabrielli
Matt Horsnell
Grigorios Magklis
A. Martínez
Nathanaël Prémillieu
2
Spatz 2022 Matheus Cavalcante
Domenic Wüthrich
Matteo Perotti
Samuel Riedel
Luca Benini
2
End to End Learning for Self-Driving Cars 2016 Mariusz Bojarski
Davide Del Testa
Daniel Dworakowski
Bernhard Firner
Beat Flepp
Prasoon Goyal
Lawrence D. Jackel
Mathew Monfort
Urs Müller
Jiakai Zhang
2
Energy Efficient Computing Systems: Architectures, Abstractions and Modeling to Techniques and Standards 2022 Rajeev Muralidhar
Renata Borovica-Gajić
Rajkumar Buyya
2
Efficient Processing of Deep Neural Networks: A Tutorial and Survey 2017 Vivienne Sze
Yu‐Hsin Chen
Tien-Ju Yang
Joel Emer
2
At the Locus of Performance: Quantifying the Effects of Copious 3D-Stacked Cache on HPC Workloads 2023 Jens Domke
Emil Vatai
Balazs Gerofi
Yuetsu Kodama
Mohamed Wahib
Artur Podobas
Sparsh Mittal
Miquel Pericàs
Lingqi Zhang
Peng Chen
1
BARVINN 2023 MohammadHossein AskariHemmat
Sean Wagner
Olexa Bilaniuk
Y. Hariri
Yvon Savaria
Jean‐Pierre David
1
BinaryNet: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1 2016 Matthieu Courbariaux
Yoshua Bengio
1
XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks 2016 Mohammad Rastegari
Vicente Ordóñez
Joseph Redmon
Ali Farhadi
1
In-Datacenter Performance Analysis of a Tensor Processing Unit 2017 Norman P. Jouppi
Cliff Young
Nishant Patil
David A. Patterson
Gaurav Agrawal
Raminder Bajwa
S. C. Bates
Suresh Bhatia
Nan Boden
Al Borchers
1
Learning-Based Application-Agnostic 3D NoC Design for Heterogeneous Manycore Systems 2018 Biresh Kumar Joardar
Ryan Kim
Janardhan Rao Doppa
Partha Pratim Pande
Diana Marculescu
Radu Mărculescu
1
Learned Step Size Quantization 2019 Steven K. Esser
Jeffrey L. McKinstry
Deepika Bablani
Rathinakumar Appuswamy
Dharmendra S. Modha
1
Loom: Exploiting Weight and Activation Precisions to Accelerate Convolutional Neural Networks 2018 Sayeh Sharify
Alberto Delmás Lascorz
Kevin Siu
Patrick Judd
Andreas Moshovos
1
Communication-Efficient Jaccard similarity for High-Performance Distributed Genome Comparisons 2020 Maciej Besta
Raghavendra Kanakagiri
Harun Mustafa
Mikhail Karasikov
Gunnar Rätsch
Torsten Hoefler
Edgar Solomonik
1
A Mixed-Precision RISC-V Processor for Extreme-Edge DNN Inference 2020 Gianmarco Ottavi
Angelo Garofalo
Giuseppe Tagliavini
Francesco Conti
Luca Benini
Davide Rossi
1
A Survey of Quantization Methods for Efficient Neural Network Inference 2022 Amir Gholami
Sehoon Kim
Zhen Dong
Zhewei Yao
Michael W. Mahoney
Kurt Keutzer
1
GVSoC: A Highly Configurable, Fast and Accurate Full-Platform Simulator for RISC-V based IoT Processors 2021 Nazareno Bruschi
Germain Haugou
Giuseppe Tagliavini
Francesco Conti
Luca Benini
Davide Rossi
1
AI Accelerator Survey and Trends 2021 Albert Reuther
Peter Michaleas
Michael Jones
Vijay Gadepally
Siddharth Samsi
Jeremy Kepner
1
ReCU: Reviving the Dead Weights in Binary Neural Networks 2021 Zihan Xu
Mingbao Lin
Jianzhuang Liu
Jie Chen
Ling Shao
Yue Gao
Yonghong Tian
Rongrong Ji
1
Spatz: A Compact Vector Processing Unit for High-Performance and Energy-Efficient Shared-L1 Clusters 2022 Matheus Cavalcante
Domenic Wüthrich
Matteo Perotti
Samuel Riedel
Luca Benini
1
Data Movement Is All You Need: A Case Study on Optimizing Transformers 2020 Андрей Иванов
Nikoli Dryden
Tal Ben‐Nun
Shigang Li
Torsten Hoefler
1
Chiplet actuary 2022 Yinxiao Feng
Kaisheng Ma
1
Scale up your In-Memory Accelerator: Leveraging Wireless-on-Chip Communication for AIMC-based CNN Inference 2022 Nazareno Bruschi
Giuseppe Tagliavini
Francesco Conti
Sergi Abadal
Alberto Cabellos-Aparicio
Eduard Alarcón
Geethan Karunaratne
Irem Boybat
Luca Benini
Davide Rossi
1
Kraken: A Direct Event/Frame-Based Multi-sensor Fusion SoC for Ultra-Efficient Visual Processing in Nano-UAVs 2022 Alfio Di Mauro
Moritz Scherer
Davide Rossi
Luca Benini
1
DARKSIDE: A Heterogeneous RISC-V Compute Cluster for Extreme-Edge On-Chip DNN Inference and Training 2022 Angelo Garofalo
Yvan Tortorella
Matteo Perotti
Luca Valente
Alessandro Nadalini
Luca Benini
Davide Rossi
Francesco Conti
1
Spatz: Clustering Compact RISC-V-Based Vector Units to Maximize Computing Efficiency 2023 Matheus Cavalcante
Matteo Perotti
Samuel Riedel
Luca Benini
1