Matheus Cavalcante
All published works
Action
Title
Year
Authors
+
Spatz: Clustering Compact RISC-V-Based Vector Units to Maximize Computing Efficiency
2025
Matteo Perotti
Samuel Riedel
Matheus Cavalcante
Luca Benini
+
Spatzformer: An Efficient Reconfigurable Dual-Core RISC-V V Cluster for Mixed Scalar-Vector Workloads
2024
Matteo Perotti
Michele Raeber
M. Sinigaglia
Matheus Cavalcante
Davide Rossi
Luca Benini
+
Spatzformer: An Efficient Reconfigurable Dual-Core RISC-V V Cluster for Mixed Scalar-Vector Workloads
2024
Matteo Perotti
Michele Raeber
M. Sinigaglia
Matheus Cavalcante
Davide Rossi
Luca Benini
+
Occamy: A 432-Core 28.1 DP-GFLOP/s/W 83% FPU Utilization Dual-Chiplet, Dual-HBM2E RISC-V-based Accelerator for Stencil and Sparse Linear Algebra Computations with 8-to-64-bit Floating-Point Support in 12nm FinFET
2024
Gianna Paulin
Paul Scheffler
Thomas Benz
Matheus Cavalcante
Tim Fischer
Manuel Eggimann
Yichao Zhang
Nils Wistoff
Luca Bertaccini
Luca Colagrande
+
Occamy: A 432-Core 28.1 DP-GFLOP/s/W 83% FPU Utilization Dual-Chiplet, Dual-HBM2E RISC-V-Based Accelerator for Stencil and Sparse Linear Algebra Computations with 8-to-64-bit Floating-Point Support in 12nm FinFET
2024
Gianna Paulin
Paul Scheffler
Thomas Benz
Matheus Cavalcante
Tim Fischer
Manuel Eggimann
Yichao Zhang
Nils Wistoff
Luca Bertaccini
Luca Colagrande
+
TeraPool-SDR: An 1.89TOPS 1024 RV-Cores 4MiB Shared-L1 Cluster for Next-Generation Open-Source Software-Defined Radios
2024
Yichao Zhang
Marco Bertuletti
Samuel Riedel
Matheus Cavalcante
Alessandro Vanelli‐Coralli
Luca Benini
+
Ara2: Exploring Single- and Multi-Core Vector Processing With an Efficient RVV 1.0 Compliant Open-Source Processor
2024
Matteo Perotti
Matheus Cavalcante
Renzo Andri
Lukas Cavigelli
Luca Benini
+
MX: Enhancing RISC-V's Vector ISA for Ultra-Low Overhead, Energy-Efficient Matrix Multiplication
2024
Matteo Perotti
Yichao Zhang
Matheus Cavalcante
Enis Mustafa
Luca Benini
+
MX: Enhancing RISC-V's Vector ISA for Ultra-Low Overhead, Energy-Efficient Matrix Multiplication
2024
Matteo Perotti
Yichao Zhang
Matheus Cavalcante
Enis Mustafa
Luca Benini
+
MemPool: A Scalable Manycore Architecture With a Low-Latency Shared L1 Memory
2023
Samuel Riedel
Matheus Cavalcante
Renzo Andri
Luca Benini
+
HexaMesh: Scaling to Hundreds of Chiplets with an Optimized Chiplet Arrangement
2023
Patrick Iff
Maciej Besta
Matheus Cavalcante
Tim Fischer
Luca Benini
Torsten Hoefler
+
PATRONoC: Parallel AXI Transport Reducing Overhead for Networks-on-Chip targeting Multi-Accelerator DNN Platforms at the Edge
2023
Vikram Jain
Matheus Cavalcante
Nazareno Bruschi
Michael Rogenmoser
Thomas Benz
Andreas Kurth
Davide Rossi
Luca Benini
Marian Verhelst
+
Sparse Hamming Graph: A Customizable Network-on-Chip Topology
2023
Patrick Iff
Maciej Besta
Matheus Cavalcante
Tim Fischer
Luca Benini
Torsten Hoefler
+
Quark: An Integer RISC-V Vector Processor for Sub-Byte Quantized DNN Inference
2023
MohammadHossein AskariHemmat
Théo Dupuis
Yoan Fournier
Nizar El Zarif
Matheus Cavalcante
Matteo Perotti
Frank K. Gürkaynak
Luca Benini
François Leduc-Primeau
Yvon Savaria
+
Quark: An Integer RISC-V Vector Processor for Sub-Byte Quantized DNN Inference
2023
MohammadHossein AskariHemmat
Théo Dupuis
Yoan Fournier
Nizar El Zarif
Matheus Cavalcante
Matteo Perotti
Frank K. Gürkaynak
Luca Benini
François Leduc-Primeau
Yvon Savaria
+
MemPool: A Scalable Manycore Architecture with a Low-Latency Shared L1 Memory
2023
Samuel Riedel
Matheus Cavalcante
Renzo Andri
Luca Benini
+
FlooNoC: A Multi-Tbps Wide NoC for Heterogeneous AXI4 Traffic
2023
Tim Fischer
Michael Rogenmoser
Matheus Cavalcante
Frank K. Gürkaynak
Luca Benini
+
PATRONoC: Parallel AXI Transport Reducing Overhead for Networks-on-Chip targeting Multi-Accelerator DNN Platforms at the Edge
2023
Vikram Jain
Matheus Cavalcante
Nazareno Bruschi
Michael Rogenmoser
Thomas Benz
Andreas Kurth
Davide Rossi
Luca Benini
Marian Verhelst
+
Spatz: Clustering Compact RISC-V-Based Vector Units to Maximize Computing Efficiency
2023
Matheus Cavalcante
Matteo Perotti
Samuel Riedel
Luca Benini
+
Ara2: Exploring Single- and Multi-Core Vector Processing with an Efficient RVV1.0 Compliant Open-Source Processor
2023
Matteo Perotti
Matheus Cavalcante
Renzo Andri
Lukas Cavigelli
Luca Benini
+
Spatz
2022
Matheus Cavalcante
Domenic Wüthrich
Matteo Perotti
Samuel Riedel
Luca Benini
+
A “New Ara” for Vector Computing: An Open Source Highly Efficient RISC-V V 1.0 Vector Processor Design
2022
Matteo Perotti
Matheus Cavalcante
Nils Wistoff
Renzo Andri
Lukas Cavigelli
Luca Benini
+
Soft Tiles: Capturing Physical Implementation Flexibility for Tightly-Coupled Parallel Processing Clusters
2022
Gianna Paulin
Matheus Cavalcante
Paul Scheffler
Luca Bertaccini
Yichao Zhang
Frank K. Gürkaynak
Luca Benini
+
MemPool-3D: Boosting Performance and Efficiency of Shared-L1 Memory Many-Core Clusters with 3D Integration
2022
Matheus Cavalcante
Anthony Agnesina
Samuel Riedel
Moritz Brunion
Alberto García-Ortiz
Dragomir Milojevic
Francky Catthoor
Sung Kyu Lim
Luca Benini
+
Spatz: A Compact Vector Processing Unit for High-Performance and Energy-Efficient Shared-L1 Clusters
2022
Matheus Cavalcante
Domenic Wüthrich
Matteo Perotti
Samuel Riedel
Luca Benini
+
Soft Tiles: Capturing Physical Implementation Flexibility for Tightly-Coupled Parallel Processing Clusters
2022
Gianna Paulin
Matheus Cavalcante
Paul Scheffler
Luca Bertaccini
Yichao Zhang
Frank K. Gürkaynak
Luca Benini
+
A “New Ara” for Vector Computing: An Open Source Highly Efficient RISC-V V 1.0 Vector Processor Design
2022
Matteo Perotti
Matheus Cavalcante
Nils Wistoff
Renzo Andri
Lukas Cavigelli
Luca Benini
+
HexaMesh: Scaling to Hundreds of Chiplets with an Optimized Chiplet Arrangement
2022
Patrick Iff
Maciej Besta
Matheus Cavalcante
Tim Fischer
Luca Benini
Torsten Hoefler
+
Sparse Hamming Graph: A Customizable Network-on-Chip Topology
2022
Patrick Iff
Maciej Besta
Matheus Cavalcante
Tim Fischer
Luca Benini
Torsten Hoefler
+
MemPool: A Shared-L1 Memory Many-Core Cluster with a Low-Latency Interconnect
2021
Matheus Cavalcante
Samuel Riedel
Antonio Pullini
Luca Benini
+
An Open-Source Platform for High-Performance Non-Coherent On-Chip Communication
2021
Andreas Kurth
Wolfgang Rönninger
Thomas Benz
Matheus Cavalcante
Fabian Schuiki
Florian Zaruba
Luca Benini
+
An Open-Source Platform for High-Performance Non-Coherent On-Chip Communication
2020
Andreas Kurth
Wolfgang Rönninger
Thomas Benz
Matheus Cavalcante
Fabian Schuiki
Florian Zaruba
Luca Benini
+
Ara: A 1-GHz+ Scalable and Energy-Efficient RISC-V Vector Processor With Multiprecision Floating-Point Support in 22-nm FD-SOI
2019
Matheus Cavalcante
Fabian Schuiki
Florian Zaruba
Michael Schaffner
Luca Benini
+
Ara: A 1 GHz+ Scalable and Energy-Efficient RISC-V Vector Processor with Multi-Precision Floating Point Support in 22 nm FD-SOI
2019
Matheus Cavalcante
Fabian Schuiki
Florian Zaruba
Michael Schaffner
Luca Benini
Common Coauthors
Coauthor
Papers Together
Luca Benini
34
Matteo Perotti
14
Samuel Riedel
9
Yichao Zhang
7
Frank K. Gürkaynak
7
Thomas Benz
6
Renzo Andri
6
Davide Rossi
6
Tim Fischer
5
Patrick Iff
4
Gianna Paulin
4
Fabian Schuiki
4
Nils Wistoff
4
Andreas Kurth
4
Florian Zaruba
4
Paul Scheffler
4
Maciej Besta
4
Lukas Cavigelli
4
Torsten Hoefler
4
Luca Bertaccini
3
Michael Rogenmoser
3
Enis Mustafa
2
Théo Dupuis
2
Domenic Wüthrich
2
Marian Verhelst
2
Yoan Fournier
2
Wolfgang Rönninger
2
Yvon Savaria
2
François Leduc-Primeau
2
Manuel Eggimann
2
Michael Schaffner
2
Jean‐Pierre David
2
Gianmarco Ottavi
2
M. Sinigaglia
2
Vikram Jain
2
Nizar El Zarif
2
Luca Colagrande
2
Michele Raeber
2
Tim Fischer
2
MohammadHossein AskariHemmat
2
Nazareno Bruschi
2
Dragomir Milojevic
1
Sung Kyu Lim
1
Alberto García-Ortiz
1
Francky Catthoor
1
Luca Bertaccini
1
Antonio Pullini
1
Alessandro Vanelli‐Coralli
1
Moritz Brunion
1
Anthony Agnesina
1
Commonly Cited References
Action
Title
Year
Authors
# of times referenced
+
Ara: A 1-GHz+ Scalable and Energy-Efficient RISC-V Vector Processor With Multiprecision Floating-Point Support in 22-nm FD-SOI
2019
Matheus Cavalcante
Fabian Schuiki
Florian Zaruba
Michael Schaffner
Luca Benini
5
+
MemPool: A Shared-L1 Memory Many-Core Cluster with a Low-Latency Interconnect
2021
Matheus Cavalcante
Samuel Riedel
Antonio Pullini
Luca Benini
5
+
The Cost of Application-Class Processing: Energy and Performance Analysis of a Linux-Ready 1.7-GHz 64-Bit RISC-V Core in 22-nm FDSOI Technology
2019
Florian Zaruba
Luca Benini
5
+
A “New Ara” for Vector Computing: An Open Source Highly Efficient RISC-V V 1.0 Vector Processor Design
2022
Matteo Perotti
Matheus Cavalcante
Nils Wistoff
Renzo Andri
Lukas Cavigelli
Luca Benini
4
+
Manticore: A 4096-Core RISC-V Chiplet Architecture for Ultraefficient Floating-Point Computing
2020
Florian Zaruba
Fabian Schuiki
Luca Benini
4
+
Snitch: A Tiny Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads
2020
Florian Zaruba
Fabian Schuiki
Torsten Hoefler
Luca Benini
3
+
Arrow: A RISC-V Vector Accelerator for Machine Learning Inference
2021
Imad Al Assir
Mohamad El Iskandarani
Hadi Rayan Al Sandid
Mazen A. R. Saghir
3
+
Near-Threshold RISC-V Core With DSP Extensions for Scalable IoT Endpoint Devices
2017
Michael Gautschi
Pasquale Davide Schiavone
Andreas Traber
Igor Loi
Antonio Pullini
Davide Rossi
Éric Flamand
Frank K. Gürkaynak
Luca Benini
3
+
A RISC-V Simulator and Benchmark Suite for Designing and Evaluating Vector Architectures
2020
Marco A. Ramírez
César Alejandro Hernández
Oscar Palomar
Osman Ünsal
Marco Antonio Ramírez
Adrián Cristal
2
+
Kickstarting high-performance energy-efficient manycore architectures with Epiphany
2014
Andreas Olofsson
Tomas Nordström
Zain Ul-Abdin
2
+
Efficient Parallelization of 5G-PUSCH on a Scalable RISC-V Many-Core Processor
2023
Marco Bertuletti
Yichao Zhang
Alessandro Vanelli‐Coralli
Luca Benini
2
+
MemPool: A Scalable Manycore Architecture With a Low-Latency Shared L1 Memory
2023
Samuel Riedel
Matheus Cavalcante
Renzo Andri
Luca Benini
2
+
Fast Stencil-Code Computation on a Wafer-Scale Processor
2020
Kamil Rocki
Dirk Van Essendelft
Ilya Sharapov
Robert Schreiber
Michael Morrison
Vladimir Kibardin
Andrey Portnoy
Jean François Dietiker
Madhava Syamlal
Michael James
2
+
A High-Performance, Energy-Efficient Modular DMA Engine Architecture
2023
Thomas Benz
Michael Rogenmoser
Paul Scheffler
Samuel Riedel
Alessandro Ottaviano
Andreas Kurth
Torsten Hoefler
Luca Benini
2
+
An Open-Source Platform for High-Performance Non-Coherent On-Chip Communication
2021
Andreas Kurth
Wolfgang Rönninger
Thomas Benz
Matheus Cavalcante
Fabian Schuiki
Florian Zaruba
Luca Benini
2
+
Going deeper with convolutions
2015
Christian Szegedy
Wei Liu
Yangqing Jia
Pierre Sermanet
Scott Reed
Dragomir Anguelov
Dumitru Erhan
Vincent Vanhoucke
Andrew Rabinovich
2
+
The ARM Scalable Vector Extension
2017
Nigel Stephens
Stuart Biles
Matthias Boettcher
Jacob Eapen
Mbou Eyole
Giacomo Gabrielli
Matt Horsnell
Grigorios Magklis
A. Martínez
Nathanaël Prémillieu
2
+
Spatz
2022
Matheus Cavalcante
Domenic Wüthrich
Matteo Perotti
Samuel Riedel
Luca Benini
2
+
End to End Learning for Self-Driving Cars
2016
Mariusz Bojarski
Davide Del Testa
Daniel Dworakowski
Bernhard Firner
Beat Flepp
Prasoon Goyal
Lawrence D. Jackel
Mathew Monfort
Urs Müller
Jiakai Zhang
2
+
Energy Efficient Computing Systems: Architectures, Abstractions and Modeling to Techniques and Standards
2022
Rajeev Muralidhar
Renata Borovica-Gajić
Rajkumar Buyya
2
+
Efficient Processing of Deep Neural Networks: A Tutorial and Survey
2017
Vivienne Sze
Yu‐Hsin Chen
Tien-Ju Yang
Joel Emer
2
+
At the Locus of Performance: Quantifying the Effects of Copious 3D-Stacked Cache on HPC Workloads
2023
Jens Domke
Emil Vatai
Balazs Gerofi
Yuetsu Kodama
Mohamed Wahib
Artur Podobas
Sparsh Mittal
Miquel Pericàs
Lingqi Zhang
Peng Chen
1
+
BARVINN
2023
MohammadHossein AskariHemmat
Sean Wagner
Olexa Bilaniuk
Y. Hariri
Yvon Savaria
Jean‐Pierre David
1
+
BinaryNet: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1
2016
Matthieu Courbariaux
Yoshua Bengio
1
+
XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
2016
Mohammad Rastegari
Vicente Ordóñez
Joseph Redmon
Ali Farhadi
1
+
In-Datacenter Performance Analysis of a Tensor Processing Unit
2017
Norman P. Jouppi
Cliff Young
Nishant Patil
David A. Patterson
Gaurav Agrawal
Raminder Bajwa
S. C. Bates
Suresh Bhatia
Nan Boden
Al Borchers
1
+
Learning-Based Application-Agnostic 3D NoC Design for Heterogeneous Manycore Systems
2018
Biresh Kumar Joardar
Ryan Kim
Janardhan Rao Doppa
Partha Pratim Pande
Diana Marculescu
Radu Mărculescu
1
+
Learned Step Size Quantization
2019
Steven K. Esser
Jeffrey L. McKinstry
Deepika Bablani
Rathinakumar Appuswamy
Dharmendra S. Modha
1
+
Loom: Exploiting Weight and Activation Precisions to Accelerate Convolutional Neural Networks
2018
Sayeh Sharify
Alberto Delmás Lascorz
Kevin Siu
Patrick Judd
Andreas Moshovos
1
+
In-Datacenter Performance Analysis of a Tensor Processing Unit
2017
Norman P. Jouppi
Cliff Young
Nishant Patil
David A. Patterson
Gaurav Agrawal
Raminder Bajwa
S. C. Bates
Suresh Bhatia
Nan Boden
Al Borchers
1
+
Communication-Efficient Jaccard similarity for High-Performance Distributed Genome Comparisons
2020
Maciej Besta
Raghavendra Kanakagiri
Harun Mustafa
Mikhail Karasikov
Gunnar Rätsch
Torsten Hoefler
Edgar Solomonik
1
+
A Mixed-Precision RISC-V Processor for Extreme-Edge DNN Inference
2020
Gianmarco Ottavi
Angelo Garofalo
Giuseppe Tagliavini
Francesco Conti
Luca Benini
Davide Rossi
1
+
A Survey of Quantization Methods for Efficient Neural Network Inference
2022
Amir Gholami
Sehoon Kim
Zhen Dong
Zhewei Yao
Michael W. Mahoney
Kurt Keutzer
1
+
GVSoC: A Highly Configurable, Fast and Accurate Full-Platform Simulator for RISC-V based IoT Processors
2021
Nazareno Bruschi
Germain Haugou
Giuseppe Tagliavini
Francesco Conti
Luca Benini
Davide Rossi
1
+
AI Accelerator Survey and Trends
2021
Albert Reuther
Peter Michaleas
Michael Jones
Vijay Gadepally
Siddharth Samsi
Jeremy Kepner
1
+
ReCU: Reviving the Dead Weights in Binary Neural Networks
2021
Zihan Xu
Mingbao Lin
Jianzhuang Liu
Jie Chen
Ling Shao
Yue Gao
Yonghong Tian
Rongrong Ji
1
+
Spatz: A Compact Vector Processing Unit for High-Performance and Energy-Efficient Shared-L1 Clusters
2022
Matheus Cavalcante
Domenic Wüthrich
Matteo Perotti
Samuel Riedel
Luca Benini
1
+
Data Movement Is All You Need: A Case Study on Optimizing Transformers
2020
Андрей Иванов
Nikoli Dryden
Tal Ben‐Nun
Shigang Li
Torsten Hoefler
1
+
Chiplet actuary
2022
Yinxiao Feng
Kaisheng Ma
1
+
Scale up your In-Memory Accelerator: Leveraging Wireless-on-Chip Communication for AIMC-based CNN Inference
2022
Nazareno Bruschi
Giuseppe Tagliavini
Francesco Conti
Sergi Abadal
Alberto Cabellos-Aparicio
Eduard Alarcón
Geethan Karunaratne
Irem Boybat
Luca Benini
Davide Rossi
1
+
Kraken: A Direct Event/Frame-Based Multi-sensor Fusion SoC for Ultra-Efficient Visual Processing in Nano-UAVs
2022
Alfio Di Mauro
Moritz Scherer
Davide Rossi
Luca Benini
1
+
DARKSIDE: A Heterogeneous RISC-V Compute Cluster for Extreme-Edge On-Chip DNN Inference and Training
2022
Angelo Garofalo
Yvan Tortorella
Matteo Perotti
Luca Valente
Alessandro Nadalini
Luca Benini
Davide Rossi
Francesco Conti
1
+
Spatz: Clustering Compact RISC-V-Based Vector Units to Maximize Computing Efficiency
2023
Matheus Cavalcante
Matteo Perotti
Samuel Riedel
Luca Benini
1