Day 1 (10/1, Sat)
AutoDL
https://sites.google.com/rice.edu/auto-dl/
- Developing DNN models is hard
- Developing DNN hardware accelerators is also hard
- Both are time- and labor-intensive
- Even if you do both, combining the two does not guarantee optimal performance
- In terms of both efficiency & accuracy
→ Hardware-algorithm co-design is needed
- keyword: NAS (neural architecture search)
- HW-NAS (HW-aware NAS)
- Three related works were presented
- Auto-NBA
http://proceedings.mlr.press/v139/fu21d/fu21d.pdf
- Input: user requirements on accuracy & efficiency
- Output: a matched network, bitwidth, and accelerator
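The idea of pairing a network, a bitwidth, and an accelerator against a user's accuracy/efficiency target can be sketched as a scored search over joint candidates. This is a toy illustration under my own assumptions (all names, metrics, and the penalty scheme are hypothetical, not Auto-NBA's actual formulation, which uses differentiable search):

```python
# Hypothetical joint candidates: each pairs a network, a quantization
# bitwidth, and an accelerator config (PE count), with assumed metrics.
CANDIDATES = [
    {"net": "mbv2",  "bits": 8, "accel_pe": 64,  "accuracy": 0.72, "latency_ms": 12.0},
    {"net": "mbv2",  "bits": 4, "accel_pe": 64,  "accuracy": 0.69, "latency_ms": 6.5},
    {"net": "res18", "bits": 8, "accel_pe": 128, "accuracy": 0.74, "latency_ms": 25.0},
]

def score(c, latency_budget_ms=15.0, penalty=0.01):
    """Accuracy minus a penalty for exceeding the user's latency budget."""
    overshoot = max(0.0, c["latency_ms"] - latency_budget_ms)
    return c["accuracy"] - penalty * overshoot

def best_pair(candidates):
    """Exhaustive search over the toy space; a real co-search is far larger."""
    return max(candidates, key=score)

# Under a 15 ms budget, the 8-bit mbv2 pairing wins: res18 is more
# accurate but blows the budget, and 4-bit mbv2 loses too much accuracy.
best = best_pair(CANDIDATES)
```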
- HW-NAS-Bench
https://openreview.net/pdf?id=_0kaDkv3dVf
- Searches for optimal DNN architectures for fixed, non-customizable target devices
- (maybe a hint for model conversion targeting a specific AI chip?)
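Since the devices are non-customizable, the benchmark boils down to a lookup table of pre-measured per-device costs, and search becomes filtering that table. A minimal sketch in that spirit (the table entries and field names here are made up, not HW-NAS-Bench's real API or data):

```python
# Hypothetical benchmark table: pre-measured accuracy and on-device
# latency for each candidate architecture on one fixed edge device.
BENCH = [
    {"arch": "arch0", "accuracy": 0.71, "edgegpu_latency_ms": 5.2},
    {"arch": "arch1", "accuracy": 0.74, "edgegpu_latency_ms": 7.9},
    {"arch": "arch2", "accuracy": 0.73, "edgegpu_latency_ms": 11.4},
]

def best_under_budget(table, budget_ms):
    """Highest-accuracy architecture whose measured latency fits the budget."""
    feasible = [e for e in table if e["edgegpu_latency_ms"] <= budget_ms]
    return max(feasible, key=lambda e: e["accuracy"])["arch"] if feasible else None
```

Because every metric is a table lookup rather than a training run, this kind of search is cheap enough to repeat per target device.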
- DNN-Chip Predictor
https://ieeexplore.ieee.org/document/9053977
FireSim & Chipyard
https://fires.im/micro-2022-tutorial/
Day 2 (10/2, Sun)
FastPath
https://fastpathconference.github.io/FastPath2022/
International Workshop on Performance Analysis of Machine Learning Systems
- Ingesting and Processing Data Efficiently for Machine Learning
- Machine Learning for Better Medicine
- Canceled
- System Requirements for Deep Learning Foundational Models
- Using Processing-in-Memory to Accelerate Edge Machine Learning
- AI4Physics: From Conceptualization to AI-Driven Discovery at Scale
- SODA: An End-To-End Open-Source Hardware Compiler for Machine Learning Accelerators
- Faster Learning on Slow Hardware
Day 3 (10/3, Mon)
Session 4B: Machine Learning
- Skipper: Enabling Efficient SNN Training Through Activation-Checkpointing and Time-Skipping
- Going Further With Winograd Convolutions: Tap-Wise Quantization for Efficient Inference on 4x4 Tiles
- Adaptable Butterfly Accelerator for Attention-Based NNs via Hardware and Algorithm Co-Design
- DFX: A Low-Latency Multi-FPGA Appliance for Accelerating Transformer-Based Text Generation
- HARMONY: Heterogeneity-Aware Hierarchical Management for Federated Learning System
Day 4 (10/4, Tue)
Keynote
Democratizing Customized Computing
- How can programmers without circuit-design knowledge use FPGAs easily?
- AutoSA
- AutoDSE
- GNN-DSE
- HeteroCL
- But compile times are too long…
- TAPA
- MLIR? → seems worth looking into
Session 5B: Accelerators In/Near Memory
- GenPIP: In-Memory Acceleration of Genome Analysis by Tight Integration of Basecalling and Read Mapping
- BEACON: Scalable Near-Data-Processing Accelerators for Genome Analysis near Memory Pool with the CXL Support
- Sparse Attention Acceleration with Synergistic In-Memory Pruning and On-Chip Recomputation
- ICE: An Intelligent Cognition Engine with 3D NAND-based In-Memory Computing for Vector Similarity Search Acceleration
Day 5 (10/5, Wed)
Session 9A: Design Methodology
- RemembERR: Leveraging Microprocessor Errata for Improving Design Testing and Validation
- Datamime: Generating Representative Benchmarks by Automatically Synthesizing Datasets
- An Architecture Interface and Offload Model for Low-Overhead, Near-Data, Distributed Accelerators
- Towards Developing High Performance RISC-V Processors Using Agile Methodology
Session 10B: Machine Learning
- 3D-FPIM: An Extreme Energy-Efficient DNN Acceleration System Using 3D NAND Flash-Based In-Situ PIM Unit
- Sparseloop: An Analytical Approach to Sparse Tensor Accelerator Modeling
- DeepBurning-SEG: Generating DNN Accelerators of Segment-Grained Pipeline Architecture
- ANT: Exploiting Adaptive Numerical Data Type for Low-Bit Deep Neural Network Quantization
- Ristretto: An Atomized Processing Architecture for Sparsity-Condensed Stream Flow in CNN
Korean server developers