aiMotive, a supplier of scalable, modular automated driving technologies, has released aiWare4+, the latest version of its aiWare automotive NPU hardware IP.
aiWare4+ builds on the success of aiWare4 in production automotive systems-on-chip (SoCs), pairing a refined hardware architecture with an upgraded software development kit (SDK). The new combination enables higher-efficiency execution of a much broader range of workloads, including transformer networks and other emerging AI network topologies. aiWare4+ also adds FP8 support alongside INT8 computation, together with dedicated hardware support for sparsity.
The data-first, scalable hardware architecture combines concepts such as near-memory execution, massively parallel on-chip I/O, hierarchical hardware tiling and wavefront processing to deliver the best possible power, performance and area (PPA).
Upgraded capabilities in aiWare4+ include improved programmability, with significant enhancements to the aiWare hardware architecture and the SDK's portfolio of tools, as well as added support for FP8 and INT8 quantization for workload execution.
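aiMotive has not published the details of aiWare's quantization pipeline, but the general idea behind INT8 quantization of a workload can be sketched with a minimal symmetric per-tensor scheme (NumPy; the function names here are illustrative, not part of the aiWare SDK):

```python
import numpy as np

def quantize_int8(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor INT8 quantization: map floats to [-127, 127]."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
q, s = quantize_int8(w)
err = np.abs(w - dequantize(q, s)).max()  # bounded by half a quantization step
```

Running a network in INT8 (or FP8) rather than FP32 cuts memory traffic and lets the NPU use narrower, denser arithmetic units, which is where much of the efficiency claim comes from.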
Additionally, the SDK upgrades enable higher performance not only for convolutional neural networks (CNNs) but also for transformer networks, occupancy networks and long short-term memory networks (LSTMs). Other upgrades include enhanced sparsity support and improved scalability from 10 TOPS to more than 1,000 TOPS, using a multicore architecture to increase throughput while retaining high efficiency.
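aiMotive has not disclosed which sparsity format aiWare4+ accelerates, but a common structured scheme in inference hardware is 2:4 sparsity, where two of every four consecutive weights are zeroed so the hardware can skip them. A minimal illustration (NumPy; not aiWare-specific code):

```python
import numpy as np

def prune_2_4(weights: np.ndarray) -> np.ndarray:
    """2:4 structured sparsity: in each group of 4 consecutive weights,
    zero out the 2 with the smallest magnitude."""
    flat = weights.reshape(-1, 4)
    # Indices of the two largest-magnitude entries in each group of four
    keep = np.argsort(np.abs(flat), axis=1)[:, 2:]
    mask = np.zeros_like(flat, dtype=bool)
    np.put_along_axis(mask, keep, True, axis=1)
    return (flat * mask).reshape(weights.shape)

w = np.arange(8, dtype=np.float32).reshape(2, 4) - 3.0  # [[-3,-2,-1,0],[1,2,3,4]]
sw = prune_2_4(w)  # at most 2 nonzeros survive per group of 4
```

Because the zero pattern is fixed and regular, dedicated hardware can halve the multiply count and weight bandwidth without the irregular memory access of unstructured sparsity.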
“When we delivered aiWare4, we knew our highly customized hardware architecture enabled us to deliver superior efficiency and PPA compared to any other automotive inference NPU on the market,” said Mustafa Ali, aiWare product director at aiMotive. “However, while acknowledging our CNN efficiency leadership, some of our customers were concerned about aiWare’s programmability compared to more conventional architectures such as DSP- or GPU-based NPUs.
“These latest aiWare4+ and aiWare SDK upgrades ensure that our customers can program aiWare for a broad range of AI workloads, achieving futureproof flexibility comparable to some of the best-known SoCs and DSP-based NPUs, without sacrificing our industry-leading NPU efficiency.”