Top 10 edge AI chips

As edge devices become increasingly AI-enabled, more and more chips are emerging to fill every application niche. At the extremes, applications such as speech recognition can run within always-on power envelopes, while tens of watts is now enough for even the larger generative AI models.

Here, in no particular order, are 10 of EDN’s selections for a range of edge AI applications. These devices range from those capable of handling multimodal large language models (LLMs) in edge devices to those designed for vision processing and minimizing power consumption for always-on applications.

Multiple camera streams

For vision applications, Ambarella Inc.’s latest release is the CV7 edge AI vision system-on-chip (SoC) for processing multiple high-quality camera streams simultaneously via convolutional neural networks (CNNs) or transformer networks. The CV7 features the latest generation of Ambarella’s proprietary AI accelerator, plus an in-house image-signal processor (ISP), which uses both traditional ISP algorithms and AI-driven features. This family also includes quad Arm Cortex-A73 cores, hardware video codecs on-chip, and a new, 64-bit DRAM interface.

Ambarella is targeting this family for AI-based 8K consumer products such as action cameras, multicamera security systems, robotics and drones, industrial automation, and video conferencing. It will also be suitable for automotive applications such as telematics and advanced driver-assistance systems.

Ambarella’s CV7 vision SoC (Source: Ambarella Inc.)

Fallback CPU

The MLSoC Modalix from SiMa Technologies Inc. is now available in production quantities, along with its Llima software framework for deployment of LLMs and generative AI models on Modalix. Modalix is SiMa’s second-generation architecture, which comes as a family of SoCs designed to host full applications.

Modalix chips have eight Arm A-class CPU cores on-chip alongside the accelerator. These are important for running application-level code, but they also allow programs to fall back on the CPU if a particular math operation isn’t supported by the accelerator. Also on the SoC are an ISP and a digital-signal processor (DSP). Modalix will come in 25-, 50-, 100-, and 200-TOPS (INT8) versions. The 50-TOPS version will be first to market and can run Llama2-7B at more than 10 tokens per second, with a power envelope of 8–10 W.
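The CPU fallback idea can be illustrated with a minimal sketch. It assumes a hypothetical set of accelerator-supported operations and a model described as a plain list of operation names; none of these names come from SiMa’s toolchain, and a real compiler would work on a full graph rather than a list.

```python
# Hypothetical illustration of accelerator/CPU operator fallback.
# ACCEL_SUPPORTED_OPS and the op names are assumptions, not SiMa's API.
ACCEL_SUPPORTED_OPS = {"conv2d", "matmul", "relu", "softmax"}


def assign_devices(ops):
    """Map each op to the accelerator if supported, otherwise to the CPU."""
    return [(op, "accelerator" if op in ACCEL_SUPPORTED_OPS else "cpu") for op in ops]


def group_segments(plan):
    """Merge consecutive ops that run on the same device into segments."""
    segments = []
    for op, device in plan:
        if segments and segments[-1][0] == device:
            segments[-1][1].append(op)
        else:
            segments.append((device, [op]))
    return segments


# "erf" stands in for an op the (hypothetical) accelerator does not support.
model_ops = ["conv2d", "relu", "erf", "matmul", "softmax"]
plan = assign_devices(model_ops)
segments = group_segments(plan)
for device, ops in segments:
    print(f"{device}: {', '.join(ops)}")
```

Each boundary between segments is a hand-off between the accelerator and the CPU, so a practical toolchain would likely try to keep such boundaries few.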

Open-source NPU

Synaptics Inc.’s Astra series of AI-enabled IoT SoCs ranges from application processors to microcontroller (MCU)-level parts. This family is purpose-built for the IoT.

First to market is the SL2610 family of multimodal edge AI processors for applications ranging from smart appliances and retail point-of-sale terminals to drones. All parts in the family have two Arm Cortex-A55 cores, and some have a neural processing unit (NPU) subsystem. The included Coral NPU, developed at Google, is an open-source RISC-V CPU with scalar instructions. It sits alongside Synaptics’ homegrown AI accelerator, the T1, which offers 1-TOPS (INT8) performance for transformers and CNNs.

Synaptics’ SL2610 multimodal edge AI processors (Source: Synaptics Inc.)

Raspberry Pi compatibility

The Hailo-10H edge AI accelerator from Hailo Technologies Ltd. is gaining a large developer base, as it is available in a form factor that plugs into hobbyist platform Raspberry Pi. However, the Hailo-10H is also used by HP in add-on cards for its point-of-sale systems, and it’s also automotive-qualified.

The 10H is the same silicon as the Hailo-10 but runs at a lower power-performance point: The 10H can run 2B-parameter LLMs in about 2.5 W. This AI co-processor is based on Hailo’s second-generation architecture, which has improved support for transformer models and more flexible number representation. It can run inference on multiple models concurrently.

Hailo’s Hailo-10H edge AI accelerator (Source: Hailo Technologies Ltd.)

Analog acceleration

Startup EnCharge AI announced its first product, the EN100. This chip is a 200-TOPS (INT8) accelerator targeted squarely at the AI PC, achieving an impressive 40 TOPS/W. The device is based on EnCharge’s capacitance-based analog compute-in-memory technology, which the company says is less temperature-sensitive than resistance-based schemes. The accelerator’s output is a voltage (not a current), meaning transimpedance amplifiers aren’t needed, saving power.

Alongside the analog accelerator on-chip are some digital cores that can be used when higher precision or floating-point maths is required. The EN100 will be available on a single-chip M.2 card with 32-GB LPDDR and a power envelope of 8.25 W. A four-chip, half-height, half-length PCIe card offers up to 1 POPS (INT8) in a 40-W power envelope, with 128-GB LPDDR memory.
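As a back-of-envelope check on the single-chip figures, the sketch below uses only the numbers quoted above. It assumes that the 40-TOPS/W figure applies to the accelerator running at its full 200 TOPS, which the text does not state explicitly.

```python
# Back-of-envelope arithmetic using only figures quoted in the article.
# Assumption: the 40-TOPS/W figure applies at the full 200-TOPS throughput.
def implied_power_w(tops, tops_per_watt):
    """Power implied by a throughput figure and an efficiency figure."""
    return tops / tops_per_watt


chip_tops = 200.0        # INT8 throughput quoted for one EN100
chip_tops_per_w = 40.0   # efficiency quoted for the accelerator
m2_envelope_w = 8.25     # power envelope quoted for the single-chip M.2 card

accel_power_w = implied_power_w(chip_tops, chip_tops_per_w)
card_level_tops_per_w = chip_tops / m2_envelope_w

print(f"Implied accelerator power: {accel_power_w:.2f} W")
print(f"Card-level efficiency at full envelope: {card_level_tops_per_w:.1f} TOPS/W")
```

Under that assumption, the accelerator itself would account for about 5 W of the 8.25-W envelope, with the remainder presumably covering memory and the rest of the card.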

EnCharge AI’s EN100 M.2 card (Source: EnCharge AI)

SNNs

For microwatt applications, Innatera Nanosystems B.V. has developed an AI-equipped MCU that can run inference at very low power. The Pulsar neuromorphic MCU targets always-on sensor applications: It consumes 600 µW for radar-based presence detection and 400 µW for audio scene classification, for example.

The neural processor uses Innatera’s spiking neural network (SNN) accelerators—there are both analog and digital spiking accelerators on-chip, which can be used for different types of applications and workloads. Innatera says its software stack, Talamo, means developers don’t have to be SNN experts to use the device. Talamo interfaces directly with PyTorch and a PyTorch-based simulator and can enable power consumption estimations at any stage of development.
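For readers unfamiliar with SNNs, the sketch below shows a leaky integrate-and-fire neuron, a common textbook spiking model. It is a generic illustration only: the function name, leak factor, and threshold are assumptions, and it does not represent Innatera’s neuron design or the Talamo API.

```python
# Minimal leaky integrate-and-fire (LIF) neuron, a common textbook SNN model.
# Generic illustration only; not Innatera's hardware or the Talamo API.
def lif_run(inputs, leak=0.9, threshold=1.0):
    """Return a list of 0/1 spikes for a sequence of input currents."""
    v = 0.0  # membrane potential
    spikes = []
    for x in inputs:
        v = leak * v + x      # potential decays, then integrates the input
        if v >= threshold:
            spikes.append(1)  # emit a spike
            v = 0.0           # reset after firing
        else:
            spikes.append(0)
    return spikes


spikes = lif_run([0.5, 0.5, 0.5, 0.0, 0.0, 0.5, 0.5, 0.5])
print(spikes)
```

The neuron emits output only when enough input has accumulated, which is the usual argument for why event-driven hardware can stay idle, and therefore low-power, when the sensor input is quiet.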

Innatera’s Pulsar spiking neural processor (Source: Innatera Nanosystems B.V.)

Generative AI

Axelera AI’s second-generation chip, Europa, can support both multi-user generative AI and computer vision applications in endpoint devices or edge servers. This eight-core chip can deliver 629 TOPS (INT8). The accelerator has large vector engines for AI computation alongside two clusters of eight RISC-V CPU cores for pre- and post-processing of data. There is also an H.264/H.265 decoder on-chip, meaning the host CPU can be kept free for application-level software. Given the importance of ensuring compute cores are fed quickly with data from memory, the Europa AI processor unit provides 128 MB of L2 SRAM and a 256-bit LPDDR5 interface.

Axelera’s Voyager software development kit covers both Europa and the company’s first-generation chip, Metis, which is reserved for more classical CNNs and vision tasks. Europa is available either as a chip or on a PCIe card. The cards are intended for edge server applications that need to process multiple 4K video streams.

Butter wouldn’t melt

Most members of the DX-M1 series from South Korean chip company DeepX Co. Ltd. provide 25-TOPS (INT8) performance in the 2- to 5-W power envelope (the exception being the DX-M1M-L, offering 13 TOPS). One of the company’s most memorable demos involves placing a blob of butter directly on its chip while running inference to show that it doesn’t get hot enough for the butter to melt.

The 25 TOPS delivered by this co-processor chip is plenty for vision tasks such as pose estimation or facial recognition in drones, robots, or other camera systems. The DX-M2, currently under development, will run generative AI workloads at the edge. Part of the company’s secret sauce is its quantization scheme, which can run INT8-quantized networks with accuracy comparable to the FP32 original. DeepX sells chips, modules/cards, and small, multichip systems based on its technology for different edge applications.
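To show what INT8 quantization involves in general, the sketch below applies plain symmetric per-tensor quantization to a handful of made-up weights. This is a generic textbook method, not DeepX’s proprietary scheme, and the function names and weight values are assumptions chosen for illustration.

```python
# Generic symmetric per-tensor INT8 quantization.
# Textbook illustration only; not DeepX's proprietary scheme.
def quantize_int8(values):
    """Quantize floats to INT8 codes that share a single scale factor."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map INT8 codes back to approximate float values."""
    return [c * scale for c in codes]


weights = [0.024, -0.75, 0.313, 1.27, -1.0]  # made-up example weights
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("codes:", codes)
print("scale:", scale)
print("max round-trip error:", max_error)
```

The round-trip error of each weight is bounded by half the scale factor. Keeping whole-network accuracy close to FP32 generally takes more than this, such as calibration data or finer-grained scales, which is where vendor-specific schemes differ.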

Voice interface

The latest ultra-low-power edge AI accelerator from Syntiant Corp., the NDP250, offers 5× the tensor throughput of its predecessor. This device is designed for computer vision, speech recognition, and sensor data processing. It can run on as little as microwatts, but for full, always-on vision processing, the consumption is closer to tens of milliwatts.

As with other parts in Syntiant’s range, the device uses the company’s AI accelerator core (30 GOPS [INT8]) alongside an Arm Cortex-M0 MCU core and an on-chip Tensilica HiFi 3 DSP. On-chip memory can store up to 6 million 8-bit parameters. The NDP250’s DSP supports floating-point maths for the first time in the Syntiant range. The company suggests that the ability to run both automatic speech recognition and text-to-speech models will lend the NDP250 to voice interfaces in particular.

Multiple power modes

Nvidia Corp.’s Jetson Orin Nano is designed for AI in all kinds of edge devices, targeting robotics in particular. It’s an Ampere-generation GPU module with either 8 GB or 4 GB of LPDDR5. The 8-GB version can do 33 TOPS (dense INT8) or 17 TFLOPS (FP16). It has three power modes: 7-W, 15-W, and a new, 25-W mode, which boosts memory bandwidth to 102 GB/s (from 65 GB/s for the 15-W mode) by increasing GPU, memory, and CPU clocks. The module’s CPU has six Arm Cortex-A78AE 64-bit cores. Jetson Orin Nano will be a good fit for multimodal and generative AI at the edge, including vision transformers and various small language models (in general, those with fewer than 7 billion parameters).

Nvidia’s Jetson Orin Nano (Source: Nvidia Corporation)

The post Top 10 edge AI chips appeared first on EDN.
