Search Results

News

DEV Community
dev.to > bossandboss > i-got-99x-lower-ttft-on-a-real-android-phone-by-reusing-llamacpp-kv-state-1ngi

I Got 9.9x Lower TTFT on a Real Android Phone by Reusing llama.cpp KV State

12+ min ago (743+ words) Local LLM inference has an expensive habit: it recomputes prefixes it has already seen. A system prompt. A reused RAG document. A few-shot block. A long static context. If the prefix is identical, why pay the prefill cost again? That's…

Symbols: llm,llms

DEV Community
dev.to > dramasamy > from-api-to-gpu-week-1-understanding-nvidia-dgx-spark-environment-1aol

From API to GPU, Week 1: Understanding NVIDIA DGX Spark Environment

37+ min ago (1456+ words) I've used AI through APIs for years: POST a prompt, get tokens back, ship the feature. I have never once deployed a model myself. No PyTorch, no GPU memory math, no idea what actually happens between my HTTP request…

Symbols: nvda

Eightfold
nvidia.eightfold.ai > careers > job > 893396253752

Senior Software Engineer, CUDA C++ Core Libraries | NVIDIA Corporation

6+ hour, 45+ min ago (632+ words) NVIDIA's accelerated computing platform is foundational to modern HPC and AI. At the center of this platform are CUDA Core Libraries that provide the algorithms, abstractions, and runtime capabilities needed to build fast, reliable, and scalable GPU-accelerated software. We are…

Symbols: nvda

DEV Community
dev.to > abdollah_ebadi_cbec8f6471 > running-multiple-comfyui-instances-in-parallel-on-a-single-gpu-what-actually-breaks-first-4n04

Running Multiple ComfyUI Instances in Parallel on a Single GPU: What Actually Breaks First

7+ hour, 54+ min ago (847+ words) The Problem Nobody Has Measured: If you have spent any time scaling ComfyUI beyond a single… Tagged with comfyui, python, machinelearning, gpu.

Symbols: mnnvl,nvda,btc-usd,nok

Blockchain.News
blockchain.news > news > nvidia-jax-llm-training-host-offloading

NVIDIA Optimizes JAX LLM Training with Host Offloading

23+ hour, 2+ min ago (367+ words) Lawrence Jengar Jul 10, 2026 18:51 NVIDIA's host offloading for JAX LLM training boosts GPU memory efficiency, enabling larger batch sizes and faster throughput. The Blackwell GPU, paired with NVIDIA's Grace CPU, achieves up to 900 GB/s bidirectional bandwidth via NVLink-C2C. This high-speed…

Symbols: llms,llm,fl,nvda

StartupHub.ai
startuphub.ai > startups > vulkanic > open-source-alternatives

Open Source Alternatives to Vulkanic (2026)

1+ day, 7+ hour ago (47+ words) StartupHub.ai: 60 open source Agentic AI options similar to Vulkanic, ranked by named competitors first, sector overlap, and our 0-100 AI-readiness score. Each profile notes license and self-hosting where we have it. View all alternatives to Vulkanic…

Symbols: nvda

NVIDIA Technical Blog
developer.nvidia.com > blog > kernel-fusion-in-nvidia-cuda-optimizing-memory-traffic-and-launch-overhead

Kernel Fusion in NVIDIA CUDA: Optimizing Memory Traffic and Launch Overhead

1+ day, 5+ hour ago (1201+ words) There are many ways to optimize code for GPUs. In this post, you'll learn how kernel fusion can improve memory bandwidth and reduce kernel launch overhead, along with multiple ways to apply it in NVIDIA CUDA code. A common bottleneck…

Symbols: nvda

StartupHub.ai
startuphub.ai > startups > vulkanic > alternatives

Vulkanic Alternatives & Competitors (2026)

1+ day, 7+ hour ago (59+ words) StartupHub.ai: The top Agentic AI companies similar to Vulkanic, ranked by relevance (named competitors first, then how closely their sectors overlap) and our 0-100 AI-readiness score. Each links to a full profile with funding and team. Open source alternatives…

Symbols: nvda

Doubleword
blog.doubleword.ai > what-happens-when-you-checkpoint-a-cuda-process

Reverse-engineering NVIDIA's cuda-checkpoint for faster cold starts

2+ day, 6+ hour ago (1307+ words) There's a little-known feature in the closed-source NVIDIA driver that lets you freeze a running CUDA process, serialize its GPU state into host memory, and later restore it to the GPU exactly as it was. We used it in…

Symbols: btc-usd

NVIDIA Technical Blog
developer.nvidia.com > blog > running-low-latency-analytical-workloads-with-gpu-accelerated-presto-on-nvidia-gb200-nvl72

Running Low-Latency Analytical Workloads with GPU-Accelerated Presto on NVIDIA GB200 NVL72

3+ day, 5+ hour ago (477+ words) Presto is an open source, distributed SQL engine for running fast, interactive queries on very large datasets. On NVIDIA GPUs, Presto delivers peak performance for analytical query workloads and provides low latency for users and agents. GPU-accelerated Presto brings low…

Symbols: nasdaq:nvda,small.en