Skip to content
No models found
OpenRouter
© 2026 OpenRouter, Inc

Product

  • Chat
  • Rankings
  • Apps
  • Models
  • Providers
  • Pricing
  • Enterprise
  • Labs

Company

  • About
  • Blog
  • CareersHiring
  • Privacy
  • Terms of Service
  • Support
  • State of AI
  • Works With OR
  • Data

Developer

  • Documentation
  • API Reference
  • SDK
  • Status

Connect

  • Discord
  • GitHub
  • LinkedIn
  • X
  • YouTube
Favicon for nvidia

NVIDIA: Llama 3.1 Nemotron Nano 8B v1

nvidia/llama-3.1-nemotron-nano-8b-v1

Llama-3.1-Nemotron-Nano-8B-v1 is a compact large language model (LLM) derived from Meta's Llama-3.1-8B-Instruct, specifically optimized for reasoning tasks, conversational interactions, retrieval-augmented generation (RAG), and tool-calling applications. It balances accuracy and efficiency, fitting comfortably onto a single consumer-grade RTX GPU for local deployment. The model supports extended context lengths of up to 128K tokens.

Note: you must include detailed thinking on in the system prompt to enable reasoning. Please see Usage Recommendations(opens in new tab) for more.

Model weights

Modalities

Context

Avg

131K

Released

Apr 8, 2025

Knowledge Cutoff

Dec 2023

Activity

Activity

Token volume and request traffic to this model over time.