1. Introduction to NeuroCLI

NeuroCLI is a secure, fast AI chat application that pairs advanced large language models with a polished browser-based user experience.

System Architecture Overview

NeuroCLI is engineered using a high-efficiency decoupled architecture. The frontend uses highly optimized Vanilla JavaScript and CSS variables to render rich visual details (including Three.js animated particle constellations and 3D card tilt matrices). The backend operates on a Python Flask container that securely proxies client chat requests, manages session databases with SQLite, and provides sandboxed file parsing mechanisms. All external inference queries are routed through encrypted token pipelines, keeping your user keys entirely private.

Core Design Principles

Security-First Design

No user passwords are stored in cleartext. Session cookies are protected by two layers of token verification to resist client-side hijacking. Files are parsed in memory, and their contents are discarded immediately after the payload is forwarded.

Sub-50ms Latency

Our endpoints route prompt contexts to optimized serverless GPU infrastructure. Output tokens stream to the client browser as they are generated.

Advanced Feature Set

  • Multi-Engine Select Options: Hot-swap between five specialized models (reasoning, generalist, fast, and visual) on the fly. Conversation history is reformatted to fit the chosen model's context window.
  • In-Memory Attachment Parser: Convert attachments (PDF, DOCX, CSV, TXT) into plain-text context server-side. Structured rows and text streams are added to the chat prompt for the current request only.
  • Email OTP Verification: Protect your login sessions. During first-time signup, a 6-digit OTP code is sent to your email to verify your identity.
  • Visual Multimodal Analysis: Attach design layouts, flowcharts, or system diagrams in the chat interface to have them analyzed and turned into code representations.
  • Rolling 24-Hour Limits: Premium models (Mixtral, Maverick, Vision) share a single limit of 10 chats per rolling 24-hour window, which helps balance load. Nano and Delta models remain unrestricted and free for life.
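The shared rolling limit can be pictured as a sliding window of timestamps per user. The following is a minimal sketch, not the NeuroCLI implementation: the class name, the in-memory storage, and the `allow` method are assumptions made for illustration, while the limit of 10 and the 24-hour window come from the text above.

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 24 * 60 * 60  # 24-hour rolling window (from the text)
PREMIUM_LIMIT = 10             # chats shared across all premium models (from the text)


class RollingLimiter:
    """Hypothetical in-memory limiter; a real deployment would likely persist this."""

    def __init__(self, limit=PREMIUM_LIMIT, window=WINDOW_SECONDS):
        self.limit = limit
        self.window = window
        self._hits = defaultdict(deque)  # user_id -> timestamps of premium chats

    def allow(self, user_id, now=None):
        """Record a premium chat and return True, or return False if the limit is hit."""
        now = time.time() if now is None else now
        hits = self._hits[user_id]
        # Drop timestamps that have aged out of the rolling window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

Because the window rolls, each chat slot frees up exactly 24 hours after it was used rather than at a fixed daily reset time.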

2. Model Directory

NeuroCLI integrates five advanced model tiers, allowing users to toggle between deep mathematical reasoning, fast response cycles, general text summarization, and vision parsing.

Technical Comparison Matrix

Model Name           | Base Architecture             | Max Context    | Target Workload              | Latency Index
---------------------|-------------------------------|----------------|------------------------------|--------------
Mistral Mixtral 8x7B | Mixtral 8x7B Instruct         | 32,000 tokens  | MoE Multilingual Translation | 94/100
Llama 4 Maverick 17B | Llama 4 Maverick 17B Instruct | 128,000 tokens | Advanced Reasoning           | 95/100
Llama 3.2 11B Vision | Llama 3.2 11B Vision Instruct | 128,000 tokens | Multimodal Image Parsing     | 96/100
NeuroCLI Delta 0.2   | NeuroCLI Delta 0.2 (Llama 8B) | 128,000 tokens | Low-Latency Completions      | 99/100
NeuroCLI Nano 0.1    | NeuroCLI Nano 0.1 (Llama 3B)  | 128,000 tokens | Lightweight Mini-Fast Chat   | 98/100
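The context limits in the matrix matter when history is fitted to the selected model. As a rough sketch of how that fitting might work, the snippet below keeps the most recent messages that fit a model's budget. The short model keys, the four-characters-per-token estimate, and the `reserve_for_output` parameter are all assumptions for illustration; only the token limits come from the matrix.

```python
# Context limits taken from the comparison matrix; the keys are hypothetical short names.
MAX_CONTEXT_TOKENS = {
    "mixtral": 32_000,
    "maverick": 128_000,
    "vision": 128_000,
    "delta": 128_000,
    "nano": 128_000,
}


def estimate_tokens(text):
    # Rough heuristic (about 4 characters per token); a real tokenizer will differ.
    return max(1, len(text) // 4)


def trim_history(messages, model_key, reserve_for_output=512):
    """Keep the most recent messages that fit the model's context budget."""
    budget = MAX_CONTEXT_TOKENS[model_key] - reserve_for_output
    kept = []
    for msg in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(msg["content"])
        if cost > budget:
            break  # this message and everything older is dropped
        kept.append(msg)
        budget -= cost
    kept.reverse()  # restore chronological order
    return kept
```

Under these assumptions, switching from Maverick to Mixtral mid-conversation would drop the oldest turns first once the history no longer fits in 32,000 tokens.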

Detailed Model Specs

Mistral Mixtral 8x7B (Premium)

Our Mixture of Experts (MoE) model, powered by Mixtral 8x7B Instruct. It activates only a sparse subset of its parameters for each token, and it is well suited to multilingual translation, agentic command execution, and standard QA pipelines.

Llama 4 Maverick 17B (Premium)

A balanced, advanced reasoning assistant. It parses complex language, writes summaries, edits copy and layout text, and answers multi-turn contextual questions accurately.

Llama 3.2 11B Vision (Premium)

Our lightweight multimodal visual model. Driven by the Llama 3.2 11B Vision architecture. Offers rapid OCR extractions, screenshot interpretations, and chart explanations directly within the chat interface.

NeuroCLI Delta 0.2 (Free Unlimited)

Designed specifically for speed and built on a Llama 8B base. It features sub-30ms first-token responses and high throughput, making it a good fit for rapid drafting, autocomplete widgets, database formatting, and slow networks.

NeuroCLI Nano 0.1 (Free Unlimited)

An ultra-lightweight text generator built on a Llama 3B base. It is tuned to respond in a fraction of a second, making it ideal for quick dictionary lookups, vocabulary edits, and micro-dialogue.

3. Authentication & Verification

To safeguard session indices and user conversations, NeuroCLI implements a robust authentication protocol utilizing secure passwords and automated Email OTP verification.

Cryptographic Data Flow

During user signup, passwords are never stored directly. The backend stores only a salted pbkdf2:sha256 hash of each password. At the same time, the server generates a random 6-digit verification token and sends it over secure SMTP. This token acts as an identity verification layer that prevents unauthorized account creation.
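To make the flow above concrete, here is a minimal sketch using only the Python standard library. It is not the NeuroCLI backend code: the function names, the iteration count, and the layout of the stored string are assumptions chosen for illustration. Only the use of salted PBKDF2 with SHA-256 and the 6-digit token come from the text.

```python
import hashlib
import hmac
import os
import secrets

ITERATIONS = 600_000  # assumed work factor; the source does not state one


def hash_password(password, iterations=ITERATIONS):
    """Return a salted PBKDF2-SHA256 record (illustrative string layout)."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"pbkdf2:sha256:{iterations}${salt.hex()}${digest.hex()}"


def verify_password(stored, password):
    """Recompute the hash with the stored salt and compare in constant time."""
    method, salt_hex, digest_hex = stored.split("$")
    iterations = int(method.rsplit(":", 1)[1])
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), iterations
    )
    return hmac.compare_digest(candidate.hex(), digest_hex)


def generate_otp():
    """Return a 6-digit code from a cryptographically secure source."""
    return f"{secrets.randbelow(10**6):06d}"
```

The `secrets` module is used for the code rather than `random` because verification tokens should not be predictable.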

Step-by-Step Security Protocol

  1. Sign Up: Fill out your name, email, and password on the auth panel.
  2. Email Dispatch: After a successful sign-up, the server sends a 6-digit OTP (One-Time Password) to the provided email address over an SSL-secured SMTP connection.
  3. Identity Validation: You are redirected to an OTP verification panel. When the code you enter matches the server-side token, your account is permanently flagged as active (is_verified = 1).
  4. Session Persistence: Once you are verified and logged in, a secure session cookie identifies your subsequent requests, which helps guard against session hijacking.
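The validation step can be sketched against SQLite as follows. This is an illustration under stated assumptions: the `users` table, the `otp_code` column, and the `verify_otp` function are hypothetical, and only the `is_verified = 1` flag is taken from the steps above.

```python
import hmac
import sqlite3


def verify_otp(conn, email, submitted_code):
    """Mark the account verified when the submitted code matches the stored one."""
    row = conn.execute(
        "SELECT otp_code FROM users WHERE email = ?", (email,)
    ).fetchone()
    if row is None or row[0] is None:
        return False
    # Constant-time comparison avoids leaking how many digits matched.
    if not hmac.compare_digest(row[0].encode("utf-8"), submitted_code.encode("utf-8")):
        return False
    # Clear the code so it cannot be reused, and flag the account as active.
    conn.execute(
        "UPDATE users SET is_verified = 1, otp_code = NULL WHERE email = ?", (email,)
    )
    conn.commit()
    return True


# Hypothetical schema and sample row, created in memory for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (email TEXT PRIMARY KEY, otp_code TEXT, is_verified INTEGER DEFAULT 0)"
)
conn.execute(
    "INSERT INTO users (email, otp_code) VALUES (?, ?)", ("ada@example.com", "042917")
)
```

Clearing the stored code after a successful match is a design choice of this sketch; it keeps the code single-use, in line with the "one-time" in OTP.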

Password Recovery Workflow

If you forget your password, trigger the 'Forgot Password' workflow. The backend dispatches a fresh 6-digit OTP code to your email. You can then use this code alongside a new password to instantly restore account access safely.

4. File Processing Guide

NeuroCLI supports dynamic file parsing, extracting text segments from plain text, tables, and documents in real-time to enrich chat model inputs.

Supported Formats & Server Extraction Engines

  • Text (.txt)

    Read directly as a standard UTF-8 stream. The extracted text is forwarded as-is into the user prompt wrapper. Max size: 5MB.

  • Comma-Separated Values (.csv)

    Processed line-by-line using Python's built-in csv module. Tables are formatted as clean, structured text blocks that preserve row structure so the language model can parse relations in the tabular data.

  • PDF Documents (.pdf)

    Parsed server-side with the PyPDF2 library. Text is extracted from every page and compiled in page order into unified paragraph blocks.

  • Word Documents (.docx)

    Processed paragraph-by-paragraph using the python-docx dependency. Formats headers, standard lists, and paragraph components into readable text strings.
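The two simplest extractors can be sketched with the standard library alone. This is an assumption-laden illustration rather than the server code: the function names and the " | " cell separator are invented here, and the PDF and DOCX paths are omitted because they depend on the third-party PyPDF2 and python-docx packages.

```python
import csv
import io


def extract_txt(data):
    """Decode uploaded bytes as UTF-8, replacing any undecodable bytes."""
    return data.decode("utf-8", errors="replace")


def extract_csv(data):
    """Render CSV bytes as one text line per row, keeping the row structure."""
    # newline="" lets the csv module handle line endings and quoted newlines itself.
    buffer = io.StringIO(data.decode("utf-8", errors="replace"), newline="")
    reader = csv.reader(buffer)
    return "\n".join(" | ".join(cell.strip() for cell in row) for row in reader)
```

Both functions take raw bytes and return a string, so nothing is written to disk at any point.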

Payload Formatting & Constraints

Uploaded files are never stored permanently on server drives. The parser reads each file in memory, wraps the extracted text in the block below, and appends it to the chat prompt:

Document Contents of [Filename]:
---
[Extracted Text Content]
---

User Question: [User Prompt]

Keep uploads under 5MB to avoid connection timeouts. If the extracted text exceeds the model's maximum context window (128K tokens for Maverick, Delta, Nano, and Vision; 32K tokens for Mixtral), the text is capped to preserve chat performance.
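The wrapper block and the size constraint can be sketched as follows. The function names and the `max_chars` parameter are hypothetical, the 5MB ceiling is interpreted here as 5 * 1024 * 1024 bytes, and simple truncation stands in for whatever limiting the server actually applies.

```python
MAX_UPLOAD_BYTES = 5 * 1024 * 1024  # 5MB ceiling from the text (binary megabytes assumed)


def check_upload_size(data):
    """Reject uploads above the documented size limit."""
    if len(data) > MAX_UPLOAD_BYTES:
        raise ValueError("upload exceeds the 5MB limit")


def build_prompt(filename, extracted_text, user_prompt, max_chars=None):
    """Wrap extracted text in the documented block and append the user's question."""
    if max_chars is not None and len(extracted_text) > max_chars:
        # Assumed strategy: keep the start of the document and drop the rest.
        extracted_text = extracted_text[:max_chars]
    return (
        f"Document Contents of {filename}:\n"
        "---\n"
        f"{extracted_text}\n"
        "---\n\n"
        f"User Question: {user_prompt}"
    )
```

For example, `build_prompt("notes.txt", text, "Summarize this.")` yields a single string that can be sent as the content of a user message.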

5. Developer API

Developers can access the NeuroCLI proxy chat endpoints programmatically after establishing active web sessions.

Chat Completions Endpoint

POST /api/ai/chat

Required Headers:
Content-Type: application/json
Cookie: session=[your_session_cookie]

Request Body Specification:
  • model (string): The underlying target model ID. Supported ids: mistralai/mixtral-8x7b-instruct-v0.1, meta/llama-4-maverick-17b-128e-instruct, meta/llama-3.2-11b-vision-instruct, meta/llama-3.1-8b-instruct, or meta/llama-3.2-3b-instruct.
  • messages (array): List of message objects. Each object must contain role ("user" or "assistant") and content (string, or array of content blocks for vision).
  • temperature (float): Controls randomness (0.0 to 1.0).
  • max_tokens (int): Maximum output tokens constraint.
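A client can check a request body against this specification before sending it. The sketch below is one possible validator, not part of the NeuroCLI API: the function name is hypothetical, and treating temperature and max_tokens as optional and requiring a non-empty messages array are assumptions, since the specification does not say.

```python
SUPPORTED_MODELS = {
    "mistralai/mixtral-8x7b-instruct-v0.1",
    "meta/llama-4-maverick-17b-128e-instruct",
    "meta/llama-3.2-11b-vision-instruct",
    "meta/llama-3.1-8b-instruct",
    "meta/llama-3.2-3b-instruct",
}
ALLOWED_ROLES = {"user", "assistant"}


def validate_payload(payload):
    """Return a list of problems; an empty list means the body matches the spec."""
    errors = []
    if payload.get("model") not in SUPPORTED_MODELS:
        errors.append("unsupported model id")
    messages = payload.get("messages")
    if not isinstance(messages, list) or not messages:
        errors.append("messages must be a non-empty array")
    else:
        for i, msg in enumerate(messages):
            if not isinstance(msg, dict) or msg.get("role") not in ALLOWED_ROLES:
                errors.append(f"messages[{i}].role must be 'user' or 'assistant'")
            elif not isinstance(msg.get("content"), (str, list)):
                errors.append(f"messages[{i}].content must be a string or array")
    temperature = payload.get("temperature")
    if temperature is not None and not (
        isinstance(temperature, (int, float)) and 0.0 <= temperature <= 1.0
    ):
        errors.append("temperature must be between 0.0 and 1.0")
    max_tokens = payload.get("max_tokens")
    if max_tokens is not None and (not isinstance(max_tokens, int) or max_tokens <= 0):
        errors.append("max_tokens must be a positive integer")
    return errors
```

Returning a list of messages instead of raising on the first problem lets a caller report every issue in one pass.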

Developer Code Implementations

Python Script Example:

import requests

url = "http://127.0.0.1:8000/api/ai/chat"
headers = {
    "Content-Type": "application/json"
}
# The endpoint requires an active web session; replace the placeholder
# with the value of your own session cookie.
cookies = {"session": "[your_session_cookie]"}
payload = {
    "model": "meta/llama-4-maverick-17b-128e-instruct",
    "messages": [
        {"role": "user", "content": "Write a clean Python SQL search query."}
    ],
    "temperature": 0.7,
    "max_tokens": 512
}

response = requests.post(url, json=payload, headers=headers, cookies=cookies, timeout=30)
response.raise_for_status()  # surface HTTP errors instead of parsing an error body
print(response.json())

Node.js (Fetch) Example:
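
// Requires Node.js 18 or later, where fetch is available globally
// (node-fetch v3 is ESM-only and cannot be loaded with require).

const queryModel = async () => {
    const url = "http://127.0.0.1:8000/api/ai/chat";
    const payload = {
        model: "meta/llama-4-maverick-17b-128e-instruct",
        messages: [{ role: "user", content: "Optimize this loop structure" }],
        temperature: 0.5
    };

    const response = await fetch(url, {
        method: "POST",
        headers: {
            "Content-Type": "application/json",
            // Replace the placeholder with the value of your own session cookie.
            "Cookie": "session=[your_session_cookie]"
        },
        body: JSON.stringify(payload)
    });
    if (!response.ok) {
        throw new Error(`Request failed with status ${response.status}`);
    }
    const data = await response.json();
    console.log(data);
};

queryModel().catch(console.error);

Standard Response Payload (JSON):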
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "To optimize the Python loops, utilize list comprehensions...",
        "role": "assistant"
      }
    }
  ],
  "created": 1780558000,
  "id": "chatcmpl-uuid-string",
  "model": "meta/llama-4-maverick-17b-128e-instruct",
  "object": "chat.completion"
}
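
Reading the assistant's reply out of this payload takes only a few lines. The helper below is a minimal sketch: the function name is hypothetical, and it assumes the response always follows the shape shown above, with the reply in the first entry of choices.

```python
def extract_reply(response_body):
    """Return (text, finish_reason) from the first choice of a chat response."""
    choices = response_body.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    first = choices[0]
    return first["message"]["content"], first.get("finish_reason")


# Sample payload with the same shape as the standard response above.
sample = {
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {"content": "Use list comprehensions...", "role": "assistant"},
        }
    ],
    "model": "meta/llama-4-maverick-17b-128e-instruct",
    "object": "chat.completion",
}
text, reason = extract_reply(sample)
```

A finish_reason of "stop" indicates the model ended its reply on its own, as in the sample payload.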