NeuroCLI is a state-of-the-art, secure, and lightning-fast AI chat application built to bridge the gap between advanced large language models and a premium user experience.
NeuroCLI uses a decoupled architecture. The frontend is built with optimized Vanilla JavaScript and CSS variables and renders rich visual details, including Three.js animated particle constellations and 3D card-tilt effects. The backend runs as a Python Flask container that securely proxies client chat requests, manages session data in SQLite, and provides sandboxed file-parsing mechanisms. All external inference requests are routed through the backend over encrypted connections, keeping your keys private.
Passwords are never stored in cleartext. A two-layer token verification scheme protects session cookies from client-side hijacking. All file parsing happens in memory, and the parsed data is purged immediately after the request payload is forwarded.
Our endpoints route prompts and their context to optimized serverless GPU infrastructure, and output tokens are streamed directly to the client browser as they are generated.
NeuroCLI integrates five model tiers, allowing users to switch between advanced reasoning, fast responses, general text work such as summarization and translation, and vision parsing.
| Model Name | Base Architecture | Max Context | Target Workload | Latency Index |
|---|---|---|---|---|
| Mistral Mixtral 8x7B | Mixtral 8x7B Instruct | 32,000 tokens | MoE Multilingual Translation | 94/100 |
| Llama 4 Maverick 17B | Llama 4 Maverick 17B Instruct | 128,000 tokens | Advanced Reasoning | 95/100 |
| Llama 3.2 11B Vision | Llama 3.2 11B Vision Instruct | 128,000 tokens | Multimodal Image Parsing | 96/100 |
| NeuroCLI Delta 0.2 | NeuroCLI Delta 0.2 (Llama 8B) | 128,000 tokens | Low-Latency Completions | 99/100 |
| NeuroCLI Nano 0.1 | NeuroCLI Nano 0.1 (Llama 3B) | 128,000 tokens | Lightweight Mini-Fast Chat | 98/100 |
Mistral Mixtral 8x7B is our Mixture of Experts (MoE) model, powered by Mixtral 8x7B Instruct. Instead of running every parameter for every token, a router activates only two of the eight experts in each layer per token, which delivers strong multilingual translation, agentic command execution, and standard question-answering performance.
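To illustrate what sparse activation means, here is a toy sketch of top-2 expert routing in the style of Mixtral's MoE layers: the gate scores every expert, only the two highest-scoring experts are evaluated, and their outputs are mixed using a softmax over those two scores. The experts are plain Python functions and the gate logits are passed in directly; this is a conceptual illustration, not Mixtral's implementation or NeuroCLI code, and `top2_moe` is a hypothetical name.

```python
import math
from typing import Callable, List, Sequence

Expert = Callable[[float], float]


def top2_moe(x: float, gate_logits: Sequence[float], experts: List[Expert]) -> float:
    """Route a scalar input through the two highest-scoring experts only."""
    if len(gate_logits) != len(experts) or len(experts) < 2:
        raise ValueError("need one gate logit per expert and at least two experts")
    # Indices of the two largest gate logits (ties keep their original order)
    top = sorted(range(len(experts)), key=lambda i: gate_logits[i], reverse=True)[:2]
    # Softmax over the selected logits only, so the two weights sum to 1
    m = max(gate_logits[i] for i in top)
    exps = [math.exp(gate_logits[i] - m) for i in top]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Only the selected experts are evaluated; the rest stay idle for this token
    return sum(w * experts[i](x) for w, i in zip(weights, top))
```

With eight experts, six of them are never called for a given token, which is why the active parameter count per token is far smaller than the total parameter count.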
Llama 4 Maverick 17B is a well-balanced advanced reasoning assistant. It parses complex language, writes summaries, edits copy, and answers multi-turn contextual questions with high accuracy.
Llama 3.2 11B Vision is our lightweight multimodal model, built on the Llama 3.2 11B Vision architecture. It provides fast OCR text extraction, screenshot interpretation, and chart explanations directly within the chat interface.
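The API section only says that vision requests use "an array of content blocks". The sketch below assumes OpenAI-compatible blocks (a `text` block plus an `image_url` block carrying a base64 data URL), a common convention for hosted Llama 3.2 Vision endpoints; confirm the exact block shape against the backend before relying on it. `build_vision_message` is a hypothetical helper.

```python
import base64
from typing import Any, Dict


def build_vision_message(question: str, image_bytes: bytes, mime_type: str = "image/png") -> Dict[str, Any]:
    """Pair a text question with an inline image in one user message.

    Assumes OpenAI-style content blocks; NeuroCLI's exact schema is not confirmed.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{encoded}"}},
        ],
    }
```

The resulting message would go in the `messages` array of a request whose `model` is `meta/llama-3.2-11b-vision-instruct`.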
NeuroCLI Delta 0.2 is designed specifically for speed. Built on a Llama 8B base, it delivers first-token responses in under 30 ms with high throughput, making it well suited to rapid drafting, autocomplete widgets, database formatting, and use over slow networks.
NeuroCLI Nano 0.1 is an ultra-lightweight text generator built on a Llama 3B base. Its small parameter count lets it respond in a split second, making it ideal for quick dictionary lookups, vocabulary edits, and short exchanges.
To protect user sessions and conversations, NeuroCLI combines password authentication with automated email OTP (one-time password) verification.
During signup, passwords are never stored directly. The backend hashes each password with salted pbkdf2:sha256 before saving it to the database. At the same time, the server generates a random 6-digit verification code and sends it to your email over secure SMTP. This code verifies your identity and prevents unauthorized account creation.
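The two steps in this paragraph, salted PBKDF2-SHA256 hashing and 6-digit code generation, can be sketched with the Python standard library alone. The stored string format (`pbkdf2:sha256:<iterations>$<salt>$<hex digest>`) is modeled on the format Werkzeug's `generate_password_hash` produces, and the iteration count is an illustrative choice; neither is confirmed as NeuroCLI's configuration. All function names are hypothetical.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # illustrative work factor, not NeuroCLI's confirmed setting


def hash_password(password: str) -> str:
    """Return a salted PBKDF2-SHA256 hash string suitable for storage."""
    salt = secrets.token_hex(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt.encode("utf-8"), ITERATIONS)
    return f"pbkdf2:sha256:{ITERATIONS}${salt}${digest.hex()}"


def verify_password(stored: str, candidate: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    method, salt, expected = stored.split("$", 2)
    iterations = int(method.rsplit(":", 1)[1])
    digest = hashlib.pbkdf2_hmac("sha256", candidate.encode("utf-8"), salt.encode("utf-8"), iterations)
    return hmac.compare_digest(digest.hex(), expected)


def generate_otp() -> str:
    """Return a cryptographically random 6-digit code, keeping leading zeros."""
    return f"{secrets.randbelow(10**6):06d}"
```

Using `secrets` rather than `random` matters here: the code acts as a credential, so it must come from a cryptographically secure source.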
Once you enter the correct code, your account is marked as verified in the database (is_verified = 1).
If you forget your password, use the 'Forgot Password' workflow. The backend sends a fresh 6-digit OTP code to your email, which you can enter together with a new password to restore access to your account.
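The verification and reset steps can be sketched with the standard `sqlite3` module. The `users` table layout (`email`, `password_hash`, `otp_code`, `is_verified`) is a hypothetical schema apart from `is_verified`, which the text names; for brevity it stores the code in plain text and omits expiry and attempt limits, which a real deployment would need. The caller is assumed to hash the new password before passing it in.

```python
import hmac
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical schema; only the is_verified flag is named in the docs
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users ("
        "email TEXT PRIMARY KEY, password_hash TEXT NOT NULL, "
        "otp_code TEXT, is_verified INTEGER NOT NULL DEFAULT 0)"
    )


def _otp_matches(conn: sqlite3.Connection, email: str, code: str) -> bool:
    row = conn.execute("SELECT otp_code FROM users WHERE email = ?", (email,)).fetchone()
    # Reject unknown users and accounts without a pending code
    if row is None or row[0] is None:
        return False
    return hmac.compare_digest(row[0], code)


def verify_signup(conn: sqlite3.Connection, email: str, code: str) -> bool:
    """Set is_verified = 1 and consume the code if it matches."""
    if not _otp_matches(conn, email, code):
        return False
    with conn:  # commits on success
        conn.execute("UPDATE users SET is_verified = 1, otp_code = NULL WHERE email = ?", (email,))
    return True


def reset_password(conn: sqlite3.Connection, email: str, code: str, new_password_hash: str) -> bool:
    """Replace the stored hash and consume the code if the reset code matches."""
    if not _otp_matches(conn, email, code):
        return False
    with conn:
        conn.execute(
            "UPDATE users SET password_hash = ?, otp_code = NULL WHERE email = ?",
            (new_password_hash, email),
        )
    return True
```

Clearing `otp_code` after a successful match makes each code single-use.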
NeuroCLI supports dynamic file parsing, extracting text in real time from plain-text files, CSV tables, PDFs, and Word documents to enrich chat model inputs.
Plain-text files are read directly as a UTF-8 stream, and the extracted raw text is inserted directly into the user prompt wrapper. Maximum size: 5 MB.
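A minimal sketch of the plain-text path, assuming the upload arrives as raw bytes in memory: enforce the 5 MB limit, then decode as UTF-8. The function name and reading "5 MB" as 5 × 1024 × 1024 bytes are assumptions. The `utf-8-sig` codec decodes ordinary UTF-8 and also drops a leading byte-order mark if one is present.

```python
MAX_UPLOAD_BYTES = 5 * 1024 * 1024  # assumed interpretation of the 5 MB limit


def read_text_upload(data: bytes) -> str:
    """Decode an in-memory plain-text upload, enforcing the size limit."""
    if len(data) > MAX_UPLOAD_BYTES:
        raise ValueError("file exceeds the 5 MB upload limit")
    try:
        # utf-8-sig behaves like utf-8 but strips a leading byte-order mark
        return data.decode("utf-8-sig")
    except UnicodeDecodeError as exc:
        raise ValueError("file is not valid UTF-8 text") from exc
```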
CSV files are processed row by row with Python's built-in csv module. The grid is rendered as a clean, structured text block that preserves row structure, so the language model can reason about relationships in the tabular data.
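One way to turn CSV content into a text block that keeps row structure, using the standard `csv` module. Rendering each row as ` | `-separated cells is an assumed layout, since the exact output format is not specified; `csv_to_text` is a hypothetical name.

```python
import csv
import io


def csv_to_text(raw: str) -> str:
    """Render CSV content as one pipe-separated line per row."""
    # newline="" lets the csv module handle quoted fields containing newlines
    reader = csv.reader(io.StringIO(raw, newline=""))
    lines = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        lines.append(" | ".join(cell.strip() for cell in row))
    return "\n".join(lines)
```

Using `csv.reader` instead of splitting on commas keeps quoted values such as `"Lovelace, A"` in a single cell.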
PDF files are parsed server-side with the PyPDF2 library, which extracts the text of every page and compiles it, in page order, into unified paragraph blocks.
Word documents (.docx) are processed paragraph by paragraph with the python-docx library, which converts headings, lists, and body paragraphs into readable text.
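The PDF and Word paths can be sketched as below, working on in-memory bytes. This assumes PyPDF2 2.x or later (which exposes `PdfReader`) and python-docx (imported as `docx`); both are imported lazily inside the functions so other file types work without them installed. Prefixing list-style paragraphs with `- ` is an illustrative formatting choice, and all function names are hypothetical.

```python
import io
from pathlib import PurePath


def pdf_to_text(data: bytes) -> str:
    """Extract text from every page, in order, separated by blank lines."""
    from PyPDF2 import PdfReader  # requires PyPDF2 >= 2.0

    reader = PdfReader(io.BytesIO(data))
    pages = [(page.extract_text() or "").strip() for page in reader.pages]
    return "\n\n".join(p for p in pages if p)


def docx_to_text(data: bytes) -> str:
    """Convert .docx paragraphs (headings, lists, body text) to plain text."""
    import docx  # provided by the python-docx package

    document = docx.Document(io.BytesIO(data))
    lines = []
    for para in document.paragraphs:
        text = para.text.strip()
        if not text:
            continue
        # Assumed convention: mark list-style paragraphs so structure survives
        style_name = para.style.name if para.style is not None else ""
        lines.append(f"- {text}" if style_name.startswith("List") else text)
    return "\n".join(lines)


def extract_document_text(filename: str, data: bytes) -> str:
    """Dispatch on file extension to the matching parser."""
    suffix = PurePath(filename).suffix.lower()
    if suffix == ".pdf":
        return pdf_to_text(data)
    if suffix == ".docx":
        return docx_to_text(data)
    raise ValueError(f"unsupported document type: {suffix or filename}")
```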
Uploaded files are never written permanently to the server's disks. The parser reads each file in memory, wraps the extracted text in a context block, and combines it with your chat prompt:
```text
Document Contents of [Filename]:
---
[Extracted Text Content]
---
User Question: [User Prompt]
```
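A minimal sketch of the wrapping step. The template wording follows the format shown above; the exact line breaks between sections are an assumption, and `build_file_prompt` is a hypothetical name.

```python
def build_file_prompt(filename: str, extracted_text: str, user_prompt: str) -> str:
    """Wrap extracted file text and the user's question in a single prompt."""
    return (
        f"Document Contents of {filename}:\n"
        "---\n"
        f"{extracted_text}\n"
        "---\n"
        f"User Question: {user_prompt}"
    )
```

Placing the document before the question keeps the user's actual request at the end of the prompt, next to where the model starts generating.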
Keep uploads under 5 MB to avoid connection timeouts. If a document's contents exceed the selected model's maximum context window (128K tokens for Maverick, Vision, Delta, and Nano, or 32K tokens for Mixtral), the extracted text is limited to fit, preserving chat performance.
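The sketch below shows one way to keep document text within a model's context window. The limits come from the model table, keyed by the model IDs from the API section (mapping Delta to `meta/llama-3.1-8b-instruct` and Nano to `meta/llama-3.2-3b-instruct` is inferred from their Llama 8B and 3B bases). Token counts are estimated at about four characters per token, a rough heuristic rather than the real tokenizer, and the reserved headroom is an arbitrary illustrative value.

```python
CONTEXT_LIMITS = {
    "mistralai/mixtral-8x7b-instruct-v0.1": 32_000,
    "meta/llama-4-maverick-17b-128e-instruct": 128_000,
    "meta/llama-3.2-11b-vision-instruct": 128_000,
    "meta/llama-3.1-8b-instruct": 128_000,  # NeuroCLI Delta 0.2 (inferred mapping)
    "meta/llama-3.2-3b-instruct": 128_000,  # NeuroCLI Nano 0.1 (inferred mapping)
}
CHARS_PER_TOKEN = 4  # rough heuristic, not a real tokenizer
RESERVED_TOKENS = 2_048  # illustrative headroom for the question and the reply


def trim_to_context(text: str, model: str) -> str:
    """Truncate document text so its estimated token count fits the model."""
    budget_tokens = CONTEXT_LIMITS[model] - RESERVED_TOKENS
    max_chars = budget_tokens * CHARS_PER_TOKEN
    return text if len(text) <= max_chars else text[:max_chars]
```

A production version would count tokens with the model's own tokenizer, since the characters-per-token ratio varies by language and content.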
Developers can call the NeuroCLI chat proxy endpoint programmatically once they have an active web session.

```http
POST /api/ai/chat
Content-Type: application/json
Cookie: session=[your_session_cookie]
```
Request body parameters:

- `model` (string): The target model ID. Supported IDs: `mistralai/mixtral-8x7b-instruct-v0.1`, `meta/llama-4-maverick-17b-128e-instruct`, `meta/llama-3.2-11b-vision-instruct`, `meta/llama-3.1-8b-instruct`, or `meta/llama-3.2-3b-instruct`.
- `messages` (array): A list of message objects. Each object must contain a `role` (`"user"` or `"assistant"`) and `content` (a string, or an array of content blocks for vision).
- `temperature` (float): Controls randomness (0.0 to 1.0).
- `max_tokens` (int): The maximum number of output tokens.

Python example:

```python
import requests

url = "http://127.0.0.1:8000/api/ai/chat"
headers = {
    "Content-Type": "application/json"
}
# Session cookie from an authenticated web session (replace the placeholder value)
cookies = {"session": "your_session_cookie"}
payload = {
    "model": "meta/llama-4-maverick-17b-128e-instruct",
    "messages": [
        {"role": "user", "content": "Write a clean Python SQL search query."}
    ],
    "temperature": 0.7,
    "max_tokens": 512
}

response = requests.post(url, json=payload, headers=headers, cookies=cookies)
response.raise_for_status()  # raise an error for 4xx/5xx responses
print(response.json())
```
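Before sending a request, a client can check its payload against the parameter rules listed for this endpoint. This is a hypothetical client-side helper, not part of NeuroCLI, and the server's own validation may differ; `temperature` and `max_tokens` are treated as optional because the Node.js example omits `max_tokens`.

```python
from typing import Any, Dict, List

SUPPORTED_MODELS = {
    "mistralai/mixtral-8x7b-instruct-v0.1",
    "meta/llama-4-maverick-17b-128e-instruct",
    "meta/llama-3.2-11b-vision-instruct",
    "meta/llama-3.1-8b-instruct",
    "meta/llama-3.2-3b-instruct",
}


def validate_payload(payload: Dict[str, Any]) -> List[str]:
    """Return a list of problems with a chat payload (empty if it looks valid)."""
    errors = []
    if payload.get("model") not in SUPPORTED_MODELS:
        errors.append("model: unsupported model ID")
    messages = payload.get("messages")
    if not isinstance(messages, list) or not messages:
        errors.append("messages: must be a non-empty list")
    else:
        for i, msg in enumerate(messages):
            if not isinstance(msg, dict) or msg.get("role") not in ("user", "assistant"):
                errors.append(f"messages[{i}].role: must be 'user' or 'assistant'")
            elif not isinstance(msg.get("content"), (str, list)):
                errors.append(f"messages[{i}].content: must be a string or list of blocks")
    temperature = payload.get("temperature")
    if temperature is not None and not (isinstance(temperature, (int, float)) and 0.0 <= temperature <= 1.0):
        errors.append("temperature: must be between 0.0 and 1.0")
    max_tokens = payload.get("max_tokens")
    if max_tokens is not None and not (isinstance(max_tokens, int) and max_tokens > 0):
        errors.append("max_tokens: must be a positive integer")
    return errors
```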
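Node.js example:

```javascript
// Requires Node.js 18+, which provides a global fetch (no node-fetch dependency needed)
const queryModel = async () => {
  const url = "http://127.0.0.1:8000/api/ai/chat";
  const payload = {
    model: "meta/llama-4-maverick-17b-128e-instruct",
    messages: [{ role: "user", content: "Optimize this loop structure" }],
    temperature: 0.5
  };
  const response = await fetch(url, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Session cookie from an authenticated web session (replace the placeholder value)
      Cookie: "session=your_session_cookie"
    },
    body: JSON.stringify(payload)
  });
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  const data = await response.json();
  console.log(data);
};

queryModel().catch(console.error);
```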
Example response:

```json
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "To optimize the Python loops, utilize list comprehensions...",
        "role": "assistant"
      }
    }
  ],
  "created": 1780558000,
  "id": "chatcmpl-uuid-string",
  "model": "meta/llama-4-maverick-17b-128e-instruct",
  "object": "chat.completion"
}
```
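Because the response follows the chat-completion shape shown in the example response, the assistant's text sits at `choices[0].message.content`. A small hypothetical helper that extracts it and fails clearly when the shape is unexpected:

```python
from typing import Any, Dict


def extract_reply(response_json: Dict[str, Any]) -> str:
    """Return the first choice's assistant message text."""
    choices = response_json.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    message = choices[0].get("message") or {}
    content = message.get("content")
    if not isinstance(content, str):
        raise ValueError("first choice has no text content")
    return content
```

Usage: `print(extract_reply(response.json()))` after the Python request shown for this endpoint.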