N
NSCC AI CLUSTER POC Result

NSCC AI Cluster: Local Infrastructure.
Zero Cloud Costs. Total Governance.

Proof-of-Concept results proving that campus-managed hardware can run enterprise-grade LLMs locally, securely, and with zero subscription invoicing.

Distributed Throughput
57.83 tok/s
Active Model: Mistral 7B
Max Context window
32,768
Full attention span using flash attention.
8k default 32k cached limit
Cluster Footprint
3 Nodes
CPU cores: 36 Cores
RAM capacity: 124 GB RAM
VRAM capacity: 36 GB Total VRAM
Operating Cost
$0.00
Invoiced external API token fees. Completely air-gapped run.

Cloud Cost Avoidance Calculator

Estimate the operational cloud spend avoided by serving prompts locally on the NSCC Cluster.

Combined input & output tokens requested by students/faculty monthly.

Blended average cost (e.g. GPT-4o, Claude Sonnet) per million tokens.

Estimated Monthly Savings
$125.00
Estimated Annual Savings
$1,500.00
Cloud Invoice Cost $125.00
On-Prem Cost $0.00

Executive Comparison: Before vs. After

A side-by-side financial and operational breakdown of local cluster hosting versus vendor-managed public APIs.

NSCC AI Cluster POC Results
The Cloud Approach Our On-Prem Cluster
Monthly Subscription Fees
Escalating operational expenses. Invoices scale linearly with token volume, creating budget unpredictability.
Zero Operational Invoiced Cost
One-time capital hardware cost. Local power and cooling represent the only marginal overhead, unaffected by user requests.
Data Sent Outside College Walls
Prompts leave the campus, traversing third-party servers. Exposes research, student logs, and proprietary data to privacy leak risks.
100% On-Prem & Air-Gapped
Network isolation via isolated VLAN 30. Student submissions and local outputs never exit campus-controlled hypervisors.
Zero Local Infrastructure Assets
Capital departs to hyperscalers. IT staff gain no engineering skills or physical assets that stay inside the college portfolio.
Predictable Local Hardware Asset
Valuable local infrastructure that can be repurposed, clustered, or upgraded physically, strengthening in-house computational research.
Opaque Security Oversight
No access to lower-level networking or model logs. System security relies entirely on third-party security certifications and vendor compliance reports.
Total Admin & Audit Control
Direct access to the hypervisor (Proxmox VE 9.1). Admin dashboards log, rate-limit, and audit all request traffic at the raw hardware and VM levels.

Distributed Architecture

An interactive look at how commodity hardware is bridged over dedicated internal switches to run multi-node LLMs.

VLAN 30 MANAGEMENT NETWORK (10.30.0.0/24) AIProx1 IP: 10.30.0.11 AIProx2 IP: 10.30.0.12 AIProx3 IP: 10.30.0.13 PCIe Passthrough (IOMMU) DEDICATED INTERNAL LLM VM CLUSTER Stage 0: VM 100 IP: 192.168.100.1 (eth1) RTX 4070 (12GB VRAM) Assigned Pipeline Stages: Embeddings + Layers 0-10 Stage 1: VM 200 IP: 192.168.100.2 (eth1) RTX 4070 (12GB VRAM) Assigned Pipeline Stages: Layers 11-21 Stage 2: VM 300 IP: 192.168.100.3 (eth1) RTX 4070 (12GB VRAM) Assigned Pipeline Stages: Layers 22-31 + LM Head GLOO_SOCKET_IFNAME=eth1
Interactive Map

Cluster System Architecture

Click nodes or VMs to inspect detailed hardware maps.

The cluster operates in a distributed pipeline-parallel model across three commodity machines.

Network Pinned Routing: Inter-node tensors are strictly bound to the internal VM-to-VM interface (192.168.100.x) utilizing GLOO_SOCKET_IFNAME=eth1. Traffic never crosses the public management switch, guaranteeing speed and absolute internal security.
Hover over the SVG diagram elements to inspect networking and configuration bindings.

Core Value Pillars

Why hosting our own AI infrastructure makes sense strategically and organizationally.

🛡️

Policy Compliance & Data Governance

Student assignment submissions, administrative queries, and research datasets contain sensitive data. Running models on local VMs (Ubuntu 24.04 LTS guest OS) with persistent local caching at /data/hf-cache ensures data is never analyzed, stored, or processed by external providers.

VLAN 30 (10.30.0.0/24) Isolation
⚙️

Administration & Oversight

Commercial APIs are opaque. Because we control the hypervisor layer via Proxmox VE 9.1, the IT department retains total visibility. We can dynamically reallocate CPU/RAM resources, monitor real-time network flow, set query rate thresholds, and inspect application access logs without relying on third-party SaaS vendors.

Proxmox VE 9.1 Hypervisor Control
🛠️

Engineering Ingenuity

To bypass hardware scaling limits, we implemented targeted optimizations, bypassing the vLLM pipeline initialization crashes using custom Python runtime patches and solving Gloo network constraints to force inter-node traffic onto the secondary Gigabit ring.

# Startup patch correcting pipeline initialization
# in Ray distributed orchestration environments
import os
import vllm.engine.ray_utils as ray_utils

# Patch ray GPU mapping bug during node handshake
def patched_init_workers(*args, **kwargs):
    os.environ["RAY_EXPERIMENTAL_NOSET_CUDA_VISIBLE_DEVICES"] = "0"
    return original_init(*args, **kwargs)

original_init = ray_utils.initialize_workers
ray_utils.initialize_workers = patched_init_workers
Live Case Study

Local Curriculum Outcome Mapping

Traditionally, aligning Course Learning Outcomes (LOs) to overall Program Outcomes (POs) is a manual, labor-intensive audit process requiring hours of faculty review. To automate this, we built a fully local curriculum mapping pipeline.

The pipeline ingests raw course syllabus PDFs, extracts structural text into JSON format, and calls the NSCC AI Cluster REST API directly. Using the cluster's Qwen 2.5 32B model, it analyzes alignment logic and builds a comprehensive mapping matrix. The pipeline then auto-generates a formatted Microsoft Word report (LO_PO_Alignment_Report.docx) complete with analysis charts—running entirely on-premise with zero external data exposure.

1

Syllabus PDF Ingestion

Raw course syllabus and program outcome documents are placed in local directories.

2

Text Extraction (extract_curriculum.py)

Extracts text and saves to structured JSON (curriculum_extracted.json) format.

3

Inference & Alignment (llm_align.py)

Queries the local AI Cluster API to run mapping logic and save output matrix.

4

Report & Visuals (generate_report.py)

Generates a formatted MS Word .docx report and charts locally via Python.

Strategic Roadmap

From Proof of Concept to a robust campus-wide enterprise AI network.

1
Phase 1

Pilot Campus Access

Expose the cluster's OpenAI-compatible REST API (serving at http://10.30.0.21:8001/v1) to select IT faculty and a pilot group of students. Leverage client interfaces like AnythingLLM or Open WebUI in a controlled sandbox environment to test latency and routing before broader roll-out.

2
Phase 2

Hardware Upgrade: Stability & Scale

To expand availability, we need to transition to stable, dedicated hardware. While Intel Arc B70s are sold out, we are considering **Radeon AI PRO R9700** or AMD **Radeon PRO W7900** (featuring 48GB VRAM for raw model capacity) or NVIDIA **RTX 4090 / 5090** (the CUDA standard). Swapping out the commodity 12GB RTX 4070 cards resolves the exact 170 MB deficit preventing execution of industry-standard Llama 3.1 70B models.

The Hardware Ceiling

Llama-3.1-70B (Int4 Quantized) requires 36.17 GB of VRAM. The current 3x RTX 4070 cluster offers 36.00 GB, missing the ceiling by a mere 170 MB.

Available Cluster VRAM 36.00 GB
70B Limit (36.17 GB)
Supported Model Parameter Scales Up to 32B Models
3
Phase 3

Collaborative Network (CCN) & Outcome Mapping

Deploy secure access to IT programs across our **College Collaborative Network (CCN)**. Integrate our local outcome mapping tool, nscc-curriculum-outcome-mapping, directly to the backend to automatically align syllabus outcomes securely across member campuses.

4
Phase 4

Enterprise Sandboxes & MS Foundry

Connect the on-premises raw computing power to our **Visual Studio Enterprise subscription M365 Dev Sandboxes**. This enables students to work with enterprise-grade AI payloads using **MS Foundry Local** and **Azure-managed endpoints** in a secure, hybrid college learning ecosystem.