Proof-of-Concept results proving that campus-managed hardware can run enterprise-grade LLMs locally, securely, and with zero subscription invoicing.
Estimate the operational cloud spend avoided by serving prompts locally on the NSCC Cluster.
Combined input & output tokens requested by students/faculty monthly.
Blended average cost (e.g. GPT-4o, Claude Sonnet) per million tokens.
A side-by-side financial and operational breakdown of local cluster hosting versus vendor-managed public APIs.
| NSCC AI Cluster POC Results | |
|---|---|
| The Cloud Approach | Our On-Prem Cluster |
|
Monthly Subscription Fees
Escalating operational expenses. Invoices scale linearly with token volume, creating budget unpredictability.
|
Zero Operational Invoiced Cost
One-time capital hardware cost. Local power and cooling represent the only marginal overhead, unaffected by user requests.
|
|
Data Sent Outside College Walls
Prompts leave the campus, traversing third-party servers. Exposes research, student logs, and proprietary data to privacy leak risks.
|
100% On-Prem & Air-Gapped
Network isolation via isolated VLAN 30. Student submissions and local outputs never exit campus-controlled hypervisors.
|
|
Zero Local Infrastructure Assets
Capital departs to hyperscalers. IT staff gain no engineering skills or physical assets that stay inside the college portfolio.
|
Predictable Local Hardware Asset
Valuable local infrastructure that can be repurposed, clustered, or upgraded physically, strengthening in-house computational research.
|
|
Opaque Security Oversight
No access to lower-level networking or model logs. System security relies entirely on third-party security certifications and vendor compliance reports.
|
Total Admin & Audit Control
Direct access to the hypervisor (Proxmox VE 9.1). Admin dashboards log, rate-limit, and audit all request traffic at the raw hardware and VM levels.
|
An interactive look at how commodity hardware is bridged over dedicated internal switches to run multi-node LLMs.
Click nodes or VMs to inspect detailed hardware maps.
The cluster operates in a distributed pipeline-parallel model across three commodity machines.
192.168.100.x) utilizing GLOO_SOCKET_IFNAME=eth1. Traffic never crosses the public management switch, guaranteeing speed and absolute internal security.
Why hosting our own AI infrastructure makes sense strategically and organizationally.
Student assignment submissions, administrative queries, and research datasets contain sensitive data. Running models on local VMs (Ubuntu 24.04 LTS guest OS) with persistent local caching at /data/hf-cache ensures data is never analyzed, stored, or processed by external providers.
Commercial APIs are opaque. Because we control the hypervisor layer via Proxmox VE 9.1, the IT department retains total visibility. We can dynamically reallocate CPU/RAM resources, monitor real-time network flow, set query rate thresholds, and inspect application access logs without relying on third-party SaaS vendors.
To bypass hardware scaling limits, we implemented targeted optimizations, bypassing the vLLM pipeline initialization crashes using custom Python runtime patches and solving Gloo network constraints to force inter-node traffic onto the secondary Gigabit ring.
# Startup patch correcting pipeline initialization
# in Ray distributed orchestration environments
import os
import vllm.engine.ray_utils as ray_utils
# Patch ray GPU mapping bug during node handshake
def patched_init_workers(*args, **kwargs):
os.environ["RAY_EXPERIMENTAL_NOSET_CUDA_VISIBLE_DEVICES"] = "0"
return original_init(*args, **kwargs)
original_init = ray_utils.initialize_workers
ray_utils.initialize_workers = patched_init_workers
Traditionally, aligning Course Learning Outcomes (LOs) to overall Program Outcomes (POs) is a manual, labor-intensive audit process requiring hours of faculty review. To automate this, we built a fully local curriculum mapping pipeline.
The pipeline ingests raw course syllabus PDFs, extracts structural text into JSON format, and calls the NSCC AI Cluster REST API directly. Using the cluster's Qwen 2.5 32B model, it analyzes alignment logic and builds a comprehensive mapping matrix. The pipeline then auto-generates a formatted Microsoft Word report (LO_PO_Alignment_Report.docx) complete with analysis charts—running entirely on-premise with zero external data exposure.
Raw course syllabus and program outcome documents are placed in local directories.
Extracts text and saves to structured JSON (curriculum_extracted.json) format.
Queries the local AI Cluster API to run mapping logic and save output matrix.
Generates a formatted MS Word .docx report and charts locally via Python.
From Proof of Concept to a robust campus-wide enterprise AI network.
Expose the cluster's OpenAI-compatible REST API (serving at http://10.30.0.21:8001/v1) to select IT faculty and a pilot group of students. Leverage client interfaces like AnythingLLM or Open WebUI in a controlled sandbox environment to test latency and routing before broader roll-out.
To expand availability, we need to transition to stable, dedicated hardware. While Intel Arc B70s are sold out, we are considering **Radeon AI PRO R9700** or AMD **Radeon PRO W7900** (featuring 48GB VRAM for raw model capacity) or NVIDIA **RTX 4090 / 5090** (the CUDA standard). Swapping out the commodity 12GB RTX 4070 cards resolves the exact 170 MB deficit preventing execution of industry-standard Llama 3.1 70B models.
Llama-3.1-70B (Int4 Quantized) requires 36.17 GB of VRAM. The current 3x RTX 4070 cluster offers 36.00 GB, missing the ceiling by a mere 170 MB.
Deploy secure access to IT programs across our **College Collaborative Network (CCN)**. Integrate our local outcome mapping tool, nscc-curriculum-outcome-mapping, directly to the backend to automatically align syllabus outcomes securely across member campuses.
Connect the on-premises raw computing power to our **Visual Studio Enterprise subscription M365 Dev Sandboxes**. This enables students to work with enterprise-grade AI payloads using **MS Foundry Local** and **Azure-managed endpoints** in a secure, hybrid college learning ecosystem.