Kubeflux
Flux Platform · Feature Guide · 2026

Flux: Cloud Kubernetes Cost Optimization
Powered by innovative features

Ten complementary optimization engines that together reduce your Kubernetes compute spend by 40–92%. Each feature targets a different layer of waste — from idle workloads and oversized nodes to misconfigured HPA boundaries, unsafe disruption windows, and suboptimal node group/pool composition.

Feature 01

Workload Scaler

Automatic scale-to-zero for idle Kubernetes workloads, with instant scale-up on demand — eliminating the cost of compute that runs but does nothing.

[Illustration: during business hours (9am–6pm), 4 pods run and serve traffic at $1.20/hr; off hours (6pm–9am), the workload is scaled to zero pods at $0.00/hr, switching automatically.]

What It Does

The Workload Scaler monitors every Deployment, StatefulSet, and DaemonSet in your cluster and tracks real activity. When a workload has been idle for a configurable period — no incoming traffic, no active processing — it scales the workload down to zero replicas. When traffic returns, the workload is scaled back up automatically before the first request is served.

For teams running development, staging, QA, or preview environments, this is where the largest wins come from. These environments are typically provisioned identically to production, used actively for 6–8 hours per business day, and left running at full cost for the remaining 16–18 hours — including nights, weekends, and holidays. The Workload Scaler reclaims that wasted spend without any change to how your team works.

Typical result

40–70% cost reduction on non-production workloads within the first week of enabling this feature.

Zero-cost idle periods

Workloads not serving traffic consume no compute. Pods are removed from nodes entirely, which also reduces node count when the cluster autoscaler consolidates freed capacity. Savings compound.

Instant wake-up on demand

When a request arrives for a scaled-down workload, the scaler detects the signal and scales back up before the request times out. Most services are ready in under 30 seconds.

Scale-to-One mode

For latency-sensitive services that cannot tolerate cold-start delay, Scale-to-One keeps a single replica running at all times — saving 50–80% while eliminating cold-start entirely.

Namespace and service exclusions

Monitoring agents, production workloads, and service meshes must remain running. Exclusion lists at both namespace and service level ensure nothing critical is touched.

Configurable idle timeout

30 minutes for interactive dev environments, 2–4 hours for batch pipelines. The timer resets on any detected activity, so workloads with rare but real usage are never prematurely scaled.

Full audit history

Every scale event is recorded: workload, direction, timestamp, and idle duration. Available in the UI and via API for cost reporting workflows.

How It Stays Safe

Safety guarantees
  • Excluded namespaces are enforced in code, not configuration. System namespaces — kube-system, monitoring, istio-system, cert-manager — are permanently excluded and cannot be removed by any user action.
  • Scale-up happens before traffic is routed. The scaler does not allow traffic to reach a workload until the target pod is in Ready state. Users never see a 503 from a workload that hasn't finished starting.
  • Stateful workloads require explicit opt-in. StatefulSets are only eligible for scaling when explicitly opted in or running in a namespace configured for full scaling.
  • Exclusions survive any toggle state. Disabling the global scaler does not remove entries from the exclusion list. Protections persist regardless of the master switch.

Settings Reference

Parameter | Type | Default | Description
workload_scaling_status | string | "disabled" | Master control. Set to "enabled" to activate workload scaling. Enable this first before configuring idle timeout and exclusions.
workload_scale_down_one_status | string | "disabled" | Scale-to-One mode. When enabled, idle workloads scale to 1 replica instead of 0. Mutually exclusive with scale-to-zero for the same workload.
duration | integer (min) | 3 | Idle timeout before scale-down. Timer resets on any activity. Recommended: 30–60 min for dev/staging, 120–240 min for batch environments.
excluded_namespaces | list[string] | [] | Namespaces where no workload will be scaled. Additive on top of permanent system exclusions. Add all production and shared infrastructure namespaces here.
excluded_services | list[string] | [] | Individual service names excluded cluster-wide regardless of namespace. Use for specific workloads within an otherwise-eligible namespace.
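
As a rough sketch of how these settings interact (the names, structure, and ordering below are illustrative, not the platform's actual internals), the scale-down decision reduces to a few ordered checks: permanent system exclusions first, then user exclusions, then the idle timer:

```python
from datetime import datetime, timedelta

# Hypothetical settings mirroring the table above
SETTINGS = {
    "workload_scaling_status": "enabled",
    "duration": 45,  # idle timeout in minutes
    "excluded_namespaces": ["prod", "shared-infra"],
    "excluded_services": ["license-server"],
}

# Enforced in code, not configuration: cannot be removed by any user action
PERMANENT_EXCLUSIONS = {"kube-system", "monitoring", "istio-system", "cert-manager"}

def should_scale_down(namespace, service, last_activity, now):
    """Return True when a workload is eligible for scale-to-zero."""
    if SETTINGS["workload_scaling_status"] != "enabled":
        return False
    if namespace in PERMANENT_EXCLUSIONS:
        return False
    if namespace in SETTINGS["excluded_namespaces"]:
        return False
    if service in SETTINGS["excluded_services"]:
        return False
    # Timer resets on any activity, so only time since the last
    # detected activity counts toward the idle window
    idle = now - last_activity
    return idle >= timedelta(minutes=SETTINGS["duration"])

now = datetime(2026, 1, 5, 20, 0)
print(should_scale_down("staging", "api", datetime(2026, 1, 5, 18, 0), now))  # idle 2h -> True
print(should_scale_down("prod", "api", datetime(2026, 1, 5, 10, 0), now))     # excluded -> False
```

Note the ordering: exclusion checks come before the timer, which is why protections persist regardless of the master switch or timeout value.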

Recommended timeout by environment type

Environment Type | Recommended Timeout
Interactive dev / staging | 30–60 minutes
CI/CD preview environments | 15–30 minutes
Scheduled batch environments | 120–240 minutes
QA (business-hours use) | 60 minutes

Setup Sequence

Enable in non-production first

Set workload_scaling_status to "enabled". Add production and shared infrastructure namespaces to excluded_namespaces. Verify the exclusion list covers everything that must always be running.

Observe the first scale cycle

After the first idle timeout period, review the audit history. Confirm only the expected workloads were scaled down and nothing critical was touched.

Tune the timeout

Review how long workloads typically sit idle before use. Adjust the duration setting to match your team's actual usage patterns.

Enable Scale-to-One for latency-sensitive services (optional)

For services where cold-start delay would disrupt workflow, enable workload_scale_down_one_status for those specific services rather than globally.

Typical savings by environment profile

Environment Profile | Typical Working Hours | Typical Saving
Dev/staging, 8-hour business day | 8 / 24 | 60–70%
Preview / review environments | 4 / 24 | 75–85%
QA, shared across teams | 10 / 24 | 55–65%
Mixed with production workloads | Variable | 30–50%
Feature 02

Node Scheduler

Time-based node group/pool scaling that provisions the right infrastructure before you need it and removes it the moment you don't — without waiting for reactive signals.

[Illustration: your cluster size automatically follows your schedule across the week, running 10 nodes during business hours, 2 nodes off hours, and 1 node on weekends, saving roughly 78% versus always-on.]

What It Does

The Node Scheduler lets you define exactly when your cluster should be large and when it should be small. Instead of reacting to load after pods are already pending, you schedule capacity changes in advance: scale up 10 minutes before business hours begin, scale down to a minimal footprint overnight and through the weekend.

Most Kubernetes autoscaling is reactive — it waits for a signal and then acts. This means you are always slightly behind. The Node Scheduler eliminates this gap entirely for predictable workloads by making the scaling decision before the demand change occurs.

Predictive capacity

Scale up before demand arrives. Your workloads never experience scheduling latency from waiting for a new node to join the cluster during a planned traffic ramp.

Aggressive overnight scale-down

At the end of the working day, scale to minimum footprint immediately — no cooldown hesitation. A 10-node to 2-node reduction overnight saves 80% of nightly compute cost.

Instance type selection per schedule

Each schedule specifies the instance type and capacity type. Run on-demand during business hours for reliability; switch to spot overnight to stack two savings mechanisms simultaneously.

Multiple concurrent schedules

A weekday business-hours schedule, a Saturday maintenance schedule, a Sunday zero-capacity schedule, and holiday overrides — all composing together without conflict.

Date-range and day targeting

Target specific days of the week, a calendar date range, or a 24/7 baseline — covering sprint cycles, reporting periods, planned maintenance, and seasonal patterns without manual intervention.

Schedule history and audit

Every scheduled scaling action is logged with name, timing, target configuration, and outcome. Queryable from the UI and API.

How It Stays Safe

Safety guarantees
  • Minimum node floor always respected. Every schedule respects min_on_demand_nodes. The cluster is never reduced below your defined minimum, regardless of what a schedule specifies.
  • Nodes are drained before removal. Pods are gracefully evicted and rescheduled on remaining nodes before a node is removed from the group. No pod is killed mid-request.
  • PDB constraints are respected on drain. If the Auto-PDB Operator is running, drain operations wait for safe windows rather than forcing evictions that would violate availability guarantees.
  • Conflicts resolve to the more conservative setting. If two schedules overlap, the scheduler applies the higher node count. Availability is never sacrificed due to a scheduling conflict.

Settings Reference

Parameter | Type | Default | Description
node_scheduler_status | string | "disabled" | Master control. All configured schedules are paused when disabled. Previously applied changes remain in place; the scheduler does not reverse them on disable.
min_on_demand_nodes | integer | 1 | Absolute minimum on-demand nodes that must remain running at all times. Applied to every schedule. Set to 2+ for clusters running databases or stateful services.

Schedule object fields

Field | Type | Description
name | string | Human-readable identifier shown in the UI and audit log
scheduleType | string | "24/7", "specificDays", or "dateRange"
specificDays | list | Days of week: "Mon", "Tue", etc.
startAt / endAt | string | Time in HH:MM format (local cluster timezone)
instanceType | string | Cloud instance type during this schedule (e.g. m5.large, Standard_D4s_v3)
capacityType | string | "On-Demand" or "Spot"
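
To make the schedule fields and the conflict rule concrete, here is a small sketch (the "nodes" field and the helper functions are illustrative assumptions, not documented platform API): two overlapping schedules resolve to the higher node count, and the min_on_demand_nodes floor always applies:

```python
# Hypothetical schedule objects using the fields above
schedules = [
    {"name": "business-hours", "specificDays": ["Mon", "Tue", "Wed", "Thu", "Fri"],
     "startAt": "08:45", "endAt": "18:00", "nodes": 10,
     "instanceType": "m5.large", "capacityType": "On-Demand"},
    {"name": "overnight", "specificDays": ["Mon", "Tue", "Wed", "Thu", "Fri"],
     "startAt": "18:00", "endAt": "08:45", "nodes": 2,
     "instanceType": "m5.large", "capacityType": "Spot"},
]

MIN_ON_DEMAND_NODES = 1

def active(s, day, hhmm):
    """True if schedule s covers this day and zero-padded HH:MM time."""
    if day not in s["specificDays"]:
        return False
    start, end = s["startAt"], s["endAt"]
    if start <= end:
        return start <= hhmm < end
    return hhmm >= start or hhmm < end  # overnight window wraps midnight

def target_nodes(day, hhmm):
    matches = [s["nodes"] for s in schedules if active(s, day, hhmm)]
    # Overlaps resolve conservatively: the higher node count wins,
    # and the cluster never drops below the configured floor
    return max(matches, default=MIN_ON_DEMAND_NODES)

print(target_nodes("Mon", "10:30"))  # 10 (business hours)
print(target_nodes("Mon", "22:00"))  # 2 (overnight, spot)
print(target_nodes("Sun", "12:00"))  # 1 (no schedule: floor applies)
```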

Typical savings by usage pattern

Usage Pattern | Node Hours Saved / Week | Typical Saving
8h/day weekdays only (40h/168h) | 128 hours | 55–75%
12h/day weekdays only (60h/168h) | 108 hours | 45–65%
Batch: 4h/night, 5 nights (20h/168h) | 148 hours | 75–85%
24/7 with weekend scale-down | 48 hours | 30–40%
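
The "node hours saved" column is simple arithmetic worth making explicit: a week has 168 hours, so per node the saving window is 168 minus the active hours:

```python
WEEK_HOURS = 7 * 24  # 168

def node_hours_saved(active_hours_per_week):
    """Hours per week a node is not needed (per node, before multiplying
    by the number of nodes the schedule removes)."""
    return WEEK_HOURS - active_hours_per_week

print(node_hours_saved(40))   # 8h/day weekdays -> 128
print(node_hours_saved(60))   # 12h/day weekdays -> 108
print(node_hours_saved(20))   # batch, 4h/night x 5 -> 148
print(node_hours_saved(120))  # 24/7 minus 48h weekend -> 48
```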

Setup Sequence

Enable the scheduler

Set node_scheduler_status to "enabled". No schedules are active until you create them, so enabling the engine has no immediate effect on your cluster.

Create a single off-hours schedule first

Start with an overnight or weekend scale-down before adding a scale-up schedule. This validates drain behaviour for your workloads before you depend on the scheduler for morning provisioning.

Add the scale-up schedule

Once confident in scale-down behaviour, add the business-hours scale-up schedule. Set startAt 10–15 minutes before your first expected traffic to give nodes time to join and become ready.

Review the first full cycle

After the first complete weekday cycle, review the schedule history table and node cost data. Confirm timing is aligned with your team's usage patterns and adjust if needed.

Feature 03

Node Autoscaler

Reactive, policy-driven node scaling that adds capacity exactly when pods cannot be scheduled and removes underutilised nodes the moment they become unnecessary.

[Illustration: the cluster grows when traffic arrives and shrinks when it leaves, scaling from 2 nodes up to 5 during a traffic spike and back down to 2 afterwards.]

What It Does

The Node Autoscaler continuously monitors your cluster's scheduling state and node utilisation. When pods are waiting to be scheduled because no existing node has enough free capacity, the autoscaler adds a new node. When nodes run at low utilisation for longer than a configurable window, the autoscaler removes them and reschedules their workloads onto the remaining capacity.

Unlike the Node Scheduler, which acts on a pre-defined timetable, the Node Autoscaler responds to what is actually happening in real time. The two features are designed to work together: the scheduler handles predictable demand patterns, and the autoscaler handles the variance — unexpected traffic spikes, ad-hoc deployments, and gradual load growth between schedule cycles.

Capacity that tracks actual demand

Without automated scaling, clusters are sized for peak load permanently. The autoscaler matches node count to actual workload at every point — the cloud bill curves with usage rather than sitting flat at peak provisioning.

Configurable scale-down behaviour

Four parameters let you tune the tradeoff between aggressive cost savings and protection against premature removal: unneeded time, post-add delay, scan interval, and utilisation threshold.

Intelligent pending pod analysis

Distinguishes between pods waiting for genuine capacity and pods temporarily unscheduled due to normal scheduling lag. Only genuine shortages trigger scale-up — preventing unnecessary node additions.

Historical scaling visibility

Every node addition and removal is recorded with reason, count before/after, and timestamp. Available as a chart and event table in the UI, and via API for incident analysis.

Cost impact attribution

Scale-down events produce quantified savings; scale-up events record their cost. The financial effect of autoscaler behaviour is transparent and measurable, not just an assumed benefit.

Complements the Node Scheduler

The scheduler handles predictable patterns; the autoscaler handles variance within those windows. Together they are more efficient than either feature alone.

                   ┌──────────────────────────────────────┐
                   │            Node Autoscaler           │
                   │                                      │
  Kubernetes API ──┼─▶ Pending Pod Watcher                │
                   │        │                             │
                   │        ▼                             │
                   │   Genuine shortage? ──No──▶ Skip     │
                   │        │ Yes                         │
                   │        ▼                             │
                   │   Scale-Up Decision ─────────────────┼──▶ Cloud Provider API
                   │                                      │    (add node to group)
  Kubernetes API ──┼─▶ Utilisation Scanner                │
                   │        │                             │
                   │   Below threshold?                   │
                   │   For long enough? ──No──▶ Wait      │
                   │   After cooldown?                    │
                   │        │ Yes                         │
                   │        ▼                             │
                   │   Scale-Down Decision ───────────────┼──▶ kubectl drain
                   │   (check PDB, min nodes)             │    Cloud Provider API
                   └──────────────────────────────────────┘
                                      │
                                      ▼
                                MongoDB / UI
                           (event log, cost data)

How It Stays Safe

Safety guarantees
  • Cooldown after scale-up prevents thrashing. After adding a node, the autoscaler waits for scale_down_delay_after_add before evaluating any node for removal.
  • Utilisation threshold prevents premature removal. Nodes hosting even light but sustained workloads are never removed — only nodes below threshold for the full scale_down_unneeded_time window.
  • PDB constraints are fully respected on drain. If a drain would violate a PDB, the node is not removed. It remains eligible and drain is retried when the disruption window allows.
  • Minimum node count is always preserved. The autoscaler will never reduce the cluster below min_on_demand_nodes, regardless of utilisation measurements.

Settings Reference

Parameter | Type | Default | Description
node_autoscaler_status | string | "disabled" | Master control. When "enabled", continuously monitors and applies scaling decisions.
scale_down_unneeded_time | integer (min) | 5 | How long a node must remain below utilisation threshold before it is eligible for removal. Increase to 20–30 min for bursty traffic; decrease to 5–10 min for batch/dev clusters.
scale_down_delay_after_add | integer (min) | 5 | Cooldown after a scale-up event before any node is considered for removal. Prevents thrashing during short-lived load spikes. Increase to 15–20 min for highly variable traffic.
scale_down_utilization_threshold | float (0–1) | 0.5 | CPU utilisation fraction below which a node is considered underutilised. 0.5 = 50%. Use 0.6–0.7 for dev clusters; 0.3–0.4 for production with predictable load.
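
How the three scale-down parameters gate a removal can be sketched as a sequence of checks (a simplified illustration with hypothetical names, not the engine's real code; PDB checks happen later, at drain time):

```python
from datetime import datetime, timedelta

# Hypothetical settings mirroring the table above
SETTINGS = {
    "scale_down_unneeded_time": 5,           # minutes below threshold required
    "scale_down_delay_after_add": 5,         # cooldown after scale-up, minutes
    "scale_down_utilization_threshold": 0.5, # CPU fraction
}
MIN_ON_DEMAND_NODES = 1

def eligible_for_removal(node, cluster_size, last_scale_up, now):
    """Return True when a node passes every scale-down gate."""
    # Cooldown after scale-up prevents thrashing
    if now - last_scale_up < timedelta(minutes=SETTINGS["scale_down_delay_after_add"]):
        return False
    # Minimum node count is always preserved
    if cluster_size <= MIN_ON_DEMAND_NODES:
        return False
    # Node must currently be below the utilisation threshold...
    if node["cpu_utilization"] >= SETTINGS["scale_down_utilization_threshold"]:
        return False
    # ...and must have stayed below it for the full unneeded window
    return node["below_threshold_minutes"] >= SETTINGS["scale_down_unneeded_time"]

idle_node = {"cpu_utilization": 0.2, "below_threshold_minutes": 12}
print(eligible_for_removal(idle_node, cluster_size=4,
                           last_scale_up=datetime(2026, 1, 5, 11, 0),
                           now=datetime(2026, 1, 5, 12, 0)))  # True
```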

Setup Sequence

Enable with conservative defaults

Set node_autoscaler_status to "enabled". Leave scale_down_unneeded_time at 10–15 min and scale_down_utilization_threshold at 0.5. This gives active autoscaling without aggressive scale-down.

Observe one week of history

Review the autoscaling event table. Look for: unnecessary scale-up events triggered by transient lag, thrashing (scale-down immediately followed by scale-up), and nodes that stayed underutilised for long periods.

Tune threshold and timing

Based on observations, adjust scale_down_utilization_threshold and scale_down_unneeded_time. Most clusters need one or two iterations before reaching a stable configuration.

Feature 04

Workload HPA Right-Sizing

AI-powered analysis of real workload behaviour that automatically corrects over-provisioned resource requests, right-sizes HPA boundaries, and eliminates the CPU and memory waste that accumulates in every long-running Kubernetes cluster.

[Illustration: stop paying for resources your app never uses. Before: guessed limits of 4 CPU cores and 8 GB memory waste $480/month. After the AI learns real usage and right-sizes to 1.2 cores and 2.4 GB, cost drops to $144/month, a 70% saving.]

What It Does

Every Kubernetes workload has two sets of numbers attached to it: resource requests (what the pod tells the scheduler it needs) and HPA configuration (how many replicas the autoscaler is allowed to run). Both are almost always wrong.

Resource requests are set at deployment time based on estimates, then never revisited. HPA boundaries have the same problem — minReplicas: 2 set in 2022 that was never justified by load testing, maxReplicas: 20 that was just a comfortable-feeling upper bound. HPA Right-Sizing replaces manual guesswork with continuous, ML-driven analysis of what your workloads actually do.

Why this matters at scale

A workload requesting 2 CPU cores with P99 usage of 0.4 cores wastes 1.6 cores of reserved capacity. Multiplied across hundreds of workloads, these corrections produce a substantial reduction in the resource footprint the scheduler must provision for — which directly reduces node count.
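
The arithmetic behind that claim is easy to reproduce. A toy illustration (workload data invented for the example): wasted capacity is the gap between the request and the P99 of actual usage, summed across workloads:

```python
# Invented sample workloads: request vs measured P99 CPU usage, in cores
workloads = [
    {"name": "api",    "cpu_request": 2.0, "cpu_p99": 0.4},
    {"name": "worker", "cpu_request": 1.0, "cpu_p99": 0.7},
    {"name": "cron",   "cpu_request": 0.5, "cpu_p99": 0.05},
]

def wasted_cores(ws):
    """Reserved-but-unused CPU: request minus P99 usage, never negative."""
    return sum(max(0.0, w["cpu_request"] - w["cpu_p99"]) for w in ws)

print(round(wasted_cores(workloads), 2))  # 2.35 cores reserved but idle at P99
```

Three workloads already strand 2.35 cores of schedulable capacity; across hundreds of workloads the reclaimed headroom directly reduces the node count the scheduler must provision.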

Resource requests that reflect reality

Monitors actual CPU and memory consumption and computes P50/P95/P99 distributions. Recommendations account for both normal operation and peak periods without over-provisioning for absolute outliers.

HPA boundaries calibrated to traffic

Analyses historical replica counts alongside request rates to find the minimum replica count that keeps latency within bounds, and a maximum that represents a realistic ceiling — not a feared upper limit.

Predictions before applying changes

For every workload, a forecast of future resource needs is surfaced in the UI before any change is applied. You review the model's projection and validate it against upcoming traffic before approving.

Dry-run preview

Before any recommendation is applied, run a full dry-run that shows exactly what would change: every workload's current config, the recommended replacement, and the projected impact.

Canary rollout with auto-rollback

Changes are applied progressively. If the workload shows resource pressure — OOMKill events, throttling, error rate increase — the change is automatically rolled back. No manual monitoring required per workload.

Namespace and workload exclusions

Workloads with intentionally conservative resource requests — for compliance, SLA, or contractual reasons — can be protected while the rest of the cluster benefits from continuous optimisation.

How It Stays Safe

Safety guarantees
  • Recommendations always include headroom. Requests are set to P95/P99 values plus a safety buffer. Workloads always have room to absorb normal variance without hitting their limits.
  • Limits are never set below current spikes. If a workload has spiked to 800 Mi of memory at any point in the observation window, the recommended memory limit will be at or above 800 Mi.
  • HPA changes respect PDB constraints. A configuration update that would create a PDB conflict is flagged and not applied until the conflict is resolved.
  • Single-replica changes require explicit approval. Reducing minReplicas to 1 is flagged as higher-risk and requires explicit approval regardless of what the data suggests.
  • Rollback is always available. For every applied change, the previous configuration is stored and can be restored with a single action from the UI or API.
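
The first two guarantees can be sketched numerically (the 15% buffer here is an assumption for illustration; the engine's actual buffer is not documented in this guide):

```python
HEADROOM = 1.15  # assumed 15% safety buffer on top of P99

def recommend_memory_mib(p99_mib, observed_peak_mib):
    """Request = P99 plus buffer; limit is never set below the observed spike."""
    request = p99_mib * HEADROOM
    limit = max(request, observed_peak_mib)  # spike floor from the observation window
    return round(request), round(limit)

# A workload with P99 of 600 Mi that once spiked to 800 Mi:
print(recommend_memory_mib(600, 800))  # request ~690 Mi, limit held at 800 Mi
```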

Settings Reference

Parameter | Type | Default | Description
auto_updater_status | string | "disabled" | Controls whether resource request and limit recommendations are automatically applied. Observe recommendations for 1–2 weeks before enabling.
hpa_updater_status | string | "disabled" | Controls whether HPA configuration recommendations (min/max replicas, target utilisation) are automatically applied. Independent of auto_updater_status. Enable after resource right-sizing is stable.
excluded_namespaces | list[string] | [] | Namespaces excluded from HPA right-sizing analysis. Workloads in these namespaces receive no recommendations and are never modified.
excluded_workloads | list[object] | [] | Individual workloads excluded by {namespace, workload_name}. Use for workloads with deliberately unusual resource profiles — GPU services, large JVM heaps, or event-driven spike workloads.

Typical impact by cluster profile

Cluster Profile | CPU Reduction | Memory Reduction | HPA Impact
Legacy cluster, requests set at launch and never updated | 50–75% over-request | 40–60% over-request | Min replicas 30–50% too high
Active cluster, occasional updates | 25–45% over-request | 20–35% over-request | Max replicas often too low
Recently right-sized manually | 10–20% ongoing drift | 10–20% ongoing drift | Continuous correction

Setup Sequence

Analysis only — Week 1–2

Keep both auto_updater_status and hpa_updater_status disabled. Let the engine build its observation window. Review recommendations in the UI and validate against your knowledge of each workload.

Enable resource right-sizing — Week 3

Enable auto_updater_status. The engine applies CPU and memory corrections during low-traffic periods. Monitor for one sprint. Check for OOMKill events or throttling that might indicate an over-aggressive recommendation.

Enable HPA right-sizing — Week 5+

Once resource request changes have been running stably for two weeks, enable hpa_updater_status. HPA changes will be applied with the canary rollout mechanism active, with automatic rollback if issues are detected.

Feature 05

FinOps Savings Dashboard

Unified financial visibility across every optimisation running in your cluster — real numbers, real time, with full audit history and export for reporting.

[Illustration: one dashboard showing exactly what you've saved and where. Total saved this month: $12,400, of which $6,200 from Workload Scaler scale-to-zero, $3,800 from Node Autoscaler right-sized capacity, and $2,400 from Node Optimizer spot and instance-type changes; daily savings trend up 38%.]

What It Does

The FinOps Savings dashboard aggregates the cost impact of every active optimisation feature into a single view that answers the question your leadership team will ask: how much money is this actually saving us?

Most cost-monitoring tools show you what you spent. The Savings dashboard shows you what you spent compared to what you would have spent without optimisation — and it shows the difference, period by period, feature by feature, in concrete currency values anchored to your own pre-optimisation baseline, not a vendor-supplied benchmark.

Metric | Description
Total Savings | Cumulative cost reduction over the selected time window
Current Cost | Actual cluster compute spend over the selected window
Baseline Cost | Estimated spend if no optimisations had been applied
Cost / Minute | Current rate of saving being realised in real time
Optimisation Duration | Total time optimisations have been active
Node Scaler Savings | Subset of savings attributed specifically to node-level actions
Realised Savings Chart

Two lines: actual spend and what spend would have been without optimisation. The widening gap is your saving. As the divergence grows over time, you see the compounding effect of continuous optimisation.

Time-window flexibility

1 day, 7 days, 15 days, 30 days, since start of month, or since start of year. Matches the reporting cadence of engineering standups, monthly cost reviews, and quarterly business reviews.

Daily Cost Table

Day-by-day breakdown: baseline cost, actual cost, saving. Identify specific days where cost spiked unexpectedly, validate scheduled scale-downs, produce raw data for chargeback conversations.

Complete Audit Log

Every optimisation action logged with: which feature triggered it, which resource was affected, what action was taken, when, cost impact, and duration. Complete record for compliance, change management, or internal audit.

CSV Export

Full savings dataset and audit log exported to CSV at any time. Imports directly into Excel, Google Sheets, or your BI tool. Use for board reporting, finance reconciliation, or external cost platform integration.

Per-feature attribution

What the Workload Scaler contributed, what Node Scheduler windows saved, what HPA right-sizing freed up. Each component's saving is attributed separately so you know where the value is coming from.

How Savings Are Calculated

Baseline

Derived from your cluster's measured compute spend before the optimisation platform was deployed. Collected during an initial observation period before enabling optimisation features. Anchored to actual historical data from your cluster — not a theoretical scenario or industry benchmark.

Cost per resource unit

Calculated using actual on-demand pricing for instance types in your cluster, retrieved from your cloud provider's pricing API. Spot instances are valued at their actual charged price. Reported figures match your cloud bill.

Saving attribution by feature

Feature | Saving Attributed When
Workload Scaler | Workload is at zero replicas; associated node capacity freed
Node Scheduler | Cluster running at reduced node count during a schedule window
Node Autoscaler | Node removed due to measured low utilisation
HPA Right-Sizing | Reduced resource requests enable higher pod density per node
Node Optimizer | Node group reconfigured to lower-cost instance types or mix
Attribution principle

A saving is only recorded when the relevant optimisation action is confirmed to have completed successfully. Projected savings from pending actions are shown separately and are not included in the realised total.
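
A toy model of the attribution principle (event data and structure are invented for illustration): only confirmed actions contribute to the realised total, while pending actions stay out of it:

```python
# Invented sample events; figures chosen to match the dashboard example above
events = [
    {"feature": "Workload Scaler", "saving": 6200.0, "confirmed": True},
    {"feature": "Node Autoscaler", "saving": 3800.0, "confirmed": True},
    {"feature": "Node Optimizer",  "saving": 2400.0, "confirmed": True},
    {"feature": "Node Scheduler",  "saving": 900.0,  "confirmed": False},  # pending: projected only
]

def realised_by_feature(evts):
    """Per-feature realised savings; confirmed completions only."""
    out = {}
    for e in evts:
        if e["confirmed"]:
            out[e["feature"]] = out.get(e["feature"], 0.0) + e["saving"]
    return out

realised = realised_by_feature(events)
print(sum(realised.values()))  # 12400.0 realised; the pending 900.0 is excluded
```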

Recommended Use by Audience

For engineering teams — weekly

Review using the 7-day window during sprint retrospectives. Look for days where savings were lower than expected — this may indicate a new workload added to an excluded namespace, or an idle timeout set too conservatively. The audit log surfaces every action.

For engineering managers — monthly

Use the 30-day view. Export the daily cost table to CSV and include the realised savings total in your monthly infrastructure report. Direct, auditable line between platform tooling investment and monthly cloud spend reduction.

For finance and procurement — quarterly

Use the "1st of Year" window for annualised data. The daily cost table and audit log CSV export together provide the detail required for cloud spend reconciliation and vendor negotiation conversations.

Example annual review output

Sample reporting output

Annual cluster compute spend: $201,600
Without optimisation (baseline extrapolated): $341,000
Total annual saving: $139,400 (40.9%)
Saving vs. platform subscription cost: 9.7× ROI

Time window reference

Window Option | Description | Typical Use
1 Day | Rolling 24-hour window | Incident investigation, same-day validation
7 Days | Rolling 7-day window | Weekly engineering standups (default)
15 Days | Rolling 15-day window | Sprint-level reporting
30 Days | Rolling 30-day window | Monthly finance reporting
1st of Month | Calendar month to date | Month-over-month cost tracking
1st of Year | Year to date | Annual reporting, ROI calculation
Feature 06

Auto-PDB Operator

Automatic Pod Disruption Budget management for every workload in your cluster — with intelligent calculation, single-replica drain protection, and zero-downtime node maintenance built in.

[Illustration: a safety shield that keeps your app alive during updates. Without the PDB Operator, Kubernetes can drain all pods at once, leaving 100% unavailable and causing an outage. With it, pods drain one at a time so at least 2 stay running and the app remains online.]

What It Does

The Auto-PDB Operator continuously reconciles PodDisruptionBudgets across all workloads in a cluster. It scans Deployments, StatefulSets, and DaemonSets, calculates optimal PDB values based on replica count, workload type, HPA configuration, and criticality, then creates or updates PDB resources accordingly. The operator runs on a configurable schedule (default: every 1 minute) and persists reconciliation history for full audit visibility.

Without PDBs, cluster operations — node drains during scale-down, rolling updates, cluster upgrades — can evict all replicas of a workload simultaneously. This turns routine infrastructure maintenance into a production incident. The PDB Operator is the safety layer that makes every other optimisation feature in this platform safe to run aggressively.

Why this matters for optimization

Every node drain triggered by the Node Scheduler, Node Autoscaler, or Node Optimizer runs through PDB checks. Without PDBs in place, a scale-down that removes three nodes simultaneously could take down all replicas of a critical service. The PDB Operator ensures optimisation actions never cross the line into an outage.

Fully automatic PDB lifecycle

PDBs are created, updated, and deleted automatically as workloads are deployed, scaled, and removed. No manual PDB management. No PDBs left behind when workloads are deleted.

HPA-aware calculation

When a workload has an HPA, the operator uses effective replica count — the worst-case minimum the HPA could scale to — rather than the current count. A deployment at 5 replicas with an HPA minimum of 1 is treated as a single-replica workload for PDB purposes.

Database quorum detection

Automatically detects database StatefulSets (PostgreSQL, MySQL, MongoDB, Redis, Kafka, etcd, and more) by inspecting labels and workload names. Database workloads receive quorum-based PDBs: minAvailable = (n/2) + 1.

PreScale Manager for single-replica workloads

Single-replica workloads with preScale policy are automatically scaled 1→2 before a drain, held until the new pod is Ready on a different node, then scaled back 2→1 after the drain completes. Zero downtime, fully automatic.

Critical namespace and workload escalation

Workloads in critical namespaces or flagged as critical receive stricter PDB values automatically — higher minAvailable floors and lower maxUnavailable ceilings — without any manual annotation.

Orphan cleanup

PDBs whose corresponding workload no longer exists are automatically removed each reconciliation cycle, keeping the cluster clean and preventing stale PDBs from blocking future drain operations.

┌─────────────────────────────────────────────────────────────┐
│                        PDB Operator                         │
│                                                             │
│  ┌─────────────┐   ┌───────────────┐   ┌─────────────────┐  │
│  │  Scheduler  │──▶│  Calculator   │──▶│ K8s API Server  │  │
│  └─────────────┘   │  • Gather     │   │ (create/update  │  │
│                    │  • Calculate  │   │  PDB resources) │  │
│  ┌─────────────┐   │  • Build PDB  │   └─────────────────┘  │
│  │  PreScale   │   └───────────────┘   ┌─────────────────┐  │
│  │  Manager    │                       │ MongoDB         │  │
│  │ (node watch)│──────────────────────▶│ • Run history   │  │
│  └─────────────┘                       │ • PreScale logs │  │
│                                        └─────────────────┘  │
└─────────────────────────────────────────────────────────────┘

PDB Calculation Logic

The operator calculates an effective replica count representing the worst-case minimum a workload can have — accounting for HPA lower bounds. PDB values are then derived from this effective count.

Deployments

Effective Replicas | Strategy | Value | Reasoning
1 | Policy-based | See Single Replica Policies | Defers to SingleReplicaPolicy setting
2–3 | minAvailable | n−1 (absolute) | Small deployment: keep at least 1 pod available at all times
4+ | minAvailable | 50% | Percentage-based for larger deployments; scales with replica count

StatefulSets

Effective Replicas | Database? | Strategy | Value
1 | Any | Policy-based | Defers to SingleReplicaPolicy
2+ | Yes (auto-detected) | minAvailable | (n/2)+1 (quorum requirement for data consistency)
2+ | No | minAvailable | 67% (two-thirds majority for availability)

DaemonSets

Strategy | Value | Reasoning
maxUnavailable | 10% (minimum 1) | Allows gradual rolling drain across nodes without taking down the entire DaemonSet
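The selection rules for all three workload kinds reduce to a small decision function. The sketch below is illustrative Python, assuming an effective replica count has already been computed; the function names and return shapes are not the operator's real API:

```python
def pdb_for_deployment(n: int) -> dict:
    """PDB values for a Deployment with effective replica count n."""
    if n == 1:
        return {"policy": "single-replica"}   # defers to SingleReplicaPolicy
    if n <= 3:
        return {"minAvailable": n - 1}        # absolute: keep at least 1 pod
    return {"minAvailable": "50%"}            # percentage for larger deployments

def pdb_for_statefulset(n: int, is_database: bool) -> dict:
    """PDB values for a StatefulSet; databases get quorum protection."""
    if n == 1:
        return {"policy": "single-replica"}
    if is_database:
        return {"minAvailable": n // 2 + 1}   # quorum: (n/2) + 1
    return {"minAvailable": "67%"}            # two-thirds majority

def pdb_for_daemonset() -> dict:
    """DaemonSets roll gradually; the engine enforces a floor of 1 pod."""
    return {"maxUnavailable": "10%"}

# A 5-replica PostgreSQL StatefulSet gets minAvailable = 3 (quorum).
assert pdb_for_statefulset(5, is_database=True) == {"minAvailable": 3}
```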

Critical workload escalation

Classification | minAvailable floor | maxUnavailable cap
Critical namespace | At least 67% | Maximum 25%
Critical workload classification | At least 75% | Maximum 20%

PreScale Manager — Zero-Downtime Single-Replica Drains

The preScale policy is the recommended default for production clusters. It provides the safety of a blocked drain with fully automatic handling — no manual intervention required during node maintenance.

Single Replica Policy comparison

Policy | PDB Created | Behaviour on Drain | Downtime | Automation
exempt | No | Pod evicted immediately | Yes | None
allow | Yes (maxUnavailable: 1) | Pod evicted immediately | Yes | None
block | Yes (maxUnavailable: 0) | Drain blocked until manual scale-up → drain → scale-down | No | Manual
preScale | Yes (maxUnavailable: 0) | Auto scale 1→2, wait Ready, drain, scale 2→1 | No | Fully automatic

Node Cordoned (kubectl cordon / drain)
     │
     ▼
┌──────────────────────────────┐
│ Node Watcher detects         │
│ Unschedulable = true         │
└────────┬─────────────────────┘
         ▼
┌──────────────────────────────┐
│ Find single-replica workloads│
│ on this node using preScale  │
└────────┬─────────────────────┘
         ▼
┌──────────────────────────────┐
│ Scale 1 → 2                  │  State: scaling_up
│ (patch replicas)             │
└────────┬─────────────────────┘
         ▼
┌──────────────────────────────┐
│ Wait for new pod Ready       │  State: wait_ready
│ on a DIFFERENT node          │  (polls every 5s, timeout 5m)
└────────┬─────────────────────┘
         ▼
┌──────────────────────────────┐
│ Drain evicts old pod         │  State: drainable
│ (PDB allows with 2 replicas) │
└────────┬─────────────────────┘
         ▼
┌──────────────────────────────┐
│ Scale 2 → 1                  │  State: completed
│ (restore original count)     │
└──────────────────────────────┘
PreScale failure safeguards
  • Rollback on ready timeout. If the new pod does not become Ready within 5 minutes, the operator rolls back to the original replica count and marks the operation as failed.
  • Drainable TTL. Records in drainable state expire after max(2 × readyTimeout, 10 minutes). If the drain was cancelled or the node was uncordoned, the operator forces a scale-down regardless.
  • Per-workload policy override. Individual workloads can override the global policy via annotation: pdb.terakube.io/single-replica-policy: "allow"
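The drainable TTL rule above is a simple expression. A minimal sketch, assuming the documented default ReadyTimeout of 5 minutes; the function name is illustrative:

```python
from datetime import timedelta

def drainable_ttl(ready_timeout: timedelta) -> timedelta:
    """TTL for records left in the 'drainable' state.

    If the drain was cancelled or the node was uncordoned, the
    operator forces the 2->1 scale-down once this TTL expires:
    max(2 x readyTimeout, 10 minutes).
    """
    return max(2 * ready_timeout, timedelta(minutes=10))

# The default ReadyTimeout of 5m hits the 10-minute floor; raising
# ReadyTimeout to 15m extends the TTL to 30 minutes.
assert drainable_ttl(timedelta(minutes=5)) == timedelta(minutes=10)
assert drainable_ttl(timedelta(minutes=15)) == timedelta(minutes=30)
```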

Settings Reference

Parameter | Type | Default | Description
SingleReplicaPolicy | string | "preScale" | Global policy for single-replica workloads. Options: exempt, allow, block, preScale. Use preScale for production: zero downtime, fully automatic.
DryRun | boolean | false | When true, performs all calculations and logs what it would do, but does not create, update, or delete any PDB resources. Reconciliation runs are still saved with a [DRY RUN] prefix. Recommended for initial deployment.
ReconcileInterval | duration | 1m | How frequently the operator scans all workloads and reconciles PDB state. Shorter intervals keep PDBs more current for rapidly changing clusters.
ReadyTimeout | duration | 5m | Maximum time the PreScale Manager waits for the new pod to become Ready before rolling back the scale-up.
CleanupOrphanedPDBs | boolean | true | When enabled, PDB resources whose corresponding workload no longer exists are automatically deleted each reconciliation cycle.
CriticalNamespaces | list[string] | [] | Namespaces whose workloads receive stricter PDB values: minAvailable raised to at least 67%, maxUnavailable capped at 25%.
ExcludedNamespaces | list[string] | [] | Namespaces where no PDB will be created or managed. Use for test or dev namespaces where availability is not a concern.
ExcludedWorkloads | list[string] | [] | Individual workload names excluded from PDB management. Reserve for batch jobs, dev tooling, or other workloads that genuinely should not have disruption protection.

Best Practices

Recommended operating procedures
  • Start with dry run. Enable DryRun: true when first deploying the operator. Review the planned PDB calculations before applying real changes to the cluster.
  • Use preScale for production. The preScale policy eliminates downtime for single-replica workloads during drains without requiring manual intervention. It is the recommended default for any cluster where availability matters.
  • Populate CriticalNamespaces. Ensure all production namespaces are listed so they receive stricter PDB values automatically without per-workload annotation.
  • Test node drains early. After enabling the operator, validate behaviour with kubectl drain <node> --ignore-daemonsets on a non-production node and confirm preScale operations complete successfully.
  • Monitor reconciliation runs. Review runs with status: "partial" or status: "failed" to catch configuration issues early. Full history is available in the platform audit log.
  • Use per-workload annotation overrides sparingly. Reserve annotation overrides (pdb.terakube.io/single-replica-policy) for workloads that genuinely need different behaviour, such as non-critical dev services where brief downtime during drains is acceptable.
Feature 07

Node Optimizer

AI-powered node pool composition analysis that identifies where your cluster is over-spending on the wrong instance types and recommends — or automatically applies — a lower-cost configuration without touching a single workload.

[Illustration] Pick the right machine. Pay spot prices. Save 70%+. Before: an m5.4xlarge On-Demand node (16 vCPU · 64 GB, $0.768/hr) running two pods, with ~85% of the node wasted. After: a t3.medium Spot node (2 vCPU · 4 GB, $0.009/hr) running the same pods, a 98.8% reduction ($0.768 → $0.009 per hour).

What It Does

Most organisations running Kubernetes on AWS EKS or Azure AKS are over-provisioned by 60–90%. This happens naturally: teams provision for peak load, deployments accumulate reserved capacity that is rarely used, and no single person has full visibility across the cluster. Node Optimizer fixes this systematically.

The engine analyses real workload behaviour — not just what resources pods claim to need, but what they actually use — and identifies the optimal combination of instance types and on-demand/spot mix that delivers the same workload capacity at materially lower cost. A cluster running at 0.76% CPU utilisation across four nodes is a common real-world pattern. Node Optimizer identifies this and proposes a configuration that costs 90% less while maintaining the same workload capacity.

Typical savings range

Over-provisioned clusters with no spot usage typically see 70–92% cost reduction. Clusters with reasonable utilisation and no spot see 40–65%. Already-optimised clusters with mixed on-demand/spot see 20–40% ongoing improvement as the engine tracks drift over time.

Smarter instance selection

Evaluates the full AWS and Azure instance catalogs to find the combination of sizes and families that best matches your actual workload profile. Larger, more expensive types are replaced where smaller ones are sufficient.

On-demand / spot balancing

Calculates the right mix — enough on-demand nodes to keep critical workloads running, enough spot to maximise savings — and maintains that balance automatically, including recovery after spot interruptions.

Automatic spot interruption recovery

When a spot node is reclaimed by the cloud provider, the engine responds without human intervention: it temporarily scales up on-demand capacity to absorb displaced workloads, finds new spot capacity, then restores the original mixed setup and releases the temporary nodes.

Intelligent pending pod resolution

Distinguishes between pods waiting due to genuine capacity shortage and those experiencing normal scheduling lag. Only true shortages trigger a scale-up, preventing unnecessary node additions from transient states.

Preview before applying

Every recommendation surfaces as a side-by-side view: current cluster composition and cost vs. recommended composition and projected cost. Nothing is applied until you review and approve — or until auto-apply triggers after passing all safety gates.

Confidence and savings thresholds

Recommendations below 70% model confidence are never auto-applied. Recommendations projecting less than your configured savings threshold are surfaced for review but not executed automatically.

How It Stays Safe

Safety guarantees
  • It never increases your bill. A recommendation is only applied if it produces a strictly positive saving. If projected costs would equal or exceed current costs, no action is taken.
  • It never regresses a cluster you already optimised. If your cluster is already running at optimal cost, a new cycle will not apply a marginal recommendation that risks destabilising the current configuration.
  • It never touches system infrastructure. Namespaces hosting cluster-critical components — networking, certificate management, service mesh, Kubernetes internals — are permanently excluded and cannot be overridden.
  • It requires a confidence threshold. Recommendations below 70% model confidence are not auto-applied, regardless of projected savings.
  • It requires a minimum savings threshold. By default, auto-apply only activates when projected savings exceed 20% of current spend. This filters out marginal changes that may not justify an infrastructure update.
  • Spot interruptions are handled without permanent changes. Recovery from spot reclamation uses temporary on-demand nodes. No permanent configuration changes are made under pressure.
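The safety gates above combine into a single auto-apply decision. The sketch below is an illustrative Python summary of the documented thresholds (strictly positive savings, 70% confidence floor, 20% default savings threshold), not the engine's real interface:

```python
def should_auto_apply(confidence: float, projected_cost: float,
                      current_cost: float,
                      min_savings_pct: float = 20.0) -> bool:
    """Gate a node-composition recommendation before auto-apply."""
    if projected_cost >= current_cost:   # never increase the bill
        return False
    if confidence < 0.70:                # model-confidence floor
        return False
    savings_pct = (current_cost - projected_cost) / current_cost * 100
    return savings_pct > min_savings_pct # minimum savings threshold

assert should_auto_apply(0.9, 60.0, 100.0) is True    # 40% projected saving
assert should_auto_apply(0.9, 90.0, 100.0) is False   # only 10% saving
assert should_auto_apply(0.6, 40.0, 100.0) is False   # low confidence
```

Recommendations that fail the gate are still stored for manual review, as described in the settings below.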

Settings Reference

Parameter | Type | Default | Description
auto_optimization | string | "disabled" | Controls whether the analysis engine runs. When "enabled", the engine collects cluster metrics, runs the optimisation model, and generates recommendations. Enable this first and review recommendations before enabling auto_apply.
auto_apply | boolean | false | Controls whether approved recommendations are automatically applied. When false, recommendations are stored for manual review. When true, recommendations that pass all safety gates are applied without manual approval. Also starts the pending pod watch.
prometheus_url | string (URL) | auto-discovered | URL of your Prometheus HTTP API for GPU utilisation metrics. If empty, the engine scans Kubernetes services and auto-discovers the Prometheus endpoint.

Typical savings by cluster profile

Cluster Profile | Typical Saving
Over-provisioned, 0% spot, multiple nodes | 70–92%
Reasonable utilisation, 0% spot | 40–65%
Mixed on-demand / spot, well-sized | 20–40%
Already optimised with spot | 10–25% ongoing

Supported platforms

Platform | Node Group Management | Spot / Preemptible
AWS EKS | EKS Managed Node Groups | EC2 Spot via Mixed Instances Policy
Azure AKS | AKS Agent Pools | Azure Spot Node Pools

Recommended Setup Sequence

Week 1 — Observe

Enable auto_optimization only. Leave auto_apply disabled. Review recommendations generated over the first week: projected savings, confidence scores, and the specific node groups targeted. This builds familiarity with the model's behaviour on your specific cluster before any changes are applied.

Week 2 — Validate

Review whether recommendations are consistent and sensible.

Week 3 onwards — Automate

Enable auto_apply. The engine will now apply recommendations that pass all safety gates without requiring manual approval. Monitor the cost dashboard weekly. All applied recommendations are logged with full detail including savings projections, confidence scores, and the specific actions taken.

Combination tip

Node Optimizer works best when combined with the Workload HPA Right-Sizing and the Node Autoscaler. The Optimizer selects the right instance types and spot mix; the Right-Sizing engine corrects over-provisioned resource requests; the Autoscaler handles real-time variance. Together they deliver the full potential saving across all three dimensions: instance cost, capacity timing, and utilisation efficiency.

Feature 08

Smart Provisioner

ML-driven workload placement intelligence that predicts exactly how much capacity each workload needs before it scales — eliminating the reactive guesswork that causes both over-provisioning and scheduling failures.

[Illustration] Predicts capacity before you need it; pods never wait. The old, reactive way: a pod needs to run, sits Pending for minutes while a node slowly starts, producing 3–5 minute scheduling delays and latency during traffic spikes. The Smart Provisioner way: demand is predicted from 30 days of history and a warm node is ready before the pod arrives, so scheduling delay is zero and pods never go Pending.

What It Does

Traditional Kubernetes scheduling is reactive by design: a pod asks for resources, the scheduler finds a node with enough free capacity, and the pod lands. Nobody checks whether the declared resource request reflects what the workload will actually consume. The result is nodes that look 90% allocated on paper but are 15% utilised in reality — and nodes that look 30% free but can't accept a new pod because the remaining memory is fragmented across dozens of tiny gaps.

The Smart Provisioner changes this. It models each workload's real resource consumption — not its declared request — over a rolling 30-day window and uses that model to predict what the workload will need at the moment of scheduling. When the cluster needs new capacity, the provisioner selects the instance type, size, and placement zone that will actually fit the incoming load, not just the load that the YAML says is coming.

Why this matters

Most clusters fail to schedule new pods not because they lack total capacity, but because the right capacity isn't available in the right node at the right time. The Smart Provisioner eliminates scheduling failures caused by resource fragmentation — the hidden inefficiency that forces teams to over-provision "just in case".

Predictive resource modelling

Builds a P50/P95/P99 consumption model per workload from 30 days of real telemetry. Provisioning decisions are based on actual behaviour, not YAML declarations written months ago.
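A consumption model of this kind can be sketched in a few lines. This is an illustrative nearest-rank percentile over a window of usage samples; the 15% safety buffer in predicted_demand is an assumed value, not a documented default:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile of a list of usage samples."""
    ordered = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(ordered)) - 1)
    return ordered[k]

def consumption_model(cpu_samples):
    """P50/P95/P99 profile of actual CPU usage (cores) over the window."""
    return {p: percentile(cpu_samples, p) for p in (50, 95, 99)}

def predicted_demand(cpu_samples, buffer=0.15):
    """Scheduling prediction: P95 of real usage plus a safety buffer.
    The 0.15 buffer is illustrative."""
    return percentile(cpu_samples, 95) * (1 + buffer)
```

A workload that declares 2 cores but whose 30-day P95 is 0.4 cores would be scheduled against roughly 0.46 cores, not its YAML request.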

Bin-packing optimisation

Selects instance types that minimise wasted capacity across the node pool — fitting more workloads onto fewer nodes without increasing scheduling failure rate or application latency.

Zone and topology awareness

Places workloads across availability zones and failure domains to satisfy topology spread constraints without requiring manual affinity rules on every deployment.

Proactive capacity buffers

Maintains a configurable headroom of warm capacity that absorbs burst traffic without waiting for a node to spin up — keeping P99 latency stable during sudden load increases.

Zero-disruption rebalancing

When the cluster becomes fragmented over time — a natural result of rolling deployments and partial scale-downs — the provisioner gradually rebalances workloads without evictions or downtime.

Scheduling failure elimination

Tracks pending-pod root causes and resolves them before they become incidents. Fragmentation, incorrect resource requests, and topology mismatches are all surfaced and corrected automatically.

How It Works

  Workload deploys / HPA fires / traffic spike detected
                    │
                    ▼
  ┌──────────────────────────────────────────────────────┐
  │              Smart Provisioner Engine                 │
  │                                                       │
  │  1. Fetch workload consumption model (30d history)    │
  │  2. Predict peak demand at P95 + safety buffer        │
  │  3. Evaluate current node pool for fit:               │
  │     • Bin-pack against real (not declared) usage      │
  │     • Check topology spread constraints               │
  │     • Check zone balance and AZ capacity              │
  │  4. If fit found: schedule immediately                │
  │  5. If not: pre-provision optimal instance type       │
  │     before the pod becomes Pending                    │
  └──────────────────────────────────────────────────────┘
                    │
                    ▼
         Pod scheduled. No Pending state.
         No CloudWatch alarm. No 3am page.
      

Rebalancing cycle

The provisioner evaluates the cluster's packing efficiency. If fragmentation has caused the cluster to use more nodes than the optimal bin-packing solution requires, it migrates workloads and removes the surplus nodes. All migrations respect PodDisruptionBudgets and are done one node at a time with a configurable inter-drain delay.
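Packing efficiency can be estimated with a classic first-fit-decreasing heuristic over real usage. The sketch below is a simplified single-dimension (CPU-only) illustration of the idea, not the provisioner's actual algorithm:

```python
def optimal_node_count(pod_cpu, node_cpu):
    """First-fit-decreasing estimate of the nodes needed to pack pods
    by their real CPU usage - the lower bound the rebalancer aims for."""
    free = []                          # free CPU remaining per node
    for cpu in sorted(pod_cpu, reverse=True):
        for i, remaining in enumerate(free):
            if cpu <= remaining:
                free[i] -= cpu         # pod fits on an existing node
                break
        else:
            free.append(node_cpu - cpu)  # open a new node
    return len(free)

# Eight pods using 1 core each fit on two 4-core nodes; if rolling
# deployments have spread them across four nodes, two are surplus.
assert optimal_node_count([1.0] * 8, 4.0) == 2
```

If the cluster currently runs more nodes than this estimate, the surplus is drained one node at a time, respecting PDBs and the configured inter-drain delay.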

Safety & Settings

Safety guarantees
  • Never evicts a pod that would violate a PDB. All rebalancing operations check PodDisruptionBudgets before moving any workload.
  • Rebalancing pauses during high-load windows. If cluster CPU utilisation exceeds a configurable threshold, rebalancing is deferred until load drops.
  • Predictive model degrades gracefully. For workloads with fewer than 7 days of history, the provisioner falls back to declared resource requests plus a conservative buffer rather than making a prediction from insufficient data.
  • Topology constraints are never violated. Zone spread requirements, node affinity, and taints/tolerations are treated as hard constraints — the provisioner never overrides them to achieve a better packing ratio.
Feature 09

Hotspot & Pressure Detection

Real-time identification of nodes, namespaces, and workloads under abnormal resource pressure — surfacing problems before they become outages and pinpointing the exact source of cluster instability with no manual investigation required.

hotspot detection

What It Detects

A hotspot is any cluster resource — a node, a namespace, or a specific workload — where resource consumption is abnormally high relative to its baseline or its neighbours. Left undetected, hotspots cause cascading failures: a single noisy-neighbour workload degrades every other pod on its node, a namespace with a runaway deployment starves adjacent namespaces of CPU, or a node with a memory leak slowly squeezes everything else off until the kubelet starts OOMKilling pods at random.

Pressure detection goes further. It identifies not just that a workload is using a lot of CPU, but that the workload is throttled — consuming all its allowed CPU but being held back from more by its limits, which is a silent performance degradation that standard utilisation metrics never surface.

Node hotspot identification

Continuously compares each node's CPU, memory, and network I/O against its historical baseline and against peer nodes in the same node group. Nodes with abnormal consumption patterns are flagged immediately.

CPU throttling detection

Tracks container_cpu_cfs_throttled_seconds_total per container. Workloads that are consistently throttled receive a recommendation to increase CPU limits — resolving silent performance degradation that never appears in standard dashboards.

Memory pressure early warning

Detects nodes approaching the OOM threshold before the kubelet starts evicting pods. Alerts fire with enough lead time to either migrate workloads or add capacity before any pod is killed.

Noisy-neighbour isolation

Identifies workloads whose resource consumption negatively affects co-located pods. Provides actionable recommendations: move the workload to a dedicated node pool, adjust limits, or apply CPU pinning.

Namespace pressure ranking

Ranks namespaces by their resource pressure score — a composite of CPU/memory utilisation relative to limits, throttle rate, and eviction history. High-pressure namespaces are surfaced for review before they cause incidents.

Anomaly vs. trend separation

Distinguishes between a sudden anomalous spike (likely a runaway process or a traffic incident) and a gradual upward trend (likely organic growth that needs capacity planning). Each gets a different response playbook.

Detection Engine

The detection engine runs on a 15-second scan interval and evaluates three signal types per resource:

Signal | Source | Hotspot Condition
CPU utilisation ratio | Metrics Server / Prometheus | > 2× workload's 7-day P95 baseline
CPU throttle rate | container_cpu_cfs_throttled_seconds_total | > 25% of CPU time throttled over a 5-min window
Memory utilisation ratio | Metrics Server / Prometheus | > 85% of memory limit, rising
OOM event rate | Kubernetes Events API | Any OOMKill in previous 10 minutes
Pod eviction rate | Kubernetes Events API | > 2 evictions per namespace per hour
Pending pod duration | Kubernetes pod status | Pod Pending > 90 seconds with no scheduling progress
Node condition pressure | Node status conditions | MemoryPressure or DiskPressure = True
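The per-workload conditions in the table can be sketched as a small evaluator. The thresholds mirror the documented conditions; the function shape and signal names are illustrative, not the engine's real interface:

```python
def hotspot_signals(cpu_usage, cpu_p95_baseline,
                    throttled_s, cpu_total_s,
                    mem_used, mem_limit, mem_rising):
    """Evaluate one workload's per-scan hotspot signals."""
    signals = []
    if cpu_p95_baseline > 0 and cpu_usage > 2 * cpu_p95_baseline:
        signals.append("cpu-anomaly")        # > 2x 7-day P95 baseline
    if cpu_total_s > 0 and throttled_s / cpu_total_s > 0.25:
        signals.append("cpu-throttling")     # > 25% of CPU time throttled
    if mem_limit > 0 and mem_used / mem_limit > 0.85 and mem_rising:
        signals.append("memory-pressure")    # > 85% of limit and rising
    return signals

# A workload at 2.5x its baseline, 30% throttled, 90% of memory limit
# and climbing trips all three signals.
assert hotspot_signals(1.0, 0.4, 30, 100, 0.9, 1.0, True) == \
    ["cpu-anomaly", "cpu-throttling", "memory-pressure"]
```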

Alerts & Automated Actions

Before you get paged

Most Kubernetes incidents are preceded by 15–45 minutes of detectable pressure signals. Hotspot Detection catches these signals and either resolves them automatically or delivers a precise, actionable alert — before a pod is OOMKilled, before a node becomes NotReady, before your on-call rotation gets woken up.

Hotspot Type | Automated Response | Alert Sent
CPU throttling > 25% | Flag for HPA Right-Sizing review; suggest limit increase | Yes, with throttle rate and affected workload
Node memory > 85% and rising | Cordon node, trigger workload migration via PDB-safe drain | Yes, with time-to-OOM estimate
OOMKill detected | Immediately flag workload for memory limit right-sizing | Yes, with container name, namespace, and restart count
Noisy-neighbour identified | Recommend node pool isolation; optionally auto-migrate | Yes, with affected co-residents listed
Namespace pressure > threshold | Throttle non-critical workloads in namespace (if auto-response enabled) | Yes, with pressure score and top contributors
Pod Pending > 90s | Trigger Smart Provisioner capacity check | Yes, with scheduling failure reason
Automated response settings
  • All automated actions respect PDB constraints. No workload is moved if doing so would violate its PodDisruptionBudget.
  • Alert-only mode available. Set hotspot_auto_response: false to receive alerts without any automated cluster changes — useful for teams that want visibility first before enabling automation.
  • Per-namespace suppression. Individual namespaces can be excluded from automated response while still receiving alerts.
  • Deduplication window. Alerts for the same hotspot are deduplicated over a 10-minute window to prevent notification floods during extended pressure events.
Feature 10

Live Cost Lens

Per-workload, per-namespace, and per-team cost attribution in real time — so every engineer can see the financial impact of their deployment decisions the moment they make them, not at the end of the month when it's too late to act.

[Illustration] See exactly what each pod is costing you, right now:

Namespace | Pod | CPU cost | Mem cost | Total/hr
production | api-gateway-7f9b4-xk2p | $0.0312 | $0.0089 | $0.0401
staging | auth-service-6c8df-nq7r | $0.0104 | $0.0041 | $0.0145
dev | worker-job-5d2aa-mp1k | $0.0021 | $0.0008 | $0.0029
ml-inference | gpu-model-8b1ca-zw3p | $0.1240 | $0.0320 (+GPU) | $0.2180
Cluster total/hr: $4.82

How cost is calculated: Metrics Server (actual CPU cores and RAM GB used) × AWS Pricing API (instance type pricing, On-Demand vs Spot, GPU cost if applicable) = real dollars per pod per hour. No estimates; live data.

What It Does

The Workload Cost Usage engine queries the Kubernetes Metrics Server for live CPU and memory consumption for every pod in the cluster, then maps that consumption to actual cloud pricing for the instance type the pod is running on. The result is a real cost figure — not a theoretical allocation, not a per-pod share of a monthly invoice — but the exact dollar value that each pod is consuming right now, expressed per hour.

This data is available via the /podsCostUsage API endpoint and returns a structured breakdown per pod: namespace, CPU usage in cores, memory usage in GB, GPU count if applicable, and the corresponding cost for each resource type plus a total. Both on-demand and spot capacity types are detected from node labels automatically.
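One common way to attribute a node's hourly price to individual pods is to split it into CPU and memory components. The sketch below is illustrative only: the 50/50 CPU/memory split is an assumed weighting, and the real engine derives component prices from the cloud pricing API rather than this heuristic:

```python
def pod_cost_per_hour(cpu_cores, mem_gb,
                      node_price, node_vcpu, node_mem_gb,
                      cpu_weight=0.5):
    """Attribute a node's hourly price to one pod by measured usage.

    cpu_weight (the share of the instance price assigned to CPU) is an
    illustrative assumption, not a documented constant.
    """
    cpu_price = node_price * cpu_weight / node_vcpu          # $/core/hr
    mem_price = node_price * (1 - cpu_weight) / node_mem_gb  # $/GB/hr
    cpu_cost = cpu_cores * cpu_price
    mem_cost = mem_gb * mem_price
    return {"cpu": round(cpu_cost, 4), "mem": round(mem_cost, 4),
            "total": round(cpu_cost + mem_cost, 4)}

# Pod using 1 core and 2 GB on an m5.xlarge (4 vCPU, 16 GB, $0.192/hr)
cost = pod_cost_per_hour(1.0, 2.0, 0.192, 4, 16)
assert cost == {"cpu": 0.024, "mem": 0.012, "total": 0.036}
```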

Real-Time Cost Visibility

Cloud cost management tools show you what you spent last month. They cannot tell you which deployment on Tuesday caused Wednesday's cost spike, which team's new service is responsible for 40% of your cluster's compute bill, or how much money you are burning right now — per workload, per namespace, per hour — as you are reading this.

Live Cost Lens does all three. It calculates the actual cost of every running workload at 60-second granularity, attributes it to the team or namespace that owns the workload, and surfaces it in a live dashboard that every engineer in your organisation can access. When a developer deploys a service with 10× the resources it needs, they see the cost the moment the pods come up — not in the next monthly bill review.

Backed by real cloud pricing

Costs are calculated using live cloud pricing data retrieved from the Pricing API — not estimates, not averages. The engine detects your cluster's region, instance type, OS, tenancy, and capacity type (On-Demand or Spot) from node labels automatically, so pricing is always accurate for the specific hardware your pods are actually running on.

Cultural impact

Organisations that give engineers real-time cost visibility see an average 23% reduction in over-provisioning within the first quarter — not from automation, but from engineers making better decisions when they can see the consequences of those decisions immediately.

Per-pod cost attribution

Every pod gets a cost figure: CPU cost, memory cost, GPU cost, and total cost per hour. Costs are based on the pod's actual measured consumption from the Metrics Server — not its declared resource requests.

Namespace-level cost rollup

All pods are attributed by namespace, making it straightforward to sum costs per team, per environment, or per application by grouping on the namespace field in the response.
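A namespace rollup of this kind is a simple grouping over the per-pod records. The field names below ("namespace", "totalCost") are assumed for illustration; match them to the actual /podsCostUsage response shape:

```python
from collections import defaultdict

def cost_by_namespace(pods):
    """Sum per-pod hourly costs into per-namespace totals."""
    totals = defaultdict(float)
    for pod in pods:
        totals[pod["namespace"]] += pod["totalCost"]
    return dict(totals)

sample = [
    {"namespace": "production", "totalCost": 0.0401},
    {"namespace": "production", "totalCost": 0.0145},
    {"namespace": "dev", "totalCost": 0.0029},
]
assert round(cost_by_namespace(sample)["production"], 4) == 0.0546
```

The same grouping works for per-team or per-environment rollups if your namespaces encode ownership.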

GPU cost tracking

GPU usage is detected from the pod spec and priced separately using the GPU cost component of the instance's hourly rate — essential for clusters running ML inference or training workloads.

Instance type aware

Pricing is pulled per the exact instance type of each pod's node — m5.xlarge, c5.4xlarge, p3.2xlarge — so cost figures reflect the true unit economics of your node mix, not a cluster average.

On-Demand and Spot differentiation

Pods running on Spot nodes are priced at spot rates; pods running on On-Demand nodes at on-demand rates.

Feeds into Savings and Optimization

The per-pod cost data feeds the Savings Dashboard's FinOps metrics and informs the Node Optimizer's recommendations — closing the loop between what workloads cost and what the engine recommends changing.

API Reference

Endpoint | Method | Auth | Description
/podsCostUsage | GET | Required | Returns per-pod cost breakdown for all pods across all namespaces. Queries the Kubernetes Metrics Server in real time and maps consumption to live AWS pricing.

Prerequisites

The /podsCostUsage endpoint requires the Kubernetes Metrics Server to be running in your cluster. Without it, the endpoint cannot collect real-time CPU and memory consumption. If your cluster does not have the Metrics Server installed, deploy it as part of the standard Kubeflux installation process — see the Installation Guide.

Installation Guide

Deploy Kubeflux on Your Cluster

Step-by-step instructions for deploying the Kubeflux application with persistent volume on AWS EKS. Follow the steps in order — the whole process takes approximately 20–40 minutes on a freshly provisioned cluster.

Obtain your Kubeflux License Key

Before deploying, you need a license key issued by the Kubeflux team. To request one, retrieve your cluster's unique ID and send it to the support team.

kubectl get namespace kube-system -o jsonpath='{.metadata.uid}'

Send the output to support@kubeflux.com and the team will issue your license key.

Kubernetes Metrics Server required. If your cluster does not have the Metrics Server installed, deploy it first:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Add IRSA Permissions

Kubeflux requires IAM Roles for Service Accounts (IRSA) to interact with AWS resources. A setup script is included in the deployment package.

# Make the script executable
chmod +x kubecut-irsa-step.sh

# Run the IRSA setup
./kubecut-irsa-step.sh
Note: If your VM does not have sufficient IAM permissions to create IRSA roles, the script will return a permissions error. Add the required permissions to the IAM role attached to your VM before re-running.

Install Terraform and Deploy the Application

Kubeflux uses Terraform to provision EFS persistent storage on AWS. If Terraform is not already installed, run the following commands on Ubuntu 24.04:

# Update system packages
sudo apt update && sudo apt upgrade -y
sudo apt install -y wget gnupg software-properties-common

# Add HashiCorp GPG key and repository
wget -O- https://apt.releases.hashicorp.com/gpg | gpg --dearmor > hashicorp-archive-keyring.gpg
sudo mv hashicorp-archive-keyring.gpg /usr/share/keyrings/
echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] \
  https://apt.releases.hashicorp.com $(lsb_release -cs) main" \
  | sudo tee /etc/apt/sources.list.d/hashicorp.list

# Install Terraform
sudo apt update && sudo apt install terraform -y

Once Terraform is installed, extract the deployment package and configure your cluster details:

# Decompress the Terraform package
tar -xzf kubecut-terraform-efs-k8s.tar.gz

# Edit the variables file with your cluster details
# kubecut-terraform-efs-k8s/terraform.tfvars
region           = "your-aws-region"        # e.g. us-east-1
eks_cluster_name = "your-eks-cluster-name"

Verify your kubectl context points to the correct cluster before applying:

kubectl config current-context

Run the Terraform deployment from inside the extracted directory:

# Navigate to the Terraform directory
cd kubecut-terraform-efs-k8s/

# Initialise, validate, and apply
terraform init
terraform validate
terraform apply
Permissions error? If terraform apply returns errors related to AWS EFS permissions, add the required EFS policies to the IAM role attached to your deployment VM and re-run terraform apply.

Once Terraform completes, deploy the Kubeflux application manifest:

kubectl apply -f kubecut-core-aws.yaml
Verify deployment: Run kubectl get pods -n kubeflux to confirm all pods reach Running status within 2–3 minutes.

Configure the Workload Scaler

After deployment, open the Kubeflux dashboard and navigate to Workload Scaler to configure which namespaces and services are managed.

Excluding namespaces

To exclude a namespace from Workload Scaler management, enter the namespace name in the Exclude Namespace field and click Apply.
To re-enable management of an excluded namespace, enter its name in the Exclude Namespace field and click Remove.

Excluding individual services

To exclude a specific service, enter its name in the Exclude Service field and click Apply.
To re-enable management of an excluded service, enter its name in the Exclude Service field and click Remove.

Uninstall / Reinstall

To remove the application, open the Kubeflux dashboard, navigate to Settings → My Profile, click Remove Kubeflux Resources, and confirm. This cleanly removes all Kubeflux components from the cluster while preserving your cluster workloads.

To re-activate: After reinstalling, return to Settings → My Profile and click Activate Kubeflux Resources to re-enable the Workload Scaler and all other optimisation engines.
Contact Us

Get in Touch

Have a question, a feature request, or need help with your deployment? Send us a message and the team will respond within one business day.

General Enquiries
Feature questions, pricing, and partnership discussions.
Technical Support
Deployment help, license key requests, and bug reports. Include your cluster ID when raising a support request.
Response Time
Within 1 business day
For urgent production issues, mark your subject line with [URGENT] and we will prioritise your ticket.
Get Your License Key
Run the following command and email the output to support@kubeflux.com to receive your license:
kubectl get namespace kube-system \
  -o jsonpath='{.metadata.uid}'