YuniKorn Core Concepts

From NovaOrdis Knowledge Base

Internal

Overview

YuniKorn core is a universal scheduler that assigns Application resource Allocations to Nodes that expose resources. Its default implementation allocates Kubernetes pods to Kubernetes nodes, where multiple pods belong to an application and request resources such as memory, cores and GPUs. However, Applications, Allocations and Nodes can be mapped onto an arbitrary domain. The scheduler assumes that different Allocations may have different priorities, and schedules higher-priority Allocations first. The scheduler also supports preemption.
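To make the "higher priority first" rule concrete, here is a minimal Go sketch of one way pending asks could be ordered before a scheduling pass. The `Ask` type and `sortByPriority` function are hypothetical and not YuniKorn's actual types; breaking ties by arrival order is an assumption made for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// Ask is a hypothetical, simplified allocation request (not YuniKorn's type).
type Ask struct {
	Key      string
	Priority int32
	Seq      int // assumed arrival order, used only to break priority ties
}

// sortByPriority orders asks so that higher priorities come first.
// Asks with equal priority keep their arrival order (assumed tie-break).
func sortByPriority(asks []Ask) {
	sort.SliceStable(asks, func(i, j int) bool {
		if asks[i].Priority != asks[j].Priority {
			return asks[i].Priority > asks[j].Priority
		}
		return asks[i].Seq < asks[j].Seq
	})
}

func main() {
	asks := []Ask{
		{Key: "pod-a", Priority: 0, Seq: 1},
		{Key: "pod-b", Priority: 10, Seq: 2},
		{Key: "pod-c", Priority: 10, Seq: 3},
		{Key: "pod-d", Priority: 5, Seq: 4},
	}
	sortByPriority(asks)
	for _, a := range asks {
		fmt.Println(a.Key, a.Priority) // pod-b, pod-c, pod-d, pod-a
	}
}
```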

Application

An application is an abstract programmatic entity that requires resources to execute. The application expresses its resource needs by issuing Allocation requests, which the scheduler handles by trying to find a Node that can accommodate the resource need of each specific request. In the default Kubernetes implementation, an application is any higher-level workload resource that creates pods: deployments, jobs, etc.
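The matching step described above can be sketched as follows: an ask carries a multi-dimensional resource request, and the scheduler looks for a node whose available resources cover every dimension. `Resource`, `Node`, `fits` and `findNode` are hypothetical names, and picking the first node that fits is a simplifying assumption, not YuniKorn's node-sorting policy.

```go
package main

import "fmt"

// Resource maps a resource name (e.g. "memory", "vcore", "gpu") to a quantity.
type Resource map[string]int64

// Node is a hypothetical node exposing its currently available resources.
type Node struct {
	ID        string
	Available Resource
}

// fits reports whether every requested quantity is covered by avail.
// A resource missing from avail counts as zero.
func fits(request, avail Resource) bool {
	for name, qty := range request {
		if avail[name] < qty {
			return false
		}
	}
	return true
}

// findNode returns the first node that can accommodate the request
// (first-fit, an assumption), or "" and false if no node can.
func findNode(request Resource, nodes []Node) (string, bool) {
	for _, n := range nodes {
		if fits(request, n.Available) {
			return n.ID, true
		}
	}
	return "", false
}

func main() {
	nodes := []Node{
		{ID: "node-1", Available: Resource{"memory": 2048, "vcore": 1000}},
		{ID: "node-2", Available: Resource{"memory": 8192, "vcore": 4000, "gpu": 1}},
	}
	ask := Resource{"memory": 4096, "vcore": 2000, "gpu": 1}
	if id, ok := findNode(ask, nodes); ok {
		fmt.Println("ask fits on", id) // ask fits on node-2
	} else {
		fmt.Println("no node can accommodate the ask")
	}
}
```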

Application Lifecycle

An application is added in the NEW state. The application transitions from NEW to ACCEPTED when the first request (Ask) is added to it. It then moves to STARTING when the first Allocation is created, which is the point at which the request (Ask) is assigned to a node. From then on it shows as an Allocation on the application.

If another Ask is added and a second Allocation is made, the application state changes to RUNNING immediately. If there is no other Ask, and thus no second Allocation, the application stays in the STARTING state for a maximum of 5 minutes and then automatically transitions to RUNNING. This behavior exists to support state-aware scheduling; it has no impact on the scheduler or on the pods unless state-aware scheduling is turned on. To configure an application to transition to RUNNING right after its first Allocation, set the tag "application.stateaware.disable": "true" on the AddApplicationRequest when creating the application.

Allocation

Allocation Ask

Allocation Ask Implementation

This is the sequence of operations of an Allocation Ask:

  • scheduler.Scheduler handles an rmevent.RMUpdateAllocationEvent "update allocation" event in the handleRMEvent() function, which immediately calls into scheduler.ClusterContext#handleRMUpdateAllocationEvent().
  • scheduler.ClusterContext#handleRMUpdateAllocationEvent() calls into scheduler.ClusterContext#processAsks().
  • scheduler.ClusterContext#processAsks() locates the corresponding partition and calls into scheduler.PartitionContext#addAllocationAsk().
  • scheduler.PartitionContext#addAllocationAsk() locates the corresponding application.
  • scheduler.PartitionContext#addAllocationAsk() creates a new objects.AllocationAsk instance.
  • scheduler.PartitionContext#addAllocationAsk() invokes into objects.Application#AddAllocationAsk() with the newly created objects.AllocationAsk instance.
  • objects.Application#AddAllocationAsk():
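    • Computes the resource delta: the resources of the new ask, minus those of any existing ask with the same allocation key that it replaces.
    • If the application is in the "new" or "completing" state, moves it on so that it gets scheduled (again): "new" goes to "accepted", as described in Application Lifecycle, and "completing" goes back to "running".
    • Stores the ask in the application's requests.
    • Updates the application's priority.
    • Updates the total pending resources up the queue hierarchy.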
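The bookkeeping done by objects.Application#AddAllocationAsk() can be sketched as follows. This is a simplified, hypothetical model rather than the yunikorn-core code: resources are reduced to a single int64 (for example memory), the `queue`, `ask`, `application` and `addAllocationAsk` names are illustrative, and tracking the application priority as the maximum priority of its pending asks is an assumption.

```go
package main

import "fmt"

// queue is a hypothetical queue node that tracks pending resources.
type queue struct {
	name    string
	parent  *queue
	pending int64
}

// incPending adds delta to this queue and every ancestor up to the root.
func (q *queue) incPending(delta int64) {
	for cur := q; cur != nil; cur = cur.parent {
		cur.pending += delta
	}
}

type ask struct {
	key      string
	resource int64
	priority int32
}

type application struct {
	state       string
	requests    map[string]ask
	maxPriority int32
	pending     int64
	queue       *queue
}

// addAllocationAsk mirrors the steps listed above (simplified).
func (app *application) addAllocationAsk(a ask) {
	// 1. Compute the delta: an existing ask with the same key is replaced.
	delta := a.resource
	if old, ok := app.requests[a.key]; ok {
		delta -= old.resource
	}
	// 2. Move a new or completing application on so that it gets scheduled.
	switch app.state {
	case "new":
		app.state = "accepted"
	case "completing":
		app.state = "running"
	}
	// 3. Store the ask.
	app.requests[a.key] = a
	// 4. Update priority: maximum over all stored asks (assumption).
	app.maxPriority = a.priority
	for _, r := range app.requests {
		if r.priority > app.maxPriority {
			app.maxPriority = r.priority
		}
	}
	// 5. Update pending resources on the app and up the queue hierarchy.
	app.pending += delta
	app.queue.incPending(delta)
}

func main() {
	root := &queue{name: "root"}
	leaf := &queue{name: "root.default", parent: root}
	app := &application{state: "new", requests: map[string]ask{}, queue: leaf}

	app.addAllocationAsk(ask{key: "pod-1", resource: 1024, priority: 0})
	app.addAllocationAsk(ask{key: "pod-2", resource: 2048, priority: 10})
	app.addAllocationAsk(ask{key: "pod-1", resource: 512, priority: 0}) // replaces pod-1

	fmt.Println(app.state, app.maxPriority, app.pending, leaf.pending, root.pending)
	// accepted 10 2560 2560 2560
}
```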

The allocation attempt is not executed on this thread; it is performed later by one of the asynchronous periodic scheduling runs.

Partition

Node

Resource

Queue

https://yunikorn.apache.org/docs/user_guide/queue_config
https://yunikorn.apache.org/docs/design/scheduler_configuration/#queue-configuration
https://yunikorn.apache.org/docs/user_guide/resource_quota_management

Priority

Preemption

Scheduler

The scheduler instance scheduler.Scheduler is the initiator of scheduling runs.

Scheduling Run

The scheduler can execute scheduling runs automatically, by invoking scheduler.ClusterContext#schedule() every 100 milliseconds, or the runs can be triggered manually if the scheduler is started with the manual schedule option set to true. To schedule manually, start the scheduler with "auto" mode disabled and invoke scheduler.Scheduler#MultiStepSchedule().