YuniKorn Concepts: Difference between revisions

From NovaOrdis Knowledge Base
=Internal=
* [[YuniKorn#Subjects| YuniKorn]]
=Partition=


The total "partition resource" is the sum of its nodes' "capacity" (<code>node.GetCapacity()</code>, which is the node's "total resource").
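The summation above can be sketched in Go as follows. This is only an illustration: the <code>Resource</code> and <code>Node</code> types and the <code>partitionTotal</code> function are hypothetical stand-ins, not the actual YuniKorn Core types; only the <code>GetCapacity()</code> accessor name comes from the text.

```go
package main

import "fmt"

// Resource is a hypothetical stand-in for a scheduler resource:
// a map from resource name (e.g. "memory", "vcore") to quantity.
type Resource map[string]int64

// Node is a hypothetical stand-in for a scheduler node.
type Node struct {
	ID       string
	capacity Resource
}

// GetCapacity returns the node's total resource.
func (n *Node) GetCapacity() Resource { return n.capacity }

// partitionTotal sums the capacity of every node, per resource name.
func partitionTotal(nodes []*Node) Resource {
	total := Resource{}
	for _, n := range nodes {
		for name, qty := range n.GetCapacity() {
			total[name] += qty
		}
	}
	return total
}

func main() {
	nodes := []*Node{
		{ID: "node-1", capacity: Resource{"memory": 16000, "vcore": 8000}},
		{ID: "node-2", capacity: Resource{"memory": 32000, "vcore": 16000}},
	}
	total := partitionTotal(nodes)
	fmt.Println(total["memory"], total["vcore"]) // prints: 48000 24000
}
```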
=YuniKorn Core=
{{Internal|YuniKorn Core Concepts|YuniKorn Core Concepts}}


Each partition has a root queue, which is the start of the queue hierarchy per partition.
=Kubernetes Implementation=
 
* A namespace can have a "queue" if annotated with "yunikorn.apache.org/queue". A namespace can have a "parent queue" if annotated with "yunikorn.apache.org/parentqueue".
=Plugin Mode=
* An allocation can be in one of two states ("Pending" and "In-Progress"). A pending allocation is one which has been decided upon by YuniKorn but has not yet been communicated to the default scheduler via PreFilter()/Filter(). Once PreFilter()/Filter() pass, the allocation transitions to "In-Progress" to signify that the default scheduler is responsible for fulfilling the allocation. Once PostBind() is called in the plugin to signify completion of the allocation, it is removed.
 
* When a new pod that specifies <code>schedulerName: yunikorn</code> needs scheduling, the API server (admission controller (?)) calls the "admission-webhook.yunikorn.mutate-pods" webhook with a POST to https://yunikorn-admission-controller-service.yunikorn.svc:443/mutate?timeout=10s, which is served by the "yunikorn-admission-controller-service" service. When running locally, that service does not get deployed, yet the pods still get scheduled. This works because there is a Kubernetes mechanism involving "informers" that keeps the shim's view of the resources it is interested in up to date, through "add", "update" and "delete" notifications. When a new pod shows up, <code>general.Manager.AddPod()</code> is invoked, which creates an Application and a Task using the pod metadata → <code>PodEventHandler.addPod()</code> <code>cache.Context.AddApplication()</code>. At the same time, the main KubernetesShim scheduling loop finds the new application, and the scheduling process begins.
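The notification flow described in the last bullet can be approximated with the sketch below. It deliberately does not use the real Kubernetes informer API: the <code>Event</code>, <code>Pod</code> and <code>Manager</code> types are hypothetical simplifications, a buffered channel stands in for the informer, and treating "update" like an idempotent "add" is an assumption made only for this example.

```go
package main

import "fmt"

// EventType mirrors the three notification kinds named in the text.
type EventType int

const (
	Add EventType = iota
	Update
	Delete
)

// Pod is a hypothetical, reduced view of the pod metadata.
type Pod struct {
	UID   string
	AppID string
}

// Event is a hypothetical notification delivered by an informer.
type Event struct {
	Type EventType
	Pod  Pod
}

// Manager is a hypothetical stand-in that maps application IDs
// to the set of task IDs (pod UIDs) seen for them.
type Manager struct {
	apps map[string]map[string]bool
}

func NewManager() *Manager {
	return &Manager{apps: map[string]map[string]bool{}}
}

// AddPod creates the application on first sight, then registers the task.
func (m *Manager) AddPod(p Pod) {
	tasks, ok := m.apps[p.AppID]
	if !ok {
		tasks = map[string]bool{}
		m.apps[p.AppID] = tasks
	}
	tasks[p.UID] = true
}

// DeletePod removes the task if its application is known.
func (m *Manager) DeletePod(p Pod) {
	if tasks, ok := m.apps[p.AppID]; ok {
		delete(tasks, p.UID)
	}
}

// Handle dispatches one notification to the matching handler.
// Assumption: an update is handled like an idempotent add.
func (m *Manager) Handle(ev Event) {
	switch ev.Type {
	case Add, Update:
		m.AddPod(ev.Pod)
	case Delete:
		m.DeletePod(ev.Pod)
	}
}

// TaskCount reports how many tasks are registered for an application.
func (m *Manager) TaskCount(appID string) int { return len(m.apps[appID]) }

func main() {
	events := make(chan Event, 3)
	events <- Event{Type: Add, Pod: Pod{UID: "uid-1", AppID: "app-1"}}
	events <- Event{Type: Add, Pod: Pod{UID: "uid-2", AppID: "app-1"}}
	events <- Event{Type: Delete, Pod: Pod{UID: "uid-1", AppID: "app-1"}}
	close(events)

	m := NewManager()
	for ev := range events {
		m.Handle(ev)
	}
	fmt.Println(m.TaskCount("app-1")) // prints: 1
}
```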
=Resource Manager (RM)=
 
YuniKorn communicates with various resource management systems (Kubernetes, YARN) via a standard interface defined in the <code>[[YuniKorn_Development#yunikorn-scheduler-interface|yunikorn-scheduler-interface]]</code> package.
 
=Allocation=
A core scheduler-level concept. An allocation can be in one of two states ("Pending" and "In-Progress"). A pending allocation is one which has been decided upon by YuniKorn but has not yet been communicated to the default scheduler via PreFilter()/Filter(). Once PreFilter()/Filter() pass, the allocation transitions to "In-Progress" to signify that the default scheduler is responsible for fulfilling the allocation. Once PostBind() is called in the plugin to signify completion of the allocation, it is removed.
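The lifecycle described above (decided, handed to the default scheduler, then removed) can be sketched as a small state tracker. The <code>Tracker</code> type and its method names are hypothetical and chosen only for this illustration; they are not the plugin's actual API.

```go
package main

import (
	"errors"
	"fmt"
)

// AllocationState models the two states named in the text.
type AllocationState string

const (
	Pending    AllocationState = "Pending"
	InProgress AllocationState = "In-Progress"
)

// Tracker is a hypothetical holder of allocations keyed by pod UID.
type Tracker struct {
	allocations map[string]AllocationState
}

func NewTracker() *Tracker {
	return &Tracker{allocations: map[string]AllocationState{}}
}

// Decide records an allocation that the scheduler has decided upon
// but not yet communicated to the default scheduler.
func (t *Tracker) Decide(podUID string) {
	t.allocations[podUID] = Pending
}

// FilterPassed moves a pending allocation to In-Progress, as would
// happen once PreFilter()/Filter() pass.
func (t *Tracker) FilterPassed(podUID string) error {
	if state, ok := t.allocations[podUID]; !ok || state != Pending {
		return errors.New("no pending allocation for pod " + podUID)
	}
	t.allocations[podUID] = InProgress
	return nil
}

// PostBind removes the allocation once it has been fulfilled.
func (t *Tracker) PostBind(podUID string) {
	delete(t.allocations, podUID)
}

// State returns the current state and whether the allocation exists.
func (t *Tracker) State(podUID string) (AllocationState, bool) {
	s, ok := t.allocations[podUID]
	return s, ok
}

func main() {
	tr := NewTracker()
	tr.Decide("pod-1")
	if err := tr.FilterPassed("pod-1"); err != nil {
		fmt.Println("unexpected:", err)
	}
	state, _ := tr.State("pod-1")
	fmt.Println(state) // prints: In-Progress
	tr.PostBind("pod-1")
	_, exists := tr.State("pod-1")
	fmt.Println(exists) // prints: false
}
```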
 
=Application=
 
An application is assigned to a partition, and each partition keeps an "applications" map. <font color=darkkhaki>Can the same application be assigned to two or more partitions at the same time?</font>
==Application Metadata==
===Application ID===
Looks in this order:
* Annotation "yunikorn.apache.org/app-id"
* Label "applicationId"
* ...
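The first two steps of this lookup order can be sketched as below. The <code>PodMeta</code> type and the <code>applicationID</code> function are hypothetical; the remaining fallbacks are elided in the list above and are intentionally not reproduced.

```go
package main

import "fmt"

const (
	appIDAnnotation = "yunikorn.apache.org/app-id"
	appIDLabel      = "applicationId"
)

// PodMeta is a hypothetical, reduced view of a pod's metadata.
type PodMeta struct {
	Annotations map[string]string
	Labels      map[string]string
}

// applicationID checks the annotation first, then the label.
// It reports false when neither is set; any further fallbacks
// are elided in the text and are not implemented here.
func applicationID(p PodMeta) (string, bool) {
	if id := p.Annotations[appIDAnnotation]; id != "" {
		return id, true
	}
	if id := p.Labels[appIDLabel]; id != "" {
		return id, true
	}
	return "", false
}

func main() {
	pod := PodMeta{
		Annotations: map[string]string{appIDAnnotation: "from-annotation"},
		Labels:      map[string]string{appIDLabel: "from-label"},
	}
	id, _ := applicationID(pod)
	fmt.Println(id) // prints: from-annotation
}
```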
===Queue Name===
===User===
===Tags===
===Groups===
===TaskGroups===
===OwnerReference===
Usually a pod, designated by its UID.
===Scheduling Policy Parameters===
 
 
==Application Task==
For the Kubernetes implementations, Kubernetes pods are mapped onto YuniKorn Core Tasks, and the task ID is the pod UID.
 
A task may be the "originator" of the Application, if it's the first one seen for the application. If that is the case, it is considered "first pod"/"owner"/"driver".
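A minimal sketch of the "first task seen is the originator" rule follows. The <code>Application</code> and <code>Task</code> types here are hypothetical simplifications, not the shim's actual types.

```go
package main

import "fmt"

// Task is a hypothetical task whose ID is the pod UID.
type Task struct {
	ID         string
	Originator bool
}

// Application is a hypothetical application that keeps its tasks
// in the order in which they were first seen.
type Application struct {
	ID    string
	tasks []*Task
}

// AddTask registers a task and marks it as the originator only
// if it is the first task seen for the application.
func (a *Application) AddTask(podUID string) *Task {
	t := &Task{ID: podUID, Originator: len(a.tasks) == 0}
	a.tasks = append(a.tasks, t)
	return t
}

func main() {
	app := &Application{ID: "app-1"}
	driver := app.AddTask("uid-driver")
	worker := app.AddTask("uid-worker")
	fmt.Println(driver.Originator, worker.Originator) // prints: true false
}
```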
 
===Task Metadata===
====Application ID====
====TaskID====
(same as the pod UID)
====Pod====
====Placeholder====
====TaskGroupName====
 
==Task Group==
==Application Request==
==Application Priority==
 
=Task=
 
=Node=
 
<font color=darkkhaki>Can a node be declared to be part of a partition with the "si/node-partition" label? It seems that the node Attributes partially come from the node labels.</font>
 
In the Kubernetes implementation, the node is first added, then updated.
 
As part of handling the <code>RMNodeUpdateEvent</code>, <code>RMProxy</code> calls <code>callback.UpdateNode()</code>.
 
=Configuration=
 
=Context=
 
yunikorn-k8shim <code>cache.Context</code>
 
=Resource=
=Quantity=
 
=Queue=
 
A namespace can have a "queue" if annotated with "yunikorn.apache.org/queue".
 
A namespace can have a "parent queue" if annotated with "yunikorn.apache.org/parentqueue".
 
==Queue Priority==
==Queue Max Resource==
 
=Reservation=
 
=Manual Scheduling=
 
=Policy Group=
 
Set in the scheduler when a new resource manager is registered.

Latest revision as of 22:06, 18 January 2024
