YuniKorn Concepts
=Internal=
* [[YuniKorn#Subjects| YuniKorn]]
=YuniKorn Core=
{{Internal|YuniKorn Core Concepts|YuniKorn Core Concepts}}

=Partition=
{{External|https://yunikorn.apache.org/docs/user_guide/queue_config}}

The total "partition resource" is the sum of its nodes' "capacity" (<code>node.GetCapacity()</code>, which is the node's "total resource").

Each partition has a root queue, which is the start of the queue hierarchy for that partition.

=Kubernetes Implementation=
* When a new pod that sets <code>schedulerName: yunikorn</code> needs scheduling, the API server invokes the "admission-webhook.yunikorn.mutate-pods" mutating webhook, served by the YuniKorn admission controller, with a POST to https://yunikorn-admission-controller-service.yunikorn.svc:443/mutate?timeout=10s, exposed by the "yunikorn-admission-controller-service" Service. When running locally, the Service does not get deployed, yet pods still get scheduled. This works because a Kubernetes mechanism based on "informers" keeps the shim updated on the state of the resources it is interested in, through "add", "update" and "delete" notifications. When a new pod shows up, <code>general.Manager.AddPod()</code> is invoked, which creates an Application and a Task from the pod metadata (<code>PodEventHandler.addPod()</code> → <code>cache.Context.AddApplication()</code>). At the same time, the main KubernetesShim scheduling loop finds the new application and the scheduling process begins.

=Queue=
{{External|https://yunikorn.apache.org/docs/user_guide/queue_config}}
{{External|https://yunikorn.apache.org/docs/design/scheduler_configuration/#queue-configuration}}
 
The queue configuration is dynamic: it can be changed while the scheduler is running, without requiring a scheduler restart. The configuration changes after the corresponding Go API method is invoked, after a call to the REST-based API, or after the configuration file is changed. Changes made through the API <font color=darkkhaki>will be persisted in the configuration file</font>. All queues defined in the configuration are considered managed queues.
 
The queues form a tree. The base of the tree is the <code>root</code> queue. The root queue reflects the entire cluster, and resource settings on the root queue are not allowed. The resources available to the root queue are calculated from the resources of the nodes registered with the cluster.
 
Applications can only be submitted to '''leaf queues'''. A queue that is not a leaf queue is a parent queue. Except for the root queue, every queue has exactly one parent queue. <font color=darkkhaki>A queue type is either leaf or parent.</font>
 
The individual queue names are separated by a dot ("."), yielding fully qualified queue names; as a result, an individual queue name cannot contain dot characters. A queue in the hierarchy can only be uniquely identified by its fully qualified path. This means two individual queues with the same name are allowed, as long as they are at different positions in the hierarchy.
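The sketch below shows such a hierarchy in a <code>queues.yaml</code> fragment, following the queue configuration documentation linked above; the "system" and "sandbox" queue names are made up for illustration.
<syntaxhighlight lang="yaml">
partitions:
  - name: default
    queues:
      - name: root              # base of the tree, no resource settings allowed here
        queues:
          - name: system        # parent queue, fully qualified name: root.system
            queues:
              - name: sandbox   # leaf queue, fully qualified name: root.system.sandbox
</syntaxhighlight>
Applications would be submitted to the leaf queue <code>root.system.sandbox</code>; <code>root.system</code> is a parent queue, so it cannot receive applications directly.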
==Access Control List==
{{External|https://yunikorn.apache.org/docs/user_guide/acls/}}
There are submission permissions and administrative permissions. Submission permissions relate to the capability of specific users or groups to submit an application to a given queue. Administrative permissions include the submission permission, plus the ability to stop an application and to move it to a different queue. Access control lists are checked recursively, starting at the lowest point in the tree and going up to the root: when the access control list of a queue does not allow access, the parent queue is checked, and the checks are repeated all the way up to the root queue.
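The sketch below shows where the two permission types appear in the queue configuration; the user, group and queue names are made up. The ACL value is a list of users, a space, then a list of groups, with <code>*</code> meaning everybody.
<syntaxhighlight lang="yaml">
partitions:
  - name: default
    queues:
      - name: root
        submitacl: '*'                           # checked last, if no lower queue allowed access
        queues:
          - name: finance
            submitacl: 'alice,bob finance-team'  # users "alice" and "bob", plus the "finance-team" group, may submit
            adminacl: 'carol finance-admins'     # admin permission also allows stopping and moving applications
</syntaxhighlight>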
===Submit ACL===
===Admin ACL===
==Queue Resources==
{{External|https://yunikorn.apache.org/docs/user_guide/queue_config/#resources}}
===Guaranteed Resources===
===Maximum Resources===
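A hedged sketch of how both resource types are declared on a queue is shown below; the queue name and the quantities are made up, and the exact unit conventions are those of the resources documentation linked above.
<syntaxhighlight lang="yaml">
partitions:
  - name: default
    queues:
      - name: root
        queues:
          - name: analytics
            resources:
              guaranteed:        # the amount the queue should always be able to obtain
                memory: 8G
                vcore: 4
              max:               # hard cap on the sum of all allocations in the queue
                memory: 32G
                vcore: 16
</syntaxhighlight>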
 
==Running Applications Limit==
A queue can set a limit on the number of applications running in it at the same time.
==Application Sort Algorithm==
Applications within a queue can be sorted with the Fair or the FIFO policy.
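A hedged sketch combining the running application limit and the sort policy on a leaf queue, assuming the <code>maxapplications</code> field and the <code>application.sort.policy</code> property from the queue configuration documentation; the queue name and values are made up.
<syntaxhighlight lang="yaml">
partitions:
  - name: default
    queues:
      - name: root
        queues:
          - name: batch
            maxapplications: 10               # no more than 10 applications running in this queue at a time
            properties:
              application.sort.policy: fifo   # alternative: fair
</syntaxhighlight>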
==Queue Priority==
==Placement Rules==
{{External|https://yunikorn.apache.org/docs/user_guide/placement_rules/}}
Placement rules automatically place an application onto a queue; the rules are evaluated in the order in which they are defined, until one of them matches.
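A hedged sketch of a placement rule list, following the placement rules documentation linked above: the "provided" rule honors a queue explicitly specified by the application, and the "tag" rule with <code>value: namespace</code> places the application into a queue named after its namespace.
<syntaxhighlight lang="yaml">
partitions:
  - name: default
    placementrules:
      - name: provided     # use the queue specified on submission, do not create it if missing
        create: false
      - name: tag          # otherwise, fall back to a queue named after the pod's namespace
        value: namespace
        create: true       # create the queue if it does not exist yet
</syntaxhighlight>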
==Kubernetes Implementation==
A namespace can have a "queue" if annotated with "yunikorn.apache.org/queue". A namespace can have a "parent queue" if annotated with "yunikorn.apache.org/parentqueue".
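A hedged namespace sketch using the first annotation above; the namespace name is made up and the annotation value is assumed to be a fully qualified queue name.
<syntaxhighlight lang="yaml">
apiVersion: v1
kind: Namespace
metadata:
  name: analytics
  annotations:
    yunikorn.apache.org/queue: root.analytics   # route applications from this namespace to the root.analytics queue
</syntaxhighlight>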
 
=Application=
 
An application can issue [[#Allocation|allocation]] requests.
 
An application can be added or removed.
 
An application is assigned to a partition; each partition maintains an "applications" map. <font color=darkkhaki>Can the same application be assigned to two or more partitions at the same time?</font>
 
An application has an execution timeout, which is the maximum amount of time the application can be in a running state. <font color=darkkhaki>What happens if the timeout expires?</font>
 
==Application Metadata==
===Application ID===
The application ID is looked up in this order (see the pod sketch after this list):
* Annotation "yunikorn.apache.org/app-id"
* Label "applicationId"
* ...
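A hedged pod sketch showing the first two lookups; the identifiers and the image are made up, and normally only one of the two would be set.
<syntaxhighlight lang="yaml">
apiVersion: v1
kind: Pod
metadata:
  name: example-app-driver
  annotations:
    yunikorn.apache.org/app-id: example-app-0001   # checked first
  labels:
    applicationId: example-app-0001                # checked if the annotation is absent
spec:
  schedulerName: yunikorn
  containers:
    - name: main
      image: busybox:1.36
</syntaxhighlight>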
===Queue Name===
===User===
===Tags===
===Groups===
===TaskGroups===
===OwnerReference===
Usually a pod, designated by its UID.
===Scheduling Policy Parameters===
 
 
==Application Task==
For the Kubernetes implementation, Kubernetes pods are mapped onto YuniKorn Core Tasks, and the task ID is the pod UID.
 
A task may be the "originator" of the application, if it is the first task seen for that application. In that case, it is considered the "first pod"/"owner"/"driver".
 
===Task Metadata===
====Application ID====
====TaskID====
(the same as the pod UID)
====Pod====
====Placeholder====
====TaskGroupName====
 
==Task Group==
==Application Request==
==Application Priority==
 
=Allocation=
A core scheduler-level concept.
 
An allocation can be issued by an application, or it can be an independent allocation, which does not belong to any application.
 
An allocation can be in one of two states ("Pending" or "In-Progress"). A pending allocation is one which has been decided upon by YuniKorn but has not yet been communicated to the default scheduler via <code>PreFilter()</code>/<code>Filter()</code>. Once <code>PreFilter()</code>/<code>Filter()</code> pass, the allocation transitions to "In-Progress" to signify that the default scheduler is responsible for fulfilling the allocation. Once <code>PostBind()</code> is called in the plugin to signify completion of the allocation, the allocation is removed.
 
==Allocation Ask==
 
=Identity=
An application is submitted under a certain identity, which consists of a [[#User|user]] and one or more [[#Group|groups]].
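In the Kubernetes implementation, a hedged sketch of what this identity can look like is the <code>yunikorn.apache.org/user.info</code> pod annotation described in the usergroup_resolution page referenced in the note below; the user and group values here are made up.
<syntaxhighlight lang="yaml">
apiVersion: v1
kind: Pod
metadata:
  name: example-app-driver
  annotations:
    # the identity the application is submitted under: one user plus one or more groups
    yunikorn.apache.org/user.info: '{"user":"alice","groups":["developers","system:authenticated"]}'
spec:
  schedulerName: yunikorn
  containers:
    - name: main
      image: busybox:1.36
</syntaxhighlight>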
 
<font color=darkkhaki>TO PARSE:
*  https://yunikorn.apache.org/docs/user_guide/usergroup_resolution/
* https://yunikorn.apache.org/docs/design/scheduler_configuration/#user-definition
</font>
 
==User==
==Group==
The identity an application is submitted under may be associated with one or more groups.
 
=Plugin Mode=
 
=Resource Manager (RM)=
 
YuniKorn communicates with various implementations of resource management systems (Kubernetes, YARN) via a standard interface defined in the <code>[[YuniKorn_Development#yunikorn-scheduler-interface|yunikorn-scheduler-interface]]</code> package.
 
=Task=
 
=Node=
 
<font color=darkkhaki>Can a node be declared to be part of a partition with the "si/node-partition" label? It seems that the node Attributes partially come from the node labels.</font>
 
In the Kubernetes implementation, the node is first added, then updated.
 
As part of handling the <code>RMNodeUpdateEvent</code>, <code>RMProxy</code> calls <code>callback.UpdateNode()</code>.
 
=Configuration=
 
=Context=
 
yunikorn-k8shim <code>cache.Context</code>
 
=Resource=
=Quantity=
 
=Reservation=
 
=Manual Scheduling=
 
=Policy Group=
 
Set in the scheduler when a new resource manager is registered.
