YuniKorn Concepts: Difference between revisions

From NovaOrdis Knowledge Base


=Kubernetes Implementation=
* A namespace can have a "queue" if annotated with "yunikorn.apache.org/queue". A namespace can have a "parent queue" if annotated with "yunikorn.apache.org/parentqueue".
 
* An allocation can be in one of two states ("Pending" and "In-Progress"). A pending allocation is one which has been decided upon by YuniKorn but has not yet been communicated to the default scheduler via PreFilter()/Filter(). Once PreFilter()/Filter() pass, the allocation transitions to "In-Progress" to signify that the default scheduler is responsible for fulfilling the allocation. Once PostBind() is called in the plugin to signify completion of the allocation, it is removed.
* When a new pod that specifies <code>schedulerName: yunikorn</code> needs scheduling, the API server (admission controller (?)) calls the "admission-webhook.yunikorn.mutate-pods" webhook with a POST to https://yunikorn-admission-controller-service.yunikorn.svc:443/mutate?timeout=10s, which is backed by the "yunikorn-admission-controller-service" service. When running locally, the service does not get deployed, yet the pods still get scheduled. This works because the shim relies on Kubernetes "informers", which keep it updated on the state of the resources it is interested in through "add", "update" and "delete" notifications. When a new pod shows up, <code>general.Manager.AddPod()</code> is invoked, which creates an Application and a Task using the pod metadata → <code>PodEventHandler.addPod()</code> → <code>cache.Context.AddApplication()</code>. At the same time, the main KubernetesShim scheduling loop finds the new application, and the scheduling process begins.

==Application Task==
For the Kubernetes implementation, Kubernetes pods are mapped onto YuniKorn Core Tasks, and the task ID is the pod UID.
 
A task may be the "originator" of the Application, if it's the first one seen for the application. If that is the case, it is considered "first pod"/"owner"/"driver".
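
The pod-to-task mapping and the "originator" rule can be illustrated with a minimal Python sketch. It is not the shim's actual code: the <code>Context</code>, <code>Application</code> and <code>Task</code> classes and the shape of the <code>pod</code> dictionary are hypothetical stand-ins, and the only facts taken from the text above are that the task ID is the pod UID and that the first task seen for an application is its originator.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    task_id: str            # equal to the pod UID
    application_id: str
    pod_name: str
    originator: bool = False


@dataclass
class Application:
    application_id: str
    tasks: dict = field(default_factory=dict)  # task_id -> Task


class Context:
    """Hypothetical stand-in for the shim's application/task cache."""

    def __init__(self):
        self.applications = {}  # application_id -> Application

    def add_pod(self, pod: dict) -> Task:
        """Handle an informer "add" notification for a pod."""
        app_id = pod["application_id"]
        app = self.applications.get(app_id)
        # The application is created the first time one of its pods is seen.
        first_seen = app is None
        if first_seen:
            app = Application(app_id)
            self.applications[app_id] = app
        task = Task(
            task_id=pod["uid"],
            application_id=app_id,
            pod_name=pod["name"],
            originator=first_seen,
        )
        # setdefault keeps the existing task if the same pod is reported twice.
        return app.tasks.setdefault(task.task_id, task)
```

In this sketch, adding a "driver" pod followed by an "executor" pod with the same application ID yields one application with two tasks, of which only the first is flagged as originator.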
 
===Task Metadata===
====Application ID====
====TaskID====
(same as the pod UID)
====Pod====
====Placeholder====
====TaskGroupName====
 
==Task Group==
==Application Request==
==Application Priority==
==Application States==
* New
* Accepted
* Starting
* <span id='Running'></span>Running
* Rejected
* Completing
* Completed
* Failing
* Failed
* Expired
* Resuming
 
=Allocation=
A core scheduler-level concept.
 
An allocation can be issued by an application, or it can be an independent allocation, which does not belong to any application.
 
An allocation can be in one of two states ("Pending" and "In-Progress"). A pending allocation is one which has been decided upon by YuniKorn but has not yet been communicated to the default scheduler via PreFilter()/Filter(). Once PreFilter()/Filter() pass, the allocation transitions to "In-Progress" to signify that the default scheduler is responsible for fulfilling the allocation. Once PostBind() is called in the plugin to signify completion of the allocation, it is removed.
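
The lifecycle described above can be sketched as a small state tracker. This is an illustrative Python model only: the <code>AllocationTracker</code> class and its method names are assumptions made for the example, not YuniKorn APIs. The two states and the removal on PostBind() come from the paragraph above.

```python
from enum import Enum


class AllocationState(Enum):
    PENDING = "Pending"
    IN_PROGRESS = "In-Progress"


class AllocationTracker:
    """Hypothetical tracker for allocations handed to the default scheduler."""

    def __init__(self):
        self._states = {}  # allocation key -> AllocationState

    def decide(self, key: str) -> None:
        # YuniKorn has decided on the allocation but not yet communicated it.
        self._states[key] = AllocationState.PENDING

    def filter_passed(self, key: str) -> None:
        # PreFilter()/Filter() passed: the default scheduler now owns fulfilment.
        if self._states.get(key) is not AllocationState.PENDING:
            raise ValueError(f"allocation {key!r} is not pending")
        self._states[key] = AllocationState.IN_PROGRESS

    def post_bind(self, key: str) -> None:
        # PostBind() signals completion, so the allocation is removed.
        if self._states.get(key) is not AllocationState.IN_PROGRESS:
            raise ValueError(f"allocation {key!r} is not in progress")
        del self._states[key]

    def state(self, key: str):
        # Returns None once the allocation has been removed.
        return self._states.get(key)
```

The explicit checks make out-of-order calls fail loudly, which is a design choice of this sketch rather than documented plugin behavior.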
 
==Allocation Ask==
 
Each allocation has a key. The key is used by both the scheduler and the resource manager to track allocations. The key does not have to be the resource manager's internal allocation ID, such as the pod name.
 
If the allocation specifies an application ID, the application must be registered in advance; otherwise the request fails with "failed to find application ..."
 
Each allocation ask requests resources.
 
Each allocation ask has a priority.
 
<font color=darkkhaki>What is "MaxAllocations"?</font>
 
An application transitions to the "[[#Running|Running]]" state after its first allocation ask.
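
The rules in this section can be summarized in a short Python sketch. The <code>Scheduler</code> and <code>AllocationAsk</code> classes, the exception name and the resource names are hypothetical, and the model deliberately skips any intermediate application states, jumping from "New" to "Running" on the first ask as the text describes.

```python
from dataclasses import dataclass
from typing import Optional


class ApplicationNotRegistered(Exception):
    """Hypothetical error for an ask that names an unknown application."""


@dataclass
class AllocationAsk:
    allocation_key: str                   # tracked by scheduler and resource manager
    resources: dict                       # requested resources, e.g. {"vcore": 1}
    priority: int = 0
    application_id: Optional[str] = None  # None models an independent allocation


class Scheduler:
    def __init__(self):
        self.app_states = {}  # application ID -> state name
        self.asks = {}        # allocation key -> AllocationAsk

    def register_application(self, app_id: str) -> None:
        self.app_states[app_id] = "New"

    def add_ask(self, ask: AllocationAsk) -> None:
        app_id = ask.application_id
        if app_id is not None:
            # The application must have been registered in advance.
            if app_id not in self.app_states:
                raise ApplicationNotRegistered(app_id)
            # The first ask moves the application to "Running".
            if self.app_states[app_id] == "New":
                self.app_states[app_id] = "Running"
        self.asks[ask.allocation_key] = ask
```

An ask without an application ID is stored without touching any application state, which mirrors the independent allocations mentioned earlier.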
 
=Identity=
An application is submitted under a certain identity that consists of a [[#User|user]] and one or more [[#Group|groups]].
 
<font color=darkkhaki>TO PARSE:
* https://yunikorn.apache.org/docs/user_guide/usergroup_resolution/
* https://yunikorn.apache.org/docs/design/scheduler_configuration/#user-definition
</font>
 
==User==
==Group==
The identity an application is submitted under may be associated with one or more groups.
 
=Plugin Mode=
 
=Resource Manager (RM)=
 
YuniKorn communicates with various implementations of resource management systems (Kubernetes, YARN) via a standard interface defined in the <code>[[YuniKorn_Development#yunikorn-scheduler-interface|yunikorn-scheduler-interface]]</code> package.
 
=Task=
 
=Node=
 
<font color=darkkhaki>Can a node be declared to be part of a partition with the "si/node-partition" label? It seems that the node Attributes partially come from the node labels.</font>
 
In the Kubernetes implementation, the node is first added, then updated.
 
As part of handling the <code>RMNodeUpdateEvent</code>, <code>RMProxy</code> calls <code>callback.UpdateNode()</code>.
 
=Configuration=
 
=Context=
 
yunikorn-k8shim <code>cache.Context</code>
 
=Resource=
=Quantity=
 
=Reservation=
 
=Manual Scheduling=
 
=Policy Group=
 
Set in the scheduler when a new resource manager is registered.
 
=Gang Scheduling=
 
Gang scheduling style can be "hard" (the application will fail after placeholder timeout) or "soft" (after the timeout the application will be scheduled as a normal application).
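
The difference between the two styles can be expressed as a small decision function. This is a sketch under stated assumptions: the function name, its parameters and the returned action labels are invented for illustration and do not correspond to YuniKorn identifiers.

```python
def resolve_gang(style: str, waited_seconds: float, timeout_seconds: float) -> str:
    """Decide what happens to a gang-scheduled application.

    Returns one of three hypothetical action labels:
    "wait", "fail" or "schedule-as-normal".
    """
    if style not in ("hard", "soft"):
        raise ValueError(f"unknown gang scheduling style: {style!r}")
    if waited_seconds < timeout_seconds:
        # The placeholder timeout has not expired yet.
        return "wait"
    if style == "hard":
        # Hard style: the application fails after the placeholder timeout.
        return "fail"
    # Soft style: fall back to scheduling as a normal application.
    return "schedule-as-normal"
```

For example, <code>resolve_gang("soft", 120, 60)</code> falls back to normal scheduling, while the same call with <code>"hard"</code> reports a failure.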
 
 
=Organizatorium=
 

Latest revision as of 22:06, 18 January 2024

Internal

YuniKorn Core

YuniKorn Core Concepts
