CLOCK-Pro

Paper Title: CLOCK-Pro: An Effective Improvement of the CLOCK Replacement

Date: November 5, 2007

(1)

1. What is the problem the authors are trying to solve?
Ans: virtual memory system是近幾十年來重要的一種發明，它讓我們可以不用受到memory size的限制。
而page的replacement policy為很重要的問題，因為發生page fault時的overhead會影響到整體的效能。
增加memory的size雖然可以減少memory和I/O之間的access time，
但即使如此working set也不可能將全部的檔案及資料放入memory中。LRU及CLOCK是過去就在使用的replacement policy，
CLOCK甚至目前廣泛的用在各個作業系統中長達幾十年。但CLOCK也繼承了LRU上的一些效能上的缺點，
而在低Locality的情況時，處理這種情況成了一個很嚴重的問題。

2. What other approaches or solutions existed at the time that this work was done?
Ans: CLOCK為一種近似LRU的page replacement policy。它在memory需要page replacement時，
由先前replaced page的下一個往下找，考慮每一個page，先看該page的reference bit是不是為0，
是的話，就replace 該page；如果不是的話，就把reference bit清為0。
如此即使搜尋完一遍後還是找不到可replace的page時，再往下找一定可以找到可replace的page。
而下一次要作page replacement時，再從replaced page的下一個往下找，為一種Circular的方式。

3. What was wrong with the other approaches or solutions?
Ans: 1，一般的VM page replacements通常在發生page-fault時會付出額外的cost，
這個cost通常會耗數數百或數千個cpu cycles，進而影響到整體的效能。
2，其他近似於LRU的方法，通常為了改善效能都會變得太複雜。

4. What is the authors' approach or solution?
Ans: 和CLOCK不同的是，CLOCK-Pro使用3個pointers，分別是Handhot、Handcold、及Handtest。
Handhot用來指向較常使用的page，當它掃過該page時，會將該page設成cold(把reference bit清為0)。
而Handcold則和CLOCK的pointer相同意義，在page不夠時用來找尋欲替換的page。
Handtest則是用來測試最近的cold page，被它掃過的page將會離開這個circular list。

5. Why is it better than the other approaches or solutions?
Ans: 它很類似CLOCK，可以產生較小的cost。而在Linux的system測試中，
它可以減少program的execution time47%。雖然它比CLOCK用了三倍的pointers，
而使得Linked list倍增，但以廣泛來說，不會增加太多的overhead。

6. How does it perform?
Ans:

7. Why is this work important?
Ans: 過去幾十年來，CLOCK一直是被使用在眾多的系統當中。它是根據LRU的特性去設計的，
自然也有LRU的問題所在。作者提供的這個系統，不僅修正CLOCK的一些問題，
同時它還建構於正確的locality定義上(reuse distance)，且可以很快的適應各種access patterns，而不必有任何的引數。

8. Can any improvement be done?

(2)

Q1. What is the problem the authors are trying to solve?
A: There is a gap that between memory and disk of costs of I/O buffer.
     The worse replacement policies will bring higher cost penalty while the page fault
     occurs , it may consumes over hundreds of CPU cycles.
     Although increasing the memory size can reduce the number of page fault ,but it can
     not cache all file data into memory.

Q2. What other approaches or solutions existed at the time that this work was done?
A: CLOCK works similar to LRU ,it uses reference bit to represent a page accessed.
     LIRS (Low Inter-reference Recency Set) without the performance limitation, and
     it uses reuse distance to distinguish hot pages and cold pages.

Q3. What was wrong with the other approaches or solutions?
A: CLOCK has low performance.
     Although LIRS improves performance ,but it has the unacceptable cost at VM management.

Q4. What is the authors' approach or solution?
A: The authors designed a replacement algorithm called CLOCK-Pro.
    CLOCK-Pro uses like CLOCK circular linked list which places all the accessed page.
    But CLOCK-Pro's replacement policy is different from CLOCK ,it has the same policy
    with LIRS which uses reuse distance to distinguish which page will be replaced.
    CLOCK-Pro place the accessed pages into a single list .The large reuse distance called
    hot pages, and the small reuse distance called cold pages. The cold pages are near list
    head ,and the hot pages are near the list tail.
    CLOCK-Pro will replace the smallest reuse distance page while page fault occurs.
    And remove the cold page entry from list if it never be accessed during the test period.

Q5. Why is it better than the other approaches or solutions?
A: 1.CLOCK-Pro works similar to CLOCK at VM management ,and it has low cost .
     2.CLOCK-Pro provides the performance similar to LIRS.
     3.CLOCK-Pro has less page faults than CLOCK at real I/O and VM.
     4.CLOCK-Pro reduces the time of execution in Linux kernel.

Q6. How does it perform?
A: 1.Testing CLOCK-Pro algorithm how to retain the LIRS performance on I/O.
     2.Testing CLOCK-Pro algorithm executions on virtual memory.
     3.Testing CLOCK-Pro algorithm how to respond on virtual memory and file I/O.

Q7. Why is this work important?
A: CLOCK-Pro algorithm can reduce the time of executing programs in real system.

Q8. Can any improvement be done?
A: It may increase page size to decrease the amount of pages ,and then placing all of
    pages into the list so that it can reduce the time that searching a page at disk
    while without in list.

(3)

1. What is the problem the authors are trying to solve?
作者想要研發更好的 virtual memory replace algorithm 來提升電腦整體效率.

2. What other approaches or solutions existed at the time that this work was done?
有三大類:
1.請 application 先暗示他等下要用什麼 page 來事先作 page.

2.發現 LRU 的效率不高時,把LRU 換成別的方法的 algorithm.(SEQ)

3.分析 history 來作為什麼要被 replace 的參考.(ARC)

3. What was wrong with the other approaches or solutions?
作者提出第二類 SEQ 沒有辦法偵測 loop 問題,而會判斷錯誤.
第三類 ARC 有兩個 list 可是在 hot list 尾巴和 cold list 頭,可能會有誤判,
或是會有不公平的狀況發生.

4. What is the authors' approach or solution?
作者改變 CLOCK 變成 CLOCK-pro Size 為 hot 和 cold 相加的兩倍,這樣可以做更常的
histroy 的分析,(整合 ARC 並且解決不公平的問題), hot 有可能變 cold 再被刪掉, cold
在測試期內如果有被 aceess 到一次就可能變成 hot.

5. Why is it better than the other approaches or solutions?
ARC 的不公平問題,作者用 circle list 解決掉了, ARC 的好處也整合進來了,
我想最主要的原因是因為作者考慮到 reuse distance 一個絕對的參考來判斷一個 page
的熱門度.

6. How does it perform?
程式效能提高 47%,實做在 kernel 2.4.21 中,因為 page fault 明顯比原來少很多,所以
程式執行時間就減少了.

7. Why is this work important?
因為減少 page fault 的次數,所以可以減少中斷程式的次數,提高程式的效率.

8. Can any improvement be done?
No

(4)

Q1. What is the problem the authors are trying to solve?
A: The authors want to design a VM page replacement algorithm, which can
improve CLOCK replacement policy and low overhead requirement.

Q2. What other approaches or solutions existed at the time that this work was
done?
A: The related works are listed in the following:
1. Application-Controlled File Caching Policies
2. Informed Prefetching and Caching
3. Adaptive Page Replacement Based on Memory Reference Behavior
4. The EELRU adaptive replacement algorithm
5. A Low-Overhead, High-Performance Unifi ed Buffer Management Scheme that
Exploits Sequential and Looping References
6. LIRS: An Efficient Low Interreference Recency Set Replacement Policy to
Improve Buffer Cache Performance
7. Predicting Whole-Program Locality through Reuse-Distance Analysis

Q3. What was wrong with the other approaches or solutions?
A:
1. Place too many constraints on the applications.
2. FBR, LRU-2, LRFU and MQ are expensive compared with LRU.

Q4. What is the authors' approach or solution?
A: The authors got idea from LIRS and brings all the much-needed performance
advantages and keeping track of limited number of replace pages.
CLOCK-Pro works in a similar fashion as CLOCK with a VM-affordable cost.

Q5. Why is it better than the other approaches or solutions?
A:
1. CLOCK-Pro is a low cost page replacement algorithm that can be easily
accepted by current systems.
2. CLOCK-pro provides a systematic solution to address the CLOCK problems.
3. CLOCK-pro can fully adaptive to strong or weak access patterns without any
predetermined parameters.
4. Extensive simulation experiments and a prototype implementation show its
significant and consistent performance improvements.

Q6. How does it perform?
A: The authors used trace-driven simulation and prototype implementation to
demostrate CLOCK-Pro performance.

Q7. Why is this work important?
A: It provides low cost page replacement algorithm and reduce execution
time up to 47%.

Q8. Can any improvement be done?
A: no

(5)

Q1.What is the problem the authors are trying to solve?
A:隨著memory和disks,以及cpu的效能差距越來越大,virtual memory的管理也變得越來越重
要,然而關鍵的page replacement policy,CLOCK是佔有優勢的,而這replacement policy是40年
前所發展出來的.CLOCK與LRU相似都是low cost,但最近的三十年,它們無法支援weak locality
存取變得越來越嚴重.

Q2.What other approaches or solutions existed at the time that this work was done?
A:
1."Application-Controlled File Caching Policies" and "Informed Prefetching and
Caching":applications需要明確提供未來要存取的hints.
2.SEQ, EELRU and UBM:當發現存取模式不符LRU時,就切換成其它有效率的replacement policy.
3.FBR, LRFU, LRU-2, 2Q, MQ, LIRS and ARC:追查及利用較多的存取資訊.

Q3.What was wrong with the other approaches or solutions?
A:前2種approaches讓applications有許多限制,無法應用在一般OS的VM management,而第3種
approaches相對於LRU(Least Recently Used)需較高的cost.

Q4.What is the authors' approach or solution?
A:他們從I/O buffer cache replacement algorithm, LIRS得到靈感,設計了CLOCK-Pro,改善
CLOCK replacement.

Q5.Why is it better than the other approaches or solutions?
A:
1.CLOCK-Pro與CLOCK的工作方式相似,不會造成VM management太大的負擔.
2.CLOCK-Pro將LIRS高效能帶進CLOCK.
3.CLOCK-Pro可以適應大範圍變化的workloads存取模式,而不用事先定義參數.
4.經過模擬,CLOCK-Pro降低了page fault的發生.
5.在Linux kernel運作的CLOCK-Pro,讓幾個一般使用的程式運作時間減少超過47%.

Q6.How does it perform?
A:在GLIMPSE及MULTI2下的Hit Ratio,與LIRS接近略低於理想值,優於CAR及CLOK,Page Faults per
Million Intructions 優於或近於CAR及CLOCK的次數;在LINUX kenel 2.4.21加上CLOCK-Pro,可以
降低GNUPLOT,GZIP及GAP的Page Faults發生次數,也減少了它們的執行時間.

Q7.Why is this work important?
A:memory是有限的,disks的速度相對的是非常緩慢的,若是VM management的page fault發生的
太頻繁,系統存取disks的次數增加,運作的效能將被disks的速度所支配.

Q8.Can any improvement be done?
A:No.

(6)

Q1. What is the problem the authors are trying to solve?
A:
作者為了在Memory management中，改善page replacement的效能，建立一個新的replacement
policy來取代已經使用了數十年的LRU、CLOCK replacement policy。
有別於之前的研究多數是以改善LRU為主，作者提出一個不同觀點的replacement policy，以
避免掉LRU本身存在的缺點。

Q2. What other approaches or solutions existed at the time that this work was done?
A:
從近50年來提出的方法，大都是針對LRU的問題進行改善，一般分為三類：
1.需要應用程式明確地提供將來存取的提示，例如application controlled file caching and
application-informed prefetching and caching。
2.明確地偵測到存取樣本失效LRU及適應交換到其他有效的替換，例如SEQ，EELRU，UBM。
3.追蹤及利用過去存取的資訊，例如FBR, LRFU, LRU-2, 2Q, MQ, LIRS, ARC。

Q3. What was wrong with the other approaches or solutions?
A:
LRU與CLOCK存在以下問題：
- 在memory scan 時，會造成所有現存在運作的pages被沖出記憶體。
- 在程式記憶體及I/O buffer cache之間不平衡地分配起因於replacement policy處理在buffer
caches裡不常被存取的pages時不起作用。
- loop的問題，如果page無法完全適合memory時，會造成重覆的page fault也不會再次命中page。

Q4. What is the authors' approach or solution?
A:
CLOCK-Pro是作者提出的方法，它的原理是利用reuse distance來取代recency成為replacement
的決定條件。
作者用該page存取時間的resuse distance將pages分成二類；cold page具有較大的resuse
distance，而hot page有較小的resuse distance，以分辨page的使用度。
CLOCK-Pro的資料結構中含有三個hands：HAND(hot)指向hot list的尾端，用來決定page是不是
要轉為cold page；HAND(cold)指向last resident的cold page，它決定了從何處開始搜尋victim
page；HAND(test)在test period內指向最後一個cold page，用來結束cold pages的test period。

Q5. Why is it better than the other approaches or solutions?
A:
CLOCK-Pro經由與LRU不同的replacement policy徹底改善LRU的效能限制，同時維持了LRU的
低消耗性。它不只解決了scan及loop的問題，也可以使用reuse distance量化page本身的locality
strengths來區分page。

Q6. How does it perform?
A:
step1.Simulation on I/O Buffer Caches:
CLOCK-Pro及LIRS的命中率十分接近，顯示出CLOCK-Pro得到了LIRS的優點。同時也展現出較CAR及
CLOCK為高的命中率。
step2.Simulation on Memory for Program Executions:
實驗結果充份顯示出CLOCK-Pro在programs with weak locality勝過CLOCK。
page fault ratios也非常接近OPT。
step3.Simulation on Program Executions with Interference of File I/O:
1.在strong locality program中，CLOCK-Pro and CAR都能有效的防止程式從I/O存取介面執行。
2.在weak locality program中，只有CLOCK-Pro可以可以防止程式從介面執行。

Q7. Why is this work important?
A:
作者採用了原本使用在I/O buff cache 的技術(LIRS)來改善VM management，這個
方法(CLOCK-Pro)產生以下幾點貢獻：
(1) CLOCK-Pro與CLOCK的動作相似，可以很容易的套用到VM management
(2) CLOCK-Pro從LIRS中得到了最需要幾項效能優點，將之帶入CLOCK。
(3) 不需要任何的預先決定參數，CLOCK-Pro使自已適應於改變的access pattern以服務更多的
workloads。
(4) 經過大量的模擬，證明了CLOCK-Pro在significant page fault的減少上和其他的VM
replacement algorithm一樣好。
(5) 在Linux kernel上實行CLOCK-Pro的測量結果，執行常用的程式時，最高可以減少47%的執行
時間。

Q8. Can any improvement be done?
A: No

(7)

1.What is the problem the authors are trying to solve?

想要解決VM paging的問題，希望降低cache miss，增加cache hit ratio

2.What other approaches or solutions existed at the time that this work was done?

在本文作者提出這個方法之前，已經有很多人針對VM paging提出很多解決方法，分別如下

CLOCK

pure LRU

LIRS

CAR

3.What was wrong with the other approaches or solutions?

其他的VM paging，有一個很大的缺點就是，沒有把page分為cold page and hot page，也就是說cold page容易取代

hot pgae，造成cache miss ratio上升

4.What is the authors' approach or solution?

本文作者提出的方法是CLOCK-PRO，是改良自CLOCK這個方法。CLOCK-PRO的運作原理是把所有page，頭尾串在一起，像

一個clock一樣，而且有3個pointer，分別為Hand-hot，Hand-cold，Hand-test，Hand-hot pointer所指向的是常用的

page，Hand-cold所指向的是不常用的page，將來有新的page要進來，將優先從Hand-cold所指向的page移出，以便有新

的空間容納新的page，而Hand-test pointer是每隔一段時間用來測試cold page當中有那些是將會用到，如果會用到，

就會將cold page移到hot page中

5.Why is it better than the other approaches or solutions?

因為CLOCK-PRO將page分為cold page and hot page，而且還會把buffer分開，一個是process專用，一個是I/O專用，分

開的好處是可以大幅降低cache miss。

6.How does it perform?

根據Abstract中提及，CLOCK-PRO在Linux kernel 2.4.21中實作，根據測量的結果，利用CLOCK-PRO這個方法可以使程式

加快執行速度達47%

7.Why is this work important?

因為我們都知道IO相較於main mem，速度相差千倍以上，可說是電腦效能的主要bottleneck，如果可以妥善利用VM，

減輕IO loading，增加cache hit ratio，將可以大幅提升電腦整體效能，而且只需在OS Layer實作即可，不用更改

目前的architecture。

8.Can any improvement be done?

非常的困難

(8)

Q1.What is the problem the authors are trying to solve?
A:
They try to solve the inefficiency method "LRU" an algorithm that decide to leave or clear the data in the memory.
As the memory size is limited, and an efficient way to keep the data which will be reuse later in memory is very important.
Because the cost of page fault can't be ignored.

Q2.What other approaches or solutions existed at the time that this work was done?
A:
Type one: These approaches try to request the applications offer relative hints for reference.
a."Application-Controlled File Caching Policies"
b."Informed Prefetching and Caching"
Type two: Detecting the failing patterns LRU to select other algorithms
a.SEQ
b.EELRU
c.UBM
Type three:Analyze history data to enhance the precision of chaching
a.FBR
b.LRFU
c.LRU-2
d.2Q
e.MQ
f.LIRS
g.ARC

Q3.What was wrong with the other approaches or solutions?
A:
Other approaches either with higher cost or even worse performance than LRU.
Only Approaches "ARC" and "LIRS" perform better than LRU.

Q4.What is the authors' approach or solution?
A:
This approach is similiar with the approach "Clock".
But this approach use different decision factor to replace the page.
This approach use "reuse distance" rather than "recency" to replace the cold page.
"Reuse distance" is the accessed time relates to other pages' last accessed time.
This value catagorize these pages as hot page, cold page and non-resident cold page.
A list records all the pages(hot and cold) and ordered by their recencies.
Cold pages will move out the list if they have not be accessed with a test period.
This approach use three pointers(Hand) to indicate the largest recency for hot page,cold page and non-resident cold page.
All there hand move in clockwise direction. And pages will be changed status due to thier attributes (reference bit and access time ..etc.).
Authors built a dynamic list for cold pages and hot pages and they believe that could enhance the performance.

Q5.Why is it better than the other approaches or solutions?
A:
Other approaches will take more cost to solve the replacement algorithm.
Some approaches need to coordinate with application and make things more complicate and more resources.
And there approaches mostly try to solve LRU not Clock.

Q6.How does it perform?
A:
This approach is compared with other approaches like “CLOCK, LIRS, CAR and OPT.”
The page fault rate is slightly lower than other approaches.
If the running program is a strong locality program, this approach and approach “CAR” could demonstrate their improvement but “CLOCK” failed.
But if the running program is a weak locality program, only this approach could get better performance than others.

Q7.Why is this work important?

A:
This approach could be implemented in modern OS with very low cost.
This approach has more accurate locality definition.
It could easily solve the weak access pattern issue.

Q8.Can any improvement be done?
A:
In my opinion, no further improvment can be done.

(9)

Q1. What is the problem the authors are trying to solve?
對與使用VM的重要課題之一是減少page fault, LRU是十分有效的演算法, 已使用了近四十年,
但是LRU類型的演算法對與weak locality的擊中率並不是很有效率, 作者想提出一個對於strong
locality與weak locality都有不錯擊中率的演算法

Q2. What other approaches or solutions existed at the time that this work was done?
LRU,SEQ,EELRU,UBM,FBR,LRFU,LRU-2,W1,MQ,LIRS,ARC

Q3. What was wrong with the other approaches or solutions?
以LRU為基礎的演算法都繼承了LRU的缺點: 對於weak locality並不是很好, LIRS則是相反, 在
weak locality情況下不錯, 但在strong locality下卻不是很好

Q4. What is the authors' approach or solution?
作者以LIRS為基礎發展clock-pro
1.此clock是一個首尾相接的linked list, page分為hot page和cold page
2.如同LIRS, 使用reuse distance來判定hot或cold, distance最大的hot page會變成cold page,
cold page如果再沒被用到會變為non-resident, 當此cold page超過test period, 則被移除.
3.有三個hand, 順時針處理, hand-hot指向distance最大的hot page, 它會將hot page 轉為cold
page, hand-cold指向distance最小的resident cold page, 並將之變為non-resident, hand-test
會指向第一個cold page(non-resident), 如果該cold page在test period裡沒有被access, 則會被
移除, 如果有, 它會變成hot page.

Q5. Why is it better than the other approaches or solutions?
不管是使用較密集的page, 或是較疏遠的page, 只要有用過, 都有機會轉變為hot page, 因此不管
在weak locality或strong locality都不會太差

Q6. How does it perform?
作者跑了12種各式的程式, 有weak locality的也有strong locality的, 只有少數程式clock-pro稍
微落後clock(LRU based), 大多數的情況clock-pro都大幅領先clock, 表示在weak locality情況下
clock-pro有很好的表現, 在strong locality的情況下也有不錯的結果.
另外實做到linux kernel裡, 與原系統比較, 在大約170~120MB以前, 執行速度比原系統快, 之後則
一般.

Q7. Why is this work important?
與其使用在某種狀況下才有最佳表現的演算法, 不如使用在各種狀況下都有不錯表現的演算法, 尤其
是在記憶體有限的情況下

Q8. Can any improvement be done?
這些演算法會在記憶體極大時失去意義, 但對於嵌入式系統會很重要. 作者的實驗結果只有在大約120
到170MB以前才能和舊系統拉開差距, 現今的電腦大多配1G以上的記憶體, 因此優劣的判別將會放再是否容易
實作.