replayfs

Accurate and Efficient Replaying of File System Traces

Date: December 10, 2007

(1)

Q1. What is the problem the authors are trying to solve?
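A file system trace should faithfully reflect how the system actually operated and make it possible to reproduce that behavior. The authors want a solution that is faster and more accurate than existing tools.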

Q2. What other approaches or solutions existed at the time that this work was
done?
NFS-level traces; user-level traces (Buttress, DFSTrace); device-driver-level traces.

Q3. What was wrong with the other approaches or solutions?
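NFS traces are not fine-grained enough. User-level tracing cannot record memory-mapped reads and writes and is also slower: Buttress runs 10% slower than the original program, and DFSTrace provides no replay capability. Device-driver-level tracing records hardware operations and cannot show how they relate to the rest of the system.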

Q4. What is the authors' approach or solution?
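The authors place the tracer below the VFS and above the lower file system, built into the kernel. The advantages are speed and the ability to record information that user-mode tools cannot capture. The system has three parts: tracefs, the trace compiler, and replayfs.
tracefs: records VFS API calls.
trace compiler: collects and organizes the data produced by tracefs, then hands it to replayfs for execution.
replayfs: acts as the VFS and issues operations to the lower file system.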
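The stacking described above can be pictured with a small user-space analogy. The Python sketch below assumes a toy in-memory `LowerFS` and a hypothetical `TracingLayer` that plays the role of tracefs: every call is forwarded unchanged to the lower layer, and a raw record (timestamp, operation, arguments, return value) is appended to a list. The names and record layout are illustrative only; the real tracefs is a Linux kernel module operating on VFS objects.

```python
import time
from typing import Any, Callable


class LowerFS:
    """Toy in-memory 'lower file system' (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self.files: dict[str, bytearray] = {}

    def create(self, path: str) -> int:
        self.files[path] = bytearray()
        return 0

    def write(self, path: str, offset: int, data: bytes) -> int:
        buf = self.files[path]
        end = offset + len(data)
        if len(buf) < end:
            buf.extend(b"\0" * (end - len(buf)))  # grow the file, zero-filling any hole
        buf[offset:end] = data
        return len(data)

    def read(self, path: str, offset: int, length: int) -> bytes:
        return bytes(self.files[path][offset:offset + length])


class TracingLayer:
    """Hypothetical stand-in for tracefs: forwards each call and logs a raw record."""

    def __init__(self, lower: LowerFS) -> None:
        self.lower = lower
        self.trace: list[dict[str, Any]] = []

    def _forward(self, op: str, fn: Callable[..., Any], *args: Any) -> Any:
        ts = time.monotonic()
        result = fn(*args)  # pass the request down to the lower layer unchanged
        self.trace.append({"ts": ts, "op": op, "args": args, "ret": result})
        return result

    def create(self, path: str) -> int:
        return self._forward("create", self.lower.create, path)

    def write(self, path: str, offset: int, data: bytes) -> int:
        return self._forward("write", self.lower.write, path, offset, data)

    def read(self, path: str, offset: int, length: int) -> bytes:
        return self._forward("read", self.lower.read, path, offset, length)


fs = TracingLayer(LowerFS())
fs.create("/a")
fs.write("/a", 0, b"hi")
print(fs.read("/a", 0, 2), [r["op"] for r in fs.trace])  # prints: b'hi' ['create', 'write', 'read']
```

Because the layer only forwards and records, the caller sees exactly the results it would get from `LowerFS` directly; the cost of tracing is the extra bookkeeping per call.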

Q5. Why is it better than the other approaches or solutions?
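No one had previously demonstrated replay at the VFS level. The closest approach to compare against is user-level tracing, which can only trace system calls: it cannot record memory-mapped I/O, it must copy data between user mode and kernel mode because of context switches, and it runs slower than the original program. Implementing the tracer and replayer in the kernel removes these problems.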

Q6. How does it perform?
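1. The authors recorded file system activity while Am-utils build, Postmark, Pread, and other programs ran; the activity was indeed captured.
2. When replaying, the Pread trace replayed 32% faster than the original program, and 61% faster when the copy between the user-mode buffer and kernel pages was removed. By contrast, the user-mode system call trace replayer was slower than the original program.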

Q7. Why is this work important?
Completely recording file system activity and reproducing it with minimal interference is important for benchmarking and for detecting attacks.

Q8. Can any improvement be done?
None.

(2)

Q1.What is the problem the authors are trying to solve?
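A: Replaying traces is used for benchmarking, stress-testing, and debugging systems, but system call replayers miss memory-mapped operations and cannot replay I/O-intensive workloads at their original speed. Traces captured at other levels miss important, useful information that is available only at the file system level. The authors exploit the uniformity of the VFS API to replay an original trace on other file systems, including ones different from the file system on which it was captured.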

Q2.What other approaches or solutions existed at the time that this work was done?
A: Existing approaches fall into four categories: system call, network, disk device, and network file system.
1. strace software home page: strace uses the ptrace system call to capture the sequence of invoked system calls.
2. Tracefs: A File System to Trace Them All; File System Usage in Windows NT 4.0; A Comparison of File System Workloads: these traces include memory-mapped operations.
3. The TCPDump/Libpcap site; New NFS Tracing Tools and Techniques for System Analysis; NFS Tracing by Passive Network Monitoring: these capture and preprocess network file system packets.
4. UNIX Disk Access Patterns: driver-level traces.

Q3.What was wrong with the other approaches or solutions?
A:
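1. System call replayers miss memory-mapped operations and cannot replay I/O-intensive workloads at their original speed.
2. They cannot reproduce the timing and in-memory conditions related to file system modifications.
3. User-mode tools cannot replay high-performance programs with high I/O rates, because they need more memory and incur more CPU overhead.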

Q4.What is the authors' approach or solution?
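A: The authors developed a new replay system called Replayfs. It is built in the kernel, above the file systems, and uses the uniform VFS API to replay traces captured by Tracefs. It uses a user-level trace compiler that does its work offline, together with the Replayfs runtime kernel module that performs the replay online.
1. It captures and replays all file system operations, including the important memory-mapped operations, resulting in more accurate replay.
2. It can access important internal kernel caches, avoid unnecessary data copying, reduce context switches, and prefetch trace data efficiently.
3. It controls thread scheduling precisely, which lets it do useful work during the otherwise wasted pre-spin period; this technique is called productive pre-spin.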
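The productive pre-spin idea in item 3 can be sketched in user space. This is a minimal Python illustration under stated assumptions: the function name `wait_until`, the 2 ms default `spin_margin`, and the queue of small background tasks are all hypothetical, and Replayfs itself does this inside the kernel with its own scheduling. The thread sleeps through the coarse part of the wait, and during the final spin window, instead of busy-waiting idly, it runs short queued jobs such as trace prefetching.

```python
import time
from collections import deque
from typing import Callable, Deque


def wait_until(target: float,
               background: Deque[Callable[[], None]],
               spin_margin: float = 0.002) -> float:
    """Block until time.monotonic() >= target and return the time actually reached.

    The coarse part of the wait uses sleep(), which may wake up late, so it
    stops spin_margin seconds early. The remaining window is a pre-spin; rather
    than spinning idly, it runs queued short tasks (e.g. hypothetical prefetch jobs).
    """
    now = time.monotonic()
    if target - now > spin_margin:
        time.sleep(target - now - spin_margin)
    while True:
        now = time.monotonic()
        if now >= target:
            return now
        if background:
            background.popleft()()  # productive pre-spin: one small unit of work
        # otherwise keep spinning until the deadline
```

Each background task must be much shorter than the spin window; a long task would overrun the deadline and reintroduce the timing error the spin is meant to remove.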

Q5.Why is it better than the other approaches or solutions?
A: Replayfs replays high-I/O-rate traces accurately with very little overhead. The overheads of other replayers prevent them from replaying accurately, especially at high event rates.

Q6.How does it perform?
A: On the same hardware:
1. Replayfs replayed the Pread trace 32% faster than the original program.
2. Replayfs-nocopy replayed the Pread trace 61% faster than the original program.
The system call replayer, by contrast, was slower than the original program.

Q7.Why is this work important?
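A: Accurately replaying captured traces allows system performance to be evaluated more precisely, helps bring systems closer to flawless when debugging, and helps identify the best of several candidate solutions.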

Q8.Can any improvement be done?
A: No.

(3)

1. What is the problem the authors are trying to solve?
The authors want to build a replayer that is aware of memory state, replays traces accurately, and does not consume too many resources.

2. What other approaches or solutions existed at the time that this work was done?
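1. Buttress and DFSTrace can replay kernel-level activity at user level, but they perform poorly under high I/O loads.
2. Capturing network packets and resending them.
3. Drive-Thru captures driver-level activity and replays it at user level to examine performance.
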
3. What was wrong with the other approaches or solutions?
TRACER:
syscall-level tracers: cannot see what memory is doing; they only capture system calls (strace, DFSTrace).
virtual file system level: can see memory state (Linux, Windows NT).
network tracers: capture packets.
driver-level tracers: used to study disk layout.

REPLAYER:
syscall replayers replaying high-I/O traces get the timing wrong and are inefficient.
If the replayer runs at user level, it consumes a lot of memory and CPU, which makes the replay inaccurate.

4. What is the authors' approach or solution?
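The authors aim to build a replayer that is more accurate than syscall-level replayers.
This replayer sits at the same level as Tracefs: given a trace from Tracefs, it can call the file system directly. A syscall-level replayer, in contrast, must process the data and then go through the VFS before reaching the lower file system, which introduces inaccuracy.

1. A user-level compiler simplifies the data captured by Tracefs.
2. It records memory state.
3. It can access relevant kernel data.
4. It can access file system data.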

5. Why is it better than the other approaches or solutions?
Because the authors' replayer runs inside the kernel, it can obtain memory state directly and can directly control the file system.

6. How does it perform?

7. Why is this work important?
This replayer uses very few resources and passes VFS requests very precisely to the underlying file system, which is a great help in evaluating file systems.

8. Can any improvement be done?
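A kernel-level tool will probably remain some distance from being as easy to use as a user-level tool.

To control the file system directly, this tracer might need a separate version for each file system, or the VFS itself would have to be modified. If only one version is built, the processing time of the VFS layer remains, unless that time is short enough to ignore.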

(4)

Q1. What is the problem the authors are trying to solve?
A:
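Replaying traces is a way to measure performance; one advantage is that it can accurately reproduce the sequence of operations.
Current trace capture and replay systems operate at different levels. As a result, replay cannot accurately reflect how the system really behaved when the trace was captured.
To reproduce the conditions at capture time more accurately, the authors designed a file system that runs in the kernel and replays the system's past activity.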

Q2. What other approaches or solutions existed at the time that this work was done?
A:
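Trace capture can be performed at the level of network packets, disk device drivers, network file systems, or system calls. In the Related Work section, the authors mention strace as a common tool for capturing system calls; it uses the ptrace system call to capture the sequence of system calls.
Buttress and DFSTrace can replay system call traces at user level.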

Q3. What was wrong with the other approaches or solutions?
A:
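The authors note that:
- Network tracers cannot capture requests that are satisfied from client-side or file system caches.
- Device driver tracers capture raw disk requests but cannot distinguish file system metadata events from data-related events.
- System call replay adds overhead to the system, so it cannot keep up with high-speed I/O workloads and is limited to replaying at lower I/O rates.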
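Why per-event overhead limits the replayable I/O rate can be shown with a toy model. The sketch below assumes a single replay thread with a fixed processing cost per event (an idealization; real overheads vary), and `replay_lag` is a hypothetical helper that reports how late each event is issued. Timestamps and overhead use the same integer unit, for example microseconds.

```python
def replay_lag(timestamps: list[int], overhead: int) -> list[int]:
    """Return how late each event is issued by a single-threaded replayer
    that spends `overhead` time units processing every event.

    An event is issued at its original timestamp, or as soon as the replayer
    has finished the previous event if that happens later.
    """
    lags: list[int] = []
    free_at = None  # when the replayer can issue its next event
    for ts in timestamps:
        issue = ts if free_at is None else max(ts, free_at)
        lags.append(issue - ts)
        free_at = issue + overhead
    return lags


print(replay_lag([0, 1000, 2000, 3000], 500))   # [0, 0, 0, 0]
print(replay_lag([0, 1000, 2000, 3000], 2000))  # [0, 1000, 2000, 3000]
```

When the overhead is smaller than the gap between events, every event is issued on time; once it exceeds the gap, the lag grows with every event, so a high-overhead replayer must either fall behind or replay at a lower rate.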

Q4. What is the authors' approach or solution?
A:
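The authors propose a system that operates at the VFS level, called Replayfs. It replays, at the VFS level, traces captured by Tracefs.
Replayfs is placed at the same level as Tracefs so that replay runs under the same conditions as capture. It is not a real file system; its purpose is to reproduce the VFS behavior observed at capture time. It acts as if the VFS were interacting directly with the lower file system.
Raw traces and replay traces are inherently different: the raw tracer is designed to be portable across different environments, and because it must describe events exactly, its output is verbose.
To convert raw traces into traces suitable for replay, the authors created a trace compiler. It transforms the raw trace into a replay trace made of three components:
Command: the sequence of VFS operations, with associated information such as timestamps, process IDs, arguments, and expected return values.
Resource Allocation Table (RAT): the in-memory addresses of the VFS objects handled by Tracefs and Replayfs cannot be known in advance, and these objects are shared among commands, so the RAT lets commands refer to their arguments and return values indirectly.
Buffer: holds the data buffers needed during replay.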
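The split into Command, RAT, and Buffer can be illustrated with a small Python sketch of what a trace compiler might do. It assumes a hypothetical raw record format (`RawRecord`) in which VFS objects are identified by their capture-time kernel addresses; the hypothetical `compile_trace` replaces each address with a RAT slot index and moves data payloads into one contiguous buffer. It deliberately ignores address reuse after an object is freed, which a real compiler would have to handle.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RawRecord:
    """Hypothetical raw trace record: objects named by capture-time addresses."""
    ts: float
    pid: int
    op: str
    obj_addr: int                   # address of the VFS object operated on
    ret_addr: Optional[int] = None  # address of a newly returned object, if any
    data: bytes = b""


@dataclass
class Command:
    ts: float
    pid: int
    op: str
    obj_slot: int                   # RAT index instead of a raw address
    ret_slot: Optional[int] = None
    buf_off: int = 0                # where this command's data starts in the buffer
    buf_len: int = 0


@dataclass
class CompiledTrace:
    commands: list[Command] = field(default_factory=list)
    rat_size: int = 0
    buffer: bytearray = field(default_factory=bytearray)


def compile_trace(raw: list[RawRecord]) -> CompiledTrace:
    out = CompiledTrace()
    slot_of: dict[int, int] = {}  # capture-time address -> RAT slot

    def slot(addr: int) -> int:
        if addr not in slot_of:
            slot_of[addr] = len(slot_of)  # first sighting gets the next free slot
        return slot_of[addr]

    for r in raw:
        off = len(out.buffer)
        out.buffer += r.data  # payloads are packed into one contiguous buffer
        out.commands.append(Command(
            ts=r.ts, pid=r.pid, op=r.op,
            obj_slot=slot(r.obj_addr),
            ret_slot=None if r.ret_addr is None else slot(r.ret_addr),
            buf_off=off, buf_len=len(r.data)))
    out.rat_size = len(slot_of)
    return out
```

At replay time, a replayer would allocate a table of `rat_size` entries, store each newly created object in the slot named by `ret_slot`, and look up `obj_slot` to find the live object a command operates on.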

Q5. Why is it better than the other approaches or solutions?
A:
Because of its added overhead, a system call replayer cannot replay at the highest I/O rates. Replayfs can, and because it runs at the VFS level and uses the VFS API, it can be used without modifying the file system.

Q6. How does it perform?
A:
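In the memory overhead comparison, the trace size was reduced by 56% for Am-utils, 70% for Postmark, and 45% for Pread.
Comparing replay times under different configurations with Pread as the baseline, replaying with the system call replayer took 10% longer than the original Pread run, while every other configuration took less time than Pread itself. This shows that Replayfs can replay traces accurately even when the system runs at full speed.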

Q7. Why is this work important?
A:
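Accurate replay can be used for stress testing under realistic conditions; replaying selected portions of a trace can narrow the search space effectively when debugging; and replayed file system traces can serve as a reference for fine-grained versioning.
Replayfs is the first replay system that runs at the VFS level, which is a major breakthrough.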

Q8. Can any improvement be done?
A: No

(5)

Q1. What is the problem the authors are trying to solve?
A: The authors want to design the first VFS-level replayer, called Replayfs.
Such a replayer is useful for file system benchmarking, stress-testing,
debugging, and forensics.

Q2. What other approaches or solutions existed at the time that this work was
done?
A: The related works are listed as follows:
1.File System Aging - Increasing the Relevance of File System Benchmarks.
2.A Versatile and User-Oriented Versioning File System.
3.Metadata Efficiency in Versioning File Systems.
4.Tracefs: A File System to Trace Them All.
5.A Comparison of File System Workloads.

Q3. What was wrong with the other approaches or solutions?
A:
1. System call replayers miss memory-mapped operations and cannot replay
I/O-intensive workloads at their original speeds. Traces captured at other
levels miss vital information that is available only at the file system level.
2. Existing versioning file systems cannot reproduce the timing and in-memory
conditions related to file system modifications.
3. User-mode tools cannot replay the highest possible I/O rates, or spikes of
such activity, because of their overheads.
4. Some kernels are not preemptive and have long execution paths.

Q4. What is the authors' approach or solution?
A:
The authors' goal is to design a VFS-level replayer. A trace compiler converts
and optimizes the raw Tracefs traces, splitting each raw trace into three
components: Command, Resource Allocation Table (RAT), and Buffer. The authors
chose the fixed horizon algorithm for data prefetching, reuse threads and use
timers for replay precision, and avoid copying read data to user space.
Replayfs supports three replay modes for read operations, to avoid
inconsistency between the RAT entries and the actual dentries.
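The fixed horizon prefetching policy mentioned above can be sketched as a schedule. Under the assumptions that the trace is consumed sequentially in fixed-size blocks and that the horizon is a tunable constant (the values used here are arbitrary, not taken from the paper), each block is requested exactly `horizon` accesses before it is needed. `fixed_horizon_plan` is a hypothetical helper that lists which blocks to request at each step.

```python
def fixed_horizon_plan(n_blocks: int, horizon: int) -> list[list[int]]:
    """For each step i (about to consume block i), list the blocks to request.

    Step 0 requests block 0 plus the initial window of `horizon` blocks; every
    later step i requests block i + horizon, so each block beyond the initial
    window is requested exactly `horizon` accesses before it is consumed.
    """
    if horizon < 0:
        raise ValueError("horizon must be non-negative")
    plan: list[list[int]] = []
    for i in range(n_blocks):
        if i == 0:
            plan.append(list(range(min(horizon, n_blocks - 1) + 1)))
        elif i + horizon < n_blocks:
            plan.append([i + horizon])
        else:
            plan.append([])  # every remaining block is already in flight
    return plan


print(fixed_horizon_plan(5, 2))  # [[0, 1, 2], [3], [4], [], []]
```

The first step fills the window, and each later step requests one new block. A larger horizon hides more disk latency at the cost of keeping more prefetched trace data in memory.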

Q5. Why is it better than the other approaches or solutions?
A:
There are three benefits:
1. It captures and replays all file system operations, including important
memory-mapped operations.
2. It avoids unnecessary data copying, reduces the number of context switches,
and optimizes trace data prefetching.
3. It has precise control over thread scheduling (productive pre-spin).

Q6. How does it perform?
A: The authors used the Am-utils build, Postmark, and Pread benchmarks to
generate workloads and recorded CPU time consumption.

Q7. Why is this work important?
A:
Because it is the first VFS-level trace replayer, and it reduces context
switches and improves trace data prefetching.

Q8. Can any improvement be done?
A: No

(6)

1. What is the problem the authors are trying to solve?
A:
Replaying traces is a time-honored method for benchmarking, stress-testing, and debugging systems.
However, high I/O rates make traces difficult to reproduce, and even when a trace can be reproduced,
capturing it may impose high overhead on the system.

2. What other approaches or solutions existed at the time that this work was done?
A:
The related work covers capturing traces, trace replaying, file system state versioning, data prefetching,
and timing inaccuracy.

3. What was wrong with the other approaches or solutions?
A:
System call tracers cannot keep up with I/O-intensive workloads, may miss memory-mapped operations, and add
overhead when replaying I/O requests. System call, network, and driver-level tracers all fail to capture
file system activity accurately. Traces are most useful when captured at the file system level; traces from
other levels are less useful because they may miss important information.

4. What is the authors' approach or solution?
A:
The authors designed a replay system called Replayfs. Although Replayfs works in kernel mode, it is not
a file system.
Tracefs is a stackable file system that also works in kernel mode. It captures the original file system
requests as they pass from user mode into the kernel. These captured requests, called raw traces, contain
all file system activity information. A trace compiler optimizes the raw traces into Replayfs traces and
passes them to Replayfs, which uses them to replay the file system activity.
A Replayfs trace has three components: commands, a resource allocation table (RAT), and memory
buffers.

5. Why is it better than the other approaches or solutions?
A:
Replayfs's goal is to replay the original file system workload efficiently.
Because Replayfs works at the VFS level, it can replay all file system operations
without missing important information.
The VFS level is where file system activity can be captured accurately.

6. How does it perform?
A:
The authors designed a statistics module to record and evaluate Replayfs's timing.
They used the Am-utils build as a CPU-intensive workload, Postmark to simulate an I/O-intensive workload,
and Pread to evaluate Replayfs's CPU time consumption.
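Separating elapsed time from CPU time is the basic measurement behind such comparisons. The Python sketch below is a user-space stand-in for a statistics module, not the authors' kernel code; `measure` is a hypothetical helper built on the standard `time.perf_counter` and `time.process_time` clocks. The `wait` figure assumes a single-threaded workload, where time not spent on the CPU is mostly I/O wait or sleeping.

```python
import time
from typing import Callable


def measure(fn: Callable[[], object]) -> dict[str, float]:
    """Run fn once; report wall-clock, CPU, and non-CPU ('wait') time in seconds."""
    wall0 = time.perf_counter()
    cpu0 = time.process_time()  # user + system CPU time of this process
    fn()
    elapsed = time.perf_counter() - wall0
    cpu = time.process_time() - cpu0
    # For a single-threaded workload, time not spent on the CPU is mostly
    # spent blocked (I/O wait, sleeping, or waiting to be scheduled).
    return {"elapsed": elapsed, "cpu": cpu, "wait": elapsed - cpu}


stats = measure(lambda: time.sleep(0.05))
print(f"elapsed={stats['elapsed']:.3f}s cpu={stats['cpu']:.3f}s")
```

A sleep-dominated run shows almost no CPU time, while a replay that burns CPU on copying or context switches shows up as a larger `cpu` share of the same elapsed time.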

7. Why is this work important?
A:
Replaying traces helps analyze or debug a system by reproducing its trace
and tracking the system's behavior.

8. Can any improvement be done?
A: Does Replayfs still maintain high performance when the machine has a higher I/O rate?

(7)

1.What is the problem the authors are trying to solve?

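The authors observe that measuring file system performance is difficult: the I/O reads and writes differ on every run, so measurements may vary greatly from one run to the next, and a particular measurement cannot be reproduced. They therefore propose a file system that can trace I/O reads and writes, which can not only reproduce the I/O of a particular run but also support benchmarking, stress-testing, debugging systems, and forensic analysis.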

2.What other approaches or solutions existed at the time that this work was done?

No other approaches or solutions existed.

3.What was wrong with the other approaches or solutions?

No other approaches or solutions existed.

4.What is the authors' approach or solution?

5.Why is it better than the other approaches or solutions?

No other approaches or solutions existed.

6.How does it perform?

The first component is Tracefs, a stackable file system that sits between the VFS and the lower file system. Tracefs receives requests from the VFS and passes them on to the lower file system.

7.Why is this work important?

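This approach lets people measure each file system's performance accurately, perform intensive stress testing and debugging on newly developed file systems, and compare the performance of different file systems objectively.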

8.Can any improvement be done?

No

(8)

Q1.What is the problem the authors are trying to solve?
A:
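Many results in the papers we have read were obtained from traces (historical records), and for most research, using traces should not introduce much discrepancy.
For file systems, however, memory mapping and disk mapping are involved, so results obtained by replaying a trace may differ from the actual results (when the workload is run on a machine and the trace is then replayed on the same machine). The goal of this work is to make results obtained by replaying traces more accurate.
Some studies use machine-generated synthetic workloads as input, but the behavior they exhibit usually does not match the workload produced by real operation, so traces are generally preferred as test input.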

Q2.What other approaches or solutions existed at the time that this work was done?
A:
Replaying traces to test file system performance has been practiced for years. Representative approaches include:
[1] Strace [2] DFSTrace [3] Tracefs [4] Buttress [5] TIP2 [6] Replacement Algorithm for Virtual Storage Computers [7] TPC

Q3.What was wrong with the other approaches or solutions?
A:
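Because traces are replayed on different hardware, a trace captured on another machine does not contain low-level operational information, such as memory addresses and execution order, which can cause discrepancies.
The differences may not be large, but the authors want more precise results. This matters especially on modern systems, where much of the data may already be cached in memory; ignoring this can produce noticeable differences.
Other studies differ mainly in the level at which they collect records: some start from kernel system calls, while others record at the VFS (already in the kernel, but still fairly high up).
Some existing system call replayers may add extra load to the system (10%-100%).
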
Q4.What is the authors' approach or solution?
A:
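To record data about memory behavior and timing/scheduling, the authors placed the recorder between the VFS and the machine's file system.
There are two main components, Tracefs and Replayfs.
Tracefs records the commands passed from the VFS to the file system (including timestamps), and Replayfs retrieves the trace data and executes it.
Because Tracefs and Replayfs record and execute at the same level, they better reflect real conditions.
To reduce system load, the authors also use a Trace Compiler component that preprocesses the trace data for different operating environments.
For the relationship between VFS objects and memory locations, the authors use the RAT component: Tracefs writes the relevant information into the RAT during recording, and Replayfs consults the RAT when replaying.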

Q5.Why is it better than the other approaches or solutions?
A:
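This work aims to make a trace log recorded on one system faithfully reflect the original operating conditions when it is tested on another machine. Other trace capturing systems record at higher levels and do not capture much of the I/O- or memory-related information,
so the measurements they yield are less precise than those of this work.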

Q6.How does it perform?
A:
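A replayer running as an application generally imposes a higher CPU load than the original workload did on the machine where the trace was collected. Replayfs runs inside the kernel, which eliminates much unnecessary waiting time and comes closer to real conditions.
When executing the same trace, Replayfs takes 10% less time than the system-call replayer.
Replayfs can even run the same workload on the same machine 2.5 times as efficiently (especially when the workload consists mostly of reads).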

Q7.Why is this work important?
A:
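Trace files have been used to simulate and test file system performance for decades. In the early years, memory capacity was small, so memory had little effect on measurements; in recent years, with larger memories, applications may be preloaded into memory.
Moreover, other approaches record at the user level and miss some kernel information (timing, thread scheduling). In reflecting test results accurately, this work offers
better efficiency and more reliable data than the other methods.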

Q8.Can any improvement be done?
A:
No improvements come to mind.