https://github.com/xxrjun/nvmain-gem5
💾 NCU CE3001 Computer Organization Final Project, 2024 Spring
- Host: GitHub
- URL: https://github.com/xxrjun/nvmain-gem5
- Owner: xxrjun
- Created: 2024-05-06T04:54:42.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-06-20T06:55:40.000Z (over 1 year ago)
- Last Synced: 2025-01-15T08:25:44.576Z (12 months ago)
- Language: C++
- Homepage:
- Size: 25.6 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
# NVMain + Gem5
- [NVMain + Gem5](#nvmain--gem5)
- [Usage](#usage)
- [Grading Policy](#grading-policy)
- [Task Implementations](#task-implementations)
- [Task 1: Build GEM5 + NVMain](#task-1-build-gem5--nvmain)
- [Task 2: Enable L3 last level cache in GEM5 + NVMain](#task-2-enable-l3-last-level-cache-in-gem5--nvmain)
- [Task 3: Config last level cache to 2-way and full-way associative cache and test performance](#task-3-config-last-level-cache-to-2-way-and-full-way-associative-cache-and-test-performance)
- [Task 4: Modify last level cache policy based on frequency based replacement policy](#task-4-modify-last-level-cache-policy-based-on-frequency-based-replacement-policy)
- [Task 5: Test the performance of write back and write through policy based on 4-way associative cache with isscc\_pcm](#task-5-test-the-performance-of-write-back-and-write-through-policy-based-on-4-way-associative-cache-with-isscc_pcm)
- [Evaluation](#evaluation)
- [Energy Consumption](#energy-consumption)
- [The Number of Read/Write Requests](#the-number-of-readwrite-requests)
- [References](#references)
## Usage
> [!TIP]
> Make sure you check out the `q5-write-through` branch before running task 5b, which tests the performance of the write-through policy.
Clone the repository.
```bash
git clone https://github.com/xxrjun/nvmain-gem5.git
cd nvmain-gem5
```
Follow the instructions in [Environment Setup](docs/EnvironmentSetup.md) to build GEM5 + NVMain. Then, run the following scripts to execute the tasks.
```bash
cd scripts
# Task 1: Build GEM5 + NVMain
./task1_mix_compile_gem5.sh
# Task 2: Enable L3 last level cache in GEM5 + NVMain
./task2_hello_with_l3cache.sh
# Task 3: Config last level cache to 2-way and full-way associative cache and test performance
./task3_quicksort_benchmark.sh
# Task 4: Modify last level cache policy based on frequency based replacement policy
./task4_quicksort_benchmark_frequency_based_policy.sh
# Task 5: Test the performance of write back and write through policy based on 4-way associative cache with isscc_pcm
./task5a_multiply_benchmark_writeback.sh
git checkout q5-write-through  # switch to the existing branch with the write-through changes
./task5b_multiply_benchmark_writethrough.sh
```
## Grading Policy
| Criteria | Percentage | Details | Status |
| ----------------------------------------------------------------------------------------------------------- | :--------: | -------------------------------------------------------------------------------------------------------- | :-----: |
| GEM5 + NVMAIN BUILD-UP | 40% | Follow the slide tutorial | ✅ |
| Enable L3 last level cache in GEM5 + NVMAIN | 15% | - | ✅ |
| Config last level cache to 2-way and full-way associative cache and test performance | 15% | Must run the quicksort benchmark with 2-way and full-way associativity | ✅ |
| Modify last level cache policy based on frequency based replacement policy | 15% | - | ✅ |
| Test the performance of write back and write through policy based on 4-way associative cache with isscc_pcm | 15% | Must run the multiply benchmark with write-through and write-back policies | ✅ |
| Bonus | 10% | Design a last level cache policy to reduce the energy consumption of PCM-based main memory (Baseline: LRU) | Pending |
## Task Implementations
### Task 1: Build GEM5 + NVMain
> [!TIP]
> You can see the count of cache hits in `gem5/m5out/stats.txt`.
Follow the instructions in [Environment Setup](docs/EnvironmentSetup.md) to build GEM5 + NVMain.
### Task 2: Enable L3 last level cache in GEM5 + NVMain
Reference
- [gem5-stable 添加 l3 cache](https://blog.csdn.net/tristan_tian/article/details/79851063)
- [Adding cache to the configuration script](https://www.gem5.org/documentation/learning_gem5/part1/cache_config/)
Modify the following files in `gem5/`
#### `Options.py`
> gem5/configs/common/Options.py
Add the `--l3cache` option.
```python
parser.add_option("--l3cache", action="store_true")
```
#### `Caches.py`
> gem5/configs/common/Caches.py
Add the `L3Cache` class.
```python
class L3Cache(Cache):
assoc = 32
tag_latency = 32
data_latency = 32
response_latency = 32
mshrs = 20
tgts_per_mshr = 12
    write_buffers = 16
```
- **Associativity (assoc)**: Determines the number of ways in the set associative cache.
- **Tag Latency (tag_latency)**: Cycles to access the tag array.
- **Data Latency (data_latency)**: Cycles to access the data array.
- **Response Latency (response_latency)**: Cycles to respond to a cache request.
- **MSHRs (mshrs)**: Number of Miss Status Holding Registers.
- **Targets per MSHR (tgts_per_mshr)**: Number of requests each MSHR can handle.
- **Write Buffers**: Number of write buffers.
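As a quick sanity check on how these parameters relate, the number of sets in a set-associative cache follows directly from total size, associativity, and block size. A minimal sketch (the 1 MB cache size and 64 B block size below are hypothetical example values, not taken from this project's configuration):

```python
def num_sets(cache_size_bytes, assoc, block_size_bytes=64):
    """Number of sets = total size / (ways * block size)."""
    return cache_size_bytes // (assoc * block_size_bytes)

# With assoc = 32 (as in the L3Cache class above) and a hypothetical 1 MB cache:
print(num_sets(1024 * 1024, 32))  # 512 sets
```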
#### `Xbar.py`
> gem5/src/mem/Xbar.py
Add the `L3XBar` class. This file defines and configures the memory crossbars in the GEM5 simulator. A crossbar is a crucial interconnect component used to transfer data between memory modules and processor cores.
```python
# We use a coherent crossbar to connect multiple masters to the L3
# caches. Normally this crossbar would be part of the cache itself.
class L3XBar(CoherentXBar):
# 256-bit crossbar by default
width = 32
# Assume that most of this is covered by the cache latencies, with
# no more than a single pipeline stage for any packet.
frontend_latency = 1
forward_latency = 0
response_latency = 1
snoop_response_latency = 1
# Use a snoop-filter by default, and set the latency to zero as
# the lookup is assumed to overlap with the frontend latency of
# the crossbar
snoop_filter = SnoopFilter(lookup_latency = 0)
# This specialisation of the coherent crossbar is to be considered
# the point of unification, it connects the dcache and the icache
# to the first level of unified cache.
point_of_unification = True
```
#### `BaseCPU.py`
> gem5/src/cpu/BaseCPU.py
```python
from XBar import L3XBar
def addThreeLevelCacheHierarchy(self, ic, dc, l2c, l3c, iwc=None, dwc=None,
xbar=None):
self.addPrivateSplitL1Caches(ic, dc, iwc, dwc)
self.toL3Bus = xbar if xbar else L3XBar()
self.connectCachedPorts(self.toL3Bus)
self.l3cache = l3c
self.toL3Bus.master = self.l3cache.cpu_side
self._cached_ports = ['l3cache.mem_side']
```
#### `CacheConfig.py`
> gem5/configs/common/CacheConfig.py
Add L3 cache configuration.
```python
if options.cpu_type == "O3_ARM_v7a_3":
try:
from cores.arm.O3_ARM_v7a import *
except:
print("O3_ARM_v7a_3 is unavailable. Did you compile the O3 model?")
sys.exit(1)
dcache_class, icache_class, l2_cache_class, walk_cache_class = \
O3_ARM_v7a_DCache, O3_ARM_v7a_ICache, O3_ARM_v7aL2, \
O3_ARM_v7aWalkCache
else:
# NOTE: Add L3 cache here
dcache_class, icache_class, l2_cache_class, l3_cache_class, walk_cache_class = \
L1_DCache, L1_ICache, L2Cache, L3Cache, None
```
Note that L3 cache is only enabled when L2 cache is enabled, so we have two cases:
- L2 and L3
- L2 but no L3
```python
if options.l3cache and options.l2cache: # L2 and L3
system.l2 = l2_cache_class(clk_domain=system.cpu_clk_domain,
size=options.l2_size,
assoc=options.l2_assoc)
system.l3 = l3_cache_class(clk_domain=system.cpu_clk_domain,
size=options.l3_size,
assoc=options.l3_assoc)
system.tol2bus = L2XBar(clk_domain = system.cpu_clk_domain)
system.tol3bus = L3XBar(clk_domain = system.cpu_clk_domain)
system.l2.cpu_side = system.tol2bus.master
system.l2.mem_side = system.tol3bus.slave
system.l3.cpu_side = system.tol3bus.master
system.l3.mem_side = system.membus.slave
elif options.l2cache: # L2 but no L3
# Provide a clock for the L2 and the L1-to-L2 bus here as they
# are not connected using addTwoLevelCacheHierarchy. Use the
# same clock as the CPUs.
system.l2 = l2_cache_class(clk_domain=system.cpu_clk_domain,
size=options.l2_size,
assoc=options.l2_assoc)
system.tol2bus = L2XBar(clk_domain = system.cpu_clk_domain)
system.l2.cpu_side = system.tol2bus.master
system.l2.mem_side = system.membus.slave
```
### Task 3: Config last level cache to 2-way and full-way associative cache and test performance
Download the benchmark file provided by the TAs. Then execute the script [scripts/task3_quicksort_benchmark.sh](scripts/task3_quicksort_benchmark.sh) to run the benchmark.
### Task 4: Modify last level cache policy based on frequency based replacement policy
Reference: [Replacement Policies](https://www.gem5.org/documentation/general_docs/memory_system/replacement_policies/)
> [!TIP]
> Refer to [gem5/src/mem/cache/replacement_policies/ReplacementPolicies.py](gem5/src/mem/cache/replacement_policies/ReplacementPolicies.py) for all replacement policies.
In this part, I modified two files
- [gem5/configs/common/Options.py](gem5/configs/common/Options.py)
```python
parser.add_option("--l3_replacement_policy", type="string", default="LRU")
```
- [gem5/configs/common/CacheConfig.py](gem5/configs/common/CacheConfig.py)
```python
# Task 4: Modify last level cache policy based on frequency based replacement policy
if options.l3_replacement_policy == "LFU":
system.l3.replacement_policy = LFURP()
else:
system.l3.replacement_policy = LRURP() # default policy
```
### Task 5: Test the performance of write back and write through policy based on 4-way associative cache with isscc_pcm
> Run the benchmark multiply with the write-through and write-back policies. (In GEM5, the default policy is write-back; we can tell which policy is in effect from the number of write requests.)
> [!TIP]
> [`gem5/src/mem/cache/base.cc`](gem5/src/mem/cache/base.cc) and [`gem5/src/mem/cache/cache.cc`](gem5/src/mem/cache/cache.cc) are the files that define the cache policy. We can check the number of _total requests_ and _write requests_ to determine the policy.
```cpp
// ...
} else if (blk && (pkt->needsWritable() ? blk->isWritable() :
blk->isReadable())) {
// OK to satisfy access
incHitCount(pkt);
satisfyRequest(pkt, blk);
maintainClusivity(pkt->fromCache(), blk);
// Add this part to the code
// Write back the block if it is writable when we are doing a normal read/write request
// This has the same effect as write through policy
if (blk->isWritable()) {
PacketPtr writeclean_pkt = writecleanBlk(blk, pkt->req->getDest(), pkt->id);
writebacks.push_back(writeclean_pkt);
}
return true;
}
// ...
```
#### Write Back
- Write Request -> If hit, write to cache block
- When a dirty block is evicted, write back to memory.
- Note that under this policy, even read misses can cause write-backs, since filling the block may evict a dirty one.
#### Write Through
- Write Request -> If hit, write to cache block
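The difference in memory write traffic between the two policies can be modeled with a toy single-block cache. This is a sketch under simplified assumptions (one block, no final flush), not the gem5 code path:

```python
def memory_writes(accesses, policy):
    """Count writes reaching memory for a toy 1-block cache.

    accesses: list of ("R" | "W", addr) pairs; policy: "wb" or "wt".
    """
    cached, dirty, writes = None, False, 0
    for op, addr in accesses:
        if cached != addr:          # miss: replace the current block
            if policy == "wb" and dirty:
                writes += 1         # write-back: dirty eviction reaches memory
            cached, dirty = addr, False
        if op == "W":
            if policy == "wt":
                writes += 1         # write-through: every write reaches memory
            else:
                dirty = True        # write-back: just mark the block dirty
    return writes

trace = [("W", 0), ("W", 0), ("R", 1), ("W", 1)]
print(memory_writes(trace, "wb"))  # 1 (one dirty eviction)
print(memory_writes(trace, "wt"))  # 3 (every write)
```

This matches the observation in the tip above: for the same trace, write-through produces more memory write requests than write-back.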
## Evaluation
> [!TIP]
> You can use [utils/extract_stats.py](utils/extract_stats.py) to get an integrated [out/output_stats.csv](out/output_stats.csv).
The following metrics are extracted from the `stats.txt` and `log.txt` files.
- `sim_seconds`
- `sim_ticks`
- `system.l3.overall_hits::total`
- `system.l3.overall_misses::total`
- `system.l3.overall_miss_rate::total`
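A minimal sketch of pulling such metrics out of a `stats.txt`-style file; the sample lines below are hypothetical values, and the real file contains extra columns and comments that this simple parser ignores:

```python
import re

def parse_stats(text, keys):
    """Extract 'name value' pairs from gem5 stats.txt-style text."""
    stats = {}
    for line in text.splitlines():
        m = re.match(r"(\S+)\s+([\d.eE+-]+)", line)
        if m and m.group(1) in keys:
            stats[m.group(1)] = float(m.group(2))
    return stats

sample = """\
system.l3.overall_hits::total 1200
system.l3.overall_misses::total 300
system.l3.overall_miss_rate::total 0.2
"""
wanted = {"system.l3.overall_hits::total", "system.l3.overall_misses::total"}
print(parse_stats(sample, wanted))
```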
### Energy Consumption
- `system.mem_ctrls.pwrStateResidencyTicks::UNDEFINED`
- `system.pwrStateResidencyTicks::UNDEFINED`
The metrics above are reported in `stats.txt`; the metrics below come from `log.txt`.
### The Number of Read/Write Requests
- `i0.defaultMemory.totalReadRequests`
- `i0.defaultMemory.totalWriteRequests`
## References
- Final project_Ch.pptx
- [The gem5 Memory System](https://www.gem5.org/documentation/general_docs/memory_system/gem5_memory_system/)
- [[2022 Seed Teacher Training (4/5)] Gem5 Implementation Workflow (Detailed Introduction and Hands-on)](https://www.youtube.com/watch?v=W5JXM3wIdcY)
- MSHR
- [Miss Status Holding Registers(MSHR)](https://miaochenlu.github.io/2020/10/29/MSHR/)
- Crossbar
- [Interconnection Network](https://www.gem5.org/documentation/general_docs/ruby/interconnection-network/)
- [Classic Caches](https://www.gem5.org/documentation/general_docs/memory_system/classic_caches/)