https://github.com/kawai125/NativeStringCollections
The toolset to parse text files using C# JobSystem on Unity.
- Host: GitHub
- URL: https://github.com/kawai125/NativeStringCollections
- Owner: kawai125
- License: mit
- Created: 2020-11-02T08:55:18.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2022-09-14T08:51:28.000Z (about 3 years ago)
- Last Synced: 2024-11-10T17:47:06.543Z (about 1 year ago)
- Language: C#
- Homepage:
- Size: 7.29 MB
- Stars: 19
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# NativeStringCollections
## Introduction
The toolset to parse generic text files using C# JobSystem and Burst on Unity.
Explanatory articles (in Japanese):
[JobSystem part](https://qiita.com/kawai125/items/13390f25700dd89c0f2e)
[Burst part](https://qiita.com/kawai125/items/540dd8e5d2b4c7c1fa3b)
## Environment
This library was tested in the following environments.
- Unity 2019.4.35f1
  - Collections 0.9.0-preview.6
  - Burst 1.4.11
- Unity 2020.3.25f1
  - Collections 0.17.0 or 1.1
  - Burst 1.6.3
## Git Path for Unity Package Manager
[https://github.com/kawai125/NativeStringCollections.git?path=Assets/NativeStringCollections](https://github.com/kawai125/NativeStringCollections.git?path=Assets/NativeStringCollections)
## Demo scene
- single file & single data user demo:
`/Assets/NativeStringCollections/Samples/Demo/Scenes/Demo_ReadingLargeFile.unity`
- multiple files & multiple data users demo:
`/Assets/NativeStringCollections/Samples/Demo/Scenes/Demo_AsyncMultiFileManagement.unity`
## Performance
Target file: 500k characters (with comment lines and Base64-encoded external data; 590k lines and 37 MB in total).
The sample file generator is implemented in the demo scenes.
Measured environment:
- Windows10
- Ryzen5 3600X
- GTX 1070
- NVMe SSD (PCIe Gen3 x4)
(1) Single file loading performance:
|condition|Time [ms]|remarks|
|:-|-:|:-|
|C# standard: `File.ReadAllLines()`|710 ~ 850||
|`ITextFileParser` (without Burst)|540 ~ 550|disable `csFastFloat`|
|`ITextFileParser` (without Burst)|450 ~ 460||
|`ITextFileParser` (with Burst)|220 ~ 240|disable `csFastFloat`|
|`ITextFileParser` (with Burst)|150 ~ 170||
(2) Parallel file loading performance:
|# of Parallel Jobs|Time [ms] with Burst|Time [ms] without Burst|
|-:|-:|-:|
|1|150 ~ 170|440 ~ 480|
|2|155 ~ 170|450 ~ 500|
|3|165 ~ 190|480 ~ 540|
|4|170 ~ 200|500 ~ 580|
|6|190 ~ 250|550 ~ 650|
|8|200 ~ 300|600 ~ 700|
This library uses [csFastFloat](https://github.com/CarlVerret/csFastFloat) internally to parse `float` and `double` values.
The bundled csFastFloat is a modified version made compatible with the Burst compiler.
To disable csFastFloat, define the macro `DISABLE_CS_FAST_FLOAT`.
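In a Unity project, such a symbol is typically added under Project Settings > Player > Scripting Define Symbols. As a minimal, self-contained sketch of what a define symbol does, the snippet below uses a per-file `#define` and a hypothetical `FloatParserInfo` class (not part of this library) to select a code path at compile time. How the library itself branches on the symbol is not shown here.

```C#
#define DISABLE_CS_FAST_FLOAT  // per-file definition, for illustration only

// Hypothetical helper, not part of NativeStringCollections.
public static class FloatParserInfo
{
    // Reports which branch the compiler kept for this file.
    public static string Backend()
    {
#if DISABLE_CS_FAST_FLOAT
        return "fallback";
#else
        return "csFastFloat";
#endif
    }
}
```

Removing the `#define` line makes `Backend()` return the other label, which is the same switch the project-wide setting performs for every script.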
## Usage
```C#
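using System.Text;
using Unity.Collections;
using UnityEngine;
using NativeStringCollections;

public class TextData : ITextFileParser
{
    NativeList<DataElement> Data;  // DataElement: a user-defined struct

    public void Init()
    {
        /* initialize the class. called once after new(). */
        /* managed types can be used here because this function is called in the main thread. */
    }
    public void Clear()
    {
        /* prepare to parse. called once before ParseLines() is first called. */
    }
    public bool ParseLines(NativeStringList lines)
    {
        for(int i=0; i<lines.Length; i++)
        {
            /* parse lines[i] here. */
        }
        return true;  // true: continue reading. false: stop reading.
    }
    public void PostReadProc()
    {
        /* called once after the last ParseLines() call. */
    }
    public void UnLoad()
    {
        /* called when the loaded data is unloaded. */
    }
    public void Dispose()
    {
        /* release the native containers owned by this class. */
    }
}

// MonoBehaviour that owns the reader (the class name and the `path` field are placeholders).
public class TextFileReaderSample : MonoBehaviour
{
    public string path;
    AsyncTextFileReader<TextData> reader;

    void Start() { reader = new AsyncTextFileReader<TextData>(Allocator.Persistent); }

    void OnClickLoadFile()
    {
        // request loading the file. (set Encoding if necessary)
        reader.Encoding = Encoding.UTF8;
        reader.LoadFile(path);
    }
    void Update()
    {
        // progress can be displayed. (the Read and Length fields are ints counted in BlockSize units)
        var info = reader.GetState;
        float progress = (float)info.Read / info.Length;

        // call Complete() when the job has finished.
        if(reader.JobState == ReadJobState.WaitForCallingComplete)
        {
            reader.Complete();

            // the elapsed time [ms] for loading the file can be displayed.
            double delay = reader.GetState.Delay;
            Debug.Log($" file loading completed. time = {delay.ToString("F2")} [ms].");

            // do something with the loaded data.
            var data = reader.Data;
        }
    }
    void OnDestroy()
    {
        // dispose the data class as well. the reader and the loaded data can be disposed in any order.
        var data = reader.Data;
        reader.Dispose();
        data.Dispose();
    }
}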
```
A more detailed sample of `ITextFileParser.ParseLines()` is shown below.
```C#
using Unity.Collections;
using NativeStringCollections;

public class TextData : ITextFileParser
{
    public NativeList<DataElement> Data;

    private NativeStringList mark_list;
    private StringEntity check_mark;

    // if you want to use strings in the parse process,
    // add them into a NativeStringList or NativeList<Char16> in Init().
    public void Init()
    {
        Data = new NativeList<DataElement>(Allocator.Persistent);
        mark_list = new NativeStringList(Allocator.Persistent);

        mark_list.Add("STRONG");
        mark_list.Add("Normal");

        // pick a StringEntity from the NativeStringList only after all data has been added to it
        // (or set a Capacity large enough to contain all elements first).
        // accessing a StringEntity after the buffer of the NativeStringList was reallocated
        // causes a crash by invalid memory access.
        check_mark = mark_list[0];
    }

    // LF and CR have already been parsed and the resulting lines are stored in the NativeStringList.
    // use it like a List<string>.
    public bool ParseLines(NativeStringList lines)
    {
        bool continueRead = true;
        for(int i=0; i<lines.Length; i++)
        {
            continueRead = ParseLine(lines[i]);
            if(!continueRead) return false;
        }
        return true;
    }

    // per-line helper (the method name is a placeholder).
    private bool ParseLine(StringEntity line)
    {
        var str_list = new NativeList<StringEntity>(Allocator.Temp);

        // for an input line such as "CharaName_STRONG,11,64,15.7,1.295e+3" (CSV),
        // you can parse it as shown below.
        line.Split(',', str_list);

        var name = str_list[0];
        bool success = true;
        success = success && str_list[1].TryParse(out long ID);
        success = success && str_list[2].TryParse(out int HP);
        success = success && str_list[3].TryParse(out float Attack);
        success = success && str_list[4].TryParse(out double Speed);

        int mark_index = name.IndexOf(check_mark);  // search "STRONG" in `name`
        if(mark_index >= 0)
        {
            /* special treatment for a "STRONG" character. */
        }

        str_list.Dispose();

        // check whether the line was parsed correctly.
        if(!success)
            return false;  // failed to parse. abort.

        Data.Add(new DataElement(ID, HP, Attack, Speed));
        return true;  // parsed successfully. go to the next line.
    }

    /* Clear(), PostReadProc(), and UnLoad() are omitted in this sample. */
}
```
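For comparison, the same CSV line can be parsed with managed .NET APIs. The sketch below is not part of this library and cannot run inside Burst-compiled jobs; it only gives reference values against which the `TryParse()` results above can be checked. The class name `ManagedCsvReference` is hypothetical.

```C#
using System;
using System.Globalization;

// Hypothetical reference parser built on managed strings only.
public static class ManagedCsvReference
{
    public static bool TryParseLine(string line, out string name,
                                    out long id, out int hp,
                                    out float attack, out double speed)
    {
        name = string.Empty; id = 0; hp = 0; attack = 0f; speed = 0.0;

        string[] cols = line.Split(',');
        if (cols.Length != 5) return false;

        name = cols[0];
        var inv = CultureInfo.InvariantCulture;  // '.' as the decimal separator
        return long.TryParse(cols[1], NumberStyles.Integer, inv, out id)
            && int.TryParse(cols[2], NumberStyles.Integer, inv, out hp)
            && float.TryParse(cols[3], NumberStyles.Float, inv, out attack)
            && double.TryParse(cols[4], NumberStyles.Float, inv, out speed);
    }

    // Position of "STRONG" in the name column, or a negative value if absent.
    public static int FindStrongMark(string name)
        => name.IndexOf("STRONG", StringComparison.Ordinal);
}
```

Calling `TryParseLine("CharaName_STRONG,11,64,15.7,1.295e+3", ...)` fills the five columns, and `FindStrongMark(name)` plays the role of `name.IndexOf(check_mark)` in the sample.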
## Usage for Burst optimization
In this library, the UInt16-based struct `Char16` is used instead of `System.Char`.
Thus, you can use [Burst function pointers](https://docs.unity3d.com/Packages/com.unity.burst@1.4/manual/docs/AdvancedUsages.html#function-pointers) to optimize your `ITextFileParser` class.
If you use Burst 1.6 or later, you can use [Burst direct-call](https://docs.unity3d.com/Packages/com.unity.burst@1.6/manual/docs/CSharpLanguageSupport_Lang.html#directly-calling-burst-compiled-code).
Direct-call code is easier to write than Burst function pointers and offers the same features.
(See `/Assets/NativeStringCollections/Samples/Scripts/CharaDataParser.cs` for a sample.)
## API
### ▽Namespace
All implementations are written in the namespace `NativeStringCollections`.
### ▽Job scheduler
```C#
class AsyncTextFileReader<T> // for a single file
class AsyncTextFileLoader<T> // for multiple files and users
    where T : class, ITextFileParser, new()
```
These classes accept a `System.Text.Encoding` instance to decode the byte stream into chars.
The `ITextFileParser` interface is defined as follows.
```C#
namespace NativeStringCollections
{
    public interface ITextFileParser
    {
        /// <summary>
        /// called once at first, in the main thread (managed objects can be used in this function).
        /// </summary>
        void Init();

        /// <summary>
        /// called every time at the start of reading a file.
        /// </summary>
        void Clear();

        /// <summary>
        /// when 'false' is returned, the AsyncTextFileLoader stops calling 'ParseLines()'
        /// and jumps to calling 'PostReadProc()'.
        /// </summary>
        /// <param name="lines"></param>
        /// <returns>continue reading lines or not.</returns>
        bool ParseLines(NativeStringList lines);

        /// <summary>
        /// called every time at the end of reading a file.
        /// </summary>
        void PostReadProc();

        /// <summary>
        /// called when the AsyncTextFileLoader.UnLoadFile(index) function is called.
        /// </summary>
        void UnLoad();
    }
}
```
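The call order described in these comments can be sketched with a simplified, managed stand-in. Everything below is hypothetical and is not the library's scheduler: `IToyParser` replaces `NativeStringList` with `IReadOnlyList<string>`, and `ToyDriver` assumes that lines are handed over in fixed-size blocks, which is an assumption made only for this illustration.

```C#
using System;
using System.Collections.Generic;

// Simplified stand-in for ITextFileParser (managed strings instead of NativeStringList).
public interface IToyParser
{
    void Init();
    void Clear();
    bool ParseLines(IReadOnlyList<string> lines);
    void PostReadProc();
}

// Records every call so the order can be inspected.
public sealed class LoggingParser : IToyParser
{
    public readonly List<string> Log = new List<string>();

    public void Init() => Log.Add("Init");
    public void Clear() => Log.Add("Clear");
    public bool ParseLines(IReadOnlyList<string> lines)
    {
        Log.Add($"ParseLines({lines.Count})");
        foreach (var line in lines)
            if (line == "#END") return false;  // ask the driver to stop reading
        return true;
    }
    public void PostReadProc() => Log.Add("PostReadProc");
}

public static class ToyDriver
{
    // One load: Clear(), then ParseLines() per block until it returns false, then PostReadProc().
    public static void Load(IToyParser parser, IReadOnlyList<string> allLines, int blockSize)
    {
        parser.Clear();
        for (int start = 0; start < allLines.Count; start += blockSize)
        {
            int count = Math.Min(blockSize, allLines.Count - start);
            var block = new List<string>(count);
            for (int k = 0; k < count; k++) block.Add(allLines[start + k]);

            if (!parser.ParseLines(block)) break;
        }
        parser.PostReadProc();
    }
}
```

With the lines `a, b, #END, c, d` and a block size of 2, the second block returns `false`, so the third block is never parsed and `PostReadProc()` still runs.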
### ▽String like NativeContainer
```C#
struct NativeStringList
struct StringEntity
struct ReadOnlyStringEntity
```
The `NativeStringList` is a jagged array container similar to `List<string>`, using `NativeList<T>` internally.
`StringEntity` and `ReadOnlyStringEntity` are slice views into a `NativeStringList`.
**Note:** Calling `NativeStringList.Add()` may reallocate the internal buffer, which turns existing `StringEntity` values into invalid references.
(A tracer system for this is also implemented under the macro `ENABLE_UNITY_COLLECTIONS_CHECKS`.)
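The hazard can be modeled in plain C#. The toy types below (`ToyStringList`, `ToySlice`) are hypothetical and are not the library's implementation: a slice remembers the buffer generation it was created in, and a reallocation bumps the generation so that stale slices can be detected, roughly in the spirit of a tracer.

```C#
using System;

// Toy model of a growable character buffer that hands out slice views.
public sealed class ToyStringList
{
    private char[] _buffer = new char[8];
    private int _length;

    // Incremented whenever the internal buffer is reallocated.
    public int Generation { get; private set; }

    public ToySlice Add(string s)
    {
        if (_length + s.Length > _buffer.Length)
        {
            var bigger = new char[Math.Max(_buffer.Length * 2, _length + s.Length)];
            Array.Copy(_buffer, bigger, _length);
            _buffer = bigger;
            Generation++;  // every slice created before this point is now stale
        }
        s.CopyTo(0, _buffer, _length, s.Length);
        var slice = new ToySlice(this, _length, s.Length, Generation);
        _length += s.Length;
        return slice;
    }

    internal string Read(int start, int length) => new string(_buffer, start, length);
}

public readonly struct ToySlice
{
    private readonly ToyStringList _owner;
    private readonly int _start, _length, _generation;

    public ToySlice(ToyStringList owner, int start, int length, int generation)
    {
        _owner = owner; _start = start; _length = length; _generation = generation;
    }

    public bool IsValid => _generation == _owner.Generation;

    public override string ToString()
    {
        if (!IsValid)
            throw new InvalidOperationException("slice was invalidated by a buffer reallocation");
        return _owner.Read(_start, _length);
    }
}
```

Adding `"STRONG"` fits the initial 8-char buffer, so its slice is valid; adding `"Normal"` afterwards forces a reallocation and invalidates the first slice. Taking slices only after the last `Add()` avoids the problem.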
### ▽Parse functions
```C#
bool StringEntity.TryParse(out T value)
bool StringEntity.TryParseHex(out T value)
where T : int32, int64, float32, or float64
```
The conversion accuracy compared with `System.T.Parse()` is shown below:
|type|relative error|
|:--|:--|
|int32, int64, and float32|no difference|
|float64|< 1.0e-15|
|(Hex input)|no difference|
Tested with thousands of random strings.
(see `/Assets/NativeStringCollections/Tests/EditMode/Editor/Test_StringParser.cs` for more details.)
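As a rough sketch of how a relative error like the one in the table can be measured, the hypothetical helper below compares a candidate value with the result of `double.Parse()`. It is not taken from the library's test code, and the zero-reference fallback is an assumption of this sketch.

```C#
using System;
using System.Globalization;

// Hypothetical helper for comparing a parsed value with the .NET reference.
public static class ParseAccuracy
{
    // Reference value from the standard parser, independent of the current culture.
    public static double Reference(string text)
        => double.Parse(text, NumberStyles.Float, CultureInfo.InvariantCulture);

    // |value - reference| / |reference|; falls back to |value| when the reference is 0.
    public static double RelativeError(double value, double reference)
    {
        if (reference == 0.0) return Math.Abs(value);
        return Math.Abs(value - reference) / Math.Abs(reference);
    }
}
```

A value obtained from `StringEntity.TryParse(out double value)` would be passed as `value`, with `Reference()` applied to the same text.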
The converters between Base64 encoded string and byte stream are also available.
```C#
struct NativeBase64Encoder
struct NativeBase64Decoder
```
### ▽Manipulation functions
```C#
// Split()
var split_result = new NativeList<StringEntity>(Allocator.Temp);
StringEntity.Split(delim, split_result); // delim: a char or StringEntity.
StringEntity.Split(split_result); // split by Char.IsWhiteSpace()
// Strip(), Lstrip() and Rstrip()
StringEntity strip_result = StringEntity.Strip(delim); // delim: a char or StringEntity.
StringEntity strip_result = StringEntity.Strip(); // strip for Char.IsWhiteSpace() in both side.
// Slice()
StringEntity slice_result = StringEntity.Slice(begin, end);
```
The manipulation functions above are available.
Each of them returns its result as new `StringEntity` slices.
```C#
// search index from first
int index = StringEntity.IndexOf(char);
int index = StringEntity.IndexOf(string);
int index = StringEntity.IndexOf(StringEntity);
int index = NativeStringList.IndexOf(StringEntity);
// search index from last
int index = StringEntity.LastIndexOf(char);
int index = StringEntity.LastIndexOf(string);
int index = StringEntity.LastIndexOf(StringEntity);
int index = NativeStringList.LastIndexOf(StringEntity);
```
The `IndexOf()` and `LastIndexOf()` search functions are also available.
### ▽Utility for using Burst Function Pointers
Typical NativeContainers cannot be used in Burst function pointers because their safety system is implemented with managed objects.
To work around this problem, UnsafeReference utility structs and functions are provided.
```C#
using NativeStringCollections;
using NativeStringCollections.Utility;
// for container
UnsafeRefToNativeList<T>
    ref_to_native_list = NativeList<T>.GetUnsafeRef();
UnsafeRefToNativeStringList
    ref_to_native_string_list = NativeStringList.GetUnsafeRef();
UnsafeRefToNativeJaggedList<T>
    ref_to_native_jagged_list = NativeJaggedList<T>.GetUnsafeRef();
// for Base64 converter
UnsafeRefToNativeBase64Encoder
ref_to_base64_encoder = NativeBase64Encoder.GetUnsafeRef();
UnsafeRefToNativeBase64Decoder
ref_to_base64_decoder = NativeBase64Decoder.GetUnsafeRef();
```
Unfortunately, containers other than `NativeList` have no accessor to their internal unsafe container.
If you want to use containers such as `NativeHashMap` or `NativeQueue`,
you have to use a compatible unsafe container such as `UnsafeHashMap` or `UnsafeRingQueue` and copy the data before and after calling Burst function pointers.
### ▽Utility for Sorting file paths
A sort function that treats each run of digits as an integer, and therefore sorts paths naturally, is provided.
```C#
string[] paths = { /* paths */ };
var sorted_paths = new List<string>();
// sort file paths naturally
FilePathUtility.Sort(paths, sorted_paths);
// sort & filtering paths
string filter = directory + "/file_00.dat"; // the digit part "00" is treated as a placeholder for an integer value.
FilePathUtility.Sort(paths, filter, sorted_paths);
```
(see `/Assets/NativeStringCollections/Tests/EditMode/Editor/Test_SortFilePaths.cs` for more details.)
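The idea of natural ordering can be sketched with a small comparer. `NaturalPathComparer` below is hypothetical and is not the implementation behind `FilePathUtility.Sort()`; it compares runs of ASCII digits as integers and everything else character by character.

```C#
using System;
using System.Collections.Generic;

// Hypothetical comparer for illustration; not the library's FilePathUtility.
public sealed class NaturalPathComparer : IComparer<string>
{
    private static bool IsAsciiDigit(char c) => c >= '0' && c <= '9';

    public int Compare(string a, string b)
    {
        if (ReferenceEquals(a, b)) return 0;
        if (a is null) return -1;
        if (b is null) return 1;

        int i = 0, j = 0;
        while (i < a.Length && j < b.Length)
        {
            if (IsAsciiDigit(a[i]) && IsAsciiDigit(b[j]))
            {
                int startA = i, startB = j;
                while (i < a.Length && IsAsciiDigit(a[i])) i++;
                while (j < b.Length && IsAsciiDigit(b[j])) j++;

                // Without leading zeros, a longer digit run is the larger integer.
                string runA = a.Substring(startA, i - startA).TrimStart('0');
                string runB = b.Substring(startB, j - startB).TrimStart('0');
                int cmp = runA.Length != runB.Length
                    ? runA.Length.CompareTo(runB.Length)
                    : string.CompareOrdinal(runA, runB);
                if (cmp != 0) return cmp;
            }
            else
            {
                if (a[i] != b[j]) return a[i].CompareTo(b[j]);
                i++;
                j++;
            }
        }
        // The string with characters left over sorts last.
        return (a.Length - i).CompareTo(b.Length - j);
    }
}
```

Passing an instance to `List<string>.Sort()` places `file_2.dat` before `file_10.dat`, whereas a plain ordinal sort would put `file_10.dat` first.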
### ▽Debug mode
```C#
var reader = new AsyncTextFileReader<TextData>(Allocator.Persistent);
var loader = new AsyncTextFileLoader<TextData>(Allocator.Persistent);
reader.LoadFileInMainThread(path);
loader.AddFile(new List<string>{path_1, path_2, path_3});
loader.LoadFileInMainThread(0); // index = 0: load path_1.
```
When the function `LoadFileInMainThread()` is used, all functions are processed in the main thread.
In this mode, managed features such as `(obj).ToString()`, `StringBuilder`, and `Debug.Log()` can be used
in the `Clear()`, `ParseLines()`, `PostReadProc()`, and `UnLoad()` functions of `ITextFileParser`.