{"id":23031992,"url":"https://github.com/uwrit/mush","last_synced_at":"2025-06-14T11:34:54.113Z","repository":{"id":57509836,"uuid":"173823422","full_name":"uwrit/mush","owner":"uwrit","description":"Clinical Note Processing Pipeline Driver and Framework","archived":false,"fork":false,"pushed_at":"2020-01-18T08:51:23.000Z","size":29,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-04-02T21:42:35.608Z","etag":null,"topics":["clinical-notes","clinical-research","go","nlp"],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/uwrit.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-03-04T21:21:51.000Z","updated_at":"2021-09-30T22:01:49.000Z","dependencies_parsed_at":"2022-09-26T17:51:21.674Z","dependency_job_id":null,"html_url":"https://github.com/uwrit/mush","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/uwrit/mush","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/uwrit%2Fmush","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/uwrit%2Fmush/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/uwrit%2Fmush/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/uwrit%2Fmush/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/uwrit","download_url":"https://codeload.github.com/uwrit/mush/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/uwrit%
2Fmush/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259809672,"owners_count":22914911,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clinical-notes","clinical-research","go","nlp"],"created_at":"2024-12-15T15:49:14.244Z","updated_at":"2025-06-14T11:34:54.098Z","avatar_url":"https://github.com/uwrit.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# mush\n\nMush is a pipeline driver and application framework for developing highly concurrent and performant clinical note processing pipelines. It acts as a root event loop for actuating a processing pipeline that may or may not rely on networked services.\n\nIt is used internally in UW Medicine Research IT's NLP infrastructure. It can be used to process a static set of notes or listen for notes as they become available.\n\n## Architecture\nMush composes three core pieces of functionality:\n- The `stream` package implements a streaming mechanism on top of any static store of clinical notes.\n- The `wp` package implements a configurable worker pool for concurrently processing notes into results.\n- The `sink` package implements a configurable concurrent IO sink for capturing results.\n\nThese packages are connected and managed by the root mush package via the Compose function.\n\n## Benchmarks\nTo date, we've seen mush sustain ~600,000 notes/hour at 10% CPU and \u003c70MB memory. This run was pointed at an API endpoint that de-identified the notes; the endpoint was capped at 80 TPS. 
The application was sharing a VM with the SQL Server instance it was using for storage. SQL Server used 90% CPU and 12GB memory.\n\n## Examples\nA prototypical mush application must define how to fetch a batch of notes, what to do with a single note, and how to store the results of a single note. Concretely, you must implement a `stream.BatchProvider`, a `wp.Runner`, a `wp.Handler`, and a `sink.Writer`.\n\n#### `stream.BatchProvider`\nnio/reader.go\n```go\npackage nio\n\nimport (\n    \"context\"\n    \"database/sql\"\n\n    // driver\n    _ \"github.com/denisenkom/go-mssqldb\"\n    \"github.com/pkg/errors\"\n    \"github.com/uwrit/mush/note\"\n)\n\n// NewBatchProvider returns a new stream.BatchProvider for MS SQL Server.\nfunc NewBatchProvider(ctx context.Context, db *sql.DB) *MSSQLBatchProvider {\n    return \u0026MSSQLBatchProvider{ctx, db}\n}\n\n// MSSQLBatchProvider implements stream.BatchProvider for MS SQL Server.\ntype MSSQLBatchProvider struct {\n    ctx context.Context\n    db  *sql.DB\n}\n\n// Batch retrieves a batch of notes no greater than provided batchSize.\nfunc (m *MSSQLBatchProvider) Batch(batchSize int) ([]*note.Note, error) {\n    tx, err := m.db.BeginTx(m.ctx, nil)\n    if err != nil {\n        return nil, errors.Wrap(err, \"could not start transaction\")\n    }\n    rows, err := tx.QueryContext(m.ctx, \"exec dbo.sp_FetchNotes @p1\", batchSize)\n    if err != nil {\n        // Release the transaction if the query itself fails.\n        tx.Rollback()\n        return nil, errors.Wrap(err, \"could not fetch note batch\")\n    }\n    defer rows.Close()\n    notes := []*note.Note{}\n    for rows.Next() {\n        var id int\n        var text string\n        err = rows.Scan(\u0026id, \u0026text)\n        if err != nil {\n            tx.Rollback()\n            return nil, errors.Wrap(err, \"could not scan note row\")\n        }\n        notes = append(notes, note.New(id, text))\n    }\n    // Surface any error that ended iteration early.\n    if err = rows.Err(); err != nil {\n        tx.Rollback()\n        return nil, errors.Wrap(err, \"could not read note batch\")\n    }\n    return notes, tx.Commit()\n}\n```\n\n#### `sink.Writer`\nnio/writer.go\n```go\npackage nio\n\nimport (\n    \"context\"\n    \"database/sql\"\n\n    // driver\n 
   _ \"github.com/denisenkom/go-mssqldb\"\n    \"github.com/pkg/errors\"\n    \"github.com/uwrit/mush/note\"\n)\n\n// Status codes recorded with each note.Result.\nconst (\n    ThrottledErr note.Status = -1\n    Success      note.Status = 1\n    EncodingErr  note.Status = 2\n    TooLongErr   note.Status = 3\n    ValidateErr  note.Status = 4\n    APIErr       note.Status = 5\n    MarshalErr   note.Status = 6\n)\n\n// NewWriter returns a new sink.Writer for MS SQL Server.\nfunc NewWriter(ctx context.Context, db *sql.DB) *MSSQLWriter {\n    return \u0026MSSQLWriter{ctx, db}\n}\n\n// MSSQLWriter implements sink.Writer for MS SQL Server.\ntype MSSQLWriter struct {\n    ctx context.Context\n    db  *sql.DB\n}\n\n// Write persists a note.Result to a SQL Server.\nfunc (w *MSSQLWriter) Write(r *note.Result) error {\n    tx, err := w.db.BeginTx(w.ctx, nil)\n    if err != nil {\n        return errors.Wrapf(err, \"could not start transaction to save result: %s\", r)\n    }\n    _, err = tx.ExecContext(w.ctx, \"exec dbo.sp_SaveResult @p1, @p2, @p3\", r.ID, r.Status, r.Body)\n    if err != nil {\n        // Release the transaction if the save fails.\n        tx.Rollback()\n        return errors.Wrapf(err, \"could not save result: %s\", r)\n    }\n    return tx.Commit()\n}\n```\n\ncmd/main.go\n```go\npackage main\n\nimport (\n    \"context\"\n    \"database/sql\"\n    \"fmt\"\n    \"log\"\n    \"os\"\n    \"strconv\"\n    \"unicode/utf8\"\n\n    \"github.com/my-user/my-mush-impl/nio\"\n\n    \"github.com/uwrit/mush\"\n    \"github.com/uwrit/mush/note\"\n    \"github.com/uwrit/mush/utf\"\n\n    \"github.com/pkg/errors\"\n\n    _ \"github.com/denisenkom/go-mssqldb\"\n)\n\nconst databaseConnectionString = \"DEMO_MUSH_DBSTRING\"\nconst poolWorkerCount = \"DEMO_MUSH_WORKER_COUNT\"\n\nfunc init() {\n    log.SetOutput(os.Stdout)\n    log.SetFlags(log.Ldate | log.Ltime | log.Lmicroseconds | log.Lshortfile)\n}\n\nfunc main() {\n    log.Println(\"demonstrating mush...\")\n    // Keep the cancel func so the context is released when main returns.\n    ctx, cancel := context.WithCancel(context.Background())\n    defer cancel()\n    reader, writer := mustGetServices(ctx)\n    config := mush.Config{\n        
StreamBatchSize: 100,\n        StreamWaterline: 40,\n        SinkWorkerCount: 4,\n    }\n\n    musher := mush.Compose(ctx, reader, run, handle, writer, config)\n    musher.Mush()\n\n    musher.Wait()\n    log.Println(\"done!\")\n}\n\nfunc mustGetServices(ctx context.Context) (mush.BatchProvider, mush.Writer) {\n    cstring := os.Getenv(databaseConnectionString)\n    if cstring == \"\" {\n        log.Fatalf(\"no connection string found in env var %s\", databaseConnectionString)\n    }\n    db, err := sql.Open(\"sqlserver\", cstring)\n    if err != nil {\n        log.Fatalf(\"could not open db pool: %s\", err)\n    }\n    return nio.NewBatchProvider(ctx, db), nio.NewWriter(ctx, db)\n}\n\n// wp.Runner\nfunc run(p *Pool) {\n\n    // Get total workers\n    workerstr := os.Getenv(poolWorkerCount)\n    if workerstr == \"\" {\n        log.Fatalf(\"no worker count found in env var %s\", poolWorkerCount)\n    }\n    workercount, err := strconv.Atoi(workerstr)\n    if err != nil {\n        log.Fatalf(\"worker count %s is not an integer\", workerstr)\n    }\n\n    // This loop is functionally the same as the default wp.DefaultRunner() but \n    // defined here as an example of how to do custom implementations.\n\tfor i := 0; i \u003c workercount; i++ {\n\t\tp.WaitGroup().Add(1)\n\t\tnum := i\n\t\tgo func() {\n\t\t\tlog.Println(\"worker\", num, \"starting up\")\n\t\t\tfor {\n\t\t\t\tselect {\n\t\t\t\tcase n, ok := \u003c-p.Incoming():\n\t\t\t\t\tif !ok {\n\t\t\t\t\t\tlog.Println(\"worker\", num, \"shutting down\")\n\t\t\t\t\t\tp.WaitGroup().Done()\n\t\t\t\t\t\treturn\n\t\t\t\t\t}\n\t\t\t\t\tlog.Println(\"worker\", num, \"received note\", n.ID)\n\t\t\t\t\tp.Results() \u003c- p.handler(n)\n\t\t\t\tcase \u003c-p.ctx.Done():\n\t\t\t\t\tlog.Println(\"worker\", num, \"shutting down\")\n\t\t\t\t\tp.WaitGroup().Done()\n\t\t\t\t\treturn\n\t\t\t\t}\n\t\t\t}\n\t\t}()\n\t}\n\tp.WaitGroup().Wait()\n\tlog.Println(\"worker pool shut 
down\")\n\tclose(p.Results())\n}\n\n// wp.Handler\nfunc handle(n *note.Note) *note.Result {\n    log.Println(fmt.Sprintf(\"processing note %d\", n.ID))\n    result := note.Result{\n        ID: n.ID,\n    }\n\n    set := func(status note.Status, e error) *note.Result {\n        result.Status = status\n        result.Err = e\n        if e != nil {\n            log.Println(\"note\", n.ID, \"processing failed:\", \u0026result)\n        }\n        return \u0026result\n    }\n\n    text := utf.EncodeUTF8(n.Text)\n\n    if !utf8.ValidString(text) {\n        return set(nio.EncodingErr, errors.New(\"note contains invalid utf8 characters\"))\n    }\n\n    if bl := len([]byte(text)); bl \u003e= 20000 {\n        return set(nio.TooLongErr, errors.Errorf(\"note too long: %d\", bl))\n    }\n\n    result.Body = \"{\\\"Entities\\\":[],\\\"UnmappedAttributes\\\":[]}\"\n    return set(nio.Success, nil)\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fuwrit%2Fmush","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fuwrit%2Fmush","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fuwrit%2Fmush/lists"}