{"id":47235410,"url":"https://github.com/aliengiraffe/deidentify","last_synced_at":"2026-03-13T22:13:01.578Z","repository":{"id":294748773,"uuid":"987323216","full_name":"aliengiraffe/deidentify","owner":"aliengiraffe","description":"Simple yet powerful tool for identifying and anonymizing personal information in various formats.","archived":false,"fork":false,"pushed_at":"2026-02-11T04:58:16.000Z","size":89,"stargazers_count":27,"open_issues_count":7,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-02-11T10:28:15.767Z","etag":null,"topics":["anonymization","compliance","data-anonymization","data-masking","data-privacy","data-protection","data-security","deidentification","gdpr","go","golang","llm","pii","pii-detection","privacy","privacy-tools","redaction","security-tools","sensitive-data","zero-dependency"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aliengiraffe.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":"deidentify.go","publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-05-20T23:04:22.000Z","updated_at":"2026-02-10T18:00:00.000Z","dependencies_parsed_at":null,"dependency_job_id":"58938e92-772d-4f42-bd6f-5ef1e7ce932b","html_url":"https://github.com/aliengiraffe/deidentify","commit_stats":null,"previous_names":["aliengiraffe/deidentify"],"tags_count":9,"template":false,"template_full_name":null,"purl":"pkg:github/aliengiraffe/deidentify","repository_url":"https://repos.ec
osyste.ms/api/v1/hosts/GitHub/repositories/aliengiraffe%2Fdeidentify","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aliengiraffe%2Fdeidentify/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aliengiraffe%2Fdeidentify/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aliengiraffe%2Fdeidentify/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/aliengiraffe","download_url":"https://codeload.github.com/aliengiraffe/deidentify/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aliengiraffe%2Fdeidentify/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30477523,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-13T20:45:58.186Z","status":"ssl_error","status_checked_at":"2026-03-13T20:45:20.133Z","response_time":60,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anonymization","compliance","data-anonymization","data-masking","data-privacy","data-protection","data-security","deidentification","gdpr","go","golang","llm","pii","pii-detection","privacy","privacy-tools","redaction","security-tools","sensitive-data","zero-dependency"],"created_at":"2026-03-13T22:13:00.858Z","updated_at":"2026-03-13T22:13:01.536Z","avatar_url":"https://github.com/aliengiraffe.png","language":"Go","readme":"# 
Deidentify\n\n![Version](https://img.shields.io/github/v/release/aliengiraffe/deidentify.svg)\n[![Go Report Card](https://goreportcard.com/badge/github.com/aliengiraffe/deidentify?1=2)](https://goreportcard.com/report/github.com/aliengiraffe/deidentify)\n[![GoDoc](https://godoc.org/github.com/aliengiraffe/deidentify?status.svg)](https://godoc.org/github.com/aliengiraffe/deidentify)\n[![License](https://img.shields.io/github/license/aliengiraffe/deidentify.svg?1=1)](LICENSE)\n\n![Release](https://github.com/aliengiraffe/deidentify/actions/workflows/release.yml/badge.svg)\n\nA Go library for detecting and removing personally identifiable information (PII) from text and structured data.\n\n## Overview\n\n`deidentify` is an open source Go package created by AlienGiraffe, Inc. that provides simple yet powerful tools for identifying and anonymizing personal information in various formats. It preserves data utility while protecting privacy through consistent, deterministic replacements.\n\n## Features\n\n- **Multiple PII types**: Emails, phone numbers, SSNs, credit cards, names, and addresses\n- **Format preservation**: Maintains the original data format for better usability\n- **Deterministic replacements**: Same inputs produce the same outputs for referential integrity\n- **Context awareness**: Uses column names as context, so identical values in different columns get different replacements and cannot be correlated across columns\n- **Table processing**: Handles structured data with type-aware deidentification\n- **Thread-safe**: Suitable for concurrent processing\n\n## Installation\n\n```bash\ngo get github.com/aliengiraffe/deidentify\n```\n\n## Usage\n\n### Basic Example\n\n```go\npackage main\n\nimport (\n    \"fmt\"\n    \"log\"\n    \n    \"github.com/aliengiraffe/deidentify\"\n)\n\nfunc main() {\n    // Generate a secure secret key (or provide your own)\n    secretKey, err := deidentify.GenerateSecretKey()\n    if err != nil {\n        log.Fatal(\"Failed to generate secret key:\", err)\n    }\n    \n    // Create a deidentifier instance\n 
   d := deidentify.NewDeidentifier(secretKey)\n    \n    // Deidentify text containing PII\n    text := `Contact Frodo Baggins at frodo.baggins@shire.me or (555) 123-4567.\nHis SSN is 123-45-6789 and he lives at 1 Bagshot Row, Hobbiton.`\n\n    redacted, err := d.Text(text)\n    if err != nil {\n        log.Fatal(\"Failed to deidentify text:\", err)\n    }\n    \n    fmt.Println(redacted)\n    // Output example:\n    // Contact Taylor Miller at member4921@demo.co or (555) 642-8317.\n    // His SSN is 304-51-9872 and he lives at 2845 Oak Ave.\n}\n```\n\n### Processing Structured Data\n\n```go\npackage main\n\nimport (\n    \"fmt\"\n    \"log\"\n    \n    \"github.com/aliengiraffe/deidentify\"\n)\n\nfunc main() {\n    secretKey, err := deidentify.GenerateSecretKey()\n    if err != nil {\n        log.Fatal(\"Failed to generate secret key:\", err)\n    }\n    \n    d := deidentify.NewDeidentifier(secretKey)\n    \n    // Create a table with PII data\n    table := \u0026deidentify.Table{\n        Columns: []deidentify.Column{\n            {\n                Name:     \"customer_name\",\n                DataType: deidentify.TypeName,\n                Values:   []interface{}{\"Gandalf Grey\", \"Aragorn Strider\", nil},\n            },\n            {\n                Name:     \"email\",\n                DataType: deidentify.TypeEmail,\n                Values:   []interface{}{\"mithrandir@wizard.com\", \"ranger@gondor.me\", \"\"},\n            },\n        },\n    }\n    \n    // Deidentify the table\n    result, err := d.Table(table)\n    if err != nil {\n        log.Fatal(\"Failed to deidentify table:\", err)\n    }\n    \n    // Print each deidentified column and its values (the column index is unused)\n    for _, col := range result.Columns {\n        fmt.Printf(\"Column: %s\\n\", col.Name)\n        for j, val := range col.Values {\n            fmt.Printf(\"  [%d]: %v\\n\", j, val)\n        }\n    }\n}\n```\n\n### Processing Slice Data\n\n```go\n// Deidentify [][]string data (CSV-like format)\ndata := [][]string{\n    
{\"Alice Johnson\", \"alice@example.com\", \"555-123-4567\"},\n    {\"Bob Smith\", \"bob@company.org\", \"(555) 987-6543\"},\n}\n\n// Option 1: Automatic type inference (recommended)\nresult, err := d.Slices(data)\nif err != nil {\n    log.Fatal(\"Failed to deidentify:\", err)\n}\n// Types are automatically detected: Name, Email, Phone\n// Result: [[\"Taylor Miller\", \"user4921@demo.co\", \"555-642-8317\"], ...]\n\n// Option 2: Explicit column types only\ncolumnTypes := []deidentify.DataType{deidentify.TypeName, deidentify.TypeEmail, deidentify.TypePhone}\nresult, err = d.Slices(data, columnTypes)\nif err != nil {\n    log.Fatal(\"Failed to deidentify:\", err)\n}\n\n// Option 3: Both explicit types and custom column names\ncolumnNames := []string{\"customer_name\", \"customer_email\", \"customer_phone\"}\nresult, err = d.Slices(data, columnTypes, columnNames)\nif err != nil {\n    log.Fatal(\"Failed to deidentify:\", err)\n}\n```\n\n## More Examples\n\nSee the [examples](./examples) directory for comprehensive usage patterns:\n\n- [Basic usage](./examples/basic/main.go): Simple text deidentification\n- [Table processing](./examples/table/main.go): Structured data with multiple columns and types\n- [Slice processing](./examples/slices/main.go): CSV-like data processing with `[][]string`\n- [International address handling](./examples/international/main.go): Support for addresses across different regions\n\n## Configuration\n\nThe `deidentify` package uses a deterministic approach for consistency. 
The secret key determines every replacement: under the same key, a given input always maps to the same output, so results are reproducible, while the mapping cannot be regenerated without the key.\n\n## Supported PII Types\n\n| PII Type       | Description             | Example Input        | Example Output      |\n|----------------|-------------------------|----------------------|---------------------|\n| TypeName       | Personal names          | Bilbo Baggins        | Taylor Miller       |\n| TypeEmail      | Email addresses         | bilbo@bag-end.shire  | user4921@demo.co    |\n| TypePhone      | Phone numbers           | (555) 123-4567       | (555) 642-8317      |\n| TypeSSN        | Social Security Numbers | 123-45-6789          | 304-51-9872         |\n| TypeCreditCard | Credit card numbers     | 4111-1111-1111-1111  | 4000 8521 7694 3217 |\n| TypeAddress    | Street addresses        | Bag End, Bagshot Row | 2845 Oak Ave        |\n\n## Security\n\nWhile this library aims to detect common PII patterns, no automated system can guarantee 100% detection. Always verify the results in sensitive applications.\n\nNote: By default, the library preserves area codes in phone numbers for better usability, as they often indicate geographic regions rather than individuals. 
Consider whether retaining area codes is acceptable for your use case.\n\n## Data Variety\n\nThe library provides rich anonymization with:\n\n- 110+ gender-neutral first names\n- 130+ diverse last names\n- 105+ fictional email domains\n- 100+ email username patterns\n- 120+ street name variations with international formats\n\nThis extensive variety of replacement options enhances privacy by increasing the anonymization space and reducing the likelihood of pattern recognition.\n\n## International Support\n\nThe library includes support for international address formats:\n\n- North American: US and Canadian-style addresses\n- European: UK, French, German, Italian, Spanish, etc.\n- Asian: Japanese, Chinese, Southeast Asian formats\n- Middle Eastern and global formats\n\nThe detection patterns have been optimized to recognize common address structures across different languages and regional conventions, while the anonymization preserves format and readability.\n\n## Releases\n\n### Creating a New Release\n\nThe library uses GitHub Actions to automate the release process. To create a new release:\n\n1. Update your code and commit all changes\n2. Create and push a new tag with semantic versioning format:\n   ```bash\n   git tag v1.0.0\n   git push origin v1.0.0\n   ```\n3. The GitHub Actions workflow will automatically:\n   - Run tests to ensure everything works\n   - Generate a changelog based on commits since the last tag\n   - Create a GitHub release with documentation\n   - Publish the new version to the Go module proxy\n\nThis makes the new version immediately available for users to install via `go get github.com/aliengiraffe/deidentify@v1.0.0`.\n\n## Performance\n\nTo run performance benchmarks:\n\n```bash\n# Run all benchmarks\ngo test -bench=. -benchtime=10s\n\n# Run only the paragraph deidentification benchmark (the anchored pattern excludes the Parallel variant)\ngo test -bench='^BenchmarkParagraphDeidentification$' -benchtime=1x\n\n# Run benchmarks with memory allocation stats\ngo test -bench=. 
-benchmem\n\n# Run parallel benchmarks to test concurrent performance\ngo test -bench=BenchmarkParagraphDeidentificationParallel\n```\n\n### CPU and Memory Profiling with pprof\n\nFor detailed performance analysis, you can use [pprof](https://github.com/google/pprof) to profile CPU usage and memory allocations:\n\n```bash\n# Generate CPU profile\ngo test -bench=BenchmarkParagraphDeidentification -cpuprofile=cpu.prof -benchtime=10s\n\n# Generate memory profile\ngo test -bench=BenchmarkParagraphDeidentification -memprofile=mem.prof -benchtime=10s\n\n# Analyze CPU profile in terminal\ngo tool pprof cpu.prof\n# Then use interactive commands like 'top', 'list', 'web'\n\n# Analyze memory profile in terminal\ngo tool pprof mem.prof\n```\n\n#### Interactive Web UI\n\nThe most powerful way to analyze profiles is using pprof's built-in web server, which provides an interactive visualization:\n\n```bash\n# Start interactive web UI for CPU profile (opens browser automatically)\ngo tool pprof -http=:8080 cpu.prof\n\n# Start interactive web UI for memory profile on different port\ngo tool pprof -http=:8081 mem.prof\n\n# If browser doesn't open automatically, navigate to:\n# http://localhost:8080 (for CPU)\n# http://localhost:8081 (for memory)\n```\n\nThe web UI provides:\n- **Flame Graph**: Interactive flame graph showing call stack and CPU/memory usage\n- **Graph View**: Call graph with edges showing relationships and costs\n- **Top View**: Sorted list of functions by resource consumption\n- **Source View**: Line-by-line annotation of source code with costs\n- **Peek View**: Shows callers and callees of selected functions\n- **Disassembly View**: Assembly-level analysis\n\n#### Advanced Analysis\n\n```bash\n# Focus on specific functions (e.g., deidentify package)\ngo tool pprof -focus=deidentify cpu.prof\n\n# Compare two profiles (e.g., before and after optimization)\ngo tool pprof -base=cpu_before.prof cpu_after.prof\n\n# Generate a PDF report (requires graphviz)\ngo tool 
pprof -pdf cpu.prof \u003e cpu_profile.pdf\n\n# Trim stacks to start at frames matching Text, and show only deidentify symbols\ngo tool pprof -show_from=Text -show=deidentify cpu.prof\n```\n\n#### Automated Profiling\n\nFor convenience, use the included profiling script:\n\n```bash\n./scripts/profile-benchmarks.sh\n```\n\nThis script will:\n- Run benchmarks with CPU and memory profiling\n- Generate text reports (top consumers, full profiles)\n- Create visual graphs (SVG/PNG) if graphviz is installed\n- Save all artifacts in the `profiles/` directory\n\n#### CI/CD Integration\n\nGitHub Actions automatically generates profiling reports for pull requests. The workflow:\n- Runs benchmarks with CPU and memory profiling\n- Generates pprof reports and visualizations\n- Posts a summary comment on the PR with key metrics\n- Uploads full profiling artifacts for download\n\nThe benchmarks measure the time to deidentify paragraphs containing various types of PII. On modern hardware, the library can process over 600 paragraphs per second with an average processing time of ~1.5ms per paragraph.\n\n## Contributing\n\nContributions are welcome! Please read our [Contributing Guidelines](CONTRIBUTING.md) for detailed information on how to contribute to this project.\n\n**Quick start for contributors:**\n\n1. Fork the repository and clone your fork\n2. Set up the development environment:\n   ```bash\n   ./scripts/setup-pre-commit-hook.sh\n   go mod download\n   ```\n3. Create your feature branch (`git checkout -b feature/amazing-feature`)\n4. Make your changes and ensure tests pass (`go test ./...`)\n5. Commit your changes (pre-commit hook will format code automatically)\n6. 
Push to your fork and submit a Pull Request\n\nSee [CONTRIBUTING.md](CONTRIBUTING.md) for detailed guidelines on code standards, testing, and the development workflow.\n\n## License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n## About\n\nCreated and maintained by [AlienGiraffe, Inc.](https://github.com/aliengiraffe)\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faliengiraffe%2Fdeidentify","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faliengiraffe%2Fdeidentify","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faliengiraffe%2Fdeidentify/lists"}