{"id":24154897,"url":"https://github.com/mikayl/spreadsheet-importer","last_synced_at":"2025-03-01T22:28:19.834Z","repository":{"id":48924604,"uuid":"377904679","full_name":"Mikayl/spreadsheet-importer","owner":"Mikayl","description":"Java library aiming to make importing and validating excel contents easy and safer","archived":false,"fork":false,"pushed_at":"2024-01-02T19:43:30.000Z","size":1805,"stargazers_count":0,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-01-12T12:33:05.707Z","etag":null,"topics":["annotation-processing","annotation-processor","excel","excel-import","importer","importing","java","java-8","library","spreadsheet"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Mikayl.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-06-17T17:03:00.000Z","updated_at":"2021-10-07T11:13:31.000Z","dependencies_parsed_at":"2025-01-12T12:29:18.687Z","dependency_job_id":"6ab8700e-074d-4334-a89d-905e71a6307c","html_url":"https://github.com/Mikayl/spreadsheet-importer","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mikayl%2Fspreadsheet-importer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mikayl%2Fspreadsheet-importer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mikayl%2Fspreadsheet-importer/releases","manifests_url":"http
s://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mikayl%2Fspreadsheet-importer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Mikayl","download_url":"https://codeload.github.com/Mikayl/spreadsheet-importer/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241435158,"owners_count":19962399,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["annotation-processing","annotation-processor","excel","excel-import","importer","importing","java","java-8","library","spreadsheet"],"created_at":"2025-01-12T12:26:37.771Z","updated_at":"2025-03-01T22:28:19.809Z","avatar_url":"https://github.com/Mikayl.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"![Project logo](logo.png)\n\nAbout this\n===========\n**Spreadsheet Importer** is a **library** that was born out of my perceived need to have an easy way of not only\nimporting well formatted and prepared data but also of validating the contents of the file. The reality is that often\ntimes the source of a spreadsheet may lack technical or even business knowledge and expecting such a source to produce a\nperfect file is unreasonable. Using this library you *should* have a better chance of catching problems while also\nspending less time writing code and thinking of all possible combination of factors (of which there are many).\n\nAfter you parse a file you can access all the problems, predefined in the library or business specific and then return\nthem to the user. 
No more generic messages like \"The file is invalid\" when you have thousands of rows and no idea what\nthe problem is. Using this library you will be able to return a list of specific problems down to a single row and column.\nThe library is designed to give any developer an easy way of adding custom validations or business-specific logic.\n\nYou as a programmer are free to treat the imported data as you see fit. You can make it best-effort and take only the\nvalid rows or ignore everything if there's even a single validation problem. You can even consume the rows as they are\nbeing imported.\n\nThe library uses **annotation processing**. This means that most of the magic happens at **compile time**. This gives\nbetter runtime performance and is also type-safe.\n\nOn top of standard importing functionalities, this also provides support for **dynamic column ordering**, **regex column\nand spreadsheet recognition**, **metadata injection** (like spreadsheet name/index or grouping all unmatched columns\ninside a map) as well as **many other features**.\n\n![GIF showing a short presentation of how the library can be used](presentation.gif)\n\nFeatures\n===========\n\n- Based on **annotation processing**, the magic happens at compile time increasing **runtime performance** and being\n  **type-safe**;\n- Works both with ```xlsx``` and ```xls```;\n- Importing any of the following: String, Boolean, Byte, Short, Integer, Long, Float, Double, Enums, LocalDateTime,\n  LocalDate, LocalTime as well as custom classes having a public constructor accepting a String;\n- \"Importing\" extra data such as the sheet name, sheet index, row number, the index of the row or importing all the\n  columns that were not otherwise matched;\n- Importing **formula results**;\n- **Ordinal imports** (based on the index of the row/sheet, header is optional);\n- **Named imports** (based on the name of the column/sheet or a regex);\n- Importing multiple columns (that match a regex) as a list;\n- 
Column presence validation (min/max number of times a column can be found);\n- **Dynamic column ordering** (no more \"columns must be in this exact order for this to work\");\n- **Predefined validations** down to the column level: not null, matches regex, are formulas allowed;\n- Custom **static validations** down to the row level;\n- Custom **dynamic validations** down to the row level (can be used to check against a database or against an external\n  system);\n- Real-time **hooks** for the mapped rows (valid/invalid/all, including the validation problems for each one);\n- Allows **processing** a cell's value before it is imported, allowing you to handle things such as regional\n  particularities or other specific cases;\n- Access valid/invalid rows, validation problems for a specific row or all rows that have a specific validation problem;\n- Ability to return to the user a list of problems, so they can know what is wrong and correct the problems;\n- Ability to **customize those messages** (like in other languages);\n- **IoC** compatible (can be integrated and tested/mocked inside an **IoC** environment like **Spring**);\n- Most functionalities should be covered by integration tests;\n\n**Planned (possibly)**:\n\n- Being able to split column values into a list of basic java types;\n- Don't store state, use only the hooks to allow large file imports (using ```com.monitorjbl.xlsx-streamer```);\n- Support password-protected files;\n- Min, Max, Length validations?;\n- Injecting a map/list of the columns/field names that are invalid;\n\nInstallation\n===========\nFrom **Maven central**:\n```\n    \u003cdependency\u003e\n        \u003cgroupId\u003ero.nom.vmt\u003c/groupId\u003e\n        \u003cartifactId\u003espreadsheet-importer\u003c/artifactId\u003e\n        \u003cversion\u003e0.1.0\u003c/version\u003e\n    \u003c/dependency\u003e\n```\n\nHow to use\n===========\n\n1. 
Create the class that matches the data you want to import:\n\n```\n// getters \u0026 setters are omitted\n\npublic class Employee{\n    private String firstName;\n    private String lastName;\n    private GenderEnum gender;\n    private Byte age;\n    private Email email;\n    private String phone;\n    private Integer salary;\n    private Boolean isMarried;\n    List\u003cString\u003e bonuses;\n    LocalDateTime lastLogin;\n    LocalDate hiredOn;\n    LocalTime startsWorkAt;\n    LocalDateTime passwordExpires;\n}\n```\n\n2. Annotate the class using ```@Importable```\n\n| Options ```@Importable``` |   Type   | Default | Explanation                                                                                                                                                                  |\n| ------------------------- | :------: | :-----: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|\n| isNamed                   | boolean  |  false  | Whether the fields inside the class will be imported based on the column names or column positions                                                                           |\n| hasHeader                 | boolean  |  true   | If ```isNamed``` is ```false``` this is used to specify if the excel rows include a header or not. If ```true``` then a header must be present                               |\n| sheetIndexes              | int[]    |   {0}   | This specifies the indexes of the sheets that contain the data. By default it will import only from the first sheet                                                          |\n| sheetNames                | String[] |   { }   | This specifies the name(s) or **regular expression(s)** used to determine the sheets to import data from. 
If this is set the *default* value of ```sheetIndexes``` is ignored |\n\nE.g.:\n\n```\n   @Importable(isNamed = true, sheetNames = {\"London.*\"}) //will use the column names and will import from all sheets matching the regex (e.g.: \"London E\" and \"London W\")\n   public class Employee{\n```\n\n3. Annotate the fields you want to import using ```@Import``` and ```@Named``` or ```@Ordinal``` depending on the value\n   you used for ```isNamed```. (You can't combine both named and ordinal imports inside the same class)\n\nYou can import the following:\n\n* **Standard java types**: ```String```, ```Byte```, ```Short```, ```Integer```, ```Long```, ```Float```, ```Double```,\n  ```Boolean```[^1], ```LocalDateTime```, ```LocalDate```, ```LocalTime```;\n* **Enums** (using the ```valueOf``` method of the enum class; if no enum constant exists for the string, the value and row will\n  be considered invalid; you could use ```@PreProcess``` combined with ```DefaultEnumProcessor``` on enum fields to\n  better match the fields);\n* **List of standard java types** (if you use named imports you can get a list with all the values from the columns\n  matching the name);\n* **Map** with the value a standard java type and the key either an Integer or a String. The keys are the indexes/names of\n  the columns matching the regex and the value is the content found in the respective column;\n* **Any other class** having a public constructor taking a String as a parameter. To pass back specific validation\n  errors throw an ```InstantiationProblem```. 
If this or any other exceptions are thrown the value and row will be\n  considered invalid;\n\n| Options ```@Import``` |  Type                                 | Default | Explanation                                                                                                                                                                                                                                                                                                                                                                                                                         |\n| --------------------- | :-----------------------------------: | :-----: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |\n| required              | boolean                               |  false  | Whether the **value** associated with this field must be present in order for it and the row to be considered **valid**. Empty strings (```String::isEmpty```) are interpreted as null since a normal user editing an excel cannot really differentiate between the 2 cases                                                                                                                                                         |\n| trim                  | boolean                               |  true   | If set to ```true``` then the input is trimmed of spaces (uses ```String::trim```). 
This happens after ```@PreProcess``` and before the **regex** check                                                                                                                                                                                                                                                                             |\n| formulaAllowed        | boolean                               |  true   | If formulas can be present in the cell and if yes, they will be evaluated and then the result will continue as normal. When used with Temporal fields support may be only partial especially if used with ```matches```                                                                                                                                                                                                             |\n| matches               | String                                |   \"\"    | Specifies the regular expression that a column must match in order for the value and row to be considered valid (uses ```String::matches```). This is checked **after** the optional **trim** and optional ```@PreProcess```. \u003cbr /\u003eFor temporal fields (```LocalDateTime```, ```LocalDate``` and ```LocalTime```) this column specifies the format to pass to ```DateTimeFormatter.ofPattern``` if the column is a string in excel |\n| preProcess            | Class\u003c? extends ColumnPreProcessor\u003e[] |   {}    | The class for an implementation of the ColumnPreProcessor interface. This will be used to process the string value if needed. You can remove special characters, replace letters or any other operation. You can use multiple processors but generally one will probably be enough. If multiple processors are used, they are applied sequentially in the order they are given.                                                     
|\n\n| Options ```@Named``` |  Type  | Default | Explanation                                                                                                                                                                  |\n| -------------------- | :----: | :-----: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |\n| value                | String |         | The name or regular expression to look for in the header                                                                                                                     |\n| minimumMatches       |   int  |    1    | The minimum number of times that columns matching the name must be found in order for the file to be valid                                                                   |\n| maximumMatches       |   int  |    1    | The maximum number of times that columns matching the name must be found in order for the file to be valid. 
Can't be 0 and if greater than 1, the field must be a collection |\n\n| Options ```@Ordinal``` | Type  | Default | Explanation                                                              |\n| ---------------------- | :---: | :-----: | :----------------------------------------------------------------------: |\n| value                  |  int  |         | The index of the column this field should be imported from (starts at 0) |\n\nE.g.:\n\n```\n@Ordinal(4)\n@Import(required = true, matches = EMAIL_REGEX)\nprivate Email email; //this is a custom class with a public constructor taking a String\n```\n\n```\n@Named(\"Gender\")\n@Import\nprivate GenderEnum gender;\n```\n\n```\n@Named(\"Bonus.*\")\n@Import\nList\u003cString\u003e bonuses;\n```\n\n```\n@Named(\"Bonus.*\")\n@Import\nMap\u003cString,String\u003e bonusesMap1;\n```\n\n```\n@Named(value = \"Non existent column\", required = false)\n@Import\nprivate Boolean nonExistentColumn;\n```\n\n```\n@Named(\"Password expires\")\n@Import(matches = \"yy-MM-dd_HH:mm\")\nLocalDateTime passwordExpires;\n```\n\n4. 
(Optional) For special use cases you can also \"Inject\" certain information inside your object using ```@Inject```\n\n| Options ```@Inject``` | Type       | Default | Explanation                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |\n| --------------------- | :--------: | :-----: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |\n| value                 | InjectType |         | Enum with the following values:\u003cbr/\u003e ```IMPORT_INDEX```: For ```Integer``` fields this will set the index of the row irrespective of the sheet. Starts at 1 and will be unique for each row imported. 
Even invalid rows get an index; \u003cbr/\u003e```ROW_NUMBER```: For ```Integer``` fields, sets the number of the row in the sheet (starts at 0);\u003cbr/\u003e```SHEET_NAME```: For ```String``` fields, sets the sheet name the row is present in. Useful when using regex for the sheet name;\u003cbr/\u003e```SHEET_INDEX```: For ```Integer``` fields, sets the sheet index (starts from 0) the row is present in. Useful when using regex for the sheet name; \u003cbr/\u003e```UNMATCHED_COLUMNS```: For ```Map\u003cString, String\u003e``` fields, this gathers all columns that were not otherwise matched; |\n\n5. Create **public setters** for **all** the fields you want to import or inject based on the ```setColumnName``` format\n   accepting a single parameter of the same type as the field;\n\n6. (Optional) If you need cross validation (e.g.: if x field is null then y field must also be null) or more complex\n   static validations not covered by the annotations you can have your mapped class implement ```Validatable``` and use\n   the ```validate``` method to return a list of validation problems;\n\n7. Build the project in order for the annotations to be processed and the mapper to be generated (The mapper will have\n   the same package as your class and the name will have \"ImportMapper\" appended);\n\n8. Create a new instance of ```Importer``` passing the class of the generated mapper; You could even make a bean out of\n   it if using this alongside dependency injection;   \n   E.g.:\n\n```\nImporter\u003cEmployee\u003e importer=Importer.build(EmployeeImportMapper.class);\n``` \n\n9. 
(Optional) If you want to do even more advanced validations (non-static, e.g.: check that a value is present in a\n   database) you can use the builder (```Importer.builder(EmployeeImportMapper.class)```) and add validators to the\n   importer using ```.withValidator(validator)``` where ```validator``` is of\n   type ```Function\u003cT, List\u003cValidationProblem\u003e\u003e```; Mutating the object here is NOT encouraged or supported;\n\n10. (Optional) For \"hooks\" that get called for valid/invalid/all rows you can use ```.withConsumerForValid```,\n    ```.withConsumerForInvalid``` or ```.withConsumer```; At the time the \"hooks\" are called the validity of the object\n    has already been determined; Mutating the object here is NOT encouraged or supported;\n\n11. After everything is set up, you can call ```.process``` (calling ```.build()``` first if you added extra options) and\n    provide an ```InputStream``` of the excel file; This will return an immutable[^2] instance of\n    type ```ImportData\u003cT\u003e``` that will contain the imported data and the validation problems;\n\n12. Access what you need from ```ImportData\u003cT\u003e```:\n\n```\n    public boolean isValid();\n    \n    public List\u003cT\u003e getValidRows();\n    public List\u003cT\u003e getInvalidRows();\n    public List\u003cT\u003e getInvalidRows(Class\u003c? extends Problem\u003e validationProblemClass);\n    public List\u003cT\u003e getAllRows();\n\n    public long getRowNoTotal();\n    public long getRowNoValid();\n    public long getRowNoInvalid();\n\n    public List\u003cProblem\u003e getValidationProblems();\n    public List\u003cRowProblem\u003e getValidationProblems(T data);\n    public List\u003cProblem\u003e getValidationProblems(Class\u003c? extends Problem\u003e clazz);\n    public List\u003cProblem\u003e getValidationProblemsTree(Class\u003c? 
extends Problem\u003e clazz);\n```\n\nExtra step (optional):\nIf you are using an **IoC** framework (like **Spring**) the ```Importer``` can function as a ```Bean``` since it does\nnot change state across imports:\n\n```\n    @Bean\n    public Importer\u003cEmployee\u003e getImporterEmployee(EmployeeService employeeService) {\n        return Importer.builder(EmployeeNamedImportMapper.class)\n                .withValidator(employeeService.getImportValidator())\n                .withConsumerForValid(employeeService.getKafkaConsumer())\n                .build();\n    }\n```\n\nCompatibility\n===========\nDesigned to work both with ```xlsx``` and ```xls``` files thanks to ```apache-poi```;\n\nThe library targets **Java 8** and tries to keep dependencies at a minimum;\n\nThe library uses ```org.apache-poi:5.0.0``` (latest as of writing this). I found that there can be problems if you or\nanother of your dependencies are using an older version that provides older jars like ```ooxml-schemas-1.4.jar```. Don't\nask me why but if you need to include the apache-poi dependencies in your POM in order to upgrade the version, try to\ninclude them among the top of your dependencies;\n\nDue to **Lombok** using annotation processing in unintended ways (changing existing classes instead of strictly\ngenerating new ones) importing classes that have the setters generated with Lombok may create problems if this library\ngets to process its annotations before Lombok. I've found the same problem between Lombok and other libraries using\nannotation processing in the past too. For me the solution was to put the lombok dependency first in my pom but that may\nnot be that safe or work in every situation. Worst case scenario, use the IDE to generate your setters or write them\nmanually.\n\nTesting\n===========\nMost if not all the features of this library should be covered by **integration tests** (see the test folder). 
This\nallows safer development and more reliable code however it is hard to cover all combinations of annotations used,\nsettings and most importantly the state of the provided excel.   \nIf you end up using this, I suggest you write your own integration tests for your specific POJOs. Given the way the\nlibrary works this shouldn't take you long so no excuse not to have this covered by tests too. Tests are also a great\nway to **contribute** to this if you want to. Adding your specific POJOs and dummy data can help improve the quality of\nthis library.\n\nContact\n===========\nIf you want to reach out about this repository you can find me at: spreadsheet.importer (at) vmt.nom.ro. **No support**\nvia email, for bugs please open an issue.\n\nLicense - MIT\n===========\n\n**Copyright**  (c) 2021 - Mihai Vasile\n\nPermission is hereby granted, free of charge, to any person obtaining a copy of this software and associated\ndocumentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the\nrights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit\npersons to whom the Software is furnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all copies or substantial portions of the\nSoftware.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE\nWARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR\nCOPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR\nOTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.\n\n**[^1]**: For boolean values strings are trimmed, upper-cased and then the following are mapped to ```true```: \"TRUE\",\n\"1\", \"T\", \"Y\", \"YES\" and the following to ```false```: \"FALSE\", \"0\", \"F\", \"N\", \"NO\"; Any other value will be considered\ninvalid together with the row;  \n**[^2]**: Apart from the instances of the user provided class ```\u003cT\u003e```, the ```ImportData``` instance **is intended**\nto be immutable at the time the developer gets access to it (the instance can be and is mutated internally by the library\nbefore it is provided to the dev);\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmikayl%2Fspreadsheet-importer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmikayl%2Fspreadsheet-importer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmikayl%2Fspreadsheet-importer/lists"}