https://github.com/electrovir/statement-parser

Parse bank and credit card statements
https://github.com/electrovir/statement-parser
bank chase citi credit-card finances financial parser pdf pdf-to-json statement usaa
Last synced: 30 days ago
JSON representation
Parse bank and credit card statements
Host: GitHub
URL: https://github.com/electrovir/statement-parser
Owner: electrovir
License: mit
Created: 2019-12-02T01:14:54.000Z (almost 6 years ago)
Default Branch: master
Last Pushed: 2023-08-27T11:23:03.000Z (about 2 years ago)
Last Synced: 2025-09-02T12:56:02.852Z (about 1 month ago)
Topics: bank, chase, citi, credit-card, finances, financial, parser, pdf, pdf-to-json, statement, usaa
Language: TypeScript
Homepage: https://www.npmjs.com/package/statement-parser
Size: 948 KB
Stars: 35
Watchers: 1
Forks: 8
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project

README

          # statement-parser

[![tests](https://github.com/electrovir/statement-parser/workflows/tests/badge.svg)](https://github.com/electrovir/statement-parser/actions/workflows/master-tests.yml)

Parse bank and credit card statements.

For English USD only (currently at least).

See the [Parsers](#parsers) section below for the available parsers.

**DISCLAIMER**: I don't necessarily have sufficient data for all of the contained parsers to cover edge cases. See the [Development](#development) section for how to contribute.

# Usage

Install from the [`statement-parser` npm package](https://www.npmjs.com/package/statement-parser).

```sh

npm i statement-parser

```

Currently tested on Node.js versions 12.x and 14.x in combination with the latest macOS, Ubuntu, and Windows.

## Api

The high level most useful api function is the asynchronous [`parsePdfs`](https://github.com/electrovir/statement-parser/tree/master/src/parser/parse-api.ts) function. Simply pass in an array that has details for each PDF file you wish to parse. Note that there is no synchronous alternative.

```TypeScript

import {parsePdfs, ParserType} from 'statement-parser';

parsePdfs([

    {

        parserInput: {

            filePath: 'files/downloads/myPdf.pdf',

        },

        type: ParserType.ChasePrimeVisaCredit,

    },

]).then((results) => console.log(results));

```

`parsePdfs` accepts an array of [`StatementPdf`](https://github.com/electrovir/statement-parser/tree/master/src/parser/parse-api.ts) objects. Thus, each element in the array should look like the following:

```TypeScript

import {ParserType, StatementPdf} from 'statement-parser';

const myPdfToParse: StatementPdf = {

    parserInput: {

        /**

         * This is the only necessary parserInput property. For more examples of parserInput (such

         * as parserOptions), see the Examples section in the README.

         */

        filePath: 'my/file/path.pdf',

    },

    /**

     * Any ParserType can be assigned to the "type" property. See the Parsers section in the README

     * for more information.

     */

    type: ParserType.CitiCostcoVisaCredit,

};

```

For more examples see the [Examples](#examples) section.

## Parsers

Currently built parsers are the following:

-   **`ParserType.ChasePrimeVisaCredit`**: for credit card statements from Chase for the Amazon Prime Visa credit card.

-   **`ParserType.CitiCostcoVisaCredit`**: for credit card statements from Citi for the Costco Visa credit card.

-   **`ParserType.UsaaBank`**: for checking and savings account statements with USAA.

-   **`ParserType.UsaaVisaCredit`**: for Visa credit card statements from USAA.

-   **`ParserType.Paypal`**: for statements from PayPal.

Simply import `ParserType` to use these keys, as shown below and in the other [Examples](#examples) in this README:

```TypeScript

import {ParserType} from 'statement-parser';

// possible ParserType keys

ParserType.ChasePrimeVisaCredit;

ParserType.CitiCostcoVisaCredit;

ParserType.UsaaBank;

ParserType.UsaaVisaCredit;

ParserType.Paypal;

```

## Examples

-   There are extra parser inputs:

    

    ```TypeScript

    import {parsePdfs, ParserType} from 'statement-parser';

    parsePdfs([

        {

            parserInput: {

                /** FilePath is always required. What would the parser do without it? */

                filePath: 'my/paypal/file.pdf',

                /**

                 * Optional name property to help identify the pdf if any errors occur. (By default file

                 * paths will be used in errors so this is only for human readability if desired.)

                 */

                name: 'pdf with all options',

                /**

                 * Optional debug property to see LOTS of output which shows the internal state machine

                 * progressing over each line of the file.

                 */

                debug: true,

                /**

                 * Optional input that provides additional parser configuration. Each parser type has

                 * slightly different parser options.

                 */

                parserOptions: {

                    /** Every parser includes this option. See Year prefix section in the README for details. */

                    yearPrefix: 19,

                },

            },

            /** Type is always required. Without it, the package doesn't know which parser to use. */

            type: ParserType.Paypal,

        },

        {

            parserInput: {

                filePath: 'my/chase-prime-visa-credit/file.pdf',

                parserOptions: {

                    /**

                     * Example of an extra ParserType specific option that will change the parsing

                     * behavior. This option is not valid for any of the other parser types except for

                     * the ParserType.ChasePrimeVisaCredit parser.

                     */

                    includeMultiLineDescriptions: true,

                },

            },

            type: ParserType.ChasePrimeVisaCredit,

        },

    ]).then((result) => console.log(result));

    ```

-   If you're less familiar with asynchronous programming, here's a good way (but not the _only_ way) to deal with that:

    

    ```TypeScript

    import {parsePdfs, ParserType} from 'statement-parser';

    async function main() {

        const results = await parsePdfs([

            {

                parserInput: {

                    filePath: 'my/paypal/file.pdf',

                },

                type: ParserType.Paypal,

            },

        ]);

        // do something with the result

        return results;

    }

    if (require.main === module) {

        main().catch((error) => {

            console.error(error);

            process.exit(1);

        });

    }

    ```

-   Parsing files can be done directly with a single parser:

    

    ```TypeScript

    import {parsers, ParserType} from 'statement-parser';

    const parser = parsers[ParserType.Paypal];

    parser.parsePdf({filePath: 'my/paypal/file.pdf'}).then((result) => console.log(result));

    ```

-   With a single parser you can parse text lines directly (if somehow that's how your statements are stored), rather than using a PDF file:

    

    ```TypeScript

    import {parsers, ParserType} from 'statement-parser';

    const parser = parsers[ParserType.Paypal];

    parser.parseText({textLines: ['text here', 'line 2 here', 'line 3', 'etc.']});

    ```

## Year prefix

**You don't even need to think about this option** unless you're parsing statements from the `1900`s or this package is somehow relevant still in the year `2100`.

Year prefix is an optional parser option. Many statements only include an abbreviated year (like `09` or `16`). As such, the first two digits of the full year, or "year prefix" must be assumed. This value defaults to `20`. Thus, any statements getting parsed from the year `2000` to the year `2099` (inclusive) don't need to set this option.

# Development

Contributions are welcome! This can take the form of one of the following:

-   adding [fixes to current parsers](#fixing-current-parsers)

-   creating [entirely new parsers](#creating-a-new-parser)

-   fixing or filing [bugs](#general-bug-fixes) (including sanitization bugs)

Each change must be accompanied by a new test to make sure that what you add does not get broken.

**Be extra careful to not commit any bank information along with your changes. Do not commit actual statement PDFs to the repo.** See the [sanitizing pdfs](#sanitizing-pdfs) section for steps on how to create sanitized, testable versions of statement PDFs that can be committed to the repo.

## Fixing current parsers

If you're encountering errors when parsing one of your statement PDFs (when hooked up to the correct `ParserType` for course), an already implemented parser may need fixing. This can be done through one of the following:

-   add a new parser option to handle an edge case

-   fix parser code to not fail in the first place

Make sure to add a sanitized file test (see [sanitizing pdfs](#sanitizing-pdfs)) and run tests (see [testing](#testing)) before committing.

## Creating a new parser

If you find that your statement PDF is coming from a bank or credit card that this package does even have a parser for yet, you can add that parser! See [`example-parser.ts`](https://github.com/electrovir/statement-parser/tree/master/src/parser/implemented-parsers/example-parser.ts) for a good starting point.

## General bug fixes

1. Add a test that fails because of the bug. See [Adding tests](#adding-tests) for details.

2. Verify that the test fails before fixing the bug.

3. Commit the new test.

4. Fix the bug.

5. Verify that your test from step 2 now passes and all other tests still pass. See [Running tests](#running-tests) for details.

6. Commit, push, open a PR!

## Sanitizing PDFs

1.  Run `npm run sanitize` with the relevant arguments.

    -   For argument help run the following: `npm run sanitize -- --help`

2.  **Extra super quadruple check** that the sanitized `.json` file does not contain any confidential information in it, such as names of people or businesses, exact transaction amounts, actual dates, etc.

    -   If there is confidential information, please [open a bug](https://github.com/electrovir/statement-parser/issues) or fix the bug.

3.  Run tests. (See [Running tests](#running-tests) for details.)

4.  Verify that your sanitized `.json` file has been added to the appropriate parser folder in `files/sample-files/sanitized`.

5.  Commit away!

## Testing

### Running tests

-   to run all TypeScript tests (usually all you need): `npm test`

-   to test a specific file: `npm run test:file path/to/file.ts`

    -   example: `npm run test:file repo-paths.ts`

-   to run _all_ repository tests (this is what runs in GitHub Actions): `npm run test:full`

### Adding tests

1. If it does not exist already, add a new `X.test.ts` file next to the file that contains the function to be tested, where `X` is the name of the file to be tested.

2. If it does not exist already, add a new `testGroup()` call (imported from `test-vir`) for the function that will be tested.

3. Add new `runTest` calls for the tests you want to add.

See other test files for examples, such as [`array.test.ts`](https://github.com/electrovir/statement-parser/tree/master/src/augments/array.test.ts).
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/electrovir/statement-parser

Awesome Lists containing this project

README