{"id":16278108,"url":"https://github.com/helgesverre/receipt-scanner","last_synced_at":"2025-04-08T17:29:26.930Z","repository":{"id":196194979,"uuid":"694840246","full_name":"HelgeSverre/receipt-scanner","owner":"HelgeSverre","description":"🧾✨ AI-Powered Receipt and Invoice Scanner for Laravel, with support for images, documents and text","archived":false,"fork":false,"pushed_at":"2024-07-14T09:21:35.000Z","size":2702,"stargazers_count":131,"open_issues_count":2,"forks_count":17,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-03-16T11:48:22.865Z","etag":null,"topics":["invoice","laravel-package","openai","receipt","scanner"],"latest_commit_sha":null,"homepage":"","language":"Rich Text Format","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/HelgeSverre.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-09-21T19:59:51.000Z","updated_at":"2025-03-09T07:40:54.000Z","dependencies_parsed_at":"2024-07-14T10:45:56.468Z","dependency_job_id":null,"html_url":"https://github.com/HelgeSverre/receipt-scanner","commit_stats":null,"previous_names":["helgesverre/receipt-scanner"],"tags_count":8,"template":false,"template_full_name":"spatie/package-skeleton-laravel","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HelgeSverre%2Freceipt-scanner","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HelgeSverre%2Freceipt-scanner/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HelgeSverre%2Freceipt-scanner/releases","manifests_url":"https://repos.eco
syste.ms/api/v1/hosts/GitHub/repositories/HelgeSverre%2Freceipt-scanner/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/HelgeSverre","download_url":"https://codeload.github.com/HelgeSverre/receipt-scanner/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247891856,"owners_count":21013598,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["invoice","laravel-package","openai","receipt","scanner"],"created_at":"2024-10-10T18:57:20.595Z","updated_at":"2025-04-08T17:29:26.912Z","avatar_url":"https://github.com/HelgeSverre.png","language":"Rich Text Format","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\u003cimg src=\".github/header.png\"\u003e\u003c/p\u003e\n\n\u003e *Need more flexibility?* Try the [Extractor](https://github.com/HelgeSverre/extractor) package instead, an AI-Powered\n\u003e data extraction library for Laravel\n\n# AI-Powered Receipt and Invoice Scanner for Laravel\n\n![Latest Version on Packagist](https://img.shields.io/packagist/v/helgesverre/receipt-scanner.svg?style=flat-square)\n![Total Downloads](https://img.shields.io/packagist/dt/helgesverre/receipt-scanner.svg?style=flat-square)\n\nEasily extract structured receipt data from images, PDFs, and emails within your Laravel application using OpenAI.\n\n## Features\n\n- Light wrapper around OpenAI Chat and Completion endpoints.\n- Accepts text as input and returns structured receipt information.\n- Includes a well-tuned prompt for parsing receipts.\n- Supports various input 
formats including Plain Text, PDF, Images, Word documents, and Web content.\n- Integrates with [Textract](https://aws.amazon.com/textract/) for OCR functionality.\n\n## Installation\n\nInstall the package via Composer:\n\n```bash\ncomposer require helgesverre/receipt-scanner\n```\n\nPublish the config file:\n\n```bash\nphp artisan vendor:publish --tag=\"receipt-scanner-config\"\n```\n\nAll the configuration options are documented in the configuration file.\n\nSince this package uses the [OpenAI Laravel Package](https://github.com/openai-php/laravel), you also need to publish\nits config and add the `OPENAI_API_KEY` to your `.env` file:\n\n```shell\nphp artisan vendor:publish --provider=\"OpenAI\\Laravel\\ServiceProvider\"\n```\n\n```dotenv\nOPENAI_API_KEY=\"your-key-here\"\n```\n\n## Usage\n\n### Extracting receipt data from Plain Text\n\nPlain text scanning is useful when you already have the textual representation of a receipt or invoice.\n\nThe example below is from a Paddle.com receipt email, where I copied all the text in the email and removed all the empty\nlines.\n\n```php\nuse HelgeSverre\\ReceiptScanner\\Facades\\ReceiptScanner;\n\n$text = \u003c\u003c\u003cRECEIPT\nLiseth Solutions AS\nvia software reseller Paddle.com\nThank you for your purchase!\nYour full invoice is attached to this email.\nAmount paid\nPayment method\nNOK 2,498.75\nvisa\nending in 4242\nTest: SaaS Subscription - Pro Plan\nSeptember 22, 2023 11:04 am UTC - October 22, 2023 11:04 am UTC\nNOK 1,999.00\nQTY: 1\nSubtotal\nNOK 1,999.00\nVAT\nNOK 499.75\nAmount paid*\nNOK 2,498.75\n*This payment will appear on your statement as: PADDLE.NET* EXAMPLEINC\nNEED HELP?\nNeed help with your purchase? Please contact us on paddle.net.\nlogo\nPaddle.com Market Ltd, Judd House, 18-29 Mora Street, London EC1V 8BT\n© 2023 Paddle. 
All rights reserved.\nRECEIPT;\n\nReceiptScanner::scan($text);\n```\n\n### Extracting data from other formats\n\n```php\nuse HelgeSverre\\ReceiptScanner\\Facades\\Text;\n\n$textPlainText = Text::text(file_get_contents('./receipt.txt'));\n$textPdf = Text::pdf(file_get_contents('./receipt.pdf'));\n$textImageOcr = Text::textract(file_get_contents('./receipt.jpg'));\n$textPdfOcr = Text::textractUsingS3Upload(file_get_contents('./receipt.pdf'));\n$textWord = Text::word(file_get_contents('./receipt.doc'));\n$textWeb = Text::web('https://example.com');\n$textHtml = Text::html(file_get_contents('./receipt.html'));\n```\n\nAfter loading, you can pass the `TextContent` or the plain text (which can be retrieved by calling `-\u003etoString()`) into\nthe `ReceiptScanner::scan()` method.\n\n```php\nuse HelgeSverre\\ReceiptScanner\\Facades\\ReceiptScanner;\n\nReceiptScanner::scan($textPlainText);\nReceiptScanner::scan($textPdf);\nReceiptScanner::scan($textImageOcr);\nReceiptScanner::scan($textPdfOcr);\nReceiptScanner::scan($textWord);\nReceiptScanner::scan($textWeb);\nReceiptScanner::scan($textHtml);\n```\n\n## Receipt Data Model\n\nThe scanned receipt is parsed into a DTO consisting of a main `Receipt` class, which contains the receipt metadata,\na `Merchant` DTO representing the seller on the receipt or invoice, and an array of `LineItem` DTOs holding each\nindividual line item.\n\n- `HelgeSverre\\ReceiptScanner\\Data\\Receipt`\n- `HelgeSverre\\ReceiptScanner\\Data\\Merchant`\n- `HelgeSverre\\ReceiptScanner\\Data\\LineItem`\n\nThe DTO has a `toArray()` method, which returns a structure like the one below.\nFor flexibility, all fields are nullable.\n\n```php\n[\n    \"orderRef\" =\u003e \"string\",\n    \"date\" =\u003e \"date\",\n    \"taxAmount\" =\u003e \"number\",\n    \"totalAmount\" =\u003e \"number\",\n    \"currency\" =\u003e \"string\",\n    \"merchant\" =\u003e [\n        \"name\" =\u003e \"string\",\n        \"vatId\" =\u003e \"string\",\n        \"address\" 
=\u003e \"string\",\n    ],\n    \"lineItems\" =\u003e [\n        [\n            \"text\" =\u003e \"string\",\n            \"sku\" =\u003e \"string\",\n            \"qty\" =\u003e \"number\",\n            \"price\" =\u003e \"number\",\n        ],\n    ],\n];\n```\n\n## Returning an Array instead of a DTO\n\nIf you prefer to work with an array instead of the built-in DTO, you can specify `asArray: true` when calling `scan()`.\n\n```php\nuse HelgeSverre\\ReceiptScanner\\Facades\\ReceiptScanner;\n\nReceiptScanner::scan(\n    $textPlainText,\n    asArray: true,\n);\n```\n\n## Specifying the model\n\nTo use a different model, pass its name with the `model` named argument when calling\nthe `scan()` method.\n\n```php\nuse HelgeSverre\\ReceiptScanner\\Facades\\ReceiptScanner;\nuse HelgeSverre\\ReceiptScanner\\ModelNames;\n\n// With the ModelNames class\nReceiptScanner::scan($content, model: ModelNames::GPT4_1106_PREVIEW);\n\n// With a string\nReceiptScanner::scan($content, model: 'gpt-4-1106-preview');\n```\n\n## All parameters and what they do\n\n**`$text` (TextContent|string)**\n\nThe input text from the receipt or invoice that needs to be parsed. 
It accepts either a `TextContent` object or a\nstring.\n\n**`$model` (string)**\n\nThis parameter specifies the OpenAI model used for the extraction process.\n\n`HelgeSverre\\ReceiptScanner\\ModelNames` is a class containing constants for each model, provided for convenience.\nHowever, you can also use a string directly to specify the model if you prefer.\n\nDifferent models have different speed/accuracy characteristics.\n\nIf you require high accuracy, use a GPT-4 model; if you need speed, use a GPT-3.5 model; if you need even more speed, use\nthe `gpt-3.5-turbo-instruct` model.\n\nThe default model is `ModelNames::TURBO_INSTRUCT`.\n\n| `ModelNames` Constant           | Value                    |\n|---------------------------------|--------------------------|\n| `ModelNames::TURBO`             | `gpt-3.5-turbo`          |\n| `ModelNames::TURBO_INSTRUCT`    | `gpt-3.5-turbo-instruct` |\n| `ModelNames::TURBO_1106`        | `gpt-3.5-turbo-1106`     |\n| `ModelNames::TURBO_16K`         | `gpt-3.5-turbo-16k`      |\n| `ModelNames::TURBO_0613`        | `gpt-3.5-turbo-0613`     |\n| `ModelNames::TURBO_16K_0613`    | `gpt-3.5-turbo-16k-0613` |\n| `ModelNames::TURBO_0301`        | `gpt-3.5-turbo-0301`     |\n| `ModelNames::GPT4`              | `gpt-4`                  |\n| `ModelNames::GPT4_32K`          | `gpt-4-32k`              |\n| `ModelNames::GPT4_32K_0613`     | `gpt-4-32k-0613`         |\n| `ModelNames::GPT4_1106_PREVIEW` | `gpt-4-1106-preview`     |\n| `ModelNames::GPT4_0314`         | `gpt-4-0314`             |\n| `ModelNames::GPT4_32K_0314`     | `gpt-4-32k-0314`         |\n\n**`$maxTokens` (int)**\n\nThe maximum number of tokens that the model will process.\nThe default value is `2000`. Adjusting this value may be necessary for very long text, but 2000 is usually fairly\ngood.\n\n**`$temperature` (float)**\n\nControls the randomness/creativity of the model's output.\n\nA higher value (e.g., 0.8) makes the output more random, which is usually not what we want 
in this scenario. I usually\ngo with 0.1 or 0.2; anything over 0.5 becomes useless. Defaults to `0.1`.\n\n**`$template` (string)**\n\nThis parameter specifies the template used for the prompt.\n\nThe default template is `'receipt'`. You can create and use\nadditional templates by adding new Blade files in the `resources/views/vendor/receipt-scanner/` directory and specifying\nthe file name (without extension) as the `$template` value (e.g. `\"minimal_invoice\"`).\n\n**`$asArray` (bool)**\n\nIf true, returns the response from the AI model as an array instead of as a DTO. This is useful if you need to modify the default\nDTO to have more/fewer fields or want to convert the response into your own DTO. Defaults to `false`.\n\n### Example Usage:\n\n```php\nuse HelgeSverre\\ReceiptScanner\\Facades\\ReceiptScanner;\nuse HelgeSverre\\ReceiptScanner\\ModelNames;\n\n$parsedReceipt = ReceiptScanner::scan(\n    text: $textInput,\n    model: ModelNames::TURBO_INSTRUCT,\n    maxTokens: 500,\n    temperature: 0.2,\n    template: 'minimal_invoice',\n    asArray: true,\n);\n```\n\n### List of supported models\n\n| Constant       | Model name             | Endpoint   |\n|----------------|------------------------|------------|\n| TURBO_INSTRUCT | gpt-3.5-turbo-instruct | Completion |\n| TURBO_16K      | gpt-3.5-turbo-16k      | Chat       |\n| TURBO          | gpt-3.5-turbo          | Chat       |\n| GPT4           | gpt-4                  | Chat       |\n| GPT4_32K       | gpt-4-32k              | Chat       |\n\n## OCR Configuration with AWS Textract\n\nTo use AWS Textract for extracting text from large images and multi-page PDFs,\nthe package needs to upload the file to S3 and pass the S3 object location along to the Textract service.\n\nSo you need to configure the AWS credentials used by the `config/receipt-scanner.php` file by setting the following values in your `.env` file:\n\n```dotenv\nTEXTRACT_KEY=\"your-aws-access-key\"\nTEXTRACT_SECRET=\"your-aws-secret-key\"\nTEXTRACT_REGION=\"your-textract-region\"\n\n# Can be omitted\nTEXTRACT_VERSION=\"2018-06-27\"\n```\n\nYou also need 
to configure a separate Textract disk where the files will be stored.\nOpen your `config/filesystems.php` configuration file and add the following:\n\n```php\n'textract' =\u003e [\n    'driver' =\u003e 's3',\n    'key' =\u003e env('TEXTRACT_KEY'),\n    'secret' =\u003e env('TEXTRACT_SECRET'),\n    'region' =\u003e env('TEXTRACT_REGION'),\n    'bucket' =\u003e env('TEXTRACT_BUCKET'),\n],\n```\n\nEnsure the `textract_disk` setting in `config/receipt-scanner.php` is the same as your disk name in\nthe `filesystems.php`\nconfig. You can change it with the `.env` value `TEXTRACT_DISK`.\n\n```php\nreturn [\n    \"textract_disk\" =\u003e env(\"TEXTRACT_DISK\")\n];\n```\n\n`.env`\n\n```dotenv\nTEXTRACT_DISK=\"uploads\"\n```\n\n**Note**\n\nTextract is not available in all regions:\n\n\u003e Q: In which AWS regions is Amazon Textract available?\n\u003e Amazon Textract is currently available in the US East (Northern Virginia), US East (Ohio), US West (Oregon), US West (\n\u003e N. California), AWS GovCloud (US-West), AWS GovCloud (US-East), Canada (Central), EU (Ireland), EU (London), EU (\n\u003e Frankfurt), EU (Paris), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Seoul), and Asia Pacific (\n\u003e Mumbai)\n\u003e Regions.\n\nSee: https://aws.amazon.com/textract/faqs/\n\n## Publishing Prompts\n\nYou may publish the prompt file that is used under the hood by running this command:\n\n```bash\nphp artisan vendor:publish --tag=\"receipt-scanner-prompts\"\n```\n\nThis package simply uses Blade files as prompts. The `{{ $context }}` variable will be replaced by the text you pass\nto `ReceiptScanner::scan(\"text here\")`.\n\n## Adding prompts/templates\n\nBy default, the package uses the `receipt.blade.php` file as its prompt template. You may add additional templates by\nsimply creating a Blade file such as `resources/views/vendor/receipt-scanner/minimal_invoice.blade.php` and changing\nthe `$template` parameter when calling `scan()`.\n\n**Example 
prompt:**\n\n```blade\nExtract the following fields from the text below, output as JSON\n\ndate (as string in the Y-m-d format)\ntotal_amount (as float, do not include currency symbol)\nvendor_name (company name)\n\n{{ $context }}\n\nOUTPUT IN JSON\n```\n\n```php\nuse HelgeSverre\\ReceiptScanner\\Facades\\ReceiptScanner;\nuse HelgeSverre\\ReceiptScanner\\ModelNames;\n\n$receipt = ReceiptScanner::scan(\n    text: \"Your invoice here\",\n    model: ModelNames::TURBO_INSTRUCT,\n    template: 'minimal_invoice',\n    asArray: true,\n);\n```\n\n## License\n\nThis package is licensed under the MIT License. For more details, refer to the [License File](LICENSE.md).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhelgesverre%2Freceipt-scanner","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhelgesverre%2Freceipt-scanner","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhelgesverre%2Freceipt-scanner/lists"}