https://github.com/ronald-kimeli/generate_splittedcsv_s3bucket



# S3 Bucket JSON Lines Split to CSV
> This is an automated script that downloads a _zstandard_-compressed JSON Lines file from an **S3 bucket**, decompresses it, _splits_ it into smaller JSON Lines files, and then uses those files to generate **CSV** files.

> Simple steps for a quick installation!

* Clone this repository to your computer.
```
git clone https://github.com/KimelirR/generate_SplittedCsv_S3BUCKET.git
```
* Create the `.env` file from the example file.
```
cp .env.example .env
```
* **Provide the credentials of your S3 bucket in the `.env` file:**
~~~
KEY=?
SECRET=?
REGION=?
BUCKET=?
~~~

* Install the required dependencies with Composer:
```
composer install
```
> Note!
1. Make sure the credentials for your S3 bucket are entered correctly.
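
A quick way to catch a `.env` file that still contains the `?` placeholders is to check it before running anything else. The following is a minimal sketch, not code from this repository: `missingEnvKeys` is a hypothetical helper, and it assumes the simple `NAME=value` format shown above.

```php
<?php
// Hypothetical helper (not part of this repository): parse NAME=value lines
// and return the required names that are absent, empty, or still set to "?".
function missingEnvKeys(string $envContents, array $required): array
{
    $values = [];
    foreach (preg_split('/\R/', $envContents) as $line) {
        $line = trim($line);
        // Skip blank lines, comments, and lines without an "=" sign.
        if ($line === '' || $line[0] === '#' || strpos($line, '=') === false) {
            continue;
        }
        [$name, $value] = explode('=', $line, 2);
        $values[trim($name)] = trim($value);
    }

    $missing = [];
    foreach ($required as $name) {
        if (!isset($values[$name]) || $values[$name] === '' || $values[$name] === '?') {
            $missing[] = $name;
        }
    }
    return $missing;
}

// Example with made-up values: SECRET is a placeholder and BUCKET is absent.
$sample = "KEY=abc123\nSECRET=?\nREGION=eu-west-1\n";
echo implode(', ', missingEnvKeys($sample, ['KEY', 'SECRET', 'REGION', 'BUCKET'])) . PHP_EOL;
// prints "SECRET, BUCKET"
```

In practice you would pass `file_get_contents('.env')` instead of the sample string.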

> Lastly! Generate the CSV

* All the functions and classes are inside the `src` folder.

```
php index.php
```

- Download the latest JSON Lines file manually and pass its file path to `delineEachLineFromFile`, as in the example below.

```php
$json_lines = (new JsonLines())->delineEachLineFromFile('jobs_2022_11_30.jsonl');
```
- Otherwise, in a Linux environment, everything is executed automatically:

```php
$json_lines = (new JsonLines())->delineEachLineFromFile($path);
```
#### `$path` comes from the AWS autoload
- `index.php` saves the file into the current folder.
- `generate_csv.php` outputs a file for download with the contents written to it.
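
To illustrate the JSON Lines to CSV step described above, here is a minimal sketch. It is not the repository's `JsonLines` class or `generate_csv.php`: `jsonLinesToCsv` is a hypothetical function, and it assumes each line is a JSON object whose keys match those of the first record.

```php
<?php
// Hypothetical sketch (not this repository's implementation): convert a
// JSON Lines file to CSV. The header row is taken from the first valid record.
function jsonLinesToCsv(string $jsonlPath, string $csvPath): int
{
    $in = fopen($jsonlPath, 'r');
    $out = fopen($csvPath, 'w');
    if ($in === false || $out === false) {
        throw new RuntimeException('Could not open the input or output file.');
    }

    $header = null;
    $rows = 0;
    while (($line = fgets($in)) !== false) {
        $record = json_decode(trim($line), true);
        if (!is_array($record)) {
            continue; // skip blank or malformed lines
        }
        if ($header === null) {
            $header = array_keys($record);
            // An empty escape character disables PHP's non-standard escaping.
            fputcsv($out, $header, ',', '"', '');
        }
        // Emit values in header order; missing keys become empty cells and
        // nested arrays are re-encoded as JSON text.
        $row = [];
        foreach ($header as $column) {
            $value = $record[$column] ?? '';
            $row[] = is_scalar($value) ? $value : json_encode($value);
        }
        fputcsv($out, $row, ',', '"', '');
        $rows++;
    }

    fclose($in);
    fclose($out);
    return $rows; // number of data rows written
}
```

A call such as `jsonLinesToCsv('jobs_2022_11_30.jsonl', 'jobs_2022_11_30.csv')` would write one CSV row per valid JSON line; the output file name here is only an example.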

By default, the Split class splits the data into files of 5000 lines each. You can change this value in `split.class.php`.
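
The splitting idea can be sketched as follows. This is an illustration under assumptions, not the contents of `split.class.php`: `splitByLines`, its parameters, and the `_0001.jsonl` naming scheme are hypothetical.

```php
<?php
// Hypothetical sketch (not the repository's Split class): split a
// line-oriented file into parts of at most $linesPerFile lines each.
function splitByLines(string $sourcePath, string $outputPrefix, int $linesPerFile = 5000): array
{
    if ($linesPerFile < 1) {
        throw new InvalidArgumentException('linesPerFile must be at least 1.');
    }
    $in = fopen($sourcePath, 'r');
    if ($in === false) {
        throw new RuntimeException('Could not open the source file.');
    }

    $parts = [];
    $out = null;
    $count = 0;
    while (($line = fgets($in)) !== false) {
        // Open a new part for the first line and after every full part.
        if ($count % $linesPerFile === 0) {
            if ($out !== null) {
                fclose($out);
            }
            $path = sprintf('%s_%04d.jsonl', $outputPrefix, count($parts) + 1);
            $out = fopen($path, 'w');
            if ($out === false) {
                throw new RuntimeException('Could not create ' . $path);
            }
            $parts[] = $path;
        }
        fwrite($out, $line);
        $count++;
    }

    if ($out !== null) {
        fclose($out);
    }
    fclose($in);
    return $parts; // paths of the part files, in order
}
```

Reading line by line keeps memory use flat regardless of the source file size, which matters for large JSON Lines dumps.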