{"id":15358879,"url":"https://github.com/warrenweckesser/scarff","last_synced_at":"2026-01-06T20:48:04.832Z","repository":{"id":141013637,"uuid":"412310098","full_name":"WarrenWeckesser/scarff","owner":"WarrenWeckesser","description":"An ARFF file writer that handles NumPy arrays and SciPy sparse matrices.","archived":false,"fork":false,"pushed_at":"2025-01-09T05:12:11.000Z","size":93,"stargazers_count":0,"open_issues_count":2,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-02-01T22:29:10.438Z","etag":null,"topics":["arff","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/WarrenWeckesser.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-10-01T03:19:57.000Z","updated_at":"2025-01-09T05:12:15.000Z","dependencies_parsed_at":"2024-05-23T02:35:00.102Z","dependency_job_id":"86839c4b-d47a-4a15-ac89-45f677b35b68","html_url":"https://github.com/WarrenWeckesser/scarff","commit_stats":{"total_commits":36,"total_committers":1,"mean_commits":36.0,"dds":0.0,"last_synced_commit":"0f460d9f7c6dcf01046fc0995beaa34b3c382a70"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WarrenWeckesser%2Fscarff","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WarrenWeckesser%2Fscarff/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WarrenWeckesser%2Fscarff/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WarrenWeckesser%2Fscarff/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/WarrenWeckesser","download_url":"https://codeload.github.com/WarrenWeckesser/scarff/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245916256,"owners_count":20693389,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arff","python"],"created_at":"2024-10-01T12:43:17.003Z","updated_at":"2026-01-06T20:48:04.792Z","avatar_url":"https://github.com/WarrenWeckesser.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"scarff\n======\n\nAn ARFF file writer that handles NumPy arrays and SciPy sparse matrices.\n\nLimitations:\n\n* ``relational`` attributes are not supported.\n* The ``dateformat`` parameter accepts a format string that defines\n  the output format for ``date`` attributes.  ARFF uses the Java\n  SimpleDateFormat specification for the format string.  Only a subset\n  of the SimpleDateFormat patterns are supported by ``savearff``.\n* The big limitation, of course, is that this package includes only a\n  writer.  It does not provide a function to read ARFF files.\n\nExamples\n--------\n\nInitial imports::\n\n    \u003e\u003e\u003e import sys\n    \u003e\u003e\u003e import numpy as np\n    \u003e\u003e\u003e from scarff import savearff\n\n**NumPy array of integers**\n\n``a`` is a 2-d array of integers.  The default attribute names generated\nby ``savearff`` for each column are ``f0``, ``f1``, etc.  Here we\noverride that default and assign each column an attribute name with the\n``attributes`` parameter::\n\n    \u003e\u003e\u003e a = np.array([[1, 2, 3], [9, 7, 6], [2, 2, 8], [4, 2, 3]])\n    \u003e\u003e\u003e savearff(sys.stdout, a, attributes=['x0', 'y0', 'z0'],\n    ...          relation='points')\n    @relation points\n\n    @attribute x0 integer\n    @attribute y0 integer\n    @attribute z0 integer\n\n    @data\n    1,2,3\n    9,7,6\n    2,2,8\n    4,2,3\n\n**NumPy array with a structured dtype**\n\nIn this example, we have a structured array with a data type\nthat has four fields.  ``savearff`` takes the attribute names\nfrom the names of the fields in the data type.  This example\nalso shows the use of a ``date`` attribute::\n\n    \u003e\u003e\u003e dt = np.dtype([('id', int), ('strength', float), ('key', 'U8'),\n    ...                ('timestamp', 'datetime64[s]')])\n    \u003e\u003e\u003e m = np.array([(233, 1.75, 'QXX34', '2011-05-04T13:12:04'),\n    ...               (154, 3.25, 'QXX99', '2011-05-04T13:47:43'),\n    ...               (199, 2.16, 'QXZ55', '2011-05-04T14:41:02'),\n    ...               (198, 2.32, 'QXZ59', '2011-05-04T15:28:19')], dtype=dt)\n    \u003e\u003e\u003e savearff(sys.stdout, m, relation='measurements',\n    ...          dateformat='yyyy-MM-dd HH:mm:ss')\n    @relation measurements\n\n    @attribute id integer\n    @attribute strength real\n    @attribute key string\n    @attribute timestamp date \"yyyy-MM-dd HH:mm:ss\"\n\n    @data\n    233,1.75,\"QXX34\",\"2011-05-04 13:12:04\"\n    154,3.25,\"QXX99\",\"2011-05-04 13:47:43\"\n    199,2.16,\"QXZ55\",\"2011-05-04 14:41:02\"\n    198,2.32,\"QXZ59\",\"2011-05-04 15:28:19\"\n\n**Nominal attributes**\n\nARFF files can have \"nominal\" attributes, in which the possible\nvalues are restricted to a given set.  The ``nominal`` parameter\nof ``savearff`` allows a column to be designated as a nominal\nattribute.  The set of possible values can be derived from the\nset of unique values found in the column, or can be given explicitly.\nFor example, here we use ``nominal={'color': True}`` to indicate that\nthe ``color`` attribute is nominal; the set of possible values will\nbe the set of unique values found in the data (in this case, ``black``,\n``green`` and ``red``)::\n\n    \u003e\u003e\u003e things = [[10, 20, 'a', 'green'],\n    ...           [30, 40, 'b', 'red'],\n    ...           [50, 60, 'b', 'red'],\n    ...           [70, 80, 'c', 'black'],\n    ...           [19, 29, 'c', 'red']]\n    \u003e\u003e\u003e savearff(sys.stdout, things, relation='THINGS',\n    ...          attributes=['x', 'y', 'code', 'color'],\n    ...          nominal={'color': True})\n    @relation THINGS\n\n    @attribute x integer\n    @attribute y integer\n    @attribute code string\n    @attribute color {black,green,red}\n\n    @data\n    10,20,\"a\",\"green\"\n    30,40,\"b\",\"red\"\n    50,60,\"b\",\"red\"\n    70,80,\"c\",\"black\"\n    19,29,\"c\",\"red\"\n\nThe set of possible values can be given explicitly::\n\n    \u003e\u003e\u003e savearff(sys.stdout, things, relation='THINGS',\n    ...          attributes=['x', 'y', 'code', 'color'],\n    ...          nominal={'color': ['red', 'green', 'blue', 'black', 'white']})\n    @relation THINGS\n\n    @attribute x integer\n    @attribute y integer\n    @attribute code string\n    @attribute color {red,green,blue,black,white}\n\n    @data\n    10,20,\"a\",\"green\"\n    30,40,\"b\",\"red\"\n    50,60,\"b\",\"red\"\n    70,80,\"c\",\"black\"\n    19,29,\"c\",\"red\"\n\n**SciPy sparse matrix**\n\nSciPy is not a required dependency of ``scarff``, but ``savearff``\nwill recognize SciPy sparse matrices and write them to the ARFF file\nusing the sparse format by default::\n\n    \u003e\u003e\u003e from scipy.sparse import csc_matrix\n    \u003e\u003e\u003e data = [10, 20, 30, 40, 50, 60]\n    \u003e\u003e\u003e rows = [0, 2, 2, 3, 5, 5]\n    \u003e\u003e\u003e cols = [3, 1, 2, 2, 3, 4]\n    \u003e\u003e\u003e s = csc_matrix((data, (rows, cols)), shape=(7, 5))\n    \u003e\u003e\u003e s.toarray()\n    array([[ 0,  0,  0, 10,  0],\n           [ 0,  0,  0,  0,  0],\n           [ 0, 20, 30,  0,  0],\n           [ 0,  0, 40,  0,  0],\n           [ 0,  0,  0,  0,  0],\n           [ 0,  0,  0, 50, 60],\n           [ 0,  0,  0,  0,  0]])\n    \u003e\u003e\u003e savearff(sys.stdout, s, relation='links',\n    ...          attributes=['a', 'b', 'c', 'd', 'e'])\n    @relation links\n\n    @attribute a integer\n    @attribute b integer\n    @attribute c integer\n    @attribute d integer\n    @attribute e integer\n\n    @data\n    {3 10}\n    {}\n    {1 20, 2 30}\n    {2 40}\n    {}\n    {3 50, 4 60}\n    {}\n\n**Sparse format with a NumPy array**\n\nA regular NumPy array can be written in the sparse format by giving\nthe argument ``fileformat='sparse'``::\n\n    \u003e\u003e\u003e sp = np.array([[0, 0, 99, 0, 0],\n    ...                [29, 0, 0, 0, 19],\n    ...                [0, 0, 0, 0, 0],\n    ...                [0, 89, 0, 0, 0]])\n    \u003e\u003e\u003e savearff(sys.stdout, sp, fileformat='sparse',\n    ...          relation='sparse example')\n    @relation \"sparse example\"\n\n    @attribute f0 integer\n    @attribute f1 integer\n    @attribute f2 integer\n    @attribute f3 integer\n    @attribute f4 integer\n\n    @data\n    {2 99}\n    {0 29, 4 19}\n    {}\n    {1 89}\n\n**Missing data**\n\nThe ``missing`` parameter allows values to be specified that\ncorrespond to missing values.  These will appear as ``?`` in the\n``@data`` section of the ARFF file.\n\nIn this example, the value 999.25 indicates a missing value::\n\n    \u003e\u003e\u003e x = np.array([[1.75, 7.93, 18.31],\n    ...               [2.44, 6.62, 32.11],\n    ...               [2.51, 2.25, 999.25],\n    ...               [2.64, 2.33, 999.25],\n    ...               [2.75, 2.83, 999.25]])\n    \u003e\u003e\u003e savearff(sys.stdout, x, missing=[999.25], relation='readings')\n    @relation readings\n\n    @attribute f0 real\n    @attribute f1 real\n    @attribute f2 real\n\n    @data\n    1.75,7.93,18.31\n    2.44,6.62,32.11\n    2.51,2.25,?\n    2.64,2.33,?\n    2.75,2.83,?\n\n**NumPy masked array**\n\n``savearff`` recognizes NumPy masked arrays.  Masked values in\nthe input array will be written as ``?`` in the ``@data`` section::\n\n    \u003e\u003e\u003e flux = np.ma.masked_array([[3.4, 2.1, 0.0, 3.4],\n    ...                            [3.2, 4.8, 0.5, 3.7],\n    ...                            [3.3, 2.8, 0.0, 4.1]],\n    ...                           mask=[[0, 0, 1, 0],\n    ...                                 [0, 0, 0, 0],\n    ...                                 [0, 0, 1, 0]])\n    \u003e\u003e\u003e flux\n    masked_array(\n      data=[[3.4, 2.1, --, 3.4],\n            [3.2, 4.8, 0.5, 3.7],\n            [3.3, 2.8, --, 4.1]],\n      mask=[[False, False,  True, False],\n            [False, False, False, False],\n            [False, False,  True, False]],\n      fill_value=1e+20)\n    \u003e\u003e\u003e savearff(sys.stdout, flux, relation='flux capacitance')\n    @relation \"flux capacitance\"\n\n    @attribute f0 real\n    @attribute f1 real\n    @attribute f2 real\n    @attribute f3 real\n\n    @data\n    3.4,2.1,?,3.4\n    3.2,4.8,0.5,3.7\n    3.3,2.8,?,4.1\n\n**NumPy array with nested data type**\n\nThis example uses a NumPy array with a structured data type with nested\nand array elements in the structure.  ``savearff`` flattens the data type\nand derives attribute names from the structured data type; note how the\nfield names in the structured data type are used to produce the attribute\nnames in the output::\n\n    \u003e\u003e\u003e dt = np.dtype([('key', 'U4'),\n    ...                ('position', [('x', np.float32), ('y', np.float32)]),\n    ...                ('values', np.float32, 3)])\n    \u003e\u003e\u003e records = np.array([('A234', (1.9, -3.0), (6, 7, 2)),\n    ...                     ('A555', (2.8, 0.6), (4, 2.5, 3)),\n    ...                     ('B431', (2.7, 8.6), (4, 2.8, 0.2))], dtype=dt)\n    \u003e\u003e\u003e savearff(sys.stdout, records, relation='records')\n    @relation records\n\n    @attribute key string\n    @attribute position.x real\n    @attribute position.y real\n    @attribute values_0 real\n    @attribute values_1 real\n    @attribute values_2 real\n\n    @data\n    \"A234\",1.9,-3,6,7,2\n    \"A555\",2.8,0.6,4,2.5,3\n    \"B431\",2.7,8.6,4,2.8,0.2\n\nThe above example demonstrates the default method for converting\nstructured data type field names to attribute names. ``savearff``\nhas several options to change how the names are generated.\nFor example::\n\n    \u003e\u003e\u003e savearff(sys.stdout, records, relation='records',\n    ...          join='$', index_base=1, index_open='(', index_close=')')\n    @relation records\n\n    @attribute key string\n    @attribute position$x real\n    @attribute position$y real\n    @attribute values(1) real\n    @attribute values(2) real\n    @attribute values(3) real\n\n    @data\n    \"A234\",1.9,-3,6,7,2\n    \"A555\",2.8,0.6,4,2.5,3\n    \"B431\",2.7,8.6,4,2.8,0.2\n\n**Instance weights**\n\nThe ARFF format provides the option of saving an \"instance weight\" with\neach instance (i.e. each row) of the data.  ``savearff`` accepts a\n``weights`` argument containing a sequence of numbers.  The length of\n``weights`` must equal the number of rows to be written in the ``@DATA``\nsection.  The weights are written to the file as an additional column in\nthe ``@DATA`` section, with the values enclosed in curly brackets.\n\nFor example::\n\n    \u003e\u003e\u003e dt = np.dtype([('id', int), ('x', float), ('y', float)])\n    \u003e\u003e\u003e samples = np.array([(300, 1.5, 1.8),\n    ...                     (300, 0.8, 2.4),\n    ...                     (304, 2.4, 0.5),\n    ...                     (304, 3.2, 0.2)], dtype=dt)\n    \u003e\u003e\u003e weights = np.array([2, 2, 1, 1])\n    \u003e\u003e\u003e savearff(sys.stdout, samples, relation='samples', weights=weights)\n    @relation samples\n\n    @attribute id integer\n    @attribute x real\n    @attribute y real\n\n    @data\n    300,1.5,1.8, {2}\n    300,0.8,2.4, {2}\n    304,2.4,0.5, {1}\n    304,3.2,0.2, {1}\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwarrenweckesser%2Fscarff","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwarrenweckesser%2Fscarff","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwarrenweckesser%2Fscarff/lists"}