{"id":21293353,"url":"https://github.com/ejw-data/python-open-files","last_synced_at":"2026-05-07T20:12:04.168Z","repository":{"id":40691918,"uuid":"479620221","full_name":"ejw-data/python-open-files","owner":"ejw-data","description":"Collection of notes related to opening files and handling text strings in python","archived":false,"fork":false,"pushed_at":"2022-06-27T14:47:45.000Z","size":136,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-15T16:44:30.754Z","etag":null,"topics":["pandas","python","sqlite"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ejw-data.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-04-09T04:45:07.000Z","updated_at":"2022-06-27T14:47:06.000Z","dependencies_parsed_at":"2022-09-07T14:42:08.613Z","dependency_job_id":null,"html_url":"https://github.com/ejw-data/python-open-files","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ejw-data/python-open-files","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ejw-data%2Fpython-open-files","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ejw-data%2Fpython-open-files/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ejw-data%2Fpython-open-files/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ejw-data%2Fpython-open-files/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ejw-data","download_url":"https://codelo
ad.github.com/ejw-data/python-open-files/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ejw-data%2Fpython-open-files/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32754050,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-07T02:14:30.463Z","status":"ssl_error","status_checked_at":"2026-05-07T02:14:29.405Z","response_time":62,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["pandas","python","sqlite"],"created_at":"2024-11-21T13:54:26.543Z","updated_at":"2026-05-07T20:12:04.150Z","avatar_url":"https://github.com/ejw-data.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Python File Management  \n\nAuthor:  Erin James Wills, ejw.data@gmail.com \n\n![File Management](./images/py-openfiles1.png)  \n\n\n## Overview  \n\u003chr\u003e\nThis repo is a collection of notes on opening files and handling text strings in Python.  Topics include base library methods, pandas, SQLAlchemy, and SQL.    \n\nThis is a work in progress, used for compiling and recording useful resources.  
\n\n\u003cbr\u003e\n\n## Reading Large Files\n\n*Refs:*  https://www.kaggle.com/code/rohanrao/tutorial-on-reading-large-datasets/notebook  \n\n\u003cbr\u003e\n\n## Paths in Python\n\nWhen a file path contains `\\`, remember that Python treats the backslash as an escape character in ordinary string literals.\n\nFor example:  '\\r', '\\n', '\\b', '\\f', '\\t'\n\nThe solution is either to write every backslash as `\\\\`, which produces a single backslash, or to use a raw string such as `print(r'.\\path\\file.csv')`.  The `r` prefix instructs the interpreter to treat backslashes as regular characters rather than as escapes.\n\n\n### Quick notes:\n`print(u'string')` - prints `string`; the `u` prefix marks a Unicode string literal and is redundant in Python 3, where every `str` is already Unicode.\n\n\n\n### Remaining Questions\nIn BeautifulSoup, are the outputs in Unicode?\n*  i.e., is `soup[0].encode(\"ascii\")` or `soup[0].encode(\"latin-1\")` or `soup[0].encode(\"utf-8\")` or `soup[0].encode(soup.originalEncoding)` needed to get the output?\n\n### Escaping References\n1.  https://python-reference.readthedocs.io/en/latest/docs/str/escapes.html\n1.  https://www.w3schools.com/python/gloss_python_escape_characters.asp   \n\u003cbr\u003e\n\n\n\n\n# Python Open() Parameters\n\n*  `Read Only ('r')`: Open a text file for reading. The handle is positioned at the beginning of the file. Raises `FileNotFoundError` if the file does not exist. This is also the default mode.  \n\n*  `Read and Write ('r+')`: Open the file for reading and writing. The handle is positioned at the beginning of the file. Raises `FileNotFoundError` if the file does not exist.  \n\n* `Write Only ('w')`: Open the file for writing. If the file exists, its contents are truncated and overwritten. The handle is positioned at the beginning of the file. Creates the file if it does not exist.  \n\n* `Write and Read ('w+')`: Open the file for reading and writing. If the file exists, its contents are truncated and overwritten. The handle is positioned at the beginning of the file. Creates the file if it does not exist.  \n\n* `Append Only ('a')`: Open the file for writing. The file is created if it does not exist. 
The handle is positioned at the end of the file. The data being written will be inserted at the end, after the existing data.  \n\n* `Append and Read ('a+')`: Open the file for reading and writing. The file is created if it does not exist. The handle is positioned at the end of the file. The data being written will be inserted at the end, after the existing data.  \n\n*Ref:* https://www.geeksforgeeks.org/open-a-file-in-python/  \n\n\u003cbr\u003e\n\n## Pandas\n\u003chr\u003e\n\nReading Files in Parts  \n*  `pd.read_csv(..., nrows, skiprows, chunksize)`\n   *  `nrows` : int, default None. Number of rows of the file to read. Useful for reading pieces of large files.  \n   *  `skiprows` : list-like or int. Row numbers to skip (0-indexed), or number of rows to skip (int) at the start of the file.  \n   *  `chunksize` : int, default None. Returns a `TextFileReader` object for iteration.\n* Also keep in mind that this may be helpful when automating:   `skiprows = nend - nrows`\n\n\n\n\u003cbr\u003e\n\n## Pandas and Dask (Parallel Processing)\n\u003chr\u003e\n\n\n\n*Ref:* https://towardsdatascience.com/how-to-handle-large-datasets-in-python-with-pandas-and-dask-34f43a897d55 \n\n\u003cbr\u003e\n\n## Databases\n\u003chr\u003e\nFor really large files, loading the data into a database and using map reduce to get the contents would be the best route.  \n\nThe general process for SQLite is:\n1.  Create database \n   ```\n   import sqlite3\n   conn = sqlite3.connect('pts.db')\n   c = conn.cursor()\n   ```\n\n2.  Create Table  \n   ```\n   c.execute('''CREATE TABLE ptsdata (filename, lineNumber, x, y, z)''')\n   ```\n\n3.  Insert Data  \n   ```\n   c.execute(\"INSERT INTO ptsdata VALUES (?, ?, ?, ?, ?)\", (filename, lineNumber, x, y, z))  # bind the Python values as parameters\n   conn.commit()  # persist the insert\n   ```  \n\n4.  Query Data  \n   ```\n   c.execute(\"SELECT lineNumber, x, y, z FROM ptsdata WHERE filename=? ORDER BY lineNumber ASC\", ('file.txt',))\n   ```  \n\n5.  
Get n results  \n   ```\n   c.fetchmany(size=n)\n   ```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fejw-data%2Fpython-open-files","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fejw-data%2Fpython-open-files","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fejw-data%2Fpython-open-files/lists"}