{"id":23871934,"url":"https://github.com/rubynixx/coding_resources","last_synced_at":"2026-04-14T23:34:03.473Z","repository":{"id":270756429,"uuid":"911345467","full_name":"RubyNixx/coding_resources","owner":"RubyNixx","description":"Range of cheat sheets, coding resources, videos, etc that I want to keep track of \u0026 others may find helpful.","archived":false,"fork":false,"pushed_at":"2025-02-05T13:22:58.000Z","size":63,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-22T19:16:00.267Z","etag":null,"topics":["apache-spark","markdown","markdown-language","pyspark","python","spark-sql","t-sql"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/RubyNixx.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-01-02T19:52:42.000Z","updated_at":"2025-02-05T13:23:01.000Z","dependencies_parsed_at":"2025-01-02T22:20:47.445Z","dependency_job_id":"e081b980-d0e1-41b8-891f-76e699e108fd","html_url":"https://github.com/RubyNixx/coding_resources","commit_stats":null,"previous_names":["rubynixx/coding_resources"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RubyNixx%2Fcoding_resources","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RubyNixx%2Fcoding_resources/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RubyNixx%2Fcoding_resources/releases","manifests_url":"https://repos.eco
syste.ms/api/v1/hosts/GitHub/repositories/RubyNixx%2Fcoding_resources/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/RubyNixx","download_url":"https://codeload.github.com/RubyNixx/coding_resources/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240222499,"owners_count":19767458,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache-spark","markdown","markdown-language","pyspark","python","spark-sql","t-sql"],"created_at":"2025-01-03T15:17:27.657Z","updated_at":"2026-04-14T23:34:03.416Z","avatar_url":"https://github.com/RubyNixx.png","language":null,"readme":"# Table of contents\n1. [Introduction](#introduction)\n2. [PySpark](#pyspark)\n   1. [Basic PySpark operations](#pyspark1)\n3. [Python](#python)\n4. [SQL](#sql)\n5. [Markdown](#markdown)\n   1. [General Formatting](#markdownsub1)\n   2. [Creating Diagrams](#markdownsub2)\n   3. [Syntax Highlighting](#markdownsub3)\n   4. [Creating Sections](#markdownsub4)\n   5. [Emojis](#markdownsub5)\n   6. [Checklists](#markdownsub6)\n   7. 
[Collapsed Sections](#markdownsub7)\n\n## Coding Resources sorted by coding language \u003ca name=\"introduction\"\u003e\u003c/a\u003e\n\nA range of cheat sheets, coding resources, videos and more that I want to keep track of \u0026 that others may find helpful.\n\n## PySpark \u003ca name=\"pyspark\"\u003e\u003c/a\u003e :snake:\n\n### Basic PySpark Operations \u003ca name=\"pyspark1\"\u003e\u003c/a\u003e\n[PySpark_RDD_Cheat_Sheet.pdf](https://github.com/user-attachments/files/18294448/PySpark_RDD_Cheat_Sheet.pdf)\nSource: https://www.datacamp.com/cheat-sheet/pyspark-cheat-sheet-spark-in-python\n\nSource: https://aeshantechhub.co.uk/databricks-dbutils-cheat-sheet-and-pyspark-amp-sql-best-practice-cheat-sheet/\n```python\n# Create a DataFrame from a CSV file:\ndf = spark.read.csv(\"/mnt/datasets/sample.csv\", header=True, inferSchema=True)\n\n# Display the first few rows of the DataFrame:\ndf.show(5)\n\n# Select columns from a DataFrame:\ndf.select(\"column1\", \"column2\").show()\n```\n\u003cb\u003eBest Practice: Avoid using collect()\u003c/b\u003e\n\nAvoid using collect(), as it brings all the data to the driver node, which can cause memory issues.\n```python\n# Bad practice - using collect() to bring all data to the driver:\ndata = df.collect()\n\n# Better practice - use show() or take() instead:\ndf.show(5)\n```\n\n\u003cb\u003eRepartitioning DataFrames:\u003c/b\u003e\n\nRepartition your DataFrames to optimize performance when dealing with large datasets.\n```python\n# Repartition the DataFrame based on a column:\ndf = df.repartition(\"column_name\")\n```\n\n\u003cb\u003eCheck whether your df is pandas or PySpark\u003c/b\u003e\n\n```python\nprint(type(example_df))\n```\n\n\u003cb\u003eConvert a pandas df to a PySpark df\u003c/b\u003e\n\n```python\nfrom pyspark.sql import SparkSession\n\n# Initialize the Spark session if not already done\nspark = SparkSession.builder.getOrCreate()\n\n# Convert the pandas DataFrame to PySpark\nexample_df = 
spark.createDataFrame(example_df)\n```\n\n\u003cb\u003eJoins\u003c/b\u003e\n\n```python\n# Assumes df and df1 already exist; fn is a common alias for pyspark.sql.functions\nfrom pyspark.sql import functions as fn\n\ndf = df.join(df1, fn.col(\"Code\") == fn.col(\"Der_Code\"), how=\"left\")\n```\n\n\u003cb\u003eSee the df in different ways\u003c/b\u003e\n\n```python\n# Prints the columns \u0026 their types\ndf.printSchema()\n\n# Prints the list of column names\nprint(df.columns)\n\n# Prints distinct values in a given column\ndf.select(\"column_name\").distinct().show()\n```\n\n\u003cb\u003eDrop columns\u003c/b\u003e\n\n```python\ndf = df.drop(\"column1\", \"column2\")\n```\n\n***\n\n## Python \u003ca name=\"python\"\u003e\u003c/a\u003e :snake:\n\nList of built-in functions:\n[Beginner-friendly built-in function list](https://www.programiz.com/python-programming/methods/built-in/abs)\n\n[w3schools list of functions](https://www.w3schools.com/python/ref_list_append.asp)\n\nSpecific functions:\n\n[Value Counts](https://www.kaggle.com/code/parulpandey/five-ways-to-use-value-counts)\n\nPandas library documentation:\n\n[Pandas documentation](https://pandas.pydata.org/docs/reference/api/pandas.Series.value_counts.html)\n\nSome standard data investigation code:\nhttps://medium.com/analytics-vidhya/statistical-analysis-in-python-using-pandas-27c6a4209de2\n\n\u003cb\u003eCheat Sheets\u003c/b\u003e\n\n[python-cheat-sheet.pdf](https://github.com/user-attachments/files/18294457/python-cheat-sheet.pdf)\n\nSource: https://www.datacamp.com/cheat-sheet/python-for-data-science-a-cheat-sheet-for-beginners\n\n[Numpy_Cheat_Sheet.pdf](https://github.com/user-attachments/files/18294463/Numpy_Cheat_Sheet.pdf)\n\nSource: https://www.datacamp.com/cheat-sheet/numpy-cheat-sheet-data-analysis-in-python\n\n[Pandas_Cheat_Sheet.pdf](https://github.com/user-attachments/files/18294466/Pandas_Cheat_Sheet.pdf)\n\nSource: https://www.datacamp.com/cheat-sheet/pandas-cheat-sheet-for-data-science-in-python\n\n[Data_Wrangling_Cheat_Sheet.pdf](https://github.com/user-attachments/files/18294469/Data_Wrangling_Cheat_Sheet.pdf)\n\nSource: 
https://www.datacamp.com/cheat-sheet/pandas-cheat-sheet-data-wrangling-in-python\n\n[Cheat-Importing Data.pdf](https://github.com/user-attachments/files/18294477/Cheat-Importing.Data.pdf)\n\n[Cheat-Jupyiter Notebooks.pdf](https://github.com/user-attachments/files/18294478/Cheat-Jupyiter.Notebooks.pdf)\n\n[Cheat-Pandas Basics.pdf](https://github.com/user-attachments/files/18294481/Cheat-Pandas.Basics.pdf)\n\n[Python-Cheat-Sheet-for-Scikit-learn-Edureka.pdf](https://github.com/user-attachments/files/18294497/Python-Cheat-Sheet-for-Scikit-learn-Edureka.pdf)\n\n[scikit_learn_cheat.pdf](https://github.com/user-attachments/files/18294504/scikit_learn_cheat.pdf)\n\n[Sklearn-cheat-sheet.pdf](https://github.com/user-attachments/files/18294508/Sklearn-cheat-sheet.pdf)\n\n***\n\n## SQL \u003ca name=\"sql\"\u003e\u003c/a\u003e :scroll:\n\n![sql_cheat_sheet](https://github.com/user-attachments/assets/0d23e459-b4d5-4b84-a2f3-aad771cc3b7a)\n\nSQL Best Practices Cheat Sheet\nSource: https://aeshantechhub.co.uk/databricks-dbutils-cheat-sheet-and-pyspark-amp-sql-best-practice-cheat-sheet/\n\nSQL is widely used in Databricks for data querying and transformation. Below are some best practices to keep your queries optimized. These snippets run SQL through PySpark, so the code blocks are Python.\n\n\u003cb\u003eCommon SQL Operations:\u003c/b\u003e\n\n```python\n# Create a temporary view from a DataFrame:\ndf.createOrReplaceTempView(\"temp_table\")\n\n# Run a SQL query against the view:\nspark.sql(\"SELECT * FROM temp_table\").show()\n```\n\n\u003cb\u003eBest Practice: Use LIMIT when previewing data\u003c/b\u003e\n\nAvoid fetching large datasets during development. Use LIMIT to preview data instead.\n\n```python\n# Use LIMIT to preview data in SQL:\nspark.sql(\"SELECT * FROM temp_table LIMIT 10\").show()\n```\n\n\u003cb\u003eBest Practice: Leverage Caching\u003c/b\u003e\n\nCache intermediate results in memory to optimise performance for iterative queries.\n\n```python\n# Cache a DataFrame for future use:\ndf.cache()\n```\n\n\u003cb\u003eAvoid using SELECT * in production\u003c/b\u003e\n\nUsing SELECT * can lead to unnecessary data transfer and slow performance, especially with large datasets.\n\n```python\n# Bad practice - using SELECT *:\nspark.sql(\"SELECT * FROM temp_table\")\n\n# Better practice - select only the columns you need:\nspark.sql(\"SELECT column1, column2 FROM temp_table\")\n```\n\n***\n\n## Markdown \u003ca name=\"markdown\"\u003e\u003c/a\u003e :blue_book:\n\n\u003cb\u003eComplete formatting cheat sheet:\u003c/b\u003e\n[Markdown Cheatsheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet)\n\nSome bits from the above that I use a lot:\n\n### General Formatting \u003ca name=\"markdownsub1\"\u003e\u003c/a\u003e\n\n```markdown\nBold (** ** or __ __): **This is bold text**\n\nItalic (* * or _ _): _This text is italicized_\n\nStrikethrough (~~ ~~): ~~This was mistaken text~~\n\nBold and nested italic (** ** and _ _): **This text is _extremely_ important**\n\nAll bold and italic (*** ***): ***All this text is important***\n\nSubscript (\u003csub\u003e \u003c/sub\u003e): This is a \u003csub\u003esubscript\u003c/sub\u003e text\n\nSuperscript (\u003csup\u003e \u003c/sup\u003e): This is a \u003csup\u003esuperscript\u003c/sup\u003e text\n\nUnderline (\u003cins\u003e \u003c/ins\u003e): This is an \u003cins\u003eunderlined\u003c/ins\u003e text\n```\n\n\u003cb\u003eHorizontal rules\u003c/b\u003e\n```markdown\nThree or 
more...\n\n---\n\nHyphens\n\n***\n\nAsterisks\n\n___\n\nUnderscores\n```\n\n\u003cb\u003eInsert a hyperlink\u003c/b\u003e\n```markdown\n[Insert your text here](https://www.google.com)\n```\n\n### Creating Diagrams \u003ca name=\"markdownsub2\"\u003e\u003c/a\u003e\n\nYou can also use code blocks to create diagrams in Markdown. GitHub supports Mermaid, GeoJSON, TopoJSON, and ASCII STL syntax. For more information, see [Creating diagrams](https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/creating-diagrams).\n\n### Syntax Highlighting \u003ca name=\"markdownsub3\"\u003e\u003c/a\u003e\nSource: https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/creating-and-highlighting-code-blocks\nYou can add an optional language identifier to enable syntax highlighting in your fenced code block.\n\nSyntax highlighting changes the color and style of source code to make it easier to read.\n\nFor example, to syntax highlight Markdown code:\n\n````\n```markdown\nhello this is my markdown\n\nYou can put code in this block \u0026 it will show in a grey box. Change the language as required. More info on supported languages can be found below.\n```\n````\n\nThis displays the content in a code block. If you use a language identifier such as python, the block is syntax-highlighted in colour.\n\nWhen you create a fenced code block that you also want to have syntax highlighting on a GitHub Pages site, use lower-case language identifiers. For more information, see [About GitHub Pages and Jekyll](https://docs.github.com/en/pages/setting-up-a-github-pages-site-with-jekyll/about-github-pages-and-jekyll#syntax-highlighting).\n\nGitHub uses [Linguist](https://github.com/github-linguist/linguist) to perform language detection and to select [third-party grammars](https://github.com/github-linguist/linguist/blob/main/vendor/README.md) for syntax highlighting. 
You can find out which keywords are valid in Linguist's [languages YAML](https://github.com/github-linguist/linguist/blob/main/lib/linguist/languages.yml) file.\n\n### Creating sections \u003ca name=\"markdownsub4\"\u003e\u003c/a\u003e\n\nTo create a table of contents with linked sections in your markdown file, you can use the code below.\nSource: https://stackoverflow.com/questions/11948245/markdown-to-create-pages-and-table-of-contents\n\n```markdown\n# Table of contents\n1. [Introduction](#introduction)\n2. [Some paragraph](#paragraph1)\n    1. [Sub paragraph](#subparagraph1)\n3. [Another paragraph](#paragraph2)\n\n## This is the introduction \u003ca name=\"introduction\"\u003e\u003c/a\u003e\nSome introduction text, formatted in heading 2 style\n\n## Some paragraph \u003ca name=\"paragraph1\"\u003e\u003c/a\u003e\nThe first paragraph text\n\n### Sub paragraph \u003ca name=\"subparagraph1\"\u003e\u003c/a\u003e\nThis is a sub paragraph, formatted in heading 3 style\n\n## Another paragraph \u003ca name=\"paragraph2\"\u003e\u003c/a\u003e\nThe second paragraph text\n```\n\n### Emojis \u003ca name=\"markdownsub5\"\u003e\u003c/a\u003e\n\nSee the full list here :smile::\n\nhttps://gist.github.com/rxaviers/7360908\n\n### Create a checklist \u003ca name=\"markdownsub6\"\u003e\u003c/a\u003e\n\nTo create a task list, preface list items with a hyphen and a space followed by [ ]. To mark a task as complete, use [x].\n\nSource: https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/about-task-lists#creating-task-lists\n\n```markdown\n- [x] #739\n- [ ] https://github.com/octo-org/octo-repo/issues/740\n- [ ] Add delight to the experience when all tasks are complete :tada:\n```\n\n### Create a collapsed section \u003ca name=\"markdownsub7\"\u003e\u003c/a\u003e\n\nMarkdown code:\n````markdown\n\u003cdetails\u003e\n\n\u003csummary\u003eThis is a collapsed section\u003c/summary\u003e\n\n### You can add a header\n\nYou can add text within a collapsed section. 
\n\nYou can add an image or a code block, too.\n\n```python\n   print(\"Hello World\")\n```\n\n\u003c/details\u003e\n````\n\nPreview how it looks:\n\n\u003cdetails\u003e\n\n\u003csummary\u003eThis is a collapsed section\u003c/summary\u003e\n\n### You can add a header\n\nYou can add text within a collapsed section. \n\nYou can add an image or a code block, too.\n\n```python\n   print(\"Hello World\")\n```\n\n\u003c/details\u003e\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frubynixx%2Fcoding_resources","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frubynixx%2Fcoding_resources","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frubynixx%2Fcoding_resources/lists"}