{"id":13416072,"url":"https://github.com/arogozhnikov/python3_with_pleasure","last_synced_at":"2025-05-15T03:05:04.461Z","repository":{"id":41207589,"uuid":"114542075","full_name":"arogozhnikov/python3_with_pleasure","owner":"arogozhnikov","description":"A short guide on features of Python 3 with examples","archived":false,"fork":false,"pushed_at":"2021-05-03T06:04:40.000Z","size":239,"stargazers_count":3616,"open_issues_count":2,"forks_count":198,"subscribers_count":95,"default_branch":"master","last_synced_at":"2025-04-05T16:01:41.794Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/arogozhnikov.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-12-17T14:17:52.000Z","updated_at":"2025-04-01T01:32:49.000Z","dependencies_parsed_at":"2022-08-10T01:43:11.458Z","dependency_job_id":null,"html_url":"https://github.com/arogozhnikov/python3_with_pleasure","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arogozhnikov%2Fpython3_with_pleasure","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arogozhnikov%2Fpython3_with_pleasure/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arogozhnikov%2Fpython3_with_pleasure/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arogozhnikov%2Fpython3_with_pleasure/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/arogozhnikov","download_url":"https://codeload.github.com/arogozhnikov/pyt
hon3_with_pleasure/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248606606,"owners_count":21132386,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-30T21:00:54.055Z","updated_at":"2025-04-12T17:36:57.629Z","avatar_url":"https://github.com/arogozhnikov.png","language":null,"funding_links":[],"categories":["Others","Misc","Python","Python 好资源"],"sub_categories":[],"readme":"# We made it! \n\n*Update (Jan 2020)*. \nPython 2 is now officially retired. Thanks to everyone for making this hard transition to better code happen!\n\n# Migrating to Python 3 with pleasure\n## A short guide on features of Python 3 for data scientists\n\n\nPython became a mainstream language for machine learning and other scientific fields that heavily operate with data;\nit boasts various deep learning frameworks and well-established set of tools for data processing and visualization.\n\nHowever, Python ecosystem co-exists in Python 2 and Python 3, and Python 2 is still used among data scientists.\nBy the end of 2019 the scientific stack will [stop supporting Python2](http://www.python3statement.org).\nAs for numpy, after 2018 any new feature releases will only support [Python3](https://github.com/numpy/numpy/blob/master/doc/neps/dropping-python2.7-proposal.rst). 
*Update (Sep 2018): same story now with pandas, matplotlib, ipython, jupyter notebook and jupyter lab.*\n\nTo make the transition less frustrating, I've collected a bunch of Python 3 features that you may find useful.\n\n\u003cimg src='https://uploads.toptal.io/blog/image/92216/toptal-blog-image-1457618659472-be2f380fe3aad41333427ecd5a1ec5c5.jpg' width=400 /\u003e\n\nImage from [Dario Bertini post (toptal)](https://www.toptal.com/python/python-3-is-it-worth-the-switch)\n\n## Better paths handling with `pathlib`\n\n`pathlib` is a default module in python3, that helps you to avoid tons of `os.path.join`s:\n\n```python\nfrom pathlib import Path\n\ndataset = 'wiki_images'\ndatasets_root = Path('/path/to/datasets/')\n\ntrain_path = datasets_root / dataset / 'train'\ntest_path = datasets_root / dataset / 'test'\n\nfor image_path in train_path.iterdir():\n    with image_path.open() as f: # note, open is a method of Path object\n        # do something with an image\n```\n\nPreviously it was always tempting to use string concatenation (concise, but obviously bad),\nnow with `pathlib` the code is safe, concise, and readable.\n\nAlso `pathlib.Path` has a bunch of methods and properties, that every python novice previously had to google:\n\n```python\np.exists()\np.is_dir()\np.parts\np.with_name('sibling.png') # only change the name, but keep the folder\np.with_suffix('.jpg') # only change the extension, but keep the folder and the name\np.chmod(mode)\np.rmdir()\n```\n\n`pathlib` should save you lots of time,\nplease see [docs](https://docs.python.org/3/library/pathlib.html) and [reference](https://pymotw.com/3/pathlib/) for more.\n\n\n## Type hinting is now part of the language\n\nExample of type hinting in pycharm: \u003cbr/\u003e\n\u003cimg src='images/pycharm-type-hinting.png' /\u003e\n\nPython is not just a language for small scripts anymore,\ndata pipelines these days include numerous steps each involving different frameworks (and sometimes very different logic).\n\nType 
hinting was introduced to help with growing complexity of programs, so machines could help with code verification.\nPreviously different modules used custom ways to point [types in docstrings](https://www.jetbrains.com/help/pycharm/type-hinting-in-pycharm.html#legacy)\n(Hint: pycharm can convert old docstrings to fresh type hinting).\n\nAs a simple example, the following code may work with different types of data (that's what we like about python data stack).\n```python\ndef repeat_each_entry(data):\n    \"\"\" Each entry in the data is doubled\n    \u003cblah blah nobody reads the documentation till the end\u003e\n    \"\"\"\n    index = numpy.repeat(numpy.arange(len(data)), 2)\n    return data[index]\n```\n\nThis code e.g. works for `numpy.array` (incl. multidimensional ones), `astropy.Table` and `astropy.Column`, `bcolz`, `cupy`, `mxnet.ndarray` and others.\n\nThis code will work for `pandas.Series`, but in the wrong way:\n```python\nrepeat_each_entry(pandas.Series(data=[0, 1, 2], index=[3, 4, 5])) # returns Series with Nones inside\n```\n\nThis was two lines of code. 
Imagine how unpredictable the behavior of a complex system can become when just one function misbehaves.\nStating explicitly which types a method expects is very helpful in large systems: a type checker will warn you if a function is passed unexpected arguments.\n\n```python\ndef repeat_each_entry(data: Union[numpy.ndarray, bcolz.carray]):\n```\n\nIf you have a significant codebase, hinting tools like [MyPy](http://mypy.readthedocs.io) are likely to become part of your continuous integration pipeline.\nA webinar [\"Putting Type Hints to Work\"](https://www.youtube.com/watch?v=JqBCFfiE11g) by Daniel Pyrathon is good for a brief introduction.\n\nSidenote: unfortunately, hinting is not yet powerful enough to provide fine-grained typing for ndarrays/tensors, but [maybe we'll have it one day](https://github.com/numpy/numpy/issues/7370), and this will be a great feature for DS.\n\n## Type hinting → type checking in runtime\n\nBy default, function annotations do not influence how your code runs; they merely help you state your intentions.\n\nHowever, you can enforce type checking at runtime with tools like ... [enforce](https://github.com/RussBaz/enforce),\nwhich can help you in debugging (there are many cases where static type hinting does not catch a problem).\n\n```python\n@enforce.runtime_validation\ndef foo(text: str) -\u003e None:\n    print(text)\n\nfoo('Hi') # ok\nfoo(5)    # fails\n\n\n@enforce.runtime_validation\ndef any2(x: List[bool]) -\u003e bool:\n    return any(x)\n\nany ([False, False, True, False]) # True\nany2([False, False, True, False]) # True\n\nany (['False']) # True\nany2(['False']) # fails\n\nany ([False, None, \"\", 0]) # False\nany2([False, None, \"\", 0]) # fails\n\n```\n\n## \u003cstrike\u003eOther usages of function annotations\u003c/strike\u003e\n\n*Update: starting from python 3.7 this behavior was [deprecated](https://www.python.org/dev/peps/pep-0563/#non-typing-usage-of-annotations), and function annotations should be used for type hinting only. 
Python 4 will not support other usages of annotations.*\n\nAs mentioned before, annotations do not influence code execution, but rather provide some meta-information,\nand you can use it as you wish.\n\nFor instance, measurement units are a common pain in scientific areas; the `astropy` package [provides a simple decorator](http://docs.astropy.org/en/stable/units/quantity.html#functions-that-accept-quantities) to control units of input quantities and convert output to required units:\n```python\n# Python 3\nfrom astropy import units as u\n@u.quantity_input()\ndef frequency(speed: u.meter / u.s, wavelength: u.nm) -\u003e u.terahertz:\n    return speed / wavelength\n\nfrequency(speed=300_000 * u.km / u.s, wavelength=555 * u.nm)\n# output: 540.5405405405404 THz, frequency of green visible light\n```\n\nIf you're processing tabular scientific data in python (not necessarily astronomical), you should give `astropy` a shot.\n\nYou can also define your application-specific decorators to perform control / conversion of inputs and output in the same manner.\n\n## Matrix multiplication with @\n\nLet's implement one of the simplest ML models \u0026mdash; a linear regression with l2 regularization (a.k.a. ridge regression):\n\n```python\n# l2-regularized linear regression: || A X - y ||^2 + alpha * || X ||^2 -\u003e min\n\n# Python 2\nX = np.linalg.inv(np.dot(A.T, A) + alpha * np.eye(A.shape[1])).dot(A.T.dot(y))\n# Python 3\nX = np.linalg.inv(A.T @ A + alpha * np.eye(A.shape[1])) @ (A.T @ y)\n```\n\nThe code with `@` becomes more readable and more translatable between deep learning frameworks: the same code `X @ W + b[None, :]` for a single perceptron layer works in `numpy`, `cupy`, `pytorch`, `tensorflow` (and other frameworks that operate with tensors).\n\n## Globbing with `**`\n\nRecursive folder globbing is not easy in Python 2, even though the [glob2](https://github.com/miracle2k/python-glob2) custom module exists that overcomes this. 
A recursive flag is supported since Python 3.5:\n\n```python\nimport glob\n\n# Python 2\nfound_images = (\n    glob.glob('/path/*.jpg')\n  + glob.glob('/path/*/*.jpg')\n  + glob.glob('/path/*/*/*.jpg')\n  + glob.glob('/path/*/*/*/*.jpg')\n  + glob.glob('/path/*/*/*/*/*.jpg'))\n\n# Python 3\nfound_images = glob.glob('/path/**/*.jpg', recursive=True)\n```\n\nA better option is to use `pathlib` in python3 (minus one import!):\n```python\n# Python 3\nfound_images = pathlib.Path('/path/').glob('**/*.jpg')\n```\nNote: there are [minor differences](https://github.com/arogozhnikov/python3_with_pleasure/issues/16) between `glob.glob`, `Path.glob` and bash globbing.\n\n## Print is a function now\n\nYes, code now has these annoying parentheses, but there are some advantages:\n\n- simple syntax for using file descriptor:\n    ```python\n    print \u003e\u003esys.stderr, \"critical error\"      # Python 2\n    print(\"critical error\", file=sys.stderr)  # Python 3\n    ```\n- printing tab-aligned tables without `str.join`:\n    ```python\n    # Python 3\n    print(*array, sep='\\t')\n    print(batch, epoch, loss, accuracy, time, sep='\\t')\n    ```\n- hacky suppressing / redirection of printing output:\n    ```python\n    # Python 3\n    _print = print # store the original print function\n    def print(*args, **kargs):\n        pass  # do something useful, e.g. 
store output to some file\n    ```\n    In jupyter it is desirable to log each output to a separate file (to track what's happening after you got disconnected), so you can override `print` now.\n\n    Below you can see a context manager that temporarily overrides behavior of print:\n    ```python\n    @contextlib.contextmanager\n    def replace_print():\n        import builtins\n        _print = print # saving old print function\n        # or use some other function here\n        builtins.print = lambda *args, **kwargs: _print('new printing', *args, **kwargs)\n        yield\n        builtins.print = _print\n\n    with replace_print():\n        \u003ccode here will invoke other print function\u003e\n    ```\n    It is *not* a recommended approach, but a small dirty hack that is now possible.\n- `print` can participate in list comprehensions and other language constructs\n    ```python\n    # Python 3\n    result = process(x) if is_valid(x) else print('invalid item: ', x)\n    ```\n\n\n## Underscores in Numeric Literal (Thousands Separator)\n\n[PEP-515](https://www.python.org/dev/peps/pep-0515/ \"PEP-515\") introduced underscores in Numeric Literals.\nIn Python3, underscores can be used to group digits visually in integral, floating-point, and complex number literals.\n\n```python\n# grouping decimal numbers by thousands\none_million = 1_000_000\n\n# grouping hexadecimal addresses by words\naddr = 0xCAFE_F00D\n\n# grouping bits into nibbles in a binary literal\nflags = 0b_0011_1111_0100_1110\n\n# same, for string conversions\nflags = int('0b_1111_0000', 2)\n```\n\n## f-strings for simple and reliable formatting\n\nThe default formatting system provides a flexibility that is not required in data experiments.\nThe resulting code is either too verbose or too fragile towards any changes.\n\nQuite typically data scientists outputs some logging information iteratively in a fixed format.\nIt is common to have a code like:\n\n```python\n# Python 2\nprint '{batch:3} {epoch:3} 
/ {total_epochs:3}  accuracy: {acc_mean:0.4f}±{acc_std:0.4f} time: {avg_time:3.2f}'.format(\n    batch=batch, epoch=epoch, total_epochs=total_epochs,\n    acc_mean=numpy.mean(accuracies), acc_std=numpy.std(accuracies),\n    avg_time=time / len(data_batch)\n)\n\n# Python 2 (too error-prone during fast modifications, please avoid):\nprint '{:3} {:3} / {:3}  accuracy: {:0.4f}±{:0.4f} time: {:3.2f}'.format(\n    batch, epoch, total_epochs, numpy.mean(accuracies), numpy.std(accuracies),\n    time / len(data_batch)\n)\n```\n\nSample output:\n```\n120  12 / 300  accuracy: 0.8180±0.4649 time: 56.60\n```\n\n**f-strings** aka formatted string literals were introduced in Python 3.6:\n```python\n# Python 3.6+\nprint(f'{batch:3} {epoch:3} / {total_epochs:3}  accuracy: {numpy.mean(accuracies):0.4f}±{numpy.std(accuracies):0.4f} time: {time / len(data_batch):3.2f}')\n```\n\n\n## Explicit difference between 'true division' and 'floor division'\n\nFor data science this is definitely a handy change \n\n```python\ndata = pandas.read_csv('timing.csv')\nvelocity = data['distance'] / data['time']\n```\n\nResults in Python 2 depend on whether 'time' and 'distance' (e.g. measured in meters and seconds) are stored as integers.\nIn Python 3, the result is correct in both cases, because the result of division is float.\n\nAnother case is floor division, which is now an explicit operation:\n\n```python\nn_gifts = money // gift_price  # correct for int and float arguments\n```\n\nIn a nutshell:\n\n```python\n\u003e\u003e\u003e from operator import truediv, floordiv\n\u003e\u003e\u003e truediv.__doc__, floordiv.__doc__\n('truediv(a, b) -- Same as a / b.', 'floordiv(a, b) -- Same as a // b.')\n\u003e\u003e\u003e (3 / 2), (3 // 2), (3.0 // 2.0)\n(1.5, 1, 1.0)\n```\n\nNote, that this applies both to built-in types and to custom types provided by data packages (e.g. 
`numpy` or `pandas`).\n\n\n## Strict ordering\n\n```python\n# All these comparisons are illegal in Python 3\n3 \u003c '3'\n2 \u003c None\n(3, 4) \u003c (3, None)\n(4, 5) \u003c [4, 5]\n\n# False in both Python 2 and Python 3\n(4, 5) == [4, 5]\n```\n\n- prevents from occasional sorting of instances of different types\n  ```python\n  sorted([2, '1', 3])  # invalid for Python 3, in Python 2 returns [2, 3, '1']\n  ```\n- helps to spot some problems that arise when processing raw data\n\nSidenote: proper check for None is (in both Python versions)\n```python\nif a is not None:\n  pass\n\nif a: # WRONG check for None\n  pass\n```\n\n\n## Unicode for NLP\n\n```python\ns = '您好'\nprint(len(s))\nprint(s[:2])\n```\nOutput:\n- Python 2: `6\\n��`\n- Python 3: `2\\n您好`.\n\n```python\nx = u'со'\nx += 'co' # ok\nx += 'со' # fail\n```\nPython 2 fails, Python 3 works as expected (because I've used russian letters in strings).\n\nIn Python 3 `str`s are unicode strings, and it is more convenient for NLP processing of non-english texts.\n\nThere are other funny things, for instance:\n```python\n'a' \u003c type \u003c u'a'  # Python 2: True\n'a' \u003c u'a'         # Python 2: False\n```\n\n```python\nfrom collections import Counter\nCounter('Möbelstück')\n```\n\n- Python 2: `Counter({'\\xc3': 2, 'b': 1, 'e': 1, 'c': 1, 'k': 1, 'M': 1, 'l': 1, 's': 1, 't': 1, '\\xb6': 1, '\\xbc': 1})`\n- Python 3: `Counter({'M': 1, 'ö': 1, 'b': 1, 'e': 1, 'l': 1, 's': 1, 't': 1, 'ü': 1, 'c': 1, 'k': 1})`\n\nYou can handle all of this in Python 2 properly, but Python 3 is more friendly.\n\n## Preserving order of dictionaries and **kwargs\n\nIn CPython 3.6+ dicts behave like `OrderedDict` by default (and [this is guaranteed in Python 3.7+](https://stackoverflow.com/questions/39980323/are-dictionaries-ordered-in-python-3-6)).\nThis preserves order during dict comprehensions (and other operations, e.g. 
during json serialization/deserialization)\n\n```python\nimport json\nx = {str(i):i for i in range(5)}\njson.loads(json.dumps(x))\n# Python 2\n{u'1': 1, u'0': 0, u'3': 3, u'2': 2, u'4': 4}\n# Python 3\n{'0': 0, '1': 1, '2': 2, '3': 3, '4': 4}\n```\n\nThe same applies to `**kwargs` (in Python 3.6+): they are kept in the order in which they were passed.\nOrder is crucial when it comes to data pipelines; previously we had to write it in a cumbersome manner:\n```python\nfrom collections import OrderedDict\nfrom torch import nn\n\n# Python 2\nmodel = nn.Sequential(OrderedDict([\n          ('conv1', nn.Conv2d(1,20,5)),\n          ('relu1', nn.ReLU()),\n          ('conv2', nn.Conv2d(20,64,5)),\n          ('relu2', nn.ReLU())\n        ]))\n\n# Python 3.6+, how it *can* be done, not supported right now in pytorch\nmodel = nn.Sequential(\n    conv1=nn.Conv2d(1,20,5),\n    relu1=nn.ReLU(),\n    conv2=nn.Conv2d(20,64,5),\n    relu2=nn.ReLU()\n)\n```\n\nDid you notice? Uniqueness of names is also checked automatically.\n\n\n## Iterable unpacking\n\n```python\n# handy when amount of additional stored info may vary between experiments, but the same code can be used in all cases\nmodel_parameters, optimizer_parameters, *other_params = load(checkpoint_name)\n\n# picking two last values from a sequence\n*prev, next_to_last, last = values_history\n\n# This also works with any iterables, so if you have a function that yields e.g. qualities,\n# below is a simple way to take only the last two values it yields\n*prev, next_to_last, last = iter_train(args)\n```\n\n## Default pickle engine provides better compression for arrays\n\nPickling is a mechanism to pass data between threads / processes; in particular, it is used inside the `multiprocessing` package. 
\n\n```python\n# Python 2\nimport cPickle as pickle\nimport numpy\nprint len(pickle.dumps(numpy.random.normal(size=[1000, 1000])))\n# result: 23691675\n\n# Python 3\nimport pickle\nimport numpy\nlen(pickle.dumps(numpy.random.normal(size=[1000, 1000])))\n# result: 8000162\n```\n\nThat is about a third of the space. And it is *much* faster.\nActually similar compression (but not speed) is achievable with the `protocol=2` parameter, but developers typically ignore this option (or simply are not aware of it). \n\nNote: pickle is [not safe](https://docs.python.org/3/library/pickle.html) (and not quite transferable), so never unpickle data received from an untrusted or unauthenticated source.\n\n## Safer comprehensions\n\n```python\nlabels = \u003cinitial_value\u003e\npredictions = [model.predict(data) for data, labels in dataset]\n\n# labels are overwritten in Python 2\n# labels are not affected by comprehension in Python 3\n```\n\n## Super, simply super()\n\nPython 2 `super(...)` was a frequent source of mistakes in code.\n\n```python\n# Python 2\nclass MySubClass(MySuperClass):\n    def __init__(self, name, **options):\n        super(MySubClass, self).__init__(name='subclass', **options)\n\n# Python 3\nclass MySubClass(MySuperClass):\n    def __init__(self, name, **options):\n        super().__init__(name='subclass', **options)\n```\n\nMore on `super` and method resolution order on [stackoverflow](https://stackoverflow.com/questions/576169/understanding-python-super-with-init-methods).\n\n## Better IDE suggestions with variable annotations\n\nThe most enjoyable thing about programming in languages like Java, C# and the like is that an IDE can make very good suggestions,\nbecause the type of each identifier is known before executing a program.\n\nIn python this is hard to achieve, but annotations will help you\n- write your expectations in a clear form\n- and get good suggestions from the IDE\n\n\u003cimg src='images/variable_annotations.png' /\u003e\u003cbr /\u003e\nThis is an example of PyCharm 
suggestions with variable annotations.\nThis works even in situations when functions you use are not annotated (e.g. due to backward compatibility).\n\n## Multiple unpacking\n\nHere is how you merge two dicts now:\n```python\nx = dict(a=1, b=2)\ny = dict(b=3, d=4)\n# Python 3.5+\nz = {**x, **y}\n# z = {'a': 1, 'b': 3, 'd': 4}, note that value for `b` is taken from the latter dict.\n```\n\nSee [this thread at StackOverflow](https://stackoverflow.com/questions/38987/how-to-merge-two-dictionaries-in-a-single-expression) for a comparison with Python 2.\n\nThe same approach also works for lists, tuples, and sets (`a`, `b`, `c` are any iterables):\n```python\n[*a, *b, *c] # list, concatenating\n(*a, *b, *c) # tuple, concatenating\n{*a, *b, *c} # set, union\n```\n\nFunctions also [support multiple unpacking](https://docs.python.org/3/whatsnew/3.5.html#whatsnew-pep-448) for `*args` and `**kwargs`:\n```python\n# Python 3.5+\ndo_something(**{**default_settings, **custom_settings})\n\n# Also possible, this code also checks there is no intersection between keys of dictionaries\ndo_something(**first_args, **second_args)\n```\n\n## Future-proof APIs with keyword-only arguments\n\nLet's consider this snippet\n```python\nmodel = sklearn.svm.SVC(2, 'poly', 2, 4, 0.5)\n```\nObviously, an author of this code didn't get the Python style of coding yet (most probably, just jumped from cpp or rust).\nUnfortunately, this is not just question of taste, because changing the order of arguments (adding/deleting) in `SVC` will break this code. In particular, `sklearn` does some reordering/renaming from time to time of numerous algorithm parameters to provide consistent API. Each such refactoring may drive to broken code.\n\nIn Python 3, library authors may demand explicitly named parameters by using `*`:\n```python\nclass SVC(BaseSVC):\n    def __init__(self, *, C=1.0, kernel='rbf', degree=3, gamma='auto', coef0=0.0, ... 
)\n```\n- users now have to specify names of parameters: `sklearn.svm.SVC(C=2, kernel='poly', degree=2, gamma=4, coef0=0.5)`\n- this mechanism provides a great combination of reliability and flexibility of APIs\n\n## Data classes\n\nPython 3.7 introduces data classes, a good replacement for `namedtuple` in most cases.\n```python\nfrom dataclasses import dataclass\n\n@dataclass\nclass Person:\n    name: str\n    age: int\n\n@dataclass\nclass Coder(Person):\n    preferred_language: str = 'Python 3'\n```\n\nThe `dataclass` decorator takes over the job of implementing routine methods for you (initialization, representation, comparison, and hashing when applicable). \nLet's name some features:\n- data classes can be both mutable and immutable\n- default values for fields are supported\n- inheritance\n- data classes are still good old classes: you can define new methods and override existing ones\n- post-init processing (e.g. to verify consistency) \n\nGeir Arne Hjelle gives a good overview of dataclasses [in his post](https://realpython.com/python-data-classes/).\n\n\n\n\n## Customizing access to module attributes\n\nIn Python you can control attribute access and hinting with `__getattr__` and `__dir__` for any object. Since python 3.7 you can do it for modules too.\n\nA natural example is implementing a `random` submodule of tensor libraries, which is typically a shortcut to skip initialization and passing of RandomState objects. Here's an implementation for numpy:  \n```python\n# nprandom.py\nimport numpy\n__random_state = numpy.random.RandomState()\n\ndef __getattr__(name):\n    return getattr(__random_state, name)\n\ndef __dir__():\n    return dir(__random_state)\n    \ndef seed(seed):\n    global __random_state  # rebind the module-level state, not a local variable\n    __random_state = numpy.random.RandomState(seed=seed)\n```\n\nOne can also mix the functionality of different objects/submodules this way. 
Compare with tricks in [pytorch](https://github.com/pytorch/pytorch/blob/3ce17bf8f6a2c4239085191ea60d6ee51cd620a5/torch/__init__.py#L253-L256) and [cupy](https://github.com/cupy/cupy/blob/94592ecac8152d5f4a56a129325cc91d184480ad/cupy/random/distributions.py).\n\nAdditionally, now one can\n- use it for [lazy loading of submodules](https://snarky.ca/lazy-importing-in-python-3-7/). For example, `import tensorflow` takes **~150MB** of RAM because it imports all submodules (and dependencies). \n- use this for [deprecations in API](https://www.python.org/dev/peps/pep-0562/)\n- introduce runtime routing between submodules\n\n## Built-in breakpoint()\n\nJust write `breakpoint()` in the code to invoke the debugger.\n```python\n# Python 3.7+, not all IDEs support this at the moment\nfoo()\nbreakpoint()\nbar()\n```\n\nFor remote debugging you may want to try [combining breakpoint() with `web-pdb`](https://hackernoon.com/python-3-7s-new-builtin-breakpoint-a-quick-tour-4f1aebc444c)\n\n\n## Minor: constants in `math` module\n\n```python\n# Python 3\nimport math\n\nmath.inf # Infinite float\nmath.nan # not a number\n\nmax_quality = -math.inf  # no more magic initial values!\n\nfor model in trained_models:\n    max_quality = max(max_quality, compute_quality(model, data))\n```\n\n## Minor: single integer type\n\nPython 2 provides two basic integer types, which are `int` (64-bit signed integer) and `long` for long arithmetic (quite confusing after C++).\n\nPython 3 has a single type `int`, which incorporates long arithmetic.\n\nHere is how you check that a value is an integer:\n\n```python\nisinstance(x, numbers.Integral) # Python 2, the canonical way\nisinstance(x, (long, int))      # Python 2\nisinstance(x, int)              # Python 3, easier to remember\n```\n\nUpdate: the first check also works for *other integral types*, such as `numpy.int32`, `numpy.int64`, but the other two don't. 
So they're not equivalent.\n\n\n## Other stuff\n\n- `Enum`s are theoretically useful, but\n    - string-typing is already widely adopted in the python data stack\n    - `Enum`s don't seem to interplay with numpy and categorical from pandas\n- coroutines also *sound* very promising for data pipelining (see [slides](http://www.dabeaz.com/coroutines/Coroutines.pdf) by David Beazley), but I don't see their adoption in the wild.\n- Python 3 has a [stable ABI](https://www.python.org/dev/peps/pep-0384/)\n- Python 3 supports unicode identifiers (so `ω = Δφ / Δt` is ok), but you'd [better use good old ASCII names](https://stackoverflow.com/a/29855176/498892)\n- some libraries e.g. [jupyterhub](https://github.com/jupyterhub/jupyterhub) (jupyter in cloud), django and fresh ipython only support Python 3, so features that sound useless to you are useful for libraries you'll probably want to use one day.\n\n\n### Problems for code migration specific for data science (and how to resolve those)\n\n- support for nested arguments [was dropped](https://www.python.org/dev/peps/pep-3113/)\n  ```python\n  map(lambda x, (y, z): x, z, dict.items())\n  ```\n\n  However, nested unpacking still works perfectly in comprehensions:\n  ```python\n  {x:z for x, (y, z) in d.items()}\n  ```\n  In general, comprehensions are also better 'translatable' between Python 2 and 3.\n\n- `map()`, `.keys()`, `.values()`, `.items()`, etc. return iterators or views, not lists. 
Main problems with iterators are:\n  - no trivial slicing\n  - can't be iterated twice\n\n  Almost all of the problems are resolved by converting result to list.\n\n- see [Python FAQ: How do I port to Python 3?](https://eev.ee/blog/2016/07/31/python-faq-how-do-i-port-to-python-3/) when in trouble\n\n### Main problems for teaching machine learning and data science with python\n\nCourse authors should spend time in the first lectures to explain what is an iterator,\nwhy it can't be sliced / concatenated / multiplied / iterated twice like a string (and how to deal with it).\n\nI think most course authors would be happy to avoid these details, but now it is hardly possible.\n\n# Conclusion\n\nPython 2 and Python 3 have co-existed for almost 10 years, but we *should* move to Python 3.\n\nResearch and production code should become a bit shorter, more readable, and significantly safer after moving to Python 3-only codebase.\n\nRight now most libraries support both Python versions.\nAnd I can't wait for the bright moment when packages drop support for Python 2 and enjoy new language features.\n\nFollowing migrations are promised to be smoother: [\"we will never do this kind of backwards-incompatible change again\"](https://snarky.ca/why-python-3-exists/)\n\n### Links\n\n- [Key differences between Python 2.7 and Python 3.x](http://sebastianraschka.com/Articles/2014_python_2_3_key_diff.html)\n- [Python FAQ: How do I port to Python 3?](https://eev.ee/blog/2016/07/31/python-faq-how-do-i-port-to-python-3/)\n- [10 awesome features of Python that you can't use because you refuse to upgrade to Python 3](http://www.asmeurer.com/python3-presentation/slides.html)\n- [Trust me, python 3.3 is better than 2.7 (video)](http://pyvideo.org/pycon-us-2013/python-33-trust-me-its-better-than-27.html)\n- [Python 3 for scientists](http://python-3-for-scientists.readthedocs.io/en/latest/)\n\n### License\n\nThis text was published by [Alex Rogozhnikov](https://arogozhnikov.github.io/about/) and 
[contributors](https://github.com/arogozhnikov/python3_with_pleasure/graphs/contributors) under [CC BY-SA 3.0 License](https://creativecommons.org/licenses/by-sa/3.0/) (excluding images).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farogozhnikov%2Fpython3_with_pleasure","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Farogozhnikov%2Fpython3_with_pleasure","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farogozhnikov%2Fpython3_with_pleasure/lists"}