https://github.com/quansight-labs/python-api-inspect

Statistics to better understand how python is used and written

[SciPy 2019 Lightning Talk](https://docs.google.com/presentation/d/1iGnnNh-qxPOJcuIPxNNNwYhp5whV4RqQCduC6Ar1VVs/edit?usp=sharing)

# Motivation

This package provides statistics to better understand how Python is
used and written.

A package maintainer might ask:
- Can certain functions be deprecated?
- How are my users using my package in tests vs. source vs. notebooks?
- What should I include in tutorials?
- Are new features being adopted?

Python Core Maintainers might ask:
- What are the most and least used stdlib modules?
- Is the community moving away from one module?
- Let's inform PEPs with actual statistics!

This work exposes a [sqlite](https://sqlite.org/index.html) database
through a queryable web API via [datasette](https://github.com/simonw/datasette).

**NOTE:** this dataset is currently extremely biased, as we are parsing
the top 4,000 repositories for a few scientific libraries listed in
`data/whitelist`. This is not a representative sample of the Python
ecosystem, nor of the entire scientific Python ecosystem. Further work
is needed to make this dataset less biased.

# Interesting Questions

As with any project that provides large datasets, interpretation is
even more important than the data itself. Here we provide some guiding
questions.

- [How many files are we looking at?](https://python-api-inspect.aves.io/inspect?sql=++SELECT+count%28*%29+FROM+File)
- [How many repositories are we looking at?](https://python-api-inspect.aves.io/inspect?sql=++SELECT+count%28DISTINCT+project%29+FROM+File)
- [How many distinct namespaces are we inspecting?](https://python-api-inspect.aves.io/inspect?sql=++SELECT+count%28DISTINCT+namespace%29+FROM+FunctionStats)
- [What are the top 10 most popular pandas functions?](https://python-api-inspect.aves.io/inspect?sql=++SELECT+key+AS+function%2C+sum%28json_extract%28value%2C+%27%24.count%27%29%29+as+count%0D%0A++FROM+FunctionStats%2C+json_each%28FunctionStats.stats%29%0D%0A++JOIN+File+ON+FunctionStats.id+%3D+File.id%0D%0A++WHERE+FunctionStats.namespace+%3D+%3Anamespace%0D%0A++++AND+File.filename+NOT+LIKE+%27%25%2Fsite-packages%2F%25%27%0D%0A++++AND+File.filename+LIKE+%27%25.ipynb%27%0D%0A++GROUP+BY+key%0D%0A++ORDER+BY+sum%28json_extract%28value%2C+%27%24.count%27%29%29+desc%0D%0A++LIMIT+10&namespace=pandas)
- [What are the top 10 most popular numpy attributes?](https://python-api-inspect.aves.io/inspect?sql=++SELECT+key+AS+function%2C+sum%28json_extract%28value%2C+%27%24.count%27%29%29+as+count%0D%0A++FROM+AttributeStats%2C+json_each%28AttributeStats.stats%29%0D%0A++JOIN+File+ON+AttributeStats.id+%3D+File.id%0D%0A++WHERE+AttributeStats.namespace+%3D+%3Anamespace%0D%0A++++AND+File.filename+NOT+LIKE+%27%25%2Fsite-packages%2F%25%27%0D%0A++++AND+File.filename+LIKE+%27%25.ipynb%27%0D%0A++GROUP+BY+key%0D%0A++ORDER+BY+sum%28json_extract%28value%2C+%27%24.count%27%29%29+desc%0D%0A++LIMIT+10&namespace=numpy)
- [What are the most depended upon modules by function usage count?](https://python-api-inspect.aves.io/inspect?sql=+SELECT+namespace%2C+sum%28count%29%0D%0A+FROM+%28%0D%0A+SELECT+FunctionStats.namespace+as+namespace%2C+sum%28json_extract%28value%2C+%27%24.count%27%29%29+as+count%0D%0A++FROM+FunctionStats%2C+json_each%28FunctionStats.stats%29%0D%0A++JOIN+File+ON+FunctionStats.id+%3D+File.id%0D%0A++WHERE+File.filename+NOT+LIKE+%27%25%2Fsite-packages%2F%25%27%0D%0A++++AND+File.filename+LIKE+%27%25.ipynb%27%0D%0A++GROUP+BY+key%0D%0A++ORDER+BY+sum%28json_extract%28value%2C+%27%24.count%27%29%29+desc%0D%0A+%29%0D%0A+GROUP+BY+namespace%0D%0A+ORDER+BY+sum%28count%29+desc%0D%0A+LIMIT+100)
- [What are the top 100 most used stdlib module functions?](https://python-api-inspect.aves.io/inspect?sql=SELECT+key+AS+function%2C+sum%28json_extract%28value%2C+%27%24.count%27%29%29+as+count%0D%0A++FROM+FunctionStats%2C+json_each%28FunctionStats.stats%29%0D%0A++JOIN+File+ON+FunctionStats.id+%3D+File.id%0D%0A++WHERE+FunctionStats.namespace+IN+%28%27string%27%2C+%27re%27%2C+%27difflib%27%2C+%27textwrap%27%2C+%27unicodedata%27%2C%0D%0A++++%27stringprep%27%2C+%27readline%27%2C+%27rlcompleter%27%2C+%27struct%27%2C+%27codecs%27%2C%0D%0A++++%27datetime%27%2C+%27calendar%27%2C+%27collections%27%2C+%27heapq%27%2C+%27bisect%27%2C+%27array%27%2C%0D%0A++++%27weakref%27%2C+%27types%27%2C+%27copy%27%2C+%27pprint%27%2C+%27reprlib%27%2C+%27enum%27%2C%0D%0A++++%27numbers%27%2C+%27math%27%2C+%27cmath%27%2C+%27decimal%27%2C+%27fractions%27%2C+%27random%27%2C%0D%0A++++%27statistics%27%2C+%27itertools%27%2C+%27functools%27%2C+%27operator%27%2C%0D%0A++++%27pathlib%27%2C+%27fileinput%27%2C+%27stat%27%2C+%27filecmp%27%2C+%27tempfile%27%2C+%27glob%27%2C%0D%0A++++%27fnmatch%27%2C+%27linecache%27%2C+%27shutil%27%2C+%27macpath%27%2C+%27pickle%27%2C+%27copyreg%27%2C%0D%0A++++%27shelve%27%2C+%27marshal%27%2C+%27dbm%27%2C+%27sqlite3%27%2C+%27zlib%27%2C+%27gzip%27%2C+%27bz2%27%2C%0D%0A++++%27lzma%27%2C+%27zipfile%27%2C+%27tarfile%27%2C+%27csv%27%2C+%27configparser%27%2C+%27netrc%27%2C%0D%0A++++%27xdrlib%27%2C+%27plistlib%27%2C+%27hashlib%27%2C+%27hmac%27%2C+%27secrets%27%2C+%27os%27%2C+%27io%27%2C%0D%0A++++%27time%27%2C+%27argparse%27%2C+%27getopt%27%2C+%27logging%27%2C+%27getpass%27%2C+%27curses%27%2C%0D%0A++++%27platform%27%2C+%27errno%27%2C+%27ctypes%27%2C+%27threading%27%2C+%27multiprocessing%27%2C%0D%0A++++%27concurrent%27%2C+%27subprocess%27%2C+%27sched%27%2C+%27queue%27%2C+%27_thread%27%2C%0D%0A++++%27_dummy_thread%27%2C+%27dummy_threading%27%2C+%27contextvars%27%2C+%27asyncio%27%2C%0D%0A++++%27socket%27%2C+%27ssl%27%2C+%27select%27%2C+%27selectors%27%2C+%27asyncore%27%2C+%27asynchat%27%2C%0D%0A++++%27signal%27%2C+%27mmap%27%2C+%27email%27%2C+%27json%27%2C+%27mailcap%27%2C+%27mailbox%27%2C%0D%0A++++%27mimetypes%27%2C+%27base64%27%2C+%27binhex%27%2C+%27binascii%27%2C+%27quopri%27%2C+%27uu%27%2C%0D%0A++++%27html%27%2C+%27xml%27%2C+%27webbrowser%27%2C+%27cgi%27%2C+%27cgitb%27%2C+%27wsgiref%27%2C+%27urllib%27%2C%0D%0A++++%27ftplib%27%2C+%27poplib%27%2C+%27imaplib%27%2C+%27nntplib%27%2C+%27smtplib%27%2C+%27smtpd%27%2C%0D%0A++++%27telnetlib%27%2C+%27uuid%27%2C+%27socketserver%27%2C+%27xmlrpc%27%2C+%27ipaddress%27%2C%0D%0A++++%27audioop%27%2C+%27aifc%27%2C+%27sunau%27%2C+%27wave%27%2C+%27chunk%27%2C+%27colorsys%27%2C+%27imghdr%27%2C%0D%0A++++%27sndhdr%27%2C+%27ossaudiodev%27%2C+%27gettext%27%2C+%27locale%27%2C+%27turtle%27%2C+%27cmd%27%2C%0D%0A++++%27shlex%27%2C+%27tkinter%27%2C+%27typing%27%2C+%27pydoc%27%2C+%27doctest%27%2C+%27unittest%27%2C%0D%0A++++%27lib2to3%27%2C+%27test%27%2C+%27bdb%27%2C+%27faulthandler%27%2C+%27pdb%27%2C+%27timeit%27%2C%0D%0A++++%27trace%27%2C+%27tracemalloc%27%2C+%27distutils%27%2C+%27ensurepip%27%2C+%27venv%27%2C%0D%0A++++%27zipapp%27%2C+%27sys%27%2C+%27sysconfig%27%2C+%27builtins%27%2C+%27warnings%27%2C%0D%0A++++%27dataclasses%27%2C+%27contextlib%27%2C+%27abc%27%2C+%27atexit%27%2C+%27traceback%27%2C%0D%0A++++%27__future__%27%2C+%27gc%27%2C+%27inspect%27%2C+%27site%27%2C+%27code%27%2C+%27codeop%27%2C%0D%0A++++%27zipimport%27%2C+%27pkgutil%27%2C+%27modulefinder%27%2C+%27runpy%27%2C+%27importlib%27%2C%0D%0A++++%27parser%27%2C+%27ast%27%2C+%27symtable%27%2C+%27symbol%27%2C+%27token%27%2C+%27keyword%27%2C%0D%0A++++%27tokenize%27%2C+%27tabnanny%27%2C+%27pyclbr%27%2C+%27py_compile%27%2C+%27compileall%27%2C%0D%0A++++%27dis%27%2C+%27pickletools%27%2C+%27formatter%27%2C+%27msilib%27%2C+%27msvcrt%27%2C+%27winreg%27%2C%0D%0A++++%27winsound%27%2C+%27posix%27%2C+%27pwd%27%2C+%27spwd%27%2C+%27grp%27%2C+%27crypt%27%2C+%27termios%27%2C%0D%0A++++%27tty%27%2C+%27pty%27%2C+%27fcntl%27%2C+%27pipes%27%2C+%27resource%27%2C+%27nis%27%2C+%27syslog%27%2C%0D%0A++++%27optparse%27%2C+%27imp%27%2C+%27posixpath%27%2C+%27ntpath%27%29%0D%0A++++AND+File.filename+NOT+LIKE+%27%25%2Fsite-packages%2F%25%27%0D%0A++GROUP+BY+key%0D%0A++ORDER+BY+sum%28json_extract%28value%2C+%27%24.count%27%29%29+desc%0D%0A++LIMIT+100)
- [What are the least and most used stdlib modules?](https://python-api-inspect.aves.io/inspect?sql=SELECT+namespace%2C+sum%28count%29%0D%0AFROM+%28%0D%0ASELECT+FunctionStats.namespace+as+%27namespace%27%2C+sum%28json_extract%28value%2C+%27%24.count%27%29%29+as+%27count%27%0D%0A++FROM+FunctionStats%2C+json_each%28FunctionStats.stats%29%0D%0A++JOIN+File+ON+FunctionStats.id+%3D+File.id%0D%0A++WHERE+FunctionStats.namespace+IN+%28%27string%27%2C+%27re%27%2C+%27difflib%27%2C+%27textwrap%27%2C+%27unicodedata%27%2C%0D%0A++++%27stringprep%27%2C+%27readline%27%2C+%27rlcompleter%27%2C+%27struct%27%2C+%27codecs%27%2C%0D%0A++++%27datetime%27%2C+%27calendar%27%2C+%27collections%27%2C+%27heapq%27%2C+%27bisect%27%2C+%27array%27%2C%0D%0A++++%27weakref%27%2C+%27types%27%2C+%27copy%27%2C+%27pprint%27%2C+%27reprlib%27%2C+%27enum%27%2C%0D%0A++++%27numbers%27%2C+%27math%27%2C+%27cmath%27%2C+%27decimal%27%2C+%27fractions%27%2C+%27random%27%2C%0D%0A++++%27statistics%27%2C+%27itertools%27%2C+%27functools%27%2C+%27operator%27%2C%0D%0A++++%27pathlib%27%2C+%27fileinput%27%2C+%27stat%27%2C+%27filecmp%27%2C+%27tempfile%27%2C+%27glob%27%2C%0D%0A++++%27fnmatch%27%2C+%27linecache%27%2C+%27shutil%27%2C+%27macpath%27%2C+%27pickle%27%2C+%27copyreg%27%2C%0D%0A++++%27shelve%27%2C+%27marshal%27%2C+%27dbm%27%2C+%27sqlite3%27%2C+%27zlib%27%2C+%27gzip%27%2C+%27bz2%27%2C%0D%0A++++%27lzma%27%2C+%27zipfile%27%2C+%27tarfile%27%2C+%27csv%27%2C+%27configparser%27%2C+%27netrc%27%2C%0D%0A++++%27xdrlib%27%2C+%27plistlib%27%2C+%27hashlib%27%2C+%27hmac%27%2C+%27secrets%27%2C+%27os%27%2C+%27io%27%2C%0D%0A++++%27time%27%2C+%27argparse%27%2C+%27getopt%27%2C+%27logging%27%2C+%27getpass%27%2C+%27curses%27%2C%0D%0A++++%27platform%27%2C+%27errno%27%2C+%27ctypes%27%2C+%27threading%27%2C+%27multiprocessing%27%2C%0D%0A++++%27concurrent%27%2C+%27subprocess%27%2C+%27sched%27%2C+%27queue%27%2C+%27_thread%27%2C%0D%0A++++%27_dummy_thread%27%2C+%27dummy_threading%27%2C+%27contextvars%27%2C+%27asyncio%27%2C%0D%0A++++%27socket%27%2C+%27ssl%27%2C+%27select%27%2C+%27selectors%27%2C+%27asyncore%27%2C+%27asynchat%27%2C%0D%0A++++%27signal%27%2C+%27mmap%27%2C+%27email%27%2C+%27json%27%2C+%27mailcap%27%2C+%27mailbox%27%2C%0D%0A++++%27mimetypes%27%2C+%27base64%27%2C+%27binhex%27%2C+%27binascii%27%2C+%27quopri%27%2C+%27uu%27%2C%0D%0A++++%27html%27%2C+%27xml%27%2C+%27webbrowser%27%2C+%27cgi%27%2C+%27cgitb%27%2C+%27wsgiref%27%2C+%27urllib%27%2C%0D%0A++++%27ftplib%27%2C+%27poplib%27%2C+%27imaplib%27%2C+%27nntplib%27%2C+%27smtplib%27%2C+%27smtpd%27%2C%0D%0A++++%27telnetlib%27%2C+%27uuid%27%2C+%27socketserver%27%2C+%27xmlrpc%27%2C+%27ipaddress%27%2C%0D%0A++++%27audioop%27%2C+%27aifc%27%2C+%27sunau%27%2C+%27wave%27%2C+%27chunk%27%2C+%27colorsys%27%2C+%27imghdr%27%2C%0D%0A++++%27sndhdr%27%2C+%27ossaudiodev%27%2C+%27gettext%27%2C+%27locale%27%2C+%27turtle%27%2C+%27cmd%27%2C%0D%0A++++%27shlex%27%2C+%27tkinter%27%2C+%27typing%27%2C+%27pydoc%27%2C+%27doctest%27%2C+%27unittest%27%2C%0D%0A++++%27lib2to3%27%2C+%27test%27%2C+%27bdb%27%2C+%27faulthandler%27%2C+%27pdb%27%2C+%27timeit%27%2C%0D%0A++++%27trace%27%2C+%27tracemalloc%27%2C+%27distutils%27%2C+%27ensurepip%27%2C+%27venv%27%2C%0D%0A++++%27zipapp%27%2C+%27sys%27%2C+%27sysconfig%27%2C+%27builtins%27%2C+%27warnings%27%2C%0D%0A++++%27dataclasses%27%2C+%27contextlib%27%2C+%27abc%27%2C+%27atexit%27%2C+%27traceback%27%2C%0D%0A++++%27__future__%27%2C+%27gc%27%2C+%27inspect%27%2C+%27site%27%2C+%27code%27%2C+%27codeop%27%2C%0D%0A++++%27zipimport%27%2C+%27pkgutil%27%2C+%27modulefinder%27%2C+%27runpy%27%2C+%27importlib%27%2C%0D%0A++++%27parser%27%2C+%27ast%27%2C+%27symtable%27%2C+%27symbol%27%2C+%27token%27%2C+%27keyword%27%2C%0D%0A++++%27tokenize%27%2C+%27tabnanny%27%2C+%27pyclbr%27%2C+%27py_compile%27%2C+%27compileall%27%2C%0D%0A++++%27dis%27%2C+%27pickletools%27%2C+%27formatter%27%2C+%27msilib%27%2C+%27msvcrt%27%2C+%27winreg%27%2C%0D%0A++++%27winsound%27%2C+%27posix%27%2C+%27pwd%27%2C+%27spwd%27%2C+%27grp%27%2C+%27crypt%27%2C+%27termios%27%2C%0D%0A++++%27tty%27%2C+%27pty%27%2C+%27fcntl%27%2C+%27pipes%27%2C+%27resource%27%2C+%27nis%27%2C+%27syslog%27%2C%0D%0A++++%27optparse%27%2C+%27imp%27%2C+%27posixpath%27%2C+%27ntpath%27%29%0D%0A++++AND+File.filename+NOT+LIKE+%27%25%2Fsite-packages%2F%25%27%0D%0A++GROUP+BY+key%0D%0A++ORDER+BY+sum%28json_extract%28value%2C+%27%24.count%27%29%29+desc%0D%0A%29%0D%0AGROUP+BY+namespace%0D%0AORDER+BY+sum%28count%29+desc)
- How are the builtin functions used within [source](https://python-api-inspect.aves.io/inspect?sql=SELECT+key+AS+function%2C+sum%28json_extract%28value%2C+%27%24.count%27%29%29+as+count%0D%0A++FROM+FunctionStats%2C+json_each%28FunctionStats.stats%29%0D%0A++JOIN+File+ON+FunctionStats.id+%3D+File.id%0D%0A++WHERE+FunctionStats.namespace+%3D+%27__builtins__%27%0D%0A++++AND+File.filename+NOT+LIKE+%27%25%2Fsite-packages%2F%25%27%0D%0A++++AND+File.filename+LIKE+%27%25.py%27%0D%0A++++AND+File.filename+NOT+LIKE+%27%25%2Ftests%2F%25%27%0D%0A++++AND+File.filename+NOT+LIKE+%27%25%2Ftest%2F%25%27%0D%0A++GROUP+BY+key%0D%0A++ORDER+BY+sum%28json_extract%28value%2C+%27%24.count%27%29%29+desc) vs. [notebooks](https://python-api-inspect.aves.io/inspect?sql=SELECT+key+AS+function%2C+sum%28json_extract%28value%2C+%27%24.count%27%29%29+as+count%0D%0A++FROM+FunctionStats%2C+json_each%28FunctionStats.stats%29%0D%0A++JOIN+File+ON+FunctionStats.id+%3D+File.id%0D%0A++WHERE+FunctionStats.namespace+%3D+%27__builtins__%27%0D%0A++++AND+File.filename+NOT+LIKE+%27%25%2Fsite-packages%2F%25%27%0D%0A++++AND+File.filename+LIKE+%27%25.ipynb%27%0D%0A++GROUP+BY+key%0D%0A++ORDER+BY+sum%28json_extract%28value%2C+%27%24.count%27%29%29+desc) vs. [tests](http://python-api-inspect.aves.io/inspect?sql=SELECT+key+AS+function%2C+sum%28json_extract%28value%2C+%27%24.count%27%29%29+as+count%0D%0A++FROM+FunctionStats%2C+json_each%28FunctionStats.stats%29%0D%0A++JOIN+File+ON+FunctionStats.id+%3D+File.id%0D%0A++WHERE+FunctionStats.namespace+%3D+%27__builtins__%27%0D%0A++++AND+File.filename+NOT+LIKE+%27%25%2Fsite-packages%2F%25%27%0D%0A++++AND+File.filename+LIKE+%27%25.py%27%0D%0A++++AND+%28File.filename+LIKE+%27%25%2Ftests%2F%25%27+OR+File.filename+LIKE+%27%25%2Ftest%2F%25%27%29%0D%0A++GROUP+BY+key%0D%0A++ORDER+BY+sum%28json_extract%28value%2C+%27%24.count%27%29%29+desc)?
- [How often are the dunder methods used?](https://python-api-inspect.aves.io/inspect?sql=SELECT+key%2C+sum%28value%29%0D%0AFROM+DefClassStats%2C+json_each%28DefClassStats.stats%2C+%27%24.dunder%27%29%0D%0AGROUP+BY+key%0D%0AORDER+BY+sum%28value%29+desc%0D%0ALIMIT+100)
- [What is the average length of a line of code?](https://python-api-inspect.aves.io/inspect?sql=SELECT+avg%28json_extract%28ContentStats.stats%2C+%27%24.avg_line_length%27%29%29%0D%0AFROM+ContentStats)
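
The linked queries share a common shape, which can be tried locally with Python's `sqlite3` module. The following is a minimal sketch, not the project's code: the SQL is the URL-decoded "top 10 pandas functions" query from the list above, while the table layout (`File(id, filename, project)`, `FunctionStats(id, namespace, stats)`), the JSON shape of `stats`, and the sample rows are assumptions inferred from those queries. It also assumes the local SQLite build includes the JSON functions.

```python
import sqlite3

# URL-decoded form of the "top 10 most popular pandas functions" link.
TOP_FUNCTIONS_SQL = """
SELECT key AS function, sum(json_extract(value, '$.count')) AS count
FROM FunctionStats, json_each(FunctionStats.stats)
JOIN File ON FunctionStats.id = File.id
WHERE FunctionStats.namespace = :namespace
  AND File.filename NOT LIKE '%/site-packages/%'
  AND File.filename LIKE '%.ipynb'
GROUP BY key
ORDER BY sum(json_extract(value, '$.count')) DESC
LIMIT 10
"""


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with an ASSUMED schema and made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE File (id INTEGER PRIMARY KEY, filename TEXT, project TEXT);
        CREATE TABLE FunctionStats (id INTEGER, namespace TEXT, stats TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO File VALUES (?, ?, ?)",
        [
            (1, "proj/analysis.ipynb", "proj"),
            (2, "proj/lib.py", "proj"),
            (3, "env/site-packages/pkg/demo.ipynb", "proj"),
        ],
    )
    conn.executemany(
        "INSERT INTO FunctionStats VALUES (?, ?, ?)",
        [
            (1, "pandas", '{"pandas.read_csv": {"count": 3}, "pandas.DataFrame": {"count": 5}}'),
            (2, "pandas", '{"pandas.read_csv": {"count": 10}}'),
            (3, "pandas", '{"pandas.concat": {"count": 7}}'),
        ],
    )
    return conn


def top_functions(conn: sqlite3.Connection, namespace: str) -> list:
    """Return (function, count) pairs for notebooks outside site-packages."""
    return conn.execute(TOP_FUNCTIONS_SQL, {"namespace": namespace}).fetchall()


for function, count in top_functions(build_demo_db(), "pandas"):
    print(function, count)
```

Only the notebook outside `site-packages` contributes to the result here; the `.py` file and the vendored notebook are filtered out by the two `LIKE` conditions, which is the same filtering the linked queries apply.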

# Workflow

This package consists of components that expose a [sqlite
database](https://sqlite.org/index.html) via
[datasette](https://github.com/simonw/datasette). Originally this
package provided CSV files with API usage statistics for packages. The
problem is that fixed CSV files cannot anticipate all the questions
that users may have. Thus we provide a SQL interface for asking custom
questions of the (currently) 6 GB database.
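
A custom question can be turned into a shareable link in the same form as the links in this README. The helper below is a hypothetical sketch (the name `build_query_url` is not part of this package); it only builds the URL with the standard library and does not contact the server. The base URL and the `inspect` path are taken from the links above, and extra keyword arguments are assumed to map to named SQL parameters such as `:namespace`.

```python
from urllib.parse import urlencode

# Base taken from the links in this README; whether the service is
# currently reachable is not checked here.
BASE_URL = "https://python-api-inspect.aves.io/inspect"


def build_query_url(sql: str, base_url: str = BASE_URL, **params: str) -> str:
    """Build a query URL: ?sql=<query>&<name>=<value> for each named parameter."""
    query = {"sql": sql}
    query.update(params)
    return f"{base_url}?{urlencode(query)}"


url = build_query_url(
    "SELECT count(*) FROM FunctionStats WHERE namespace = :namespace",
    namespace="pandas",
)
print(url)
```

Datasette instances generally also serve machine-readable results when `.json` is appended to the database path, so a script could fetch results rather than open the HTML page, assuming this deployment keeps that default.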

The scripts involved in this work:

1. Assemble a list of important repositories/projects that depend on
libraries such as `numpy`, `scipy`, `requests`, `tensorflow`,
etc. This work would not be possible without
[libraries.io](https://libraries.io/). (`scripts/librariesio.sh`)
2. Construct the database by inspecting the source code and AST of every
Python file and notebook in the repositories. (`scripts/inspect.sh`)
3. Expose the sqlite database via datasette. (`scripts/serve.sh`)
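
To give a feel for what step 2 involves, here is a minimal sketch of counting function calls per imported namespace with the standard `ast` module. It is an illustration under simplifying assumptions, not the implementation behind `scripts/inspect.sh`: the names `count_calls` and `dotted_name` are hypothetical, and only calls whose leading name was bound by an `import` statement are counted.

```python
import ast
from collections import Counter


def dotted_name(node: ast.AST):
    """Turn an expression like np.linalg.norm into 'np.linalg.norm', else None."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None  # e.g. a call on a subscript or on another call's result


def count_calls(source: str) -> Counter:
    """Count calls to imported names, keyed by fully qualified dotted path."""
    tree = ast.parse(source)

    # Map each locally bound name to the module path it refers to.
    aliases = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.asname:  # import numpy as np -> np: numpy
                    aliases[alias.asname] = alias.name
                else:  # import os.path binds the name "os"
                    root = alias.name.split(".")[0]
                    aliases[root] = root
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            for alias in node.names:  # from os import path -> path: os.path
                aliases[alias.asname or alias.name] = f"{node.module}.{alias.name}"

    counts = Counter()
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name is None:
                continue
            head, _, rest = name.partition(".")
            if head in aliases:
                counts[aliases[head] + (f".{rest}" if rest else "")] += 1
    return counts


example = (
    "import numpy as np\n"
    "from os import path\n"
    "np.array([1])\n"
    "np.linalg.norm(np.array([2]))\n"
    "path.join('a', 'b')\n"
    "print('x')\n"
)
print(count_calls(example))
```

In this sketch `print` is ignored because it is not an imported name; the linked queries suggest the real database tracks builtins under a separate `__builtins__` namespace.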

# Tests

The tests depend on `pytest` and are a good demonstration of what
python-api-inspect can capture.

```shell
pytest
```