Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/jalammar/ecco
Explain, analyze, and visualize NLP language models. Ecco creates interactive visualizations directly in Jupyter notebooks explaining the behavior of Transformer-based language models (like GPT2, BERT, RoBERTa, T5, and T0).
explorables language-models natural-language-processing nlp pytorch visualization
Last synced: 30 days ago
JSON representation
- Host: GitHub
- URL: https://github.com/jalammar/ecco
- Owner: jalammar
- License: bsd-3-clause
- Created: 2020-11-07T10:06:34.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2024-08-15T19:08:06.000Z (3 months ago)
- Last Synced: 2024-10-01T21:41:37.301Z (about 1 month ago)
- Topics: explorables, language-models, natural-language-processing, nlp, pytorch, visualization
- Language: Jupyter Notebook
- Homepage: https://ecco.readthedocs.io
- Size: 4.06 MB
- Stars: 1,973
- Watchers: 24
- Forks: 168
- Open Issues: 37
Metadata Files:
- Readme: README.rst
- Changelog: CHANGELOG.rst
- Contributing: CONTRIBUTING.rst
- License: LICENSE
- Authors: AUTHORS.rst
Awesome Lists containing this project
README
.. image:: https://ar.pegg.io/img/ecco-logo-w-800.png
   :alt: Ecco Logo

.. start-badges

|version| |supported-versions|

.. |version| image:: https://img.shields.io/pypi/v/ecco.svg
   :alt: PyPI Package latest release
   :target: https://pypi.org/project/ecco

.. |supported-versions| image:: https://img.shields.io/pypi/pyversions/ecco.svg
   :alt: Supported versions
   :target: https://pypi.org/project/ecco
.. end-badges

Ecco is a Python library for explaining Natural Language Processing models using interactive visualizations.
It provides multiple interfaces to aid the explanation and intuition of `Transformer `_-based language models.
Read: `Interfaces for Explaining Transformer Language Models `_.

Ecco runs inside Jupyter notebooks. It is built on top of `pytorch `_ and `transformers `_.

The library is currently an alpha release of a research project. It is not production ready. You're welcome to contribute to make it better!
Installation
============

.. code-block:: bash

   # Assuming you have PyTorch already installed
   pip install ecco

Documentation
=============

To use the project:
.. code-block:: python

   import ecco

   # Load a pre-trained language model.
   # Setting 'activations' to True tells Ecco to capture neuron activations.
   lm = ecco.from_pretrained('distilgpt2', activations=True)

   # Input text
   text = "The countries of the European Union are:\n1. Austria\n2. Belgium\n3. Bulgaria\n4."

   # Generate 20 tokens to complete the input text.
   output = lm.generate(text, generate=20, do_sample=True)

   # Ecco will output each token as it is generated.
   # 'output' now contains the data captured from this run, including the input and output tokens
   # as well as neuron activations and input saliency values.

   # To view the input saliency
   output.saliency()

This does the following:
1. It loads a pretrained Huggingface DistilGPT2 model and wraps it in an ecco ``LM`` object that does useful things (e.g. it calculates input saliency and can collect neuron activations).
2. We tell the model to generate 20 tokens.
3. The model returns an ecco ``OutputSeq`` object. This object holds the output sequence, but also a lot of data generated by the generation run, including the input sequence and input saliency values. If we set ``activations=True`` in ``from_pretrained()``, then this would also contain neuron activation values.
4. ``output`` can now produce various interactive explorables. Examples include:

   - ``output.saliency()`` to generate the input saliency explorable [`Input Saliency Colab Notebook `_]
   - ``output.run_nmf()`` to explore non-negative matrix factorization of neuron activations [`Neuron Activation Colab Notebook `_]

.. code-block:: python

   # To view the input saliency explorable
   output.saliency()

   # To view input saliency with more details (a bar and % value for each token)
   output.saliency(style="detailed")

   # output.activations contains the neuron activation values.
   # It has the shape: (layer, neuron, token position)

   # We can run non-negative matrix factorization using run_nmf.
   # We pass the number of factors/components to break the activations down into.
   nmf_1 = output.run_nmf(n_components=10)

   # nmf_1 now contains the data needed to create the interactive NMF explorable:
   nmf_1.explore()
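As background on what ``run_nmf`` computes, here is a minimal NumPy sketch of non-negative matrix factorization applied to a made-up activations matrix. This is *not* Ecco's implementation — the matrix, shapes, and update loop below are illustrative assumptions; it only shows the idea of decomposing a (neurons × token positions) matrix into a small number of firing-pattern factors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Fake activations for one layer: 64 neurons over 12 token positions.
# (Real Ecco activations have shape (layer, neuron, token position).)
V = np.abs(rng.normal(size=(64, 12)))

# Factorize V ≈ W @ H with n_components factors.
n_components = 3
W = np.abs(rng.normal(size=(64, n_components)))
H = np.abs(rng.normal(size=(n_components, 12)))

# Multiplicative-update NMF (Lee & Seung): minimizes ||V - W @ H||_F
# while keeping W and H non-negative.
eps = 1e-9
for _ in range(200):
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)

print(W.shape, H.shape)  # (64, 3) (3, 12)
```

Each row of ``H`` is one factor's firing pattern across token positions, and each column of ``W`` says how strongly each neuron participates in that factor — this is the structure the interactive ``nmf_1.explore()`` view visualizes.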