Collection of NLP model explanations and accompanying analysis tools
A few minor improvements and fixes were needed after over a year of radio silence on my part. The new version of thermostat-datasets includes a way to explain your own data: `explain_custom_data` (`from thermostat.explain import explain_custom_data`) takes a .jsonnet config file and runs the same code that produced the existing Thermostat datasets. It should work on most Hugging Face datasets. In most cases, you need to specify `text_field` in the `dataset` section of your config.
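As a rough illustration of the config step, the sketch below writes a minimal config file with only the standard library. Only the `dataset` section and its `text_field` key are taken from the notes above; the dataset name, the file name and the commented-out call signature are assumptions, and a real config will likely need further sections (model, explainer) modeled on the configs shipped with Thermostat.

```python
import json
import tempfile
from pathlib import Path

# Only "dataset" and "text_field" come from the release notes; every other
# key and value below is a hypothetical placeholder.
config = {
    "dataset": {
        "name": "imdb",        # hypothetical: a Hugging Face dataset name
        "text_field": "text",  # column that holds the raw input text
    },
}

config_path = Path(tempfile.mkdtemp()) / "custom_data.jsonnet"
# JSON is valid Jsonnet, so json.dumps is enough for a static config.
config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")

# With thermostat-datasets installed, the config would then be passed on.
# The call signature is an assumption; check the package before relying on it.
# from thermostat.explain import explain_custom_data
# explain_custom_data(str(config_path))
```

Writing the file as plain JSON keeps the example dependency-free; Jsonnet features such as variables or imports can be added by hand later.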
Thanks to @g8a9 for fixing the issue with scikit-learn!
Give ferret a try if you don't know it yet and are interested in explainability benchmarks! 😄
Lastly, I want to promote Inseq, an exciting new library for interpreting sequence generation models by Gabriele Sarti, Ludwig Sickert, Oskar van der Wal, Malvina Nissim, Arianna Bisazza and myself. Inseq lets you attribute entire datasets and visualize attributions as matrices to explain the behavior of state-of-the-art LLMs and other sequence generation models. Since Inseq is much more recent and has more exciting functionality, I will probably not do much maintenance on Thermostat in the future and will instead focus on improving Inseq.
Inseq will be presented at ACL 2023 alongside my new Saliency Map Verbalization paper. Hoping to see you in Toronto! 🍁
thermostat-datasets is out now via PyPI! Thank you very much for the overwhelming response to this project!
Thanks to @aj280192, there are now two new explainers from Captum, LayerDeepLiftShap and LayerGradientShap, which have been applied to all four datasets: IMDb, MNLI, XNLI and AG News.
Unfortunately, XLNet explanations could not be produced due to an issue with Captum, but the four other models (ALBERT, BERT, ELECTRA and RoBERTa) are available for both explainers.
A minor issue with the import of tqdm has also been fixed.
Note: I've been experiencing issues with the `.render()` function in Google Colab that displays heatmaps using displaCy. The next update will include an alternative engine such as ipymarkup.
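Until an alternative engine is available, a heatmap can be approximated with plain HTML. The sketch below is not Thermostat's rendering code and uses neither displaCy nor ipymarkup; the function name `attributions_to_html` and the example tokens and scores are made up for illustration.

```python
import html


def attributions_to_html(tokens, scores):
    """Return an HTML string that colors each token by its attribution score."""
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    # Normalize by the largest absolute score so opacities fall in [0, 1].
    max_abs = max((abs(s) for s in scores), default=0.0) or 1.0
    spans = []
    for token, score in zip(tokens, scores):
        alpha = abs(score) / max_abs
        # Red for positive attributions, blue for negative ones.
        rgb = "255, 0, 0" if score >= 0 else "0, 0, 255"
        spans.append(
            f'<span style="background-color: rgba({rgb}, {alpha:.2f})">'
            f"{html.escape(token)}</span>"
        )
    return " ".join(spans)


# Hypothetical tokens and scores, made up for illustration.
demo_html = attributions_to_html(["a", "great", "movie"], [0.1, 0.8, -0.2])
```

In a notebook such as Google Colab, the resulting string can be shown with `IPython.display.HTML(demo_html)`.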
pip install thermostat-datasets