awacke1 committed
Commit aaf590b · verified · 1 Parent(s): 4fefc43

Update app.py

Browse files
Files changed (1)
  1. app.py +27 -409
app.py CHANGED
@@ -15,438 +15,58 @@ st.write('''
15
  Dictionary learning aims to find a sparse representation of the data in the form of a dictionary and a sparse matrix.
16
  ''')
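For reference, the sparse factorization this sentence refers to is usually written as the following objective (a standard textbook formulation, added here for context rather than taken from app.py): approximate the data matrix $X$ with a dictionary $D$ and a sparse code matrix $A$,

$$\min_{D,\,A}\ \tfrac{1}{2}\lVert X - DA\rVert_F^2 + \lambda\lVert A\rVert_1 \quad \text{s.t.}\ \lVert d_k\rVert_2 \le 1 \ \text{for every atom } d_k.$$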
17
 
18
- # Text input
19
- text_input = st.text_area("Enter the text to analyze:", value='''
20
- # 🩺🔍 Search Results
21
- ### 11 Jul 2023 | [FairLay-ML: Intuitive Remedies for Unfairness in Data-Driven Social-Critical Algorithms](https://arxiv.org/abs/2307.05029) | [⬇️](https://arxiv.org/pdf/2307.05029)
22
- *Normen Yu, Gang Tan, Saeid Tizpaz-Niari*
23
-
24
- This thesis explores open-sourced machine learning (ML) model explanation
25
- tools to understand whether these tools can allow a layman to visualize,
26
- understand, and suggest intuitive remedies to unfairness in ML-based
27
- decision-support systems. Machine learning models trained on datasets biased
28
- against minority groups are increasingly used to guide life-altering social
29
- decisions, prompting the urgent need to study their logic for unfairness. Due
30
- to this problem's impact on vast populations of the general public, it is
31
- critical for the layperson -- not just subject matter experts in social justice
32
- or machine learning experts -- to understand the nature of unfairness within
33
- these algorithms and the potential trade-offs. Existing research on fairness in
34
- machine learning focuses mostly on the mathematical definitions and tools to
35
- understand and remedy unfair models, with some directly citing user-interactive
36
- tools as necessary for future work. This thesis presents FairLay-ML, a
37
- proof-of-concept GUI integrating some of the most promising tools to provide
38
- intuitive explanations for unfair logic in ML models by integrating existing
39
- research tools (e.g. Local Interpretable Model-Agnostic Explanations) with
40
- existing ML-focused GUI (e.g. Python Streamlit). We test FairLay-ML using
41
- models of various accuracy and fairness generated by an unfairness detector
42
- tool, Parfait-ML, and validate our results using Themis. Our study finds that
43
- the technology stack used for FairLay-ML makes it easy to install and provides
44
- real-time black-box explanations of pre-trained models to users. Furthermore,
45
- the explanations provided translate to actionable remedies.
46
-
47
- ---------------
48
-
49
- ### 29 Jan 2020 | [stream-learn -- open-source Python library for difficult data stream batch analysis](https://arxiv.org/abs/2001.11077) | [⬇️](https://arxiv.org/pdf/2001.11077)
50
- *Paweł Ksieniewicz, Paweł Zyblewski*
51
-
52
- stream-learn is a Python package compatible with scikit-learn and developed
53
- for the drifting and imbalanced data stream analysis. Its main component is a
54
- stream generator, which allows to produce a synthetic data stream that may
55
- incorporate each of the three main concept drift types (i.e. sudden, gradual
56
- and incremental drift) in their recurring or non-recurring versions. The
57
- package allows conducting experiments following established evaluation
58
- methodologies (i.e. Test-Then-Train and Prequential). In addition, estimators
59
- adapted for data stream classification have been implemented, including both
60
- simple classifiers and state-of-the-art chunk-based and online classifier
61
- ensembles. To improve computational efficiency, package utilises its own
62
- implementations of prediction metrics for imbalanced binary classification
63
- tasks.
64
-
65
- ---------------
66
-
67
- ### 16 Oct 2022 | [POTATO: exPlainable infOrmation exTrAcTion framewOrk](https://arxiv.org/abs/2201.13230) | [⬇️](https://arxiv.org/pdf/2201.13230)
68
- *Ádám Kovács, Kinga Gémes, Eszter Iklódi, Gábor Recski*
69
-
70
- We present POTATO, a task- and languageindependent framework for
71
- human-in-the-loop (HITL) learning of rule-based text classifiers using
72
- graph-based features. POTATO handles any type of directed graph and supports
73
- parsing text into Abstract Meaning Representations (AMR), Universal
74
- Dependencies (UD), and 4lang semantic graphs. A streamlit-based user interface
75
- allows users to build rule systems from graph patterns, provides real-time
76
- evaluation based on ground truth data, and suggests rules by ranking graph
77
- features using interpretable machine learning models. Users can also provide
78
- patterns over graphs using regular expressions, and POTATO can recommend
79
- refinements of such rules. POTATO is applied in projects across domains and
80
- languages, including classification tasks on German legal text and English
81
- social media data. All components of our system are written in Python, can be
82
- installed via pip, and are released under an MIT License on GitHub.
83
-
84
- ---------------
85
-
86
- ### 01 Aug 2019 | [ProSper -- A Python Library for Probabilistic Sparse Coding with Non-Standard Priors and Superpositions](https://arxiv.org/abs/1908.06843) | [⬇️](https://arxiv.org/pdf/1908.06843)
87
- *Georgios Exarchakis, Jörg Bornschein, Abdul-Saboor Sheikh, Zhenwen Dai, Marc Henniges, Jakob Drefs, Jörg Lücke*
88
-
89
- ProSper is a python library containing probabilistic algorithms to learn
90
- dictionaries. Given a set of data points, the implemented algorithms seek to
91
- learn the elementary components that have generated the data. The library
92
- widens the scope of dictionary learning approaches beyond implementations of
93
- standard approaches such as ICA, NMF or standard L1 sparse coding. The
94
- implemented algorithms are especially well-suited in cases when data consist of
95
- components that combine non-linearly and/or for data requiring flexible prior
96
- distributions. Furthermore, the implemented algorithms go beyond standard
97
- approaches by inferring prior and noise parameters of the data, and they
98
- provide rich a-posteriori approximations for inference. The library is designed
99
- to be extendable and it currently includes: Binary Sparse Coding (BSC), Ternary
100
- Sparse Coding (TSC), Discrete Sparse Coding (DSC), Maximal Causes Analysis
101
- (MCA), Maximum Magnitude Causes Analysis (MMCA), and Gaussian Sparse Coding
102
- (GSC, a recent spike-and-slab sparse coding approach). The algorithms are
103
- scalable due to a combination of variational approximations and
104
- parallelization. Implementations of all algorithms allow for parallel execution
105
- on multiple CPUs and multiple machines for medium to large-scale applications.
106
- Typical large-scale runs of the algorithms can use hundreds of CPUs to learn
107
- hundreds of dictionary elements from data with tens of millions of
108
- floating-point numbers such that models with several hundred thousand
109
- parameters can be optimized. The library is designed to have minimal
110
- dependencies and to be easy to use. It targets users of dictionary learning
111
- algorithms and Machine Learning researchers.
112
-
113
- ---------------
114
-
115
- ### 27 Jul 2020 | [metric-learn: Metric Learning Algorithms in Python](https://arxiv.org/abs/1908.04710) | [⬇️](https://arxiv.org/pdf/1908.04710)
116
- *William de Vazelhes and CJ Carey and Yuan Tang and Nathalie Vauquier and Aurélien Bellet*
117
-
118
- metric-learn is an open source Python package implementing supervised and
119
- weakly-supervised distance metric learning algorithms. As part of
120
- scikit-learn-contrib, it provides a unified interface compatible with
121
- scikit-learn which allows to easily perform cross-validation, model selection,
122
- and pipelining with other machine learning estimators. metric-learn is
123
- thoroughly tested and available on PyPi under the MIT licence.
124
-
125
- ---------------
126
-
127
- ### 10 Nov 2023 | [Deep Fast Vision: A Python Library for Accelerated Deep Transfer Learning Vision Prototyping](https://arxiv.org/abs/2311.06169) | [⬇️](https://arxiv.org/pdf/2311.06169)
128
- *Fabi Prezja*
129
-
130
- Deep learning-based vision is characterized by intricate frameworks that
131
- often necessitate a profound understanding, presenting a barrier to newcomers
132
- and limiting broad adoption. With many researchers grappling with the
133
- constraints of smaller datasets, there's a pronounced reliance on pre-trained
134
- neural networks, especially for tasks such as image classification. This
135
- reliance is further intensified in niche imaging areas where obtaining vast
136
- datasets is challenging. Despite the widespread use of transfer learning as a
137
- remedy to the small dataset dilemma, a conspicuous absence of tailored auto-ML
138
- solutions persists. Addressing these challenges is "Deep Fast Vision", a python
139
- library that streamlines the deep learning process. This tool offers a
140
- user-friendly experience, enabling results through a simple nested dictionary
141
- definition, helping to democratize deep learning for non-experts. Designed for
142
- simplicity and scalability, Deep Fast Vision appears as a bridge, connecting
143
- the complexities of existing deep learning frameworks with the needs of a
144
- diverse user base.
145
-
146
- ---------------
147
-
148
- ### 12 Jul 2021 | [Online Graph Dictionary Learning](https://arxiv.org/abs/2102.06555) | [⬇️](https://arxiv.org/pdf/2102.06555)
149
- *Cédric Vincent-Cuaz, Titouan Vayer, Rémi Flamary, Marco Corneli, Nicolas Courty*
150
-
151
- Dictionary learning is a key tool for representation learning, that explains
152
- the data as linear combination of few basic elements. Yet, this analysis is not
153
- amenable in the context of graph learning, as graphs usually belong to
154
- different metric spaces. We fill this gap by proposing a new online Graph
155
- Dictionary Learning approach, which uses the Gromov Wasserstein divergence for
156
- the data fitting term. In our work, graphs are encoded through their nodes'
157
- pairwise relations and modeled as convex combination of graph atoms, i.e.
158
- dictionary elements, estimated thanks to an online stochastic algorithm, which
159
- operates on a dataset of unregistered graphs with potentially different number
160
- of nodes. Our approach naturally extends to labeled graphs, and is completed by
161
- a novel upper bound that can be used as a fast approximation of Gromov
162
- Wasserstein in the embedding space. We provide numerical evidences showing the
163
- interest of our approach for unsupervised embedding of graph datasets and for
164
- online graph subspace estimation and tracking.
165
-
166
- ---------------
167
-
168
- ### 25 Nov 2021 | [Online Orthogonal Dictionary Learning Based on Frank-Wolfe Method](https://arxiv.org/abs/2103.01484) | [⬇️](https://arxiv.org/pdf/2103.01484)
169
- *Ye Xue and Vincent Lau*
170
-
171
- Dictionary learning is a widely used unsupervised learning method in signal
172
- processing and machine learning. Most existing works of dictionary learning are
173
- in an offline manner. There are mainly two offline ways for dictionary
174
- learning. One is to do an alternative optimization of both the dictionary and
175
- the sparse code; the other way is to optimize the dictionary by restricting it
176
- over the orthogonal group. The latter one is called orthogonal dictionary
177
- learning which has a lower complexity implementation, hence, it is more
178
- favorable for low-cost devices. However, existing schemes on orthogonal
179
- dictionary learning only work with batch data and can not be implemented
180
- online, which is not applicable for real-time applications. This paper proposes
181
- a novel online orthogonal dictionary scheme to dynamically learn the dictionary
182
- from streaming data without storing the historical data. The proposed scheme
183
- includes a novel problem formulation and an efficient online algorithm design
184
- with convergence analysis. In the problem formulation, we relax the orthogonal
185
- constraint to enable an efficient online algorithm. In the algorithm design, we
186
- propose a new Frank-Wolfe-based online algorithm with a convergence rate of
187
- O(ln t/t^(1/4)). The convergence rate in terms of key system parameters is also
188
- derived. Experiments with synthetic data and real-world sensor readings
189
- demonstrate the effectiveness and efficiency of the proposed online orthogonal
190
- dictionary learning scheme.
191
-
192
- ---------------
193
-
194
- ### 14 Jun 2022 | [Supervised Dictionary Learning with Auxiliary Covariates](https://arxiv.org/abs/2206.06774) | [⬇️](https://arxiv.org/pdf/2206.06774)
195
- *Joowon Lee, Hanbaek Lyu, Weixin Yao*
196
-
197
- Supervised dictionary learning (SDL) is a classical machine learning method
198
- that simultaneously seeks feature extraction and classification tasks, which
199
- are not necessarily a priori aligned objectives. The goal of SDL is to learn a
200
- class-discriminative dictionary, which is a set of latent feature vectors that
201
- can well-explain both the features as well as labels of observed data. In this
202
- paper, we provide a systematic study of SDL, including the theory, algorithm,
203
- and applications of SDL. First, we provide a novel framework that `lifts' SDL
204
- as a convex problem in a combined factor space and propose a low-rank projected
205
- gradient descent algorithm that converges exponentially to the global minimizer
206
- of the objective. We also formulate generative models of SDL and provide global
207
- estimation guarantees of the true parameters depending on the hyperparameter
208
- regime. Second, viewed as a nonconvex constrained optimization problem, we
209
- provided an efficient block coordinate descent algorithm for SDL that is
210
- guaranteed to find an $\varepsilon$-stationary point of the objective in
211
- $O(\varepsilon^{-1}(\log \varepsilon^{-1})^{2})$ iterations. For the
212
- corresponding generative model, we establish a novel non-asymptotic local
213
- consistency result for constrained and regularized maximum likelihood
214
- estimation problems, which may be of independent interest. Third, we apply SDL
215
- for imbalanced document classification by supervised topic modeling and also
216
- for pneumonia detection from chest X-ray images. We also provide simulation
217
- studies to demonstrate that SDL becomes more effective when there is a
218
- discrepancy between the best reconstructive and the best discriminative
219
- dictionaries.
220
-
221
- ---------------
222
-
223
- ### 07 Oct 2013 | [Online Unsupervised Feature Learning for Visual Tracking](https://arxiv.org/abs/1310.1690) | [⬇️](https://arxiv.org/pdf/1310.1690)
224
- *Fayao Liu, Chunhua Shen, Ian Reid, Anton van den Hengel*
225
-
226
- Feature encoding with respect to an over-complete dictionary learned by
227
- unsupervised methods, followed by spatial pyramid pooling, and linear
228
- classification, has exhibited powerful strength in various vision applications.
229
- Here we propose to use the feature learning pipeline for visual tracking.
230
- Tracking is implemented using tracking-by-detection and the resulted framework
231
- is very simple yet effective. First, online dictionary learning is used to
232
- build a dictionary, which captures the appearance changes of the tracking
233
- target as well as the background changes. Given a test image window, we extract
234
- local image patches from it and each local patch is encoded with respect to the
235
- dictionary. The encoded features are then pooled over a spatial pyramid to form
236
- an aggregated feature vector. Finally, a simple linear classifier is trained on
237
- these features.
238
- Our experiments show that the proposed powerful---albeit simple---tracker,
239
- outperforms all the state-of-the-art tracking methods that we have tested.
240
- Moreover, we evaluate the performance of different dictionary learning and
241
- feature encoding methods in the proposed tracking framework, and analyse the
242
- impact of each component in the tracking scenario. We also demonstrate the
243
- flexibility of feature learning by plugging it into Hare et al.'s tracking
244
- method. The outcome is, to our knowledge, the best tracker ever reported, which
245
- facilitates the advantages of both feature learning and structured output
246
- prediction.
247
-
248
- ---------------
249
-
250
- ### 04 Mar 2024 | [Automated Generation of Multiple-Choice Cloze Questions for Assessing English Vocabulary Using GPT-turbo 3.5](https://arxiv.org/abs/2403.02078) | [⬇️](https://arxiv.org/pdf/2403.02078)
251
- *Qiao Wang, Ralph Rose, Naho Orita, Ayaka Sugawara*
252
-
253
- A common way of assessing language learners' mastery of vocabulary is via
254
- multiple-choice cloze (i.e., fill-in-the-blank) questions. But the creation of
255
- test items can be laborious for individual teachers or in large-scale language
256
- programs. In this paper, we evaluate a new method for automatically generating
257
- these types of questions using large language models (LLM). The VocaTT
258
- (vocabulary teaching and training) engine is written in Python and comprises
259
- three basic steps: pre-processing target word lists, generating sentences and
260
- candidate word options using GPT, and finally selecting suitable word options.
261
- To test the efficiency of this system, 60 questions were generated targeting
262
- academic words. The generated items were reviewed by expert reviewers who
263
- judged the well-formedness of the sentences and word options, adding comments
264
- to items judged not well-formed. Results showed a 75% rate of well-formedness
265
- for sentences and a 66.85% rate for suitable word options. This is a marked
266
- improvement over the generator used earlier in our research which did not take
267
- advantage of GPT's capabilities. Post-hoc qualitative analysis reveals several
268
- points for improvement in future work including cross-referencing
269
- part-of-speech tagging, better sentence validation, and improving GPT prompts.
270
-
271
- 13 Dec 2016 | TF.Learn: TensorFlow's High-level Module for Distributed Machine Learning | ⬇️
272
- Yuan Tang
273
- TF.Learn is a high-level Python module for distributed machine learning
274
- inside TensorFlow. It provides an easy-to-use Scikit-learn style interface to
275
- simplify the process of creating, configuring, training, evaluating, and
276
- experimenting a machine learning model. TF.Learn integrates a wide range of
277
- state-of-the-art machine learning algorithms built on top of TensorFlow's low-level
278
- APIs for small to large-scale supervised and unsupervised problems. This module
279
- focuses on bringing machine learning to non-specialists using a general-purpose
280
- high-level language as well as researchers who want to implement, benchmark,
281
- and compare their new methods in a structured environment. Emphasis is put on
282
- ease of use, performance, documentation, and API consistency.
283
-
284
- 11 Dec 2019 | Majorization Minimization Technique for Optimally Solving Deep Dictionary Learning | ⬇️
285
- Vanika Singhal and Angshul Majumdar
286
- The concept of deep dictionary learning has been recently proposed. Unlike
287
- shallow dictionary learning which learns single level of dictionary to
288
- represent the data, it uses multiple layers of dictionaries. So far, the
289
- problem could only be solved in a greedy fashion; this was achieved by learning
290
- a single layer of dictionary in each stage where the coefficients from the
291
- previous layer acted as inputs to the subsequent layer (only the first layer
292
- used the training samples as inputs). This was not optimal; there was feedback
293
- from shallower to deeper layers but not the other way. This work proposes an
294
- optimal solution to deep dictionary learning whereby all the layers of
295
- dictionaries are solved simultaneously. We employ the Majorization Minimization
296
- approach. Experiments have been carried out on benchmark datasets; it shows
297
- that optimal learning indeed improves over greedy piecemeal learning.
298
- Comparison with other unsupervised deep learning tools (stacked denoising
299
- autoencoder, deep belief network, contractive autoencoder and K-sparse
300
- autoencoder) show that our method supersedes their performance both in accuracy
301
- and speed.
302
-
303
- 17 May 2022 | Applications of Deep Neural Networks with Keras | ⬇️
304
- Jeff Heaton
305
- Deep learning is a group of exciting new technologies for neural networks.
306
- Through a combination of advanced training techniques and neural network
307
- architectural components, it is now possible to create neural networks that can
308
- handle tabular data, images, text, and audio as both input and output. Deep
309
- learning allows a neural network to learn hierarchies of information in a way
310
- that is like the function of the human brain. This course will introduce the
311
- student to classic neural network structures, Convolution Neural Networks
312
- (CNN), Long Short-Term Memory (LSTM), Gated Recurrent Neural Networks (GRU),
313
- General Adversarial Networks (GAN), and reinforcement learning. Application of
314
- these architectures to computer vision, time series, security, natural language
315
- processing (NLP), and data generation will be covered. High-Performance
316
- Computing (HPC) aspects will demonstrate how deep learning can be leveraged
317
- both on graphical processing units (GPUs), as well as grids. Focus is primarily
318
- upon the application of deep learning to problems, with some introduction to
319
- mathematical foundations. Readers will use the Python programming language to
320
- implement deep learning using Google TensorFlow and Keras. It is not necessary
321
- to know Python prior to this book; however, familiarity with at least one
322
- programming language is assumed.
323
-
324
- 26 Feb 2015 | Learning computationally efficient dictionaries and their implementation as fast transforms | ⬇️
325
- Luc Le Magoarou (INRIA - IRISA), Rémi Gribonval (INRIA - IRISA)
326
- Dictionary learning is a branch of signal processing and machine learning
327
- that aims at finding a frame (called dictionary) in which some training data
328
- admits a sparse representation. The sparser the representation, the better the
329
- dictionary. The resulting dictionary is in general a dense matrix, and its
330
- manipulation can be computationally costly both at the learning stage and later
331
- in the usage of this dictionary, for tasks such as sparse coding. Dictionary
332
- learning is thus limited to relatively small-scale problems. In this paper,
333
- inspired by usual fast transforms, we consider a general dictionary structure
334
- that allows cheaper manipulation, and propose an algorithm to learn such
335
- dictionaries --and their fast implementation-- over training data. The approach
336
- is demonstrated experimentally with the factorization of the Hadamard matrix
337
- and with synthetic dictionary learning experiments.
338
-
339
- 03 Dec 2021 | SSDL: Self-Supervised Dictionary Learning | ⬇️
340
- Shuai Shao, Lei Xing, Wei Yu, Rui Xu, Yanjiang Wang, Baodi Liu
341
- The label-embedded dictionary learning (DL) algorithms generate influential
342
- dictionaries by introducing discriminative information. However, there exists a
343
- limitation: All the label-embedded DL methods rely on the labels due that this
344
- way merely achieves ideal performances in supervised learning. While in
345
- semi-supervised and unsupervised learning, it is no longer sufficient to be
346
- effective. Inspired by the concept of self-supervised learning (e.g., setting
347
- the pretext task to generate a universal model for the downstream task), we
348
- propose a Self-Supervised Dictionary Learning (SSDL) framework to address this
349
- challenge. Specifically, we first design a $p$-Laplacian Attention Hypergraph
350
- Learning (pAHL) block as the pretext task to generate pseudo soft labels for
351
- DL. Then, we adopt the pseudo labels to train a dictionary from a primary
352
- label-embedded DL method. We evaluate our SSDL on two human activity
353
- recognition datasets. The comparison results with other state-of-the-art
354
- methods have demonstrated the efficiency of SSDL.
355
-
356
- 05 Jun 2018 | Scikit-learn: Machine Learning in Python | ⬇️
357
- Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Andreas Müller, Joel Nothman, Gilles Louppe, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, Édouard Duchesnay
358
- Scikit-learn is a Python module integrating a wide range of state-of-the-art
359
- machine learning algorithms for medium-scale supervised and unsupervised
360
- problems. This package focuses on bringing machine learning to non-specialists
361
- using a general-purpose high-level language. Emphasis is put on ease of use,
362
- performance, documentation, and API consistency. It has minimal dependencies
363
- and is distributed under the simplified BSD license, encouraging its use in
364
- both academic and commercial settings. Source code, binaries, and documentation
365
- can be downloaded from http://scikit-learn.org.
366
-
367
- 15 Jul 2020 | Complete Dictionary Learning via $\ell_p$-norm Maximization | ⬇️
368
- Yifei Shen, Ye Xue, Jun Zhang, Khaled B. Letaief, and Vincent Lau
369
- Dictionary learning is a classic representation learning method that has been
370
- widely applied in signal processing and data analytics. In this paper, we
371
- investigate a family of $\ell_p$-norm ($p>2,p \in \mathbb{N}$) maximization
372
- approaches for the complete dictionary learning problem from theoretical and
373
- algorithmic aspects. Specifically, we prove that the global maximizers of these
374
- formulations are very close to the true dictionary with high probability, even
375
- when Gaussian noise is present. Based on the generalized power method (GPM), an
376
- efficient algorithm is then developed for the $\ell_p$-based formulations. We
377
- further show the efficacy of the developed algorithm: for the population GPM
378
- algorithm over the sphere constraint, it first quickly enters the neighborhood
379
- of a global maximizer, and then converges linearly in this region. Extensive
380
- experiments will demonstrate that the $\ell_p$-based approaches enjoy a higher
381
- computational efficiency and better robustness than conventional approaches and
382
- $p=3$ performs the best.
383
-
384
- 27 Nov 2023 | Utilizing Explainability Techniques for Reinforcement Learning Model Assurance | ⬇️
385
- Alexander Tapley and Kyle Gatesman and Luis Robaina and Brett Bissey and Joseph Weissman
386
- Explainable Reinforcement Learning (XRL) can provide transparency into the
387
- decision-making process of a Deep Reinforcement Learning (DRL) model and
388
- increase user trust and adoption in real-world use cases. By utilizing XRL
389
- techniques, researchers can identify potential vulnerabilities within a trained
390
- DRL model prior to deployment, therefore limiting the potential for mission
391
- failure or mistakes by the system. This paper introduces the ARLIN (Assured RL
392
- Model Interrogation) Toolkit, an open-source Python library that identifies
393
- potential vulnerabilities and critical points within trained DRL models through
394
- detailed, human-interpretable explainability outputs. To illustrate ARLIN's
395
- effectiveness, we provide explainability visualizations and vulnerability
396
- analysis for a publicly available DRL model. The open-source code repository is
397
- available for download at https://github.com/mitre/arlin.
398
-
399
- 19 Sep 2019 | InterpretML: A Unified Framework for Machine Learning Interpretability | ⬇️
400
- Harsha Nori and Samuel Jenkins and Paul Koch and Rich Caruana
401
- InterpretML is an open-source Python package which exposes machine learning
402
- interpretability algorithms to practitioners and researchers. InterpretML
403
- exposes two types of interpretability - glassbox models, which are machine
404
- learning models designed for interpretability (ex: linear models, rule lists,
405
- generalized additive models), and blackbox explainability techniques for
406
- explaining existing systems (ex: Partial Dependence, LIME). The package enables
407
- practitioners to easily compare interpretability algorithms by exposing
408
- multiple methods under a unified API, and by having a built-in, extensible
409
- visualization platform. InterpretML also includes the first implementation of
410
- the Explainable Boosting Machine, a powerful, interpretable, glassbox model
411
- that can be as accurate as many blackbox models. The MIT licensed source code
412
- can be downloaded from github.com/microsoft/interpret.
413
-
414
- ''', height=200)
415
 
 
 
416
 
417
- #Get user input for the number of dictionary components
418
  n_components = st.slider('Number of dictionary components', 1, 20, 10)
 
419
  if st.button('Analyze'):
420
  # Perform text preprocessing
421
  vectorizer = CountVectorizer(stop_words='english')
422
  X = vectorizer.fit_transform([text_input])
 
423
  # Convert sparse matrix to dense numpy array
424
  X_dense = X.toarray()
425
-
426
  # Perform dictionary learning
427
  dl = DictionaryLearning(n_components=n_components, transform_algorithm='lasso_lars', random_state=0)
428
  X_transformed = dl.fit_transform(X_dense)
429
  dictionary = dl.components_
430
-
431
  # Get the feature names (terms)
432
  feature_names = vectorizer.get_feature_names_out()
433
-
434
  # Create a DataFrame with dictionary components and their corresponding terms
435
  df_components = pd.DataFrame(dictionary, columns=feature_names)
436
  df_components['Component'] = ['Component ' + str(i+1) for i in range(n_components)]
437
  df_components = df_components.set_index('Component')
438
-
439
  # Display the DataFrame
440
  st.markdown("### Dictionary Components")
441
  st.dataframe(df_components)
442
-
443
  # Create a graph of terms and their connections
444
  G = nx.Graph()
445
-
446
  # Add nodes to the graph
447
  for term in feature_names:
448
  G.add_node(term)
449
-
450
  # Add edges to the graph based on co-occurrence in dictionary components
451
  for i in range(n_components):
452
  terms = df_components.columns[df_components.iloc[i] > 0]
@@ -454,7 +74,7 @@ if st.button('Analyze'):
454
  for term2 in terms:
455
  if term1 != term2:
456
  G.add_edge(term1, term2)
457
-
458
  # Plot the graph
459
  fig, ax = plt.subplots(figsize=(8, 8))
460
  pos = nx.spring_layout(G, k=0.3)
@@ -462,6 +82,4 @@ if st.button('Analyze'):
462
  nx.draw_networkx_edges(G, pos, edge_color='gray', alpha=0.5)
463
  nx.draw_networkx_labels(G, pos, font_size=8)
464
  ax.axis('off')
465
- st.pyplot(fig)
466
-
467
-
 
15
  Dictionary learning aims to find a sparse representation of the data in the form of a dictionary and a sparse matrix.
16
  ''')
17
 
18
+ # Load text from file
19
+ with open("text_file.txt", "r", encoding="utf-8") as file:
20
+ text_input = file.read()
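With the default text now read from `text_file.txt`, a missing file would raise an exception at startup. A minimal defensive variant of that load (a sketch only, under the assumption that a short fallback string is acceptable; not part of app.py) could look like this:

```python
# Sketch: load the analysis text, falling back to a short built-in
# sample if text_file.txt is missing. Not part of the committed app.py.
from pathlib import Path

import streamlit as st

FALLBACK_TEXT = "Dictionary learning finds a sparse representation of data."

source = Path("text_file.txt")
if source.exists():
    text_input = source.read_text(encoding="utf-8")
else:
    st.warning("text_file.txt not found; using a built-in sample instead.")
    text_input = FALLBACK_TEXT
```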
21
 
22
+ # Text input
23
+ st.text_area("Analyzed Text:", value=text_input, height=200)
24
 
25
+ # Get user input for the number of dictionary components
26
  n_components = st.slider('Number of dictionary components', 1, 20, 10)
27
+
28
  if st.button('Analyze'):
29
  # Perform text preprocessing
30
  vectorizer = CountVectorizer(stop_words='english')
31
  X = vectorizer.fit_transform([text_input])
32
+
33
  # Convert sparse matrix to dense numpy array
34
  X_dense = X.toarray()
35
+
36
  # Perform dictionary learning
37
  dl = DictionaryLearning(n_components=n_components, transform_algorithm='lasso_lars', random_state=0)
38
  X_transformed = dl.fit_transform(X_dense)
39
  dictionary = dl.components_
40
+
41
  # Get the feature names (terms)
42
  feature_names = vectorizer.get_feature_names_out()
43
+
44
  # Create a DataFrame with dictionary components and their corresponding terms
45
  df_components = pd.DataFrame(dictionary, columns=feature_names)
46
  df_components['Component'] = ['Component ' + str(i+1) for i in range(n_components)]
47
  df_components = df_components.set_index('Component')
48
+
49
  # Display the DataFrame
50
  st.markdown("### Dictionary Components")
51
  st.dataframe(df_components)
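Each row of `df_components` is one dictionary atom expressed as weights over the vocabulary. As an illustrative sketch (not in app.py), the strongest terms of each component can be listed next to the table:

```python
# Sketch: print the top-weighted terms for each dictionary component.
top_k = 5
for component, row in df_components.iterrows():
    top_terms = row.sort_values(ascending=False).head(top_k)
    st.write(f"{component}: " + ", ".join(top_terms.index))
```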
52
+
53
+ # Plot the high-use words and terms
54
+ fig, ax = plt.subplots(figsize=(10, 6))
55
+ word_counts = df_components.sum(axis=0).sort_values(ascending=False)[:20]
56
+ ax.bar(word_counts.index, word_counts.values)
57
+ ax.set_xticklabels(word_counts.index, rotation=45, ha='right')
58
+ ax.set_xlabel('Words/Terms')
59
+ ax.set_ylabel('Count')
60
+ ax.set_title('High-Use Words and Terms')
61
+ st.pyplot(fig)
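A small caveat on the bar chart above: calling `set_xticklabels` without first fixing the tick positions can trigger a FixedLocator warning on recent matplotlib releases. A variant that pins the ticks first (an alternative sketch, not what app.py does) would be:

```python
# Sketch: same bar chart, but with explicit tick positions so
# set_xticklabels does not warn on newer matplotlib versions.
fig, ax = plt.subplots(figsize=(10, 6))
word_counts = df_components.sum(axis=0).sort_values(ascending=False)[:20]
positions = range(len(word_counts))
ax.bar(positions, word_counts.values)
ax.set_xticks(positions)
ax.set_xticklabels(word_counts.index, rotation=45, ha='right')
ax.set_xlabel('Words/Terms')
ax.set_ylabel('Count')
ax.set_title('High-Use Words and Terms')
st.pyplot(fig)
```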
62
+
63
  # Create a graph of terms and their connections
64
  G = nx.Graph()
65
+
66
  # Add nodes to the graph
67
  for term in feature_names:
68
  G.add_node(term)
69
+
70
  # Add edges to the graph based on co-occurrence in dictionary components
71
  for i in range(n_components):
72
  terms = df_components.columns[df_components.iloc[i] > 0]
 
74
  for term2 in terms:
75
  if term1 != term2:
76
  G.add_edge(term1, term2)
77
+
78
  # Plot the graph
79
  fig, ax = plt.subplots(figsize=(8, 8))
80
  pos = nx.spring_layout(G, k=0.3)
 
82
  nx.draw_networkx_edges(G, pos, edge_color='gray', alpha=0.5)
83
  nx.draw_networkx_labels(G, pos, font_size=8)
84
  ax.axis('off')
85
+ st.pyplot(fig)
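Taken together, the analysis path in the new app.py is: vectorize the text with `CountVectorizer`, factorize the counts with `DictionaryLearning`, tabulate the components, and connect terms that share a component in a `networkx` graph (the outer `for term1 in terms:` line of that loop sits just outside the displayed hunks). A self-contained sketch of the same pipeline, using a few hypothetical toy documents instead of `text_file.txt` and printing instead of rendering with Streamlit, is:

```python
# Standalone sketch of the pipeline in app.py, run on toy documents.
import networkx as nx
import pandas as pd
from sklearn.decomposition import DictionaryLearning
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "dictionary learning finds sparse codes",
    "sparse codes combine dictionary atoms",
    "atoms and codes form a learned dictionary",
]

# Term counts (English stop words removed), densified for DictionaryLearning.
vectorizer = CountVectorizer(stop_words='english')
X_dense = vectorizer.fit_transform(docs).toarray()

# Learn a small dictionary over the term counts.
dl = DictionaryLearning(n_components=3, transform_algorithm='lasso_lars',
                        random_state=0)
codes = dl.fit_transform(X_dense)  # sparse codes, one row per document

feature_names = vectorizer.get_feature_names_out()
df_components = pd.DataFrame(dl.components_, columns=feature_names)

# Connect any two terms that carry positive weight in the same component.
G = nx.Graph()
G.add_nodes_from(feature_names)
for _, row in df_components.iterrows():
    terms = df_components.columns[row > 0]
    for term1 in terms:
        for term2 in terms:
            if term1 != term2:
                G.add_edge(term1, term2)

print(codes.shape)
print(df_components.round(2))
print(sorted(G.edges()))
```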