Spaces:
Running
Running
| This project Implements an LLM-augmented `textgraph` algorithm for | |
| constructing a _lemma graph_ from raw, unstructured text source. | |
| The `TextGraphs` library is based on work developed by | |
| [Derwen](https://derwen.ai/graph) | |
| in 2023 Q2 for customer apps and used in our `Cysoni` | |
| product. | |
| This library integrates code from: | |
| * [`SpanMarker`](https://github.com/tomaarsen/SpanMarkerNER/) | |
| * [`spaCy-DBpedia-Spotlight`](https://github.com/MartinoMensio/spacy-dbpedia-spotlight) | |
| * [`REBEL`](https://github.com/Babelscape/rebel) | |
| * [`OpenNRE`](https://github.com/thunlp/OpenNRE/) | |
| * [`qwikidata`](https://github.com/kensho-technologies/qwikidata) | |
| * [`pulp`](https://github.com/coin-or/pulp) | |
| * [`spaCy`](https://spacy.io/) | |
| * [`HF transformers`](https://huggingface.co/docs/transformers/index) | |
| * [`PyTextRank`](https://github.com/DerwenAI/pytextrank/) | |
| For more background about early efforts which led to this line of inquiry, see the recent talks: | |
| * ["Language, Graphs, and AI in Industry"](https://derwen.ai/s/mqqm) | |
| **Paco Nathan**, K1st World (2023-10-11) ([video](https://derwen.ai/s/4h2kswhrm3gc)) | |
| * ["Language Tools for Creators"](https://derwen.ai/s/rhvg) | |
| **Paco Nathan**, FOSSY (2023-07-13) | |
| The `TextGraphs` library shows integrations of several of these kinds | |
| of components, complemented with use of graph queries, graph algorithms, | |
| and other related tooling. | |
| Admittedly, the results present a "hybrid" approach: | |
| it's not purely "generative" -- whatever that might mean. | |
| A core principle here is to provide results from the natural language | |
| workflows which may be used for expert feedback. | |
| In other words, how can we support means for leveraging | |
| _human-in-the-loop_ (HITL) process? | |
| Another principle has been to create a Python library built to produced | |
| configurable, extensible pipelines. | |
| Care has been given to writing code that can be run concurrently | |
| (e.g., leveraging `asyncio`), using dependencies which have | |
| business-friendly licenses, and paying attention to security concerns. | |
| The library provides three main affordances for AI applications: | |
| 1. With the default settings, one can use `TextGraphs` to extracti ranked key phrases from raw text -- even without using any of the additional deep learning models. | |
| 2. Going a few further steps, one can generate an RDF or LPG graph from raw texts, and make use of _entity linking_, _relation extraction_, and other techniques to ground the natural language parsing by leveraging some knowledge graph which represents a particular domain. Default examples use WikiMedia graphs: DBPedia, Wikidata, etc. | |
| 3. A third set of goals for `TextGraphs` is to provide a "playground" or "gym" for evaluating _graph levels of detail_, i.e., abstraction layers for knowledge graphs, and explore some the emerging work to produced _foundation models_ for knowledge graphs through topological transforms. | |
| Regarding the third point, consider how language parsing produces | |
| graphs by definition, although NLP results tend to be quite _noisy_. | |
| The annotations inferred by NLP pipelines often get thrown out. | |
| This seemed like a good opportunity to generate sample data for | |
| "condensing" graphs into more abstracted representations. | |
| In other words, patterns within the relatively noisy parse results | |
| can be condensed into relatively refined knowledge graph elements. | |
| Note that while the `spaCy` library for NLP plays a central role, the | |
| `TextGraphs` library is not intended to become a `spaCy` pipeline. | |