---
title: Neochar
emoji: 🖼
colorFrom: purple
colorTo: red
sdk: gradio
sdk_version: 5.25.0
app_file: app.py
pinned: false
license: openrail
short_description: Unwritten Chinese Characters in Style
---
# What is this?
Generate new Chinese characters by combining components in creative ways, and write them in a controlled style.
- Inspired by:
  - Lin Yutang's [Ming-Kwai typewriter](https://en.wikipedia.org/wiki/Chinese_typewriter#MingKwai_design)
  - Wu Yue's [Glyffuser](https://yue-here.com/posts/glyffuser/)
# Why
- It is fun to generate valid but unseen characters (found in no dictionary, and not in Unicode).
- Implements Lin Yutang's ideas with generative AI/ML, without the mechanical marvel :-/ or limitations :-)
- Extends a font to support new charsets, and beyond to non-existent chars.
- Adds variation/diversity/personality to generated images. No boring duplicates from the same char.
- Other [Creative Uses](#creative-uses)
# How to use this app
- Combine components or radicals as follows:
  - Specify the 'Structure' and 'Components' in [Polish notation](https://en.wikipedia.org/wiki/Polish_notation) (structure first, then its parts), which suits tree structures:
    - ⿰: 'LR' Left-Right
    - ⿱: 'TB' Top-Bottom
    - ⿸: 'TL' Top-Left
    - ⿹: 'TR' Top-Right
    - ⿺: 'BL' Bottom-Left
    - ⿴: 'OI' Outer-Inner
    - ⿻: 'OV' Overlap
    - ⿲: 'LMR' Left-Middle-Right
    - ⿳: 'TMB' Top-Middle-Bottom
    - ⿵: 'BT' Bottom Open Enclosure
    - ⿶: 'CT' Top Open Enclosure
    - ⿷: 'RT' Right Open Enclosure
- Select a 'Style' by clicking the sample images
- Hit the 'Generate' button
- Repeat
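
To make the Polish-notation input concrete, here is a minimal Python sketch that parses such a description into a tree. It is not taken from this app's code: the names `ARITY` and `parse_ids` are illustrative, and the sketch assumes each component is a single character. Anything that is not one of the structure symbols listed above is treated as a leaf.

```python
# Number of parts each structure symbol expects (assumed from the list above).
ARITY = {
    "⿰": 2, "⿱": 2, "⿸": 2, "⿹": 2, "⿺": 2, "⿴": 2,
    "⿻": 2, "⿵": 2, "⿶": 2, "⿷": 2,
    "⿲": 3, "⿳": 3,
}


def parse_ids(text):
    """Parse a prefix-notation description into a nested tuple tree."""

    def parse(pos):
        if pos >= len(text):
            raise ValueError("description ended before all slots were filled")
        ch = text[pos]
        pos += 1
        if ch not in ARITY:
            return ch, pos  # leaf: a component, radical, or wildcard
        children = []
        for _ in range(ARITY[ch]):
            child, pos = parse(pos)
            children.append(child)
        return (ch, *children), pos

    tree, end = parse(0)
    if end != len(text):
        raise ValueError(f"unexpected trailing text: {text[end:]!r}")
    return tree


print(parse_ids("⿱艹⿰日月"))  # ('⿱', '艹', ('⿰', '日', '月'))
```

Because the structure symbol comes first, the parser always knows how many parts to read next, so no brackets are needed even for nested structures.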
# Usage Tips
- Simple structures work best (⿰ ⿱ ⿴ etc.)
- "Known radicals at seen positions" work best (釒 works better on the left than on the right, though the latter may also surprise you in a good way)
- Noto font family (sans and serif) gives the best results, as there are many training examples
- Cursive and handwritten styles usually give good results, as they are more tolerant
- Fonts supporting fewer chars are challenging
- Current model was trained with 300k samples for only 20 epochs
- Training will continue if this app gets attention or likes
- For dictionary chars, [decompose](https://github.com/cburgmer/cjklib/blob/master/cjklib/data/characterdecomposition.csv) first.
- For a part that is hard to describe, or that you don't care about, use a wildcard '?' (a full-width question mark, though it is unclear whether the width matters)
- What to do when the results are not as expected:
  - Pick a different 'style', one that may have trained the model better
  - Try again with a different random seed. This will change the overall structure in an unpredictable way
  - Try again with a different 'step' number. This will change the local details in a continuous way
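
The tip about decomposing dictionary chars first can be sketched in Python as a recursive lookup. The three-entry `DECOMP` table and the `expand` helper below are hypothetical stand-ins, not this app's code and not the format of the linked CSV. A real table would be loaded from a decomposition dataset.

```python
# Hypothetical toy table: each known char maps to a one-level description.
DECOMP = {
    "好": "⿰女子",
    "明": "⿰日月",
    "萌": "⿱艹明",
}


def expand(char, table=DECOMP):
    """Recursively rewrite a char as a prefix description of its parts."""
    if char not in table:
        return char  # structure symbol or undecomposed component
    return "".join(expand(c, table) for c in table[char])


print(expand("萌"))  # ⿱艹⿰日月
```

The sketch assumes the table has no cycles (no char that decomposes, directly or indirectly, into itself). Stopping the expansion early is often preferable, since known components tend to work better than very fine-grained parts.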
# Creative Uses
## Turning a bug into a feature
When you see a funny result you didn't expect (say, 5 or 3 dots where there should be 4), don't throw it away immediately.
- Save the results to confuse/train OCR
- 3vade 3vil c3nsorship
- Share in discussion. The input text/seed/step will reliably reproduce the result.
# Future Features
- Typewriter keyboard for hard-to-input radicals, filtered by pinyin prefix
- Direct generation from a single char, auto decomposition