	---
	title: Neochar
	emoji: 🖼
	colorFrom: purple
	colorTo: red
	sdk: gradio
	sdk_version: 5.25.0
	app_file: app.py
	pinned: false
	license: openrail
	short_description: Unwritten Chinese Characters in Style
	---

	# What is this?

	Generate New Characters by combining parts in creative ways. Write them in a controlled style.

	- Inspired by
	  - Lin Yutang's [MingKwai typewriter](https://en.wikipedia.org/wiki/Chinese_typewriter#MingKwai_design)
	  - Wu Yue's [Glyffuser](https://yue-here.com/posts/glyffuser/)

	# Why

	- It is fun to generate valid but unseen characters (found in no dictionary and not in Unicode).
	- Implements Lin Yutang's ideas with generative AI/ML, without the mechanical marvel :-/ or limitations :-)
	- Extends a font to support new charsets, and beyond to non-existent chars.
	- Adds variation/diversity/personality to generated images. No boring duplicates from the same char.
	- Other [Creative Uses](#creative-uses)

	# How to use this app
	- Combine components or radicals as follows
	  - Specify the 'Structure' and 'Components' in [Polish notation](https://en.wikipedia.org/wiki/Polish_notation) (structure first, then its components), which suits tree structures
	    - ⿰: 'LR' Left-Right
	    - ⿱: 'TB' Top-Bottom
	    - ⿸: 'TL' Top-Left
	    - ⿹: 'TR' Top-Right
	    - ⿺: 'BL' Bottom-Left
	    - ⿴: 'OI' Outer-Inner
	    - ⿻: 'OV' Overlap
	    - ⿲: 'LMR' Left-Middle-Right
	    - ⿳: 'TMB' Top-Middle-Bottom
	    - ⿵: 'BT' Bottom Open Enclosure
	    - ⿶: 'CT' Top Open Enclosure
	    - ⿷: 'RT' Right Open Enclosure
	  - Select a 'Style' by clicking one of the sample images
	  - Hit the 'Generate' button
	  - Repeat
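
To make the Polish-notation input concrete, here is a minimal sketch of how such a string could be read as a tree. It is not the app's own parser: the function name `parse_ids` and the nested-tuple output are illustrative assumptions. Only the structure characters and their operand counts (two for most, three for ⿲ and ⿳) come from the list above.

```python
# Number of components each structure character takes (from the list above).
IDC_ARITY = {
    "⿰": 2, "⿱": 2, "⿴": 2, "⿵": 2, "⿶": 2, "⿷": 2,
    "⿸": 2, "⿹": 2, "⿺": 2, "⿻": 2,
    "⿲": 3, "⿳": 3,
}


def parse_ids(text):
    """Parse a prefix-notation string into a nested tuple tree.

    A leaf is a single character; a node is (structure, child, child, ...).
    Hypothetical helper for illustration, not part of the app.
    """
    def parse_at(pos):
        if pos >= len(text):
            raise ValueError("expression ended before all components were given")
        ch = text[pos]
        arity = IDC_ARITY.get(ch)
        if arity is None:
            # Any non-structure character is treated as a component (leaf).
            return ch, pos + 1
        children = []
        pos += 1
        for _ in range(arity):
            child, pos = parse_at(pos)
            children.append(child)
        return (ch, *children), pos

    tree, end = parse_at(0)
    if end != len(text):
        raise ValueError(f"unexpected trailing input: {text[end:]!r}")
    return tree


# 釒 on the left, and on the right 口 stacked over 木.
print(parse_ids("⿰釒⿱口木"))
```

Because the structure character comes first, no brackets are needed: each structure simply consumes the next two (or three) sub-expressions, which may themselves start with a structure character.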

	# Usage Tips
	- Simple structures work best (⿰ ⿱ ⿴ etc.)
	- "Known radicals at seen positions" work best (釒 on the left works better than on the right, but may also surprise you in a good way)
	- Noto font family (sans and serif) gives the best results, as there are many training examples
	- Cursive and handwritten styles usually give good results, as they are more tolerant
	- Fonts supporting fewer chars are challenging
	- The current model was trained on 300k samples for only 20 epochs
	- Training will continue if this app gets attention or likes

	- For dictionary chars, [decompose](https://github.com/cburgmer/cjklib/blob/master/cjklib/data/characterdecomposition.csv) first.
	- For a part that is hard to describe, or that you don't care about, use the wildcard '？' (a full-width question mark; it is not clear whether the width matters)
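
As an illustration of the wildcard tip, the sketch below builds an input string from a structure character and a partial list of components, filling every unspecified slot with '？'. The helper `build_expression` is hypothetical and exists only for this example. How the app itself treats the wildcard is not shown here.

```python
WILDCARD = "？"  # U+FF1F, full-width question mark

# Number of components each structure character takes.
ARITY = {
    "⿰": 2, "⿱": 2, "⿴": 2, "⿵": 2, "⿶": 2, "⿷": 2,
    "⿸": 2, "⿹": 2, "⿺": 2, "⿻": 2,
    "⿲": 3, "⿳": 3,
}


def build_expression(structure, components):
    """Join a structure and its components in prefix order.

    Missing or empty components are replaced by the wildcard.
    A component may itself be an expression built by this function.
    """
    arity = ARITY[structure]
    if len(components) > arity:
        raise ValueError(f"{structure} takes {arity} components, got {len(components)}")
    filled = [c or WILDCARD for c in components]
    filled += [WILDCARD] * (arity - len(filled))
    return structure + "".join(filled)


# 釒 on the left, anything on the right.
print(build_expression("⿰", ["釒"]))
# 艹 on top; below it, 氵 on the left and anything on the right.
print(build_expression("⿱", ["艹", build_expression("⿰", ["氵", None])]))
```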

	- What to do when the results are not as expected
	  - Pick a different 'Style', one the model may have learned better
	  - Try again with a different random seed. This will change the overall structure in an unpredictable way
	  - Try again with a different 'step' number. This will change the local details in a continuous way

	# Creative Uses
	## Turning a bug into a feature
	When you see a funny result you didn't expect (5 or 3 dots where there should be 4), don't throw it away immediately.
	- Save the results to confuse/train OCR
	- 3vade 3vil c3nsorship
	- Share it in the discussion. The same input text, seed, and step will reliably reproduce the result.

	# Future Features
	- Typewriter keyboard for hard-to-input radicals, filtered by pinyin prefix
	- Direct generation from a single char, auto decomposition