1. 59
  1.  

  2. 11

    Very interesting and solid work, keep it up.

    I think Terry was more right than most of us would dare to admit. I’m playing with the thought of setting up a fund to build a statue in his honor!

    I saw your Toki Pona project on your GitHub page and it caught my interest. Perhaps a bit off topic, but would you mind telling us more?

    1. 10

      So I’ve gone through a few rounds of this. The first one is my chatbot ilo Kesi. It’s pretty poorly documented, but I gave a talk that covered a lot of the ideas (and lost everyone in the crowd). Let me paste from the blogpost about ilo Kesi that I’ve been putting off writing for lack of interest:

      Thinking Different

      Recently I’ve been looking into constructed languages. Constructed languages, or conlangs, allow people to communicate in fundamentally different or interesting ways. Some are based on predicate logic, another is a legacy-free version of Latin, and others exist wherever their creators want to explore ideas around language. A lot of the core of this idea is to see language not as something we are subjected to, but as an object we can manipulate and bend into a better tool.

      Toki Pona (https://tokipona.org, https://tokipona.net) is a good starter language for learning constructed languages. It’s very simple: a sentence (or toki) is made up of a few parts that are never grammatically vague for any correct Toki Pona toki.

      So, we can tokenize the sentence “mi olin e sina” (I love you) with some code and get back the following:

      []tokiponatokens.Sentence{
          {
              {
                  Type:   "subject",
                  Sep:    (*string)(nil),
                  Tokens: {"mi", "olin"},
                  Parts:  nil,
              },
              {
                  Type:   "objectMarker",
                  Sep:    &"e",
                  Tokens: {"sina"},
                  Parts:  nil,
              },
              {
                  Type:   "punctuation",
                  Sep:    (*string)(nil),
                  Tokens: {"period"},
                  Parts:  nil,
              },
          },
      }
      

      Which gets us most of the way to the structure this sentence actually represents in pseudo-xultabangu (a metalanguage format I’m experimenting with):

      (toki (subject (pronoun speaker)) (verb olin) (object (pronoun listener)))

      With something like this, we can easily pick out what the user is asking a bot to do from these grammatical features:

      type Request struct {
      	Address []string
      	Action  string
      	Subject string
      	Punct   string
      	Author  string
      
      	Input tokiponatokens.Sentence
      }
      
      func parseRequest(authorID string, inp tokiponatokens.Sentence) (*Request, error) {
      	var result Request
      	result.Author = authorID
      	result.Input = inp
      
      	for _, part := range inp {
      		switch part.Type {
      		case tokiponatokens.PartAddress:
      			for i, pt := range part.Parts {
      				if i == 0 {
      					result.Address = append(result.Address, pt.Tokens[0])
      					continue
      				}
      
      				result.Address = append(result.Address, strings.Title(strings.Join(pt.Tokens, "")))
      			}
      		case tokiponatokens.PartSubject:
      			if len(part.Tokens) == 0 {
      				sub := strings.Title(strings.Join(part.Parts[1].Tokens, ""))
      				result.Subject = sub
      			} else {
      				sub := strings.Join(part.Tokens, " ")
      				result.Subject = sub
      			}
      		case tokiponatokens.PartObjectMarker:
      			act := strings.Join(part.Tokens, ",")
      
      			switch act {
      			case actionWhat, actionMarkov:
      			default:
      				return nil, ErrUnknownAction
      			}
      
      			result.Action = act
      		case tokiponatokens.PartPunctuation:
      			result.Punct = part.Tokens[0]
      		}
      	}
      
      	return &result, nil
      }
      

      Which then lets us resolve sentences like “ilo Kesi o, tenpo ni li seme?” into the user requesting the current time, among other things. This allows for the most beautiful switch statements to implement the logic for handling parsed sentences. It’s worth noting that I am using the Toki Pona command form of sentences in order to give this chatbot a command.
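
      To make that concrete, a dispatch over parsed requests might be sketched like this (the action strings and replies here are made-up placeholders, not ilo Kesi’s real command set):

      ```go
      package main

      import "fmt"

      // Request mirrors the parser's struct above; only the fields used here.
      type Request struct {
      	Address []string
      	Action  string
      	Subject string
      }

      // handle dispatches a parsed Request. The action names ("seme",
      // "sitelen") are hypothetical stand-ins for illustration.
      func handle(req Request) string {
      	switch req.Action {
      	case "seme": // "what?" -- answer a question about the subject
      		return "answering a question about " + req.Subject
      	case "sitelen": // generate text, e.g. via a markov chain
      		return "generating text"
      	default:
      		return "mi sona ala" // "I don't know"
      	}
      }

      func main() {
      	fmt.Println(handle(Request{Action: "seme", Subject: "tenpo ni"}))
      }
      ```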

      In the future I want to enhance this into a more comprehensive system as the basis for a knowledge engine. I’d use such a thing for project “shitty Alexa”, or my stab at implementing the features of a personal assistant I’d actually use.


      The second round has been la baujmi (lit: proper-name-of is-a-language+understands), which does a more literal translation of the Toki Pona AST into Prolog, à la:

      cadey:staff@om ~/g/s/g/X/x/c/la-baujmi master ✚4 ./rw
      $ ./la-baujmi
      |toki: mi li sona e toki pona
      2019/05/30 10:01:54 registering fact: bridi(verb(sona), subject(mi), object(toki_pona)).
      |toki: mi li sona e seme?
      2019/05/30 10:02:01 Query: bridi(verb(sona), subject(mi), object(A)).
      2019/05/30 10:02:01 found bridi(verb(sona), subject(mi), object(toki_pona)).
      

      or:

      |speech: i understand toki pona
      2019/05/30 10:01:54 registering fact: bridi(verb(sona), subject(mi), object(toki_pona)).
      |speech: i understand what?
      2019/05/30 10:02:01 Query: bridi(verb(sona), subject(mi), object(A)).
      2019/05/30 10:02:01 found bridi(verb(sona), subject(mi), object(toki_pona)).
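
      The fact-registration step shown in those logs could be sketched as simple string formatting (a hypothetical reduction; the real la baujmi derives these terms from the parsed sentence):

      ```go
      package main

      import (
      	"fmt"
      	"strings"
      )

      // bridiFact formats a (verb, subject, object) triple as a Prolog fact
      // in the shape the logs above show. A sketch of the idea only.
      func bridiFact(verb, subject, object string) string {
      	// multi-word arguments like "toki pona" become atoms like toki_pona
      	atom := func(s string) string { return strings.ReplaceAll(s, " ", "_") }
      	return fmt.Sprintf("bridi(verb(%s), subject(%s), object(%s)).",
      		atom(verb), atom(subject), atom(object))
      }

      func main() {
      	fmt.Println(bridiFact("sona", "mi", "toki pona"))
      	// bridi(verb(sona), subject(mi), object(toki_pona)).
      }
      ```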
      

      The next big steps would be to support the rest of the language (see here); but moving internationally, and wanting to do this more directly in Lojban, is a good part of the reason why this is still so primitive.

      Feel free to ask me anything about this; this kind of stuff is super fascinating to me.

      1. 3

        Unfortunately the cable to my tablet’s speaker is cut, so I can’t listen to your talk at the moment, but I’d sure like to.

        I take it that you are familiar with the Sapir-Whorf hypothesis; from that perspective, constructed languages become endlessly interesting.

        Your work seems really interesting; I’ll give it a shot for sure whenever I find the time. The version that translates into Prolog is even more interesting.

        Recently I found another eye-opener in linguistics that is somewhat related to Toki Pona, namely linguistic primes. The set of linguistic primes is the set of words that can be used to describe all other words in the language. So, for example, the words used to define each entry in a dictionary can and should be a limited set. In fact, the number of words that are primes in English turns out to be surprisingly small.
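
        That idea can be sketched as a check that a definition only uses words from a prime set (the word list below is a toy stand-in, not the actual list of semantic primes, which has around 65 carefully chosen entries):

        ```go
        package main

        import (
        	"fmt"
        	"strings"
        )

        // primes is a toy subset standing in for the real prime list.
        var primes = map[string]bool{
        	"i": true, "you": true, "someone": true, "something": true,
        	"good": true, "bad": true, "do": true, "happen": true,
        	"want": true, "not": true, "big": true, "small": true,
        }

        // definedByPrimes reports whether every word of a definition comes
        // from the prime set.
        func definedByPrimes(def string) bool {
        	for _, w := range strings.Fields(strings.ToLower(def)) {
        		if !primes[w] {
        			return false
        		}
        	}
        	return true
        }

        func main() {
        	fmt.Println(definedByPrimes("someone want something good")) // true
        	fmt.Println(definedByPrimes("a quadratic polynomial"))      // false
        }
        ```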

        I do happen to work on language analysis, but currently from a symbol-resolution angle. I find that too strict a grammar and/or definition in a language can be too restrictive. Doing it from the ground up, perhaps even sub-symbolically, would be useful. For example, a given model might not support different dialects of the language, or a play on words.

        So my interest in the field comes from my work on input methods. As I said, I started out on a symbol level. This is also due to my lacking knowledge of how to parse grammar. I’d love to use Toki Pona as a testbed language. It would be a good experiment to see how the rest of my ideas would work in conjunction with a higher-level understanding of the text.

        Oh, and it’s so nice to find someone else into this rather obscure subject. I really do think that there could be enormous philosophical implications drawn from breakthroughs in this field! It’d be cool if we could work toward that in some way!

        See http://tbf-rnd.life for more on what I am doing / obsessing over.

        1. 1

          I take it that you are familiar with the Sapir-Whorf hypothesis; from that perspective, constructed languages become endlessly interesting.

          Yep, I’ve found that I don’t seem to think in language most of the time.

          Your work seems really interesting; I’ll give it a shot for sure whenever I find the time. The version that translates into Prolog is even more interesting.

          Thanks! It’s kind of sad when few people can really understand it, so I can’t talk about it very easily.

          Recently I found another eye-opener in linguistics that is somewhat related to Toki Pona, namely linguistic primes.

          Yep, I have a project or two in perma-backlog for creating a somewhat usable language out of the list of semantic primes. I mostly want to try to resolve other languages’ words to their semantic primes, to be able to machine-translate things more easily. Sentences, adjectives, and the like would really just be statements of relation (logical or non-logical). I would also like to use this for a story that’s also in perma-backlog.

          For example, a given model might not support different dialects of the language, or a play on words.

          Lojban gets around this with pe’a (linking to la sutysisku because jbovlaste seems to be down due to power maintenance issues) and by assuming everything is in the default literal tone; however, people still write metaphorical statements in that literal tone!

          I’d love to use Toki Pona as a testbed language.

          I’ve found it’s good for this, but also bad for it, because few people actually use it in a perfectly grammatically correct way, and the ways people end up using it can sometimes amount to an almost English-y syntax skin.

          It would be a good experiment to see how the rest of my ideas would work in conjunction with higher level understanding of the text.

          The big thing is to make sure you parse out mi <predicate> and sina <predicate> apart from mi <descriptor> and sina <descriptor> (though an argument could be made that descriptors in Toki Pona can be used as predicates).
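
          A rough sketch of that distinction (a heuristic of my own, not an actual parser): treat the word after mi/sina as a predicate when an e object marker follows, and flag it ambiguous otherwise:

          ```go
          package main

          import (
          	"fmt"
          	"strings"
          )

          // classify is a heuristic only: Toki Pona drops the "li" particle
          // after "mi"/"sina", so "mi moku" could mean "I eat" (predicate)
          // or "I am food" (descriptor). An "e" object marker later in the
          // sentence forces the predicate reading.
          func classify(sentence string) string {
          	words := strings.Fields(sentence)
          	if len(words) < 2 || (words[0] != "mi" && words[0] != "sina") {
          		return "unknown"
          	}
          	for _, w := range words[1:] {
          		if w == "e" {
          			return "predicate"
          		}
          	}
          	return "ambiguous"
          }

          func main() {
          	fmt.Println(classify("mi moku e kili")) // predicate: "I eat fruit"
          	fmt.Println(classify("mi moku"))        // ambiguous
          }
          ```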

          Oh and it’s so nice to find someone else with this rather obscure subject.

          I know, right? It’s so hard to find anyone who has more than a passing interest in this. It’s especially hard to find people that are good with computers too. I’d love to write a paper at some point.

          1. 2

            I mostly want to try to resolve other languages’ words to their semantic primes, to be able to machine-translate things more easily. Sentences, adjectives, and the like would really just be statements of relation (logical or non-logical). I would also like to use this for a story that’s also in perma-backlog.

            My initial response as well! I have another idea: I’d like my input engine to be 1:n in input/output, i.e. you could have one input signal and multiple outputs. I’d also like to see the prediction that is done at the character or word level juxtaposed onto the sentence level, or, where it really gets sexy, at the paragraph level or higher.

            One big argument for me doing all of this on a statistical word-to-word basis is that I simply don’t have the knowledge when it comes to grammar theory. There are advantages to both, but in the end I think there’s a dualism to it. Probably a synthesis of a formal rule-based approach and a statistical method is the way to go.

            I’ll have a lot of catching up to do. Do you have any good resources to get started on Toki Pona?

            1. 2

              Honestly pu is your best bet for learning Toki Pona. 12 days of sona pi toki pona is also good (and how I got my start).

              1. 2

                I’ll have a look as soon as I find the time, which might be a while.

    2. 5

      From your article

      1. Loop through the vocabulary list and count the number of words in it (by the number of word boundaries).
      2. Allocate an integer array big enough for all of the words.
      3. Loop through the vocabulary list again and add each of these words to the words array.

      Since the vocabulary list is pretty safely not going to change at this point, we can omit the first step:
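
      For what it’s worth, the quoted three steps are a classic two-pass count-then-allocate-then-fill pattern; a rough Go analogue (my own sketch, not the HolyC original) might look like:

      ```go
      package main

      import (
      	"fmt"
      	"unicode"
      )

      // loadWords mirrors the quoted two-pass pattern: count word boundaries
      // first, size the array exactly, then fill it. In Go the counting pass
      // is redundant (append can grow a slice), but it is kept here to mirror
      // the C-style approach the article describes.
      func loadWords(vocab string) []string {
      	// pass 1: count words by counting word boundaries
      	n := 0
      	inWord := false
      	for _, r := range vocab {
      		if unicode.IsSpace(r) {
      			inWord = false
      		} else if !inWord {
      			inWord = true
      			n++
      		}
      	}
      	// pass 2: allocate exactly n slots, then fill them
      	words := make([]string, 0, n)
      	start := -1
      	for i, r := range vocab {
      		if unicode.IsSpace(r) {
      			if start >= 0 {
      				words = append(words, vocab[start:i])
      				start = -1
      			}
      		} else if start < 0 {
      			start = i
      		}
      	}
      	if start >= 0 {
      		words = append(words, vocab[start:])
      	}
      	return words
      }

      func main() {
      	fmt.Println(loadWords("In the beginning God created"))
      }
      ```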

      There’s a history of scribes counting words when transcribing the Bible to make sure they didn’t miss or add a word in the process. I obviously don’t know, but it’s possible Terry was following in that tradition.

      1. 5

        Nah, that’s simply how you do things in C. A linked list is tedious and error-prone to implement, and also has some memory overhead.

        Interesting point about counting words, though; poor man’s hash function.

        1. 4

          I know it’s not profound but this correspondence is just beautiful.

          1. 1

            … and inspiring

          2. 1

            poor man’s hash function

            — or “rich man’s”, given the time expenditure involved.

            1. 1

              :D

        2. 5

          @cadey - Another super cool article! What made you choose Zig for your target language for the port?

          1. 5

            Zig isn’t super painful to work with this close to the “metal”. I tried Rust for a while, but its memory management model is really different from how I think about code at a low level. Zig is everything I’ve ever wanted in a compiler. I just wish it was 1.0 already lol.

            1. 4

              Did they tell you to try using data-oriented design to reduce headaches from the borrow checker? That helps quite a few folks.

              1. 3

                That is the first time I’ve heard that, but honestly I’m used to thinking of memory as disposable.

                1. 5

                  In that case, there’s even more for you to learn. Going from automatic to manual memory management is always a learning curve. As far as data-oriented design and Rust go, I found a comment with a link to an example of that. zmitchell said it’s like ECS. Sharing them in case they help with the borrow checker, or with things completely unrelated to Rust.

                  1. 3

                    The thing is, sometimes it’s good to decide what you need to learn, and when.

                    Not everyone is interested in becoming a bare-metal coder (I don’t know @cadey very well and certainly can’t speak for eir preferences and goals), and while I totally agree that everyone should learn about memory management at some point in their coding careers, there’s a lot to be said for using what you know and achieving mastery of that body of knowledge before embarking on a whole new journey.

                    I’ve said it before, but this is a mistake I made: I spent years diving from shiny thing to shiny thing, never doubling down and actually building with what I already knew.

                    This article would seem to represent some of that, so if Zig works for @cadey I say good for them :)

                    1. 2

                      That someone tried one bare-metal language and is trying another shows they have an interest. I threw out the same tip I give to everyone who has trouble with the borrow checker. cadey is currently enjoying a shiny new thing called Zig. I like both the Rust and Zig projects as attempts to improve on language design. So, I’m just being helpful as usual, with no expectations for what folks do with it. They gonna be them, I’m gonna be me. :)

                      1. 3

                        I’m super grateful for the advice. I’ve wanted to use Rust, but I haven’t really seen the light yet.

                        1. 3

                          Another one to check out is D. It’s a C++ alternative that has a GC, supports not using one, and compiles super fast. It gets little attention, but there are some fans here.

                          1. 2

                            I second investigating D, especially if you already know and use C.

                            A subset of D is interoperable with existing C projects and can be used as a better C, so it’s something you can ease into.

                            The language has multiple implementations and wide platform support.

                        2. 2

                          You’re quite right. @cadey expressed a specific problem and you offered help for that problem. I need to stop beating this dead horse :)

            2. 4

              I’d love to see TempleOS runnable in the browser. Without thinking I went ahead and tried v86, an in-browser virtualization codebase.

              So close yet so far.

              1. 3

                I know, right?

              2. 4

                “Once the system boots, god gets initialized with the contents of every word in the King James Bible. It loads the words something like this:

                1. Loop through the vocabulary list and count the number of words in it (by the number of word boundaries).
                2. Allocate an integer array big enough for all of the words.
                3. Loop through the vocabulary list again and add each of these words to the words array.”

                – Weirdly reminded me of pushcx’s https://www.wellsortedversion.com/

                1. 1

                  I love this, I might order a printed version.

                2. 4

                  Nice blog, great topics so far, keep it going dude.

                  Have to look into this Zig shit now god damn … :)

                  1. 7

                    keep it going dude.

                    Author not a dude.

                    1. 12

                      I am however a minister of the Church of the Latter-Day Dude, my dude. I also have been hit by the so-called “California dude” grammar change that removes gender (or personhood) implications from the word dude. So I’m a dude, he’s a dude, she’s a dude, and we’re all dudes, yeah!

                      No offense was taken on my end :)

                      1. 6

                        I also have been hit by the so-called “California dude” grammar change that removes gender (or personhood) implications from the word dude.

                        I see, nobody informed me of this. Thanks dude!

                        1. 5

                          It’s great. I end up calling my code “dude”.

                          1. 10

                            “The Dude compiles.”

                            1. 6

                              Sometimes when getting weird compiler errors, my response is:

                              “Yeah, well, you know, that’s just, like, your opinion, man.”

                        2. 3

                          Here (Belgrade) everybody does it with a form of “bro” instead of “dude”, with the exact same semantics and frequency. One other interesting thing is that you absolutely need to have a bunch of dudes/bros in a single line, like “cmon bro, finish the freaking test bro, already bro”. This is very addictive and, for example, my friend’s gf once said to him that she knew he was with me because there was a “bro” between each word :)

                          This is used even for females (“Christine bro, sup”) and for little kids and objects too. It’s sometimes used for this simulation as well.

                          I guess we need an abstract dude at this point.

                    2. 2

                      Thank you very much for this series.

                      1. 8

                        No problem. I’m working on some of his longer videos too; they get hard to understand at times. Doing this kind of stuff really gives you a view into mental health and how vital it is for it to be taken care of. I’m almost an advocate of “no material cost to patient for any reason” healthcare at this point.

                      2. 2

                        I like the article and the series. However, I find your website theme hard to read, contrast- and colour-wise. Here is a plain-text mirror of the article without formatting: https://gopher.floodgap.com/gopher/gw?gopher://txtn.ws:70/0/lobsters/20190531/christine.website/20190530T1038%5fTempleOS%5f%5f2%5f%5f%5fgod%5f%5fthe%5fRandom%5fNumber%5fGenerator.txt

                        1. 2

                          I have been meaning to make a native gopher server for christine.website for a while. How would you suggest I go about doing that?

                          Also I’m sorry about the theme issues. Would a lighter theme option of some kind help?

                          1. 2

                            A native gopherhole would be cool. I do like your content. Theming is not something I would burden you with, since it’s my issue. I can just turn off CSS in Firefox and have the plain text, or use reader mode.

                            The simplest gopherhole could just be a bunch of text files in /var/gopher with Pygopherd installed. Fancier setups could generate a gophermap with some nice text and links.
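
                            For reference, a minimal gophermap is just lines of a one-character type code plus display text, then tab-separated selector, host, and port; the selectors and hostname below are made up:

                            ```
                            iWelcome to my gopherhole	fake	(NULL)	0
                            0First post	/posts/first.txt	example.com	70
                            1Post archive	/archive	example.com	70
                            ```

                            Here i is an informational line, 0 a plain text file, and 1 a submenu.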

                            1. 1

                              If you’re still generating from markdown, one way to do it is to stick the markdown files in the gopher hosted directory, generate a gophermap every time you add a new post, and host with something like pygopherd.

                              A simple shell script like ls -ltr --time-style=full-iso blog/*.markdown | while read x ; do set -- $x; echo "1$6 $(head -n 1 $9 | sed 's/^# *//;s/\t.*$//')\t$9"; done can produce a gophermap index for your blog directory, assuming that the title of the blog post is the first line of each markdown file & you index by edit time rather than creation time.

                              1. 1
                                1. 1

                                  Right. Some folks prefer viewing raw markdown over gopher if the formatting matters semantically (which might be the case here, since you’ve got a lot of code blocks), & so that’s what my script above is for: creating a TOC for your blog posts, to be served as markdown.

                                  Alternatively, the formatting could be stripped or selectively stripped, & you can emit plaintext or a gophermap. This wouldn’t necessarily change the TOC substantially (emit ‘0’ at the beginning of each line instead of ‘1’ to serve the entries as plain text files). One way of producing text is to render the markdown as HTML and then render the HTML through w3m or lynx, but you’re liable to have non-functional headers and footers (since links will disappear).

                                  You can also just host html in a gopherhole (gophermap type code ‘h’), but that sort of defeats the point: we can strip formatting from your website just by visiting it in lynx, after all, & not all gopher clients have html support because ‘h’ is not a completely standard code.

                                  The script sed -E 's/^/i/;s/\[([^]]*)\]\(([^)]*)\)/\nh\1\tURL:\2\ni/g' (GNU sed, for the \n in the replacement) will take your markdown and emit a gophermap where everything except links is displayed as markdown but links are navigable (assuming the client supports both the ‘h’ typecode and the ‘URL:’ extension, which lynx does).

                                  There are some good tutorials floating around on how to create gophermaps. They’re a lot easier than generating HTML, particularly for indexes.