Threads for dvaun

  1. 2

    If anyone has the time to answer, I’d love to hear your feedback/anecdotes about working on interesting ideas. esp. if it was a challenge to get started.

    What I want to do is make an accessible interface for California (USA) State Education data. I work with county-level data at my workplace and would like to have a tool for easy comparison of historical information against the data we house.

    The problem is…I’m not sure when what I have is “good enough”. Would starting with a small search-interface be alright to publish? I’m wondering if I’m over-thinking this and should just put up something bare-bones.

    The data is small so that’s not a problem. I guess that what I’m unsure about is how much work should I put into the interface before sharing it?

    How would any of you begin your hobby/interest programs?

    1. 2

      Make something that is useful to you. Share it with people you know who have similar interests and might find it useful as well. Listen to what they say. If you then want to keep working on it, decide what is in and (more importantly) not in your scope, and work some more. Then gently advertise in communities that are appropriate and full of people who would find such a tool useful and listen more.

    1. 3

      Hm, this relates more to the title than to the post content, but every few years Mike Stonebraker invites a group of people to discuss the future of databases, and they publish a report afterwards. The Seattle Report on Database Research is, I believe, the latest.

      1. 2

        The first reference on the ACM page points to a broken URL which did not contain the full report. A quick search gives the following URL which allows you to access the article without a requirement for ACM membership or payment: http://seattle-dbresearch-assessment.cs.washington.edu/

      1. 3

        I think this would be fascinating, and I’d be especially curious how much of an improvement it actually is on discovering new or relatively unknown authors.

        This isn’t built on book texts as far as I know, but to the broader point of finding new authors, I have played around with https://www.literature-map.com/ in the past with decent results, in case you haven’t come across it yet.

        I think to do what you are suggesting, though, you could go via the public domain as @dvaun suggested (but then you have a lack-of-content problem, making it less useful), or get indexing agreements with publishers (which can be difficult).

        We kind of do this for scientific articles at my current job, and beyond the challenges of getting more indexing agreements to make it useful, there’s a non-trivial infrastructure cost for storing & processing (and potentially re-processing) all of that content.

        Would be neat if there was someone with a boat load of money willing to fund this! :)

        1. 3

          The https://www.literature-map.com/ resource and other Gnod tools have proven useful to me in the past. Though, the mapping of authors (or music, movies, and other Gnod projects) is quite broad in scope.

          What would be neat to utilize is an engine that would work well in topic exploration within various subjects, including fiction categories (e.g. thrillers, fantasy, historical fiction) as well as fields of research.

          Having a resource like that on-hand, paired with a community of folk interested in putting together curricula for entering and diving into new domains would be awesome. I’d imagine a resource that had the ability to analyze articles and other sources (e.g. Wikipedia entries) in addition to books, in conjunction with an ability to put together these “curriculums” or “paths” would be great for any auto-didacts exploring a new field.

          I am out of my field when discussing this, as I don’t have any ML experience. If there were a project that worked on this, though, I’d be happy to contribute in other ways.

          Would something like this need to be monetized to be sustainable?

          1. 2

            There is the cost of hosting. However, text is cheap. I can see a tool suite that works as follows:

            1. A command line tool that, given plain text, spits out a feature vector.
            2. The feature vectors are submitted to a central repository and collected into a single file of feature vectors; this could live on GitHub.
            3. You check out or download the feature vector file, run another command line tool (or a simple GUI or SPA) on it, and mark your favorite books; the tool then proposes an ordered list of books you might like based on their content.
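
            A minimal sketch of the three steps above, assuming a hashed bag-of-words as the feature vector and cosine similarity for the ranking. Neither choice comes from this thread; the function names (`feature_vector`, `recommend`) and the vector length are hypothetical, and a real tool would likely want better features than raw word counts.

```python
import json
import math
import re
import zlib
from collections import Counter

DIM = 1024  # assumed vector length; any fixed size works


def feature_vector(text, dim=DIM):
    """Step 1: hash word counts into a fixed-length, unit-length vector."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    vec = [0.0] * dim
    for word, n in counts.items():
        # crc32 is stable across runs, unlike Python's built-in hash()
        vec[zlib.crc32(word.encode("utf-8")) % dim] += n
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(a, b):
    """Dot product; equals cosine similarity because vectors are unit length."""
    return sum(x * y for x, y in zip(a, b))


def recommend(vectors, favorites):
    """Step 3: rank non-favorite titles by mean similarity to the favorites.

    vectors maps title -> feature vector (the shared file from step 2).
    """
    scores = {}
    for title, vec in vectors.items():
        if title in favorites:
            continue
        sims = [cosine(vec, vectors[f]) for f in favorites]
        scores[title] = sum(sims) / len(sims)
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Toy stand-ins for full book texts.
    texts = {
        "Book A": "dragon wizard castle sword dragon",
        "Book B": "wizard dragon sword quest castle",
        "Book C": "stock market interest rates inflation",
    }
    vectors = {title: feature_vector(body) for title, body in texts.items()}
    # Step 2: the collected vectors could be shared as a plain JSON file.
    shared = json.loads(json.dumps(vectors))
    print(recommend(shared, ["Book A"]))
```

            Only the vectors are shared, never the text itself, which is the property that makes the central file cheap to host.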

            I really like the idea of trying this out on copyright-lapsed books hosted on Project Gutenberg, but surely someone must have done this already.

        1. 2

          I’m looking for a site/program that uses the actual text of books to match up your interests directly with books, rather than other people’s opinions of books.

          How would one go about analyzing text from recently published books and books that haven’t entered the public domain? I can think of a few sources for retrieving digital texts…but those would be illicit and not permissible for use like this—I would think it would apply whether the project was for-profit or not.

          That being said, it would be neat to use books in the public domain from sites like Project Gutenberg.

          1. 1

            I would imagine a tie-up between publishing houses and the site. The aggregator would analyze the full text of each book while keeping that text confidential through algorithmic or legal means.

            As a POC the site would use publicly available text.

            Another way is to have a program that takes the copyrighted text you have access to and generates a feature vector from it. This feature vector can then be uploaded to the site. I don’t know if this legally constitutes a derived work, but from a common-sense view I can’t reconstruct the original text from it, so it should be OK and not a copyright violation, because it’s not a reproduction.

          1. 3

            Love It! I was just looking for something like this. Does anyone know of any other single file CSS frameworks that make simple sites look good with minimal work?

            1. 5

              Yeah PicoCSS and SpectreCSS are also really nice. I personally really struggle with CSS myself and would love to see something minimal with just the right stuff to get started building “native”(ish) looking apps too. Not sure if possible? 🤔

              1. 3

                Thanks for the links! I found https://terminalcss.xyz/ which I’ve started using for my tilde server. I love that I can use these simple CSS setups in combination with a Markdown parser and UNIX tools to make a simple templated site without a bunch of kitchen-sink frameworks and CLIs!

              2. 2

                Sakura is a minimal CSS theme that has been around for a while. It can be used as a drop-in.

                1. 4

                  Starting my first newsletter! I started blogging and link curation/commentary as one of my pandemic hobbies, and now I’m obsessed. I tend to send to various different subgroups of my social circle, but I figured now was as good a time as any to consolidate all of that stuff to my “greatest hits” and start sending them out on a weekly basis. Plus, it’ll hold me accountable to continue to blog and learn in public :)

                  1. 4

                    Plus, it’ll hold me accountable to continue to blog and learn in public :)

                    Wow. Great article. I hadn’t framed/considered blogging with this perspective in mind.

                    Many times, in fact, I have come across comments that deride many of the beginner-tutorials that are repeatedly created and shared, such as on Medium. That led me to believe that I should only create blog posts when I feel that I have sufficient knowledge on the topic.

                    Reading this, though, I guess it makes sense that it’s okay to share what you learned because the content may be new to others who read it.

                    Thanks for posting this! This gives me confidence in publicizing my writing :) and now I know at least one project that I’ll want to work on this weekend…

                    1. 1

                      I’m glad you liked it! I’ve been noodling around with the concept of growth in the public eye as a good tool for accountability, and when I read that article it definitely made things click for me :)

                      Yeah, I’m not really a fan of dunking on people for posting beginner topics/tutorials – I understand that folks don’t want a quagmire of introductory information that can occlude search engine results and potentially offer conflicting best practices (hell, even the whole concept of “best practice” implies some sort of solved state for teaching and learning, which is a concept I don’t particularly agree with), but I’ve never been a fan of the gatekeeping that tends to accompany that desire. Of course, there’s a relevant xkcd about that sentiment, too.

                      I’m happy to hear that you enjoyed this and that it helped you feel more confident about publicizing your writing! That confidence is something I work on for myself every day, and I’m happy to inspire it in others. Hopefully your project goes well!

                    2. 2

                      What a great read about learning in public! This is something I am now learning to do as well; I started off with the little step of making my Instagram profile non-private so I can share more photos of my journey.

                      I see you have a ton of great content about learning in public, which is awesome. My question to you is at what point do you package up what you have learnt to share with others? I’m a software dev, and forming a habit of writing up what I have done seems like it might pull me out of flow if I take notes while I’m working to share later. And after coding, I’m usually too exhausted to write something decent. That’s the little quagmire I find myself in at the moment: I wish to share, I just don’t have the method. Any tips?

                      1. 2

                        I’m glad you liked the article! As I mentioned in another comment, I’ve been trying to wrap my head around the value of sharing “rough clay” and that article definitely helped solidify some of the ideas I’d been chewing on.

                        My question to you is at what point do you package up what you have learnt to share with others?

                        That’s a great question. To be honest, it’s a question I’ve been iterating on for a while and I’m still trying to find out what works best for me. I’m happy to share what my current practice is, though!

                        TL;DR: I take rough notes on my approach before coding it, then I take a break when I’m done coding to clear my head and not force myself to write when I’m exhausted. Later (either before bed or earlier in the following day), I use those notes as a way to re-contextualize myself with my solution and flesh them out if I feel inclined to write more about it.

                        Longer answer: Like you, I’m a software developer and I resonate with the sentiment of feeling too cooked to write anything else after coding for a while. While I occasionally find success in taking notes on what I’m doing as I’m working, I (again, like you) find that it interrupts my flow and it’s only something I do when I’m noodling on a problem, rather than actually working through the implementation.

                        The most successful approach for me, however, has been to write my ideas down before I start coding them (kinda like a scientific notebook; I’ll write a rough hypothesis of what I’m about to attempt to implement, and then my implementation almost serves as my experiment), and then when I’m done with my implementation, I come back and fill in the details later if I feel like what I just did is worth fleshing out in more detail. These initial ideas are intentionally pretty sparse; they’re more like a memory hook that I can come back to as I’m working through my implementation, but they’re helpful as a guide when re-reading my notes later.

                        Here’s some notes from my work journal about a task I did yesterday as an example:

                        Okay, it looks like all of this logic is in ParseProto.scala; 
                        since ParseProto is what inherits the definitions of the 
                        protobuf descriptors and actually composes 
                        all of our algebras together.  
                        
                        The problem is that right now I’m not sure how to 
                        visualize what a protobuf file representation looks like; 
                        I think once I can do that I’ll have a better idea 
                        of where I’m going wrong currently
                        
                        Hell yeah I cracked it.  The tricky part was having to rely 
                        on Google’s `file.getTypeName` call to the protobuf descriptor 
                        but once I figured that out the implementation wasn’t hard
                        

                        Yeah, super rough clay, but the key is that when I read these notes later, it’s much easier to re-contextualize where my head was when I was working through this problem, and so now that I’ve taken a break from coding and don’t feel so exhausted, I’m more inclined to be able to write these ideas into a more detailed explanation of what I just did.

                        Anyway, sorry for the long response, but I hope it’s helpful! Even if my approach doesn’t work for you, hopefully having more data on the process is generally useful.

                        1. 2

                          Thanks so much for such a detailed reply. Writing down the approach beforehand makes a lot of sense; at the very least it offers a basis for comparison if one ends up using a different approach. (You can easily pit the two against each other and write up the rationale for changing.) That’s super helpful and I think I’m going to give that a shot!

                          Happy writing, and I will check your blog often to see how this is going.

                    1. 3

                      I’ve picked up a few blogs over time from reading on HN, Reddit, and other places.

                      1. 2

                        Nice to see me featured in your list! Great to hear people enjoy reading my content. 😌

                        1. 2

                          You’re welcome!

                          If it interests you, many of the articles that I especially enjoy are on the topic of your technical reasoning behind choosing various services or your development process. I learned about BunnyCDN through one of those articles and now use it for a few projects of my own :)

                          It’s good content :) I’m happy to share it.

                      1. 1

                        At work I am completing an import of some legacy data held in flat files into our updated system. We had fun digging around in PL/B (also known as Databus) to figure out the inner workings of this old system…now we’re close to retiring it!

                        At home I am continuing MVP work on a React-based application for a nonprofit. It’s a data-viz application that will allow users to search for information about local law enforcement agencies and their activities.