Despite everything, 2021 was reasonably good for me.
As for next year, well, I’ll still be recovering from the accident for most of January and will need physio for a while. So in my first year of a new job I’ll have had to take two months off! However, the company and team are great, and I look forward to continuing to grow there.
It is a good year! I am also starting to prep for interviewing, as I have stayed too long with my current employer. I failed to get a promo to staff level earlier in the year despite working my ass off. Felt depressed and burnt out after that debacle. But, in hindsight, I learned that I need to look beyond this company, as in “I have a business that provides my skills to whoever will pay me decently.”
So leetcode and system design I go!
Sorry to hear about your accident. I’ve been wanting to get an ebike as well. Which one did you get, and do you recommend it?
It’s made by a UK chain called Halfords. It’s a Carrera Impel im-3. From the brief amount of time I was able to use it, it’s nice. Comfortable to ride, the electric assist is great.
I scrolled down to the conclusion section hoping to learn in a few quick words why it is slow. But there wasn’t one :/ Any TL;DR, folks?
In this case, it is because of a known compile-time regression in Rust 1.57.0 which will be fixed in 1.58.0. Compared to the methods used to track it down, the answer itself is rather unenlightening.
TL;DR: There was a giant generic type that took forever for the compiler to handle. That type came from a web server library called “warp”, which is famous for its extreme use of generics and playing games with the type system. The fix was to use dynamic dispatch instead of monomorphization, which erases the type so the compiler doesn’t have to keep track of it. If that sounds hard, it’s not. It’s literally just one method, called “boxed”, chained onto the end of the normal type.
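The same trade-off shows up with plain std iterators, so here’s a self-contained sketch of the idea (this is not warp’s actual code or types, just the generic monomorphization-vs-type-erasure pattern the fix relies on):

```rust
fn main() {
    // Monomorphized: each .map() layer wraps the previous type in another
    // generic struct, so the final concrete type is a deeply nested
    // Map<Map<Map<Range<i32>, _>, _>, _>. Deep nesting like this is what
    // the compiler has to fully track, and what blows up compile times.
    let nested = (0..3).map(|x| x + 1).map(|x| x * 2).map(|x| x - 1);

    // Type-erased: boxing as Box<dyn Iterator> hides the nesting behind a
    // trait object, so downstream code (and the compiler) only sees the
    // trait, at the cost of dynamic dispatch per call.
    let boxed: Box<dyn Iterator<Item = i32>> =
        Box::new((0..3).map(|x| x + 1).map(|x| x * 2).map(|x| x - 1));

    let a: Vec<i32> = nested.collect();
    let b: Vec<i32> = boxed.collect();
    assert_eq!(a, b); // same values either way
    println!("{:?}", a);
}
```

Runtime behavior is identical; the only things that change are the name of the type the compiler has to juggle and a vtable lookup per call.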
Thanks for putting this together! I’m interested in doing some natural language processing but I’m not sure if it falls under the heading of sentiment analysis. I’d like to extract common themes from a bunch of text snippets, which is sort of like collocations, but with perhaps a little more fuzz. (i.e. a theme doesn’t have to be word-for-word identical) One example would be Yelp’s review highlights (“15 people said something like this…”).
Do you know what tools or techniques people tend to use for this?
If you are looking at reviews (typically subjective text expressing opinions), this could fall into aspect based sentiment analysis. Getting the themes out of texts can be seen as a topic modeling task. Hope that helps :)
This looks quite useful, thanks!
The only real caveat I’d add is that people should be careful assuming off-the-shelf implementations will work for their problem, and make sure to look at the model assumptions, validate whether they’re accurate on your own domain, etc.
Two common pitfalls:
Many papers and systems are explicitly or implicitly targeted at specific kinds of texts and/or specific kinds of sentiments. The less you are like the ones they were built for, the bigger grain of salt to take their outputs with. A large proportion of research targets either mining positive/negative sentiment from product reviews, or mining positive/negative sentiments from news articles. Both of these are pretty structured domains, with relatively short texts and generally not a “literary” style of writing. There are lots of other things you might want to use sentiment analysis for; for example, I’ve seen people want to use it to graph the sentimental trajectories of novels. If you use a system designed for classifying product reviews on a novel, it’ll produce numbers and you can graph the result, but it may or may not show what you claim it does.
Be extra careful if using the extracted sentiment in further processing, since errors can magnify, and worse, are often biased errors. For example, if correlating sentiment with some other variable, you can’t assume (without some kind of validation) that even a 95%-accurate sentiment classifier won’t completely skew your data: if those 5% misclassifications are highly non-random, they can throw everything in the analysis off (and they often are… sentiment classifiers tend to fail on whole categories of sentences, rather than misclassifying sentences with uniformly random probability).
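To make the “biased 5%” point concrete, here’s a toy illustration with made-up numbers (not from any real classifier): a 95%-accurate classifier whose five errors all land in one small subgroup can completely erase that subgroup’s signal.

```rust
fn main() {
    // 100 texts: indices 0..10 are a "sarcastic" subgroup whose true
    // sentiment is uniformly -1; the other 90 alternate +1 / -1.
    let sarcastic: Vec<bool> = (0..100).map(|i| i < 10).collect();
    let truth: Vec<i32> = (0..100)
        .map(|i| if i < 10 { -1 } else if i % 2 == 0 { 1 } else { -1 })
        .collect();

    // Biased classifier: flips exactly 5 labels, all inside the sarcastic
    // subgroup. Overall accuracy is still 95/100 = 95%.
    let predicted: Vec<i32> = (0..100)
        .map(|i| if i < 5 { -truth[i] } else { truth[i] })
        .collect();

    let n = sarcastic.iter().filter(|&&s| s).count() as f64;
    let true_mean =
        (0..100).filter(|&i| sarcastic[i]).map(|i| truth[i]).sum::<i32>() as f64 / n;
    let measured_mean =
        (0..100).filter(|&i| sarcastic[i]).map(|i| predicted[i]).sum::<i32>() as f64 / n;

    // The subgroup is uniformly negative (mean -1.0), but the measured
    // mean is 0.0: the classifier is 95% accurate overall, yet any
    // downstream correlation involving this subgroup is now garbage.
    assert_eq!(true_mean, -1.0);
    assert_eq!(measured_mean, 0.0);
    println!("true = {true_mean}, measured = {measured_mean}");
}
```

If the same five errors were spread uniformly at random across all 100 texts instead, the subgroup means would barely move, which is exactly why non-random errors are the dangerous case.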
The short version of this is probably: don’t assume sentiment analysis is a solved problem where you can download something that just works (except in very specific cases).
Thanks for the remark! That’s important information that I left out. I’ve added a warning to the relevant section: https://github.com/xiamx/awesome-sentiment-analysis#open-source-implementations
This has been posted to Lobsters a couple of times before but your comment really reminds me of this Google paper talking about pitfalls of machine learning.
Machine Learning: The High Interest Credit Card of Technical Debt
Author here. We use this approach to write a large REST API running in production. It has helped us catch a few bugs early in the development stage. Would love to hear what you guys think.
I am disappointed at the title of this blog post, because it insufficiently captures the sheer scope and importance of the topic discussed within.
It is an excellent post. It packs a lot of content into a small word count with clear explanations.
Thank you for convincing me to read the article instead of just clicking through the comments! I agree, it’s an excellent post :)
Agree. Wasn’t expecting it to be that good. Updating my philosophy to give friction a better reputation.
Same, there’s just so much potential to this article