Author here. Thanks for reading this! I’ve only been programming professionally for a few years; I have no idea what stored procedure use looked like a decade ago. I completely understand the fear, though: splitting application logic out of your application sounds like a future nightmare.
The point with these two modest stored procedures is that they don’t represent application logic, but database logic that should remain in the database. These procedures can instead be thought of as stronger constraints. That’s why they only touch the timestamps and IDs: columns entirely unaffected by the application. If instead we were to, say, set the columns with default values but update the timestamps in the application (à la Rails), we’d move those two procedures into the application, but then we’d have database logic split between the application and the database. We’d also lose the strength of the constraints: the default IDs and timestamps could be modified in the application, destroying their veracity.
I think the above procedures are basically extensions of e.g. CREATE DOMAIN, and are a statement about the type of the data, not the business logic (although of course that line isn’t a bright one all the time). There was a school of thought (unsurprisingly pushed by Oracle) that e.g. embedded a JVM into the database server and encouraged people to write everything in that context, which is probably what most people revolt against (because it is appalling).
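To make the idea concrete, here’s a minimal sketch of the pattern being described (using a SQLite trigger via Python purely for illustration; the original procedures were presumably written for a different database, and the `items` schema is invented): the database, not the application, owns the timestamp column.

```python
import sqlite3

# Minimal sketch: database-side maintenance of a timestamp column.
# The schema and trigger are illustrative, not the author's actual procedures.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (
    id         INTEGER PRIMARY KEY,
    name       TEXT,
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    updated_at TEXT NOT NULL DEFAULT (datetime('now'))
);
-- Acts like a stronger constraint: the application cannot forget (or lie
-- about) updated_at, because the database sets it on every UPDATE.
CREATE TRIGGER items_touch AFTER UPDATE ON items
BEGIN
    UPDATE items SET updated_at = datetime('now') WHERE id = NEW.id;
END;
""")

conn.execute("INSERT INTO items (name) VALUES ('a')")
conn.execute("UPDATE items SET name = 'b' WHERE id = 1")
name, updated_at = conn.execute(
    "SELECT name, updated_at FROM items WHERE id = 1").fetchone()
print(name, updated_at)
```

(SQLite leaves recursive triggers off by default, so the trigger’s own UPDATE doesn’t re-fire it.)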
I have personally dealt with a stored procedure, over 5,000 lines in length, which was the heart of an auction system. It ran every minute through a scheduler and assembled and output raw HTML (among other side effects).
It took me several weeks to tear that monster apart and distribute the logic among application code and multiple (better) stored procs. (I ended up with 200 lines of code and several basic CRUD-type procs.)
The great thing about stored procs is that the database can generate and reuse optimized query plans. The downside is that they allow you to hot-deploy business logic changes outside of a full, tested release. So… they can open a Pandora’s box when an over-eager management team trumps good engineering practices.
I also would just use UUIDs for my synthetic keys, where SQL’s awful support for multi-part keys forces one’s hand.
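As a tiny illustration of that preference (invented schema; SQLite via Python just for convenience), a UUID synthetic key stands in for a two-part natural key, which can still survive as a unique constraint:

```python
import sqlite3
import uuid

# Illustrative only: a synthetic UUID key replacing a two-part natural key
# (student, course). Schema invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE enrollment (
    id      TEXT PRIMARY KEY,   -- synthetic UUID key
    student TEXT NOT NULL,
    course  TEXT NOT NULL,
    UNIQUE (student, course)    -- the natural key survives as a constraint
)""")
enrollment_id = str(uuid.uuid4())
conn.execute("INSERT INTO enrollment VALUES (?, ?, ?)",
             (enrollment_id, "alice", "databases"))
row = conn.execute("SELECT student, course FROM enrollment").fetchone()
print(row)  # ('alice', 'databases')
```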
I wouldn’t worry so much about algorithms and data structures, at least beyond the basics - if you know the rough behaviour of a list, a tree and a hashmap, that’s all that usually comes up, and they’re straightforward to learn on your own. I’d say learn by solving real problems, particularly software-design matters where the best practice makes no sense until you’ve seen a several-hundred-thousand-line codebase. If you can’t do that, try to calibrate your sense of what is and isn’t possible - maybe read through open-source library code, find the gnarly bits and see if you can rewrite them to be less gnarly (this is the key skill for professional programming). Sometimes it’s bad programming and sometimes it’s like that for a reason; being able to tell the difference at a glance will serve you very well.
Totally agree with your sentiment, except for interviews. I just finished a job search in San Francisco and some places in the Valley for a senior-type position. One interviewer actually expected me to implement a fully functional LRU cache during a phone interview, with no ability to google anything. I got dinged because I didn’t implement the most efficient cache-eviction algorithm on my first try (though I knew this was the hotspot and indicated how I would fix it).
What I’m saying is, most of the Bay Area firms I talked to were very picky about hiring, and intricate knowledge of data structures and algorithms was table stakes for the interview.
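For reference, the kind of thing that interview question asks for can be sketched in a few lines with Python’s `OrderedDict` (a generic textbook version, not what any particular interviewer expected):

```python
from collections import OrderedDict

class LRUCache:
    """Evicts the least-recently-used key once capacity is exceeded."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)      # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # drop the least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")        # touch "a" so "b" is now the eviction candidate
cache.put("c", 3)     # evicts "b"
print(cache.get("b"))  # None
```

Both `get` and `put` are O(1) here, since `OrderedDict` keeps its recency ordering in a linked structure under the hood.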
I disagree.
Learning data structures is pretty important; to me it’s what makes the difference between a decent and a good developer. Learning programming patterns and best practices is something that develops over time, but understanding different data structures, their performance, and the best times to use this or that data structure can make your design decisions much easier and your algorithms much simpler. Some data structures lend themselves well to concurrency while others don’t, etc. You don’t need to be able to implement them, but understanding them is paramount.
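A quick sketch of the kind of difference being described, using two Python built-ins (the sizes are arbitrary): the same membership test is a linear scan on a list but a single hash lookup on a set.

```python
import timeit

# Same question ("is x in the collection?"), very different data structures:
# a list scans every element, a set does one hash lookup.
items = list(range(100_000))
as_set = set(items)

t_list = timeit.timeit(lambda: 99_999 in items, number=100)
t_set = timeit.timeit(lambda: 99_999 in as_set, number=100)
print(t_list > t_set)  # the list version is orders of magnitude slower
```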
[Comment removed by author]
Hadoop isn’t a database. Hadoop at its core is a map-reduce framework. It’s not about “scaling to 10 TB”; it’s about processing data.
Here’s a simple example:
You generate 100GB of log data a day. For some people that is a ton of data; for others, not very much. I need to be able to find information in that 100GB within a couple of hours. A Python script or a Java app on a single box won’t cut it. They won’t get me the answer I need in the amount of time that I need it. So I spread the load across many machines. I’m getting the answer that I need, but I’m in the painful scenario of dealing with my home-grown cluster of machines.
Hadoop is just a standard framework for handling clusters of map-reduce jobs. A nice chunk of the hard work has been done for you. Lots of rough edges have been shaved off.
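The map/reduce model Hadoop standardizes can be sketched on a single machine (the toy log format is invented for this example; Hadoop’s value is running the same two phases across a cluster):

```python
from collections import defaultdict

# Toy single-machine map/reduce: count log lines per HTTP status code.
# The log format here is invented for illustration.
def mapper(line):
    yield line.split()[-1], 1          # emit (status_code, 1)

def reducer(pairs):
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

logs = [
    "GET /index.html 200",
    "GET /missing 404",
    "POST /api 200",
]
counts = reducer(pair for line in logs for pair in mapper(line))
print(counts)  # {'200': 2, '404': 1}
```

Hadoop adds the hard parts around these two functions: splitting the input across machines, shuffling the emitted pairs by key, and retrying failed workers.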
It doesn’t matter what you think as an outsider about someone’s choice to use Hadoop or other technologies unless you understand their problem. I have data; I need to get an answer in X period of time; to do that, I need a cluster. You don’t get to decide what someone else’s X is. Are there people who could use something other than Hadoop? Sure. Some “look at me” blog posts and articles make the point that “that problem could be solved in Excel.” Maybe it could; maybe a single Python script could handle it. But if that one script dies? What then? No answer. Maybe that person went with Hadoop after looking at the problem and deciding that a job tracker as a SPOF was way more likely to fail them than just a single python script.
How about, we stop assuming our colleagues are idiots and try to understand why people make the tradeoffs that they do. Yes, Hadoop is a beast in many ways. Yeah, operationally, it can be a giant pain. But, there are plenty of reasons people want/need to use it that cynic’s sniping from the sidelines simply won’t see.
How about, we stop assuming our colleagues are idiots and try to understand why people make the tradeoffs that they do. Yes, Hadoop is a beast in many ways. Yeah, operationally, it can be a giant pain.
I have met cases where people want to go with Hadoop based on hype (or to gain career experience) when they have a dataset < 10GB. These people are not “idiots” per se, but they don’t always consider that traditional single-node technologies are the most suitable for many (if not most) projects. Articles like this can help keep perspectives in the right place, where sometimes a “boring old-fashioned RDBMS” is a valid approach.
You generate 100GB of log data a day. For some people that is a ton of data; for others, not very much. I need to be able to find information in that 100GB within a couple of hours. A Python script or a Java app on a single box won’t cut it. They won’t get me the answer I need in the amount of time that I need it. So I spread the load across many machines. I’m getting the answer that I need, but I’m in the painful scenario of dealing with my home-grown cluster of machines.
Curious what type of operations you perform that take that much time for < 5GB / hour of logging?
I had a hard time listening to a lot of the sessions at Strata in Santa Clara this year. The enterprise vendors definitely smell money in the Hadoop ecosystem, and this has created a feedback loop which has resulted in a lot of “you need a hadoop” type cargo-cult behavior.
I use a small Hortonworks cluster to process about 200 GB/day of video-viewing and ad data from log files. We had a home-brewed distributed log processing system that was less functional than what we got for free (development-wise) by using Hadoop. Using Hadoop as a big dumb parallel hammer to apply functions to large data sets, without having to write custom code, is the sweet spot for my needs.
Maybe that person went with Hadoop after looking at the problem and deciding that a job tracker as a SPOF was way more likely to fail them than just a single python script.
Did you mean “less likely”?
Hello, lobste.rs. I don’t know your protocols yet, but I am officially done with hacker news as a community. HN used to be about links that challenged and inspired me. I had an account for over 2200 days but never felt like I was part of the community. I predict this latest change will be reverted, but I know now that PG does not want me in his forum.
I promise to post only links and comments that I feel are worthy of your time and to never care about karma, troll, or make a comment comprised solely of the word “this”.
or make a comment comprised solely of the word “this”.
Once upon a time, HTTP and SMTP (among others) were good examples of a decentralized Internet. Anyone could run their own web server or mail server and be a little island unto themselves.
What has changed that prevents this from occurring? Although I stopped running my own SMTP server on my home network because I found the delivery of spam to eat up too much of my bandwidth, nothing prevents me from doing this again. I will grant that deliverability may be tougher now than it was ten years ago.
I run my own web, app, pubsub and db servers on my home network and deploy to them daily. I have recently started deploying a few projects to heroku and find it close to the descriptions mentioned in the gist.
When I imagine the proverbial metaverse of Stephenson’s “Snow Crash”, I imagine it as a program whose execution spreads across every participating machine (like BitTorrent). I see the various areas entered as subprocesses that were created (or, more likely, configured) by the local admin of that area (much like web servers are). I don’t see a total rewrite, or the unleashing of all copyright, as necessary to achieve a truly collaborative internet. I think we’ve got a pretty good one right now, and it will continue to evolve.
17 years ago (or so), as the first dot-com bubble was heating up, services and audiences were overpromised and overhyped. Our technology, hardware, languages, frameworks, etc. were not able to live up to the hype sold around the famous disasters of that time. Right now, our tools, workforce, and culture have caught up to delivering on that original vision. And so the next generations see what’s lacking and what it could be, and get frustrated (the true catalyst of change).
Right now, our technology is built largely around centralizing experience. Look to Facebook, GitHub or Amazon for examples of centralized experience. Look back at MySpace for an approach that more closely resembles the free-form collaborative internet (in all its goodness and badness).
Here’s my 10-year prediction: we are currently seeing development and traction of the technologies that will form the foundation of the next wave of collaboration. BitTorrent, blockchains, git, HTML5, paxos/raft, and functionally inspired approaches (if not actually functional languages) will be used to build the next levels of collaboration. We need to figure out new levels of decentralized trust to allow anonymous code execution on our machines (as JavaScript does today… perhaps). Even today, though, we can build subnetworks to test ideas like this out. Nothing has changed to prevent this; it’s just that some experiences that shun this openness have gotten immensely popular.
The economics changed.
It’s not an accident that things that shun openness got popular: they got popular because lots of money was invested in them, with the expectation of a return.
It’s not impossible to make money off decentralised networks, but centralised products are a much safer bet.
It’s the client-server model. It’s too hard to deploy for normal people to have a first-class, privately controlled network presence (web server, SMTP, etc). The assumption that systems have to be consistent to work also makes it veeeerrry difficult to scale anything without a well-monied central presence, i.e. a company.
We also have not always had such good tools and languages for collaboration as we have now. HTML5 is so rich. WebGL? Cryptocurrency? They’re enough to keep us going for a long time. Sure, Bitcoin is a bit hard to use but that’ll change as soon as something really needs it to change.
Many ISP make it a violation of their policy to run your own HTTP server from your house.
Excellent point. I mentioned Heroku mainly because, from my point of view, I see AWS as something that creates the ability for us to run as much or as little of our own presence as we want. While I think “running my own server” has helped me greatly as a developer, I don’t think the important sysadmin lessons I’ve learned couldn’t have come from managing a remotely hosted service.