I’ve recently started my first programming job, and the codebase is pretty huge. How do you guys approach a new codebase, and how do you become familiar with things as quickly as possible?
While there are language-specific tools that help immensely (such as cscope), for me nothing has ever really beaten just knowing find/grep/silversearcher type tooling thoroughly, alongside a couple of shell macros.
Even monster-sized codebases fit easily in RAM and with a multithreaded grep like silversearcher (ag), navigating to strange and unknown parts takes only a few moments. Combine your newly discovered directories with a few hacky shell macros like in https://gist.github.com/dw/81f9b8cc0d957c41a7b2f6dc85b53358 and you’re all set.
The rgrep command in Emacs is really nice for that kind of searching.
Check out ripgrep (https://github.com/BurntSushi/ripgrep). It’s memetastic.
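For anyone who has not used grep-style tools before, the idea behind the searching described above can be sketched in a few lines of Python. This is only an illustration of the technique, not a replacement for ag or ripgrep (which are far faster); the function name `search_tree` and the `SKIP_DIRS` set are made up for this example.

```python
import os
import re
from typing import Iterator, Tuple

# Assumed "noise" directories; adjust for the codebase at hand.
SKIP_DIRS = {".git", "node_modules", "__pycache__"}


def search_tree(root: str, pattern: str) -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line_number, line) for every line under root matching pattern."""
    regex = re.compile(pattern)
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into skipped directories.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    for lineno, line in enumerate(handle, start=1):
                        if regex.search(line):
                            yield path, lineno, line.rstrip("\n")
            except OSError:
                continue  # unreadable file; skip it
```

Calling `search_tree(".", r"def parse_config")` and printing each hit as `path:line:text` gives roughly the output shape that grep-like tools produce, which most editors can jump to directly.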
This should be true everywhere, but it might not be:
You have the right to ask every single senior programmer on the team for time for them to explain the code, architecture, and production environment (i.e. how it runs) of any component they work on. DO THIS FIRST. If they push back because they have good documentation, great, read the docs and then ask them again with all your questions in tow.
Get a way to browse & search your code as fast as possible, with as much semantic support as possible: OpenGrok, Sourcegraph, cscope, GNU Global, a Language Server Protocol server, or just an IDE that can parse everything all at once. Someone’s already written it, so go search for it. This is important because of the next point.
Find bugs to fix, and work until you understand everything you can about the bug you’re fixing. Go deep rather than wide on the bugs. Don’t wait for them to assign you some new small feature, go for the bugs. They teach you things no one thinks of, and will not be nearly as isolated. Usually you’ll also get to work with people interested in the bug who you wouldn’t have known about.
Finally, go easy on yourself. It’s just ASCII in some files, and you’ll learn a lot. Good luck and congrats!
This is great advice, and I’d add that you shouldn’t ask the senior engineers just once. The first time they will probably “explain” a bunch of stuff, forgetting that 2/3rds of it makes no sense without the context that they have and you don’t. The other 1/3rd will be a grab bag of things that stick in your head, but don’t form a coherent picture. You’ll forget half of that 1/3rd in the first week, and it’ll turn out that you thought you understood the other half, but when you look at the actual code there’s a lot more going on than you thought.
Don’t stress about any of this. It can take months, even for experienced engineers, to get “up to speed” on even moderately complex systems. Putting that picture together in your head takes a long time. Don’t worry about that. Just keep asking questions, and keep reading the code. I agree with others about working on bugs, and going deep, rather than wide. Wide will just confuse you. Become an expert in one corner of the system, and follow the connections outwards from there.
Also, remember that the debugger is your friend. Set a breakpoint and patiently step through the code. You’ll end up going off on all kinds of tangents, but it’s a great way to learn how things hang together (especially in JavaScript, where a lot of stuff isn’t really visible until runtime).
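Stepping through a debugger is interactive, so it is hard to show on a page. As a rough, non-interactive stand-in for the same idea (watching what actually runs), here is a small Python sketch that records which functions get called, indented by call depth, using the standard `sys.setprofile` hook. The helper `trace_calls` and the toy functions `load`, `total`, and `main` are hypothetical names for this example, and the approach assumes a Python codebase.

```python
import sys
from typing import Callable, List


def trace_calls(func: Callable[[], object]) -> List[str]:
    """Run func and return an indented list of the Python functions it called."""
    calls: List[str] = []
    depth = 0

    def profiler(frame, event, arg):
        nonlocal depth
        if event == "call":
            calls.append("  " * depth + frame.f_code.co_name)
            depth += 1
        elif event == "return":
            depth -= 1
        # Events for C functions ("c_call", "c_return") are ignored here.

    sys.setprofile(profiler)
    try:
        func()
    finally:
        sys.setprofile(None)  # always remove the hook again
    return calls


# Toy stand-ins for real code you want to understand.
def load():
    return [1, 2, 3]


def total(values):
    return sum(values)


def main():
    return total(load())


print("\n".join(trace_calls(main)))
```

On a real system the output gets long quickly, so it tends to be most useful when wrapped around one small operation, much like setting a single breakpoint.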
Personally I like to find bugs that either exist in an issue tracker or exist as errors in production logs and then figure out how the bug could be happening. It gives me a concrete goal for my exploration.
At one job I joined, I tried to solve one bug that appeared as an error in Splunk each morning. I’d show up to work 90 minutes early and try to knock one out. Over the year that I did that, I ended up in a large part of the codebase and learned a ton of business rules.
In order to solve a lot of the bugs, I had to find and meet the folks who were experts in that area of code, and the business rules that applied to it. In the process, I got to know the QA team better, the product team better and developers on teams other than mine a lot better as well.
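If you want to pick bugs from logs like this but don’t know where to start, one option is to count which error messages show up most often. Below is a minimal Python sketch of that idea. The log line shape (`<timestamp> <LEVEL> <message>`) and the names `LOG_LINE`, `normalize`, and `top_errors` are assumptions for illustration; real logs, and whatever a tool like Splunk exports, will need a different pattern.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Assumed log shape: "<timestamp> <LEVEL> <message>"; adjust to the real format.
LOG_LINE = re.compile(r"^\S+\s+(?P<level>[A-Z]+)\s+(?P<message>.*)$")


def normalize(message: str) -> str:
    """Replace numbers so 'user 42 not found' and 'user 7 not found' group together."""
    return re.sub(r"\d+", "<n>", message)


def top_errors(lines: Iterable[str], limit: int = 5) -> List[Tuple[str, int]]:
    """Return the most common ERROR messages with their counts."""
    counts: Counter = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") == "ERROR":
            counts[normalize(match.group("message"))] += 1
    return counts.most_common(limit)
```

Feeding it the lines of a log file (for example `top_errors(open("app.log"))`, where `app.log` is a placeholder path) gives a short list of candidates to investigate one morning at a time.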
Agreed. Try to fix some bugs. It gives you a concrete goal and lets you focus your questions (which you should not be afraid to ask).
I started working with the GCC toolchain code base this year and it’s the most daunting codebase I’ve ever worked with. I dove in trying to fix things after just reading some sources and it helped considerably. Having that goal really focuses the effort.
Also, it pays to understand how the software gets built, if it’s of that nature. You tend to learn a lot about dependencies that way.
Congrats on the job! One thing I’ve found helpful is profiling the codebase under normal load. Specifically I’ve used flamegraphs, which give you a quick visualization of the call stacks of the most frequently called or longest-running methods. I’ve used this information to prune through most of the codebase (focusing on what’s running 90% of the time, so to speak), and to understand the overarching structure of things. Starting from func main or reading source files top to bottom, both of which I’ve tried (ineffectively, I should add), were motivated by the same reasons, but I’ve since settled on just profiling.
Additionally, I’ve found tests to be a good place to start probing. Understanding the test setup phase (if any) has helped me understand the structure and dependencies, and it is a quick way to get your feet wet (by muddling around, breaking the tests).
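To make the profiling suggestion concrete, here is a small Python sketch using the standard library’s `cProfile` and `pstats`. It prints a text table rather than a flamegraph, but it answers the same question: where does the time go? The functions `parse`, `aggregate`, and `handle_request` are made-up stand-ins for a representative operation in a real system, and `profile` is a hypothetical helper.

```python
import cProfile
import io
import pstats


def parse(items):
    # Stand-in for a cheap step in the real system.
    return [int(x) for x in items]


def aggregate(values):
    # Stand-in for an expensive step: deliberately quadratic.
    return sum(a * b for a in values for b in values)


def handle_request():
    # Hypothetical entry point; replace with a representative operation.
    return aggregate(parse(str(i) for i in range(300)))


def profile(func, top=5):
    """Run func under cProfile and return a report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(top)
    return out.getvalue()


report = profile(handle_request)
print(report)
```

The functions near the top of the cumulative-time listing are usually the ones worth reading first; flamegraph tools present the same kind of data as a picture.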
In the abstract, I usually take a hybrid top-down / bottom-up approach. I usually map out the major components of the system and list all inputs and outputs at the system borders (configuration, HTTP endpoints, sockets, library APIs, databases, temporary files, etc.), then start to work on one part and map that out in greater detail, filling out the high-level overview with detailed knowledge.
Callgraphs, dependencies, and, as irfansharif mentioned, flamegraphs are good visualization options. There may be language-specific tools for you.
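As one example of a dependency overview, the following Python sketch lists which top-level modules each source file imports, using the standard `ast` module. It assumes the codebase is Python; other languages have their own tools for this. The names `imports_of` and `dependency_map` are hypothetical, and relative imports are skipped to keep the sketch short.

```python
import ast
import os
from typing import Dict, Set


def imports_of(source: str) -> Set[str]:
    """Return the top-level module names imported by a piece of Python source."""
    found: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 means an absolute import; relative imports are skipped.
            found.add(node.module.split(".")[0])
    return found


def dependency_map(root: str) -> Dict[str, Set[str]]:
    """Map each .py file under root (relative path) to the modules it imports."""
    result: Dict[str, Set[str]] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8") as handle:
                try:
                    result[os.path.relpath(path, root)] = imports_of(handle.read())
                except SyntaxError:
                    continue  # not parseable by this Python version; skip it
    return result
```

Printing `dependency_map(".")` sorted by file name gives a crude but quick picture of which parts of the system lean on which libraries, which can then be filled in by hand.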
Ask your peers many questions! Aside from ‘how’, ‘why’ is probably the most important. Write them down. Keep a daily dev journal in which you ask and answer questions to yourself. Document what you learn as you go along and make the journey easier for the next person.
Setting up the codebase gives you more practical knowledge about running the code. Writing toy code that makes use of the codebase gives a very high return on time invested. Sometimes you have to write code to understand what you are reading, and active engagement with any knowledge makes it stick more than just passive intake. Make small refactors, even if you don’t commit them back to the main repository.
Don’t despair! Getting to know a codebase takes time and experience, and some ramp up time is always expected.
I usually look at the test suite first (assuming the project has one). It gives you a pretty good overview of what the codebase is supposed to do, and what the entry points are.
In the good old days there was Source Navigator. The last release was in 2014. Maybe it has some interesting successor you’d like to talk about?
SourceTrail is pretty nice, but I’ve yet to use it at work: https://www.sourcetrail.com/
When I want to get to know a new codebase, I do my review in these steps:
(and repeat)
Thanks for the advice everyone! It’s been a huge help, and I’ve definitely been getting a bit more comfortable with things the more I work with it :)