Q&A: Sourcegraph’s Universal Code Search Tool

Search results when searching for instances of "open file" on the C++ language.
Image: Sourcegraph

By: Rina Diane Caballar

In software development, code search is a way to better navigate and understand code. But it’s an often overlooked technique, with development tools and coding environments offering clunky and limited search functionalities.

Tech startup Sourcegraph aims to change that with a universal code search tool of the same name that makes searching code as seamless as doing a Google search on the web. To achieve that efficiency, Sourcegraph models code and its dependencies as a graph and performs queries on the graph in real time.

Unlike Facebook’s internal search tool or Google’s code search for its own open-source projects, Sourcegraph makes its source code publicly available. The tool is free for individuals and teams of up to 10, and is available to larger teams through tiered pricing. It supports more than 30 programming languages and integrates with developer tools such as GitHub and GitLab for code hosting, Codecov for code coverage, and Jira Software for project management.

Sourcegraph, which is based in San Francisco, closed a US $23 million Series B funding round last month and has raised $43 million to date. Engineering teams at Adidas, Cloudflare, Lyft, Uber, Yelp, and others already use the tool.

Sourcegraph co-founder and CEO Quinn Slack spoke to IEEE Spectrum about the inner workings of the company’s universal code search functionality and the advantages of code search for software developers.

This interview has been edited and condensed for clarity.

IEEE Spectrum: What are the benefits of code search for developers?

Quinn Slack: Code search makes it easy to find usage examples for your company’s own code. You can see how the engineers who are most experienced with a certain type of code are using it in your code base. You can also search some of the internal libraries you need to work with and see how other developers are using them. You’ll learn new techniques or you might find that everyone else is making a common mistake, or you might figure out how to do something better and educate the rest of your team.

IEEE Spectrum: How does Sourcegraph’s universal code search work?

Slack: Search is where people go to find answers to questions, which means you need to have all the information to show them the best answers. With universal code search, you need to have all the code—not just the latest version but the entire history—from every repository. An analogy I like is Wikipedia having its own search box, but almost everyone searches on Google because Google has all the answers.

Making code search universal is tough. We had to write a new search back-end from the ground up, and we had to come up with a common way to make it understand all these different languages. Then, we had to integrate it with other tools and services, so we can get information from code coverage tools, logging tools, tracing tools, and feature flag tools, to name a few. We have a hybrid search, which means that if someone pushes some code, then you’ll be able to search it on Sourcegraph instantly—even if it isn’t indexed yet.
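As a rough illustration of the hybrid idea Slack describes, the Python sketch below combines a word index with a fallback scan over code that has been pushed but not yet indexed. It is a toy under stated assumptions, not Sourcegraph's implementation: the class `HybridSearcher`, its methods, and the whole-word matching are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class HybridSearcher:
    """Toy hybrid search (hypothetical): index lookup plus a scan of unindexed files."""

    index: dict = field(default_factory=dict)    # word -> set of file paths
    pending: dict = field(default_factory=dict)  # path -> content, not yet indexed

    def push(self, path: str, content: str) -> None:
        # Newly pushed code is searchable right away through the pending set.
        self.pending[path] = content

    def run_indexer(self) -> None:
        # Move every pending file into the word index.
        for path, content in self.pending.items():
            # Drop stale entries left by an older version of the same file.
            for paths in self.index.values():
                paths.discard(path)
            for word in content.split():
                self.index.setdefault(word, set()).add(path)
        self.pending.clear()

    def search(self, word: str) -> list:
        # Fast path: indexed files, skipping any with a newer pending version.
        hits = {p for p in self.index.get(word, ()) if p not in self.pending}
        # Slow path: scan files that have not been indexed yet.
        hits |= {p for p, c in self.pending.items() if word in c.split()}
        return sorted(hits)
```

In this sketch, a file passed to `push` shows up in `search` results before `run_indexer` has ever seen it, which is the behavior the answer above attributes to hybrid search.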

IEEE Spectrum: Why did you decide to model code as a graph?

Slack: All code is connected. When writing code, you’re using libraries and calling services written by other developers. We had to understand the relationship of one piece of code to every other piece of code to answer the questions users have when searching code.
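One way to picture code as a graph is to treat each function as a node and each call as an edge. The Python sketch below is a minimal illustration under that assumption, not a description of Sourcegraph's data model; `CodeGraph` and the symbol names in the usage note are hypothetical.

```python
from collections import defaultdict


class CodeGraph:
    """Toy call graph (hypothetical): nodes are symbols, edges are calls."""

    def __init__(self):
        self.calls = defaultdict(set)      # caller -> symbols it calls
        self.called_by = defaultdict(set)  # callee -> symbols that call it

    def add_call(self, caller, callee):
        # Record the edge in both directions so lookups are cheap either way.
        self.calls[caller].add(callee)
        self.called_by[callee].add(caller)

    def references(self, symbol):
        # Direct callers, the analogue of "find references".
        return sorted(self.called_by.get(symbol, ()))

    def transitive_dependents(self, symbol):
        # Everything that reaches `symbol` through any chain of calls.
        seen, stack = set(), [symbol]
        while stack:
            for caller in self.called_by.get(stack.pop(), ()):
                if caller not in seen:
                    seen.add(caller)
                    stack.append(caller)
        return sorted(seen)
```

For example, after recording that `app.main` and `tool.run` both call `lib.open_file`, `references("lib.open_file")` returns both callers, and `transitive_dependents` follows the edges further to show what a change to a low-level function could affect.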

IEEE Spectrum: How do you make Sourcegraph as fast as you want it to be?

Slack: We have a fast index search and we can find results from that. Say you’re trying to index Twitter—the data is constantly changing. With code, it also changes, but if you’ve analyzed the historical version, then you can keep that and not have to constantly update it.

IEEE Spectrum: What’s the biggest challenge you encountered while building Sourcegraph, and how did you overcome it?

Slack: There are a lot of companies that have never had code search. We had to make searching intuitive and allow it to integrate with the other tools they use so it works out of the box and it’s easy for them to start getting value from Sourcegraph.

For example, if you’re viewing code on GitHub, you can just hover your mouse over a function call and the documentation pops up in a dialog box. From there, you can go to the [function] definition or find references.

IEEE Spectrum: What future plans do you have for Sourcegraph?

Slack: One is we’re becoming more universal, with deeper coverage for more languages and integrating more tools. The second thing is we want to make Sourcegraph’s universal code search not just about finding code, but also about fixing code. We’re making it so once you find where the bugs are, you can tell Sourcegraph how to fix them.

This article originally appeared in IEEE Spectrum on 3 April 2020.