Kuo Liu, PhD

Some Thoughts On Staffing Your AI Team

August 25, 2023 | 6 Minute Read

In many companies, the “AI team” consists of a “Data Science” side and a “Machine Learning Engineering” side. Sometimes Data Scientists and Machine Learning Engineers are on the same team; sometimes they are on separate teams, or even in separate organizations within the company. For example, at Signifyd, all Machine Learning Engineers are in the Engineering org, while Data Scientists are in the Decision Science org.

Data scientists typically work directly with data. Their focus is usually on building AI models that may not be production ready and designing statistical experiments (such as A/B tests) to inform data-driven decisions on product features. Many data science teams also invest in data science tool development (e.g. creating Python packages and common libraries) to improve efficiency in common ETL tasks, model analysis, and automation across multiple tech stacks.

On the other hand, machine learning engineers are typically responsible for everything production related, such as building machine learning pipelines (from data ingestion to model deployment), maintaining the machine learning infrastructure, monitoring and improving model latency, owning system uptime and resiliency (for example, when an unexpectedly large amount of data flows in), and making sure the engineering system and the code base are maintainable and extensible. (It would be a huge cost if you had to redo your engineering backbone every time a new feature is introduced. Careful engineering architecture design is the key, and you’ll need the right talent to deliver it.)

Well-rounded talents who can do everything from Data Science to Engineering well are extremely hard to find. They tend to be startup founders or CTOs themselves. Before staffing your AI team, it is probably a good approach to first understand what is essential to your business and what pieces are needed from ideation to production. In my experience, if you are starting something new, one or two key talents can often carry a project very far and deliver great results before there is any need to bring in more people. Building a team around key talents is an approach I often use in my practice as a Data Science leader. Your key talents should have strong business acumen and excel at tackling the key business problem with a minimum amount of effort. Strong technical leadership and project management skills are absolutely a plus. We will expand more in the next Chapter, How to Find and Hire People.

Beyond your key talents, a few other considerations go into staffing your AI team. First, how much weight should you give to someone’s industry experience and educational background? Data Scientists and Machine Learning Engineers who have thrived in companies of different sizes, maturity levels, and sectors usually bring very different strengths and cultures to your company. People who have worked in big tech companies alongside many data scientists and machine learning engineers tend to be more specialized, whereas people who have worked in a startup have probably had to wear multiple hats. The work pace they thrive in, the degree of ownership and autonomy they desire, and the degree of collaboration they are used to in their workplace can vary a lot. It’s crucial to take these differences into account when building the right culture for your company. On the education front, I often get asked, “Do we need to hire a PhD?” “How is Insight Fellowship?” “What about Data Science bootcamps such as Galvanize and General Assembly?” There isn’t one answer that fits all, and I have had success hiring talent from all kinds of backgrounds. It boils down to what’s needed for the job. If AI is your core product and differentiating factor, you may want your data scientists to really understand the math behind the algorithm. Data Scientists with a PhD who are comfortable reading journal papers and building prototypes with novel algorithms are probably a better fit. On the other hand, if the core business problem you are solving is to improve usability, someone who is versatile and has an acute product sense would be a good fit. If you are running a startup with limited resources, someone who has contributed to open source repositories and who can build a data science solution from end to end without going too deep into each aspect may be the perfect fit.

Second, once you have hired your key talents and are expanding your AI team, what experience level should you bring in? Personally, I have been on a Data Science team made up entirely of fresh PhD graduates, a team of extremely experienced ICs (former economics professors), and teams of ICs with varied experience (0 - 15 years of industry experience). As an AI team leader, if you want to build a high-performance team that lasts, you’ll have to consider what type of work is interesting to team members with different degrees of experience, so you can keep them engaged and intellectually challenged. While junior ICs won’t mind working on certain tasks and gaining experience along the way, your senior ICs may find the same tasks very uninteresting. You’ll also need to think about managing your team members’ career development. What kind of role can an individual take on 6 months from now, or a year from now? Does your company offer such an opportunity?

Third, consider how Data Science and Machine Learning Engineering are structured and how they collaborate. Both branches of the AI team are essential in a company that has a mature AI product. Collaboration happens organically in various ways. Take system debugging, for example. If your core product is an AI product, you often won’t know whether your product is buggy without the involvement of both branches. While your machine learning engineering team typically has CI/CD set up and tracks many metrics in Datadog and similar tools, it is usually not in a position to tell whether the model is algorithmically accurate. They may not even know how to check that. To ensure the soundness of your AI system, data scientists have to be involved not only as “QA” but also in designing the systems together with your engineers, so that quality assurance is robust, repeatable, and automated. As another example, data scientists are often the “customers” of the infrastructure and pipelines built by the machine learning engineering team. While many companies have a Product Manager role to shepherd decisions about which tools to build and how to prioritize multiple projects (by the way, there never seem to be enough engineering resources), things are never perfect in my experience. Figuring out how to make the two teams collaborate smoothly while each maintains its own focus is a key challenge I have seen in many places.
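
As one illustration of the automated quality assurance mentioned above, here is a minimal Python sketch of a model quality check that data scientists could define and engineers could run as a step in a CI/CD pipeline. Everything in it is an assumption made for illustration: the names `QualityThresholds`, `precision_recall`, and `check_model_quality` are hypothetical, the choice of precision and recall as metrics is just one option, and the threshold values would come from whatever your data scientists consider acceptable for your product.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class QualityThresholds:
    """Hypothetical minimum metrics agreed on by data scientists and engineers."""
    min_precision: float
    min_recall: float


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Compute precision and recall for binary labels (1 = positive, 0 = negative)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Fall back to 0.0 when a metric is undefined (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def check_model_quality(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    thresholds: QualityThresholds,
) -> List[str]:
    """Return human-readable failures; an empty list means the check passed."""
    precision, recall = precision_recall(y_true, y_pred)
    failures = []
    if precision < thresholds.min_precision:
        failures.append(
            f"precision {precision:.2f} is below the minimum {thresholds.min_precision:.2f}"
        )
    if recall < thresholds.min_recall:
        failures.append(
            f"recall {recall:.2f} is below the minimum {thresholds.min_recall:.2f}"
        )
    return failures


if __name__ == "__main__":
    # Toy holdout labels and model predictions, made up for this example.
    labels = [1, 0, 1, 1, 0, 0, 1, 0]
    predictions = [1, 0, 1, 0, 0, 1, 1, 0]
    problems = check_model_quality(
        labels, predictions, QualityThresholds(min_precision=0.7, min_recall=0.8)
    )
    for problem in problems:
        print("FAILED:", problem)
    # A non-zero exit code lets a CI job block the deployment.
    raise SystemExit(1 if problems else 0)
```

The point of the sketch is the division of labor rather than the specific metrics: data scientists own the holdout data and the thresholds, while engineers own where and when the check runs.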