




























































































In system design interviews, candidates are required to show their ability to develop the high-level architecture of a large system. Designing software systems is a very broad topic, and even a software engineer with years of experience at a top software company may not claim to be an expert on system design. During such interviews, candidates have 30 to 40 minutes to answer questions like “How to design a cloud storage system like Dropbox?” or “How to design a search engine?” In real life, companies spend months, not weeks, and hire a big team of software engineers to build such systems. Given this, how can a person answer such a question in 40 minutes? Moreover, there is no set pattern to such questions. They are flexible, unpredictable, usually open-ended, and have no standard or strictly correct answer.
Unlike coding interviews, where the candidate’s problem-solving ability is evaluated, design interviews consist of complicated and fuzzy questions that aim to test the candidate’s ability to analyze a vague and complicated problem, their aptitude for building large systems, and how they present their solution. Such interviews also look into a candidate’s competence in guiding and moving the conversation forward.
These days, companies care little about your pedigree, where you studied, or where you come from; they are concerned with what you can do on the job. For them, the most important thing is your thought process and the mindset with which you look into and handle problems. For these reasons, candidates are generally scared of design interviews. But despite all this, I believe there is no need to be scared. You need to understand what the companies want to know about you during these 40 minutes, which is basically “your approach and strategy to handle a problem” and how organized, disciplined, systematic, and professional you are at solving it. What is your capacity to analyze an issue, and how methodically can you solve it step by step?
In short, succeeding in a system design interview is a matter of understanding it from the interviewer’s perspective. During the whole process, it is your discussion with the interviewer that is of core importance.
There is no strictly defined process for a system design interview. Moreover, so many things about large systems are inherently unclear that, without clarifying at least a few of them at the beginning, it would be impossible to work toward a solution. Any candidate who does not realize this will fall into the trap of jumping straight to a solution.
at all. Therefore, don’t forget to make sure you gather all the requirements as the interviewer would not be listing them out for you in advance.
The point I want to make is that the main difference between design interviews and the rest is that you are not presented with the full detail of the problem at the outset. Rather, you are required to gauge the breadth and depth of a blurred and indistinct problem. You are supposed to draw out the details and get to the crux of the issue by asking judicious questions yourself. The questions you ask and the points you choose to clarify go a long way in demonstrating your ability and competence as an asset to the company.
In design and architecture interviews, the problems presented are quite big. They definitely cannot be solved in 40 minutes, which implies that the objective is to test the technical depth and breadth the interviewee brings to the interview. That also speaks strongly to your would-be ‘level’ in the company. Your level should follow from your analytical ability to sort out the problem, your ability to work in a team (the behavioral and background side of the interview), and your capacity to perform as a strong technical leader. In a nutshell, the basic idea of hiring at a level is to gauge a person’s ability to contribute value to the company’s wants and needs. For that, you must exhibit your strengths by showing reasonable technical breadth.
Try to learn from the existing systems: How have these been designed? Another important point to keep in mind is that the interviewer expects the candidate’s analytical ability and questioning of the problem to be comparable to his/her experience. If you have a few years of software development experience, you are expected to have certain knowledge and should avoid asking basic questions that would be appropriate only from a fresh graduate. For that, you should prepare sufficiently ahead of time. Try to go through real projects and practices well in advance of the interview, as most questions are based on real-life products, issues, and challenges.
Leading the conversation: It is not the final solution to the problem but the discussion process itself that is important in the interview. And it is the candidate who should lead the conversation, going both broad and deep into the components of the problem. Hence, take the interviewer along with you while solving the problem by communicating with him/her step by step as you move along.
Solving by breaking down: Design questions might look complex and intimidating at first. But whatever the complexity of the problem, a top-down, modular approach can help a lot in solving it. Therefore, you should break the problem into modules and then tackle each of them independently. Subsequently, each component can be solved as a subproblem by reducing it to the level of a known algorithm. This strategy will not only make the design much clearer to
Solving system design questions could be broken down into three steps:
● Scoping the problem: Don’t make assumptions; ask clarifying questions to understand the constraints and use cases.
● Sketching up an abstract design: Illustrating the building blocks of the system and the relationships between them.
● Identifying and addressing the bottlenecks by using the fundamental principles of scalable system design.
Design interview questions are formidable, open-ended problems that cannot be solved in the allotted time. Therefore, you should try to understand what your interviewer intends to focus on and spend sufficient time on it. Be well aware that the discussion of a system design problem could go in different directions depending on the preferences of the interviewer. The interviewer might want to see how you create a high-level architecture covering all aspects of the system, or might be interested in specific areas and in diving deep into them. This means that you must deal with the situation strategically, as even good candidates can fail the interview, not because they lack the knowledge, but because they fail to focus on the right things while discussing the problem.
If you have no idea how to solve these kinds of problems, you can familiarize yourself with the common patterns of system design by reading widely from blogs and watching videos of tech talks from conferences. It is also advisable to arrange discussions and even mock interviews with experienced engineers at big tech companies.
Remember, there is no ONE right answer to the question, because any system can be built in different ways. What will be evaluated is your ability to reason about and justify your ideas and inputs.
Whenever we are designing a large system, we need to consider a few things:
Investing in scaling before it is needed is generally not a smart business proposition; however, some forethought in the design can save valuable time and resources in the future. In the following chapters, we will focus on some of the core building blocks of scalable systems. Familiarizing yourself with these concepts will greatly help in understanding the distributed system design problems discussed later. In the next section, we will go through Consistent Hashing, CAP Theorem, Load Balancing,
translates into some caches becoming hot and saturated while the others idle and almost empty.
In such situations, consistent hashing is a good way to improve the caching system.
Consistent hashing is a very useful strategy for distributed caching systems and DHTs. It allows distributing data across a cluster in a way that minimizes reorganization when nodes are added or removed, making the caching system easier to scale up or down.
In consistent hashing, when the hash table is resized (e.g. a new cache host is added to the system), only about k/n keys need to be remapped on average, where k is the total number of keys and n is the total number of servers. Recall that in a caching system using ‘mod’ as the hash function, nearly all keys need to be remapped.
In consistent hashing, objects are mapped to the same host if possible. When a host is removed from the system, the objects on that host are shared by other hosts; and when a new host is added, it takes its share from a few hosts without touching the others’ shares.
Like a typical hash function, consistent hashing maps a key to an integer. Suppose the output of the hash function is in the range of [0, 256). Imagine that the integers in the range are placed on a ring such that the values are wrapped around.
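To make the contrast concrete, the following minimal Python sketch counts how many keys change servers when a ‘mod’-based cache grows from 4 to 5 hosts. The key names, the host counts, and the use of MD5 as a stable hash are all assumptions chosen for illustration; none of them come from the text above.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Hash a key to a large integer that is stable across runs (MD5 is an assumption)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def mod_server(key: str, server_count: int) -> int:
    """Classic 'mod' placement: the server index is hash(key) mod n."""
    return stable_hash(key) % server_count


# Hypothetical workload: 10,000 synthetic keys.
keys = [f"key-{i}" for i in range(10_000)]

# Count the keys whose server changes when the cluster grows from 4 to 5 hosts.
moved = sum(1 for k in keys if mod_server(k, 4) != mod_server(k, 5))
moved_fraction = moved / len(keys)

print(f"{moved_fraction:.0%} of keys changed server")
```

Under these assumptions a key stays put only when its hash gives the same remainder modulo 4 and modulo 5, which happens for roughly one key in five, so about 80% of the keys move. With consistent hashing the expected share would be closer to k/n.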
Here’s how consistent hashing works:
To add a new server, say D, keys that were originally residing at C will be split. Some of them will be shifted to D, while other keys will not be touched.
To remove a cache, or if a cache fails, say A, all keys that originally mapped to A will fall to B; only those keys need to be moved to B, and other keys are not affected.
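The ring behaviour described above can be sketched in a few lines of Python. This is an illustrative sketch, not a production implementation: the `HashRing` class, the `ring_hash` helper, the use of MD5, and the 32-bit ring (instead of the [0, 256) range used in the text, to make position collisions unlikely) are all assumptions. Each key is served by the first server found clockwise from the key's position.

```python
import bisect
import hashlib

RING_BITS = 32  # assumption: a larger ring than [0, 256) so servers rarely collide


def ring_hash(value: str) -> int:
    """Map a string to a position on the ring [0, 2**RING_BITS)."""
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % (2 ** RING_BITS)


class HashRing:
    """Hypothetical consistent-hash ring with one point per server."""

    def __init__(self, servers=()):
        self._positions = []  # sorted ring positions of all servers
        self._owner = {}      # ring position -> server name
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        pos = ring_hash(server)
        if pos in self._owner:
            raise ValueError(f"ring position collision for {server!r}")
        bisect.insort(self._positions, pos)
        self._owner[pos] = server

    def remove(self, server: str) -> None:
        pos = ring_hash(server)
        if self._owner.get(pos) != server:
            raise KeyError(server)
        self._positions.remove(pos)
        del self._owner[pos]

    def lookup(self, key: str) -> str:
        """Return the first server at or after the key's position, wrapping around."""
        if not self._positions:
            raise LookupError("ring has no servers")
        idx = bisect.bisect_left(self._positions, ring_hash(key))
        pos = self._positions[idx % len(self._positions)]
        return self._owner[pos]


ring = HashRing(["A", "B", "C"])
keys = [f"key-{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}

ring.add("D")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all of them to D")
```

In this sketch, which server gives up keys to D, and which server inherits the keys of a removed server, depends on where the names happen to hash on the ring. The property that holds regardless is that only the keys of one neighbouring server are affected by each change.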
For load balancing, as we discussed in the beginning, the real data is essentially randomly distributed and thus may not be uniform. It may make the keys on caches unbalanced.
To handle this issue, we add “virtual replicas” for caches. Instead of mapping each cache to a single point on the ring, we map it to multiple
Data is sufficiently replicated across combinations of nodes and networks to keep the system up through intermittent outages.
We cannot build a general data store that is continually available, sequentially consistent, and tolerant to any partition failures. We can only build a system that has any two of these three properties. This is because, to be consistent, all nodes should see the same set of updates in the same order. But if the network suffers a partition, updates in one partition might not make it to the other partitions before a client reads from the out-of-date partition after having read from the up-to-date one. The only thing that can be done to cope with this possibility is to stop serving requests from the out-of-date partition, but then the service is no longer 100% available.
A load balancer (LB) is another critical piece of any distributed system. It helps to distribute load across multiple resources according to some metric (random, round-robin, random with weighting for memory or CPU utilization, etc.). The LB also keeps track of the status of all the resources while distributing requests. If a server is not available to take new requests, is not responding, or has an elevated error rate, the LB will stop sending traffic to that server.
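As a rough illustration of the behaviour just described, here is a minimal Python sketch of a round-robin balancer that skips servers marked as unhealthy. The `RoundRobinBalancer` class, its method names, and the server names are hypothetical; real load balancers determine health through active checks and error-rate tracking, which this sketch leaves to the caller.

```python
class RoundRobinBalancer:
    """Hypothetical round-robin balancer that skips unhealthy servers."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._healthy = set(self._servers)
        self._next = 0  # index of the next server to try

    def mark_down(self, server: str) -> None:
        """Stop routing traffic to a server (e.g. after a failed health check)."""
        self._healthy.discard(server)

    def mark_up(self, server: str) -> None:
        """Resume routing traffic to a known server."""
        if server in self._servers:
            self._healthy.add(server)

    def pick(self) -> str:
        """Return the next healthy server in rotation."""
        for _ in range(len(self._servers)):
            server = self._servers[self._next]
            self._next = (self._next + 1) % len(self._servers)
            if server in self._healthy:
                return server
        raise RuntimeError("no healthy servers available")


lb = RoundRobinBalancer(["s1", "s2", "s3"])
first_round = [lb.pick() for _ in range(4)]   # s1, s2, s3, s1

lb.mark_down("s2")                            # s2 stops receiving traffic
second_round = [lb.pick() for _ in range(3)]  # s3, s1, s3
```

Round-robin is only one of the metrics mentioned above; a weighted or random policy would change `pick` but not the health bookkeeping.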
To utilize full scalability and redundancy, we can try to balance the load at each layer of the system. We can add LBs at three places:
● Between the user and the web server
● Between web servers and an internal platform layer, like application servers or cache servers
● Between internal platform layer and database.
There are many ways to implement load balancing.
approach discussed in the next section) for load-balancing for traffic within their network.
If we want to avoid the pain of creating a smart client, and since purchasing dedicated hardware is excessive, we can adopt a hybrid approach, called software load-balancers.
HAProxy is one of the popular open source software LBs. A load balancer can be placed between the client and the server or between two server-side layers. If we can control the machine where the client is running, HAProxy could be running on the same machine. Each service we want to load balance can have a locally bound port (e.g., localhost:9000) on that machine, and the client will use this port to connect to the server. This port is actually managed by HAProxy; every client request on this port will be received by the proxy and then passed to the backend service in an efficient way (distributing load). If we can’t manage the client’s machine, HAProxy can run on an intermediate server. Similarly, we can have proxies running between different server-side components. HAProxy manages health checks and will remove or add machines to those pools. It also balances requests across all the machines in those pools.
For most systems, we should start with a software load balancer and move to smart clients or hardware load balancing.
Load balancing helps you scale horizontally across an ever-increasing number of servers, but caching will enable you to make vastly better use of the resources you already have, as well as making otherwise unattainable product requirements feasible. Caches take advantage of the locality of reference principle: recently requested data is likely to be requested again. They are used in almost every layer of computing: hardware, operating systems, web browsers, web applications and more. A cache is like short-term memory: it has a limited amount of space, but is typically faster than the original data source and contains the most recently accessed items. Caches can exist at all levels of an architecture but are often found at the level nearest to the front end, where they are implemented to return data quickly without taxing downstream levels.
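The "limited space, most recently accessed items" idea can be sketched with a small least-recently-used (LRU) cache in Python. LRU eviction is an assumption here, one common policy among several, and the `LRUCache` class and its method names are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical fixed-size cache that evicts the least recently used item."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest entry last

    def get(self, key, default=None):
        """Return the cached value and mark it as recently used."""
        if key not in self._items:
            return default
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value) -> None:
        """Store a value, evicting the least recently used entry if full."""
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" is now the most recently used entry
cache.put("c", 3)  # cache is full, so "b" (least recently used) is evicted
```

On a miss, a caller would typically fetch the value from the slower original data source and `put` it into the cache, so that repeated requests for the same item are served from memory.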