
Image by Author
# Introduction
The technical screening for data science roles at FAANG companies can be very thorough. However, even they can't come up with an endless stream of unique interview questions. Once you've gone through the grind enough times, you start to notice that some SQL patterns keep showing up.
Here are the top five, with examples and code (PostgreSQL) for practice.


Image by Author | Napkin AI
Master these and you'll be ready for most SQL interviews.
# Pattern #1: Aggregating Data with GROUP BY
Using aggregate functions with GROUP BY allows you to aggregate metrics across categories.
This pattern is often combined with data filtering, which means using one of two clauses:
- WHERE: Filters data before the aggregation.
- HAVING: Filters data after the aggregation.
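To make the difference concrete, here's a minimal sketch assuming a hypothetical orders(region, amount) table:

```sql
-- WHERE filters rows before aggregation: only orders over 100 enter the sums.
-- HAVING filters groups after aggregation: only regions totaling over 1000 are kept.
SELECT region,
       SUM(amount) AS total_amount
FROM orders          -- hypothetical table
WHERE amount > 100
GROUP BY region
HAVING SUM(amount) > 1000;
```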
Example: This Meta interview question asks you to find the total number of comments made 30 or fewer days before 2020-02-10, per user. Users with no comments should be excluded from the output.
We use the SUM() function with a GROUP BY clause to sum the number of comments per user. Outputting only the comments within the specified period is achieved by filtering the data before aggregation, i.e., using WHERE. There's no need to calculate which date is "30 days before 2020-02-10"; we simply subtract 30 days from that date using the INTERVAL date function.
```sql
SELECT user_id,
       SUM(number_of_comments) AS number_of_comments
FROM fb_comments_count
WHERE created_at BETWEEN '2020-02-10'::DATE - 30 * INTERVAL '1 day' AND '2020-02-10'::DATE
GROUP BY user_id;
```
Here's the output.
| user_id | number_of_comments |
|---|---|
| 5 | 1 |
| 8 | 4 |
| 9 | 2 |
| … | … |
| 99 | 2 |
Business Use:
- User activity metrics: DAU & MAU, churn rate.
- Revenue metrics: revenue per region/product/time period.
- User engagement: average session length, average clicks per user.
# Pattern #2: Filtering with Subqueries
When using subqueries for filtering, you create a data subset, then filter the main query against it.
The two main subquery types are:
- Scalar subqueries: Return a single value, e.g., a maximum amount.
- Correlated subqueries: Reference and depend on the result of the outer query to return their values.
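Here's a quick sketch of both types, assuming a hypothetical orders(id, cust_id, amount) table:

```sql
-- Scalar subquery: returns a single value (the overall maximum) to compare against.
SELECT id, amount
FROM orders          -- hypothetical table
WHERE amount = (SELECT MAX(amount) FROM orders);

-- Correlated subquery: re-evaluated for each outer row via the o.cust_id reference,
-- keeping each customer's highest order.
SELECT o.id, o.cust_id, o.amount
FROM orders o
WHERE o.amount = (SELECT MAX(amount)
                  FROM orders
                  WHERE cust_id = o.cust_id);
```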
Example: This interview question from Meta asks you to create a recommendation system for Facebook. For each user, you need to find pages that this user doesn't follow but at least one of their friends does. The output should contain the user ID and the ID of the page that should be recommended to this user.
The outer query returns all the user-page pairs where the page is followed by at least one friend.
Then, we use a subquery in the WHERE clause to remove the pages the user already follows. There are two conditions in the subquery: one restricts it to pages followed by this specific user (checks this user only), and the other checks whether the page considered for recommendation is among those the user follows (checks this page only).
Since the subquery returns all the pages followed by the user, using NOT EXISTS in WHERE excludes all these pages from the recommendation.
```sql
SELECT DISTINCT f.user_id,
       p.page_id
FROM users_friends f
JOIN users_pages p ON f.friend_id = p.user_id
WHERE NOT EXISTS
    (SELECT *
     FROM users_pages pg
     WHERE pg.user_id = f.user_id
       AND pg.page_id = p.page_id);
```
Here's the output.
| user_id | page_id |
|---|---|
| 1 | 23 |
| 1 | 24 |
| 1 | 28 |
| … | … |
| 5 | 25 |
Business Use:
- Customer activity: most recent login per user, latest subscription change.
- Sales: highest order per customer, top revenue order per region.
- Product performance: most purchased product in each category, highest-revenue product per month.
- User behavior: longest session per user, first purchase per customer.
- Reviews & feedback: top reviewer, latest review for each product.
- Operations: latest shipment status per order, fastest delivery time per region.
# Pattern #3: Ranking with Window Functions
Using window functions such as ROW_NUMBER(), RANK(), and DENSE_RANK() allows you to order rows within data partitions and then identify the first, second, or nth record.
Here's what each of these ranking window functions does:
- ROW_NUMBER(): Assigns a unique sequential number within each partition; tied values get different row numbers.
- RANK(): Assigns the same rank to tied values and skips the following ranks for the next non-tied value.
- DENSE_RANK(): Same as RANK(), only it doesn't skip ranks after ties.
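To see how the three handle ties, here's a self-contained sketch with made-up scores:

```sql
-- On the scores 50, 40, 40, 30:
-- ROW_NUMBER() -> 1, 2, 3, 4; RANK() -> 1, 2, 2, 4; DENSE_RANK() -> 1, 2, 2, 3.
SELECT score,
       ROW_NUMBER() OVER (ORDER BY score DESC) AS row_num,
       RANK()       OVER (ORDER BY score DESC) AS rnk,
       DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk
FROM (VALUES (50), (40), (40), (30)) AS t(score);
```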
Example: In an Amazon interview question, we need to find the highest daily order cost between 2019-02-01 and 2019-05-01. If a customer has more than one order on a given day, sum the order costs daily. The output should contain the customer's first name, the total cost of their order(s), and the date of the order.
In the first common table expression (CTE), we find the orders between the specified dates and sum each customer's daily totals for each date.
In the second CTE, we use RANK() to rank customers by order cost in descending order for each date.
Now, we join the two CTEs to output the required columns and filter only the orders with the first rank assigned to them, i.e., the highest order.
```sql
WITH customer_daily_totals AS (
    SELECT o.cust_id,
           o.order_date,
           SUM(o.total_order_cost) AS total_daily_cost
    FROM orders o
    WHERE o.order_date BETWEEN '2019-02-01' AND '2019-05-01'
    GROUP BY o.cust_id, o.order_date
),
ranked_daily_totals AS (
    SELECT cust_id,
           order_date,
           total_daily_cost,
           RANK() OVER (PARTITION BY order_date ORDER BY total_daily_cost DESC) AS rnk
    FROM customer_daily_totals
)
SELECT c.first_name,
       rdt.order_date,
       rdt.total_daily_cost AS max_cost
FROM ranked_daily_totals rdt
JOIN customers c ON rdt.cust_id = c.id
WHERE rdt.rnk = 1
ORDER BY rdt.order_date;
```
Here's the output.
| first_name | order_date | max_cost |
|---|---|---|
| Mia | 2019-02-01 | 100 |
| Farida | 2019-03-01 | 80 |
| Mia | 2019-03-01 | 80 |
| … | … | … |
| Farida | 2019-04-23 | 120 |
Business Use:
- User activity: "Top 5 most active users last month".
- Revenue: "The second-highest revenue region".
- Product popularity: "Top 10 best-selling products".
- Purchases: "The first purchase of each customer".
# Pattern #4: Calculating Moving Averages & Cumulative Sums
The moving (rolling) average calculates the average over the last N rows, usually months or days. It's calculated using the AVG() window function and defining the window as ROWS BETWEEN N PRECEDING AND CURRENT ROW.
The cumulative sum (running total) is the sum from the first row up to the current row, which is reflected in defining the window as ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW in the SUM() window function.
Example: This interview question from Amazon wants us to find the 3-month rolling average of total revenue from purchases. We should output the year-month (YYYY-MM) and the 3-month rolling average, sorted from the earliest to the latest month.
Also, returns (negative purchase values) should not be included.
We use a subquery to calculate monthly revenue using SUM() and convert the purchase date to the YYYY-MM format with the TO_CHAR() function.
Then, we use AVG() to calculate the moving average. In the OVER() clause, we order the data by month and define the window as ROWS BETWEEN 2 PRECEDING AND CURRENT ROW; this gives the 3-month moving average, which takes into account the current and the previous two months.
```sql
SELECT t.month,
       AVG(t.monthly_revenue) OVER (ORDER BY t.month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS avg_revenue
FROM
    (SELECT TO_CHAR(created_at::DATE, 'YYYY-MM') AS month,
            SUM(purchase_amt) AS monthly_revenue
     FROM amazon_purchases
     WHERE purchase_amt > 0
     GROUP BY 1
     ORDER BY 1) AS t
ORDER BY t.month ASC;
```
Here's the output.
| month | avg_revenue |
|---|---|
| 2020-01 | 26292 |
| 2020-02 | 23493.5 |
| 2020-03 | 25535.666666666668 |
| … | … |
| 2020-10 | 21211 |
To calculate a cumulative sum instead, we'd do it like this:
```sql
SELECT t.month,
       SUM(t.monthly_revenue) OVER (ORDER BY t.month ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cum_sum
FROM
    (SELECT TO_CHAR(created_at::DATE, 'YYYY-MM') AS month,
            SUM(purchase_amt) AS monthly_revenue
     FROM amazon_purchases
     WHERE purchase_amt > 0
     GROUP BY 1
     ORDER BY 1) AS t
ORDER BY t.month ASC;
```
Here's the output.
| month | cum_sum |
|---|---|
| 2020-01 | 26292 |
| 2020-02 | 46987 |
| 2020-03 | 76607 |
| … | … |
| 2020-10 | 239869 |
Business Use:
- Engagement metrics: 7-day moving average of DAU or messages sent, cumulative cancellations.
- Financial KPIs: 30-day moving average of costs/conversions/stock prices, revenue reporting (cumulative YTD).
- Product performance: moving average of logins per user, cumulative app installs.
- Operations: cumulative orders shipped, tickets resolved, bugs closed.
# Pattern #5: Applying Conditional Aggregations
Conditional aggregation lets you compute multiple segmented metrics in a single pass by putting a CASE WHEN statement inside aggregate functions.
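As a minimal sketch of the single-pass idea, assuming a hypothetical subscriptions(user_id, plan) table:

```sql
-- One scan of the table yields both segment counts:
-- CASE WHEN emits 1 for matching rows and NULL otherwise, and COUNT() skips NULLs.
SELECT COUNT(CASE WHEN plan = 'paid' THEN 1 END) AS paid_users,
       COUNT(CASE WHEN plan = 'free' THEN 1 END) AS free_users
FROM subscriptions;          -- hypothetical table
```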
Example: A question from an Amazon interview asks you to identify returning active users by finding users who made a second purchase within 1 to 7 days after their first purchase. The output should consist only of these users' IDs. Same-day purchases should be ignored.
The first CTE identifies the users and the dates of their purchases, using the DISTINCT keyword to collapse same-day purchases.
The second CTE ranks each user's purchase dates from the oldest to the most recent.
The last CTE finds the first and second purchases for each user using conditional aggregation. We use MAX() to pick the single non-NULL value for the first and second purchase dates.
Finally, we use the result of the last CTE and retain only users who made a second purchase (non-NULL) within 7 days of their first purchase.
```sql
WITH daily AS (
    SELECT DISTINCT user_id,
           created_at::DATE AS purchase_date
    FROM amazon_transactions
),
ranked AS (
    SELECT user_id,
           purchase_date,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY purchase_date) AS rn
    FROM daily
),
first_two AS (
    SELECT user_id,
           MAX(CASE WHEN rn = 1 THEN purchase_date END) AS first_date,
           MAX(CASE WHEN rn = 2 THEN purchase_date END) AS second_date
    FROM ranked
    WHERE rn <= 2
    GROUP BY user_id
)
SELECT user_id
FROM first_two
WHERE second_date IS NOT NULL
  AND (second_date - first_date) BETWEEN 1 AND 7
ORDER BY user_id;
```
Here's the output.
| user_id |
|---|
| 100 |
| 103 |
| 105 |
| … |
| 143 |
Business Use:
- Subscription reporting: paid vs. free users, active vs. churned users by plan tier.
- Marketing funnel dashboards: signed-up vs. purchased users by traffic source, emails opened vs. clicked vs. converted.
- E-commerce: completed vs. refunded vs. cancelled orders by region, new vs. returning buyers.
- Product analysis: iOS vs. Android vs. Web usage, feature adopted vs. not adopted counts per cohort.
- Finance: revenue from new vs. existing customers, gross vs. net revenue.
- A/B testing & experiments: control vs. treatment metrics.
# Conclusion
If you want a job at FAANG companies (and others, too), focus on these five SQL patterns for the interviews. Of course, they're not the only SQL concepts tested, but they're the most commonly tested. By focusing on them, you ensure that your interview preparation is as efficient as possible for most SQL interviews at FAANG companies.
Nate Rosidi is a data scientist and works in product strategy. He's also an adjunct professor teaching analytics, and the founder of StrataScratch, a platform helping data scientists prepare for their interviews with real interview questions from top companies. Nate writes on the latest trends in the career market, gives interview advice, shares data science projects, and covers everything SQL.
