As a university student, I was more familiar with Node.js and Mongoose than SQL, so I went with MongoDB instead of SQL. When I started working in the real world, I became familiar with Postgres. Postgres has been my go-to database for my projects ever since ❤️.
Here is my mental model of why I choose Postgres for all my projects.
For more than two decades, Postgres has been actively developed, and it is now used by many large corporations. It's a basic human instinct to stick with tried-and-true tools. Take a look at the screenshot below to see an Instagram engineer flexing their Postgres usage.
Postgres allows developers to extend its capabilities by writing extensions, since some unique use cases can't be solved by a general-purpose database.
I used the pg_cron extension to solve a unique use case of my own.
I wanted to do historical aggregation of a numeric column. The usual approach would be building an ETL pipeline, but I found a solution using pg_cron. You can check this link for the entire story.
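As a sketch of the idea (the table and column names here are made up), pg_cron lets you schedule the aggregation as a plain SQL job inside the database:

```sql
-- Enable the extension (requires pg_cron in shared_preload_libraries).
CREATE EXTENSION IF NOT EXISTS pg_cron;

-- Roll up the last day's values into a summary table every day at 01:00.
SELECT cron.schedule(
    'daily-rollup',
    '0 1 * * *',
    $$INSERT INTO daily_totals (day, total)
      SELECT date_trunc('day', created_at), sum(amount)
      FROM events
      WHERE created_at >= now() - interval '1 day'
      GROUP BY 1$$
);
```

No external scheduler or ETL worker is needed; the job runs inside Postgres itself.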
Fellow OSS engineers have open-sourced their extensions for the community to use. Here are some of my favourite extensions:
ZomboDB integrates Elasticsearch with Postgres for full-text search.
A usual question that comes up while choosing Postgres is: can we store complex relationships? But a fact unknown to most developers is that Postgres lets them store and query JSON data.
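A quick sketch of what that looks like, with a hypothetical table:

```sql
-- jsonb stores the document in a binary form that supports indexing.
CREATE TABLE events (id serial PRIMARY KEY, payload jsonb);

INSERT INTO events (payload)
VALUES ('{"user": {"name": "alice"}, "type": "login"}');

-- A GIN index speeds up containment queries on the document.
CREATE INDEX ON events USING gin (payload);

-- Navigate into the document with -> / ->> and filter with @>.
SELECT payload -> 'user' ->> 'name'
FROM events
WHERE payload @> '{"type": "login"}';
```

So for semi-structured data you often don't need a separate document store at all.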
Having a good database alone won't solve the problem; there are other operational scenarios to consider, e.g. backups and running an HA setup. Postgres has all sorts of tooling to run a production database.
Of course, you can join our community as well to talk about Postgres :P
Not only does the community have a presence on various social media platforms, but it is also friendly and helps you instantly if you come across any issue.
Postgres isn't just a database; it's an entire ecosystem of development, research, and innovation that's hard to fathom. I want to end the essay by saying:
Postgres does justice to its elephant mascot.
Most of us make silly mistakes, mostly due to lethargy, though some are genuine. But the price we pay is huge, because some of the damage is irreversible.
But exposing nearly a million Kubernetes clusters seems a little more serious, and it made me revisit some of the security issues of recent days.
Recent Security issues
1. Cyble claim
Cyble claims to have found 900,000 Kubernetes API servers exposed to the public internet due to misconfiguration, though only 799 of them were exploitable in a way that would let an intruder gain access to the entire cluster.
2. Okta breach
Okta is a publicly traded identity-management company that was breached by a hacking group called Lapsus$. As per the investigation, the hackers were able to obtain VPN keys from support engineers through social engineering.
The most disturbing part is that the hackers were inside the network for more than a month without anyone noticing.
3. Log4J
If you come from the Java world, you wouldn't have missed the Log4j vulnerability. It allows an attacker to run whatever code they want and gain access to the system. I'm still not sure how many folks have actually upgraded, or whether this vulnerability is still living in un-upgraded systems.
From all the cases mentioned, it's clear that most of these incidents could have been avoided if best practices had been followed.
Network policies would have kept the Kubernetes clusters from being exposed.
Security education and basic monitoring wouldn't have let Lapsus$ roam the network for almost a month.
Continuous package scanning to catch supply-chain attacks would prompt developers to upgrade their systems at the right time.
The present time is full of challenges in the world of network security. Everything indicates that the number of threats users face will continue to grow, so now more than ever, having a good security policy and protection is essential.
We covered only a fraction of the security incidents out there; we have curated a list of leaks around the world in the GitHub repo. The list might shock you when you spot your day-to-day tech companies on it.
Let me leave you with a question: "Does remote work lack IAM & cybersecurity oversight?"
OPA (Open Policy Agent) is a policy enforcement engine that can be used for a variety of purposes. OPA's access policies are written in a language called Rego. A CNCF-graduated project, it's been incorporated into a number of different products. You can see the list of adopters here.
We chose OPA to enforce database access policies because of its flexibility to write policies as per the policy author's needs and its familiarity in the cloud-native ecosystem.
OPA gives three options to enforce access policies:
Go library
REST service
WASM
The Inspektor data plane is written in Rust, so we cannot use the Go library to enforce policies in Inspektor. For simplicity, we decided to use WASM to evaluate access policies rather than run a separate REST service.
Rego policies can be compiled into a WASM module using OPA. The compiled WASM module exposes the functions necessary to evaluate policies from other languages.
WASM Compilation
burrego crate was built by people at kuberwardern to evaluate rego policies in rust. In this tutorial, we will learn how to evaluate wasm-compiled rego using the burrego crate.
Let's first write a Rego program to evaluate before moving on to the evaluation itself. The Rego program below sets the rule hello to true if the input message input.message is "world".
package play

default hello = false

hello {
    m := input.message
    m == "world"
}
To make use of the policy, run the following command to compile it to WASM for the entrypoint play/hello:
opa build -t wasm -e play/hello policy.rego
The above command creates a bundle.tar.gz file. The tarball contains the following files:
/data.json
/policy.rego
/policy.wasm
/.manifest
For this tutorial, we care only about the policy.wasm file, since it is the compiled WASM module of the Rego policy.
Rust integration
Let's add the burrego crate as a dependency to our Rust program.
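In Cargo.toml, that looks something like this (the versions below are placeholders; check crates.io for the current ones):

```toml
[dependencies]
# burrego evaluates WASM-compiled Rego policies.
burrego = "0.1"
# serde_json builds the JSON input and data documents.
serde_json = "1.0"
```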
Evaluator::new takes the policy as input and returns an Evaluator object.
let policy = fs::read("./policy.wasm").unwrap();
let mut evaluator = Evaluator::new(
    String::from("demo-policy"),
    &policy,
    &DEFAULT_HOST_CALLBACKS,
).unwrap();
During evaluation, an entrypoint id is specified to tell OPA which entrypoint to evaluate. The id of an entrypoint can be retrieved using the entrypoint_id function. In the following snippet, we retrieve the entrypoint id for play/hello.
let entrypoint_id = evaluator.entrypoint_id(&"play/hello").unwrap();
The policy is evaluated using the evaluate function, which takes the entrypoint id, input, and data as parameters.
let input = serde_json::from_str(r#"{"message": "world"}"#).unwrap();
let data = serde_json::from_str("{}").unwrap();
let hello = evaluator.evaluate(entrypoint_id, &input, &data).unwrap();
println!("{}", hello);
We got true for the play/hello entrypoint because we passed message as "world". We would have received false if we had used a different value.
[{"result":true}]
I hope you learned how to evaluate OPA policies in Rust. Feel free to join our community on Discord, where you can follow our development and participate.
Many of us use a profiler to measure the CPU or memory consumed by a piece of code. This led me to figure out how profilers actually work.
To learn about profiling, I grokked a popular profiling crate, pprof-rs. This library is used to measure the CPU usage of a Rust program.
If you are interested in contributing to open source or want to learn how to read a complex project's source code, I would highly recommend Contributing to Complex Projects by Mitchell Hashimoto.
Let's profile a sample Rust program and see how pprof is used.
Here is a modified example program that I've taken from pprof-rs. You can find the full source here.
The sample program calculates the number of prime numbers from 1 to 50000.
fn main() {
    let prime_numbers = prepare_prime_numbers();
    // start profiling
    let guard = pprof::ProfilerGuardBuilder::default()
        .frequency(100)
        .build()
        .unwrap();
    let mut v = 0;
    for i in 1..50000 {
        // use the `is_prime_number1` function only if the incoming
        // value i is divisible by 3.
        if i % 3 == 0 {
            if is_prime_number1(i, &prime_numbers) {
                v += 1;
            }
        } else {
            if is_prime_number2(i, &prime_numbers) {
                v += 1;
            }
        }
    }
    println!("Prime numbers: {}", v);
    // stop profiling and generate the profiled report.
    if let Ok(report) = guard.report().build() {
        let mut file = File::create("profile.pb").unwrap();
        let profile = report.pprof().unwrap();
        let mut content = Vec::new();
        profile.write_to_vec(&mut content).unwrap();
        file.write_all(&content).unwrap();
    };
}
In the above example, we started profiling at the beginning of the program using ProfilerGuardBuilder.
let guard = pprof::ProfilerGuardBuilder::default()
    .frequency(100)
    .build()
    .unwrap();
At the end of the program, we generated the report and wrote it to the profile.pb file.
The report is generated by running the program, and it's visualized using Google's pprof.
~/go/bin/pprof --http=localhost:8080 profile.pb
After executing the above command, pprof lets you visualize the profile at http://localhost:8080.
From the visualized profile, you can clearly see that is_prime_number2 has consumed more CPU than is_prime_number1. That's because is_prime_number1 is used only when the given number is divisible by 3.
Now that we've learned how to profile a Rust program using pprof-rs, let's learn how pprof-rs works internally.
Please don't get too worn out yet! So far, we've learned the basics of profilers and how to use pprof-rs. Before we dig into the internal workings of the profiler, let's take a sip of water to rehydrate.
Before we get into the pprof-rs code, let's learn CPU profiling in theory.
A profiler pauses the program at a certain interval and resumes it after sampling the current stack trace. While sampling, it takes each stack trace and increments its count. The sampled data is then used to create a flamegraph or something similar.
Stack traces: a stack trace is the list of function calls on the call stack, e.g. is_prime_number_1 -> main.
When you start profiling with ProfilerGuardBuilder, pprof-rs registers a signal handler and a timer that determines how often the program is paused.
let guard = pprof::ProfilerGuardBuilder::default()
    .frequency(100)
    .build()
    .unwrap();
Registering the signal handler
The perf_signal_handler callback function is registered for the SIGPROF signal. Whenever a SIGPROF signal is emitted, perf_signal_handler is invoked.
SIGPROF: This signal typically indicates expiration of a timer that measures both CPU time used by the current process, and CPU time expended on behalf of the process by the system. Such a timer is used to implement code profiling facilities, hence the name of this signal.
Since we registered the perf_signal_handler function to handle SIGPROF signals, it is invoked whenever a SIGPROF signal is emitted. perf_signal_handler takes ucontext as one of its arguments; ucontext contains the current instruction pointer of the machine code being executed.
Using that instruction pointer, the current call stack trace is retrieved. This is done using the backtrace crate. The collected backtrace and thread name are passed to profiler.sample for sampling.
extern "C" fn perf_signal_handler(
    _signal: c_int,
    _siginfo: *mut libc::siginfo_t,
    ucontext: *mut libc::c_void,
) {
    if let Some(mut guard) = PROFILER.try_write() {
        if let Ok(profiler) = guard.as_mut() {
            let mut bt: SmallVec<[<TraceImpl as Trace>::Frame; MAX_DEPTH]> =
                SmallVec::with_capacity(MAX_DEPTH);
            // ucontext is passed to the trace method to retrieve the
            // stack frame of the current instruction pointer.
            TraceImpl::trace(ucontext, |frame| {
                bt.push(frame.clone());
            });
            let current_thread = unsafe { libc::pthread_self() };
            let mut name = [0; MAX_THREAD_NAME];
            let name_ptr = &mut name as *mut [libc::c_char] as *mut libc::c_char;
            write_thread_name(current_thread, &mut name);
            let name = unsafe { std::ffi::CStr::from_ptr(name_ptr) };
            profiler.sample(bt, name.to_bytes(), current_thread as u64);
        }
    }
}
Sampling
profiler.sample internally inserts the stack trace and its count into a hashmap. As a side note, this is a custom hashmap implementation rather than Rust's built-in one: heap allocation is forbidden inside a signal handler, so the hashmap can't grow dynamically.
A stack trace with the least count is evicted from the hashmap if the incoming stack trace can't find a place in it, and a temporary file is used to store the evicted entries.
pub fn flamegraph_with_options<W>(
    &self,
    writer: W,
    options: &mut flamegraph::Options,
) -> Result<()>
where
    W: Write,
{
    let lines: Vec<String> = self
        .data
        .iter()
        .map(|(key, value)| {
            let mut line = key.thread_name_or_id();
            line.push(';');
            for frame in key.frames.iter().rev() {
                for symbol in frame.iter().rev() {
                    line.push_str(&format!("{}", symbol));
                    line.push(';');
                }
            }
            line.pop().unwrap_or_default();
            line.push_str(&format!(" {}", value));
            line
        })
        .collect();
    if !lines.is_empty() {
        flamegraph::from_lines(options, lines.iter().map(|s| &**s), writer).unwrap();
    }
    Ok(())
}
The flamegraph crate then generates a flamegraph from the folded stack lines.
Example
main;prime_number1 3
main;prime_number2 5
pprof-rs also encodes the sampled data in Google's pprof format, which lets you plot interactive graphs.
In recent days I've been excited about continuous profiling, and it has been a hot topic in the observability space. Follow the amazing open-source projects Parca and Pyroscope to learn more about continuous profiling.
I had a use case that required escaping all the raw HTML in a REST API request body.
However, it is time-consuming to write a function for each request struct to escape every field. So I came up with the idea of a single function, EscapeStruct, that uses reflection to detect the layout of the struct and escape all of its string fields.
Reflection in Go allows you to inspect and manipulate a structure at runtime. The reflect package contains all reflection-related functions.
Let's walk through the implementation of EscapeStruct which I used to escape all struct string fields.
The EscapeStruct function accepts an interface as an argument, allowing us to pass any struct pointer. For the purposes of this blog post, assume that we only ever pass a pointer to a struct, so that we can ignore edge cases and focus on the core of the solution.
reflect.ValueOf returns the reflect.Value, which contains the concrete value of the interface we passed. The Elem method will return the reflect.Value of the struct to which the given pointer points.
Now we have the reflect.Value of the underlying struct to which the given pointer points. This is then passed to the escapeValue function.
func escapeValue(in reflect.Value) {
	if in.Kind() == reflect.Struct {
		n := in.NumField()
		for i := 0; i < n; i++ {
			field := in.Field(i)
			escapeValue(field)
		}
	}
	if in.Kind() == reflect.Ptr {
		escapeValue(in.Elem())
		return
	}
	if in.Kind() == reflect.String {
		if in.CanSet() {
			in.SetString(html.EscapeString(in.String()))
		}
		return
	}
}
The escapeValue function takes a reflect.Value as an argument and handles three cases. If the given argument:
is of kind struct, we iterate through the fields of the struct and pass each field to the escapeValue function;
is of kind pointer, we get the underlying object the pointer points to using the Elem method and pass it to the escapeValue function;
is of kind string, we check whether it can be mutated using the CanSet method, because private fields can't be mutated. If the string field can be mutated, we set the escaped value using the SetString method.
Because we pass the struct's fields back into the escapeValue function, string fields in deeply nested structs are checked recursively.
Every programming language has a set of keywords that are reserved for specific purposes. In Rust, for example, the keyword for is used to represent looping.
Because keywords have meaning in a programming language, they cannot be used to name a function or variable. For example, the keywords for or in cannot be used as variable names.
Although keywords are not intended to name variables, Rust lets you do so by using a raw identifier.
The program below will not compile in Rust because in is a reserved keyword.
#[derive(Debug)]
struct Test {
    in: String,
}

fn main() {
    let a = Test {
        in: "sadf".to_string(),
    };
    println!("{:?}", a);
}
output:
error: expected identifier, found keyword `in`
 --> src/main.rs:4:5
  |
4 |     in: String
  |     ^^ expected identifier, found keyword
  |
help: you can escape reserved keywords to use them as identifiers
  |
4 |     r#in: String
  |     ~~~~

error: expected identifier, found keyword `in`
 --> src/main.rs:9:9
  |
9 |         in: "sadf".to_string()
  |         ^^ expected identifier, found keyword
  |
help: you can escape reserved keywords to use them as identifiers
  |
9 |         r#in: "sadf".to_string()
  |         ~~~~
However, we can make the program work by prefixing the keyword with r#.
r# tells the compiler that the incoming token is an identifier rather than a keyword.
#[derive(Debug)]
struct Test {
    r#in: String,
}

fn main() {
    let a = Test {
        r#in: "sadf".to_string(),
    };
    println!("{:?}", a);
}
output:
Test { in: "sadf" }
This is very useful for Rust because it allows the language to introduce new keywords in later editions.
Assume we have a crate built with the 2015 edition of Rust that exposes an identifier try. Later, try was reserved for a feature in the 2018 edition. As a result, we must use a raw identifier to call try.
TLS is the trusted way of sending messages over a TCP connection: it encrypts the payload before sending and decrypts it after receiving. But if you send plain text over a normal connection, it can easily be sniffed.
So, if we send a password as plain text over a normal TCP connection, an attacker can view the password and use it to take control of the resource.
This raises the question: how do we authenticate a user over an insecure connection without revealing the password?
This is solved using SASL (Simple Authentication and Security Layer). If you come from a DevOps background, you might have come across SASL errors. SASL is used in popular projects like Postgres, MongoDB, and Kafka.
I got to know about SASL while adding Postgres support to Inspektor.
In this blog, I'll explain how SCRAM (Salted Challenge Response Authentication Mechanism) works; it is part of the SASL family.
SCRAM establishes an authenticated connection through a four-step handshake:
Step 1:
The client sends a nonce (a nonce is nothing but randomly chosen bytes) and the user's username to the server to initiate the handshake. This message is called the client-first message.
client-first message:
n,,n=user,r=fyko+d2lbbFgONRv9qkxdawL
Step 2:
After receiving the client-first message, the server replies with the combined nonce (the client's nonce extended with its own), a salt, and an iteration count. This message is called the server-first message.
server-first message:
r=fyko+d2lbbFgONRv9qkxdawL3rfcNHYJY1ZVvWVs7j,s=QSXCR+Q6sek8bf92,i=4096
Step 3:
Now the client creates a ClientProof using the parameters from the server-first message to prove that it holds the right password. After creating the ClientProof, the client sends it to the server. This is called the client-final message.
client-final message:
c=biws,r=fyko+d2lbbFgONRv9qkxdawL3rfcNHYJY1ZVvWVs7j,p=v0X8v3Bz2T0CJGbJQyF0X+HI4Ts=
Step 4:
As a final step, the server verifies the ClientProof to confirm that the client has access to the password. Once the proof verification is complete, the server sends a ServerSignature to the client.
server-final message:
v=rmF9pqV8S7suAoZWja4dJRkFsKQ=
This ServerSignature is compared against the ServerSignature calculated by the client. This ensures that the client is talking to the correct server.
Now the client has established an authenticated connection without exchanging the password with the server.