
· 3 min read
Poonai

image of an elephant

As a university student, I was more familiar with Node.js and Mongoose than SQL, so I went with MongoDB instead of a SQL database. When I started working in the real world, I became familiar with Postgres. It has been my go-to database for my projects ever since ❤️.

Here is my mental model of why I choose Postgres for all my projects:

  • tested and proven
  • extension ecosystem
  • JSON for other needs
  • tooling around Postgres
  • community

tested and proven

For more than two decades, Postgres has been actively developed, and it is now used by many large corporations. It's a basic human instinct to stick with tried-and-true methods. Take a look at the screenshot below to see an Instagram engineer flexing his Postgres usage.

postgres at instagram

extension ecosystem

Postgres allows developers to extend its capabilities by writing extensions, since some unique use cases can't be solved by a general-purpose database.

I used the pg_cron extension to solve a unique use case myself.

I wanted to do historic aggregation of a numeric column. The usual approach would be building an ETL pipeline, but I found a solution using pg_cron. You can check this link for the entire story.
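As an illustration of the idea (the table and column names here are made up, not from my actual setup), a pg_cron job that rolls a numeric column up into a daily aggregate could look like this:

```sql
-- Roll raw readings up into a daily aggregate table every night at 3am.
-- `metrics`, `daily_metrics`, `value`, and `created_at` are hypothetical names.
SELECT cron.schedule(
    'daily-metrics-rollup',        -- job name
    '0 3 * * *',                   -- standard cron syntax
    $$INSERT INTO daily_metrics (day, total)
      SELECT date_trunc('day', created_at), sum(value)
      FROM metrics
      WHERE created_at >= now() - interval '1 day'
      GROUP BY 1$$
);
```

The whole "pipeline" lives inside the database, which is exactly why I reached for pg_cron instead of an external scheduler.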

Fellow OSS engineers have open-sourced their extensions for the community to use. Here are some of my favourite extensions:

  • zomboDB integrates Elasticsearch with Postgres for full-text search.
  • pg_cron runs cron jobs inside Postgres.
  • pg_strom accelerates analytics query performance by offloading analytics computation to the GPU.

You can always write your own extension if you don't find one for your use case. Nowadays you can write extensions in Rust as well, using pgx.

JSON for other needs

A usual question that comes up while choosing Postgres is whether it can store complex, loosely structured data. A fact unknown to most developers is that Postgres lets you store and query JSON data.
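As a quick sketch of what that looks like (the table and field names are invented for illustration):

```sql
-- jsonb stores parsed JSON and supports indexing and rich operators.
CREATE TABLE events (id serial PRIMARY KEY, payload jsonb);

INSERT INTO events (payload)
VALUES ('{"type": "signup", "user": {"name": "poonai"}}');

-- `->` walks into the document, `->>` extracts the value as text.
SELECT payload->'user'->>'name'
FROM events
WHERE payload->>'type' = 'signup';
```

So schemaless data and relational data can live side by side in the same database.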

postgres json tweet

Tooling around Postgres

Having a good database alone won't solve the problem; there are other scenarios we need to consider, e.g. backups and running an HA setup. Postgres has all sorts of tooling for running a production database.

  • patroni - running HA Postgres on k8s
  • kubedb - running Postgres on k8s
  • dexter - automatic indexer to optimize DB query performance
  • timescale - turn your Postgres into a time-series database
  • supabase - instant GraphQL APIs from Postgres databases

Community

The Postgres community is very welcoming and has a presence in all the popular social communities:

Ofc, you can join our community as well to talk about postgres :P

Not only does the community have a presence on various social media platforms, but it is also friendly and helps you instantly if you come across any issue.

Closing Notes

Postgres isn't just a database; it's an entire ecosystem of development, research, and innovation. I want to end the essay by saying:

Postgres doing justice to its elephant mascot

· 3 min read
zriyansh

banner for blog

Most of us make silly mistakes, mostly out of lethargy, though some are genuine. But the price we pay is huge, because some of the damage is irreversible. Exposing nearly a million Kubernetes clusters seems rather serious, and it made me revisit some of the security issues of recent days.

Recent Security issues

1. Cyble claim

Cyble claims that they found 900,000 Kubernetes cluster API servers exposed to the public internet due to misconfiguration, though only 799 of them could be exploited in a way that gives the intruder access to the entire cluster.

country-wise distribution of exposed k8s cluster

2. Okta breach

Okta is a publicly traded identity management company that was breached by a hacking group called Lapsus$. As per the investigation, it turned out the hackers were able to get the VPN keys of support engineers through social engineering.

The most disturbing part is that the hackers were inside the network for more than a month without anyone noticing.

mockery on the state of the industry

3. Log4J

If you come from the Java world, you wouldn't have missed the Log4j vulnerability. It allows an attacker to run whatever code they want and gain access to the system. I'm still not sure how many folks have upgraded the version, or whether this vulnerability is still living in un-upgraded systems.

meme on cloudflare

Learnings

From all the cases mentioned, it's clear that if we had followed best practices, we could have avoided most of those incidents.

  • Network policies wouldn't have exposed Kubernetes clusters.
  • Security education and basic monitoring wouldn't have let Lapsus$ sit inside the system for almost a month.
  • Continuous package scanning against supply-chain attacks would make developers upgrade systems at the right time.
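On the first point, a default-deny NetworkPolicy is a common starting point. This is only a sketch; which namespaces it applies to and what you allow back in is up to you:

```yaml
# Deny all ingress traffic to every pod in the namespace;
# re-allow specific traffic with additional, more targeted policies.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
spec:
  podSelector: {}
  policyTypes:
    - Ingress
```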

The present is already full of challenges in the world of network security. Everything indicates that the number of threats users face will continue to grow, so now more than ever, having a good security policy and protection is essential.

We covered only a fraction of security incidents; we have curated a list of leaks around the world in this GitHub repo. The list might shock you when you see your day-to-day tech companies on it.

Let me leave you with a question "Does Remote work lack IAM & cybersecurity oversight?"

Ciao!

· 3 min read
Poonai

photo of security guard

OPA (Open Policy Agent) is a policy enforcement engine that can be used for a variety of purposes. OPA's access policies are written in a language called Rego. A CNCF-graduated project, it has been incorporated into a number of different products. You can see the list of adopters here.

We chose OPA to enforce database access policies because of its flexibility, letting policy authors write policies as they need, and its familiarity in the cloud-native ecosystem.

OPA gives three options for enforcing access policies:

  • go library
  • rest service
  • WASM

The Inspektor dataplane is written in Rust, so we cannot use the Go library to enforce policies in Inspektor. For simplicity, we decided to use WASM to evaluate access policies rather than run a separate REST service.

Rego policies can be compiled into a WASM module using OPA. The compiled WASM module exposes the necessary functions to evaluate policies from other languages.

WASM Compilation

The burrego crate was built by the folks at Kubewarden to evaluate Rego policies in Rust. In this tutorial, we will learn how to evaluate WASM-compiled Rego using the burrego crate.

Let's first write a Rego program to evaluate before moving on to the evaluation itself. The Rego program below sets the rule hello to true if the given input message input.message is world.

package play

default hello = false

hello {
    m := input.message
    m == "world"
}
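If you have the opa CLI handy, you can sanity-check the policy before compiling it (assuming input.json contains {"message": "world"}):

```
opa eval -d policy.rego -i input.json "data.play.hello"
```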

To make use of the policy, run the following command to compile it to WASM for the entrypoint play/hello:

opa build -t wasm -e play/hello policy.rego

The above command will create a bundle.tar.gz file. The tar file contains the following files:

/data.json
/policy.rego
/policy.wasm
/.manifest

For this tutorial, we care only about the policy.wasm file, since it is the compiled WASM module of the Rego policy.

Rust integration

Let's add the burrego crate as a dependency to our Rust program.

[dependencies]
burrego = {git = "https://github.com/kubewarden/policy-evaluator"}

Evaluator::new takes the policy as input and returns an Evaluator object.

let policy = fs::read("./policy.wasm").unwrap();
let mut evaluator = Evaluator::new(
    String::from("demo-policy"),
    &policy,
    &DEFAULT_HOST_CALLBACKS,
)
.unwrap();

During evaluation, an entrypoint id is specified to select which entrypoint to evaluate. Using the entrypoint_id function, the id of an entrypoint can be retrieved. We retrieve the entrypoint id for play/hello in the following snippet.

let entrypoint_id = evaluator.entrypoint_id(&"play/hello").unwrap();

The policy is evaluated using the evaluate function, which takes the entrypoint's id, input, and data as parameters.

let input = serde_json::from_str(r#"{"message":"world"}"#).unwrap();
let data = serde_json::from_str("{}").unwrap();
let hello = evaluator.evaluate(entrypoint_id, &input, &data).unwrap();
println!("{}", hello);

We got true for the play/hello entrypoint because we passed message as world. We would have received false if we had passed a different value.

[{"result":true}]

I hope you learned how to evaluate OPA policies in Rust. Feel free to join our community on Discord, where you can follow our development and participate.

· 7 min read
Poonai

handwritten banner for blog

Many of us use a profiler to measure the CPU or memory consumed by a piece of code. This led me to figure out how profilers work.

To learn about profiling, I grokked a popular profiling crate, pprof-rs. This library is used to measure the CPU usage of a Rust program.

If you are interested in contributing to open source, or want to learn how to read a complex project's source code, I would highly recommend Contributing to Complex Projects by Mitchell Hashimoto.

Basics of profiling

Let's profile a sample Rust program and see how pprof is used.

Here is a modified example program that I've taken from pprof-rs. You can find the full source here.

The sample program counts the prime numbers from 1 to 50000.

fn main() {
    let prime_numbers = prepare_prime_numbers();
    // start profiling
    let guard = pprof::ProfilerGuardBuilder::default()
        .frequency(100)
        .build()
        .unwrap();
    let mut v = 0;
    for i in 1..50000 {
        // use the `is_prime_number1` function only if the incoming
        // value i is divisible by 3.
        if i % 3 == 0 {
            if is_prime_number1(i, &prime_numbers) {
                v += 1;
            }
        } else {
            if is_prime_number2(i, &prime_numbers) {
                v += 1;
            }
        }
    }
    println!("Prime numbers: {}", v);
    // stop profiling and generate the profiled report.
    if let Ok(report) = guard.report().build() {
        let mut file = File::create("profile.pb").unwrap();
        let profile = report.pprof().unwrap();

        let mut content = Vec::new();
        profile.write_to_vec(&mut content).unwrap();
        file.write_all(&content).unwrap();
    };
}

In the above example, we start profiling at the beginning of the program using ProfilerGuardBuilder.

let guard = pprof::ProfilerGuardBuilder::default()
    .frequency(100)
    .build()
    .unwrap();

At the end of the program, we generated and wrote the report to profile.pb file.

if let Ok(report) = guard.report().build() {
    let mut file = File::create("profile.pb").unwrap();
    let profile = report.pprof().unwrap();
    let mut content = Vec::new();
    profile.write_to_vec(&mut content).unwrap();
    file.write_all(&content).unwrap();
};

The report is generated by running the program, and it's visualized using Google's pprof:

 ~/go/bin/pprof --http=localhost:8080  profile.pb

After executing the above command, pprof lets you visualize the profile at http://localhost:8080.

cpu profile of rust program

From the visualized profile, you can clearly see that is_prime_number2 has consumed more CPU than is_prime_number1. That's because is_prime_number1 is used only when the given number is divisible by 3.

Now that we've learned how to profile a Rust program using pprof-rs, let's learn how pprof-rs works internally.

Please don't get too worn out yet! So far, we've learned the basics of profilers and how to use pprof-rs. Before we dig into the internal workings of a profiler, let's take a sip of water to rehydrate ourselves.

image source

Gist of cpu profilers

Before we get into the pprof-rs code, let's learn how CPU profiling works in theory.

A profiler pauses the program at certain intervals of time and resumes it after sampling the current stack trace. While sampling, it takes each stack trace and increments its count. The sampled data is then used to create a flamegraph or something similar.

Stack traces are the list of function calls on the call stack, e.g. is_prime_number_1 -> main

Gist of profiler notes

pprof-rs implementation and its syscalls

start profiling

When you start profiling with ProfilerGuardBuilder, pprof-rs registers a signal handler and a timer that specifies how often the program is supposed to pause.

let guard = pprof::ProfilerGuardBuilder::default()
    .frequency(100)
    .build()
    .unwrap();

registering signal handler

The perf_signal_handler callback function is registered for the SIGPROF signal. Whenever a SIGPROF signal is emitted, perf_signal_handler is invoked.

SIGPROF: This signal typically indicates expiration of a timer that measures both CPU time used by the current process, and CPU time expended on behalf of the process by the system. Such a timer is used to implement code profiling facilities, hence the name of this signal.

Link to the source

let handler = signal::SigHandler::SigAction(perf_signal_handler);
let sigaction = signal::SigAction::new(
    handler,
    signal::SaFlags::SA_SIGINFO,
    signal::SigSet::empty(),
);
unsafe { signal::sigaction(signal::SIGPROF, &sigaction) }?;

specifying interval

The interval between SIGPROF signals is configured using the setitimer syscall. This determines how often a sample is taken.

Link to the source

unsafe {
    setitimer(
        ITIMER_PROF,
        &mut Itimerval {
            it_interval,
            it_value,
        },
        null_mut(),
    )
};

handling SIGPROF signal

Since we registered the perf_signal_handler function to handle SIGPROF signals, it is invoked whenever a SIGPROF signal is emitted. perf_signal_handler takes ucontext as one of its arguments; ucontext contains the current instruction pointer of the machine code being executed.

Using that instruction pointer, the current call stack trace is retrieved. That is done using the backtrace crate. The collected backtrace and thread name are passed to profiler.sample for sampling.

Link to the source

extern "C" fn perf_signal_handler(
    _signal: c_int,
    _siginfo: *mut libc::siginfo_t,
    ucontext: *mut libc::c_void,
) {
    if let Some(mut guard) = PROFILER.try_write() {
        if let Ok(profiler) = guard.as_mut() {
            let mut bt: SmallVec<[<TraceImpl as Trace>::Frame; MAX_DEPTH]> =
                SmallVec::with_capacity(MAX_DEPTH);
            // ucontext is passed to the trace method to retrieve the
            // stack frame of the current instruction pointer.
            TraceImpl::trace(ucontext, |frame| {
                bt.push(frame.clone());
            });
            let current_thread = unsafe { libc::pthread_self() };
            let mut name = [0; MAX_THREAD_NAME];
            let name_ptr = &mut name as *mut [libc::c_char] as *mut libc::c_char;
            write_thread_name(current_thread, &mut name);
            let name = unsafe { std::ffi::CStr::from_ptr(name_ptr) };
            profiler.sample(bt, name.to_bytes(), current_thread as u64);
        }
    }
}

sampling

profiler.sample internally uses a hashmap to insert a stack trace and its count. As a side note, this is a custom implementation of a hashmap rather than Rust's built-in one. That's because heap allocation is forbidden inside a signal handler, so the hashmap can't grow dynamically.

Link to the source

pub fn add(&mut self, key: T, count: isize) -> Option<Entry<T>> {
    let mut done = false;
    self.entries[0..self.length].iter_mut().for_each(|ele| {
        if ele.item == key {
            ele.count += count;
            done = true;
        }
    });
    ...
}

If the incoming stack trace can't find a place in the hashmap, the entry with the least count is evicted from it, and a temporary file is used to store the evicted entry.

Link to the source

let mut min_index = 0;
let mut min_count = self.entries[0].count;
for index in 0..self.length {
    let count = self.entries[index].count;
    if count < min_count {
        min_index = index;
        min_count = count;
    }
}

let mut new_entry = Entry { item: key, count };
std::mem::swap(&mut self.entries[min_index], &mut new_entry);
Some(new_entry)

plotting

The collected stack traces and their counts are passed to the flamegraph crate to create a flamegraph.

Link to the source

pub fn flamegraph_with_options<W>(
    &self,
    writer: W,
    options: &mut flamegraph::Options,
) -> Result<()>
where
    W: Write,
{
    let lines: Vec<String> = self
        .data
        .iter()
        .map(|(key, value)| {
            let mut line = key.thread_name_or_id();
            line.push(';');
            for frame in key.frames.iter().rev() {
                for symbol in frame.iter().rev() {
                    line.push_str(&format!("{}", symbol));
                    line.push(';');
                }
            }
            line.pop().unwrap_or_default();
            line.push_str(&format!(" {}", value));
            line
        })
        .collect();
    if !lines.is_empty() {
        flamegraph::from_lines(options, lines.iter().map(|s| &**s), writer).unwrap();
    }
    Ok(())
}

The flamegraph crate generates a flamegraph from folded stack lines. Example:

main;prime_number1 3
main;prime_number2 5

demo flamegraph

pprof-rs also encodes the sampled data in Google's pprof format, which lets you plot interactive graphs.

In recent days I've been excited about continuous profiling, which has been a hot topic in the observability space. Follow these amazing open-source projects, Parca and Pyroscope, to learn more about continuous profiling.

Reference

  • pprof-rs
  • You can learn more about differential flamegraphs from Brendan Gregg's blog

· 2 min read
Poonai

reflection

I had a use case of escaping all the raw HTML in a REST API request body.

However, it is time-consuming to write a function for each request struct to escape each field. So I came up with the idea of creating a single function called EscapeStruct that uses reflection to detect the layout of a struct and then escape all of its string fields.

Reflection in Golang allows you to inspect and manipulate structures at runtime. The reflect package contains all reflection-related functions.

Let's walk through the implementation of EscapeStruct which I used to escape all struct string fields.

func EscapeStruct(in interface{}) {
	reflectStruct := reflect.ValueOf(in).Elem()
	escapeValue(reflectStruct)
}

The EscapeStruct function accepts an interface as an argument, allowing us to pass any struct pointer. For the purposes of this blog post, assume that we will only pass a pointer to a struct, so that we can ignore all edge cases and emphasize the solution's core.

reflect.ValueOf returns the reflect.Value, which contains the concrete value of the interface we passed. The Elem method will return the reflect.Value of the struct to which the given pointer points.

Now we have the reflect.Value of the underlying struct to which the given pointer points. This is then passed to the escapeValue function.

func escapeValue(in reflect.Value) {
	if in.Kind() == reflect.Struct {
		n := in.NumField()
		for i := 0; i < n; i++ {
			field := in.Field(i)
			escapeValue(field)
		}
	}

	if in.Kind() == reflect.Ptr {
		escapeValue(in.Elem())
		return
	}

	if in.Kind() == reflect.String {
		if in.CanSet() {
			in.SetString(html.EscapeString(in.String()))
		}
		return
	}
}

The escapeValue function takes a reflect.Value as an argument and handles three cases. If the given argument:

  • is of struct kind, we iterate through the fields of the struct and pass each field back to the escapeValue function
  • is a pointer, we get the underlying object the pointer points to using the Elem method, and pass it to the escapeValue function
  • is of string kind, we check whether it can be mutated using the CanSet method, because private fields can't be mutated. If the string field can be mutated, we set the escaped value using the SetString method.

Because we pass the struct's fields back to the escapeValue function, the string fields in deeply nested structs are checked recursively.

· 2 min read
Poonai

Every programming language has a set of keywords that are reserved for specific purposes. In Rust, for example, the keyword for is used for looping.

Because keywords have meaning in programming languages, they cannot be used to name a function or variable. For example, the keywords for and in cannot be used as variable names.

Although keywords are not meant to be used to name variables, Rust lets you do so by using a raw identifier.

The program below will not compile, because in is a reserved keyword.


#[derive(Debug)]
struct Test {
    in: String,
}

fn main() {
    let a = Test {
        in: "sadf".to_string(),
    };
    println!("{:?}", a);
}

output:

error: expected identifier, found keyword `in`
--> src/main.rs:4:5
|
4 | in: String
| ^^ expected identifier, found keyword
|
help: you can escape reserved keywords to use them as identifiers
|
4 | r#in: String
| ~~~~

error: expected identifier, found keyword `in`
--> src/main.rs:9:9
|
9 | in: "sadf".to_string()
| ^^ expected identifier, found keyword
|
help: you can escape reserved keywords to use them as identifiers
|
9 | r#in: "sadf".to_string()
| ~~~~

However, we can make the program work by prefixing the keyword with r#.

r# tells the compiler that the following token is an identifier rather than a keyword.


#[derive(Debug)]
struct Test {
    r#in: String,
}

fn main() {
    let a = Test {
        r#in: "sadf".to_string(),
    };
    println!("{:?}", a);
}

output:

Test { in: "sadf" }

Raw identifiers are very useful because they allow Rust to introduce new keywords without breaking existing code.

Assume we have a crate built with the 2015 edition of Rust that exposes an identifier named try. try was later reserved for a feature in the 2018 edition. As a result, we must use a raw identifier to call try.

reference

· 3 min read
Poonai

Introduction

TLS is the trusted way of sending messages over a TCP connection: it encrypts the payload before sending and decrypts it after receiving. But if you send plain text over a plain connection, it can easily be snooped on.

So, if we send a password as plain text over a plain TCP connection, an attacker can view the password and use it to take control of the resource.

This raises the question: how do we authenticate a user over an insecure connection without revealing the password?

SASL comes to the rescue

This has been solved using SASL (Simple Authentication and Security Layer). If you come from a DevOps background, you might have noticed SASL errors. SASL is used in popular projects like Postgres, MongoDB, and Kafka.

I got to know about SASL while adding Postgres support to Inspektor.

In this blog, I'll explain how SCRAM(Salted Challenge Response Authentication Mechanism) works, which is part of SASL family.

Working of SCRAM

SCRAM establishes an authenticated connection through a four-step handshake:

Step 1:

The client sends a nonce (a nonce is nothing but randomly chosen bytes) and the user's username to the server to initiate the handshake. This message is called the client-first message.

client first message. 
n,,n=user,r=fyko+d2lbbFgONRv9qkxdawL

Step 2:

After receiving the client-first message, the server replies with its own nonce (appended to the client's nonce), a salt, and an iteration count. This message is called the server-first message.

server first message
r=fyko+d2lbbFgONRv9qkxdawL3rfcNHYJY1ZVvWVs7j,s=QSXCR+Q6sek8bf92,
i=4096

Step 3:

Now, the client creates a ClientProof using the parameters from the server-first message, to prove that it has the right password. After creating the ClientProof, the client sends it to the server. This is called the client-final message.

If you are curious how the proof is calculated, you can refer to section 3 of the SCRAM RFC (https://datatracker.ietf.org/doc/html/rfc5802#section-3).

client final message
c=biws,r=fyko+d2lbbFgONRv9qkxdawL3rfcNHYJY1ZVvWVs7j,
p=v0X8v3Bz2T0CJGbJQyF0X+HI4Ts=

Step 4:

As a final step, the server verifies the ClientProof to confirm that the client has access to the password. After the proof verification is completed, the server sends a ServerSignature to the client.

server final message 
v=rmF9pqV8S7suAoZWja4dJRkFsKQ=

The received ServerSignature is compared against the ServerSignature calculated by the client. This ensures that the client is talking to the correct server.

Now the client has established an authenticated connection without exchanging the password with the server.

Conclusion

SASL is not an alternative to TLS; rather, it can be used along with TLS to harden the authentication process.