When debugging, I often want to measure a piece of code's performance and memory characteristics. I do this with the stdlib time module and the psutil module. These are some snippets I always have in my utils before starting a project. Note: I use perf_counter (i.e., the performance counter clock) to measure time. For more details, see this. Measuring time and memory: I use Python's contextmanager to manage state. Real Python has a nice post on this....
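A minimal sketch of the kind of helper described above, not the author's actual snippet. To stay inside the standard library it swaps psutil for tracemalloc, which tracks peak Python-level allocations rather than process memory. The name `measure` and the keys of the `stats` dict are hypothetical.

```python
import time
import tracemalloc
from contextlib import contextmanager


@contextmanager
def measure(label="block"):
    """Time a block with perf_counter and record peak Python allocations.

    Assumption: tracemalloc stands in for psutil here, so the figure is
    peak memory allocated by Python objects, not the process RSS.
    """
    stats = {"label": label}
    tracemalloc.start()
    start = time.perf_counter()
    try:
        yield stats
    finally:
        # Filled in even if the block raises, so partial runs are still measured.
        stats["seconds"] = time.perf_counter() - start
        _, stats["peak_bytes"] = tracemalloc.get_traced_memory()
        tracemalloc.stop()


with measure("build list") as m:
    data = [i * i for i in range(100_000)]

print(f"{m['label']}: {m['seconds']:.4f}s, peak {m['peak_bytes'] / 1e6:.2f} MB")
```

The context manager yields the dict before the block runs and fills it in afterwards, so the caller reads the numbers once the `with` block has exited.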
Logging in Python
Python does not come with reasonable logging defaults. The following code usually prints nothing:

    import logging
    logging.info("Hello world")

While it does provide `logging.basicConfig()`, which kinda works (it logs to stderr by default), I prefer setting this up myself. This snippet sets up logging, adds a formatter, and sends error and info logs to their respective standard streams. Without a filter for errors, log aggregators confuse a Python error log with an info log....
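A rough sketch of such a setup, under stated assumptions: records below ERROR go to stdout and ERROR and above go to stderr. The class `BelowLevelFilter`, the function `setup_logging`, and the format string are hypothetical names and choices, not the author's code.

```python
import logging
import sys


class BelowLevelFilter(logging.Filter):
    """Pass only records strictly below a given level."""

    def __init__(self, level):
        super().__init__()
        self.level = level

    def filter(self, record):
        return record.levelno < self.level


def setup_logging(level=logging.INFO):
    """Configure the root logger with one handler per standard stream."""
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")

    # stdout: everything from `level` up to, but not including, ERROR.
    out_handler = logging.StreamHandler(sys.stdout)
    out_handler.setLevel(level)
    out_handler.addFilter(BelowLevelFilter(logging.ERROR))
    out_handler.setFormatter(formatter)

    # stderr: ERROR and above only.
    err_handler = logging.StreamHandler(sys.stderr)
    err_handler.setLevel(logging.ERROR)
    err_handler.setFormatter(formatter)

    root = logging.getLogger()
    root.setLevel(level)
    root.handlers.clear()  # avoid duplicate output if called twice
    root.addHandler(out_handler)
    root.addHandler(err_handler)
    return root


log = setup_logging()
log.info("Hello world")         # written to stdout
log.error("Something broke")    # written to stderr
```

The filter on the stdout handler is what keeps an error record from appearing on both streams; a handler level alone only sets a lower bound.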
Sed to Rename files
Sed stands for "stream editor". Here is a nice way to rename files with regex using sed. I was running a user study today and mistyped the file prefix, which created hundreds of files with the wrong name. My initial thought was to write a script to fix it, but I decided to look up sed instead. Here is how I did it:

    $ touch fooops_1.txt fooops_2.txt fooops_3.txt
    $ ls

Let's say our goal was to type "foobar" as the prefix....
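For comparison, the script route mentioned above could look roughly like this in Python. This is not the sed approach the post describes; `rename_prefix` and its `dry_run` parameter are hypothetical, and the example runs in a temporary directory so it touches no real files.

```python
import re
import tempfile
from pathlib import Path


def rename_prefix(directory, pattern, replacement, dry_run=True):
    """Apply a regex substitution to file names; return (old, new) name pairs.

    With dry_run=True nothing is renamed, which makes it easy to check the
    pattern first. Note that Path.rename may overwrite an existing target.
    """
    changes = []
    # sorted() materialises the listing before any file is renamed.
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        new_name = re.sub(pattern, replacement, path.name)
        if new_name != path.name:
            changes.append((path.name, new_name))
            if not dry_run:
                path.rename(path.with_name(new_name))
    return changes


with tempfile.TemporaryDirectory() as tmp:
    for i in (1, 2, 3):
        (Path(tmp) / f"fooops_{i}.txt").touch()
    print(rename_prefix(tmp, r"^fooops", "foobar", dry_run=False))
    print(sorted(p.name for p in Path(tmp).iterdir()))
```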
Continuous Reasoning: Scaling the impact of formal methods
Paper link. This paper discusses Infer, a static program analysis tool, and its impact at Facebook. Infer is based on a program analysis method called continuous reasoning. Summary: This paper describes work in continuous reasoning, where formal reasoning about a (changing) codebase is done in a fashion that mirrors the iterative, continuous model of software development increasingly practiced in industry. Given the prevalence of CI/CD pipelines and code review processes, the author suggests that continuous reasoning will allow formal analysis to scale to large codebases if it is integrated into the programmer's workflow....
Entropy as an Error Measure
In his paper A Mathematical Theory of Communication, Shannon represented a communication system using the following schematic: He defined entropy, a quantity that forms the basis of information theory. Entropy: Information entropy is interpreted in many ways. One way that I like to think about it is in terms of "how much randomness is present in the state space?" (similar to Boltzmann's entropy). It is defined as the following:...
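As a small illustration of the "how much randomness" reading, here is a sketch that assumes the standard discrete Shannon entropy, H = -sum(p_i * log2(p_i)), measured in bits. The function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(probabilities):
    """Entropy in bits of a discrete distribution; zero-probability terms are skipped."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


def empirical_entropy(symbols):
    """Entropy in bits of the observed symbol frequencies in a sequence."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return shannon_entropy(c / total for c in counts.values())


print(shannon_entropy([0.5, 0.5]))   # fair coin
print(shannon_entropy([1.0]))        # a certain outcome
print(empirical_entropy("aabb"))     # two equally frequent symbols
```

A fair coin gives 1 bit and a certain outcome gives 0 bits, which matches the intuition that entropy grows with the spread of the state space.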
Toward a Unified Ontology of Cloud Computing
Toward a Unified Ontology of Cloud Computing. L. Youseff, M. Butrico, and D. Da Silva, 2008 Grid Computing Environments Workshop, Austin, TX, 2008, pp. 1-10. This paper is one of the relatively early works that summarizes the various components of cloud computing. At the time (2008), AWS had been on the market for only a couple of years and Google Cloud was just getting started. Thinking back, the classification described here is pretty much how most offerings these days are grouped....
Why are bugs attracted to light
SmarterEveryDay video. Entomology is the study of insects. Phototaxis is the movement of an organism towards or away from light; organisms are positively or negatively phototactic depending on whether they move towards or away from it. Theory 1: Moths and other bugs use heavenly bodies like the sun or moon to orient themselves while flying. They try to align themselves at a certain angle with the moon. But if the moon is brought super close, the moth has to pitch up in order to maintain the angle, and this causes a logarithmic spiral into the source....
R: Quote vs Substitute
What is the difference between the following two code blocks, even though they produce the same output? If you are not sure, this post will help you.

    rm(list=ls())
    x <- 1:1e8
    g <- function(a){
      b <- substitute(a)
      print(eval(b))
      print(eval(b))
    }
    g(mean(x))
    [1] 5e+07
    [1] 5e+07

    rm(list=ls())
    x <- 1:1e8
    g <- function(a){
      b <- quote(a)
      print(eval(b))
      print(eval(b))
    }
    g(mean(x))
    [1] 5e+07
    [1] 5e+07

One of the really (really) cool features of R is the idea of non-standard evaluation....
The Great FFT
If you are in the field of software, you've probably wondered at some point: what are the coolest algorithms ever discovered? As a fun task, I decided to try to understand SIAM's top 10 algorithms of the 20th century. The Fast Fourier Transform (FFT) algorithm is revolutionary. Applications of the FFT touch nearly every area of engineering in some way. The Cooley-Tukey paper rediscovered (It was found in Gauss's notes for calculations in astronomy!...
Sketch Tutor - Game based learning
Sketch recognition is the automated recognition of hand-drawn diagrams. In general, sketch recognition techniques can be classified into three types. Appearance-based: This comes more from the field of computer vision, but is not very useful for varying shapes. It does not take temporal data into account. Gesture-based: Most useful for forensic methods, but requires user-specific training. Every individual has their own quirks when sketching! Geometric-based: Models are built based on geometric constraints....