This paper addresses the design of self-timed, energy-minimum circuits operating in the sub-threshold (sub-VT) domain. It presents a generic implementation template based on bundled-data circuitry and current-sensing completion detection. To support this template, a fully decoupled latch controller that integrates the current-sensing circuitry has been developed. The paper also outlines a corresponding design flow, based on contemporary synchronous EDA tools, that transforms a synchronous design into an equivalent self-timed circuit. The design flow and the current-sensing technique are validated by implementing an asynchronous version of a wavelet-based event detector for cardiac pacemaker applications in a standard 65 nm CMOS process. The chip has been fabricated; the area overhead due to power-domain separation and completion-detection circuitry is 13.6%, and asynchronous operation improves throughput by 52.58%. When this throughput improvement is traded for energy savings, energy dissipation is reduced by 16.8% at the energy-minimum supply voltage.