Power density and energy dissipation of digital ICs have become one of the main concerns in recent years. With the increased usage of battery-powered devices, ubiquitous computing, and implantable biomedical applications, enhancing the energy efficiency of digital systems is one of the key research areas in digital IC design. For applications with a low demand on throughput, sub-threshold digital operation is one of the promising techniques for ultra-low-energy operation. Moreover, the global energy-minimum operating point of a digital static CMOS circuit, if it exists, lies in the sub-threshold regime, so operating there realizes global minimum-energy operation. This doctoral dissertation presents different energy efficiency enhancement methods for sub-threshold digital CMOS circuits. First, a high-level sub-threshold energy model is developed for rapid characterization of digital circuits. Model accuracy is validated with measurements of a circuit fabricated in a 0.18 µm process, and with simulations for technologies with smaller feature sizes. Second, this model is applied to compare the energy efficiency of synchronous and asynchronous circuits. It is shown that, with a suitable external completion detection mechanism, the energy efficiency of asynchronous circuits in the sub-threshold regime is better than that of their synchronous counterparts. Process selection to minimize energy dissipation is investigated. Moreover, it is shown that, with the correct choice of process options, migrating to a technology with a smaller feature size increases the energy efficiency of a circuit. Architectural modifications such as parallelism, pipelining, and folding are also explored and applied to reduce energy dissipation. Finally, a current-sensing completion detection system is proposed and implemented for sub-threshold asynchronous circuits.
Design flows for de-synchronization of synchronous circuits are presented for various cases, with detailed explanations of the sub-blocks of the completion detection system. As a proof-of-concept example, a self-timed cardiac event detector is implemented in a 65 nm CMOS process.