AI Discovers Faster Sorting Algorithms Using Deep Reinforcement Learning

Researchers at Google DeepMind have trained an AI agent named AlphaDev to discover faster sorting algorithms. Using deep reinforcement learning, AlphaDev learned to assemble sorting code from simple CPU instructions while optimizing for speed.

Current sorting algorithms, designed by computer scientists, have become difficult for humans to improve further. The researchers wanted to see if AI could discover new optimizations.

They formulated sorting as a reinforcement learning problem. AlphaDev needed to construct sorting code from basic instructions to optimize both correctness and speed. The vast space of potential sorting algorithms made this extremely challenging.

New Sorting Techniques Discovered

AlphaDev discovered sorting algorithms that were shorter and faster than human-designed sorting networks. Some of these algorithms were up to 70% faster for short sequences and have been integrated into libc++, the LLVM C++ standard library used by millions of developers.
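
For readers unfamiliar with the term, a sorting network is a fixed sequence of compare-exchange steps that sorts any input of a given size. The sketch below is a textbook three-input network written in Python for readability, not the assembly AlphaDev produced; the names `SORT3_NETWORK`, `apply_network`, and `network_sorts_everything` are illustrative.

```python
from itertools import permutations

# A comparator (i, j) with i < j leaves the smaller value at index i.
# This three-comparator layout is a standard textbook network for 3 inputs.
SORT3_NETWORK = [(0, 2), (0, 1), (1, 2)]


def apply_network(network, values):
    """Run each compare-exchange step in order and return a new list."""
    out = list(values)
    for i, j in network:
        if out[i] > out[j]:
            out[i], out[j] = out[j], out[i]
    return out


def network_sorts_everything(network, width):
    """Check the network on every ordering of `width` distinct values."""
    return all(
        apply_network(network, p) == sorted(p)
        for p in permutations(range(width))
    )
```

Because the sequence of comparisons never depends on the data, such a network maps naturally onto a short, branch-free run of instructions, which is why it is a common baseline for sorting very small inputs.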

AlphaDev invented new techniques like the "AlphaDev swap move" and "AlphaDev copy move" that saved an instruction each time, making the sorting code more efficient.
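
The article does not spell out what these moves do. As a rough illustration of how one instruction can be dropped safely, the swap move is usually summarized as follows: if an earlier comparator already guarantees B ≤ C, then min(A, B, C) can be replaced by min(A, B). The sketch below only checks that identity in Python; the function names are hypothetical and this is not AlphaDev's assembly.

```python
from itertools import product


def min_three(a, b, c):
    """Smallest of three values: needs two comparisons."""
    return min(a, min(b, c))


def min_two(a, b, c):
    """Shortcut that ignores c: needs one comparison."""
    return min(a, b)


def equivalent_when_b_le_c(values):
    """True if both versions agree on every triple with b <= c."""
    return all(
        min_three(a, b, c) == min_two(a, b, c)
        for a, b, c in product(values, repeat=3)
        if b <= c
    )
```

The shortcut is only valid under the precondition: for the triple (2, 3, 1), where B > C, the two functions disagree. That dependence on context is what makes this kind of saving easy to miss by hand.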

For variable-length sorting, AlphaDev found algorithms that differ fundamentally from existing techniques. These led to latency savings of up to 3x for some inputs.

Deep Reinforcement Learning Powers Discoveries

AlphaDev's deep reinforcement learning approach explored fewer candidate programs than stochastic search techniques, yet found better solutions. The advantage was most pronounced for more complex algorithms with branching.

The researchers believe the combination of deep reinforcement learning, search, and strong representations enabled AlphaDev to navigate the vast space of possible sorting algorithms and discover novel optimizations.

Potential Real-world Impact

The new sorting algorithms discovered by AlphaDev and integrated into the widely used LLVM library are estimated to be called trillions of times per day. This could yield substantial performance gains.

AlphaDev demonstrates that deep reinforcement learning has the potential to automatically discover novel algorithms and optimizations that exceed human-designed solutions. This approach could help advance fundamental algorithms that are ubiquitous in software and systems.

In short, the researchers trained AlphaDev, an AI agent, to discover sorting algorithms through deep reinforcement learning that are faster and more efficient than existing human-designed techniques. This research demonstrates the potential for AI to automatically discover novel optimizations for critical algorithms.

FAQs:

How can AlphaDev's sorting algorithms be integrated into other libraries?

Reverse engineer the discovered assembly algorithms into high-level code. AlphaDev generates optimized assembly code, which needs to be converted into a language like C/C++ that can be integrated into libraries.
Integrate the high-level code into the library's source code. The optimized sorting routines would need to be added to the relevant files in the library's source code, for example standard library sort functions.
Rebuild the library. The library would then need to be rebuilt so that it includes the new sorting routines.
Release an updated version of the library. The library maintainers would then release an updated version containing the optimized sorting algorithms. This update would need to be integrated into software that uses the library.
Ensure correctness. Extensive testing would need to be performed to ensure the sorting routines are correctly implemented and produce the expected results. AlphaDev's training already checks candidate programs against test inputs, but the translated high-level code needs its own verification.
Deal with compiler optimizations. Compilers often apply their own optimizations which could interfere with the discovered sorting algorithms. The high-level code may need revisions to work well with different compilers and optimization levels.
Benchmark performance. The performance of the sorting algorithms would need to be benchmarked on the target hardware and software environment to ensure they provide the expected latency improvements.
Handle platform variations. Different platforms and hardware may require variants of the sorting routines for optimal performance. Platform-specific versions may need to be developed and integrated.
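
The "ensure correctness" step above can be sketched as an exhaustive comparison against a trusted reference. This is a minimal illustration, assuming the translated routine handles exactly three items; `sort3_candidate` and `find_counterexample` are hypothetical names, and a real library would run such checks in its own test suite and language.

```python
from itertools import product


def sort3_candidate(values):
    """Hypothetical hand-translated routine for exactly three items."""
    a, b, c = values
    if a > c:
        a, c = c, a
    if a > b:
        a, b = b, a
    if b > c:
        b, c = c, b
    return [a, b, c]


def find_counterexample(candidate, width, alphabet=range(4)):
    """Return the first input the candidate gets wrong, or None.

    Drawing `width` items from a slightly larger alphabet covers every
    relative ordering, including inputs with repeated values.
    """
    for case in product(alphabet, repeat=width):
        if candidate(list(case)) != sorted(case):
            return case
    return None
```

Exhaustive checking is only practical for the very short fixed-length routines discussed here; longer inputs would need randomized or property-based tests instead.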

Integrating AlphaDev's sorting algorithms into a library would require translating them into high-level code, merging that code into the library source, rebuilding, testing extensively, benchmarking, and adapting to compiler optimizations and platform variations. The process would likely require close collaboration between the AI researchers and library maintainers.
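
The benchmarking step might look like the sketch below, which times a candidate against a baseline on identical inputs and reports the median of several runs. It assumes Python's standard `timeit` module purely to show the shape of the harness; `sort3_candidate`, `median_seconds`, and `compare` are hypothetical names, and timings of Python code say nothing about native-code latency.

```python
import random
import timeit


def sort3_candidate(values):
    """Hypothetical three-item routine under test."""
    a, b, c = values
    if a > c:
        a, c = c, a
    if a > b:
        a, b = b, a
    if b > c:
        b, c = c, b
    return [a, b, c]


def median_seconds(func, inputs, repeats=5, number=20):
    """Median time for `number` passes over `inputs`, across `repeats` runs."""
    def run():
        for item in inputs:
            func(item)

    timings = timeit.repeat(run, repeat=repeats, number=number)
    return sorted(timings)[len(timings) // 2]


def compare(candidate, baseline, width=3, samples=1000, seed=0):
    """Time both functions on the same randomly generated inputs."""
    rng = random.Random(seed)  # fixed seed keeps runs comparable
    inputs = [
        [rng.randrange(100) for _ in range(width)] for _ in range(samples)
    ]
    return {
        "candidate": median_seconds(candidate, inputs),
        "baseline": median_seconds(baseline, inputs),
    }
```

Calling `compare(sort3_candidate, sorted)` returns both medians in seconds. Reporting a median rather than a single run reduces the influence of scheduling noise, which matters when the routine under test takes only a handful of instructions.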

What are some common pitfalls to avoid when integrating sorting algorithms into libraries?

Compiler optimizations - As mentioned, compiler optimizations can interfere with sorting algorithms, especially those that rely on specific micro-optimizations. Tests with different optimization levels and compilers are important.
Platform variations - Some sorting routines may only be optimized for certain hardware platforms. Testing and developing variants for other platforms is critical to avoid performance degradations.
Memory use - Some sorting algorithms may use more memory than others, which could be an issue for memory-constrained environments. Memory usage should be profiled and optimized if needed.
Concurrency - If the library code is designed to be used in concurrent or multithreaded applications, the sorting routines must be thread-safe to avoid data races or non-deterministic behavior.
API compatibility - The sorting API exposed by the library (function names, arguments, etc.) must be compatible with the current API to avoid breaking existing software.
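Stability - A stable sort preserves the original order of equal elements; an unstable sort does not. The replacement must keep whatever guarantee the existing routine documents: an unstable algorithm cannot stand in for a stable one such as C++'s std::stable_sort, whereas std::sort makes no stability promise.
Exception handling - The sorting routines must properly handle exceptions and error conditions, such as a comparison function that throws, so that the library remains robust.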
Corner cases - Edge cases and unusual inputs must be thoroughly tested to ensure the sorting algorithms work correctly in all scenarios.
Performance regressions - Extensive performance benchmarking is required to ensure the algorithms provide the expected latency improvements across a wide range of inputs and hardware.
Documentation - The library documentation must be updated to describe any changes to the sorting API and behavior.
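
The stability pitfall is easy to check mechanically. The sketch below tags each record with its original position, sorts by key only, and verifies that equal keys keep their original order. The helper `is_stable_on` and the deliberately unstable `selection_sort` are hypothetical illustrations; the check assumes the routine under test accepts a `key` argument like Python's built-in `sorted`.

```python
def is_stable_on(sort_func, records, key):
    """True if records with equal keys keep their original relative order."""
    tagged = list(enumerate(records))  # (original_index, record)
    result = sort_func(tagged, key=lambda pair: key(pair[1]))
    return all(
        key(a[1]) != key(b[1]) or a[0] < b[0]
        for a, b in zip(result, result[1:])
    )


def selection_sort(items, key):
    """Swap-based selection sort: correct, but not stable."""
    out = list(items)
    for i in range(len(out)):
        smallest = min(range(i, len(out)), key=lambda j: key(out[j]))
        out[i], out[smallest] = out[smallest], out[i]
    return out
```

A passing result on one input does not prove stability in general, so a check like this is best run over many inputs that contain duplicate keys.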

The key is to thoroughly test sorting algorithms in the context of the target library, across dimensions such as compiler optimizations, memory use, concurrency, stability, exceptions, corner cases, and performance, and to keep the documentation in step with any changes. With sufficient testing and tuning, newly discovered sorting routines have the potential to significantly improve the performance of software libraries.

How is reinforcement learning used in AlphaDev's sorting algorithms?

The task of discovering sorting algorithms is formulated as a reinforcement learning problem. The AI agent is AlphaDev and the environment is the sorting game.
The state consists of the current partial sorting algorithm and the state of memory and registers after executing that algorithm on some inputs.
The actions are appending valid assembly instructions to the partial sorting algorithm.
A reward is given based on how correct and how fast the sorting algorithm is. Correctness is evaluated by running the algorithm on test input sequences and comparing the output with the expected sorted result. Latency is measured from the algorithm's execution time.
AlphaDev uses a variation of the AlphaZero reinforcement learning algorithm, which pairs a neural network with Monte Carlo tree search.
The neural network learns a value function that predicts how fast partial sorting algorithms will be, and a policy that predicts the best next instruction to append.
During the search, the value function is used as a heuristic to guide the tree search toward faster sorting algorithms.
The policy and value functions are trained on the games played during the search to improve for the next iteration.
This reinforcement learning loop allows AlphaDev to discover sorting algorithms that are both correct and optimized for latency by maximizing the accumulated reward signals.
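
The state, action, and reward described above can be made concrete with a toy environment. Everything in this sketch is an assumption made for illustration: the three-instruction set, the register layout, the length penalty standing in for latency, and the names `execute` and `SortGame`. It is not AlphaDev's actual environment or reward.

```python
from itertools import permutations


def execute(program, inputs, scratch=1):
    """Run a toy program; registers hold the inputs plus scratch space.

    Toy instruction set (not real x86 semantics):
      ("mov", dst, src)    regs[dst] = regs[src]
      ("cmp", a, b)        flag = regs[a] > regs[b]
      ("cmovg", dst, src)  regs[dst] = regs[src] if flag is set
    """
    regs = list(inputs) + [0] * scratch
    greater = False
    for op, x, y in program:
        if op == "mov":
            regs[x] = regs[y]
        elif op == "cmp":
            greater = regs[x] > regs[y]
        elif op == "cmovg":
            if greater:
                regs[x] = regs[y]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return regs


class SortGame:
    """Single-player game: append instructions until the program sorts."""

    def __init__(self, width=2, length_penalty=0.01, max_len=8):
        self.width = width
        self.length_penalty = length_penalty
        self.max_len = max_len
        self.tests = list(permutations(range(1, width + 1)))
        self.program = []

    def state(self):
        """Partial program plus register contents after running each test."""
        registers = tuple(tuple(execute(self.program, t)) for t in self.tests)
        return tuple(self.program), registers

    def step(self, instruction):
        """Action: append one instruction. Returns (state, reward, done)."""
        self.program.append(instruction)
        correct = sum(
            execute(self.program, t)[: self.width] == sorted(t)
            for t in self.tests
        )
        # Reward: fraction of tests sorted, minus a penalty per instruction.
        reward = correct / len(self.tests) - self.length_penalty * len(self.program)
        done = correct == len(self.tests) or len(self.program) >= self.max_len
        return self.state(), reward, done
```

In this toy game, appending `("mov", 2, 0)`, `("cmp", 0, 1)`, `("cmovg", 0, 1)`, and `("cmovg", 1, 2)` in turn sorts two values and ends the episode. A learning agent would have to find such a sequence itself, guided only by the reward.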

To sum up, reinforcement learning is used by:

• Formulating sorting algorithm discovery as a reinforcement learning problem
• Estimating the reward signal based on correctness and latency
• Using AlphaZero and tree search to efficiently explore the vast space of potential sorting algorithms
• Learning value functions to predict the latency of partial sorting algorithms
• Guiding the search towards faster solutions using the value functions
• Improving the policy and value functions by training on the games played during the search
• Maximizing the accumulated reward (correctness and sorting speed) over time
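
To show how a policy and a value estimate can jointly guide a tree search, the sketch below implements a PUCT-style selection rule of the kind used in AlphaZero-like systems. The exploration constant, the dictionary layout, and the function names are assumptions for illustration; the article does not state the exact rule or constants AlphaDev uses.

```python
import math


def puct_score(q_value, prior, parent_visits, child_visits, c_puct=1.25):
    """Value estimate plus a bonus that favors likely, rarely tried moves."""
    exploration = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q_value + exploration


def select_action(children, c_puct=1.25):
    """Pick the child with the highest score.

    `children` maps an action to a dict with keys 'q' (mean value so far),
    'prior' (policy probability), and 'visits' (visit count).
    """
    parent_visits = sum(child["visits"] for child in children.values())
    return max(
        children,
        key=lambda action: puct_score(
            children[action]["q"],
            children[action]["prior"],
            parent_visits,
            children[action]["visits"],
            c_puct,
        ),
    )
```

With `c_puct` set to 0 the rule picks purely on the value estimate; larger values shift the search toward instructions the policy rates highly but that have been visited less often.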
