I am primarily interested in creating learning algorithms for robotics and control problems.

Anna Deichler

anna.deichler@gmail.com

PhD in Computer Science •Stockholm• *2020-2025*

• main topic: machine learning for nonverbal behaviour adaptation in robotics

• supervisors: Jonas Beskow, Iolanda Leite

MSc in Systems and Control •Delft• *2015-2018*

• robotics profile

• core modules in control theory, optimization, nonlinear system theory, electives in deep learning, computer vision, artificial intelligence

• thesis project: Generalization and locality in the AlphaZero algorithm (thesis supervisors: Thomas Moerland, Simone Baldi)

• Planning • Monte Carlo tree search • Deep learning • Python • TensorFlow

**Generalization and locality in the AlphaZero algorithm**

AlphaGo was the first program to achieve professional human-level performance in the game of Go. It combined pattern knowledge, captured by a deep neural network, with search using Monte Carlo tree search (MCTS).
MCTS uses local and dynamic position evaluation in contrast to traditional search methods, where static evaluation functions store knowledge about all positions.
It has been suggested that the locality of information is the main strength of the MCTS algorithm. As each edge stores its own statistics, it is easier to locally separate the effect of actions.
On the other hand, the main strength of deep neural networks is their generalization capacity, which allows them to transfer information from previous experience to new situations.
It can be argued that the success of AlphaGo can be explained by the complementary strengths of MCTS and deep neural networks.
The thesis examines the relative importance of local search and generalization in the AlphaZero algorithm in single-player, deterministic and fully observable reinforcement learning environments (OpenAI Gym and PyBullet Gym environments).

The locality versus generalization question was examined by varying the number of MCTS iterations, N_MCTS, while keeping the other hyperparameters of the algorithm fixed. The N_MCTS parameter corresponds to the number of simulated trajectories performed using the environment emulator before each action selection step in the real environment. Under a fixed time budget, the number of MCTS iterations determines how much effort is spent on acquiring more accurate values by building large search trees at each decision step versus improving generalization by updating the network more frequently.

Instead of performing a fixed number of MCTS iterations at each decision step, adaptively changing N_MCTS
based on the uncertainty of the current state’s value estimate could increase computational efficiency and performance.
N_MCTS can be set at each decision step by comparing the variance of the returns at the root to a rolling baseline estimate.
If the estimates are relatively uncertain, additional iterations are carried out.
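The adaptive rule described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code used in the thesis: `simulate` is a hypothetical callable standing in for one MCTS iteration that returns the return backed up to the root, and `n_base`, `n_extra`, `n_max` and `threshold` are made-up parameter names and values.

```python
from collections import deque
from statistics import mean, pvariance


def run_adaptive_search(simulate, baseline, n_base=32, n_extra=16,
                        n_max=128, threshold=1.0):
    """Collect root returns for one decision step with an adaptive budget.

    simulate: hypothetical callable, one MCTS iteration -> root return.
    baseline: deque holding root-return variances of recent decision steps.
    """
    # Always spend the base budget first.
    returns = [simulate() for _ in range(n_base)]
    while len(returns) < n_max:
        variance = pvariance(returns)
        # With an empty baseline there is nothing to compare against yet.
        if not baseline or variance <= threshold * mean(baseline):
            break
        # Estimates look relatively uncertain: run another batch.
        batch = min(n_extra, n_max - len(returns))
        returns.extend(simulate() for _ in range(batch))
    # Update the rolling baseline with this step's final variance.
    baseline.append(pvariance(returns))
    return returns
```

A caller would keep one `deque(maxlen=k)` per training run and pass it to every decision step, so that the baseline tracks how uncertain recent root estimates have been.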

Visualization of the additional iterations during the learning process while escaping the valley in the mountain-car environment.

thesis supervisors: Thomas Moerland, Simone Baldi

thesis available at TU Delft repository

BSc in Mechatronics Engineering•Budapest• *2011-2015*

• applied mechanics profile

• core modules in fluid mechanics, multibody dynamics, solid mechanics, vibrations, electrodynamics, sensor technology

• thesis project: The application of generalized hold functions in delayed digital control systems (supervisor: Tamas Insperger)

• Digital control • Stability analysis • Matlab • Mathematica

**Generalized hold function in delayed digital control systems**

The thesis examines system stability for different hold functions applied in digital control, with and without time delays in the control system.
The stability analysis was carried out for the classical control problem of balancing an inverted pendulum with a discrete-time PD controller, using the zero-order, first-order, second-order and system-matched hold (SMH) functions.
The system-matched hold is a special form of the generalized sampled-data hold function, where the hold function is determined from the system dynamics.
The results were presented as stability charts constructed in the plane of the proportional and derivative control gains.
The analysis showed that higher-order hold functions can increase the size of the stable region in the gain plane.
The hold function for the SMH was also constructed, and it was shown that the stable region becomes infinite when no time delay is assumed in the system.
In all cases, the presence of time delays significantly decreases the stable region.
The critical pendulum length for a given time delay is the smallest pendulum length that can be stabilized.
The critical length was calculated for the zero-order hold (ZOH), and it was then demonstrated that higher-order hold functions decrease the critical length for the given time delay.
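As a rough illustration of how one point of such a stability chart can be tested, the sketch below covers only the simplest case: a zero-order hold with no time delay. It assumes a linearized pendulum model θ'' = aθ + u with a > 0, and the values of `a` and the sampling period `h` are made up for the example. It is not the thesis code and does not cover the higher-order holds, the SMH or delayed feedback.

```python
import math


def closed_loop_is_stable(p_gain, d_gain, a=10.0, h=0.01):
    """Stability of theta'' = a*theta + u under sampled PD control.

    Assumes a zero-order hold, no time delay, a > 0 and
    u_k = -p_gain * theta_k - d_gain * theta_dot_k.
    """
    w = math.sqrt(a)
    ch, sh = math.cosh(w * h), math.sinh(w * h)
    # Exact ZOH discretization of x' = [[0, 1], [a, 0]] x + [0, 1]^T u.
    b1, b2 = (ch - 1.0) / a, sh / w
    # Closed-loop matrix M = Ad - Bd [p_gain, d_gain].
    m11, m12 = ch - b1 * p_gain, sh / w - b1 * d_gain
    m21, m22 = w * sh - b2 * p_gain, ch - b2 * d_gain
    trace = m11 + m22
    det = m11 * m22 - m12 * m21
    # Jury conditions for z^2 - trace*z + det: both roots inside unit circle.
    return abs(det) < 1.0 and abs(trace) < 1.0 + det


def stability_chart(p_values, d_values, **model):
    """Return the (P, D) grid points for which the closed loop is stable."""
    return [(p, d) for p in p_values for d in d_values
            if closed_loop_is_stable(p, d, **model)]
```

Evaluating `stability_chart` on a grid of gains and plotting the returned points gives the stable region in the (P, D) plane for this simplified model.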

thesis supervisor: Tamas Insperger

Software Engineer• *November 2017 - March 2018*

• worked on an autonomous tram project in cooperation with Siemens AG Berlin

• tasks in software evaluation (parallelization of ADAS pipeline, implementing automatic map update) - Python, Docker, Jenkins

• tasks in computer vision component (software development) - C++, ROS

• agile development

Research Intern•Tübingen• *February 2017 - July 2017*

• project in depth-camera based pole balancing on humanoid robot platform

• integrated Bayesian vision-based tracking system with LQR control for pole balancing - C++, ROS

• implemented deep neural network for angle regression based on ROS depth images - Python, Tensorflow

• experience with real-time robot system, motion capture system, depth cameras

• literature research on learning algorithms in vision based control

Research Intern•Budapest• *June 2014 - August 2014*

• compared Lattice-Boltzmann method with traditional CFD methods for fluid dynamics simulations (C++)

• literature review on Lattice-Boltzmann method for reactive flow simulations

Deichler A^{†}, Chhatre K^{†}, Beskow J, Peters C• *IEEE ICDL - StEPP 2021*

• 1st Workshop on Spatio-temporal Aspects of Embodied Predictive Processing (StEPP)

• paper link

Jonell P^{†}, Deichler A^{†}, Torre I, Leite I, Beskow J• *IEEE RO-MAN - SCRITA 2021*

• 4th Workshop on Trust, Acceptance and Social Cues in Human-Robot Interaction (SCRITA)

Moerland TM^{†}, Deichler A^{†}, Baldi S, Broekens J, Jonker C• *ICAPS - PRL 2020*

• 1st workshop on Bridging the Gap Between AI Planning and Reinforcement Learning (PRL)

• paper link

Summer School•Virtual• *August 2021*

• two-week summer school in deep learning and machine learning

Deep Learning track•Virtual• *August 2021*

• three-week course on the theory and techniques of deep learning with an emphasis on neuroscience

• course project comparing recurrent neural networks for human motor decoding from neural signals

Summer School•Politehnica University of Bucharest• *July 2019*

• one-week summer school around topics in deep learning and reinforcement learning organized by AI researchers

• best poster award

Summer School• ETH Zurich• *July 2017*

• participated in lectures and practicals on deep learning, learning theory, robotics and control, and computer vision

ATHENS Programme• KU Leuven• *November 2014*

• intensive programming course, focus on generic programming
