Lean engine file system data feed component

This post is about the pluggable FileSystemDataFeed component, at run time.

The lean engine application installs a configuration parameterized by a collection of pluggable computing components identified as name value pairs in config.json.  This post  versioning is as per:

·         Lean engine assembly v2.3.0.1, pulled local on 18Feb17, from https://github.com/QuantConnect/Lean,
·         VS Community 2015 update 3
·         Windows10  version 1607*

Note: * winver, cmd prompt window.

The backtesting-desktop configuration installs two processes into the lean engine computing environment; the launcher, and desktop. Other than noting its existence the desktop process, which hosts LeanWinForm and its associated desktopClient thread, is not described further here.

Invoking lean.launcher creates the process which loads application classes primarily responsible for the application lifecycle, and resource access for the user’s algorithm under test. The launcher process primary thread loads the Engine and subsequently FileSystemDataFeed.

FileSystemDataFeed participates in three application lifecycle phases, setup (assemble, as in Lego, not assembly code), run (aka operate), and teardown (aka dismantle) through its minions ParallelRunnerController and ParallelRunnerWorker.

Setup

During setup the file system data feed Initialize() method installs two components into the lean application’s computing environment i.e. its collection of compute, network and storage resources:

a)      a CancellationTokenSource, later used to dismantle two blocking collections, and

b)      the parallelRunnerController, used to manage multiple threads, each containing a queue;

The parallelRunnerController installs – into the lean application computing environment – these components: a synchronization primitive, an array of threads, and a pair of queues.

The synchronization primitive is a manual reset event, _waitHandle, initialized to non-signalled, so waiting threads block. The array contains ParallelRunnerWorker threads. This configuration uses only one thread. Both queues hold ParallelRunnerWorkItem items and present .net blocking collection semantics which enables a queue, when presented with a cancellation token, to release any blocked threads.

Computing components

The parallelRunnerController Start() method creates (new’s up) three components:

  • a ParallelRunnerWorker object that hosts a thread having the threadstart delegate, ThreadEntry(), which contains the blocking collection queue, _processQueue, that is signalled by the cancellation token, _cancellationTokenSource.Cancel() from FileSystemDataFeed.Exit(.
  • The _processQueueThread thread with thread start delegate, ProcessHoldQueue(), which contains the second blocking collection queue, _holdQueue, again signalled by the cancellation token from FileSystemDataFeed.
  • An anonymous task used to coordinate dismantling ParallelRunnerWorker thread(s) and the _processQueueThread.

Once those three threads have been assembled control returns from FileSystemDataFeed to Engine which creates the DataFeed thread using FileSystemDataFeed.Run() as the threadstart delegate. Run immediately takes a WaitOne on ParallelRunnerController _waithandle and blocks.

Computing environment

This results — at the end of the file system data feed assembly (setup) phase — in the lean engine computing environment having this interrelated collection of ‘threads’:

  • parallelRunnerWorker thread: One (in this configuration, but possibly more) parallelRunnerWorker thread(s) calling workItem.Execute() when _queue (i.e. ParallelRunner._processQueue) – with blocking collection semantics — has items, otherwise the thread blocks until items arrive or the queue is cancelled, via the cancellation token.
  • ProcessHoldQueue thread: One ProcessHoldQueue thread moving ready work items from the _holdProcessQueue to _processQueue queue. Items being enqueued on _hold when _process employs .net blocking collection semantics which enables the queue to be cancelled, via the cancellation token. https://msdn.microsoft.com/en-us/library/dd460684(v=vs.110).aspx
  • DataFeed thread: The DataFeed thread blocked until all (one in this configuration) parallel runner worker threads complete i.e. until the blocking collection is unblocked by a cancelation token signal. https://msdn.microsoft.com/en-us/library/system.threading.cancellationtoken(v=vs.110).aspx
  • Anonymous task: An anonymous task used to signal DataFeed that all parallel runner worker threads are unblocked.

Run

The quiescent state of the FileSystemDataFeed component assembly is blocked via the WaitOne statement described above which blocks its host, the engine.datafeed thread.

Although DataFeed is blocked _processQueue installed in ParallelRunnerWorker thread and _holdProcessQueue installed in _processQueueThread thread continue employing .net blocking collection semantics https://msdn.microsoft.com/en-us/library/dd997371(v=vs.110).aspx to enqueue and dequeue work items when in parallel the user’s backtest algorithm is also running.

Dismantle

Initiate application shutdown.

Engine signals application shutdown <at line??> by a call to FileSystemDataFeed.Exit() which issues the cancellation token, _cancellationTokenSource, used in .net blocking collection semantics to release resources waiting to enqueue or dequeue items e.g. to release process and hold queues _ processQueue and _holdQueue respectively. Queue shut downs precede release of clr resources e.g. threads, objects.

Terminate threads and task.

FileSystemDataFeed sequences the teardown of multiple threads. First the lean engine utilizes the anonymous task to terminate ParallelRunnerWorker thread(s), there may be one or more. When installed, the anonymous task, taskstart delegate, immediately takes a waitsAll() on the ParallelRunnerWorker thread(s), thus blocking further anonymous task execution until _waitHandle is signalled to unblock, by the ParallelRunnerWorker finally statement.

Once the anonymous task is unblocked its start delegate signals the DataFeed thread, threadstart delegate, DataFeed.Run(), to unblock from _controller.WaitHandle.WaitOne().

Computing component completion.

Given its quiescent run state the file system data feed, component assembly, needs to be signalled out of its ThreadState execution state enumeration e.g. Wait, induced torpor. i.e. https://msdn.microsoft.com/en-us/library/system.diagnostics.threadstate(v=vs.110).aspx

When engine returns to the execution flow e.g. after user backtest algorithm finishes, engine initiates compute component completions entailing computing environment shutdown.

Engine calls threadFeed.Abort() to terminate thread DataFeed which causes the clr to execute all finally blocks before ending the thread. See remarks in: https://msdn.microsoft.com/en-us/library/system.threading.threadabortexception(v=vs.110).aspx

The DataFeed, run method, finally block, calls Dispose(), on ParallelRunnerController which disposes the previously unblocked, blocking queues i.e. _processQueue and _holdQueue, before terminating _processQueueThread via _processQueueThread.Abort().

As thread start delegate ProcessHoldQueue() contains the blocking collection _holdQueue it is possible for an exception condition to arise: if _holdQueue has not shut down before Dispose() calls for termination of the ProcessHoldQueue thread.

That is an exception appears to be the result of an intermittent race condition occurring between the disposal of _holdQueue, the .net blocking collection instance, and the abort of the ProcesWorkerThread via _processQueueThread.Abort(), which _holdQueue runs on.

 
Process lean.launcher then completes.
Feedback, critiques, corrections encouraged.
END