plaso.multi_processing package

Submodules

plaso.multi_processing.analysis_process module

The multi-process analysis process.

class plaso.multi_processing.analysis_process.AnalysisProcess(event_queue, storage_writer, knowledge_base, analysis_plugin, processing_configuration, data_location=None, event_filter_expression=None, **kwargs)[source]

Bases: plaso.multi_processing.base_process.MultiProcessBaseProcess

Multi-processing analysis process.

SignalAbort()[source]

Signals the process to abort.

plaso.multi_processing.base_process module

Base class for a process used in multi-processing.

class plaso.multi_processing.base_process.MultiProcessBaseProcess(processing_configuration, enable_sigsegv_handler=False, **kwargs)[source]

Bases: multiprocessing.context.Process

Multi-processing process interface.

rpc_port

int – port number of the process status RPC server.

SignalAbort()[source]

Signals the process to abort.

name

str – process name.

run()[source]

Runs the process.

plaso.multi_processing.engine module

The multi-process processing engine.

class plaso.multi_processing.engine.MultiProcessEngine[source]

Bases: plaso.engine.engine.BaseEngine

Multi-process engine base.

This class contains functionality to:

  • monitor and manage worker processes;
  • retrieve process status information via RPC;
  • manage the status update thread.

plaso.multi_processing.logger module

The multi-processing submodule logger.

plaso.multi_processing.multi_process_queue module

A multiprocessing-backed queue.

class plaso.multi_processing.multi_process_queue.MultiProcessingQueue(maximum_number_of_queued_items=0, timeout=None)[source]

Bases: plaso.engine.plaso_queue.Queue

Multi-processing queue.

Close(abort=False)[source]

Closes the queue.

This needs to be called from any process or thread putting items onto the queue.

Parameters:abort (Optional[bool]) – True if the close was issued on abort.
Empty()[source]

Empties the queue.

IsEmpty()[source]

Determines if the queue is empty.

Open()[source]

Opens the queue.

PopItem()[source]

Pops an item off the queue.

Returns:

item from the queue.

Return type:

object

Raises:
  • QueueClose – if the queue has already been closed.
  • QueueEmpty – if no item could be retrieved from the queue within the specified timeout.
PushItem(item, block=True)[source]

Pushes an item onto the queue.

Parameters:
  • item (object) – item to add.
  • block (Optional[bool]) – True to block the process when the queue is full.
Raises:

QueueFull – if the item could not be pushed onto the queue because it is full.
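
The PushItem and PopItem contract described above can be illustrated with a minimal sketch built on the standard library. This is not plaso's implementation: the class SketchQueue and the locally defined QueueEmpty and QueueFull exceptions are hypothetical stand-ins, and it is only an assumption that a multiprocessing.Queue is the backing store.

```python
import multiprocessing
import queue


class QueueEmpty(Exception):
    """Hypothetical stand-in for the QueueEmpty error named above."""


class QueueFull(Exception):
    """Hypothetical stand-in for the QueueFull error named above."""


class SketchQueue:
    """Illustrative wrapper around multiprocessing.Queue (an assumption)."""

    def __init__(self, maximum_number_of_queued_items=0, timeout=None):
        # A maxsize of 0 means the queue is unbounded.
        self._queue = multiprocessing.Queue(
            maxsize=maximum_number_of_queued_items)
        self._timeout = timeout

    def PushItem(self, item, block=True):
        """Pushes an item, raising QueueFull if a non-blocking push fails."""
        try:
            self._queue.put(item, block=block)
        except queue.Full as exception:
            raise QueueFull() from exception

    def PopItem(self):
        """Pops an item, raising QueueEmpty if the timeout expires."""
        try:
            return self._queue.get(timeout=self._timeout)
        except queue.Empty as exception:
            raise QueueEmpty() from exception

    def Close(self):
        """Closes the queue and waits for its feeder thread to finish."""
        self._queue.close()
        self._queue.join_thread()
```

With a capacity of one item, a second non-blocking PushItem fails immediately, which is the situation the block parameter controls.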

plaso.multi_processing.plaso_xmlrpc module

XML RPC server and client.

class plaso.multi_processing.plaso_xmlrpc.ThreadedXMLRPCServer(callback)[source]

Bases: plaso.multi_processing.rpc.RPCServer

Threaded XML RPC server.

Start(hostname, port)[source]

Starts the process status RPC server.

Parameters:
  • hostname (str) – hostname or IP address to connect to for requests.
  • port (int) – port to connect to for requests.
Returns:

True if the RPC server was successfully started.

Return type:

bool

Stop()[source]

Stops the process status RPC server.

class plaso.multi_processing.plaso_xmlrpc.XMLProcessStatusRPCClient[source]

Bases: plaso.multi_processing.plaso_xmlrpc.XMLRPCClient

XML process status RPC client.

class plaso.multi_processing.plaso_xmlrpc.XMLProcessStatusRPCServer(callback)[source]

Bases: plaso.multi_processing.plaso_xmlrpc.ThreadedXMLRPCServer

XML process status threaded RPC server.

class plaso.multi_processing.plaso_xmlrpc.XMLRPCClient[source]

Bases: plaso.multi_processing.rpc.RPCClient

XML RPC client.

CallFunction()[source]

Calls the function via RPC.

Close()[source]

Closes the RPC communication channel to the server.

Open(hostname, port)[source]

Opens an RPC communication channel to the server.

Parameters:
  • hostname (str) – hostname or IP address to connect to for requests.
  • port (int) – port to connect to for requests.
Returns:

True if the communication channel was established.

Return type:

bool
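
The server and client roles in this module can be sketched with Python's standard xmlrpc package. The sketch below is an assumption about the general technique, not a copy of plaso's code: SketchStatusServer, get_status and the RPC function name "status" are hypothetical, and the returned dictionary is made-up illustration data.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer


class SketchStatusServer:
    """Hypothetical threaded XML-RPC server that exposes one callback."""

    def __init__(self, callback):
        self._callback = callback
        self._server = None
        self._thread = None
        self.port = None

    def Start(self, hostname, port):
        """Starts serving in a background thread; returns True on success."""
        try:
            self._server = SimpleXMLRPCServer(
                (hostname, port), logRequests=False, allow_none=True)
        except OSError:
            return False
        # "status" is an illustrative RPC function name.
        self._server.register_function(self._callback, 'status')
        # Passing port 0 lets the operating system choose a free port.
        self.port = self._server.server_address[1]
        self._thread = threading.Thread(
            target=self._server.serve_forever, daemon=True)
        self._thread.start()
        return True

    def Stop(self):
        """Stops the serving thread and releases the socket."""
        if self._server:
            self._server.shutdown()
            self._server.server_close()
            self._thread.join()
            self._server = None


def get_status():
    """Illustrative callback returning made-up status data."""
    return {'identifier': 'worker_00', 'status': 'running'}


server = SketchStatusServer(get_status)
started = server.Start('127.0.0.1', 0)
proxy = xmlrpc.client.ServerProxy(
    'http://127.0.0.1:{0:d}'.format(server.port))
result = proxy.status()
server.Stop()
```

Running the server in its own thread keeps the process free to do its main work while still answering status requests.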

plaso.multi_processing.psort module

The psort multi-processing engine.

class plaso.multi_processing.psort.PsortEventHeap[source]

Bases: object

Psort event heap.

PopEvent()[source]

Pops an event from the heap.

Returns:
a tuple containing:
  • str: identifier of the event MACB group or None if the event cannot be grouped.
  • str: identifier of the event content.
  • EventObject: event.
Return type:tuple
PopEvents()[source]

Pops events from the heap.

Yields:EventObject – event.
PushEvent(event)[source]

Pushes an event onto the heap.

Parameters:event (EventObject) – event.
number_of_events

int – number of events on the heap.
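
The push and pop behaviour of an event heap can be sketched with the standard heapq module. This is a simplified illustration, not plaso's implementation: SketchEventHeap is a hypothetical name, events are ordered by an explicit timestamp argument (an assumption), and the MACB group and content identifiers described above are omitted.

```python
import heapq
import itertools


class SketchEventHeap:
    """Hypothetical heap that yields events in timestamp order."""

    def __init__(self):
        self._heap = []
        # The counter breaks ties so that events are never compared directly.
        self._counter = itertools.count()

    @property
    def number_of_events(self):
        """int: number of events on the heap."""
        return len(self._heap)

    def PushEvent(self, timestamp, event):
        """Pushes an event onto the heap, keyed by its timestamp."""
        heapq.heappush(self._heap, (timestamp, next(self._counter), event))

    def PopEvent(self):
        """Pops the oldest event or returns None if the heap is empty."""
        try:
            timestamp, _, event = heapq.heappop(self._heap)
        except IndexError:
            return None
        return timestamp, event

    def PopEvents(self):
        """Yields (timestamp, event) tuples until the heap is empty."""
        item = self.PopEvent()
        while item is not None:
            yield item
            item = self.PopEvent()
```

A heap keeps insertion cheap while still allowing events gathered from several sources to be emitted in chronological order.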

class plaso.multi_processing.psort.PsortMultiProcessEngine(use_zeromq=True)[source]

Bases: plaso.multi_processing.engine.MultiProcessEngine

Psort multi-processing engine.

AnalyzeEvents(knowledge_base_object, storage_writer, data_location, analysis_plugins, processing_configuration, event_filter=None, event_filter_expression=None, status_update_callback=None, worker_memory_limit=None)[source]

Analyzes events in a plaso storage.

Parameters:
  • knowledge_base_object (KnowledgeBase) – contains information from the source data needed for processing.
  • storage_writer (StorageWriter) – storage writer.
  • data_location (str) – path to the location that data files should be loaded from.
  • analysis_plugins (dict[str, AnalysisPlugin]) – analysis plugins that should be run and their names.
  • processing_configuration (ProcessingConfiguration) – processing configuration.
  • event_filter (Optional[FilterObject]) – event filter.
  • event_filter_expression (Optional[str]) – event filter expression.
  • status_update_callback (Optional[function]) – callback function for status updates.
  • worker_memory_limit (Optional[int]) – maximum amount of memory a worker is allowed to consume, where None represents the default memory limit and 0 represents no limit.
Raises:

KeyboardInterrupt – if a keyboard interrupt was raised.

ExportEvents(knowledge_base_object, storage_reader, output_module, processing_configuration, deduplicate_events=True, event_filter=None, status_update_callback=None, time_slice=None, use_time_slicer=False)[source]

Exports events using an output module.

Parameters:
  • knowledge_base_object (KnowledgeBase) – contains information from the source data needed for processing.
  • storage_reader (StorageReader) – storage reader.
  • output_module (OutputModule) – output module.
  • processing_configuration (ProcessingConfiguration) – processing configuration.
  • deduplicate_events (Optional[bool]) – True if events should be deduplicated.
  • event_filter (Optional[FilterObject]) – event filter.
  • status_update_callback (Optional[function]) – callback function for status updates.
  • time_slice (Optional[TimeSlice]) – slice of time to output.
  • use_time_slicer (Optional[bool]) – True if the ‘time slicer’ should be used. The ‘time slicer’ will provide a context of events around an event of interest.
Returns:

counter that tracks the number of events extracted from storage.

Return type:

collections.Counter

plaso.multi_processing.rpc module

The RPC client and server interface.

class plaso.multi_processing.rpc.RPCClient[source]

Bases: object

RPC client interface.

CallFunction()[source]

Calls the function via RPC.

Close()[source]

Closes the RPC communication channel to the server.

Open(hostname, port)[source]

Opens an RPC communication channel to the server.

Parameters:
  • hostname (str) – hostname or IP address to connect to for requests.
  • port (int) – port to connect to for requests.
Returns:

True if the communication channel was established.

Return type:

bool

class plaso.multi_processing.rpc.RPCServer(callback)[source]

Bases: object

RPC server interface.

Start(hostname, port)[source]

Starts the RPC server.

Parameters:
  • hostname (str) – hostname or IP address to connect to for requests.
  • port (int) – port to connect to for requests.
Returns:

True if the RPC server was successfully started.

Return type:

bool

Stop()[source]

Stops the RPC server.

plaso.multi_processing.task_engine module

The task multi-process processing engine.

class plaso.multi_processing.task_engine.TaskMultiProcessEngine(maximum_number_of_tasks=10000, use_zeromq=True)[source]

Bases: plaso.multi_processing.engine.MultiProcessEngine

Class that defines the task multi-process engine.

This class contains functionality to:

  • monitor and manage extraction tasks;
  • merge results returned by extraction workers.

ProcessSources(session_identifier, source_path_specs, storage_writer, processing_configuration, enable_sigsegv_handler=False, filter_find_specs=None, number_of_worker_processes=0, status_update_callback=None, worker_memory_limit=None)[source]

Processes the sources and extracts events.

Parameters:
  • session_identifier (str) – identifier of the session.
  • source_path_specs (list[dfvfs.PathSpec]) – path specifications of the sources to process.
  • storage_writer (StorageWriter) – storage writer for a session storage.
  • processing_configuration (ProcessingConfiguration) – processing configuration.
  • enable_sigsegv_handler (Optional[bool]) – True if the SIGSEGV handler should be enabled.
  • filter_find_specs (Optional[list[dfvfs.FindSpec]]) – find specifications used in path specification extraction.
  • number_of_worker_processes (Optional[int]) – number of worker processes.
  • status_update_callback (Optional[function]) – callback function for status updates.
  • worker_memory_limit (Optional[int]) – maximum amount of memory a worker is allowed to consume, where None represents the default memory limit and 0 represents no limit.
Returns:

processing status.

Return type:

ProcessingStatus

plaso.multi_processing.task_manager module

The task manager.

class plaso.multi_processing.task_manager.TaskManager[source]

Bases: object

Manages tasks and tracks their completion and status.

A task being tracked by the manager must be in exactly one of the following states:

  • abandoned: a task assumed to be abandoned because a task that has been
    queued or was processing exceeds the maximum inactive time.
  • merging: a task that is being merged by the engine.
  • pending_merge: the task has been processed and is ready to be merged with
    the session storage.
  • processed: a worker has completed processing the task, but it is not ready
    to be merged into the session storage.
  • processing: a worker is processing the task.
  • queued: the task is waiting for a worker to start processing it. It is also
    possible that a worker has already completed the task, but no status update was collected from the worker while it processed the task.

Once the engine reports that a task is completely merged, it is removed from the task manager.

Tasks are considered “pending” when there is more work that needs to be done to complete these tasks. Pending applies to tasks that are:

  • not abandoned;
  • abandoned, but need to be retried.

Abandoned tasks without corresponding retry tasks are considered “failed” when the foreman is done processing.
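
The rule that a task is in exactly one state at a time can be sketched as a small state tracker. The sketch is an assumption-laden simplification, not plaso's TaskManager: SketchTaskTracker and its method names are hypothetical, the allowed transitions are inferred from the Raises descriptions of the methods in this class, and the processed state and retry tasks are left out for brevity.

```python
class SketchTaskTracker:
    """Hypothetical tracker that keeps each task in exactly one state."""

    # Allowed transitions, inferred from the documentation (an assumption).
    _ALLOWED_TRANSITIONS = {
        'queued': {'processing', 'pending_merge', 'abandoned'},
        'processing': {'pending_merge', 'abandoned'},
        'abandoned': {'pending_merge'},
        'pending_merge': {'merging'},
        'merging': set(),
    }

    def __init__(self):
        self._state_per_task = {}

    def Create(self, task_identifier):
        """Registers a new task, which starts out as queued."""
        self._state_per_task[task_identifier] = 'queued'

    def Update(self, task_identifier, new_state):
        """Moves a task to a new state or raises KeyError."""
        # An unknown task identifier raises KeyError here.
        current_state = self._state_per_task[task_identifier]
        if new_state not in self._ALLOWED_TRANSITIONS[current_state]:
            raise KeyError('Task {0:s} cannot go from {1:s} to {2:s}'.format(
                task_identifier, current_state, new_state))
        self._state_per_task[task_identifier] = new_state

    def Complete(self, task_identifier):
        """Removes a task that has finished merging."""
        if self._state_per_task.get(task_identifier) != 'merging':
            raise KeyError('Task {0:s} was not merging'.format(
                task_identifier))
        del self._state_per_task[task_identifier]

    def HasPendingTasks(self):
        """Simplified check: any tracked task that is not abandoned."""
        return any(
            state != 'abandoned' for state in self._state_per_task.values())
```

Storing a single state per task identifier makes the "exactly one state" invariant hold by construction, at the cost of not modelling retries.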

CheckTaskToMerge(task)[source]

Checks if the task should be merged.

Parameters:task (Task) – task.
Returns:True if the task should be merged.
Return type:bool
Raises:KeyError – if the task was not queued, processing or abandoned.
CompleteTask(task)[source]

Completes a task.

The task is complete and can be removed from the task manager.

Parameters:task (Task) – task.
Raises:KeyError – if the task was not merging.
CreateRetryTask()[source]

Creates a task to retry a previously abandoned task.

Returns:
a task that was abandoned but should be retried or None if there are
no abandoned tasks that should be retried.
Return type:Task
CreateTask(session_identifier)[source]

Creates a task.

Parameters:session_identifier (str) – the identifier of the session the task is part of.
Returns:task attribute container.
Return type:Task
GetFailedTasks()[source]

Retrieves all failed tasks.

Failed tasks are tasks that were abandoned and have no retry task once the foreman is done processing.

Returns:tasks.
Return type:list[Task]
GetProcessedTaskByIdentifier(task_identifier)[source]

Retrieves a task that has been processed.

Parameters:task_identifier (str) – unique identifier of the task.
Returns:a task that has been processed.
Return type:Task
Raises:KeyError – if the task was not processing, queued or abandoned.
GetStatusInformation()[source]

Retrieves status information about the tasks.

Returns:tasks status information.
Return type:TasksStatus
GetTaskPendingMerge(current_task)[source]

Retrieves the first task that is pending merge or has a higher priority.

This function will check if there is a task with a higher merge priority than the current_task being merged. If so, that task with the higher priority is returned.

Parameters:current_task (Task) – current task being merged or None if no such task.
Returns:
the next task to merge or None if there is no task pending merge or
with a higher priority.
Return type:Task
HasPendingTasks()[source]

Determines if there are tasks running or in need of retrying.

Returns:
True if there are tasks that are active, ready to be merged or
need to be retried.
Return type:bool
RemoveTask(task)[source]

Removes an abandoned task.

Parameters:task (Task) – task.
Raises:KeyError – if the task was not abandoned or the task was abandoned and was not retried.
SampleTaskStatus(task, status)[source]

Takes a sample of the status of the task for profiling.

Parameters:
  • task (Task) – a task.
  • status (str) – status.
StartProfiling(configuration, identifier)[source]

Starts profiling.

Parameters:
  • configuration (ProfilingConfiguration) – profiling configuration.
  • identifier (str) – identifier of the profiling session used to create the sample filename.
StopProfiling()[source]

Stops profiling.

UpdateTaskAsPendingMerge(task)[source]

Updates the task manager to reflect the task is ready to be merged.

Parameters:task (Task) – task.
Raises:KeyError – if the task was not queued, processing or abandoned, or the task was abandoned and has a retry task.
UpdateTaskAsProcessingByIdentifier(task_identifier)[source]

Updates the task manager to reflect the task is processing.

Parameters:task_identifier (str) – unique identifier of the task.
Raises:KeyError – if the task is not known to the task manager.

plaso.multi_processing.worker_process module

The multi-process worker process.

class plaso.multi_processing.worker_process.WorkerProcess(task_queue, storage_writer, knowledge_base, session_identifier, processing_configuration, **kwargs)[source]

Bases: plaso.multi_processing.base_process.MultiProcessBaseProcess

Class that defines a multi-processing worker process.

SignalAbort()[source]

Signals the process to abort.

Module contents