A Fix for OperationalError: (psycopg2.OperationalError) SSL error: decryption failed or bad record mac

I kept seeing this error message with a uwsgi/Flask/SQLAlchemy application:

[2019-09-01 23:43:34,378] ERROR in app: Exception on / [GET]
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/flask/app.py", line 2292, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python2.7/site-packages/flask/app.py", line 1815, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python2.7/site-packages/flask/app.py", line 1718, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python2.7/site-packages/flask/app.py", line 1813, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python2.7/site-packages/flask/app.py", line 1799, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "./http.py", line 11, in foo
    session.execute('SELECT 1')
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/orm/scoping.py", line 162, in do
    return getattr(self.registry(), name)(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 1269, in execute
    clause, params or {}
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 988, in execute
    return meth(self, multiparams, params)
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 287, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1107, in _execute_clauseelement
    distilled_params,
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1248, in _execute_context
    e, statement, parameters, cursor, context
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1466, in _handle_dbapi_exception
    util.raise_from_cause(sqlalchemy_exception, exc_info)
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 398, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1244, in _execute_context
    cursor, statement, parameters, context
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 552, in do_execute
    cursor.execute(statement, parameters)
OperationalError: (psycopg2.OperationalError) SSL error: decryption failed or bad record mac

This issue was difficult to figure out. In fact, I was annoyed enough I made a simple reproduction repo. Searches of StackOverflow did not offer great solutions. The closest I found was the following post:

uWSGI, Flask, sqlalchemy, and postgres: SSL error: decryption failed or bad record mac

The accepted solution suggests using lazy-apps = true in uwsgi.ini. This isn’t a great solution because of the increased memory use. When I enabled this on my application, it slowed down noticeably, but the problem was gone!

The post gave me a clue about the nature of the problem. When uwsgi starts, the python application is imported, then N forks (called workers in uwsgi) of the application are copied (N = processes in uwsgi.ini). This post from Ticketea engineering illustrated this nicely.

Most of the time this is OK! My application’s particular use of psycopg2 was problematic. On the initial application startup, a database connection/engine was created with SQLalchemy. That same connection was copied to the uwsgi worker processes. Psycopg documentation calls out uses like this in its documentation:

The Psycopg module and the connection objects are thread-safe: many threads can access the same database either using separate sessions and creating a connection per thread or using the same connection and creating separate cursors. In DB API 2.0 parlance, Psycopg is level 2 thread safe.

The documentation continues, with the important point:

The above observations are only valid for regular threads: they don’t apply to forked processes nor to green threads. libpq connections shouldn’t be used by a forked processes, so when using a module such as multiprocessing or a forking web deploy method such as FastCGI make sure to create the connections after the fork.

A ha! My connection that was created on application import/startup needed to be torn down! This is an easy enough fix: calling engine.dispose after I’ve used the import/startup time connection.

python-dateutil gotcha

A quick note: Don’t naively use dateutil.parser.parse() if you know your dates are ISO8601 formatted. Use dateutil.parser.isoparse() instead.

In [1]: from dateutil.parser import *
In [2]: datestr = '2016-11-10 05:11:03.824266'
In [3]: %timeit parse(datestr)
The slowest run took 4.78 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 211 µs per loop
In [4]: %timeit isoparse(datestr)
100000 loops, best of 3: 17.1 µs per loop

nsxchecker: Verify the health of your NSX network

screen-grabs-linksys-internet

Recently I got to work with the NSX API and write a tool to do a quick health check of NSX networks.

nsxchecker is a valuable operational tool to quickly report a NSX network’s health.  One of the promises of SDN is automated tooling for operational teams and with the NSX API I was quickly able to deliver.

Screen Shot 2014-10-06 at 17.00.10

nsxchecker accepts a NSX lswitch UUID or a neutron_net_id. Rackspace’s Neutron plugin, quark, tags created lports with a neutron_net_id. nsxchecker requires administrative access to the NSX controllers.

Neutron itself supports probes but it had a couple of drawbacks:

  1. It doesn’t work with all implementations
  2. For a large network, it’s slow

There’s more details in the README on github.

Determining Enabled VLANs from SNMP with Python

Similar to this thread, I wanted to see what VLANs were allowed for a trunked port as reported by SNMP with Python.

With the help of a couple of colleagues, I made some progress.

vlan_value = '000000000020000000000000000000000000200000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000'
for key,value in enumerate(format(int(vlan_value, 16), "0100b").rjust(len(vlan_value) * 4, '0')):
...     if value == '1':
...         print key
...
...
...
42
146
  • Convert the string returned to Hex
  • Convert that to Binary
  • Right fill 0s to the appropriate length to give offset (determined by the size of the string)
  • Loop through the resulting value and each character that is a 1 is an enabled VLAN on the port

In conjunction with LLDP, I’m able to query each switch/port and interface is connected to and determine if the VLANs are set properly on the port.