PGAdmin on Docker and a query to pull metadata from paperless-ngx
May 8, 2024
I run a number of self-hosted applications and at least 2 of them are running postgresql. One of the applications has a limitation on reporting (see my previous post on paperless-ngx) and I think I can manage that by simply running a direct select query against the data. Note, this is NOT recommended for production systems or high availability/secure configurations, this is a homelab where I’m playing with my toys and making my life a little easier while having a bit of fun.
When looking around for a phpMyAdmin equivalent pgadmin came up as the number one way to do this. So, I looked at setting it up using Docker and thought… this should be easy, it’s a web interface to a database, no dependencies, no fussy bits, just something simple. Welll… sorta but not quite. There’s only one gotcha, and after that it is just as easy as imagined.
Here’s my docker-compose.yml
services:
pgadmin:
image: dpage/pgadmin4
container_name: pgadmin4_container
restart: always
ports:
- "8888:80"
networks:
- paperless
- websecure
environment:
PGADMIN_DEFAULT_EMAIL: <email@domain.com>
PGADMIN_DEFAULT_PASSWORD: <strong-webui-password>
volumes:
- ./data:/var/lib/pgadmin
networks:
websecure:
external: true
paperless:
One thing to note here… I’ve put it on the networks where the containers are that hold my postgres databases. The one in particular I want is paperless as that’s where the first database I want to peer into sits.
Now running this immediately started throwing errors complaining about permission denied for pgadmin to create directories under /var/lib/pgadmin (the ./data volume defined above). Doing the homework revealed that the ./data volume needs to be owned by and in the same group as 5050.
sudo chown -R 5050:5050 data
This solves the problem.
Then once I had it working I simply connected to my Postgres container by the NAME OF THE CONTAINER. This is a slick feature of docker that does make things like this easy when it works. After signing into pgadmin I added a new server. On the General tab I gave it a name and then the tricky bit… on the Connection tab use the image name (use docker ps to make sure you get the right name exactly) as the “Host name/address” and then add the user name and password you set up for the database (usually in the environment variables for the database container) where they belong on the form and “Save”. You should now be automatically connected to your database. This is a BEAUTIFUL thing, at least I’m feeling smug about it.
Now for a second bit of cleverness… create a query to pull my tax data out for me. I’ve created 5 custom fields for my documents to capture the information I need. Now for the query…
-- created to extract receipt data for taxes from custom metadata created in paperless-ngx
-- the idea being, all receipts will be photographed and submitted to paperless-ngx to be OCR'd
-- metadata extracted including the data needed for taxes and saved
select
vendors.name
,date(doc.created)
-- captures tags which characterize what was happening to cause this money to be spent as an array with a pipe '|' delimiter so it won't mess with the CSV file saved out of pgadmin. The array is converted to a string to ensure it is visible in the output.
,array_to_string(array(select
tag.name
from documents_tag tag
join documents_document_tags doctag on tag.id = doctag.tag_id
where doctag.document_id = doc.id), '|', '') as What
,(select
substring(cstfldt.value_monetary,4)
from documents_customfieldinstance cstfldt
join documents_customfield cstfld on cstfld.id = cstfldt.field_id
where cstfld.name = 'Subtotal'
and cstfldt.document_id = doc.id) as subtotal
,(select
substring(cstfldt.value_monetary,4)
from documents_customfieldinstance cstfldt
join documents_customfield cstfld on cstfld.id = cstfldt.field_id
where cstfld.name = 'PST'
and cstfldt.document_id = doc.id) as PST
,(select
substring(cstfldt.value_monetary,4)
from documents_customfieldinstance cstfldt
join documents_customfield cstfld on cstfld.id = cstfldt.field_id
where cstfld.name = 'GST'
and cstfldt.document_id = doc.id) as GST
,(select
substring(cstfldt.value_monetary,4)
from documents_customfieldinstance cstfldt
join documents_customfield cstfld on cstfld.id = cstfldt.field_id
where cstfld.name = 'total'
and cstfldt.document_id = doc.id) as total
,(select
substring(cstfldt.value_monetary,4)
from documents_customfieldinstance cstfldt
join documents_customfield cstfld on cstfld.id = cstfldt.field_id
where cstfld.name = 'tip'
and cstfldt.document_id = doc.id) as tip
from documents_document doc -- the document record itself
join documents_correspondent vendors -- paperless calls these correspondents
on doc.correspondent_id = vendors.id
join documents_documenttype doctype -- make sure we only get receipts
on doc.document_type_id = doctype.id
where doctype.name = 'Receipt'
and extract(year from date(doc.created)) = 2024 -- make sure we only get them from the tax year in question.
order by doc.created DESC
The last bit will come over time I think… automation in paperless-ngx including rules to extract the metadata automatically instead of me having to type it in or copy from the ocr.
I have the app installed (paperless mobile) and it uploads a photograph of a receipt very nicely… hopefully this is now a solved problem for me.