aboutsummaryrefslogtreecommitdiff
path: root/readme.md
blob: 3ee51feaf7d42ce9486991396b73b4a89f449397 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
# UGE / L2 / Intro to relational databases / Python project prototype


## Background and goals

The introductory course to relational databases ends with a project
illustrating the use of a relational database together with a simple CRUD web
application.

The Python language could be a better fit compared to PHP for this kind of
assignment as both students and the teaching staff are already familiar with
the former from preceding units.

To showcase Python as a platform suitable for this purpose, a small demo
project specified in `./demo-project-specs.md` will be implemented.

The present document aims to motivate the functional and technical choices
which have been made.


## Interpretation of the specifications

Undefined aspects of the specifications are clarified as follows, with the
intent of keeping the project as simple as possible:

* the platform only deals with one currency ([simoleons], arbitrarily),
* the platform only supports one language (English, arbitrarily),
* each user only has one account/wallet/balance (all three terms synonymous),
* there is no notion of financial order (transactions are immediate),
* only the effective transactions are logged, the rejected ones are not,
* fees are billed on top of internal transfers instead of taking a cut.

[simoleons]: https://en.wiktionary.org/wiki/simoleon


## Database model

### Entity-relationship model

The database model is fully described as types, tables and constraints in
`./sql/tables.sql`. The file is annotated with comments and remarks explaining
the design choices and highlights some additional constraints which remain to
be enforced at upper levels.

An entity-relationship diagram of the model can be generated from the SQL table
descriptions using a tool like [SQLFairy] (_note: SQLFairy [crashes] with
recent PostgreSQL dialects, so it is not usable here_).

[SQLFairy]: http://sqlfairy.sourceforge.net/
[crashes]: https://rt.cpan.org/Public/Bug/Display.html?id=138045

### Consistency

The consistency of the account balance and the financial transaction log can
be ensured through multiple means such as executing modification and insertions
in SQL transactions at the application level, or using triggers within the
database itself. The former option is chosen over the latter to avoid implicit
actions.

Additional constraints to prevent alterations of the financial transaction log
such as deletions are omitted here as the web application will act as the sole
gateway to interact with the database and will offer no way of doing this kind
of operations. For the same reason, the use of stored procedures was not
retained.


## Web application

### Platform

#### Interface with the database

Object-Relational Mappers ([ORM]) such as [SQLAlchemy] are sometimes used to
partially or fully abstract the database and its entities behind objects in the
Object-Oriented Programming ([OOP]) sense. With those tools, it is possible to
generate tables, indexes, and constraints from annotated classes. They also
allow automatic querying and implicit joins to perform information retrieval,
and inserting and updating records in the database directly by setting fields
on linked objects.

[ORM]: https://en.wikipedia.org/wiki/Object-relational_mapping
[SQLAlchemy]: https://www.sqlalchemy.org/
[OOP]: https://en.wikipedia.org/wiki/Object-oriented_programming

However, the [object-relational impedance mismatch problems][ORIM] often render
the use of ORMs impractical especially for complex queries, more often than not
resulting in the developers bypassing the ORM altogether. The tools also tend
to introduce performance issues due to the implicit queries they generate in
situations in which developers do not always expect them to.

[ORIM]: https://en.wikipedia.org/wiki/Object-relational_impedance_mismatch

In the context of an database course aiming to teach SQL, such level of
abstraction is also a hindrance by itself. Therefore, the use of an ORM has
been excluded for this project.

More direct and less intricate forms of mappings, such as transparent result
tuples to Python immutable named tuples conversion, are preferred instead.
Features of this kind, which are directly provided by database drivers like
[psycopg], are often sufficient by themselves.

[psycopg]: https://www.psycopg.org/

It remains desirable to segregate SQL queries into their own modules in order
to follow the Single-Responsibility Principle ([SRP]). Fortunately, libraries
such as [embrace] allow easily deriving Data Access Objects ([DAO]) from sets
of annotated raw SQL queries, exposing them as callable Python functions.

[SRP]: https://en.wikipedia.org/wiki/Single_responsibility_principle
[embrace]: https://pypi.org/project/embrace/
[DAO]: https://en.wikipedia.org/wiki/Data_access_object

#### CGI scripts vs. Services

The most simple and primitive way of making Python programs accessible from a
web browser would be through the Common Gateway Interface ([CGI]). Through it,
the user is able to execute the scripts by visiting a matching URL, to then
visualise its output directly in their browser.

[CGI]: https://en.wikipedia.org/wiki/Common_Gateway_Interface

The other approach is to make the program a server application as a persistent
service process, in charge of accepting, handling and replying to network
requests by itself.

The former, identical to the way the oldest PHP scripts are run, has become
less popular in favour of the latter, mainly for performance and
maintainability concerns for real applications outside of trivial, independent,
and self-contained scripts. For those reasons, this project will be implemented
as a service process.

#### Comprehensive vs. Light-weight frameworks

While bare Python could be used to create such a web service, it is desirable
to work at a higher abstraction level to focus on the application-specific
features rather than the standard protocol implementation details.

Python frameworks such as [Django] offer comprehensive solutions for building
web applications based on the Model-View-Controller ([MVC] or MTV/MVT) design
pattern, ensuring [separation of concerns] while abstracting lower level logic.
This kind of complete solutions provide routing logic to map requests to
user-defined handlers, and are integrated with a Object-Relational Mapper as
well as with a templating engine to generate HTML pages dynamically.

[Django]: https://www.djangoproject.com/
[MVC]: https://en.wikipedia.org/wiki/Model-view-controller
[separation of concerns]: https://en.wikipedia.org/wiki/Separation_of_concerns

Other more light-weight frameworks such as [Flask] or the more recent [FastAPI]
instead focus on the first part, taking care of unpacking and routing requests
to the user-defined handlers, while leaving the rest to the application
developers. This approach does not fully impose an entire environment, and
allows better composability with libraries which can be freely chosen.

[Flask]: https://flask.palletsprojects.com/
[FastAPI]: https://fastapi.tiangolo.com/

Because the use of an ORM is not desirable in this project for the reasons
detailed in a previous section, the choices of frameworks is limited to these
light-weight frameworks. Here, FastAPI is preferred over Flask due to its more
modern architecture, using parameters and [dependency injection] over
thread-local global variables. The chosen templating engine is [Jinja] for its
simplicity.

[dependency injection]: https://en.wikipedia.org/wiki/Dependency_injection
[Jinja]: https://jinja.palletsprojects.com/en/2.0.x

### Project structure overview

* `./sql/`
  * `tables.sql`: database initialisation statements
  * `queries.sql`: annotated SQL queries (from which the DAO is derived)

* `./app/`
  * `app_database.py`: database connection pool and transaction helper
  * `app_sessions.py`: (signed cookies) session data management helpers
  * `app_templating.py`: template rendering helper
  * `app_{account,wallet}.py`: page-specific request handlers
  * `app.py`: FastAPI web application entry point

* `./templates/`: Jinja HTML templates
* `./static/`: static web resources (stylesheets)
* `./flake.nix`: project runtime and development environment description

### Security considerations

* SQL injections are prevented by using proper query parameters substitution,
  automatically handled by the embrace and psycopg libraries.

* Injections in rendered views are prevented by the automatic sanitisation of
  inserted variables by the Jinja templating engine.

* Passwords are salted and hashed when stored in the database to ensure some
  minimal protection of the [data at rest]. The hashing is handled by the
  [passlib] library, which also covers algorithm migrations.

* Cross-Site Request Forgery ([CSRF]) attacks are mitigated through the
  conjunctive use of POST requests for user actions and [SameSite] restrictions
  for session cookies. (_note: this will become a sufficient protection only
  when support in browsers will become ubiquitous_).

* In its current state, the application does not implement any kind of rate
  limiting. Such restriction would be needed for real world applications in
  order to mitigate account password brute-force attacks, but also to prevent
  users from avoiding transfer fees by sending many small unbilled
  transactions.

[data at rest]: https://en.wikipedia.org/wiki/Data_at_rest
[passlib]: https://passlib.readthedocs.io/en/stable/
[CSRF]: https://owasp.org/www-community/attacks/csrf
[SameSite]: https://owasp.org/www-community/SameSite

### Going further

#### Technical platform improvements

* Form validation could be refactored to make constraints more easily
  composable and to improve error reporting to users with a per-field
  granularity.

* Error handling could be improved by the use of exceptions to be turned into
  user-friendly error messages in a generic way, either through decorators or
  with a middleware.

* Request handlers could be made fully asynchronous after migrating to psycopg3
  to improve performance under a large amount of requests.

* Database schema migration will need to be implemented to ease the deployment
  of subsequent versions of the application.

#### Functional project improvements

* A minimum transaction amount needs to be introduced to prevent users from
  dodging the transaction fees by sending small unbillable amounts.

* The financial transaction log on the wallet page gets longer over time and
  needs to be properly paginated.

* Some stronger form of authentication could be used, either by delegating the
  authentication to some other provider, or by adding multiple-factor
  authentication.

* Users should be offered the possibility to modify their account data, delete
  their account, and to download all the data concerning them.

* The creation of new user accounts could require some email validation.
  Users could be notified via email of incoming money transfers.

* The user interface could be translated into multiple languages with the help
  of an internationalisation/localisation library such as [Babel].

[Babel]: http://babel.pocoo.org/en/latest/


## Development environment

The development and execution environment is fully described using the [Nix]
language and package manager as a [Nix Flake] in `./flake.nix`. This allows
creating reproducible environments containing all the software dependencies of
the program itself, as well as optional development tools.

(_Note: this projects requires Nix version 2.4 or higher_).

[Nix]: https://nixos.org/guides/how-nix-works.html
[Nix Flake]: https://www.tweag.io/blog/2020-05-25-flakes/

This environment can be used on NixOS, MacOS, or any Linux system having Nix
installed. In principle, Nix should also be usable on Windows through the
Windows Subsystem for Linux ([WSL]) compatibility layer. Full virtual machines
and containers can also be derived from the same description file.

[WSL]: https://docs.microsoft.com/en-us/windows/wsl/about

All the commands in this section have to be run within the provided development
shell, which can be entered by running the following command at the root
directory of the project:

```sh
nix develop
```

### Local database

The Nix Flake development shell provides its own self-contained PostgreSQL
server, configured to operate independently of any other instances running on
the same system.

All data are written to the `./development_database/pgadata` directory. The
database server can be initialised by running the following command:

```sh
initdb --no-locale --encoding UTF8 --auth-host reject --auth-local peer
```

The local development PostgreSQL server can then be started by running the
following command, with the `$PGHOST` environment variable automatically set
by the development shell:

```sh
postgres -h "" -k "$PGHOST" -d 2
```

This server listens to local requests through a UNIX domain socket located at
`./development_database/.s.PGSQL.5432`, to which programs run in the
development shell will implicitly automatically connect to.

The development shell ships with both the `psql` and [`pgcli`][pgcli] tools to
interact directly with the database. The latter provides additional features
such as syntax highlighting and better auto-completion.

[pgcli]: https://www.pgcli.com/

A new local database for the application can be created and its table can be
initialised with:

```sh
createdb app
psql app < ./sql/tables.sql
```

Should the need arise, this database can be deleted with the following command
before being created and initialised again:

```sh
dropdb app
```

### Local application server

The server application can be started in development mode with:

```sh
uvicorn \
  --reload-dir app \
  --reload-dir templates \
  --reload \
  --app-dir app \
  app:main
```

This server will listen to incoming requests to a locally bound port. It will
automatically reload itself when its files are edited, and display logs about
type checking and runtime errors.

### Production deployment

While the deployment phase is way out of the scope of a database introductory
course, asking students to deploy their application on the Internet could be a
motivational factor. It would also facilitate evaluating and grading projects,
avoiding deployment and testing hassles to the teaching staff.

#### Standard daemon

The Nix Flake provides a Nix package which can be used to run the web
application in a production context as a [daemon], managed by a standard [init]
system.

[daemon]: https://en.wikipedia.org/wiki/Daemon_(computing)
[init]: https://en.wikipedia.org/wiki/Init

#### Docker container

A [Docker] container can also be derived from this package to be deployed on
popular cloud hosting services.

[Docker]: https://www.docker.com/resources/what-container

##### Example of container deployment

An example of deployment procedure using the free tier provided by the [Heroku]
hosting service is given below. This makes use of the [skopeo] tool to upload
the Docker container to the service.

[Heroku]: https://www.heroku.com/
[skopeo]: https://github.com/containers/skopeo

```sh
# Log in to an Heroku account.
heroku login

# Create a new project with a random name, hosted somewhere in Europe.
heroku create --region eu

# Set a local environment variable with the assigned name of the project so
# that the next commands operate on it.
set --export HEROKU_APP 'name of the created app'

# Set a randomly-generated signed cookie secret key for our application.
heroku config:set COOKIE_SECRET_KEY=$(pwgen --secure 128 1)

# Attach a PostgreSQL database to the newly created app.
# This sets a connection URL in "DATABASE_URL" in the server's environment,
# containing the confidential database username and password.
heroku addons:create heroku-postgresql:hobby-dev --version=13

# Create the tables in the database.
heroku psql < ./sql/tables.sql

# Prepare a Docker container.
# This creates a Docker archive streaming script as `./result`.
nix build .#docker

# Log in to the Heroku container registry to upload our container.
skopeo login --username _ --password $(heroku auth:token) registry.heroku.com

# Upload the Docker image to Heroku (uploading about ~200MB).
./result \
| gzip --fast \
| skopeo --insecure-policy copy \
    docker-archive:/dev/stdin \
    docker://registry.heroku.com/$HEROKU_APP/web

# Deploy and launch the uploaded container.
heroku container:release web

# If all went well, the app should now be deployed and accessible on
# https://$HEROKU_APP.herokuapp.com
heroku open

# If not, logs can be remotely inspected for debugging.
heroku logs --tail
```


## Conclusion

It took approximatively three days of work to the author to architecture and
implement the demo project, with prior basic knowledge of SQL and Python, but
with no prior familiarity with the used libraries (FastAPI, Jinja, psycopg, nor
embrace). The extensive documentation of those dependencies has provided
examples to quickly obtain a working application. The tooling and development
environment allowed a quick, friction-less setup phase.

The level of abstraction provided by the framework and libraries allowed
focusing on the database schema and queries aspects, without needing to develop
a deep understanding of the intricate protocols and standards linked to web
systems. Their APIs did not impose, but did encourage the developer to follow
some best-practices regarding the structure (separation of concerns) and the
security (proper query and template parameterisation, session handling) aspects
of the application.


## Copyright and licensing

Copyright (C) 2021 Pacien TRAN-GIRARD.

This project is distributed under the terms of European Union Public Licence
version 1.2, a copy of which is provided in `./licence.txt`.