4.2.0
What's new
ipfs datasource
While working with contract/token metadata, a typical scenario is to fetch it from IPFS. DipDup now has a separate datasource to perform such requests.
datasources:
  ipfs:
    kind: ipfs
    url: https://ipfs.io/ipfs
You can use this datasource within any callback. Output is either JSON or binary data.
ipfs = ctx.get_ipfs_datasource('ipfs')
file = await ipfs.get('QmdCz7XGkBtd5DFmpDPDN3KFRmpkQHJsDgGiG16cgVbUYu')
assert file[:4].decode()[1:] == 'PDF'
file = await ipfs.get('QmSgSC7geYH3Ae4SpUHy4KutxqNH9ESKBGXoCN4JQdbtEz/package.json')
assert file['name'] == 'json-buffer'
You can tune HTTP connection parameters with the http config field, just like any other datasource.
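Because get returns either parsed JSON or raw bytes, a callback may need to branch on the type of the result. Here is a minimal sketch of such a branch, assuming JSON arrives as a dict or list and anything else as bytes. classify_ipfs_payload is a hypothetical helper written for illustration, not part of DipDup:

```python
from typing import Any


def classify_ipfs_payload(payload: Any) -> str:
    """Roughly classify a value returned by an IPFS fetch.

    Hypothetical helper: assumes binary content is bytes-like and
    parsed JSON is a dict or a list.
    """
    if isinstance(payload, (bytes, bytearray)):
        # PDF files start with the magic bytes "%PDF"
        if bytes(payload[:4]) == b'%PDF':
            return 'pdf'
        return 'binary'
    if isinstance(payload, (dict, list)):
        return 'json'
    return 'unknown'
```

In a callback you could pass the result of ipfs.get(...) to this helper before deciding how to store or parse it.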
Sending arbitrary requests
DipDup datasources do not cover all available methods of the underlying APIs. Let's say you want to fetch the protocol of the chain you're currently indexing from TzKT:
tzkt = ctx.get_tzkt_datasource('tzkt_mainnet')
protocol_json = await tzkt.request(
    method='get',
    url='v1/protocols/current',
    cache=False,
    weight=1,  # ratelimiter leaky-bucket drops
)
assert protocol_json['hash'] == 'PtHangz2aRngywmSRGGvrcTyMbbdpWdpFKuS4uMWxg2RaH9i1qx'
Datasource HTTP connection parameters (ratelimit, backoff, etc.) are applied on every request.
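To give an intuition for what a request weight means to a leaky-bucket ratelimiter, here is a small self-contained sketch. It is not DipDup's actual limiter; the LeakyBucket class, its fields, and the explicit now argument are all assumptions made for illustration. Each request adds its weight to the bucket, the bucket drains at a fixed rate, and a request is rejected when it would overflow the capacity:

```python
from dataclasses import dataclass


@dataclass
class LeakyBucket:
    """Illustrative leaky bucket; not DipDup's implementation."""

    capacity: float           # maximum total weight the bucket can hold
    leak_rate: float          # weight drained per second
    level: float = 0.0        # current fill level
    last_update: float = 0.0  # timestamp of the last drain, in seconds

    def try_acquire(self, weight: float, now: float) -> bool:
        # Drain the bucket according to the time elapsed since the last call
        elapsed = max(0.0, now - self.last_update)
        self.level = max(0.0, self.level - elapsed * self.leak_rate)
        self.last_update = now
        # Reject the request if its weight does not fit
        if self.level + weight > self.capacity:
            return False
        self.level += weight
        return True
```

With capacity=2 and leak_rate=1, two requests of weight 1 fit immediately, a third is rejected, and it fits again one second later. A heavier request (a larger weight) simply consumes more of the budget at once.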
Firing hooks outside of the current transaction
When configuring a hook, you can instruct DipDup to wrap it in a single database transaction:
hooks:
  my_hook:
    callback: my_hook
    atomic: True
Until now, such hooks could only be fired according to jobs schedules, but not from a handler or another atomic hook using the ctx.fire_hook method. This limitation is eliminated: use the wait argument to escape the current transaction:
async def handler(ctx: HandlerContext, ...) -> None:
    await ctx.fire_hook('atomic_hook', wait=False)
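The difference between waiting for a callback and scheduling it to run on its own can be shown with plain asyncio. This is only an analogy under stated assumptions: fire, hook, and the pending list are hypothetical names, and the sketch does not reproduce how DipDup dispatches hooks or manages transactions internally.

```python
import asyncio
from typing import Awaitable, Callable, List


async def fire(
    callback: Callable[[], Awaitable[None]],
    wait: bool,
    pending: List[asyncio.Task],
) -> None:
    """Run the callback inline, or schedule it and return immediately."""
    if wait:
        # The caller is suspended until the callback completes
        await callback()
    else:
        # The callback runs later as an independent task
        pending.append(asyncio.create_task(callback()))


async def demo() -> List[str]:
    events: List[str] = []
    pending: List[asyncio.Task] = []

    async def hook() -> None:
        await asyncio.sleep(0)
        events.append('hook')

    await fire(hook, wait=True, pending=pending)
    events.append('after wait=True')
    await fire(hook, wait=False, pending=pending)
    events.append('after wait=False')
    # Let the scheduled task finish before returning
    await asyncio.gather(*pending)
    return events
```

Running asyncio.run(demo()) shows the hook finishing before the caller continues when wait=True, and after the caller has moved on when wait=False.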
Spin up a new project with a single command
Cookiecutter is an excellent jinja2 wrapper to initialize hello-world templates of various frameworks and toolkits interactively. Install the python-cookiecutter package systemwide, then call:
cookiecutter https://github.com/dipdup-io/cookiecutter-dipdup
Advanced scheduler configuration
DipDup utilizes the apscheduler library to run hooks according to schedules in the jobs config section. In the following example, apscheduler spawns up to three instances of the same job every time the trigger is fired, even if previous runs are in progress:
advanced:
  scheduler:
    apscheduler.job_defaults.coalesce: True
    apscheduler.job_defaults.max_instances: 3
See apscheduler docs for details.
Note that you can't use executors from the apscheduler.executors.pool module: a ConfigurationError exception is raised if you try. If you're into multiprocessing, I'll explain why in the next section.
About the present and future of multiprocessing
It's impossible to use apscheduler pool executors with hooks because HookContext is not pickle-serializable. So, they are forbidden now in the advanced.scheduler config. However, thread/process pools can come in handy in many situations, and it would be nice to have them in DipDup context. For now, I can suggest implementing custom commands as a workaround to perform any resource-hungry tasks within them. Put the following code in <project>/cli.py:
from contextlib import AsyncExitStack

import asyncclick as click
from dipdup.cli import cli, cli_wrapper
from dipdup.config import DipDupConfig
from dipdup.context import DipDupContext
from dipdup.utils.database import tortoise_wrapper


@cli.command(help='Run heavy calculations')
@click.pass_context
@cli_wrapper
async def do_something_heavy(ctx):
    config: DipDupConfig = ctx.obj.config
    url = config.database.connection_string
    models = f'{config.package}.models'

    async with AsyncExitStack() as stack:
        await stack.enter_async_context(tortoise_wrapper(url, models))
        ...


if __name__ == '__main__':
    cli(prog_name='dipdup', standalone_mode=False)
Then use python -m <project>.cli instead of dipdup as an entrypoint. Now you can call do-something-heavy like any other dipdup command. The dipdup.cli:cli group handles arguments and config parsing, graceful shutdown, and other boilerplate. The rest is on you; use dipdup.dipdup:DipDup.run as a reference. And keep in mind that Tortoise ORM is not thread-safe. I aim to implement ctx.pool_apply and ctx.pool_map methods to execute code in pools with magic within existing DipDup hooks, but no ETA yet.
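The pickling limitation mentioned in this section is easy to reproduce with the standard library. Process pools send their arguments to workers via pickle, so any object that holds a lock, a socket, or a similar live resource cannot cross that boundary. In the sketch below, FakeContext is a hypothetical stand-in for a context object (it is not HookContext), and is_picklable is a helper written for this example:

```python
import pickle
import threading
from typing import Any


class FakeContext:
    """Hypothetical stand-in for a context holding a live resource."""

    def __init__(self) -> None:
        # Locks, like open connections, cannot be serialized
        self.lock = threading.Lock()


def is_picklable(obj: Any) -> bool:
    """Return True if the object survives pickle.dumps."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True
```

Plain data such as {'level': 42} passes this check, while FakeContext() does not. One way to work with pools under this constraint is to hand workers only plain, serializable inputs and keep database access in the main process.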
That's all, folks! As always, your feedback is very welcome 🤙