No Clocks

Checks Session Status
Provides tools to check variables contained in the user environment, and inspect the currently loaded package namespaces. The intended use is to allow user scripts to throw errors or warnings if unwanted variables exist or if unwanted packages are loaded.
·sessioncheck.djnavarro.net·
Advanced and Fast Data Transformation in R
A large C/C++-based package for advanced data transformation and statistical computing in R that is extremely fast, class-agnostic, robust, and programmer friendly. Core functionality includes a rich set of S3 generic grouped and weighted statistical functions for vectors, matrices, and data frames, which provide efficient low-level vectorizations and OpenMP multithreading, and skip missing values by default. These are integrated with fast grouping and ordering algorithms (also callable from C) and efficient data manipulation functions. The package also provides a flexible and rigorous approach to time series and panel data in R, fast functions for data transformation and common statistical procedures, detailed (grouped, weighted) summary statistics, powerful tools to work with nested data, fast data object conversions, functions for memory-efficient R programming, and helpers to effectively deal with variable labels, attributes, and missing data. It seamlessly supports base R objects/classes as well as units, integer64, xts/zoo, tibble, grouped_df, data.table, sf, and pseries/pdata.frame.
·fastverse.org·
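To make the description above concrete, here is a rough Python sketch of what a grouped, weighted, missing-value-skipping mean does. The function name `fmean` echoes collapse's naming convention, but everything here is illustrative: the real package is a C/C++-backed R library, not Python.

```python
import math
from collections import defaultdict

def fmean(x, g=None, w=None):
    """Grouped, weighted mean that skips missing values, loosely
    mirroring the behavior described for collapse's statistical
    functions. Names and semantics here are illustrative only."""
    if g is None:
        g = [0] * len(x)        # single implicit group
    if w is None:
        w = [1.0] * len(x)      # unweighted
    num = defaultdict(float)
    den = defaultdict(float)
    for xi, gi, wi in zip(x, g, w):
        if xi is None or (isinstance(xi, float) and math.isnan(xi)):
            continue            # missing values are skipped, not propagated
        num[gi] += wi * xi
        den[gi] += wi
    return {k: num[k] / den[k] for k in num}

print(fmean([1.0, float("nan"), 3.0, 4.0], g=["a", "a", "a", "b"]))
# {'a': 2.0, 'b': 4.0}
```

Skipping missing values per group, rather than propagating them, is the default behavior the description highlights.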
Comparing R's {targets} and dbt for Data Engineering
I’m getting more and more into data engineering these days and having used R for a long time, I’m seeing a lot of problems that look nail-shaped to my R-shaped hammer. The available tools to solve those problems exist for (presumably) very good reasons, so I wanted to take some time to dig into how to use them and compare their workflows to what I would otherwise naively do in R.
·jcarroll.com.au·
Dagster Pipes Protocol for R
Implements the Dagster Pipes protocol, enabling R scripts to communicate with the Dagster orchestrator. R scripts can receive execution context and report asset materializations, check results, and log messages back to Dagster.
·joekirincic.github.io·
sx
sx: Scalable Spatial Data Analysis
·ekotov.pro·
OutputStream classes — OutputStream
FileOutputStream writes to a file; BufferOutputStream writes to a buffer. You can create one and pass it to any of the table writers, for example.
·arrow.apache.org·
README
·cran.r-project.org·
Read geometry vectors — wk_handle.wk_crc
The handler is the basic building block of the wk package. In particular, the wk_handle() generic allows operations written as handlers to "just work" with many different input types. The wk package provides the wk_void() handler, the wk_format() handler, the wk_debug() handler, the wk_problems() handler, and wk_writer()s for wkb(), wkt(), xy(), and sf::st_sfc() vectors.
·paleolimbot.github.io·
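The handler abstraction described above is essentially a visitor pattern: a reader walks a geometry and fires coordinate events, and interchangeable handlers consume the same stream of events. A minimal Python sketch of the idea (class and function names here are hypothetical, not wk's actual API):

```python
class FormatHandler:
    """Builds a WKT-like string from coordinate events."""
    def begin(self, geom_type):
        self.parts = []
        self.geom_type = geom_type
    def coord(self, x, y):
        self.parts.append(f"{x} {y}")
    def end(self):
        return f"{self.geom_type} ({', '.join(self.parts)})"

class BboxHandler:
    """Computes a bounding box from the same events."""
    def begin(self, geom_type):
        self.xs, self.ys = [], []
    def coord(self, x, y):
        self.xs.append(x)
        self.ys.append(y)
    def end(self):
        return (min(self.xs), min(self.ys), max(self.xs), max(self.ys))

def handle_linestring(coords, handler):
    # The reader knows the input type; the handler does not, so any
    # handler "just works" with any reader that fires these events.
    handler.begin("LINESTRING")
    for x, y in coords:
        handler.coord(x, y)
    return handler.end()

coords = [(0, 0), (2, 3)]
print(handle_linestring(coords, FormatHandler()))  # LINESTRING (0 0, 2 3)
print(handle_linestring(coords, BboxHandler()))    # (0, 0, 2, 3)
```

Decoupling readers from handlers this way is what lets one handler implementation serve many input formats.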
PRAGMA
·system.data.sqlite.org·
06 - Being PRAGMAtic with SQLite
I have been too used to PostgreSQL in the time I have spent in my career so far. PostgreSQL and MySQL are what I had worked with in the beginning.
·techrail.in·
Function Argument Validation
Validate function arguments succinctly with informative error messages and optional automatic type casting and size recycling. Enable schema-based assertions by attaching reusable rules to data.frame and list objects for use throughout workflows.
·lj-jenkins.github.io·
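As a language-agnostic sketch of what the description covers (informative error messages, optional type casting, and size recycling), here is a small Python helper. The function name and signature are invented for illustration; the bookmarked package implements this in R.

```python
def check_arg(x, name, *, type_=None, size=None, cast=False):
    """Validate one argument with an informative error message,
    with optional automatic casting and size recycling.
    Illustrative only; not the bookmarked package's API."""
    if type_ is not None and not isinstance(x, type_):
        if cast:
            try:
                x = type_(x)  # automatic type casting on request
            except (TypeError, ValueError):
                raise TypeError(
                    f"`{name}` must be coercible to {type_.__name__}, "
                    f"got {type(x).__name__}")
        else:
            raise TypeError(
                f"`{name}` must be {type_.__name__}, got {type(x).__name__}")
    if size is not None and isinstance(x, list):
        if len(x) == 1:
            x = x * size  # recycle a length-1 value to the required size
        elif len(x) != size:
            raise ValueError(
                f"`{name}` must have length {size} or 1, got {len(x)}")
    return x

print(check_arg("3", "n", type_=int, cast=True))  # 3
print(check_arg([7], "w", size=3))                # [7, 7, 7]
```

Naming the offending argument in the error message is what makes such checks worth the boilerplate.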
pygeoapi-prefect
A pygeoapi process manager powered by Prefect.
·geobeyond.github.io·
Orchestrating AI-driven Geospatial Workflows with Prefect
Explore how orchestrating AI-driven workflows with Prefect can streamline complex geospatial tasks.
·apptimia.com·
Pragma statements supported by SQLite
PRAGMA schema.cache_size; PRAGMA schema.cache_size = pages; PRAGMA schema.cache_size = -kibibytes;

Query or change the suggested maximum number of database disk pages that SQLite will hold in memory at once per open database file. Whether or not this suggestion is honored is at the discretion of the Application Defined Page Cache. The default page cache that is built into SQLite honors the request; alternative application-defined page cache implementations may choose to interpret the suggested cache size in different ways or to ignore it altogether. The default suggested cache size is -2000, which means the cache size is limited to 2048000 bytes of memory. The default suggested cache size can be altered using the SQLITE_DEFAULT_CACHE_SIZE compile-time option. The TEMP database has a default suggested cache size of 0 pages.

If the argument N is positive, the suggested cache size is set to N pages. If the argument N is negative, the number of cache pages is adjusted to a number of pages that would use approximately abs(N*1024) bytes of memory based on the current page size. SQLite remembers the number of pages in the page cache, not the amount of memory used, so if you set the cache size using a negative number and subsequently change the page size (using the PRAGMA page_size command), the maximum amount of cache memory will go up or down in proportion to the change in page size. Backwards compatibility note: the behavior of cache_size with a negative N was different prior to version 3.7.10 (2012-01-16); in earlier versions, the number of pages in the cache was set to the absolute value of N.

When you change the cache size using the cache_size pragma, the change only endures for the current session; the cache size reverts to the default value when the database is closed and reopened. The default page cache implementation does not allocate the full amount of cache memory all at once; cache memory is allocated in smaller chunks on an as-needed basis. The cache_size setting is a (suggested) upper bound on the amount of memory that the cache can use, not the amount of memory it will use all of the time. This is the behavior of the default page cache implementation, but an application-defined page cache is free to behave differently if it wants.
·sqlite.org·
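The negative-N convention and the per-session scope described above can be observed directly from Python's built-in sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A negative N requests approximately abs(N)*1024 bytes of cache;
# SQLite converts it to a page count based on the current page size.
conn.execute("PRAGMA cache_size = -4000")               # ~4096000 bytes
print(conn.execute("PRAGMA cache_size").fetchone()[0])  # -4000

# The setting lasts only for the current session: a fresh connection
# starts from the compiled-in default again (typically -2000, unless
# SQLITE_DEFAULT_CACHE_SIZE was changed at compile time).
conn2 = sqlite3.connect(":memory:")
print(conn2.execute("PRAGMA cache_size").fetchone()[0])

conn.close()
conn2.close()
```

Because the pragma is per-connection, connection-pooling code that relies on a non-default cache size must reissue it on every new connection.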
Targeting database tables in workflows
My work on the Department of Ecology’s Safety of Oil Transportation Act risk model has been an opportunity for me to explore some of the newer tools available in R for reproducible workflows. Taking the time to learn and implement these tools has been incredibly helpful, both because the model requirements were still being nailed down while I was developing it (and thus I needed to be able to easily re-run things and identify changes to results) and because the sheer volume of data requires we use parallel processing approaches in order to achieve feasible run times. I identified the targets package as an excellent tool to achieve both of these requirements, as it not only provides a framework for running and tracking analysis pipelines (which I use for ETL procedures and scheduling model runs) but also allows us to seamlessly switch to parallel approaches using future and backends such as future.callr or future.batchtools.
·hydroecology.net·
OGR SQL dialect and SQLITE SQL dialect — GDAL documentation
The GDALDataset supports executing commands against a datasource via the GDALDataset::ExecuteSQL() method. How such commands are evaluated depends on the dataset.
For most file formats (e.g. Shapefiles, GeoJSON, MapInfo files), the built-in OGR SQL dialect is used by default. It is also possible to request the alternate SQLite SQL dialect, which uses the SQLite engine to evaluate commands on GDAL datasets.
All OGR drivers for database systems (MySQL, PostgreSQL/PostGIS, Oracle Spatial, SQLite/Spatialite RDBMS, GPKG -- GeoPackage vector, ODBC RDBMS, ESRI Personal GeoDatabase, SAP HANA, and MSSQLSpatial -- Microsoft SQL Server Spatial Database) override the GDALDataset::ExecuteSQL() function with a dedicated implementation and, by default, pass the SQL statements directly to the underlying RDBMS. In these cases the SQL syntax varies in some particulars from OGR SQL, and anything possible in SQL can then be accomplished for these particular databases. Generally, only the result of SELECT statements will be returned as layers. For those drivers, it is also possible to explicitly request the OGRSQL and SQLITE dialects, although performance will generally be much worse than with the native SQL engine of those database systems.
SQL is executed against a GDALDataset, not against a specific layer.
·gdal.org·
Multi-threading — GDAL documentation
The exact meaning of the terms thread-safe or re-entrant is not fully standardized; we will use the Qt definitions here. In particular, a C function or C++ method is said to be re-entrant if it can be called simultaneously from multiple threads, but only if each invocation uses its own data.
The POSIX fork() API should not be called in the middle of a GDAL operation; otherwise some structures, such as mutexes, might appear to be locked forever in the forked process. If multi-processing is used, we recommend forking processes before any GDAL operation is done. Operating on the same GDALDataset instance in several sub-processes will generally lead to wrong results because the underlying file descriptors are shared.
·gdal.org·
Vector Data Model — GDAL documentation
Vector Data Model This page documents the classes used to handle vector data. Many data types and method names are based on the OGC Simple Features data model, so it may be helpful to review the specifications published by OGC. For historical reasons, GDAL uses the "OGR" prefix to denote types and functions that apply only to vector data.
Class Overview

The following classes form the core of the vector data model:

Geometry (ogr_geometry.h): The geometry classes (OGRGeometry, etc.) encapsulate the OGC vector data types. They provide some geometry operations and translation to/from well-known binary and text formats. A geometry includes a spatial reference system (projection).
Spatial Reference (ogr_spatialref.h): An OGRSpatialReference encapsulates the definition of a projection and datum.
Feature (ogr_feature.h): The OGRFeature encapsulates the definition of a whole feature, that is, a set of geometries and attributes relating to a single entity.
Feature Class Definition (ogr_feature.h): The OGRFeatureDefn class captures the schema (set of field definitions) for a group of related features (normally a whole layer).
Layer (ogrsf_frmts.h): OGRLayer is an abstract class representing a layer of features in a GDALDataset.
Dataset (gdal_priv.h): A GDALDataset is an abstract base class representing a file or database containing one or more OGRLayer objects.
Drivers (gdal_priv.h): A GDALDriver represents a translator for a specific format, capable of opening and possibly writing GDALDataset objects. All available drivers are managed by the GDALDriverManager.
Geometry Individual geometry classes are used to represent the different types of vector geometry. All the geometry classes derive from OGRGeometry which defines the common functionality of all geometries. Geometry types include OGRPoint, OGRLineString, OGRPolygon, OGRGeometryCollection, OGRMultiPoint, OGRMultiLineString, OGRMultiPolygon, and OGRPolyhedralSurface. The special case of a triangular polygon can be represented as a OGRTriangle, a non-overlapping collection of which can be represented by an OGRTriangulatedSurface. An additional set of types is used to store non-linear geometries: OGRCircularString, OGRCompoundCurve, OGRCurvePolygon, OGRMultiCurve and OGRMultiSurface
Any of the above geometry classes can store coordinates in two (XY), three (XYZ or XYM), or four (XYZM) dimensions.
Additional intermediate classes contain functionality that is used by multiple geometry types. These include OGRCurve (base class for OGRLineString) and OGRSurface (base class for OGRPolygon). Some intermediate interfaces modeled in the simple features abstract model and SFCOM are not modeled in OGR at this time. In most cases the methods are aggregated into other classes.
The OGRGeometryFactory is used to convert well known text (WKT) and well known binary (WKB) format data into the appropriate OGRGeometry subclass. These are predefined ASCII and binary formats for representing all the types of simple features geometries. The OGRGeometry includes a reference to an OGRSpatialReference object, defining the spatial reference system of that geometry. This is normally a reference to a shared spatial reference object with reference counting for each of the OGRGeometry objects using it. Note however that in the general case, all geometric processing done by GDAL is done in a planar way, ignoring potential discontinuity issues at the poles or the antimeridian. OGRGeometryFactory::transformWithOptions() can be used in some cases to split geometries at the poles or the antimeridian. While it is theoretically possible to derive other or more specific geometry classes from the existing OGRGeometry classes, this isn't an aspect that has been well thought out. In particular, it would be possible to create specialized classes using the OGRGeometryFactory without modifying it.
Spatial Reference The OGRSpatialReference class is intended to store an OpenGIS Spatial Reference System definition. Currently local, geographic and projected coordinate systems are supported. Vertical coordinate systems, geocentric coordinate systems, and compound (horizontal + vertical) coordinate systems are as well supported in recent GDAL versions. The spatial coordinate system data model is inherited from the OpenGIS Well Known Text format. A simple form of this is defined in the Simple Features specifications. A more sophisticated form is found in the Coordinate Transformation specification. The OGRSpatialReference is built on the features of the Coordinate Transformation specification but is intended to be compatible with the earlier simple features form. There is also an associated OGRCoordinateTransformation class that encapsulates use of PROJ for converting between different coordinate systems.
Feature / Feature Definition

The OGRGeometry captures the geometry of a vector feature. The OGRFeature contains geometry and adds feature attributes, a feature id, and a feature class identifier. It may also contain styling information. Several geometries can be associated with an OGRFeature. The set of attributes (OGRFieldDefn), their types, names, and so forth is represented via the OGRFeatureDefn class. One OGRFeatureDefn normally exists for a layer of features. The same definition is shared in a reference-counted manner by the features of that type (or feature class). The feature id (FID) of a feature is intended to be a unique identifier for the feature within the layer it is a member of. Freestanding features, or features not yet written to a layer, may have a null (OGRNullFID) feature id. Feature ids are modeled in OGR as 64-bit integers; however, this is not sufficiently expressive to model the natural feature ids in some formats. For instance, the GML feature id is a string. The OGRFeatureDefn also contains an indicator of the types of geometry allowed for that feature class (returned as an OGRwkbGeometryType from OGRFeatureDefn::GetGeomType()). If this is OGRwkbGeometryType::wkbUnknown then any type of geometry is allowed, which implies that features in a given layer can potentially be of different geometry types, though they will always share a common attribute schema. Several geometry fields (OGRGeomFieldDefn) can be associated with an OGRFeatureDefn. Each geometry field has its own indicator of the geometry type allowed, returned by OGRGeomFieldDefn::GetType(), and its own spatial reference system, returned by OGRGeomFieldDefn::GetSpatialRef(). The OGRFeatureDefn also contains a feature class name (normally used as a layer name).
Field Definitions The behavior of each field in a feature class is defined by a shared OGRFieldDefn. The OGRFieldDefn specifies the field type from the values of OGRFieldType. Values stored in this field may be further restricted according to a OGRFieldSubType. For example, a field may have a type of OGRFieldType::OFTInteger with a subtype of OGRFieldSubType::OFSTBoolean. The OGRFieldDefn can also track whether a field is allowed to be null (OGRFieldDefn::IsNullable()), whether its value must be unique (OGRFieldDefn::IsUnique()), and formatting information such as the number of decimal digits, width, and justification. It may also define a default value in case one is not manually specified.
Field Domains Some formats support the use of field domains that describe the values that can be stored in a given attribute field. An OGRFieldDefn may reference a single OGRFieldDomain that is associated with a GDALDataset. Programs using GDAL may use the OGRFieldDomain to appropriately constrain user input. GDAL does not perform validation itself and will allow the storage of values that violate a field's associated OGRFieldDomain. Available types of OGRFieldDomain include: OGRCodedFieldDomain, which constrains values those present in a specified enumeration OGRRangeFieldDomain, which constrains values to a specified range OGRGlobFieldDomain, which constrains values to those matching a specified pattern Additionally, an OGRFieldDomain may define policies describing the values that should be assigned to domain-controlled fields when features are split or merged.
Layer An OGRLayer represents a layer of features within a data source. All features in an OGRLayer share a common schema and are of the same OGRFeatureDefn. An OGRLayer class also contains methods for reading features from the data source. The OGRLayer can be thought of as a gateway for reading and writing features from an underlying data source such as a file on disk, or the result of a database query. The OGRLayer includes methods for sequential and random reading and writing. Read access (via the OGRLayer::GetNextFeature() method) normally reads all features, one at a time sequentially; however, it can be limited to return features intersecting a particular geographic region by installing a spatial filter on the OGRLayer (via the OGRLayer::SetSpatialFilter() method). A filter on attributes can only be set with the OGRLayer::SetAttributeFilter() method. By default, all available attributes and geometries are read but this can be controlled by flagging fields as ignored (OGRLayer::SetIgnoredFields()). Starting with GDAL 3.6, as an alternative to getting features through GetNextFeature, it is possible to retrieve them by batches, with a column-oriented memory layout, using the OGRLayer::GetArrowStream() method (cf Reading From OGR using the Arrow C Stream data interface). An OGRLayer may also store an OGRStyleTable that provides a set of styles that may be used by features in the layer. More information on GDAL's handling of feature styles can be found in the Feature Style Specification. One flaw in the current OGR architecture is that the spatial and attribute filters are set directly on the OGRLayer which is intended to be the only representative of a given layer in a data source. This means it isn't possible to have multiple read operations active at one time with different spatial filters on each. Another question that might arise is why the OGRLayer and OGRFeatureDefn classes are distinct. 
An OGRLayer always has a one-to-one relationship to an OGRFeatureDefn, so why not amalgamate the classes? There are two reasons: As defined now OGRFeature and OGRFeatureDefn don't depend on OGRLayer, so they can exist independently in memory without regard to a particular layer in a data store. The SF CORBA model does not have a concept of a layer with a single fixed schema the way that the SFCOM and SFSQL models do. The fact that features belong to a feature collection that is potentially not directly related to their current feature grouping may be important to implementing SFCORBA support using OGR. The OGRLayer class is an abstract base class. An implementation is expected to be subclassed for each file format driver implemented. OGRLayers are normally owned directly by their GDALDataset, and aren't instantiated or destroyed directly.
Dataset A GDALDataset represents a set of OGRLayer objects. This usually represents a single file, set of files, database or gateway. A GDALDataset has a list of OGRLayer which it owns but can return references to. GDALDataset is an abstract base class. An implementation is expected to be subclassed for each file format driver implemented. GDALDataset objects are not normally instantiated directly but rather with the assistance of an GDALDriver. Deleting an GDALDataset closes access to the underlying persistent data source, but does not normally result in deletion of that file. A GDALDataset has a name (usually a filename or database connection string) that can be used to reopen the data source with a GDALDriver. The GDALDataset also has support for executing a datasource specific command, normally a form of SQL. This is accomplished via the GDALDataset::ExecuteSQL() method. While some datasources (such as PostGIS and Oracle) pass the SQL through to an underlying database, OGR also includes support for evaluating a subset of the SQL SELECT statement against any datasource (see OGR SQL dialect and SQLITE SQL dialect.) When using some drivers, the GDALDataset also offers a mechanism for to start, commit, and rollback transactions when interacting with the underlying data store. A GDALDataset may also be aware of relationships between layers (e.g., a foreign key relationship between database tables). Information about these relationships is stored in a GDALRelationship.
·gdal.org·