fix: Update push_data and user_data annotation with JsonSerializable instead of Any #1889

Open
Mantisus wants to merge 4 commits into apify:master from Mantisus:up-json-serializavle-typing

@Mantisus
Collaborator

Description

  • Improved annotation for arguments that accept JSON data by replacing implicit Any with explicit JsonSerializable type for push_data and user_data parameters.

Issues

@Mantisus Mantisus requested review from janbuchar and vdusek May 11, 2026 15:52
@Mantisus Mantisus self-assigned this May 11, 2026
Collaborator

@vdusek vdusek left a comment

Could you please try using this type:

JsonSerializable = dict[str, 'JsonSerializable'] | list['JsonSerializable'] | str | int | float | bool | None
"""Recursive type for JSON-serializable values - primitives plus objects and arrays with JSON-serializable contents.

Based on the definition discussed in https://github.com/python/typing/issues/182.
"""

All major type checkers support recursive types now, so we can finally type this correctly. I recently made the same change in the API client as well - https://github.com/apify/apify-client-python/blob/master/src/apify_client/_types.py#L30C1-L34C4.

Collaborator

@vdusek vdusek left a comment

The point of a better JsonSerializable type should be that it lets us type its various usages more accurately and with less complexity. Here are a few examples:

Comment thread src/crawlee/_utils/file.py
Comment thread src/crawlee/crawlers/_abstract_http/_abstract_http_crawler.py Outdated
Comment thread src/crawlee/crawlers/_basic/_basic_crawler.py
@Mantisus Mantisus requested a review from vdusek May 13, 2026 21:49
Collaborator

@vdusek vdusek left a comment

The return value of iterate_items is still AsyncIterator[dict[str, Any]] in FileSystemDatasetClient, SqlDatasetClient, and RedisDatasetClient. Is that intended?

In Redis, there is also a relevant cast.

Collaborator

@vdusek vdusek left a comment

Also see data field in the DatasetItemDb:

data: Mapped[list[dict[str, Any]] | dict[str, Any]] = mapped_column(JsonField, nullable=False)

=>

data: Mapped[dict[str, JsonSerializable]] = mapped_column(JsonField, nullable=False)

Comment thread src/crawlee/_types.py
Comment thread src/crawlee/storage_clients/_base/_dataset_client.py Outdated
Comment thread src/crawlee/sessions/_session.py Outdated
Comment thread src/crawlee/sessions/_models.py Outdated
Comment thread src/crawlee/crawlers/_playwright/_playwright_crawler.py Outdated
Comment thread src/crawlee/crawlers/_basic/_basic_crawler.py Outdated
Comment thread src/crawlee/crawlers/_abstract_http/_abstract_http_crawler.py Outdated
@Mantisus
Collaborator Author

Also see data field in the DatasetItemDb:

data: Mapped[list[dict[str, Any]] | dict[str, Any]] = mapped_column(JsonField, nullable=False)

=>

data: Mapped[dict[str, JsonSerializable]] = mapped_column(JsonField, nullable=False)

SQLAlchemy uses its own mechanism for handling types and mapping them to columns, which prevents using JsonSerializable in the Mapped annotation here.

@Mantisus Mantisus requested a review from vdusek May 15, 2026 01:24
Collaborator

@vdusek vdusek left a comment

LGTM. Just before merging, could you please prepare a draft PR to the SDK with

dependencies = [
    # ...
    "crawlee @ git+https://github.com/apify/crawlee-python.git@master",
    # ...
]

so we can make sure the new typing doesn't cause any issues there?

Development

Successfully merging this pull request may close these issues.

Update push_data annotations to use JsonSerializable type

4 participants