Reference request: completely symmetric (economy-like) environment-agent reinforcement learning that improves both the agent and the environment?

I have an idea for a completely symmetric reinforcement learning scheme that improves both the agent and the environment. Is this idea new, or are there references in the literature? Specifically, I am asking for references and for the academic term for this kind of symmetric RL.

In the usual setting, an agent nn observes the environment state s, selects an action a = nn(s), and submits this action to the environment, which returns the next state and a reward: (s', r) = env(s, a). The agent uses this reward to update itself: nn = F(nn, r). After some training with a teacher environment env, the agent can connect to another environment env_2 (the machine learning paradigm mostly requires env_2 to be similar to env, distribution-wise) and execute real actions to earn real rewards.
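As an illustration of this standard loop, here is a minimal Python sketch. The toy environment (two alternating states, reward 1 only for action 1 in state 0), the epsilon-greedy value-table agent, and the simple incremental update used for F are all assumptions chosen for brevity, not a specific published algorithm.

```python
import random

def env(s, a):
    """Hypothetical teacher environment: (s', r) = env(s, a)."""
    r = 1.0 if (s == 0 and a == 1) else 0.0
    return 1 - s, r  # the two states simply alternate

def nn(q, s, rng, eps=0.1):
    """Agent policy a = nn(s): epsilon-greedy over a value table q."""
    if rng.random() < eps:
        return rng.choice((0, 1))
    return max((0, 1), key=lambda a: q[(s, a)])

def F(q, s, a, r, lr=0.1):
    """Agent update nn = F(nn, r): move q[(s, a)] toward the observed reward."""
    q = dict(q)  # return a new table to mirror nn = F(nn, r)
    q[(s, a)] += lr * (r - q[(s, a)])
    return q

def train(steps=2000, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
    s = 0
    for _ in range(steps):
        a = nn(q, s, rng)
        s_next, r = env(s, a)  # env itself never changes
        q = F(q, s, a, r)      # only the agent learns
        s = s_next
    return q

q = train()
print(max((0, 1), key=lambda a: q[(0, a)]))  # greedy action in state 0: 1
```

Only the agent's table changes during training; env stays fixed, which is exactly the asymmetry the question is about.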

So the agent is only as good as its teacher environment. The core question is: can the agent send a reward back to the environment as gratitude for good teaching (directly or through interactions with other agents)? Or, conversely, can the agent sue the environment and demand compensation for damages (or publicly announce that the environment is bad and harm it in other ways)?

RL has the common notion of sparse (delayed) reward. The same notion can be applied to a delayed award to, or a delayed compensation request from, the environment.
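A minimal sketch of how the usual discounted-return computation spreads a sparse end-of-episode reward back over earlier steps, and how the same computation could, hypothetically, credit a delayed payment that the agent sends to the environment at the end of an episode. The discount factor and both signal sequences are made-up illustration values.

```python
def discounted_returns(signals, gamma=0.9):
    """G_t = sum_k gamma^k * signal_{t+k}, computed backwards in one pass."""
    out = [0.0] * len(signals)
    g = 0.0
    for t in range(len(signals) - 1, -1, -1):
        g = signals[t] + gamma * g
        out[t] = g
    return out

# Sparse reward to the agent: only the last step of a 4-step episode pays.
agent_rewards = [0.0, 0.0, 0.0, 1.0]
# Hypothetical delayed payment from the agent to its teacher environment,
# issued only at the end (e.g. after the agent earns real rewards in env_2).
env_payments = [0.0, 0.0, 0.0, 0.5]

print(discounted_returns(agent_rewards, gamma=0.5))  # [0.125, 0.25, 0.5, 1.0]
print(discounted_returns(env_payments, gamma=0.5))   # [0.0625, 0.125, 0.25, 0.5]
```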

More generally, the agent need not send only an action to the environment: it can also send a monetary reward (payment) and a non-monetary reward (extra information, e.g. the agent's state). This gives the completely symmetric RL scheme:

  1. (agent-state, agent-issued-pay-for-teaching, action) <- agent(environment-state, reward, additional-action-like-info-from-environment)
  2. (environment-state, reward, additional-action-like-info-from-environment) <- environment(action, agent-issued-pay-for-teaching, agent-state)
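The two lines of the scheme can be sketched as two callables that exchange messages. This is a hypothetical Python sketch: the message fields mirror the list, while the fixed pay-back rate, the use of the agent's wealth as its reported state, the hard-coded agent policy, and the toy reward rule are assumptions. Learning is deliberately left out so the message flow stays visible.

```python
from dataclasses import dataclass

@dataclass
class AgentMsg:
    """(agent-state, agent-issued-pay-for-teaching, action)"""
    agent_state: float
    pay_for_teaching: float
    action: int

@dataclass
class EnvMsg:
    """(environment-state, reward, additional-action-like-info-from-environment)"""
    env_state: int
    reward: float
    extra_info: float

@dataclass
class Agent:
    wealth: float = 0.0
    pay_rate: float = 0.25  # hypothetical fixed share of each reward paid back

    def __call__(self, m: EnvMsg) -> AgentMsg:
        pay = self.pay_rate * m.reward
        self.wealth += m.reward - pay
        action = 1 if m.env_state == 0 else 0  # hard-coded toy policy
        # Hypothetical: the agent reports its wealth as its agent-state.
        return AgentMsg(agent_state=self.wealth, pay_for_teaching=pay, action=action)

@dataclass
class Environment:
    state: int = 0
    earnings: float = 0.0

    def __call__(self, m: AgentMsg) -> EnvMsg:
        self.earnings += m.pay_for_teaching  # the environment is paid, too
        r = 1.0 if (self.state == 0 and m.action == 1) else 0.0
        self.state = 1 - self.state  # toy dynamics: states alternate
        return EnvMsg(env_state=self.state, reward=r, extra_info=self.earnings)

agent, environment = Agent(), Environment()
msg_from_env = EnvMsg(env_state=0, reward=0.0, extra_info=0.0)
for _ in range(4):
    msg_from_agent = agent(msg_from_env)        # line 1 of the scheme
    msg_from_env = environment(msg_from_agent)  # line 2 of the scheme
print(agent.wealth, environment.earnings)  # 1.5 0.5
```

With pay_rate=0.25 and two rewards of 1.0 received, the agent keeps 1.5 and the environment receives 0.5; how the environment should use such payments to improve itself is the open part of the idea.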

In principle, the action could absorb (agent-state, agent-issued-pay-for-teaching) as components, and the environment state could absorb (additional-action-like-info-from-environment). Still, specifying these channels explicitly may make the symmetric RL model more interesting and more concrete for research.
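To illustrate this folding, here is a hypothetical sketch in which the extra channels are packed into a composite action and a composite state, so the environment keeps the standard signature (s', r) = env(s, a). The type names, field names, and reward rule are assumptions.

```python
from typing import NamedTuple, Tuple

class CompositeAction(NamedTuple):
    agent_state: float       # agent-state sent to the environment
    pay_for_teaching: float  # agent-issued-pay-for-teaching
    action: int              # the ordinary action

class CompositeState(NamedTuple):
    env_state: int           # the ordinary environment state
    extra_info: float        # additional-action-like-info-from-environment

def env(s: CompositeState, a: CompositeAction) -> Tuple[CompositeState, float]:
    """Standard signature (s', r) = env(s, a) with the symmetric channels packed inside."""
    r = 1.0 if (s.env_state == 0 and a.action == 1) else 0.0
    # Hypothetical: the environment reports its accumulated payments as extra info.
    earned = s.extra_info + a.pay_for_teaching
    return CompositeState(env_state=1 - s.env_state, extra_info=earned), r

s1, r = env(CompositeState(0, 0.0), CompositeAction(0.0, 0.25, 1))
print(s1, r)  # CompositeState(env_state=1, extra_info=0.25) 1.0
```

As an interface the packed form works, but it hides which components are payments and which are ordinary actions; the explicit form names those channels.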

One can go further and study the information and economic dynamics of connected symmetric agent-environment pairs, or of even more general multi-agent systems. One might even derive a kind of super-symmetry between complexity/information on one side and economic value on the other.

I have read a bit about multi-agent reinforcement learning, and its formulation is somewhat different: there is still one (essentially immutable) environment, and multiple agents try to cooperate to solve it. In my proposed scheme, the immutable environment is just one special agent, and there can be different environments with differing degrees of immutability and adaptability (learning potential).
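One way to express "environments with differing degrees of immutability" is to give agents and environments the same learner interface and a single adaptability parameter, where 0.0 recovers the usual immutable environment. This is a hypothetical sketch: the scalar param stands in for all learnable parameters, and the incremental update rule is an assumption.

```python
from dataclasses import dataclass

@dataclass
class Learner:
    """Same (hypothetical) interface for agents and environments."""
    name: str
    param: float         # stand-in for all learnable parameters
    adaptability: float  # 0.0 = immutable, 1.0 = jumps straight to the feedback

    def update(self, feedback: float) -> None:
        # Move param toward the received feedback, scaled by adaptability.
        self.param += self.adaptability * (feedback - self.param)

agent = Learner("agent", param=0.0, adaptability=0.5)
classic_env = Learner("classic environment", param=0.0, adaptability=0.0)
adaptive_env = Learner("adaptive environment", param=0.0, adaptability=0.5)

for _ in range(3):
    agent.update(1.0)         # reward from the environment
    classic_env.update(1.0)   # payment from the agent: has no effect
    adaptive_env.update(1.0)  # payment from the agent: used to adapt

print(agent.param, classic_env.param, adaptive_env.param)  # 0.875 0.0 0.875
```

In this framing, the usual single-environment setting is just the special case where the environment's adaptability is zero.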

My question is about references: what is such a symmetric reinforcement learning scheme called in academia, and what are the important references for it? Thanks!